How to use the API¶
This guide is about using NOMAD's REST APIs directly, e.g. via Python's request.
To access the processed data with our client library nomad-lab
follow
How to access processed data. You can also watch our
video tutorial on the API.
Different options to use the API¶
NOMAD offers all its functionality through application programming interfaces (APIs). More specifically RESTful HTTP APIs that allows you to use NOMAD as a set of resources (think data) that can be uploaded, accessed, downloaded, searched for, etc. via HTTP requests.
You can get an overview on all NOMAD APIs on the API page. We will focus here on NOMAD's main API (v1). In fact, this API is also used by the web interface and should provide everything you need.
There are different tools and libraries to use the NOMAD API that come with different trade-offs between expressiveness, learning curve, and convenience.
You can use your browser
For example to see the metadata for all entries with elements Ti and O go here: https://nomad-lab.eu/prod/v1/api/v1/entries?elements=Ti&elements=O
Use *curl* or *wget*
REST API's use resources located via URLs. You access URLs with curl or wget. Same Ti, O example as before:
curl "https://nomad-lab.eu/prod/v1/api/v1/entries?results.material.elements=Ti&results.material.elements=O" | python -m json.tool
Use Python and requests
Requests is a popular Python library to use the internets HTTP protocol that is used to
communicate with REST APIs. Install with pip install requests
.
See the initial example.
Use our dashboard
The NOMAD API has an OpenAPI dashboard. This is an interactive documentation of all API functions that allows you to try these functions in the browser.
Use NOMAD's Python package
Install the NOMAD Python client library and use it's ArchiveQuery
functionality for a more convenient query based access of archive data following the
How-to access the processed data guide.
Using request¶
If you are comfortable with REST APIs and using Pythons requests
library, this example
demonstrates the basic concepts of NOMAD's main API. You can get more documentation and
details on all functions from the API dashboard.
The following issues a search query for all entries that have both Ti and O
among the elements of their respective materials. It restricts the results to
one entry and only returns the entry_id
.
import requests
import json
base_url = 'http://nomad-lab.eu/prod/v1/api/v1'
response = requests.post(
f'{base_url}/entries/query',
json={
'query': {
'results.material.elements': {
'all': ['Ti', 'O']
}
},
'pagination': {
'page_size': 1
},
'required': {
'include': ['entry_id']
}
})
response_json = response.json()
print(json.dumps(response.json(), indent=2))
This will give you something like this:
{
"owner": "public",
"query": {
"name": "results.material.elements",
"value": {
"all": [
"Ti",
"O"
]
}
},
"pagination": {
"page_size": 1,
"order_by": "entry_id",
"order": "asc",
"total": 17957,
"next_page_after_value": "--SZVYOxA2jTu_L-mSxefSQFmeyF"
},
"required": {
"include": [
"entry_id"
]
},
"data": [
{
"entry_id": "--SZVYOxA2jTu_L-mSxefSQFmeyF"
}
]
}
The entry_id
is a unique identifier for, well, entries. You can use it to access
other entry data. For example, you want to access the entry's archive. More
precisely, you want to gather the formula and energies from the main workflow result.
The following requests the archive based on the entry_id
and only requires some archive sections.
first_entry_id = response_json['data'][0]['entry_id']
response = requests.post(
f'{base_url}/entries/{first_entry_id}/archive/query',
json={
'required': {
'workflow': {
'calculation_result_ref': {
'energy': '*',
'system_ref': {
'chemical_composition': '*'
}
}
}
}
})
response_json = response.json()
print(json.dumps(response_json, indent=2))
The result will look like this:
{
"required": {
"workflow": {
"calculation_result_ref": {
"energy": "*",
"system_ref": {
"chemical_composition": "*"
}
}
}
},
"entry_id": "--SZVYOxA2jTu_L-mSxefSQFmeyF",
"data": {
"entry_id": "--SZVYOxA2jTu_L-mSxefSQFmeyF",
"upload_id": "YXUIZpw5RJyV3LAsFI2MmQ",
"parser_name": "parsers/fhi-aims",
"archive": {
"run": [
{
"system": [
{
"chemical_composition": "OOSrTiOOOSrTiOOOSrTiOFF"
}
],
"calculation": [
{
"energy": {
"fermi": -1.1363378335891879e-18,
"total": {
"value": -5.697771591896252e-14
},
"correlation": {
"value": -5.070133798617076e-17
},
"exchange": {
"value": -2.3099755059272454e-15
},
"xc": {
"value": -2.360676843913416e-15
},
"xc_potential": {
"value": 3.063766944960246e-15
},
"free": {
"value": -5.697771595558439e-14
},
"sum_eigenvalues": {
"value": -3.3841806795825544e-14
},
"total_t0": {
"value": -5.697771593727346e-14
},
"correction_entropy": {
"value": -1.8310927833270112e-23
},
"correction_hartree": {
"value": -4.363790430157292e-17
},
"correction_xc": {
"value": -2.3606768439090564e-15
}
},
"system_ref": "/run/0/system/0"
}
]
}
],
"workflow": [
{
"calculation_result_ref": "/run/0/calculation/0"
}
]
}
}
}
You can work with the results in the given JSON (or respective Python dict/list) data already. If you have NOMAD's Python library installed , you can take the archive data and use the Python interface. The Python interface will help with code-completion (e.g. in notebook environments), resolve archive references (e.g. from workflow to calculation to system), and allow unit conversion:
from nomad.datamodel import EntryArchive
from nomad.metainfo import units
archive = EntryArchive.m_from_dict(response_json['data']['archive'])
result = archive.workflow[0].calculation_result_ref
print(result.system_ref.chemical_composition)
print(result.energy.total.value.to(units('eV')))
This will give you an output like this:
Different kinds of data¶
We distinguish between different kinds of NOMAD data and there are different functions in the API:
- Entry metadata, a summary of extracted data for an entry.
- Raw files, the files as they were uploaded to NOMAD.
- Archive data, all of the extracted data for an entry.
There are also different entities (see also Datamodel) with different functions in the API:
- Entries
- Uploads
- Datasets
- Users
The API URLs typically start with the entity, followed by the kind of data. Examples are:
entries/query
- Query entries for metadataentries/archive/query
- Query entries for archive dataentries/{entry-id}/raw
- Download raw data for a specific entryuploads/{upload-id}/raw/path/to/file
- Download a specific file of an upload
Common concepts¶
The initial example above, showed how to execute a basic search. This includes some fundamental concepts that can be applied to many parts of the API. Let's discuss some of the common concepts.
Response layout¶
Functions that have a JSON response, will have a common layout. First, the response will contain all keys and values of the request. The request is not repeated verbatim, but
in a normalized form. Abbreviations in search queries might be expanded, default values for optional parameters are added, or additional response specific information
is included. Second, the response will contain the results under the key data
.
Owner¶
All functions that allow a query will also allow to specify the owner
. Depending on
the API function, its default value will be mostly visible
. Some values are only
available if you are logged in.
The owner
allows to limit the scope of the search based on entry ownership.
This is useful if you only want to search among all publicly downloadable
entries or only among your own entries, etc.
These are the possible owner values and their meaning:
admin
: No restriction. Only usable by an admin user.all
: Published entries (with or without embargo), or entries that belong to you or are shared with you.public
: Published entries without embargo.shared
: Entries that belong to you or are shared with you.staging
: Unpublished entries that belong to you or are shared with you.user
: Entries that belong to you.visible
: Published entries without embargo, or unpublished entries that belong to you or are shared with you.
Queries¶
A query can be a very simple list of parameters. Different parameters or values of the same parameter are combined with a logical and. The following query would search for all entries that are VASP calculations, contain Na and Cl, and are authored by Stefano Curtarolo and Chris Wolverton.
{
"results.material.elements": ["Na", "Cl"],
"results.method.simulation.program_name": "VASP",
"authors": ["Stefano Curtarolo", "Chris Wolverton"]
}
A short cut to change the logical combination of values in a list, is to
add a suffix to the quantity :any
:
{
"results.material.elements": ["Na", "Cl"],
"results.method.simulation.program_name": "VASP",
"authors:any": ["Stefano Curtarolo", "Chris Wolverton"]
}
Otherwise, you can also write complex logical combinations of parameters like this:
{
"and": [
{
"or": [
{
"results.material.elements": ["Cl", "Na"]
},
{
"results.material.elements": ["H", "O"]
}
]
},
{
"not": {
"results.material.symmetry.crystal_system": "cubic"
}
}
]
}
none:
and any:
(the default).
By default all quantity values have to equal the given values to match. For some values you can also use comparison operators like this:
{
"upload_create_time": {
"gt": "2020-01-01",
"lt": "2020-08-01"
},
"results.properties.geometry_optimization.final_energy_difference": {
"lte": 1.23e-18
}
}
or shorter with suffixes:
{
"upload_create_time:gt": "2020-01-01",
"upload_create_time:lt": "2020-08-01",
"results.properties.geometry_optimization.final_energy_difference:lte": 1.23e-18
}
The searchable quantities are a subset of the NOMAD Archive quantities defined in the NOMAD Metainfo. The searchable quantities also depend on the API endpoint.
There is also an additional query parameter that you can use to formulate queries based on the optimade filter language:
Pagination¶
When you issue a query, usually not all results can be returned. Instead, an API returns
only one page. This behavior is controlled through pagination parameters,
like page_site
, page
, page_offset
, or page_after_value
.
Let's consider a search for entries as an example.
response = requests.post(
f'{base_url}/entries/query',
json={
'query': {
'results.material.elements': {
'all': ['Ti', 'O']
}
},
'pagination': {
'page_size': 10
}
}
)
This will only result in a response with a maximum of 10 entries. The response will contain a
pagination
object like this:
{
"page_size": 10,
"order_by": "entry_id",
"order": "asc",
"total": 17957,
"next_page_after_value": "--SZVYOxA2jTu_L-mSxefSQFmeyF"
}
In this case, the pagination is based on after values. This means that the search can
be continued with a follow up request at a certain point characterized by the
next_page_after_value
. If you follow up with:
response = requests.post(
f'{base_url}/entries/query',
json={
'query': {
'results.material.elements': {
'all': ['Ti', 'O']
}
},
'pagination': {
'page_size': 10,
'page_after_value': '--SZVYOxA2jTu_L-mSxefSQFmeyF'
}
}
)
Here is a full example that collects the first 100 formulas from entries that match a certain query by paginating.
import requests
base_url = 'http://nomad-lab.eu/prod/v1/api/v1'
json_body = {
'query': {
'results.material.elements': {
'all': ['Ti', 'O']
}
},
'pagination': {
'page_size': 10
},
'required': {
'include': ['results.material.chemical_formula_hill']
}
}
formulas = set()
while len(formulas) < 100:
response = requests.post(f'{base_url}/entries/query', json=json_body)
response_json = response.json()
for data in response_json['data']:
formulas.add(data['results']['material']['chemical_formula_hill'])
next_value = response_json['pagination'].get('next_page_after_value')
if not next_value:
break
json_body['pagination']['page_after_value'] = next_value
print(formulas)
Authentication¶
Most of the API operations do not require any authorization and can be freely used without a user or credentials. However, to upload, edit, or view your own and potentially unpublished data, the API needs to authenticate you.
The NOMAD API uses OAuth and tokens to authenticate users. We provide simple operations that allow you to acquire an access token via username and password:
import requests
response = requests.get(
'https://nomad-lab.eu/prod/v1/api/v1/auth/token', params=dict(username='myname', password='mypassword'))
token = response.json()['access_token']
response = requests.get(
'https://nomad-lab.eu/prod/v1/api/v1/uploads',
headers={'Authorization': f'Bearer {token}'})
uploads = response.json()['data']
If you have the NOMAD Python package installed. You can use its Auth
implementation:
import requests
from nomad.client import Auth
response = requests.get(
'https://nomad-lab.eu/prod/v1/api/v1/uploads',
auth=Auth(user='myname or email', password='mypassword'))
uploads = response.json()['data']
To use authentication in the dashboard, simply use the Authorize button. The dashboard GUI will manage the access token and use it while you try out the various operations.
App token¶
If the short-term expiration of the default access token does not suit your needs,
you can request an app token with a user-defined expiration. For example, you can
send the GET request /auth/app_token?expires_in=86400
together with some way of
authentication, e.g. header Authorization: Bearer <access token>
. The API will return
an app token, which is valid for 24 hours in subsequent request headers with the format
Authorization: Bearer <app token>
. The request will be declined if the expiration is
larger than the maximum expiration defined by the API config.
Warning
Despite the name, the app token is used to impersonate the user who requested it. It does not discern between different uses and will only become invalid once it expires (or when the API's secret is changed).
Search for entries¶
See using requests for a typical search example. Combine the different concepts above to create the queries that you need.
Searching for entries is typically just an initial step. Once you know what entries exist you'll probably want to do one of the following things.
Download raw files¶
You can use queries to download raw files, but typically you don't want to download file-by-file or entry-by-entry. Therefore, we allow to download a large set of files in one big zip-file. Here, you might want to use a program like curl to download directly from the shell:
curl "https://nomad-lab.eu/prod/v1/api/v1/entries/raw?results.material.elements=Ti&results.material.elements=O" -o download.zip
Access processed data (archives)¶
Above under using requests, you've already learned how to access
archive data. A special feature of the archive API functions is that you can define what is required
from the archives.
response = requests.post(
f'{base_url}/entries/archive/query',
json={
'query': ...,
'pagination': ...,
'required': {
'workflow': {
'calculation_result_ref': {
'energy': '*',
'system_ref': {
'chemical_composition': '*'
}
}
}
}
})
The required
part allows you to specify what parts of the requested archives
should be returned. The NOMAD Archive is a hierarchical data format and
you can require certain branches (i.e. sections) in the hierarchy.
By specifying certain sections with specific contents or all contents (via
the directive "*"
), you can determine what sections and what quantities should
be returned. The default is the whole archive, i.e., "*"
.
For example to specify that you are only interested in the metadata
use:
Or to only get the energy_total
from each individual entry, use:
You can also request certain parts of a list, e.g. the last calculation:
These required specifications are also very useful to get workflow results. This works because we can use references (e.g. workflow to final result calculation) and the API will resolve these references and return the respective data. For example just the total energy value and reduced formula from the resulting calculation:
{
"workflow": {
"calculation_result_ref": {
"energy": "*",
"system_ref": {
"value": {
"chemical_composition": "*"
}
}
}
}
}
You can also resolve all references in a branch with the include-resolved
directive. This will resolve all references in the branch, and also all references
in referenced sections:
By default, the targets of "resolved" references are added to the archive at
their original hierarchy positions.
This means, all references are still references, but they are resolvable within
the returned data, since they targets are now part of the data. Another option
is to add
"resolve-inplace": true
to the root of required. Here, the reference targets will
replace the references:
You can browse the NOMAD metainfo schema or the archive of each entry (e.g. a VASP example) in the web-interface.
Limits¶
The API allows you to ask many requests in parallel and to put a lot of load on NOMAD servers. Since this can accidentally or deliberately reduce the service quality for other, we have to enforce a few limits.
- rate limit: you can only run a certain amount of requests at the same time
- rate limit: you can only run a certain amount of requests per second
- api limit: many API endpoints will enforce a maximum page size
If you get responses with an HTTP code 503 Service Unavailable, you are hitting a rate limit and you cannot use the service until you fall back into our limits. Consider, to ask fewer requests in a larger time frame.
Rate limits are enforced based on your IP address. Please note that when you or your colleagues are sharing a single external IPs from within a local network, e.g. via NAT, you are also sharing the rate limits. Depending on the NOMAD installation, these limits can be as low as 30 requests per second or 10 concurrent requests.
Consider to use endpoints that allow you to retrieve full pages of resources, instead of endpoints that force you to access resources one at a time. See also the sections on types of data and pagination.
However, pagination also has its limits and you might ask for pages that are too large. If you get responses in the 400 range, e.g. 422 Unprocessable Content or 400 Bad request, you might hit an api limit. Those responses are typically accompanied by an error message in the response body that will inform you about the limit, e.g. the maximum allowed page size.