Using the APIs

Getting started

If you are comfortable with REST APIs and Python's requests library, this example demonstrates the basic concepts of NOMAD's main API. More documentation and details on all functions are available from the API dashboard.

The following issues a search query for all entries that have both Ti and O among the elements of their respective materials. It restricts the results to one entry and only returns the entry_id.

import requests
import json

base_url = 'http://nomad-lab.eu/prod/v1/api/v1'

response = requests.post(
    f'{base_url}/entries/query',
    json={
        'query': {
            'results.material.elements': {
                'all': ['Ti', 'O']
            }
        },
        'pagination': {
            'page_size': 1
        },
        'required': {
            'include': ['entry_id']
        }
    })
response_json = response.json()
print(json.dumps(response_json, indent=2))

This will give you something like this:

{
  "owner": "public",
  "query": {
    "name": "results.material.elements",
    "value": {
      "all": [
        "Ti",
        "O"
      ]
    }
  },
  "pagination": {
    "page_size": 1,
    "order_by": "entry_id",
    "order": "asc",
    "total": 17957,
    "next_page_after_value": "--SZVYOxA2jTu_L-mSxefSQFmeyF"
  },
  "required": {
    "include": [
      "entry_id"
    ]
  },
  "data": [
    {
      "entry_id": "--SZVYOxA2jTu_L-mSxefSQFmeyF"
    }
  ]
}

The entry_id is the unique identifier of an entry. You can use it to access other entry data. Suppose, for example, that you want to access the entry's archive; more precisely, you want to gather the formula and energies from the main workflow result. The following requests the archive based on the entry_id and only requires some archive sections.

first_entry_id = response_json['data'][0]['entry_id']
response = requests.post(
    f'{base_url}/entries/{first_entry_id}/archive/query',
    json={
        'required': {
            'workflow': {
                'calculation_result_ref': {
                    'energy': '*',
                    'system_ref': {
                        'chemical_composition': '*'
                    }
                }
            }
        }
    })
response_json = response.json()
print(json.dumps(response_json, indent=2))

The result will look like this:

{
  "required": {
    "workflow": {
      "calculation_result_ref": {
        "energy": "*",
        "system_ref": {
          "chemical_composition": "*"
        }
      }
    }
  },
  "entry_id": "--SZVYOxA2jTu_L-mSxefSQFmeyF",
  "data": {
    "entry_id": "--SZVYOxA2jTu_L-mSxefSQFmeyF",
    "upload_id": "YXUIZpw5RJyV3LAsFI2MmQ",
    "parser_name": "parsers/fhi-aims",
    "archive": {
      "run": [
        {
          "system": [
            {
              "chemical_composition": "OOSrTiOOOSrTiOOOSrTiOFF"
            }
          ],
          "calculation": [
            {
              "energy": {
                "fermi": -1.1363378335891879e-18,
                "total": {
                  "value": -5.697771591896252e-14
                },
                "correlation": {
                  "value": -5.070133798617076e-17
                },
                "exchange": {
                  "value": -2.3099755059272454e-15
                },
                "xc": {
                  "value": -2.360676843913416e-15
                },
                "xc_potential": {
                  "value": 3.063766944960246e-15
                },
                "free": {
                  "value": -5.697771595558439e-14
                },
                "sum_eigenvalues": {
                  "value": -3.3841806795825544e-14
                },
                "total_t0": {
                  "value": -5.697771593727346e-14
                },
                "correction_entropy": {
                  "value": -1.8310927833270112e-23
                },
                "correction_hartree": {
                  "value": -4.363790430157292e-17
                },
                "correction_xc": {
                  "value": -2.3606768439090564e-15
                }
              },
              "system_ref": "/run/0/system/0"
            }
          ]
        }
      ],
      "workflow": [
        {
          "calculation_result_ref": "/run/0/calculation/0"
        }
      ]
    }
  }
}

You can already work with the results in the given JSON (or the respective Python dict/list) data. If you have NOMAD's Python library installed, you can take the archive data and use the Python interface. The Python interface helps with code completion (e.g. in notebook environments), resolves archive references (e.g. from workflow to calculation to system), and allows unit conversion:

from nomad.datamodel import EntryArchive
from nomad.metainfo import units

archive = EntryArchive.m_from_dict(response_json['data']['archive'])
result = archive.workflow[0].calculation_result_ref
print(result.system_ref.chemical_composition)
print(result.energy.total.value.to(units('eV')))

This will give you an output like this:

OOSrTiOOOSrTiOOOSrTiOFF
-355626.93095025205 electron_volt
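
Without the NOMAD package, the same number can be read from the plain JSON. The following is a minimal sketch; it assumes that archive energies are given in joules (which is consistent with the eV output above) and uses a trimmed copy of the response. The names JOULE_PER_EV and total_energy_ev are illustrative and not part of NOMAD.

```python
# One electron volt in joules (exact value in the 2019 SI definition).
JOULE_PER_EV = 1.602176634e-19

# Trimmed copy of the 'archive' part of the response shown above.
archive = {
    'run': [{
        'calculation': [{
            'energy': {'total': {'value': -5.697771591896252e-14}},
        }],
    }],
}


def total_energy_ev(archive: dict) -> float:
    """Return the total energy of the first calculation in eV.

    Assumption: the stored value is in joules.
    """
    calculation = archive['run'][0]['calculation'][0]
    return calculation['energy']['total']['value'] / JOULE_PER_EV


energy_ev = total_energy_ev(archive)
print(f'{energy_ev:.2f} eV')
```

Unlike the Python interface, this does not follow references; the path run/0/calculation/0 is hard-coded here.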

Overview

NOMAD offers all its functionality through application programming interfaces (APIs), more specifically RESTful HTTP APIs that allow you to use NOMAD as a set of resources (think data) that can be uploaded, accessed, downloaded, searched for, etc. via HTTP requests.

You can get an overview of all NOMAD APIs on the API page. We will focus here on NOMAD's main API (v1). In fact, this API is also used by the web interface and should provide everything you need.

There are different tools and libraries to use the NOMAD API that come with different trade-offs between expressiveness, learning curve, and convenience.

You can use your browser

For example, to see the metadata for all entries with elements Ti and O, go here: http://localhost:8000/fairdi/nomad/latest/api/v1/entries?results.material.elements=Ti&results.material.elements=O

Use curl or wget

REST APIs use resources located via URLs. You can access URLs with curl or wget. Here is the same Ti, O example as before:

curl "http://localhost:8000/fairdi/nomad/latest/api/v1/entries?results.material.elements=Ti&results.material.elements=O" | python -m json.tool
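
The query string of such a URL repeats the same parameter once per value. As a small sketch using only the Python standard library, the URL can be assembled like this (the localhost base URL is copied from the example above and needs to be adjusted for your installation):

```python
from urllib.parse import urlencode

# Base URL copied from the curl example; adjust for your installation.
base_url = 'http://localhost:8000/fairdi/nomad/latest/api/v1'

# A list of pairs (rather than a dict) keeps the repeated parameter.
params = [
    ('results.material.elements', 'Ti'),
    ('results.material.elements', 'O'),
]

url = f'{base_url}/entries?{urlencode(params)}'
print(url)
```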

Use Python and requests

Requests is a popular Python library for HTTP, the protocol used to communicate with REST APIs. Install it with pip install requests. See the initial example.

Use our dashboard

The NOMAD API has an OpenAPI dashboard. This is an interactive documentation of all API functions that allows you to try these functions in the browser.

Use NOMAD's Python package

Install the NOMAD Python client library and use its ArchiveQuery functionality for more convenient, query-based access to archive data.

Different kinds of data

We distinguish between different kinds of NOMAD data and there are different functions in the API:

  • Entry metadata, a summary of extracted data for an entry.
  • Raw files, the files as they were uploaded to NOMAD.
  • Archive data, all of the extracted data for an entry.

There are also different entities (see also Datamodel) with different functions in the API:

  • Entries
  • Uploads
  • Datasets
  • Users

The API URLs typically start with the entity, followed by the kind of data. Examples are:

  • entries/query - Query entries for metadata
  • entries/archive/query - Query entries for archive data
  • entries/{entry-id}/raw - Download raw data for a specific entry
  • uploads/{upload-id}/raw/path/to/file - Download a specific file of an upload

Common concepts

The initial example above showed how to execute a basic search. This includes some fundamental concepts that can be applied to many parts of the API. Let's discuss some of the common concepts.

Response layout

Functions that have a JSON response share a common layout. First, the response contains all keys and values of the request. The request is not repeated verbatim, but in a normalized form: abbreviations in search queries might be expanded, default values for optional parameters are added, or additional response-specific information is included. Second, the response contains the results under the key data.
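
As an illustration of this layout, the following sketch separates the echoed request from the results. It works on an abbreviated copy of the response from the initial example; split_response is a hypothetical helper, not part of any NOMAD library.

```python
# Abbreviated copy of the response from the initial example.
response_json = {
    'owner': 'public',
    'query': {
        'name': 'results.material.elements',
        'value': {'all': ['Ti', 'O']},
    },
    'pagination': {'page_size': 1, 'order_by': 'entry_id', 'total': 17957},
    'required': {'include': ['entry_id']},
    'data': [{'entry_id': '--SZVYOxA2jTu_L-mSxefSQFmeyF'}],
}


def split_response(response_json: dict) -> tuple:
    """Separate the normalized request echo from the results under 'data'."""
    echoed_request = {
        key: value for key, value in response_json.items() if key != 'data'}
    return echoed_request, response_json.get('data', [])


echoed_request, data = split_response(response_json)
print(sorted(echoed_request))  # keys of the normalized request
print(data[0]['entry_id'])
```

Note that the echoed request contains owner and extra pagination keys that the original request did not set; these are the added defaults and response-specific values.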

Owner

All functions that allow a query also allow you to specify the owner. Depending on the API function, its default value is usually visible. Some values are only available if you are logged in.

The owner limits the scope of the search based on entry ownership. This is useful if you only want to search among all publicly downloadable entries, or only among your own entries, etc.

These are the possible owner values and their meaning:

  • public: Consider all entries that can be publicly downloaded, i.e. only published entries without embargo.
  • user: Only consider entries that belong to you.
  • shared: Only consider entries that belong to you or are shared with you.
  • visible: Consider all entries that are visible to you. This includes entries with embargo or unpublished entries that belong to you or are shared with you.
  • staging: Only search through unpublished entries.
  • all: Consider all entries.
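
In a request body, the owner sits next to the query, as the top-level "owner" key in the response of the initial example suggests. The following sketch adds it to a request body and rejects values that are not in the list above; with_owner is a hypothetical helper name.

```python
# The owner values listed above.
OWNER_VALUES = ('public', 'user', 'shared', 'visible', 'staging', 'all')


def with_owner(body: dict, owner: str) -> dict:
    """Return a copy of a request body with a validated top-level 'owner'."""
    if owner not in OWNER_VALUES:
        raise ValueError(
            f'unknown owner {owner!r}, expected one of {OWNER_VALUES}')
    return {**body, 'owner': owner}


body = with_owner(
    {'query': {'results.material.elements': {'all': ['Ti', 'O']}}},
    owner='public')
print(body)
```

The resulting dict can be passed as the json argument of requests.post, as in the initial example. Values other than public typically require authentication.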

Queries

A query can be a very simple list of parameters. Different parameters or values of the same parameter are combined with a logical and. The following query would search for all entries that are VASP calculations, contain Na and Cl, and are authored by Stefano Curtarolo and Chris Wolverton.

{
    "results.material.elements": ["Na", "Cl"],
    "results.method.simulation.program_name": "VASP",
    "authors": ["Stefano Curtarolo", "Chris Wolverton"]
}

A shortcut to change the logical combination of values in a list is to add a suffix, such as :any, to the quantity:

{
    "results.material.elements": ["Na", "Cl"],
    "results.method.simulation.program_name": "VASP",
    "authors:any": ["Stefano Curtarolo", "Chris Wolverton"]
}

Otherwise, you can also write complex logical combinations of parameters like this:

{
    "and": [
        {
            "or": [
                {
                    "results.material.elements": ["Cl", "Na"]
                },
                {
                    "results.material.elements": ["H", "O"]
                }
            ]
        },
        {
            "not": {
                "results.material.symmetry.crystal_system": "cubic"
            }
        }
    ]
}

Other shortcut suffixes are :none and :all (the default).
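
When such nested queries are built in Python, small helper functions can keep the nesting readable. The following sketch reproduces the nested query above; and_, or_, and not_ are hypothetical helpers that only build plain dicts.

```python
import json


def and_(*queries: dict) -> dict:
    """Combine queries so that all of them have to match."""
    return {'and': list(queries)}


def or_(*queries: dict) -> dict:
    """Combine queries so that at least one of them has to match."""
    return {'or': list(queries)}


def not_(query: dict) -> dict:
    """Negate a query."""
    return {'not': query}


query = and_(
    or_(
        {'results.material.elements': ['Cl', 'Na']},
        {'results.material.elements': ['H', 'O']},
    ),
    not_({'results.material.symmetry.crystal_system': 'cubic'}),
)
print(json.dumps(query, indent=4))
```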

By default all quantity values have to equal the given values to match. For some values you can also use comparison operators like this:

{
    "upload_create_time": {
        "gt": "2020-01-01",
        "lt": "2020-08-01"
    },
    "results.properties.geometry_optimization.final_energy_difference": {
        "lte": 1.23e-18
    }
}

or shorter with suffixes:

{
    "upload_create_time:gt": "2020-01-01",
    "upload_create_time:lt": "2020-08-01",
    "results.properties.geometry_optimization.final_energy_difference:lte": 1.23e-18
}
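
To make the correspondence between the two notations explicit, the following sketch expands the suffix form into the nested form. It is only an illustration of the notation, not how the API normalizes queries internally; expand_suffixes is a hypothetical helper and assumes that a quantity is not used both with and without a suffix in the same query.

```python
def expand_suffixes(query: dict) -> dict:
    """Turn keys like 'quantity:gt' into nested {'quantity': {'gt': ...}}."""
    expanded = {}
    for key, value in query.items():
        quantity, separator, operator = key.partition(':')
        if separator:
            # Collect all operators of the same quantity in one dict.
            expanded.setdefault(quantity, {})[operator] = value
        else:
            expanded[key] = value
    return expanded


short_form = {
    'upload_create_time:gt': '2020-01-01',
    'upload_create_time:lt': '2020-08-01',
    'results.properties.geometry_optimization.final_energy_difference:lte': 1.23e-18,
}
long_form = expand_suffixes(short_form)
print(long_form)
```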

The searchable quantities are a subset of the NOMAD Archive quantities defined in the NOMAD Metainfo. The searchable quantities also depend on the API endpoint.

There is also an additional query parameter that you can use to formulate queries based on the OPTIMADE filter language:

{
    "optimade_filter": "nelements >= 2 AND elements HAS ALL 'Ti', 'O'"
}

Pagination

When you issue a query, usually not all results can be returned at once. Instead, the API returns only one page. This behavior is controlled through pagination parameters like page_size, page, page_offset, or page_after_value.

Let's consider a search for entries as an example.

response = requests.post(
    f'{base_url}/entries/query',
    json={
        'query': {
            'results.material.elements': {
                'all': ['Ti', 'O']
            }
        },
        'pagination': {
            'page_size': 10
        }
    }
)

This will only result in a response with a maximum of 10 entries. The response will contain a pagination object like this:

{
    "page_size": 10,
    "order_by": "entry_id",
    "order": "asc",
    "total": 17957,
    "next_page_after_value": "--SZVYOxA2jTu_L-mSxefSQFmeyF"
}

In this case, the pagination is based on after values. This means that the search can be continued with a follow-up request at a certain point characterized by the next_page_after_value. If you follow up with:

response = requests.post(
    f'{base_url}/entries/query',
    json={
        'query': {
            'results.material.elements': {
                'all': ['Ti', 'O']
            }
        },
        'pagination': {
            'page_size': 10,
            'page_after_value': '--SZVYOxA2jTu_L-mSxefSQFmeyF'
        }
    }
)

you will get the next 10 results.
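
The follow-up requests can be wrapped in a loop. The following is a minimal sketch, not an official client: iter_entries is a hypothetical helper, it assumes that the last page carries no next_page_after_value, and it takes the post function as a parameter so that it can be tried offline. The stand-in fake_post serves two small fake pages; with the real API you would pass requests.post and f'{base_url}/entries/query' instead.

```python
from typing import Callable, Iterator


def iter_entries(post: Callable, url: str, query: dict,
                 page_size: int = 10, max_pages: int = 5) -> Iterator[dict]:
    """Yield entries page by page, following next_page_after_value."""
    page_after_value = None
    for _ in range(max_pages):  # hard limit against accidental huge downloads
        pagination = {'page_size': page_size}
        if page_after_value is not None:
            pagination['page_after_value'] = page_after_value
        response = post(url, json={'query': query, 'pagination': pagination})
        response.raise_for_status()
        body = response.json()
        yield from body['data']
        # Assumption: the last page has no next_page_after_value.
        page_after_value = body['pagination'].get('next_page_after_value')
        if page_after_value is None:
            break


class FakeResponse:
    """Offline stand-in for a requests response."""

    def __init__(self, body: dict):
        self._body = body

    def raise_for_status(self) -> None:
        pass

    def json(self) -> dict:
        return self._body


FAKE_PAGES = {
    None: {'data': [{'entry_id': 'a'}, {'entry_id': 'b'}],
           'pagination': {'next_page_after_value': 'b'}},
    'b': {'data': [{'entry_id': 'c'}], 'pagination': {}},
}


def fake_post(url: str, json: dict) -> FakeResponse:
    """Offline stand-in for requests.post that serves FAKE_PAGES."""
    return FakeResponse(FAKE_PAGES[json['pagination'].get('page_after_value')])


entry_ids = [
    entry['entry_id']
    for entry in iter_entries(fake_post, 'unused', {}, page_size=2)]
print(entry_ids)
```

The max_pages limit is a deliberate choice: a query like the Ti, O example matches thousands of entries, and an unbounded loop would request all of them.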

Authentication

Most of the API operations do not require any authorization and can be freely used without a user or credentials. However, to upload, edit, or view your own and potentially unpublished data, the API needs to authenticate you.

The NOMAD API uses OAuth and tokens to authenticate users. We provide simple operations that allow you to acquire an access token via username and password:

import requests

response = requests.get(
    'http://localhost:8000/fairdi/nomad/latest/api/v1/auth/token',
    params=dict(username='myname', password='mypassword'))
token = response.json()['access_token']

response = requests.get(
    'http://localhost:8000/fairdi/nomad/latest/api/v1/uploads',
    headers={'Authorization': f'Bearer {token}'})
uploads = response.json()['data']
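
In scripts you may prefer not to hard-code the password. The following sketch reads the credentials from environment variables and builds the Authorization header used above. The variable names NOMAD_USERNAME and NOMAD_PASSWORD and both helper functions are assumptions made for this example; NOMAD does not define them.

```python
import os


def credentials_from_env(environ=os.environ) -> dict:
    """Read username and password from (hypothetical) environment variables."""
    try:
        return {
            'username': environ['NOMAD_USERNAME'],
            'password': environ['NOMAD_PASSWORD'],
        }
    except KeyError as error:
        raise RuntimeError(
            f'missing environment variable {error.args[0]}') from None


def bearer_headers(token: str) -> dict:
    """Build the Authorization header for an access token."""
    return {'Authorization': f'Bearer {token}'}


# Offline demonstration with a stand-in environment and a dummy token.
params = credentials_from_env(
    {'NOMAD_USERNAME': 'myname', 'NOMAD_PASSWORD': 'mypassword'})
headers = bearer_headers('dummy-token')
print(sorted(params), headers)
```

The params dict can be passed to the auth/token request shown above, and headers to any subsequent request.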

If you have the NOMAD Python package installed, you can use its Auth implementation:

import requests
from nomad.client import Auth

response = requests.get(
    'http://localhost:8000/fairdi/nomad/latest/api/v1/uploads',
    auth=Auth(user='myname or email', password='mypassword'))
uploads = response.json()['data']

To use authentication in the dashboard, simply use the Authorize button. The dashboard GUI will manage the access token and use it while you try out the various operations.

Search for entries

See getting started for a typical search example. Combine the different concepts above to create the queries that you need.

Searching for entries is typically just an initial step. Once you know what entries exist you'll probably want to do one of the following things.

Download raw files

You can use queries to download raw files, but typically you don't want to download file-by-file or entry-by-entry. Therefore, you can download a large set of files in one big zip file. Here, you might want to use a program like curl to download directly from the shell:

curl "http://localhost:8000/fairdi/nomad/latest/api/v1/entries/raw?results.material.elements=Ti&results.material.elements=O" -o download.zip
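
If you prefer Python over the shell, a large zip file should be streamed to disk rather than loaded into memory. The following sketch shows the writing part with a hypothetical helper write_chunks and demonstrates it on in-memory data; the commented lines show how it could be combined with the streaming mode of requests.

```python
import io
from typing import BinaryIO, Iterable


def write_chunks(chunks: Iterable[bytes], fileobj: BinaryIO) -> int:
    """Write byte chunks to an open binary file and return the bytes written."""
    written = 0
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            written += fileobj.write(chunk)
    return written


# With requests (not executed here):
#   response = requests.get(url, params=params, stream=True)
#   response.raise_for_status()
#   with open('download.zip', 'wb') as f:
#       write_chunks(response.iter_content(chunk_size=1024 * 1024), f)

# Offline check with in-memory data instead of a real download.
buffer = io.BytesIO()
written = write_chunks([b'PK', b'', b'\x03\x04'], buffer)
print(written)
```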

Access archives

Above under getting started, you've already learned how to access archive data. A special feature of the archive API functions is that you can define what is required from the archives.

response = requests.post(
    f'{base_url}/entries/archive/query',
    json={
        'query': ...,
        'pagination': ...,
        'required': {
            'workflow': {
                'calculation_result_ref': {
                    'energy': '*',
                    'system_ref': {
                        'chemical_composition': '*'
                    }
                }
            }
        }
    })

The required part allows you to specify what parts of the requested archives should be returned. The NOMAD Archive is a hierarchical data format and you can require certain branches (i.e. sections) in the hierarchy. By specifying certain sections with specific contents or all contents (via the directive "*"), you can determine what sections and what quantities should be returned. The default is the whole archive, i.e., "*".

For example, to specify that you are only interested in the metadata, use:

{
    "metadata": "*"
}

Or, to get only the energies of each calculation in an entry, use:

{
    "run": {
        "calculation": {
            "energy": "*"
        }
    }
}

You can also request certain parts of a list, e.g. the last calculation:

{
    "run": {
        "calculation[-1]": "*"
    }
}
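
Since required is just a nested dict, it can also be generated from a list of paths. The following sketch uses a hypothetical helper, required_from_paths, which takes dot-separated section paths and marks each leaf with the "*" directive. It assumes that no path is a prefix of another.

```python
def required_from_paths(paths: list) -> dict:
    """Build a nested 'required' dict from dot-separated archive paths."""
    required = {}
    for path in paths:
        *sections, leaf = path.split('.')
        node = required
        for section in sections:
            node = node.setdefault(section, {})
        node[leaf] = '*'  # return everything below this point
    return required


required = required_from_paths([
    'workflow.calculation_result_ref.energy',
    'workflow.calculation_result_ref.system_ref.chemical_composition',
])
print(required)
```

List selections such as calculation[-1] contain no dot and can be used as path segments unchanged, e.g. 'run.calculation[-1]'.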

These required specifications are also very useful to get workflow results. This works because we can use references (e.g. from the workflow to its final result calculation), and the API will resolve these references and return the respective data. For example, to get just the energies and the chemical composition of the resulting calculation:

{
    "workflow": {
        "calculation_result_ref": {
            "energy": "*",
            "system_ref": {
                "chemical_composition": "*"
            }
        }
    }
}

You can also resolve all references in a branch with the include-resolved directive. This will resolve all references in the branch, and also all references in referenced sections:

{
    "workflow": {
        "calculation_result_ref": "include-resolved"
    }
}

By default, the targets of "resolved" references are added to the archive at their original hierarchy positions. This means that all references are still references, but they are resolvable within the returned data, since their targets are now part of the data. Another option is to add "resolve-inplace": true to the root of required. Here, the reference targets will replace the references:

{
    "resolve-inplace": true,
    "workflow": {
        "calculation_result_ref": "include-resolved"
    }
}
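
When references are kept as references (the default), you can follow them yourself in the returned JSON. The following sketch assumes that references are paths of the form /section/index/..., as in the archive response under getting started, and uses a trimmed copy of that response. resolve_reference is a hypothetical helper; the NOMAD Python interface does this for you.

```python
# Trimmed copy of the 'archive' part of the response under getting started.
archive = {
    'run': [{
        'system': [{'chemical_composition': 'OOSrTiOOOSrTiOOOSrTiOFF'}],
        'calculation': [{
            'energy': {'total': {'value': -5.697771591896252e-14}},
            'system_ref': '/run/0/system/0',
        }],
    }],
    'workflow': [{'calculation_result_ref': '/run/0/calculation/0'}],
}


def resolve_reference(archive: dict, reference: str):
    """Follow a reference like '/run/0/calculation/0' within the archive."""
    node = archive
    for part in reference.strip('/').split('/'):
        # List positions are given as integers, sections as names.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


calculation = resolve_reference(
    archive, archive['workflow'][0]['calculation_result_ref'])
system = resolve_reference(archive, calculation['system_ref'])
print(system['chemical_composition'])
print(calculation['energy']['total']['value'])
```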

You can browse the NOMAD metainfo schema or the archive of each entry (e.g. a VASP example) in the web-interface.