Python Reference

nomad.metainfo

nomad.config

This module defines all configurable parameters for the NOMAD Python code. The configuration is used by all executed Python code, including the API, worker, CLI, and other scripts. To use the configuration in your own scripts or new modules, simply import this module.

All parameters are structured into objects for two reasons. First, to have categories. Second, to allow runtime manipulation that is not affected by Python import logic. The categories are chosen along infrastructure components: mongo, elastic, etc.

This module also provides utilities to read the configuration from environment variables and .yaml files. This is done automatically on import. The precedence is env over .yaml over defaults.

class nomad.config.NomadConfig(**kwargs)

A class for configuration categories. It is a dict subclass that uses attributes as key/value pairs.
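The dict-subclass-with-attribute-access pattern can be sketched as follows. This is a minimal illustration of the described behavior, not the actual NomadConfig implementation; the path values are illustrative:

```python
class NomadConfig(dict):
    """Minimal sketch: a dict whose keys double as attributes (illustrative only)."""

    def __getattr__(self, name):
        # called only when normal attribute lookup fails; fall back to dict keys
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # attribute assignment writes a dict key
        self[name] = value


fs = NomadConfig(staging='.volumes/fs/staging', public='.volumes/fs/public')
fs.tmp = '.volumes/fs/tmp'     # attribute write
print(fs.staging)              # attribute read
print(fs['tmp'])               # regular dict access still works
```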

__init__(**kwargs)

Initialize self. See help(type(self)) for accurate signature.

nomad.config.apply(key, value) → None

Changes the config according to the given key and value. Keys are interpreted as paths to config values, with _ as separator, e.g. fs_staging refers to config.fs.staging.
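The underscore-path resolution can be sketched like this. It is a simplified, hypothetical version (splitting at the first underscore only); the real apply has to handle more cases, and the _Namespace stand-in objects are not part of NOMAD:

```python
def apply(config, key, value):
    """Sketch: set a config value addressed by an underscore path like 'fs_staging'."""
    category_name, _, attr = key.partition('_')
    category = getattr(config, category_name)   # e.g. config.fs
    setattr(category, attr, value)              # e.g. config.fs.staging = value


class _Namespace:
    """Hypothetical stand-in for the real config category objects."""
    pass


config = _Namespace()
config.fs = _Namespace()
config.fs.staging = '.volumes/fs/staging'

apply(config, 'fs_staging', '/tmp/staging')
print(config.fs.staging)
```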

nomad.config.load_config(config_file: str = 'nomad.yaml') → None

Loads the configuration from the config_file and environment.

Parameters

config_file – Override for the config file. The default is the file named by the environment variable NOMAD_CONFIG, or nomad.yaml.
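The stated precedence (environment over .yaml over defaults) can be illustrated with plain dicts. This is a sketch of the merge order only; the NOMAD_FS_STAGING variable name and the paths are hypothetical:

```python
import os

# three layers of configuration values (paths are illustrative)
defaults = {'fs_staging': '.volumes/fs/staging'}
yaml_values = {'fs_staging': '/data/staging'}   # as if parsed from nomad.yaml

merged = dict(defaults)
merged.update(yaml_values)                      # .yaml overrides defaults

# environment overrides .yaml (this variable name is an assumption)
env_value = os.environ.get('NOMAD_FS_STAGING')
if env_value is not None:
    merged['fs_staging'] = env_value

print(merged['fs_staging'])
```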

nomad.infrastructure

This module provides functions to establish connections to the database, search engine, and other infrastructure services. Usually everything is set up at once with setup(). This is run once for each API and worker process. Individual functions for partial setups exist to facilitate testing, aspects of nomad.cli, etc.

nomad.infrastructure.elastic_client = None

The Elasticsearch client.

nomad.infrastructure.mongo_client = None

The pymongo MongoDB client.

nomad.infrastructure.setup()

Uses the current configuration (nomad/config.py and environment) to set up all the infrastructure services (repository db, mongo, elastic search) and logging. It creates client instances for the databases and has to be called before they can be used.

nomad.infrastructure.setup_files()
nomad.infrastructure.setup_mongo(client=False)

Creates the connection to MongoDB.

nomad.infrastructure.setup_elastic()

Creates the connection to Elasticsearch.

class nomad.infrastructure.Keycloak

A class that encapsulates all keycloak related functions for easier mocking and configuration.

__init__()

Initialize self. See help(type(self)) for accurate signature.

authorize_flask(basic: bool = True) → str

Authorizes the current flask request with keycloak. Uses either Bearer or Basic authentication, depending on the available headers in the request. Bearer auth is basically offline (besides retrieving and caching keycloak's public key for signature validation). Basic auth causes authentication against keycloak with each request.

Will set g.user, either with None or user data from the respective OIDC token.

Returns: An error message or None

add_user(user, bcrypt_password=None, invite=False)

Adds the given nomad.datamodel.User instance to the configured keycloak realm using the keycloak admin API.

search_user(query: str = None, max=1000, **kwargs)
get_user(user_id: str = None, username: str = None, user=None) → object

Retrieves all available information about a user from the keycloak admin interface. This must be used to retrieve complete user information, because the info gathered solely from tokens (i.e. for the authenticated user g.user) is generally incomplete.

property access_token
nomad.infrastructure.reset(remove: bool)

Resets the databases mongo, elastic/calcs, and all files. Be careful. In contrast to remove(), it will only remove the contents of dbs and indices. This function just attempts to remove everything; there is no exception handling or warranty that it will succeed.

Parameters

remove – Do not try to recreate empty databases, remove entirely.

nomad.infrastructure.send_mail(name: str, email: str, message: str, subject: str)

Used to programmatically send mails.

Parameters
  • name – The email recipient name.

  • email – The email recipient address.

  • message – The email body.

  • subject – The subject line.

nomad.datamodel

Introduction

This is about the datamodel that is used to represent NOMAD entries in our databases and search engines. The respective data, also known as (repository) metadata, is also part of the NOMAD Archive, and the datamodel is defined based on the NOMAD Metainfo (section metadata). It covers all information that users can search for and that can easily be rendered in the GUI. The information is readily available through the repo API.

See also the datamodel section in the introduction.

This module contains classes that represent the core NOMAD data entities (entries/calculations, users, datasets) on a high level of abstraction, independent from their representation in the different modules nomad.processing, nomad.parsing, nomad.search, and nomad.app.

Datamodel entities

The entities in the datamodel are defined as NOMAD Metainfo sections. They are treated similarly to all other Archive data. The entry/calculation datamodel data is created during processing. It is not about representing every detail, but those parts that are directly involved in API, processing, mirroring, or other 'infrastructure' operations.

The class User is used to represent users and their attributes.

class nomad.datamodel.User(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)

A NOMAD user.

Typically a NOMAD user has a NOMAD account. The user related data is managed by the NOMAD keycloak user-management system. Users are used to denote uploaders, authors, people that embargoed data is shared with, and owners of datasets.

Parameters
  • user_id – The unique, persistent keycloak UUID

  • username – The unique, persistent, user chosen username

  • first_name – The user's first name (including all other given names)

  • last_name – The user's last name

  • affiliation – The name of the company and institutes the user identifies with

  • affiliation_address – The address of the given affiliation

  • created – The time the account was created

  • repo_user_id – The id that was used to identify this user in the NOMAD CoE Repository

  • is_admin – Bool that indicates if the user is the admin user

user_id

An optimized replacement for Quantity suitable for primitive properties.

username

An optimized replacement for Quantity suitable for primitive properties.

created

Definition of an atomic piece of data.

Quantity definitions are the main building block of meta-info schemas. Each quantity represents a single piece of data.

To define quantities, use objects of this class as class attribute values in section classes. The name of a quantity is automatically taken from its section class attribute. You can provide all other attributes to the constructor with keyword arguments.

See Sections to learn about section classes. In Python terms, Quantity is a descriptor. Descriptors define how to get and set attributes in a Python object. This allows us to use sections like regular Python objects and quantities like regular Python attributes.

Beyond basic Definition attributes, Quantities are defined with the following attributes.

Parameters
  • type

    Defines the datatype of quantity values. This is the type of individual elements in a potentially complex shape. If you define a list of integers for example, the shape would be list and the type integer: Quantity(type=int, shape=['0..*']).

    The type can be one of:

    • a built-in primitive Python type: int, str, bool, float

    • an instance of MEnum, e.g. MEnum('one', 'two', 'three')

    • a section to define references to other sections as quantity values

    • a custom meta-info DataType, see Environments

    • a numpy dtype, e.g. np.dtype('float32')

    • typing.Any to support any value

    If set to a dtype, this quantity will use a numpy array or scalar to store values internally. If a regular (nested) Python list or Python scalar is given, it will be automatically converted. The given dtype will be used in the numpy value.

    To define a reference, either a section class or an instance of Section can be given. See Sections for details. Instances of the given section constitute valid values for this type. Upon serialization, referenced section instances will be represented with metainfo URLs. See References and metainfo URLs.

    For quantities with more than one dimension, only numpy arrays and dtypes are allowed.

  • shape

    The shape of the quantity. It defines its dimensionality.

    A shape is a list, where each item defines one dimension. Each dimension can be:

    • an integer that defines the exact size of the dimension, e.g. [3] is the shape of a 3D spatial vector

    • a string that specifies a possible range, e.g. 0..*, 1..*, 3..6

    • the name of an int typed and shapeless quantity in the same section whose values define the length of this dimension, e.g. number_of_atoms defines the length of atom_positions

    Range specifications define lower and upper bounds for the possible dimension length. The * can be used to denote an arbitrarily high upper bound.

    Quantities with dimensionality (length of the shape) higher than 1 must be numpy arrays. Their type must be a dtype.

  • unit

    The physics unit for this quantity. It is optional.

    Units are represented with the pint Python package. Pint defines units and their algebra. You can use pint units directly, e.g. units.m / units.s; the metainfo provides a preconfigured pint unit registry units. You can also provide the unit as a pint parsable string, e.g. 'meter / seconds' or 'm/s'.

  • default

    The default value for this quantity. The value must match type and shape.

    Be careful with a default value like [] as it will be the default value for all occurrences of this quantity.

  • synonym_for – The name of a quantity defined in the same section, given as a string. This will make this quantity a synonym for the other quantity. All other properties (type, shape, unit, etc.) are ignored. Getting or setting from/to this quantity will be delegated to the other quantity. Synonyms are always virtual.

  • derived – A Python callable that takes the containing section as input and outputs the value for this quantity. This quantity cannot be set directly; its value is only derived by the given callable. The callable is executed when the quantity is accessed. Derived quantities are always virtual.

  • cached – A bool indicating that derived values should be cached unless the underlying section has changed.

  • virtual – A boolean that determines if this quantity is virtual. Virtual quantities can be get/set like regular quantities, but their values are not (de-)serialized, hence never permanently stored.

  • is_scalar – Derived quantity that is True, iff this quantity has shape of length 0
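The descriptor mechanics described above can be sketched with a toy Quantity that supports only type and default. This is an illustration of the pattern, not the real nomad.metainfo.Quantity, and the System class and its quantities are invented for the example:

```python
class Quantity:
    """Toy descriptor sketching how Quantity behaves; not the real metainfo class."""

    def __init__(self, type=str, default=None):
        self.type = type
        self.default = default

    def __set_name__(self, owner, name):
        # the quantity name is taken from the section class attribute
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj, value):
        # enforce the declared type on assignment
        if not isinstance(value, self.type):
            raise TypeError(f'{self.name} expects {self.type.__name__}')
        obj.__dict__[self.name] = value


class System:
    # a toy 'section class': quantities as class attribute values
    number_of_atoms = Quantity(type=int, default=0)
    formula = Quantity(type=str)


system = System()
system.number_of_atoms = 3
system.formula = 'H2O'
print(system.number_of_atoms, system.formula)
```

Reading an unset quantity yields its default; assigning a value of the wrong type raises.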

repo_user_id

Optional, legacy user id from the old NOMAD CoE repository.

is_admin

Definition of an atomic piece of data. See the full Quantity description under created above.

static get(*args, **kwargs) → nomad.datamodel.datamodel.User
m_def: Section = nomad.datamodel.datamodel.User:Section

The class Dataset is used to represent datasets and their attributes.

class nomad.datamodel.Dataset(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)

A Dataset is attached to one or many entries to form a set of data.

Parameters
  • dataset_id – The unique identifier for this dataset as a string. It should be a randomly generated UUID, similar to other nomad ids.

  • name – The human readable name of the dataset as string. The dataset name must be unique for the user.

  • user_id – The unique user_id of the owner and creator of this dataset. The owner must not change after creation.

  • doi – The optional Digital Object Identifier (DOI) associated with this dataset. Nomad can register DOIs that link back to the respective representation of the dataset in the nomad UI. This quantity holds the string representation of this DOI. There is only one per dataset. The DOI is just the DOI name, not its full URL, e.g. "10.17172/nomad/2019.10.29-1".

  • pid – The original NOMAD CoE Repository dataset PID. Old DOIs still reference datasets based on this id. Is not used for new datasets.

  • created – The date when the dataset was first created.

m_def: Section = nomad.datamodel.datamodel.Dataset:Section
dataset_id

An optimized replacement for Quantity suitable for primitive properties.

name

An optimized replacement for Quantity suitable for primitive properties.

user_id

An optimized replacement for Quantity suitable for primitive properties.

doi

An optimized replacement for Quantity suitable for primitive properties.

pid

An optimized replacement for Quantity suitable for primitive properties.

created

Definition of an atomic piece of data. See the full Quantity description under User.created above.

The class MongoMetadata is used to tag metadata stored in mongodb.

class nomad.datamodel.MongoMetadata

NOMAD entry quantities that are stored in mongodb and not necessarily in the archive.

m_def: Category = nomad.datamodel.datamodel.MongoMetadata:Category

The class EntryMetadata is used to represent all metadata about an entry.

class nomad.datamodel.EntryMetadata(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)
upload_id

The upload_id of the calculation's upload (random UUID).

calc_id

The unique mainfile based calculation id.

calc_hash

The raw file content based checksum/hash of this calculation.

pid

The unique persistent id of this calculation.

mainfile

The upload relative mainfile path.

domain

Must be the key for a registered domain. This determines which actual subclass is instantiated.

files

A list of all files, relative to upload.

upload_time

The time when the calc was uploaded.

uploader

An object describing the uploading user; has at least a user_id.

processed

Boolean indicating if this calc was successfully processed and archive data and calc metadata are available.

last_processing

A datetime with the time of the last successful processing.

nomad_version

A string that describes the version of the nomad software that was used to do the last successful processing.

comment

An arbitrary string with user provided information about the entry.

references

A list of URLs for resources that are related to the entry.

uploader

Id of the uploader of this entry.

coauthors

Ids of all co-authors (excl. the uploader) of this entry. Co-authors are shown as authors of this entry alongside its uploader.

shared_with

Ids of all users that this entry is shared with. These users can find, see, and download all data for this entry, even if it is in staging or has an embargo.

with_embargo

Entries with embargo are only visible to the uploader, the admin user, and users the entry is shared with (see shared_with).

upload_time

The time that this entry was uploaded

datasets

Ids of all datasets that this entry appears in

m_def: Section = nomad.datamodel.datamodel.EntryMetadata:Section
upload_id

The persistent and globally unique identifier for the upload of the entry

calc_id

A persistent and globally unique identifier for the entry

calc_hash

A raw file content based checksum/hash

mainfile

The path to the mainfile from the root directory of the uploaded files

files

The paths to the files within the upload that belong to this entry. All files within the same directory as the entry’s mainfile are considered the auxiliary files that belong to the entry.

pid

The unique, sequentially enumerated, integer PID that was used in the legacy NOMAD CoE. It allows resolving URLs of the old NOMAD CoE Repository.

raw_id

The code specific identifier extracted from the entry's raw files, if such an identifier is supported by the underlying code

domain

The material science domain

published

Indicates if the entry is published

processed

Indicates that the entry is successfully processed.

last_processing

The datetime of the last processing

nomad_version

The NOMAD version used for the last processing

nomad_commit

The NOMAD commit used for the last processing

parser_name

The NOMAD parser used for the last processing

comment

A user provided comment for this entry

references

User provided references (URLs) for this entry

external_db

The repository or external database where the original data resides

uploader

The uploader of the entry

origin

A short human readable description of the entry's origin. Usually it is the handle of an external database/repository or the name of the uploader.

coauthors

A user provided list of co-authors

authors

All authors (uploader and co-authors)

shared_with

A user provided list of users to share the entry with

owners

All owners (uploader and shared-with users)

with_embargo

Indicates if this entry is under an embargo

upload_time

The date and time this entry was uploaded to nomad

upload_name

The user provided upload name

datasets

A list of user curated datasets this entry belongs to.

external_id

A user provided external id. Usually the id for an entry in an external database where the data was imported from.

last_edit

The date and time the user metadata was edited last

formula

A (reduced) chemical formula

atoms

The atom labels of all atoms of the entry’s material

only_atoms

The atom labels concatenated in order-number order

n_atoms

The number of atoms in the entry’s material

ems

Defines what sections can appear as sub-sections of another section.

Like quantities, sub-sections are defined in a section class as attributes of this class. And like quantities, each sub-section definition becomes a property of the corresponding section definition (parent). A sub-section definition references another section definition as the sub-section (child). As a consequence, parent section instances can contain child section instances as sub-sections.

Contrary to the old NOMAD metainfo, we distinguish between the sub-section as a section and the sub-section as a property. This allows using one child section definition as a sub-section of many different parent section definitions.

Parameters
  • sub_section – A Section or a Python class object for a section class. This will be the child section definition.

  • repeats – A boolean that determines whether this sub-section can appear multiple times in the parent section.
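The parent/child containment idea can be sketched with plain classes. This is a toy illustration of a repeating sub-section, not the real nomad.metainfo.SubSection; the Material and Atom classes are invented for the example:

```python
class Atom:
    """Toy child section: one label per atom."""

    def __init__(self, label):
        self.label = label


class Material:
    """Toy parent section; 'atoms' plays the role of a sub-section with repeats=True."""

    def __init__(self):
        self.atoms = []   # repeating sub-section -> a list of child instances

    def add_atom(self, label):
        self.atoms.append(Atom(label))


material = Material()
material.add_atom('H')
material.add_atom('O')
print([atom.label for atom in material.atoms])
```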

dft

Defines what sections can appear as sub-sections of another section. See the full sub-section description under ems above.

qcms

Defines what sections can appear as sub-sections of another section. See the full sub-section description under ems above.

encyclopedia

Defines what sections can appear as sub-sections of another section. See the full sub-section description under ems above.

apply_user_metadata(metadata: dict)

Applies a user provided metadata dict to this calc.
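The shape of such a metadata dict can be sketched from the user-editable quantities listed above. The key names mirror quantities documented in this reference, but the exact accepted set of keys is an assumption, and the id values are placeholders:

```python
# illustrative user metadata dict; keys mirror quantities documented above,
# but the exact accepted key set is an assumption
user_metadata = {
    'comment': 'relaxed structures for the tutorial',
    'references': ['https://example.org/paper'],
    'coauthors': ['<coauthor-user-id>'],
    'shared_with': ['<other-user-id>'],
    'with_embargo': False,
}

# entry_metadata.apply_user_metadata(user_metadata)  # how the call would look
```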

apply_domain_metadata(archive)

Used to apply metadata that is related to the domain.

Domains

The datamodel supports different domains. This means that most domain metadata of an entry/calculation is stored in domain-specific sub-sections of the EntryMetadata section. We currently have the following domain-specific metadata classes/sections:

class nomad.datamodel.dft.DFTMetadata(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)
m_def: Section = nomad.datamodel.dft.DFTMetadata:Section
basis_set

The used basis set functions.

xc_functional

The libXC based xc functional classification used in the simulation.

xc_functional_names

The list of libXC functional names that were used in this entry.

system

The system type of the simulated system.

compound_type

The compound type of the simulated system.

crystal_system

The crystal system type of the simulated system.

spacegroup

The spacegroup of the simulated system as number.

spacegroup_symbol

The spacegroup as international short symbol.

code_name

The name of the used code.

code_version

The version of the used code.

n_geometries

Number of unique geometries.

n_calculations

Number of single configuration calculation sections

n_total_energies

Number of total energy calculations

n_quantities

Number of metainfo quantities parsed from the entry.

quantities

All quantities that are used by this entry.

searchable_quantities

All quantities with existence filters in the search GUI.

geometries

Hashes for each simulated geometry.

group_hash

Hashes that describe unique geometries simulated by this code run.

labels

The labels taken from AFLOW prototypes and Springer.

labels_springer_compound_class

Springer compound classification.

labels_springer_classification

Springer classification by property.

optimade

Metadata used for the optimade API.

workflow

Definition of an atomic piece of data.

Quantity definitions are the main building block of meta-info schemas. Each quantity represents a single piece of data.

To define quantities, use objects of this class as class attribute values in section classes. The name of a quantity is automatically taken from its section class attribute. You can provide all other attributes to the constructor with keyword arguments.

See Sections to learn about section classes. In Python terms, Quantity is a descriptor. Descriptors define how to get and set attributes in a Python object. This allows us to use sections like regular Python objects and quantities like regular Python attributes.

Beyond basic Definition attributes, Quantities are defined with the following attributes.

Parameters
  • type

    Defines the datatype of quantity values. This is the type of individual elements in a potentially complex shape. For example, if you define a list of integers, the shape describes the list and the type the integer elements: Quantity(type=int, shape=['0..*']).

    The type can be one of:

    • a built-in primitive Python type: int, str, bool, float

    • an instance of MEnum, e.g. MEnum('one', 'two', 'three')

    • a section to define references to other sections as quantity values

    • a custom meta-info DataType, see Environments

    • a numpy dtype, e.g. np.dtype('float32')

    • typing.Any to support any value

    If set to a numpy dtype, this quantity will use a numpy array or scalar to store values internally. If a regular (nested) Python list or Python scalar is given, it will be automatically converted. The given dtype will be used for the numpy value.

    To define a reference, either a section class or instance of Section can be given. See Sections for details. Instances of the given section constitute valid values for this type. Upon serialization, referenced section instances will be represented with metainfo URLs. See References and metainfo URLs.

    For quantities with more than one dimension, only numpy arrays and dtypes are allowed.

  • shape

    The shape of the quantity. It defines its dimensionality.

    A shape is a list, where each item defines one dimension. Each dimension can be:

    • an integer that defines the exact size of the dimension, e.g. [3] is the shape of a 3D spatial vector

    • a string that specifies a possible range, e.g. 0..*, 1..*, 3..6

    • the name of an int typed and shapeless quantity in the same section whose values define the length of this dimension, e.g. number_of_atoms defines the length of atom_positions

    Range specifications define lower and upper bounds for the possible dimension length. The * can be used to denote an arbitrarily high upper bound.

    Quantities with dimensionality (length of the shape) higher than 1 must be numpy arrays. Their type must be a dtype.

  • unit

    The physics unit for this quantity. It is optional.

    Units are represented with the pint Python package. Pint defines units and their algebra. You can either use pint units directly, e.g. units.m / units.s (the metainfo provides a preconfigured pint unit registry units), or provide the unit as a pint-parsable string, e.g. 'meter / seconds' or 'm/s'.

  • default

    The default value for this quantity. The value must match type and shape.

    Be careful with a default value like [] as it will be the default value for all occurrences of this quantity.

  • synonym_for – The name of a quantity defined in the same section as string. This will make this quantity a synonym for the other quantity. All other properties (type, shape, unit, etc.) are ignored. Getting or setting from/to this quantity will be delegated to the other quantity. Synonyms are always virtual.

  • derived – A Python callable that takes the containing section as input and outputs the value for this quantity. This quantity cannot be set directly; its value is only derived by the given callable. The callable is executed when this quantity is accessed. Derived quantities are always virtual.

  • cached – A bool indicating that derived values should be cached unless the underlying section has changed.

  • virtual – A boolean that determines if this quantity is virtual. Virtual quantities can be get/set like regular quantities, but their values are not (de-)serialized, hence never permanently stored.

  • is_scalar – Derived quantity that is True if and only if this quantity has a shape of length 0.
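The descriptor behaviour described above can be sketched in plain Python. This is a hypothetical simplification for illustration, not the real nomad.metainfo.Quantity (which additionally handles units, numpy conversion, references, derived values, etc.):

```python
# Minimal sketch of a descriptor-based quantity with type/shape checks.
# Hypothetical simplification, not the real nomad.metainfo.Quantity.

class Quantity:
    def __init__(self, type, shape=None, unit=None, default=None):
        self.type, self.shape = type, shape or []
        self.unit, self.default = unit, default
    def __set_name__(self, owner, name):
        self.name = name  # quantity name taken from the class attribute
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # class access yields the definition itself
        return obj.__dict__.get(self.name, self.default)
    def __set__(self, obj, value):
        if self.shape:  # list-shaped: check each element's type
            if not all(isinstance(v, self.type) for v in value):
                raise TypeError(f'{self.name}: wrong element type')
        elif not isinstance(value, self.type):
            raise TypeError(f'{self.name}: expected {self.type.__name__}')
        obj.__dict__[self.name] = value

class Geometry:
    n_atoms = Quantity(type=int)
    atom_labels = Quantity(type=str, shape=['0..*'], default=[])

g = Geometry()
g.n_atoms = 3
g.atom_labels = ['H', 'H', 'O']
print(g.n_atoms, g.atom_labels)
```

Because Quantity implements __get__ and __set__, sections read like regular Python objects while the definition can still validate assignments, which is the mechanism the paragraph above alludes to.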

code_name_from_parser()
update_group_hash()
apply_domain_metadata(entry_archive)
class nomad.datamodel.ems.EMSMetadata(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)
m_def: Section = nomad.datamodel.ems.EMSMetadata:Section
chemical

An optimized replacement for Quantity suitable for primitive properties.

sample_constituents

An optimized replacement for Quantity suitable for primitive properties.

sample_microstructure

An optimized replacement for Quantity suitable for primitive properties.

experiment_summary

An optimized replacement for Quantity suitable for primitive properties.

origin_time

Definition of an atomic piece of data.

Quantity definitions are the main building block of meta-info schemas. Each quantity represents a single piece of data.

To define quantities, use objects of this class as class attribute values in section classes. The name of a quantity is automatically taken from its section class attribute. You can provide all other attributes to the constructor with keyword arguments.

See Sections to learn about section classes. In Python terms, Quantity is a descriptor. Descriptors define how to get and set attributes in a Python object. This allows us to use sections like regular Python objects and quantities like regular Python attributes.

Beyond basic Definition attributes, Quantities are defined with the following attributes.

Parameters
  • type

    Defines the datatype of quantity values. This is the type of individual elements in a potentially complex shape. For example, if you define a list of integers, the shape describes the list and the type the integer elements: Quantity(type=int, shape=['0..*']).

    The type can be one of:

    • a built-in primitive Python type: int, str, bool, float

    • an instance of MEnum, e.g. MEnum('one', 'two', 'three')

    • a section to define references to other sections as quantity values

    • a custom meta-info DataType, see Environments

    • a numpy dtype, e.g. np.dtype('float32')

    • typing.Any to support any value

    If set to a numpy dtype, this quantity will use a numpy array or scalar to store values internally. If a regular (nested) Python list or Python scalar is given, it will be automatically converted. The given dtype will be used for the numpy value.

    To define a reference, either a section class or instance of Section can be given. See Sections for details. Instances of the given section constitute valid values for this type. Upon serialization, referenced section instances will be represented with metainfo URLs. See References and metainfo URLs.

    For quantities with more than one dimension, only numpy arrays and dtypes are allowed.

  • shape

    The shape of the quantity. It defines its dimensionality.

    A shape is a list, where each item defines one dimension. Each dimension can be:

    • an integer that defines the exact size of the dimension, e.g. [3] is the shape of a 3D spatial vector

    • a string that specifies a possible range, e.g. 0..*, 1..*, 3..6

    • the name of an int typed and shapeless quantity in the same section whose values define the length of this dimension, e.g. number_of_atoms defines the length of atom_positions

    Range specifications define lower and upper bounds for the possible dimension length. The * can be used to denote an arbitrarily high upper bound.

    Quantities with dimensionality (length of the shape) higher than 1 must be numpy arrays. Their type must be a dtype.

  • unit

    The physics unit for this quantity. It is optional.

    Units are represented with the pint Python package. Pint defines units and their algebra. You can either use pint units directly, e.g. units.m / units.s (the metainfo provides a preconfigured pint unit registry units), or provide the unit as a pint-parsable string, e.g. 'meter / seconds' or 'm/s'.

  • default

    The default value for this quantity. The value must match type and shape.

    Be careful with a default value like [] as it will be the default value for all occurrences of this quantity.

  • synonym_for – The name of a quantity defined in the same section as string. This will make this quantity a synonym for the other quantity. All other properties (type, shape, unit, etc.) are ignored. Getting or setting from/to this quantity will be delegated to the other quantity. Synonyms are always virtual.

  • derived – A Python callable that takes the containing section as input and outputs the value for this quantity. This quantity cannot be set directly; its value is only derived by the given callable. The callable is executed when this quantity is accessed. Derived quantities are always virtual.

  • cached – A bool indicating that derived values should be cached unless the underlying section has changed.

  • virtual – A boolean that determines if this quantity is virtual. Virtual quantities can be get/set like regular quantities, but their values are not (de-)serialized, hence never permanently stored.

  • is_scalar – Derived quantity that is True if and only if this quantity has a shape of length 0.

experiment_location

An optimized replacement for Quantity suitable for primitive properties.

method

An optimized replacement for Quantity suitable for primitive properties.

data_type

An optimized replacement for Quantity suitable for primitive properties.

probing_method

An optimized replacement for Quantity suitable for primitive properties.

repository_name

An optimized replacement for Quantity suitable for primitive properties.

repository_url

An optimized replacement for Quantity suitable for primitive properties.

entry_repository_url

An optimized replacement for Quantity suitable for primitive properties.

preview_url

An optimized replacement for Quantity suitable for primitive properties.

quantities

An optimized replacement for Quantity suitable for primitive properties.

group_hash

An optimized replacement for Quantity suitable for primitive properties.

apply_domain_metadata(entry_archive)
class nomad.datamodel.OptimadeEntry(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)
m_def: Section = nomad.datamodel.optimade.OptimadeEntry:Section
elements

Names of the different elements present in the structure.

nelements

Number of different elements in the structure as an integer.

elements_ratios

Relative proportions of different elements in the structure.

chemical_formula_descriptive

The chemical formula for a structure as a string in a form chosen by the API implementation.

chemical_formula_reduced

The reduced chemical formula for a structure as a string with element symbols and integer chemical proportion numbers. The proportion number MUST be omitted if it is 1.

chemical_formula_hill

The chemical formula for a structure in Hill form with element symbols followed by integer chemical proportion numbers. The proportion number MUST be omitted if it is 1.
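Hill form orders carbon first, then hydrogen, then all remaining elements alphabetically; when no carbon is present, all elements (including hydrogen) are ordered alphabetically. A sketch of this rule (hypothetical helper, not part of the nomad API):

```python
def hill_formula(counts):
    """Build a Hill-form formula string from element counts.

    `counts` maps element symbol -> integer proportion. The proportion
    number is omitted if it is 1. Hypothetical helper for illustration,
    not part of the nomad API.
    """
    if 'C' in counts:
        # carbon present: C first, H second, rest alphabetical
        order = ['C'] + (['H'] if 'H' in counts else []) + sorted(
            el for el in counts if el not in ('C', 'H'))
    else:
        # no carbon: everything alphabetical, including H
        order = sorted(counts)
    return ''.join(
        el if counts[el] == 1 else f'{el}{counts[el]}' for el in order)

print(hill_formula({'H': 6, 'C': 2, 'O': 1}))  # ethanol -> 'C2H6O'
```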

chemical_formula_anonymous

The anonymous formula is the chemical_formula_reduced, but where the elements are instead first ordered by their chemical proportion number, and then, in order left to right, replaced by anonymous symbols A, B, C, …, Z, Aa, Ba, …, Za, Ab, Bb, … and so on.
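The anonymisation rule can be sketched as follows (hypothetical helper, not part of the nomad API; symbols beyond Z, i.e. Aa, Ba, …, are not handled in this sketch):

```python
import string

def anonymous_formula(counts):
    """Replace element symbols by A, B, C, ... ordered by proportion.

    `counts` maps element symbol -> integer proportion from the reduced
    formula. The proportion number is omitted if it is 1. Hypothetical
    helper for illustration, not part of the nomad API.
    """
    # order proportions descending: the most frequent element becomes A
    ordered = sorted(counts.values(), reverse=True)
    parts = []
    for letter, n in zip(string.ascii_uppercase, ordered):
        parts.append(letter if n == 1 else f'{letter}{n}')
    return ''.join(parts)

print(anonymous_formula({'H': 2, 'O': 1}))  # H2O -> 'A2B'
```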

dimension_types

List of three integers, one for each of the three directions indicated by the three lattice vectors (see property lattice_vectors). Each value indicates whether the corresponding direction is periodic (value 1) or non-periodic (value 0). Note: the elements in this list each refer to the direction of the corresponding entry in lattice_vectors and not the Cartesian x, y, z directions.

lattice_vectors

The three lattice vectors in Cartesian coordinates, in ångström (Å).

cartesian_site_positions

Cartesian positions of each site. A site is an atom, a site potentially occupied by an atom, or a placeholder for a virtual mixture of atoms (e.g., in a virtual crystal approximation).

nsites

An integer specifying the length of the cartesian_site_positions property.

species_at_sites

Name of the species at each site (where values for sites are specified with the same order of the cartesian_site_positions property). The properties of the species are found in the species property.

structure_features

A list of strings that flag which special features are used by the structure.

  • disorder: This flag MUST be present if any one entry in the species list has a chemical_symbols list that is longer than 1 element.

  • unknown_positions: This flag MUST be present if at least one component of the cartesian_site_positions list of lists has value null.

  • assemblies: This flag MUST be present if the assemblies list is present.

species

Defines what sections can appear as sub-sections of another section.

Like quantities, sub-sections are defined in a section class as attributes of this class. And like quantities, each sub-section definition becomes a property of the corresponding section definition (the parent). A sub-section definition references another section definition as the sub-section (the child). As a consequence, parent section instances can contain child section instances as sub-sections.

Contrary to the old NOMAD metainfo, we distinguish between the sub-section as a section and the sub-section as a property. This allows one child section definition to be used as a sub-section of many different parent section definitions.

Parameters
  • sub_section – A Section or Python class object for a section class. This will be the child section definition; the section in which this sub-section is defined becomes the parent section definition.

  • repeats – A boolean that determines whether this sub-section can appear multiple times in the parent section.

class nomad.datamodel.encyclopedia.WyckoffVariables(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)
m_def: Section = nomad.datamodel.encyclopedia.WyckoffVariables:Section
x

The x variable if present.

y

The y variable if present.

z

The z variable if present.

class nomad.datamodel.encyclopedia.WyckoffSet(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)
m_def: Section = nomad.datamodel.encyclopedia.WyckoffSet:Section
wyckoff_letter

The Wyckoff letter for this set.

indices

Indices of the atoms belonging to this group.

element

Chemical element at this Wyckoff position.

variables

Defines what sections can appear as sub-sections of another section.

Like quantities, sub-sections are defined in a section class as attributes of this class. And like quantities, each sub-section definition becomes a property of the corresponding section definition (the parent). A sub-section definition references another section definition as the sub-section (the child). As a consequence, parent section instances can contain child section instances as sub-sections.

Contrary to the old NOMAD metainfo, we distinguish between the sub-section as a section and the sub-section as a property. This allows one child section definition to be used as a sub-section of many different parent section definitions.

Parameters
  • sub_section – A Section or Python class object for a section class. This will be the child section definition; the section in which this sub-section is defined becomes the parent section definition.

  • repeats – A boolean that determines whether this sub-section can appear multiple times in the parent section.

class nomad.datamodel.encyclopedia.LatticeParameters(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)
m_def: Section = nomad.datamodel.encyclopedia.LatticeParameters:Section
a

Length of the first basis vector.

b

Length of the second basis vector.

c

Length of the third basis vector.

alpha

Angle between second and third basis vector.

beta

Angle between first and third basis vector.

gamma

Angle between first and second basis vector.

class nomad.datamodel.encyclopedia.IdealizedStructure(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)
m_def: Section = nomad.datamodel.encyclopedia.IdealizedStructure:Section
atom_labels

Type (element, species) of each atom.

atom_positions

Atom positions given in coordinates that are relative to the idealized cell.

lattice_vectors

Lattice vectors of the idealized structure. For bulk materials it is the Bravais cell. This cell is representative and is idealized to match the detected symmetry properties.

lattice_vectors_primitive

Lattice vectors of the primitive unit cell in a form to be visualized within the idealized cell. This cell is representative and is idealized to match the detected symmetry properties.

periodicity

Automatically detected true periodicity of each lattice direction. May not correspond to the periodicity used in the calculation.

number_of_atoms

Number of atoms in the idealized structure.

cell_volume

Volume of the idealized cell. The cell volume can only be reported consistently after idealization and may not perfectly correspond to the original simulation cell.
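For reference, the volume of a general (triclinic) cell follows from the lattice parameters as V = abc · sqrt(1 − cos²α − cos²β − cos²γ + 2·cosα·cosβ·cosγ). A small sketch (hypothetical helper, not part of the nomad API):

```python
import math

def cell_volume(a, b, c, alpha, beta, gamma):
    """Volume of a triclinic cell from lattice parameters.

    Lengths a, b, c in any consistent unit; angles alpha, beta, gamma in
    degrees. Hypothetical helper for illustration, not the nomad API.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(
        1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)

print(cell_volume(2.0, 2.0, 2.0, 90, 90, 90))  # cubic cell of edge 2 -> 8.0
```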

wyckoff_sets

Defines what sections can appear as sub-sections of another section.

Like quantities, sub-sections are defined in a section class as attributes of this class. And like quantities, each sub-section definition becomes a property of the corresponding section definition (the parent). A sub-section definition references another section definition as the sub-section (the child). As a consequence, parent section instances can contain child section instances as sub-sections.

Contrary to the old NOMAD metainfo, we distinguish between the sub-section as a section and the sub-section as a property. This allows one child section definition to be used as a sub-section of many different parent section definitions.

Parameters
  • sub_section – A Section or Python class object for a section class. This will be the child section definition; the section in which this sub-section is defined becomes the parent section definition.

  • repeats – A boolean that determines whether this sub-section can appear multiple times in the parent section.

lattice_parameters

Defines what sections can appear as sub-sections of another section.

Like quantities, sub-sections are defined in a section class as attributes of this class. And like quantities, each sub-section definition becomes a property of the corresponding section definition (the parent). A sub-section definition references another section definition as the sub-section (the child). As a consequence, parent section instances can contain child section instances as sub-sections.

Contrary to the old NOMAD metainfo, we distinguish between the sub-section as a section and the sub-section as a property. This allows one child section definition to be used as a sub-section of many different parent section definitions.

Parameters
  • sub_section – A Section or Python class object for a section class. This will be the child section definition; the section in which this sub-section is defined becomes the parent section definition.

  • repeats – A boolean that determines whether this sub-section can appear multiple times in the parent section.

class nomad.datamodel.encyclopedia.Bulk(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)
m_def: Section = nomad.datamodel.encyclopedia.Bulk:Section
bravais_lattice

The Bravais lattice type in the Pearson notation, where the first lowercase letter indicates the crystal system, and the second uppercase letter indicates the lattice type. The value can only be one of the 14 different Bravais lattices in three dimensions.

Crystal system letters:

a = Triclinic
m = Monoclinic
o = Orthorhombic
t = Tetragonal
h = Hexagonal and Trigonal
c = Cubic

Lattice type letters:

P = Primitive
S (A, B, C) = One side/face centred
I = Body centred
R = Rhombohedral centring
F = All faces centred
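Decoding the two-letter code is a simple lookup; an illustrative sketch (hypothetical helper, not part of the nomad API):

```python
# Decode the two-letter Bravais lattice code described above.
# Hypothetical helper for illustration, not part of the nomad API.

CRYSTAL_SYSTEMS = {
    'a': 'Triclinic', 'm': 'Monoclinic', 'o': 'Orthorhombic',
    't': 'Tetragonal', 'h': 'Hexagonal and Trigonal', 'c': 'Cubic'}

LATTICE_TYPES = {
    'P': 'Primitive', 'S': 'One side/face centred', 'I': 'Body centred',
    'R': 'Rhombohedral centring', 'F': 'All faces centred'}

def decode_bravais(code):
    """Split a code like 'cF' into (crystal system, lattice type)."""
    system, centring = code[0], code[1]
    return CRYSTAL_SYSTEMS[system], LATTICE_TYPES[centring]

print(decode_bravais('cF'))  # ('Cubic', 'All faces centred')
```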

crystal_system

The detected crystal system. One of seven possibilities in three dimensions.

has_free_wyckoff_parameters

Whether the material has any Wyckoff sites with free parameters. If a material has free Wyckoff parameters, at least some of the atoms are not bound to a particular location in the structure but are allowed to move with possible restrictions set by the symmetry.

point_group

Point group in Hermann-Mauguin notation, part of crystal structure classification. There are 32 point groups in three dimensional space.

space_group_number

Integer representation of the space group, part of crystal structure classification, part of material definition.

space_group_international_short_symbol

International short symbol notation of the space group.

structure_prototype

The prototypical material for this crystal structure.

structure_type

Classification according to known structure type, considering the point group of the crystal and the occupations with different atom types.

strukturbericht_designation

Classification of the material according to the historically grown Strukturbericht designation.

class nomad.datamodel.encyclopedia.Material(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)
m_def: Section = nomad.datamodel.encyclopedia.Material:Section
material_type

Broad structural classification for the material, e.g. bulk, 2D, 1D, …

material_id

A fixed length, unique material identifier in the form of a hash digest.

material_name

Most meaningful name for a material, if one could be assigned.

material_classification

Contains the compound class and classification of the material according to Springer Materials in JSON format.

formula

Formula giving the composition and occurrences of the elements in the Hill notation. For periodic materials the formula is calculated from the primitive unit cell.

formula_reduced

Formula giving the composition and occurrences of the elements in the Hill notation, where the number of occurrences has been divided by the greatest common divisor.
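The reduction by the greatest common divisor can be sketched as (hypothetical helper, not part of the nomad API):

```python
from functools import reduce
from math import gcd

def reduce_formula(counts):
    """Divide element counts by their greatest common divisor.

    `counts` maps element symbol -> number of occurrences. Hypothetical
    helper for illustration, not part of the nomad API.
    """
    divisor = reduce(gcd, counts.values())
    return {el: n // divisor for el, n in counts.items()}

print(reduce_formula({'Si': 8, 'O': 16}))  # {'Si': 1, 'O': 2}
```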

species_and_counts

The formula separated into individual terms containing both the atom type and count. Used for searching parts of a formula.

species

The formula separated into individual terms containing only unique atom species. Used for searching materials containing specific elements.

bulk

Defines what sections can appear as sub-sections of another section.

Like quantities, sub-sections are defined in a section class as attributes of this class. And like quantities, each sub-section definition becomes a property of the corresponding section definition (the parent). A sub-section definition references another section definition as the sub-section (the child). As a consequence, parent section instances can contain child section instances as sub-sections.

Contrary to the old NOMAD metainfo, we distinguish between the sub-section as a section and the sub-section as a property. This allows one child section definition to be used as a sub-section of many different parent section definitions.

Parameters
  • sub_section – A Section or Python class object for a section class. This will be the child section definition; the section in which this sub-section is defined becomes the parent section definition.

  • repeats – A boolean that determines whether this sub-section can appear multiple times in the parent section.

idealized_structure

Defines what sections can appear as sub-sections of another section.

Like quantities, sub-sections are defined in a section class as attributes of this class. And like quantities, each sub-section definition becomes a property of the corresponding section definition (the parent). A sub-section definition references another section definition as the sub-section (the child). As a consequence, parent section instances can contain child section instances as sub-sections.

Contrary to the old NOMAD metainfo, we distinguish between the sub-section as a section and the sub-section as a property. This allows one child section definition to be used as a sub-section of many different parent section definitions.

Parameters
  • sub_section – A Section or Python class object for a section class. This will be the child section definition; the section in which this sub-section is defined becomes the parent section definition.

  • repeats – A boolean that determines whether this sub-section can appear multiple times in the parent section.

class nomad.datamodel.encyclopedia.Method(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)
m_def: Section = nomad.datamodel.encyclopedia.Method:Section
method_type

Generic name for the used methodology.

core_electron_treatment

How the core electrons are described.

functional_long_name

Full identifier for the used exchange-correlation functional.

functional_type

Basic type of the used exchange-correlation functional.

method_id

A fixed length, unique method identifier in the form of a hash digest. The hash is created by using several method settings as seed. This hash is only defined if a set of well-defined method settings is available for the used program.

group_eos_id

A fixed length, unique identifier for equation-of-state calculations. Only calculations within the same upload and with a method hash available will be grouped under the same hash.

group_parametervariation_id

A fixed length, unique identifier for calculations where structure is identical but the used computational parameters are varied. Only calculations within the same upload and with a method hash available will be grouped under the same hash.

gw_starting_point

The exchange-correlation functional that was used as a starting point for this GW calculation.

gw_type

Basic type of GW calculation.

smearing_kind

Smearing function used for the electronic structure calculation.

smearing_parameter

Parameter for smearing, usually the width.

class nomad.datamodel.encyclopedia.Calculation(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)
m_def: Section = nomad.datamodel.encyclopedia.Calculation:Section
calculation_type

Defines the type of calculation that was detected for this entry.

class nomad.datamodel.encyclopedia.Energies(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)
m_def: Section = nomad.datamodel.encyclopedia.Energies:Section
energy_total

Total energy.

energy_total_T0

Total energy projected to T=0.

energy_free

Free energy.

class nomad.datamodel.encyclopedia.Properties(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)
m_def: Section = nomad.datamodel.encyclopedia.Properties:Section
atomic_density

Atomic density of the material (atoms/volume).

mass_density

Mass density of the material.

band_gap

Band gap value. If multiple spin channels are present, this value is taken from the channel with smallest band gap value.

band_gap_direct

Whether band gap is direct or not. If multiple spin channels are present, this value is taken from the channel with smallest band gap value.

energies

Defines what sections can appear as sub-sections of another section.

Like quantities, sub-sections are defined in a section class as attributes of this class. And like quantities, each sub-section definition becomes a property of the corresponding section definition (the parent). A sub-section definition references another section definition as the sub-section (the child). As a consequence, parent section instances can contain child section instances as sub-sections.

Contrary to the old NOMAD metainfo, we distinguish between the sub-section as a section and the sub-section as a property. This allows one child section definition to be used as a sub-section of many different parent section definitions.

Parameters
  • sub_section – A Section or Python class object for a section class. This will be the child section definition; the section in which this sub-section is defined becomes the parent section definition.

  • repeats – A boolean that determines whether this sub-section can appear multiple times in the parent section.

electronic_band_structure

Reference to an electronic band structure.

electronic_dos

Reference to an electronic density of states.

phonon_band_structure

Reference to a phonon band structure.

phonon_dos

Reference to a phonon density of states.

thermodynamical_properties

Reference to a section containing thermodynamical properties.

class nomad.datamodel.encyclopedia.EncyclopediaMetadata(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)
m_def: Section = nomad.datamodel.encyclopedia.EncyclopediaMetadata:Section
material

Defines what sections can appear as sub-sections of another section.

Like quantities, sub-sections are defined in a section class as attributes of this class. And like quantities, each sub-section definition becomes a property of the corresponding section definition (the parent). A sub-section definition references another section definition as the sub-section (the child). As a consequence, parent section instances can contain child section instances as sub-sections.

Contrary to the old NOMAD metainfo, we distinguish between the sub-section as a section and the sub-section as a property. This allows one child section definition to be used as a sub-section of many different parent section definitions.

Parameters
  • sub_section – A Section or Python class object for a section class. This will be the child section definition; the section in which this sub-section is defined becomes the parent section definition.

  • repeats – A boolean that determines whether this sub-section can appear multiple times in the parent section.

method

Defines what sections can appear as sub-sections of another section.

Like quantities, sub-sections are defined in a section class as attributes of this class. And like quantities, each sub-section definition becomes a property of the corresponding section definition (the parent). A sub-section definition references another section definition as the sub-section (the child). As a consequence, parent section instances can contain child section instances as sub-sections.

Contrary to the old NOMAD metainfo, we distinguish between the sub-section as a section and the sub-section as a property. This allows one child section definition to be used as a sub-section of many different parent section definitions.

Parameters
  • sub_section – A Section or Python class object for a section class. This will be the child section definition; the section in which this sub-section is defined becomes the parent section definition.

  • repeats – A boolean that determines whether this sub-section can appear multiple times in the parent section.

properties

Defines what sections can appear as sub-sections of another section.

Like quantities, sub-sections are defined in a section class as attributes of this class. And like quantities, each sub-section definition becomes a property of the corresponding section definition (the parent). A sub-section definition references another section definition as the sub-section (the child). As a consequence, parent section instances can contain child section instances as sub-sections.

Contrary to the old NOMAD metainfo, we distinguish between the sub-section as a section and the sub-section as a property. This allows one child section definition to be used as a sub-section of many different parent section definitions.

Parameters
  • sub_section – A Section or Python class object for a section class. This will be the child section definition; the section in which this sub-section is defined becomes the parent section definition.

  • repeats – A boolean that determines whether this sub-section can appear multiple times in the parent section.

calculation

Defines what sections can appear as sub-sections of another section.

Like quantities, sub-sections are defined in a section class as attributes of this class. And like quantities, each sub-section definition becomes a property of the corresponding section definition (parent). A sub-section definition references another section definition as the sub-section (child). As a consequence, parent section instances can contain child section instances as sub-sections.

Contrary to the old NOMAD metainfo, we distinguish between the sub-section as a section and the sub-section as a property. This allows one child section definition to be used as a sub-section of many different parent section definitions.

Parameters
  • sub_section – A Section or Python class object for a section class. This will be the child section definition. The section that defines this sub-section is the parent section definition.

  • repeats – A boolean that determines whether this sub-section can appear multiple times in the parent section.

status

The final Encyclopedia processing status for this entry. The meaning of the status is as follows:

Status | Description
------------------------------ | ------------------------------
“success” | Processed successfully |
“unsupported_material_type” | The detected material type is currently not supported by the Encyclopedia. |
“unsupported_calculation_type” | The detected calculation type is currently not supported by the Encyclopedia. |
“invalid_metainfo” | The entry could not be processed due to missing or invalid metainfo. |
“failure” | The entry could not be processed due to an unexpected exception. |

nomad.files

Uploads contains classes and functions to create and maintain file structures for uploads.

There are two different structures for uploads in two different states: staging and public. Possible operations on uploads differ based on this state. Staging is used for processing, heavy editing, creating hashes, etc. Public is supposed to be an almost read-only (besides metadata) storage.

fs/staging/<upload>/raw/**
                   /archive/<calc>.json
fs/public/<upload>/raw-public.plain.zip
                  /raw-restricted.plain.zip
                  /archive-public.json.zip
                  /archive-restricted.json.zip

There is an implicit relationship between files, based on them being in the same directory. Each directory with at least one mainfile is a calculation directory and all the files in it are aux files to that mainfile. This is independent of whether the respective files actually contribute data or not. A calculation directory might contain multiple mainfiles. E.g., a user simulated multiple states of the same system, has one calculation based on the other, etc. In this case, each mainfile is an aux file to the other mainfile and vice versa.

Published files are kept in pairs of public and restricted files. Here, multiple mainfiles per directory pose a dilemma: if one mainfile is restricted, all its aux files should be restricted too. But if one of the aux files is itself a mainfile, it might be published!

There are multiple ways to solve this. Due to the rarity of the case, we choose the simplest solution: if one file is public, all files are made public, except those that are other mainfiles. Therefore, the aux files of a restricted calc might become public!
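The grouping described above can be sketched in a few lines. `calc_directories` below is a hypothetical helper, not part of nomad.files; it only illustrates the directory-based relationship:

```python
import os
from collections import defaultdict


def calc_directories(all_files, mainfiles):
    # Illustrative sketch of the implicit relationship described above:
    # every directory with at least one mainfile is a calculation
    # directory, and the other files in it are aux files to that mainfile.
    by_dir = defaultdict(list)
    for path in all_files:
        by_dir[os.path.dirname(path)].append(path)

    calcs = {}
    for mainfile in mainfiles:
        siblings = by_dir[os.path.dirname(mainfile)]
        # every sibling, including other mainfiles, counts as an aux file
        calcs[mainfile] = [path for path in siblings if path != mainfile]
    return calcs
```

Note how a second mainfile in the same directory would simply appear in the aux file list of the first, mirroring the "vice versa" relationship described above.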

nomad.files.always_restricted(path: str)

Used to put general restrictions on files, e.g. due to licensing issues. Will be called during packing and while accessing public files.

nomad.files.copytree(src, dst)

A clone of shutil.copytree that does not try to copy the stats on all files. This is unnecessary for our use case and also causes permission errors for unknown reasons.

class nomad.files.PathObject(bucket: str, object_id: str, os_path: str = None, prefix: bool = False, create_prefix: bool = False)

Object storage-like abstraction for paths in general.

Parameters
  • bucket – The bucket to store this object in.

  • object_id – The object id (i.e. directory path).

  • os_path – Override the “object storage” path with the given path.

  • prefix – Add an x-digit prefix directory, e.g. foo/test -> foo/tes/test.

  • create_prefix – Create the prefix right away.

__init__(bucket: str, object_id: str, os_path: str = None, prefix: bool = False, create_prefix: bool = False) → None

Initialize self. See help(type(self)) for accurate signature.

delete() → None
exists() → bool
property size

The os determined file size.
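The `prefix` option described above can be illustrated with a small path transformation. `with_prefix` is a hypothetical sketch; the 3-character prefix length is an assumption based on the `foo/test -> foo/tes/test` example:

```python
import os


def with_prefix(os_path, length=3):
    # Sketch of the `prefix` option: insert an extra directory named
    # after the first `length` characters of the last path component,
    # e.g. foo/test -> foo/tes/test. Such prefix directories keep any
    # single directory from accumulating too many entries.
    head, tail = os.path.split(os_path)
    return os.path.join(head, tail[:length], tail)
```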

class nomad.files.DirectoryObject(bucket: str, object_id: str, create: bool = False, **kwargs)

Object storage-like abstraction for directories.

Parameters
  • bucket – The bucket to store this object in.

  • object_id – The object id (i.e. directory path).

  • create – True if the directory structure should be created. Default is False.

__init__(bucket: str, object_id: str, create: bool = False, **kwargs) → None

Initialize self. See help(type(self)) for accurate signature.

join_dir(path, create: bool = None) → nomad.files.DirectoryObject
join_file(path) → nomad.files.PathObject
exists() → bool
exception nomad.files.ExtractError
exception nomad.files.Restricted
class nomad.files.UploadFiles(bucket: str, upload_id: str, is_authorized: Callable[[], bool] = <function UploadFiles.<lambda>>, create: bool = False)
__init__(bucket: str, upload_id: str, is_authorized: Callable[[], bool] = <function UploadFiles.<lambda>>, create: bool = False) → None

Initialize self. See help(type(self)) for accurate signature.

property user_metadata
to_staging_upload_files(create: bool = False, **kwargs) → nomad.files.StagingUploadFiles

Casts to or creates corresponding staging upload files or returns None.

static get(upload_id: str, *args, **kwargs) → nomad.files.UploadFiles
raw_file(file_path: str, *args, **kwargs) → IO

Opens a raw file and returns a file-like object. Additional args and kwargs are delegated to the respective open call.

Parameters

file_path – The path to the file relative to the upload.

Raises
  • KeyError – If the file does not exist.

  • Restricted – If the file is restricted and upload access evaluated to False.

raw_file_size(file_path: str) → int
Returns

The size of the given raw file.

raw_file_manifest(path_prefix: str = None) → Generator[str, None, None]

Returns the paths of all raw files in the archive (with a given prefix).

Parameters

path_prefix – An optional prefix; only returns those files that have the prefix.

Returns

An iterable over all (matching) raw files.

raw_file_list(directory: str) → List[Tuple[str, int]]

Gives a list of directory contents and their sizes.

Parameters

directory – The directory to list.

Returns

A list of tuples with file name and size.

read_archive(calc_id: str, access: str = None) → nomad.archive.ArchiveReader

Returns a nomad.archive.ArchiveReader that contains the given calc_id. Both the restricted and the public archive are searched by default. The optional access parameter can be used to limit this lookup to the public or restricted archive.

close()

Release possibly held system resources (e.g. file handles).

class nomad.files.StagingUploadFiles(upload_id: str, is_authorized: Callable[[], bool] = <function StagingUploadFiles.<lambda>>, create: bool = False)
__init__(upload_id: str, is_authorized: Callable[[], bool] = <function StagingUploadFiles.<lambda>>, create: bool = False) → None

Initialize self. See help(type(self)) for accurate signature.

to_staging_upload_files(create: bool = False, **kwargs) → nomad.files.StagingUploadFiles

Casts to or creates corresponding staging upload files or returns None.

property size

The os determined file size.

raw_file(file_path: str, *args, **kwargs) → IO

Opens a raw file and returns a file-like object. Additional args and kwargs are delegated to the respective open call.

Parameters

file_path – The path to the file relative to the upload.

Raises
  • KeyError – If the file does not exist.

  • Restricted – If the file is restricted and upload access evaluated to False.

raw_file_size(file_path: str) → int

Returns: The size of the given raw file.

raw_file_object(file_path: str) → nomad.files.PathObject
write_archive(calc_id: str, data: Any) → int

Writes the data as archive file and returns the archive file size.

read_archive(calc_id: str, access: str = None) → nomad.archive.ArchiveReader

Returns a nomad.archive.ArchiveReader that contains the given calc_id. Both the restricted and the public archive are searched by default. The optional access parameter can be used to limit this lookup to the public or restricted archive.

archive_file_object(calc_id: str) → nomad.files.PathObject
add_rawfiles(path: str, move: bool = False, prefix: str = None, force_archive: bool = False, target_dir: nomad.files.DirectoryObject = None) → None

Add rawfiles to the upload. The given file will be copied, moved, or extracted.

Parameters
  • path – Path to a directory, file, or zip file. Zip files will be extracted.

  • move – Whether the file should be moved instead of copied. Zips will be extracted and then deleted.

  • prefix – Optional path prefix for the added files.

  • force_archive – Expect the file to be a zip or other supported archive file. Usually such files are extracted if possible and copied otherwise.

  • target_dir – Overwrite the used directory to extract to. Default is the raw directory of this upload.

property is_frozen

Returns True if this upload is already bagged.

pack(entries: Iterable[nomad.datamodel.datamodel.EntryMetadata], target_dir: nomad.files.DirectoryObject = None, skip_raw: bool = False, skip_archive: bool = False) → None

Replaces the staging upload data with a public upload record by packing all data into files. It is only available if the upload is_bag. This is potentially a long running operation.

Parameters
  • entries – All calcs and calculation metadata of the upload, used to determine what files to pack and what the embargo situation is.

  • target_dir – Optional DirectoryObject to override where to put the files. Default is the corresponding public upload files directory.

  • skip_raw – Do not pack the raw data; only archive and user metadata are packed.

  • skip_archive – Do not pack the archive data; only raw data and user metadata are packed.

raw_file_manifest(path_prefix: str = None) → Generator[str, None, None]

Returns the paths of all raw files in the archive (with a given prefix).

Parameters

path_prefix – An optional prefix; only returns those files that have the prefix.

Returns

An iterable over all (matching) raw files.

raw_file_list(directory: str) → List[Tuple[str, int]]

Gives a list of directory contents and their sizes.

Parameters

directory – The directory to list.

Returns

A list of tuples with file name and size.

calc_files(mainfile: str, with_mainfile: bool = True, with_cutoff: bool = True) → Iterable[str]

Returns all the aux files and the mainfile for a given mainfile. This implements nomad’s logic about what is part of a calculation and what is not. The mainfile is the first entry, the rest is sorted.

Parameters
  • mainfile – The mainfile relative to the upload.

  • with_mainfile – Do include the mainfile, default is True.

calc_id(mainfile: str) → str

Calculates an id for the given calc.

Parameters

mainfile – The mainfile path relative to the upload that identifies the calc in the folder structure.

Returns

The calc id

Raises

KeyError – If the mainfile does not exist.

calc_hash(mainfile: str) → str

Calculates a hash for the given calc based on file contents and aux file contents.

Parameters

mainfile – The mainfile path relative to the upload that identifies the calc in the folder structure.

Returns

The calculated hash

Raises

KeyError – If the mainfile does not exist.
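A content-based calc hash like the one described above can be sketched as follows. This is illustrative only: nomad's actual digest algorithm, encoding, and file ordering are assumptions here:

```python
import hashlib


def calc_hash(file_paths):
    # Illustrative sketch of a content-based calc hash: digest the
    # mainfile and aux file contents. Sorting the paths first makes the
    # hash independent of the order in which files are passed. The use
    # of SHA-512 and hex encoding is an assumption, not nomad's scheme.
    digest = hashlib.sha512()
    for path in sorted(file_paths):
        with open(path, 'rb') as f:
            digest.update(f.read())
    return digest.hexdigest()
```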

delete() → None
class nomad.files.ArchiveBasedStagingUploadFiles(upload_id: str, upload_path: str, *args, **kwargs)

StagingUploadFiles based on a single uploaded archive file (.zip)

Parameters

upload_path – The path to the uploaded file.

__init__(upload_id: str, upload_path: str, *args, **kwargs) → None

Initialize self. See help(type(self)) for accurate signature.

property is_valid
extract() → None
add_rawfiles(path: str, move: bool = False, prefix: str = None, force_archive: bool = False, target_dir: nomad.files.DirectoryObject = None) → None

Add rawfiles to the upload. The given file will be copied, moved, or extracted.

Parameters
  • path – Path to a directory, file, or zip file. Zip files will be extracted.

  • move – Whether the file should be moved instead of copied. Zips will be extracted and then deleted.

  • prefix – Optional path prefix for the added files.

  • force_archive – Expect the file to be a zip or other supported archive file. Usually such files are extracted if possible and copied otherwise.

  • target_dir – Overwrite the used directory to extract to. Default is the raw directory of this upload.

class nomad.files.PublicUploadFilesBasedStagingUploadFiles(public_upload_files: nomad.files.PublicUploadFiles, *args, **kwargs)

StagingUploadFiles created from existing PublicUploadFiles.

Parameters

public_upload_files – The PublicUploadFiles instance that these staging files are based on.

__init__(public_upload_files: nomad.files.PublicUploadFiles, *args, **kwargs) → None

Initialize self. See help(type(self)) for accurate signature.

extract(include_archive: bool = False) → None
add_rawfiles(*args, **kwargs) → None

Add rawfiles to the upload. The given file will be copied, moved, or extracted.

Parameters
  • path – Path to a directory, file, or zip file. Zip files will be extracted.

  • move – Whether the file should be moved instead of copied. Zips will be extracted and then deleted.

  • prefix – Optional path prefix for the added files.

  • force_archive – Expect the file to be a zip or other supported archive file. Usually such files are extracted if possible and copied otherwise.

  • target_dir – Overwrite the used directory to extract to. Default is the raw directory of this upload.

pack(entries: Iterable[nomad.datamodel.datamodel.EntryMetadata], *args, **kwargs) → None

Packs only the archive contents and stores it in the existing public upload files.

class nomad.files.PublicUploadFiles(*args, **kwargs)
__init__(*args, **kwargs) → None

Initialize self. See help(type(self)) for accurate signature.

close()

Release possibly held system resources (e.g. file handles).

to_staging_upload_files(create: bool = False, **kwargs) → nomad.files.StagingUploadFiles

Casts to or creates corresponding staging upload files or returns None.

raw_file(file_path: str, *args, **kwargs) → IO

Opens a raw file and returns a file-like object. Additional args and kwargs are delegated to the respective open call.

Parameters

file_path – The path to the file relative to the upload.

Raises
  • KeyError – If the file does not exist.

  • Restricted – If the file is restricted and upload access evaluated to False.

raw_file_size(file_path: str) → int

Returns: The size of the given raw file.

raw_file_manifest(path_prefix: str = None) → Generator[str, None, None]

Returns the paths of all raw files in the archive (with a given prefix).

Parameters

path_prefix – An optional prefix; only returns those files that have the prefix.

Returns

An iterable over all (matching) raw files.

raw_file_list(directory: str) → List[Tuple[str, int]]

Gives a list of directory contents and their sizes.

Parameters

directory – The directory to list.

Returns

A list of tuples with file name and size.

read_archive(calc_id: str, access: str = None) → Any

Returns a nomad.archive.ArchiveReader that contains the given calc_id. Both the restricted and the public archive are searched by default. The optional access parameter can be used to limit this lookup to the public or restricted archive.

re_pack(entries: Iterable[nomad.datamodel.datamodel.EntryMetadata], skip_raw: bool = False, skip_archive: bool = False) → None

Replaces the existing public/restricted data file pairs with new ones, based on current restricted information in the metadata. Should be used after updating the restrictions on calculations. This is potentially a long running operation.

nomad.archive

nomad.archive.packb(o, **kwargs)
nomad.archive.unpackb(o, **kwargs)
nomad.archive.adjust_uuid_size(uuid)
exception nomad.archive.ArchiveError

An error that indicates a broken archive.

exception nomad.archive.ArchiveQueryError

An error that indicates that an archive query is either not valid or does not fit to the queried archive.

class nomad.archive.TOCPacker(toc_depth: int, *args, **kwargs)

A special msgpack packer that records a TOC while packing.

Uses a combination of the pure python msgpack fallback packer and the “real” c-based packing.

__init__(toc_depth: int, *args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

pack(obj, *args, **kwargs)
class nomad.archive.ArchiveWriter(file_or_path: Union[str, _io.BytesIO], n_entries: int, entry_toc_depth: int)
__init__(file_or_path: Union[str, _io.BytesIO], n_entries: int, entry_toc_depth: int)

Initialize self. See help(type(self)) for accurate signature.

write(b: bytes) → Tuple[int, int]
add(uuid: str, data: Any) → None
class nomad.archive.ArchiveItem(toc_entry: list, f: _io.BytesIO, offset: int = 0)
__init__(toc_entry: list, f: _io.BytesIO, offset: int = 0)

Initialize self. See help(type(self)) for accurate signature.

class nomad.archive.ArchiveList(toc_entry: list, f: _io.BytesIO, offset: int = 0)
class nomad.archive.ArchiveObject(*args, **kwargs)
__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

to_dict()
class nomad.archive.ArchiveReader(file_or_path: Union[str, _io.BytesIO], use_blocked_toc=True)
__init__(file_or_path: Union[str, _io.BytesIO], use_blocked_toc=True)

Initialize self. See help(type(self)) for accurate signature.

fs_block_size = 4096
toc_block_size_entries = 76
toc_block_size_bytes = 4104
close()
is_closed()
nomad.archive.write_archive(path_or_file: Union[str, _io.BytesIO], n_entries: int, data: Iterable[Tuple[str, Any]], entry_toc_depth: int = 2) → None

Writes a msgpack-based archive file. The file contents will be a valid msgpack-object. The data will contain extra table-of-contents (TOC) objects that map some keys to positions in the file. Data can be partially read from these positions and deserialized with msgpack.

The data in the archive file will have the following layout:

{
    'toc_pos': b[start, end],
    'toc': {
        entry_uuid: [b[start, end], b[start, end]], ...
    },
    'data': {
        entry_uuid: {
            'toc': {
                key: {
                    'pos': [start, end],
                    'toc': ...
                },
                key: [
                    {
                        'pos': [start, end]
                        'toc': ...
                    }, ...
                ],
                ...
            },
            'data': ...
        }, ...
    }
}

The top-level TOC will map entry_uuids to positions. The key ‘toc_pos’ holds the position of the entry TOC, and the entries of ‘toc’ hold the positions of each entry. These positions will be absolute positions in the file. The top-level TOC will be ordered by entry_uuid. The top-level TOC positions are 2*5byte encoded integers. This gives the top-level TOC a predictable layout and allows it to be read partially.

The TOC of each entry will have the same structure as the data up to a certain TOC depth. A TOC object will hold the position of the object it refers to (key ‘pos’) and further, deeper TOC data (key ‘toc’). Only data objects (dict instances) will have TOC objects and only objects count towards the TOC depth. Positions in the entry TOCs are regular msgpack encoded integers.
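The fixed-size "2*5byte" position encoding mentioned above can be illustrated with msgpack's uint32 format, which always occupies 5 bytes (a 0xce marker plus 4 big-endian bytes). Whether nomad uses exactly this width is an assumption; the sketch only shows why fixed-size integers make the TOC seekable:

```python
import struct


def pack_position(start, end):
    # One TOC position entry: two fixed-width msgpack uint32 values.
    # 0xce is the msgpack uint32 type marker, followed by 4 big-endian
    # bytes, so each integer takes exactly 5 bytes. Fixed entry sizes
    # let a reader seek directly to the i-th entry without parsing
    # everything before it.
    return (b'\xce' + struct.pack('>I', start)
            + b'\xce' + struct.pack('>I', end))
```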

Parameters
  • path_or_file – A file path or file-like to the archive file that should be written.

  • n_entries – The number of entries that will be added to the file.

  • data – The file contents as an iterator of entry id, data tuples.

  • entry_toc_depth – The depth of the table of contents in each entry. Only objects will count for calculating the depth.

nomad.archive.read_archive(file_or_path: str, **kwargs) → nomad.archive.ArchiveReader

Allows to read a msgpack-based archive.

Parameters

file_or_path – A file path or file-like to the archive file that should be read. The respective file has to be closed by the user. The returned obj supports the ‘with’ statement and has a ‘close’ method.

Returns

A mapping (dict-like) that can be used to access the archive data. The mapping will lazily load data as it is used. The mapping needs to be closed or used within a ‘with’ statement to free the underlying file resource after use.

nomad.archive.query_archive(f_or_archive_reader: Union[str, nomad.archive.ArchiveReader, _io.BytesIO], query_dict: dict, **kwargs) → Dict

Takes an open msgpack-based archive (either as str, reader, or BytesIO) and returns the archive as a JSON serializable dictionary, filtered based on the given required specification.
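The idea of filtering by a required specification can be sketched in pure Python. `filter_by_required` is a hypothetical, minimal illustration, not nomad's implementation (the real query supports more than plain key selection):

```python
def filter_by_required(data, required):
    # Minimal sketch of required-based filtering: keep only the keys
    # named in `required`, recursing into nested dicts; the value '*'
    # requests a full sub-tree unchanged.
    if required == '*':
        return data
    return {
        key: filter_by_required(data[key], sub_required)
        for key, sub_required in required.items()
        if key in data
    }
```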

nomad.archive.filter_archive(required: Dict[str, Any], archive_item: Union[Dict, nomad.archive.ArchiveObject], transform: Callable) → Dict
nomad.archive.create_partial_archive(archive: nomad.datamodel.datamodel.EntryArchive) → Dict

Creates a partial archive JSON serializable dict that can be stored directly. The given archive is filtered based on the metainfo category fast_access. Selected sections and other data that they reference (recursively) comprise the resulting partial archive.

TODO: at the moment this is hard coded and NOT informed by the metainfo. We simply add the sections EntryMetadata and Workflow.

Parameters

archive – The archive as an EntryArchive instance.

Returns: the partial archive in JSON serializable dictionary form.

nomad.archive.write_partial_archive_to_mongo(archive: nomad.datamodel.datamodel.EntryArchive)

Partially writes the given archive to mongodb.

nomad.archive.read_partial_archive_from_mongo(entry_id: str, as_dict=False) → Union[nomad.datamodel.datamodel.EntryArchive, Dict]

Reads the partial archive for the given id from mongodb.

Parameters
  • entry_id – The entry id for the entry.

  • as_dict – Return the JSON serializable dictionary form of the archive not the EntryArchive form.

nomad.archive.read_partial_archives_from_mongo(entry_ids: List[str], as_dict=False) → Dict[str, Union[nomad.datamodel.datamodel.EntryArchive, Dict]]

Reads the partial archives for a set of entries of the same upload.

Parameters
  • entry_ids – A list of entry ids.

  • as_dict – Return the JSON serializable dictionary form of the archive not the EntryArchive form.

Returns

A dictionary with entry_ids as keys.

nomad.archive.compute_required_with_referenced(required)

Updates the given required dictionary to ensure that references to non-required sections within a partial fast access archive are included. Only references that are directly contained in required are added. References from wildcarded sub-sections are ignored.

Returns: A new required dict or None. None is returned if it is unclear if the required is only accessing information of fast access partial archives.

nomad.doi

This module contains all functions necessary to manage DOI via datacite.org and its MDS API (https://support.datacite.org/docs/mds-api-guide).

nomad.doi.edit_url(doi: str, url: str = None)

Changes the URL of an already findable DOI.

class nomad.doi.DOI(*args, **values)
doi

A unicode string field.

url

A unicode string field.

metadata_url

A unicode string field.

doi_url

A unicode string field.

state

A unicode string field.

create_time

Datetime field.

Uses the python-dateutil library if available; otherwise falls back to time.strptime to parse the dates. Note: python-dateutil’s parser is fully featured and, when installed, you can utilise it to convert varying types of date formats into valid python datetime objects.

Note: To default the field to the current datetime, use: DateTimeField(default=datetime.utcnow)

Note: Microseconds are rounded to the nearest millisecond.

Pre UTC microsecond support is effectively broken. Use ComplexDateTimeField if you need accurate microsecond support.

matadata_xml

A binary data field.

static create(title: str, user: nomad.datamodel.datamodel.User) → nomad.doi.DOI

Creates a unique DOI with the NOMAD DOI prefix.

create_draft()
delete(*args, **kwargs)

Delete the Document from the database. This will only take effect if the document has been previously saved.

Parameters
  • signal_kwargs – (optional) kwargs dictionary to be passed to the signal calls.

  • write_concern – Extra keyword arguments are passed down which will be used as options for the resultant getLastError command. For example, save(..., w: 2, fsync: True) will wait until at least two servers have recorded the write and will force an fsync on the primary server.

Changed in version 0.10.7: Add signal_kwargs argument

make_findable()
exception DoesNotExist
exception MultipleObjectsReturned
id

A unicode string field.

objects

The default QuerySet Manager.

Custom QuerySet Manager functions can extend this class and users can add extra queryset functionality. Any custom manager methods must accept a Document class as its first argument, and a QuerySet as its second argument.

The method function should return a QuerySet , probably the same one that was passed in, but modified in some way.

nomad.parsing

The parsing module is an interface for the existing NOMAD-coe parsers. This module redefines some of the old NOMAD-coe python-common functionality to create a more coherent interface to the parsers.

Assumption about parsers

For now, we make a few assumptions about parsers:
  • they always work on the same meta-info version

  • they have no conflicting python requirements

  • they can be loaded at the same time and can be used within the same python process

  • they are uniquely identified by a GIT URL and publicly accessible

  • their version is uniquely identified by a GIT commit SHA

Each parser is defined via an instance of Parser. The implementation LegacyParser is used for most NOMAD-coe parsers.

class nomad.parsing.Parser

Instances specify a parser. It allows to find mainfiles among given uploaded and extracted files. Further, it allows to run the parser on those mainfiles.

name = 'parsers/parser'
__init__()

Initialize self. See help(type(self)) for accurate signature.

property metainfo_env
abstract is_mainfile(filename: str, mime: str, buffer: bytes, decoded_buffer: str, compression: str = None) → bool

Checks if a file is a mainfile for the parsers.

Parameters
  • filename – The filesystem path to the mainfile

  • mime – The mimetype of the mainfile guessed with libmagic

  • buffer – The first 2k of the mainfile contents

  • compression – The compression of the mainfile [None, 'gz', 'bz2']

abstract parse(mainfile: str, archive: nomad.datamodel.datamodel.EntryArchive, logger=None) → None

Runs the parser on the given mainfile and populates the result in the given archive root_section. It allows to be run repeatedly for different mainfiles.

Parameters
  • mainfile – A path to a mainfile that this parser can parse.

  • archive – An instance of the section EntryArchive. It might contain a section_metadata with information about the entry.

  • logger – A optional logger

There are sub-classes for parsers with special purposes.

class nomad.parsing.Parser

Instances specify a parser. It allows to find mainfiles among given uploaded and extracted files. Further, it allows to run the parser on those mainfiles.

__init__()

Initialize self. See help(type(self)) for accurate signature.

class nomad.parsing.MatchingParser(name: str, code_name: str, code_homepage: str = None, mainfile_contents_re: str = None, mainfile_binary_header: bytes = None, mainfile_mime_re: str = 'text/.*', mainfile_name_re: str = '.*', domain='dft', supported_compressions: List[str] = [])

A parser implementation that uses regular expressions to match mainfiles.

Parameters
  • code_name – The name of the code or input format

  • code_homepage – The homepage of the code or input format

  • mainfile_mime_re – A regexp that is used to match against a files mime type

  • mainfile_contents_re – A regexp that is used to match the first 1024 bytes of a potential mainfile.

  • mainfile_name_re – A regexp that is used to match the paths of potential mainfiles

  • domain – The domain that this parser should be used for. Default is ‘dft’.

  • supported_compressions – A list of [gz, bz2], if the parser supports compressed files

__init__(name: str, code_name: str, code_homepage: str = None, mainfile_contents_re: str = None, mainfile_binary_header: bytes = None, mainfile_mime_re: str = 'text/.*', mainfile_name_re: str = '.*', domain='dft', supported_compressions: List[str] = []) → None

Initialize self. See help(type(self)) for accurate signature.
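The matching behavior described by the MatchingParser parameters can be sketched as follows. `matches` is a hypothetical helper mirroring those parameters, not the actual MatchingParser implementation:

```python
import re


def matches(mime, buffer, path,
            mainfile_mime_re=r'text/.*',
            mainfile_contents_re=None,
            mainfile_name_re=r'.*'):
    # Sketch of regex-based mainfile matching: the path and mime type
    # must match their patterns completely, and, if given, the contents
    # pattern must occur somewhere in the head of the file.
    if not re.fullmatch(mainfile_name_re, path):
        return False
    if not re.fullmatch(mainfile_mime_re, mime):
        return False
    if mainfile_contents_re is not None:
        if re.search(mainfile_contents_re, buffer) is None:
            return False
    return True
```

Note the asymmetry: name and mime patterns must match the whole string, while the contents pattern is searched for anywhere in the buffer, which matches how such banner strings typically appear mid-file.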

class nomad.parsing.MissingParser(*args, **kwargs)

A parser implementation that just fails and is used to match mainfiles with known patterns of corruption.

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

class nomad.parsing.BrokenParser(*args, **kwargs)

A parser implementation that just fails and is used to match mainfiles with known patterns of corruption.

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

class nomad.parsing.TemplateParser(*args, **kwargs)

A parser that generates data based on a template given via the mainfile. The template is basically some archive JSON.

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

class nomad.parsing.GenerateRandomParser
__init__()

Initialize self. See help(type(self)) for accurate signature.

class nomad.parsing.ChaosParser

Parser that emulates typical error situations. Files can contain a json string (or object with key chaos) with one of the following string values:
  • exit

  • deadlock

  • consume_ram

  • exception

  • segfault

  • random

__init__()

Initialize self. See help(type(self)) for accurate signature.

class nomad.parsing.EmptyParser(name: str, code_name: str, code_homepage: str = None, mainfile_contents_re: str = None, mainfile_binary_header: bytes = None, mainfile_mime_re: str = 'text/.*', mainfile_name_re: str = '.*', domain='dft', supported_compressions: List[str] = [])

Implementation that produces an empty code_run

__init__(name: str, code_name: str, code_homepage: str = None, mainfile_contents_re: str = None, mainfile_binary_header: bytes = None, mainfile_mime_re: str = 'text/.*', mainfile_name_re: str = '.*', domain='dft', supported_compressions: List[str] = []) → None

Initialize self. See help(type(self)) for accurate signature.

The implementation LegacyParser is used for most NOMAD-coe parsers.

class nomad.parsing.LegacyParser(parser_class_name: str, *args, backend_factory=None, **kwargs)

A parser implementation for legacy NOMAD-coe parsers. It assumes that parsers are installed to the python environment.

Parameters
  • parser_class_name – the main parser class that implements NOMAD-coe’s

  • backend_factory – a callable that returns a backend, takes meta_info and logger as argument

__init__(parser_class_name: str, *args, backend_factory=None, **kwargs) → None

Initialize self. See help(type(self)) for accurate signature.

The parser definitions are available via the following two variables.

nomad.parsing.parsers.parsers = [<nomad.parsing.artificial.GenerateRandomParser object>, <nomad.parsing.artificial.TemplateParser object>, <nomad.parsing.artificial.ChaosParser object>, parsers/phonopy, parsers/vasp, parsers/vaspoutcar, parsers/exciting, parsers/fhi-aims, parsers/cp2k, parsers/crystal, parsers/cpmd, parsers/nwchem, parsers/bigdft, parsers/wien2k, parsers/band, parsers/gaussian, parsers/quantumespresso, parsers/abinit, parsers/orca, parsers/castep, parsers/dl-poly, parsers/lib-atoms, parsers/octopus, parsers/gpaw2, parsers/gpaw, parsers/atk, parsers/gulp, parsers/siesta, parsers/elk, parsers/elastic, parsers/gamess, parsers/turbomole, parsers/mpes, parsers/aptfim, parsers/eels, parsers/qbox, parsers/dmol, parsers/fleur, parsers/molcas, parsers/onetep, parsers/openkim, parsers/tinker, parsers/lammps, parsers/amber, parsers/gromacs, parsers/gromos, parsers/namd, parsers/charmm, parsers/dftbplus, parsers/asap, parsers/fplo, parsers/mopac, parsers/archive, <nomad.parsing.parser.BrokenParser object>]


nomad.parsing.parsers.parser_dict = {'missing/crystal': missing/crystal, 'missing/fhi-aims': missing/fhi-aims, 'missing/octopus': missing/octopus, 'missing/wien2k': missing/wien2k, 'parser/broken': <nomad.parsing.parser.BrokenParser object>, 'parser/fleur': parsers/fleur, 'parser/molcas': parsers/molcas, 'parser/octopus': parsers/octopus, 'parser/onetep': parsers/onetep, 'parsers/abinit': parsers/abinit, 'parsers/amber': parsers/amber, 'parsers/aptfim': parsers/aptfim, 'parsers/archive': parsers/archive, 'parsers/asap': parsers/asap, 'parsers/atk': parsers/atk, 'parsers/band': parsers/band, 'parsers/bigdft': parsers/bigdft, 'parsers/broken': <nomad.parsing.parser.BrokenParser object>, 'parsers/castep': parsers/castep, 'parsers/chaos': <nomad.parsing.artificial.ChaosParser object>, 'parsers/charmm': parsers/charmm, 'parsers/cp2k': parsers/cp2k, 'parsers/cpmd': parsers/cpmd, 'parsers/crystal': parsers/crystal, 'parsers/dftbplus': parsers/dftbplus, 'parsers/dl-poly': parsers/dl-poly, 'parsers/dmol': parsers/dmol, 'parsers/eels': parsers/eels, 'parsers/elastic': parsers/elastic, 'parsers/elk': parsers/elk, 'parsers/exciting': parsers/exciting, 'parsers/fhi-aims': parsers/fhi-aims, 'parsers/fleur': parsers/fleur, 'parsers/fplo': parsers/fplo, 'parsers/gamess': parsers/gamess, 'parsers/gaussian': parsers/gaussian, 'parsers/gpaw': parsers/gpaw, 'parsers/gpaw2': parsers/gpaw2, 'parsers/gromacs': parsers/gromacs, 'parsers/gromos': parsers/gromos, 'parsers/gulp': parsers/gulp, 'parsers/lammps': parsers/lammps, 'parsers/lib-atoms': parsers/lib-atoms, 'parsers/molcas': parsers/molcas, 'parsers/mopac': parsers/mopac, 'parsers/mpes': parsers/mpes, 'parsers/namd': parsers/namd, 'parsers/nwchem': parsers/nwchem, 'parsers/octopus': parsers/octopus, 'parsers/onetep': parsers/onetep, 'parsers/openkim': parsers/openkim, 'parsers/orca': parsers/orca, 'parsers/phonopy': parsers/phonopy, 'parsers/qbox': parsers/qbox, 'parsers/quantumespresso': parsers/quantumespresso, 'parsers/random': 
<nomad.parsing.artificial.GenerateRandomParser object>, 'parsers/siesta': parsers/siesta, 'parsers/template': <nomad.parsing.artificial.TemplateParser object>, 'parsers/tinker': parsers/tinker, 'parsers/turbomole': parsers/turbomole, 'parsers/vasp': parsers/vasp, 'parsers/vaspoutcar': parsers/vaspoutcar, 'parsers/wien2k': parsers/wien2k}

A dict to access parsers by name. Usually ‘parsers/<…>’, e.g. ‘parsers/vasp’.

Parsers are reused for multiple calculations.

Parsers and calculation files are matched via regular expressions.

nomad.parsing.parsers.match_parser(mainfile_path: str, strict=True) → nomad.parsing.parser.Parser

Performs parser matching. This means it takes the given mainfile, potentially opens it, and tries to identify a parser that can parse the file.

This is determined by filename (e.g. .out), mime type (e.g. text/, application/xml), and beginning file contents.

Parameters
  • mainfile_path – Path to the mainfile

  • strict – Only match strict parsers, e.g. no artificial parsers for missing or empty entries.

Returns: The parser, or None if no parser could be matched.
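The matching mechanism can be sketched in isolation. The class, parser names, and regular expressions below are illustrative stand-ins, not the actual nomad.parsing definitions:

```python
import re

class MatchingParserSketch:
    """Illustrative stand-in for a parser with matching criteria."""
    def __init__(self, name, mainfile_name_re, mainfile_contents_re=None):
        self.name = name
        self._name_re = re.compile(mainfile_name_re)
        self._contents_re = re.compile(mainfile_contents_re) if mainfile_contents_re else None

    def is_mainfile(self, path, first_contents):
        # A file is a mainfile if its name and (if configured) its
        # beginning contents match the parser's regular expressions.
        if not self._name_re.match(path):
            return False
        if self._contents_re and not self._contents_re.search(first_contents):
            return False
        return True

parsers_sketch = [
    MatchingParserSketch('parsers/vasp', r'.*\.xml', r'<modeling>'),
    MatchingParserSketch('parsers/fhi-aims', r'.*\.out', r'Invoking FHI-aims'),
]

def match_parser_sketch(path, first_contents):
    # Return the first parser whose criteria match, or None.
    for parser in parsers_sketch:
        if parser.is_mainfile(path, first_contents):
            return parser
    return None

matched = match_parser_sketch('run/vasprun.xml', '<modeling><generator>')
print(matched.name)  # parsers/vasp
```

The real implementation additionally considers mime types and binary file headers, as the MatchingParser signature above shows.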

Parsers in NOMAD-coe use a backend to create output. There are different NOMAD-coe backends; in nomad@FAIRDI, we currently use only a single backend. The following classes provide an interface definition for backends as an ABC and a concrete implementation based on nomad@fairdi’s metainfo:

class nomad.parsing.Backend(metainfo: Union[str, nomad.metainfo.legacy.LegacyMetainfoEnvironment], domain: str = None, entry_archive: nomad.datamodel.datamodel.EntryArchive = None, logger=None)

A backend that uses the new metainfo to store all data.

Parameters
  • metainfo – The main legacy metainfo package name or a legacy metainfo environment instance.

  • logger – An optional logger.

  • domain – An optional domain name.

domain

The domain that this backend contains data for.

env

The metainfo environment (all available definitions).

resource

The metainfo resource that contains all data.

entry_archive

The root section of the archive behind this backend.

logger

A logger that can be used to log metainfo and backend operation related warnings and errors.

static filter_legacy_defs(definition)
__init__(metainfo: Union[str, nomad.metainfo.legacy.LegacyMetainfoEnvironment], domain: str = None, entry_archive: nomad.datamodel.datamodel.EntryArchive = None, logger=None)

Initialize self. See help(type(self)) for accurate signature.

metaInfoEnv()

Returns the meta info used by this backend.

resolve_definition(name, section_cls: Type[MSectionBound]) → MSectionBound
openSection(name, parent_index: int = -1, return_section=False)

Opens a new section and returns its new unique gIndex.

It will assume that there is a sub-section def with the given name. It will use the latest opened section of the sub-section’s parent as the parent for the new section. The known root sections, e.g. ‘section_run’, are an exception and have no parent.

Parameters
  • name – The section name

  • parent_index – Index of the parent section in which the section will be opened.

  • return_section – If True, returns the section instead of the section index.

get_open_section_for_quantity(name, g_index)

Returns the open section that contains the quantity of the given name.

closeSection(name, g_index)

Closes the section with the given meta name and index. After this, no more value can be added to this section.
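The openSection/addValue/closeSection bookkeeping can be sketched with a minimal stand-in class; all names below are illustrative, this is not the actual Backend implementation:

```python
from collections import defaultdict

class SectionBackendSketch:
    """Illustrative gIndex bookkeeping for open/close section calls."""
    def __init__(self):
        self._next_index = defaultdict(int)  # next gIndex per section name
        self._open = {}                      # (name, gIndex) -> collected values
        self.closed = {}                     # sections no values can be added to

    def open_section(self, name):
        # Each opened section gets a new unique gIndex for its name.
        g_index = self._next_index[name]
        self._next_index[name] += 1
        self._open[(name, g_index)] = {}
        return g_index

    def add_value(self, name, g_index, key, value):
        # The gIndex identifies the right open section.
        self._open[(name, g_index)][key] = value

    def close_section(self, name, g_index):
        # After closing, no more values can be added.
        self.closed[(name, g_index)] = self._open.pop((name, g_index))

backend = SectionBackendSketch()
g_index = backend.open_section('section_run')
backend.add_value('section_run', g_index, 'program_name', 'VASP')
backend.close_section('section_run', g_index)
print(backend.closed[('section_run', 0)])  # {'program_name': 'VASP'}
```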

openNonOverlappingSection(metaName)

Opens a new non overlapping section.

setSectionInfo(metaName, gIndex, references)

Sets info values of an open section. references should be a dictionary with the gIndexes of the root sections this section refers to.

closeNonOverlappingSection(name)

Closes the current non overlapping section for the given meta name. After this, no more value can be added to this section.

openSections()

Returns the sections that are still open as metaName, gIndex tuples.

addValue(name, value, g_index=-1)

Adds a json value for the given metaName. The gIndex is used to identify the right parent section.

addRealValue(name, value, g_index=-1)

Adds a float value for the given metaName. The gIndex is used to identify the right parent section.

addArray(name, shape, g_index=-1)

Adds an uninitialized array of the given shape for the given metaName. The gIndex is used to identify the right parent section. This is necessary before array values can be set with setArrayValues().

setArrayValues(metaName, values, offset=None, gIndex=-1)

Adds values of the given numpy array to the last array added for the given metaName and parent gIndex.

addArrayValues(name, values, gIndex=-1, override: bool = False)

Adds an array with the given numpy array values for the given metaName and parent section gIndex. Override determines whether to rewrite existing values in the backend.

get_sections(meta_name: str, g_index: int = -1) → List[int]

Return all gIndices for existing sections of the given meta_name and parent index.

get_value(meta_name: str, g_index=-1) → Any

Return the value set to the given meta_name in its parent section of the given index. An index of -1 (default) is only allowed if there is exactly one parent section.

startedParsingSession(mainFileUri, parserInfo, parserStatus=None, parserErrors=None)

Should be called when the parsing starts. ParserInfo should be a valid json dictionary.

finishedParsingSession(parserStatus, parserErrors, mainFileUri=None, parserInfo=None, parsingStats=None)

Called when the parsing finishes.

addMatchTelemetry(match_telemetry, gIndex=-1)
pwarn(msg)

Used to catch parser warnings.

property status

Returns status and potential errors.

reset_status() → None

nomad.normalizing

After parsing, calculations have to be normalized with a set of normalizers. In NOMAD-coe those were programmed in python (which we reuse) and scala (which we rewrite).

Currently the normalizers are:
  • system.py (contains aspects of format stats, system, system type, and symmetry normalizer)

  • optimade.py

  • fhiaims.py

  • dos.py

  • encyclopedia.py (used to create the data in NOMAD-coe Encyclopedia)

The normalizers are available via

nomad.normalizing.normalizers: Iterable[Type[nomad.normalizing.normalizer.Normalizer]] = [<class 'nomad.normalizing.system.SystemNormalizer'>, <class 'nomad.normalizing.optimade.OptimadeNormalizer'>, <class 'nomad.normalizing.dos.DosNormalizer'>, <class 'nomad.normalizing.band_structure.BandStructureNormalizer'>, <class 'nomad.normalizing.encyclopedia.encyclopedia.EncyclopediaNormalizer'>, <class 'nomad.normalizing.workflow.WorkflowNormalizer'>]


There is one ABC for all normalizers.
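A conceptual sketch of such a normalizer ABC and of running the normalizer chain; the class names and the formula logic are illustrative, not the actual nomad.normalizing code:

```python
from abc import ABC, abstractmethod

class NormalizerSketch(ABC):
    """Illustrative ABC: each normalizer receives the parsed data and amends it."""
    def __init__(self, archive):
        self.archive = archive

    @abstractmethod
    def normalize(self, logger=None):
        pass

class SystemNormalizerSketch(NormalizerSketch):
    def normalize(self, logger=None):
        # Derive a (simplistic, illustrative) formula from the atom labels.
        atoms = self.archive.get('atom_labels', [])
        self.archive['chemical_formula'] = ''.join(sorted(set(atoms)))

# The chain is applied in order, each normalizer building on prior results.
normalizers_sketch = [SystemNormalizerSketch]

archive = {'atom_labels': ['O', 'H', 'H']}
for normalizer_cls in normalizers_sketch:
    normalizer_cls(archive).normalize()
print(archive['chemical_formula'])  # HO
```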

nomad.processing

Processing comprises everything that is necessary to take an uploaded user file, processes it, and store all necessary data for repository, archive, and potential future services (e.g. encyclopedia).

Processing is built on top of celery (http://www.celeryproject.org/) and mongodb (http://www.mongodb.org). Celery provides a task-based programming model for distributed computing. It uses a broker, e.g. a distributed task queue like RabbitMQ, to distribute tasks. We use mongodb to store the current state of processing in Upload and Calc documents. This combination allows us to easily distribute processing work while having the processing state, i.e. (intermediate) results, always available.

This module is structured into our celery app and abstract process base class Proc (base.py), and the concrete processing classes Upload and Calc (data.py).

This module does not contain the functions to do the actual work. Those are encapsulated in nomad.files, nomad.repo, nomad.users, nomad.parsing, and nomad.normalizing.

Refer to http://www.celeryproject.org/ to learn about celery apps and workers. The nomad celery app uses a RabbitMQ broker. We use celery to distribute processing load in a cluster.

We use an abstract processing base class and document (Proc) that provides all necessary functions to execute a process as a series of potentially distributed steps. In addition the processing state is persisted in mongodb using mongoengine. Instead of exchanging serialized state between celery tasks, we use the mongodb documents to exchange data. Therefore, the mongodb always contains the latest processing state. We also don’t have to deal with celery result backends and synchronizing with them.
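The state-keeping idea behind the @task decorator can be sketched as follows; this is an illustrative, in-memory stand-in that omits celery and mongodb entirely:

```python
import functools

def task(func):
    """Illustrative @task: records the current task and catches failures."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if self.tasks_status == 'FAILURE':
            return  # do not run further tasks after a failure
        self.current_task = func.__name__
        self.tasks_status = 'RUNNING'
        try:
            func(self, *args, **kwargs)
        except Exception as e:
            # Errors fail the processing run; the real Proc persists this state.
            self.errors.append(str(e))
            self.tasks_status = 'FAILURE'
    return wrapper

class ProcSketch:
    def __init__(self):
        self.current_task = None
        self.tasks_status = 'PENDING'
        self.errors = []

class UploadSketch(ProcSketch):
    @task
    def extracting(self):
        pass  # would extract the uploaded files

    @task
    def parse_all(self):
        raise RuntimeError('parser failed')  # simulate a failing task

upload = UploadSketch()
upload.extracting()
upload.parse_all()
print(upload.current_task, upload.tasks_status, upload.errors)
```

The actual Proc additionally persists every state change to mongodb via mongoengine, so that workers exchange state through the documents rather than through celery results.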

class nomad.processing.base.Proc(*args, **values)

Base class for objects that are involved in processing and need persistent processing state.

It solves two issues. First, distributed operation (via celery) and second keeping state of a chain of potentially failing processing tasks. Both are controlled via decorators @process and @task. Subclasses should use these decorators on their methods. Parameters are not supported for decorated functions. Use fields on the document instead.

Processing state will be persisted at appropriate times and must not be persisted manually. All attributes are stored to mongodb.

Possible processing states are PENDING, RUNNING, FAILURE, and SUCCESS.

current_task

the currently running or last completed task

tasks_status

the overall status of the processing

errors

a list of errors that happened during processing. Errors fail a processing run

warnings

a list of warnings that happened during processing. Warnings do not fail a processing run

create_time

the time of creation (not the start of processing)

complete_time

the time that processing completed (successfully or not)

current_process

the currently running or last run asynchronous process

process_status

the status of the currently running or last run asynchronous process

__init__(*args, **values)

Initialise a document or an embedded document.

Parameters
  • values – A dictionary of keys and values for the document. It may contain additional reserved keywords, e.g. “__auto_convert”.

  • __auto_convert – If True, supplied values will be converted to Python-type values via each field’s to_python method.

  • __only_fields – A set of fields that have been loaded for this document. Empty if all fields have been loaded.

  • _created – Indicates whether this is a brand new document or whether it’s already been persisted before. Defaults to true.

There are two concrete processes, Upload and Calc. Instances of both classes represent the processing state as well as the respective entity.

class nomad.processing.data.Upload(**kwargs)

Represents uploads in the databases. Provides persistent access to the file storage and the processing state.

name

optional user provided upload name

upload_path

the path where the uploaded files were stored

temporary

True if the uploaded file should be removed after extraction

upload_id

the upload id generated by the database

upload_time

the timestamp when the system realized the upload

user_id

the id of the user that created this upload

published

Boolean that indicates the publish status

publish_time

Date when the upload was initially published

last_update

Date of the last publishing/re-processing

joined

Boolean indicates if the running processing has joined (check_join())

id_field = 'upload_id'
upload_id

A unicode string field.

upload_path

A unicode string field.

temporary

Boolean field type.

New in version 0.1.2.

embargo_length

32-bit integer field.

name

A unicode string field.

upload_time

Datetime field.

Uses the python-dateutil library if available; otherwise falls back to time.strptime to parse dates. Note: python-dateutil’s parser is fully featured and, when installed, can convert varying types of date formats into valid python datetime objects.

Note: To default the field to the current datetime, use: DateTimeField(default=datetime.utcnow)

Note: Microseconds are rounded to the nearest millisecond.

Pre UTC microsecond support is effectively broken. Use ComplexDateTimeField if you need accurate microsecond support.

user_id

A unicode string field.

published

Boolean field type.

New in version 0.1.2.

publish_time

Datetime field.


last_update

Datetime field.


joined

Boolean field type.

New in version 0.1.2.

meta: Any = None
__init__(**kwargs)

Initialise a document or an embedded document.

Parameters
  • values – A dictionary of keys and values for the document. It may contain additional reserved keywords, e.g. “__auto_convert”.

  • __auto_convert – If True, supplied values will be converted to Python-type values via each field’s to_python method.

  • __only_fields – A set of fields that have been loaded for this document. Empty if all fields have been loaded.

  • _created – Indicates whether this is a brand new document or whether it’s already been persisted before. Defaults to true.

property metadata

Getter, setter for user metadata. Metadata is pickled to and from the public bucket to allow sharing among all processes. Usually uploads do not have (much) user defined metadata, but users provide all metadata per upload as part of the publish process. This will change, when we introduce editing functionality and metadata will be provided through different means.

classmethod get(id: str, include_published: bool = True) → nomad.processing.data.Upload
classmethod user_uploads(user: nomad.datamodel.datamodel.User, **kwargs) → List[nomad.processing.data.Upload]

Returns all uploads for the given user. Kwargs are passed to mongo query.

property uploader
get_logger(**kwargs)
classmethod create(**kwargs) → nomad.processing.data.Upload

Creates a new upload for the given user; a user-provided name is optional. It will populate the record with a signed url and pending UploadProc. The upload will already be saved to the database.

Parameters

user – The user that created the upload.

delete()

Deletes this upload process state entry and its calcs.

delete_upload_local()

Deletes the upload, including its processing state and staging files. Local version without celery processing.

delete_upload()

Deletes the upload, including its processing state and staging files. This starts the celery process of deleting the upload.

publish_upload()

Moves the upload out of staging to the public area. It will pack the staging upload files into public upload files.

re_process_upload()

A process that performs the re-processing of an earlier processed upload.

Runs the distributed process of fully reparsing/re-normalizing an existing and already published upload. Will renew the archive part of the upload and update mongo and elastic search entries.

TODO this implementation does not do any re-matching. This will be more complex due to handling of new or missing matches.

re_pack()

A process that repacks the raw and archive data based on the current embargo data.

process_upload()

A process that performs the initial upload processing.

uploading()

A no-op task as a stand-in for receiving upload data.

property upload_files
property staging_upload_files
extracting()

The task performed before the actual parsing/normalizing: extracting the uploaded files.

match_mainfiles() → Iterator[Tuple[str, object]]

Generator function that matches all files in the upload to all parsers to determine the upload’s mainfiles.

Returns

Tuples of mainfile (its path within the upload) and the matched parser

parse_all()

The task used to identify mainfile/parser combinations among the upload’s files, creates respective Calc instances, and triggers their processing.

on_process_complete(process_name)

Callback that is called when the current process completed

check_join()

Performs an evaluation of the join condition and triggers the cleanup() task if necessary. The join condition allows to run the cleanup after all calculations have been processed. The upload processing stops after all calculation processing have been triggered (parse_all() or re_process_upload()). The cleanup task is then run within the last calculation process (the one that triggered the join by calling this method).
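The join condition itself can be sketched without celery; this illustrative stand-in only shows that cleanup runs exactly once, when the last calculation completes (the real implementation additionally has to guard against concurrent joins across worker processes):

```python
class JoinSketch:
    """Illustrative join: cleanup runs once, triggered by the last calc."""
    def __init__(self, total_calcs):
        self.total_calcs = total_calcs
        self.processed_calcs = 0
        self.joined = False
        self.cleaned = False

    def on_calc_complete(self):
        # Each finishing calculation process evaluates the join condition.
        self.processed_calcs += 1
        self.check_join()

    def check_join(self):
        # Cleanup is only triggered after all calculations have been processed.
        if self.processed_calcs == self.total_calcs and not self.joined:
            self.joined = True
            self.cleanup()

    def cleanup(self):
        self.cleaned = True

upload = JoinSketch(total_calcs=3)
for _ in range(3):
    upload.on_calc_complete()
print(upload.joined, upload.cleaned)  # True True
```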

reset()

Resets the task chain. Assumes there is no currently running process.

classmethod reset_pymongo_update(worker_hostname: str = None)

Returns a pymongo update dict part to reset calculations.

cleanup()

The task that “cleans” the processing, i.e. removes obsolete files and performs pending archival operations. Depends on the type of processing.

get_calc(calc_id) → nomad.processing.data.Calc

Returns the upload calc with the given id or None.

property processed_calcs

The number of successfully or unsuccessfully processed calculations, i.e. calculations that have finished processing.

property total_calcs

The number of all calculations.

property failed_calcs

The number of calculations with failed processing.

property pending_calcs

The number of calculations with pending processing.

all_calcs(start, end, order_by=None)

Returns all calculations, paginated and ordered.

Parameters
  • start – the start index of the requested page

  • end – the end index of the requested page

  • order_by – the property to order by

property outdated_calcs

All successfully processed and outdated calculations.

property calcs

All successfully processed calculations.

entries_metadata(user_metadata: dict = None) → Iterator[Iterable[nomad.datamodel.datamodel.EntryMetadata]]

This is the nomad.datamodel transformation method to transform processing upload’s entries into list of nomad.datamodel.EntryMetadata objects.

Parameters

user_metadata – A dict of user metadata that is applied to the resulting datamodel data and the respective calculations.

entry_ids() → Iterable[str]
user_metadata() → Iterable[nomad.datamodel.datamodel.EntryMetadata]
exception DoesNotExist
exception MultipleObjectsReturned
compress_and_set_metadata(metadata: Dict[str, Any]) → None

Stores the given user metadata in the upload document. This is the metadata adhering to the API model (UploadMetaData). Most quantities can be stored for the upload and for each calculation. This method will try to move same values from the calculation to the upload to “compress” the data.
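The “compression” idea can be sketched as hoisting values that are identical across all calculations to the upload level; the function and field names below are illustrative:

```python
def compress_metadata(calc_metadata):
    """Illustrative: hoist values shared by all calcs to the upload level."""
    if not calc_metadata:
        return {}, calc_metadata
    # A key moves to the upload level if every calc has the same value for it.
    shared = {
        key: value
        for key, value in calc_metadata[0].items()
        if all(calc.get(key) == value for calc in calc_metadata[1:])
    }
    # The per-calc records keep only the values that actually differ.
    remaining = [
        {k: v for k, v in calc.items() if k not in shared}
        for calc in calc_metadata
    ]
    return shared, remaining

calcs = [
    {'uploader': 'alice', 'references': 'doi:1'},
    {'uploader': 'alice', 'references': 'doi:2'},
]
upload_level, calc_level = compress_metadata(calcs)
print(upload_level)  # {'uploader': 'alice'}
print(calc_level)    # [{'references': 'doi:1'}, {'references': 'doi:2'}]
```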

id

A unicode string field.

objects

The default QuerySet Manager.

Custom QuerySet Manager functions can extend this class and users can add extra queryset functionality. Any custom manager methods must accept a Document class as its first argument, and a QuerySet as its second argument.

The method function should return a QuerySet , probably the same one that was passed in, but modified in some way.

tasks = ['uploading', 'extracting', 'parse_all', 'cleanup']
class nomad.processing.data.Calc(*args, **kwargs)

Instances of this class represent calculations. This class manages the elastic search index entry, files, and archive for the respective calculation.

It also contains the calculations processing and its state.

The attribute list does not include the various metadata properties generated while parsing, including code_name, code_version, etc.

calc_id

the calc_id of this calc

parser

the name of the parser used to process this calc

upload_id

the id of the upload used to create this calculation

mainfile

the mainfile (including path in upload) that was used to create this calc

metadata

the metadata record with calc and user metadata, see datamodel.EntryMetadata

calc_id

A unicode string field.

upload_id

A unicode string field.

mainfile

A unicode string field.

parser

A unicode string field.

metadata

A dictionary field that wraps a standard Python dictionary. This is similar to an embedded document, but the structure is not defined.

Note

Required means it cannot be empty - as the default for DictFields is {}

New in version 0.3.

Changed in version 0.5: - Can now handle complex / varying types of data

meta: Any = None
__init__(*args, **kwargs)

Initialise a document or an embedded document.

Parameters
  • values – A dictionary of keys and values for the document. It may contain additional reserved keywords, e.g. “__auto_convert”.

  • __auto_convert – If True, supplied values will be converted to Python-type values via each field’s to_python method.

  • __only_fields – A set of fields that have been loaded for this document. Empty if all fields have been loaded.

  • _created – Indicates whether this is a brand new document or whether it’s already been persisted before. Defaults to true.

classmethod get(id)
property mainfile_file
property upload
apply_entry_metadata(entry_metadata: nomad.datamodel.datamodel.EntryMetadata)
create_metadata() → nomad.datamodel.datamodel.EntryMetadata

Returns a nomad.datamodel.EntryMetadata with values from this processing object, not necessarily the user metadata nor the metadata from the archive.

entry_metadata(upload_files: nomad.files.UploadFiles) → nomad.datamodel.datamodel.EntryMetadata

Returns a complete set of nomad.datamodel.EntryMetadata including the user metadata and metadata from the archive.

Parameters
  • upload_files – The nomad.files.UploadFiles instance to read the archive from.

  • cache – A boolean that indicates if the archive file should be left unclosed, e.g. if this method is called for many entries of the same upload.

user_metadata() → nomad.datamodel.datamodel.EntryMetadata

Returns a nomad.datamodel.EntryMetadata with values from this processing object and the user metadata, not necessarily the metadata from the archive.

property upload_files
get_logger(**kwargs)

Returns a wrapped logger that additionally saves all entries to the calculation processing log in the archive.

re_process_calc()

Processes a calculation again. This means there is already metadata and instead of creating it initially, we are just updating the existing records.

process_calc()

Processes a new calculation that has no prior records in the mongo, elastic, or filesystem storage. It will create an initial set of (user) metadata.

on_fail()
on_process_complete(process_name)

Callback that is called when the current process completed

parsing()

The task that encapsulates all parsing related actions.

process_phonon()

Function that is run for phonon calculations before cleanup. This task is run by the celery process that is calling the join for the upload.

This function re-opens the Archive for this calculation to add method information from another referenced archive. Updates the method information in section_encyclopedia as well as the DFT domain metadata.

normalizing()

The task that encapsulates all normalizing related actions.

archiving()

The task that encapsulates all archival related actions.

write_archive(archive: nomad.datamodel.datamodel.EntryArchive)
exception DoesNotExist
exception MultipleObjectsReturned
id

A unicode string field.

objects

The default QuerySet Manager.

Custom QuerySet Manager functions can extend this class and users can add extra queryset functionality. Any custom manager methods must accept a Document class as its first argument, and a QuerySet as its second argument.

The method function should return a QuerySet , probably the same one that was passed in, but modified in some way.

tasks = ['parsing', 'normalizing', 'archiving']

nomad.search

This module represents calculations in elastic search.

exception nomad.search.AlreadyExists
exception nomad.search.ElasticSearchError
exception nomad.search.ScrollIdNotFound
exception nomad.search.InvalidQuery
nomad.search.delete_upload(upload_id)

Delete all entries with given upload_id from the index.

nomad.search.delete_entry(calc_id)

Delete the entry with the given calc_id from the index.

nomad.search.publish(calcs: Iterable[nomad.datamodel.datamodel.EntryMetadata]) → None

Update all given calcs with their metadata and set publish = True.

nomad.search.index_all(calcs: Iterable[nomad.datamodel.datamodel.EntryMetadata], do_refresh=True) → None

Adds all given calcs with their metadata to the index.

Returns

Number of failed entries.

nomad.search.refresh()
class nomad.search.SearchRequest(domain: str = 'dft', query=None)

Represents a search request and allows executing that request. It allows composing the following features: a query; statistics (metrics and aggregations); quantity values; scrolling and pagination for entries; scrolling for quantity values.

The query part filters NOMAD data before the other features come into effect. There are specialized methods for configuring the owner() and time_range() queries. Quantities can be searched for by setting them as attributes.

The aggregations for statistics can be requested for pre-configured quantities. These bucket aggregations come with a metric calculated for each possible quantity value.

The other possible form of aggregations, allows to get quantity values as results (e.g. get all datasets, get all users, etc.). Each value can be accompanied by metrics (over all entries with that value) and an example value.

Of course, searches can return a set of search results. Search objects can be configured with pagination or scrolling for these results. Pagination is the default and also allows ordering of results. Scrolling can be used if all entries need to be ‘scrolled through’. This might be necessary, since elastic search has limits on possible pages (e.g. ‘from’ must be smaller than 10000). On the downside, there is no ordering on scrolling.

There is also scrolling for quantities to go through all quantity values. There is no paging for aggregations.
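The fluent composition described above can be sketched with a minimal builder; this illustrative stand-in only records clauses and does not talk to elasticsearch:

```python
from datetime import datetime

class SearchRequestSketch:
    """Illustrative fluent builder: each method adds a clause and returns self."""
    def __init__(self, domain='dft'):
        self.clauses = [('domain', domain)]

    def owner(self, owner_type='all', user_id=None):
        # Restricts results based on the owner, see owner() below.
        self.clauses.append(('owner', (owner_type, user_id)))
        return self

    def time_range(self, start, end):
        # Adds a time range to the query.
        self.clauses.append(('upload_time', (start, end)))
        return self

    def search_parameters(self, **kwargs):
        # Kwargs are interpreted as quantity/value search parameters.
        self.clauses.extend(kwargs.items())
        return self

request = (
    SearchRequestSketch(domain='dft')
    .owner('public')
    .time_range(datetime(2020, 1, 1), datetime(2020, 12, 31))
    .search_parameters(atoms=['Si'], code_name='VASP')
)
print(len(request.clauses))  # 5
```

The real SearchRequest composes elasticsearch_dsl query objects instead of plain tuples, but the chaining pattern is the same.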

__init__(domain: str = 'dft', query=None)

Initialize self. See help(type(self)) for accurate signature.

domain(domain: str = None)

Applies the domain of this request to the query. Allows to optionally update the domain of this request.

owner(owner_type: str = 'all', user_id: str = None)

Uses the query part of the search to restrict the results based on the owner. The possible types are: all for all calculations; public for calculations visible to everyone, excluding embargoed entries and entries only visible to the given user; visible for all data that is visible to the user, excluding embargoed entries from other users; user for all calculations of the given user; staging for all calculations in staging of the given user.

Parameters
  • owner_type – The type of the owner query, see above.

  • user_id – The ‘owner’ given as the user’s unique id.

Raises
  • KeyError – If the given owner_type is not supported

  • ValueError – If the owner_type requires a user but none is given, or the given user is not allowed to use the given owner_type.

search_parameters(**kwargs)

Configures the existing query with additional search parameters. Kwargs are interpreted as key value pairs. Keys have to correspond to valid entry quantities in the domain’s (DFT calculations) datamodel. Alternatively, search parameters can be set via attributes.

search_parameter(name, value)
query(query)

Adds the given query as a ‘and’ (i.e. ‘must’) clause to the request.

query_expression(expression) → nomad.search.SearchRequest
time_range(start: datetime.datetime, end: datetime.datetime)

Adds a time range to the query.

property q

The underlying elasticsearch_dsl query object

totals(metrics_to_use: List[str] = [])

Configure the request to return overall totals for the given metrics.

The statistics are returned with the other quantity statistics under the pseudo quantity name ‘total’. ‘total’ contains the pseudo value ‘all’. It is used to store the metrics aggregated over all entries in the search results.

statistics(statistics: List[str], metrics_to_use: List[str] = [])

Configures the domain’s default statistics.

statistic(quantity_name: str, size: int, metrics_to_use: List[str] = [], order: Dict[str, str] = {'_key': 'asc'}, include: str = None)

This can be used to display statistics over the searched entries and allows to implement faceted search on the top values for each quantity.

The metrics contain overall and per quantity value sums of code runs (calcs), unique code runs, datasets, and additional domain specific metrics (e.g. total energies and unique geometries for DFT calculations). The quantities that can be aggregated to metrics are defined in the datamodel module. Aggregations and respective metrics are calculated for aggregations given in aggregations and metrics in aggregation_metrics. As a pseudo aggregation, total_metrics are calculated over all search results. The aggregations give tuples of quantities and default aggregation sizes.

The search results will contain a dictionary statistics. This has a key for each configured quantity. Each quantity key will hold a dict with a key for each quantity value. Each quantity value key will hold a dict with a key for each metric. The values will be the actual aggregated metric values.

Parameters
  • quantity_name – The quantity to aggregate statistics for. Only works on keyword field.

  • metrics_to_use – The metrics calculated over the aggregations. Can be unique_code_runs, datasets, or other domain-specific metrics. The basic doc_count metric code_runs is always given.

  • order – The order dictionary is passed to the elastic search aggregation.

  • include – Uses a regular expression in ES to return only values that include the given substring.
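As a sketch of the nested result structure described above, here is a hypothetical statistics fragment; the quantity names and metric values are made up for illustration:

```python
# Hypothetical response fragment. The nesting follows the described
# quantity -> value -> metric structure, including the 'total'/'all'
# pseudo keys; all names and numbers here are invented.
statistics = {
    'dft.code_name': {
        'VASP': {'code_runs': 120, 'datasets': 7},
        'FHI-aims': {'code_runs': 80, 'datasets': 3},
    },
    'total': {
        'all': {'code_runs': 200, 'datasets': 10},
    },
}

# Navigating to a single metric value:
vasp_runs = statistics['dft.code_name']['VASP']['code_runs']
```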

date_histogram(metrics_to_use: List[str] = [], interval: str = '1M')

Adds a date histogram on the given metrics to the statistics part.

quantities(**kwargs)

Shorthand for adding multiple quantities. See quantity(). Keyword argument keys are quantity names; values are tuples of size and after value.

quantity(name, size=100, after=None, examples=0, examples_source=None, order_by: str = None, order: str = 'desc')

Adds a request for values of the given quantity. It allows scrolling through all values via elasticsearch’s composite aggregations. The response will contain the quantity values and an example entry for each value.

This can be used to implement continuous scrolling through authors, datasets, or uploads within the searched entries.

If one or more quantities are specified, the search results will contain a dictionary quantities. The keys are quantity names; the values are dictionaries with ‘after’ and ‘values’ keys. The ‘values’ key holds a dict with all the values as keys and their entry counts as values (i.e. the number of entries with that value).

Parameters
  • name – The quantity name. Must be in quantities.

  • after – The ‘after’ value allows scrolling across multiple requests by providing the ‘after’ value of the last search. The ‘after’ value is part of the response. Use None in the first request.

  • size – The maximum number of values in the next scroll window. If the size is None, a maximum of 100 quantity values will be requested.

  • examples – Number of example results to return for each value

  • order_by – A sortable quantity that should be used to order. The max of each value bucket is used.

  • order – “desc” or “asc”
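A hypothetical sketch of the quantities result structure described above; the quantity name, values, and counts are invented for illustration:

```python
# Invented result fragment following the described structure:
# quantity name -> {'after': ..., 'values': {value: entry count}}.
quantities = {
    'authors': {
        'after': 'Doe, Jane',
        'values': {
            'Doe, Jane': 15,   # 15 entries have this author
            'Smith, John': 9,
        },
    },
}

# The 'after' value would be passed into the next request to continue
# scrolling through the remaining values.
next_after = quantities['authors']['after']
counts = quantities['authors']['values']
```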

global_statistics()

Adds general statistics to the request. The results will have a key called global_statistics.

exclude(*args)

Exclude certain elastic fields from the search results.

include(*args)

Include only the given fields in the search results.

execute()

Executes without returning actual results. Only makes sense if the request was configured for statistics or quantity values.

execute_scan(order_by: str = None, order: int = -1, **kwargs)

Executes the search as a scan. The result will be a generator over the found entries. Everything but the query part of this object will be ignored.

execute_paginated(page: int = 1, per_page=10, order_by: str = None, order: int = -1)

Executes the search and returns paginated results. Those are sorted.

Parameters
  • page – The requested page, starts with 1.

  • per_page – The number of entries per page.

  • order_by – The quantity to order by.

  • order – -1 or 1 for descending or ascending order.
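The page-by-page usage can be sketched with a small hypothetical driver; the helper name and the stub standing in for the real method (which returns a richer response object) are assumptions:

```python
def iterate_pages(execute_paginated, per_page=10):
    # Hypothetical driver: request page after page, starting at page 1,
    # until a short page signals the end of the result set.
    page = 1
    while True:
        results = execute_paginated(page=page, per_page=per_page)
        yield from results
        if len(results) < per_page:
            break
        page += 1

# Stateless stub standing in for the real SearchRequest method:
data = list(range(25))

def fake_execute_paginated(page=1, per_page=10):
    start = (page - 1) * per_page
    return data[start:start + per_page]

all_results = list(iterate_pages(fake_execute_paginated, per_page=10))
```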

execute_scrolled(scroll_id: str = None, size: int = 1000, scroll: str = '5m', order_by: str = None, order: int = -1)

Executes a scrolling search based on the ES scroll API. Pagination is replaced with scrolling; no ordering, no statistics, and no quantities will be provided.

Scrolling is done by calling this function again and again with the same scroll_id. Each time, this function will return the next batch of search results. If the scroll_id is not available anymore, a new scroll_id is assigned and scrolling starts from the beginning again.

The response will contain a ‘scroll’ part with attributes ‘total’, ‘scroll_id’, and ‘size’.

Parameters
  • scroll_id – The scroll id to receive the next batch from. None will create a new scroll.

  • size – The batch size in number of hits.

  • scroll – The time the scroll should be kept alive (i.e. the time between requests to this method) in ES time units. Default is 5 minutes.

TODO support order and order_by
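The call-again-with-the-same-scroll_id pattern can be sketched as follows; the driver function, the ‘results’ key, and the stub are hypothetical, while the ‘scroll’ part follows the response description above:

```python
def scroll_all(execute_scrolled, size=1000):
    # Hypothetical driver: feed each response's scroll_id into the next
    # call until a batch comes back empty.
    scroll_id = None
    while True:
        response = execute_scrolled(scroll_id=scroll_id, size=size)
        scroll_id = response['scroll']['scroll_id']
        hits = response.get('results', [])
        if not hits:
            return
        yield from hits

# Stub simulating the server-side scroll state:
data = list(range(7))
state = {}

def fake_execute_scrolled(scroll_id=None, size=3):
    if scroll_id is None:
        scroll_id = 'scroll-0'
        state[scroll_id] = 0
    pos = state[scroll_id]
    state[scroll_id] = pos + size
    return {
        'scroll': {'total': len(data), 'scroll_id': scroll_id, 'size': size},
        'results': data[pos:pos + size],
    }

collected = list(scroll_all(fake_execute_scrolled, size=3))
```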

execute_aggregated(after: str = None, per_page: int = 1000, includes: List[str] = None)

Uses a composite aggregation on top of the search to go through the result set. This allows going arbitrarily deep without using scroll. However, it will only return results with upload_id, calc_id, and the given quantities. The results will be ‘ordered’ by upload_id.

Parameters
  • after – The key that determines the start of the current page. This after key is returned with each response. Use None (default) for the first request.

  • per_page – The size of each page.

  • includes – A list of quantity names that should be returned in addition to upload_id and calc_id.
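The ‘after’-key paging described above can be sketched with a hypothetical driver; the response shape (an ‘after’ key plus a ‘results’ list of dicts) and the stub are assumptions:

```python
def iterate_aggregated(execute_aggregated, per_page=1000):
    # Hypothetical driver: feed each response's 'after' key into the
    # next call until it is exhausted (None).
    after = None
    while True:
        response = execute_aggregated(after=after, per_page=per_page)
        yield from response['results']
        after = response.get('after')
        if after is None:
            return

# Stub: five entries ordered by upload_id, two per page.
entries = [
    {'upload_id': 'u%d' % i, 'calc_id': 'c%d' % i} for i in range(5)]

def fake_execute_aggregated(after=None, per_page=2):
    start = 0 if after is None else int(after)
    page = entries[start:start + per_page]
    more = start + per_page < len(entries)
    return {'results': page, 'after': str(start + per_page) if more else None}

all_entries = list(iterate_aggregated(fake_execute_aggregated, per_page=2))
```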

nomad.search.flat(obj, prefix=None)

Helper that translates nested result objects into flattened dicts with domain.quantity as keys.
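The described behaviour can be sketched with a minimal pure-Python reimplementation for plain nested dicts; the real nomad.search.flat also handles elasticsearch result objects and may differ in detail:

```python
def flat(obj, prefix=None):
    # Sketch: flatten nested dicts into a single dict whose keys join
    # the nesting levels with dots, e.g. 'dft.code_name'.
    result = {}
    for key, value in obj.items():
        full_key = key if prefix is None else '%s.%s' % (prefix, key)
        if isinstance(value, dict):
            result.update(flat(value, prefix=full_key))
        else:
            result[full_key] = value
    return result

flattened = flat({'dft': {'code_name': 'VASP'}, 'calc_id': 'abc'})
# {'dft.code_name': 'VASP', 'calc_id': 'abc'}
```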

nomad.app

This module comprises the nomad@FAIRDI APIs. Currently there is NOMAD’s official API, and we will soon add the OPTIMADE API. The app module also serves the documentation, GUI, and alive endpoint.

nomad.cli

Command line interface (CLI) for nomad. Provides a group/sub-command structure, think git, that offers various functionality to the command line user.

Use it from the command line with nomad --help or python -m nomad.cli --help to learn more.

The CLI uses lazy_import for lazy loading modules. This has some limitations. You will break lazy loading if a from x import y is used in the CLI code. You will also have to add imports via nomad.cli.lazy_import.lazy_module() before importing them.

nomad.client

Install the NOMAD client library

The NOMAD client library is a Python module (part of the nomad Python package) that allows you to access the NOMAD archive to retrieve and analyse (large amounts of) NOMAD’s archive data. It allows you to use queries to filter for desired entries, bulk download the required parts of the respective archives, and navigate the results using NOMAD’s metainfo Python API.

To install the NOMAD Python package, you can use pip install to install our source distribution

pip install nomad-lab

First example

'''
A simple example that uses the NOMAD client library to access the archive.
'''

from nomad.client import ArchiveQuery
from nomad.metainfo import units


query = ArchiveQuery(
    # url='http://nomad-lab.eu/prod/rae/beta/api',
    query={
        'dft.code_name': 'VASP'
    },
    required={
        'section_run': {
            'section_single_configuration_calculation': '*',
            'section_system': '*'
        }
    },
    per_page=10,
    max=100)

print(query)

for i, result in enumerate(query):
    if i < 10:
        calc = result.section_run[0].section_single_configuration_calculation[-1]
        energy = calc.energy_total
        formula = calc.single_configuration_calculation_to_system_ref.chemical_composition_reduced
        print('%s: energy %s' % (formula, energy.to(units.hartree)))

This script should yield a result like this:

Number queries entries: 7628
Number of entries loaded in the last api call: 10
Bytes loaded in the last api call: 118048
Bytes loaded from this query: 118048
Number of downloaded entries: 10
Number of made api calls: 1

Cd2O2: energy -11467.827149010665 hartree
Sr2O2: energy -6551.45699684026 hartree
Sr2O2: energy -6551.461104765451 hartree
Be2O2: energy -178.6990610734937 hartree
Ca2O2: energy -1510.3938165430286 hartree
Ca2O2: energy -1510.3937761449583 hartree
Ba2O2: energy -16684.667362890417 hartree
Mg2O2: energy -548.9736595672932 hartree
Mg2O2: energy -548.9724185656775 hartree
Ca2O2: energy -1510.3908614326358 hartree

Let’s discuss the different elements here. First, we have a set of imports. The NOMAD source code comes with various sub-modules. The client module contains everything related to what is described here; the metainfo is the Python interface to NOMAD’s common archive data format and its data type definitions; the config module simply contains configuration values (like the URL to the NOMAD API).

Next, we create an ArchiveQuery instance. This object will be responsible for talking to NOMAD’s API for us in a transparent and lazy manner. This means it will not download all data right away, but do so when we are actually iterating through the results.

The archive query takes several parameters:

  • The query is a dictionary of search criteria. The query is used to filter all of NOMAD’s entries down to a set of desired entries. You can use NOMAD’s GUI to create queries and copy their Python equivalent with the <>-code button on the result list.

  • The required part allows specifying what parts of the archive should be downloaded. Leave it out to download the whole archives. Based on NOMAD’s Metainfo (the ‘schema’ of all archives), you can determine what sections to include and which to leave out. Here, we are interested in the first run (usually entries only have one run) and the first calculation result.

  • With the optional per_page you can determine how many results are downloaded at a time. For bulk downloading many results, we recommend ~100. If you are just interested in the first results, a lower number might increase performance.

  • With the optional max, we limit the maximum amount of entries that are downloaded, just to avoid accidentally iterating through a result set of unknown and potentially large size.

When you print the archive query object, you will get some basic statistics about the query and downloaded data.

The archive query object can be treated like a Python list. You use indices and ranges to select results. Here we iterate through a slice and print the calculated energies from the first calculation of the entries. Each result is a Python object with attributes governed by the NOMAD Metainfo. Quantities yield numbers, strings, or numpy arrays, while sub-sections return lists of further objects. Here we navigate the section section_run and its sub-section section_single_configuration_calculation to access the quantity energy_total. This quantity is a number with an attached unit (Joule), which can be converted to something else (e.g. Hartree).

The created query object keeps all results in memory. Keep this in mind when you are accessing a large number of query results. You should use ArchiveQuery.clear() to remove unnecessary results.

The NOMAD Metainfo

You can imagine the NOMAD Metainfo as a complex schema for hierarchically organized scientific data. In this sense, the NOMAD Metainfo is a set of data type definitions. These definitions then govern what the archive of a data entry in NOMAD may look like. You can browse the hierarchy of definitions in our Metainfo browser.

Be aware that the definitions entail everything that an entry could possibly contain, but not all entries contain all sections and all quantities. What an entry contains depends on the information that the respective uploaded data contained, what could be extracted, and of course what was calculated in the first place. To see what the archive of a concrete entry looks like, you can use the search interface, select an entry from the list of search results, and click on the Archive tab.

To see inside an archive object in Python, you can use nomad.metainfo.MSection.m_to_dict() which is provided by all archive objects. This will convert a (part of an) archive into a regular, JSON-serializable Python dictionary.

For more details on the metainfo Python interface, consult the metainfo documentation.

The ArchiveQuery class

class nomad.client.ArchiveQuery(query: dict = None, required: dict = None, url: str = None, username: str = None, password: str = None, parallel: int = 1, per_page: int = 10, max: int = 10000, raise_errors: bool = False, authentication: Union[Dict[str, str], nomad.client.KeycloakAuthenticator] = None)

Objects of this class represent a query on the NOMAD Archive. It is solely configured through its constructor. After creation, it implements the Python Sequence interface and therefore acts as a sequence of query results.

Not all results are downloaded at once; expect that this class will continuously pull results from the API while you access or iterate toward the far side of the result list.

query

A dictionary of search parameters. Consult the search API to get a comprehensive list of parameters.

required

A potentially nested dictionary of sections to retrieve.

url

Optional, override the default NOMAD API url.

username

Optional, allows authenticated access.

password

Optional, allows authenticated access.

per_page

Determine how many results are downloaded per page (or scroll window). Default is 10.

max

Optionally determine the maximum amount of downloaded archives. The iteration will stop if max is surpassed even if more results are available. Default is 10,000. A None value will set it to unlimited.

raise_errors

There are situations where archives for certain entries are unavailable. If set to True, these cases will raise an Exception. Otherwise, the entries with missing archives are simply skipped (default).

authentication

Optionally provide detailed authentication information. Usually, providing username and password should suffice.

parallel

Number of processes to use to retrieve data in parallel. Only data from different uploads can be retrieved in parallel. Default is 1. The argument per_page will refer to archives retrieved in one process per call.

__init__(query: dict = None, required: dict = None, url: str = None, username: str = None, password: str = None, parallel: int = 1, per_page: int = 10, max: int = 10000, raise_errors: bool = False, authentication: Union[Dict[str, str], nomad.client.KeycloakAuthenticator] = None)

Initialize self. See help(type(self)) for accurate signature.

Working with private data

Public NOMAD data can be accessed without any authentication; everyone can use our API without the need for an account or login. However, if you want to work with your own data that is not yet published, or with embargoed data that was shared with you, you need to authenticate before accessing this data. Otherwise, you will simply not find it with your queries. To authenticate, simply provide your NOMAD username and password to the ArchiveQuery constructor.
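For example, an authenticated query could be configured like this; the credentials are placeholders, and running it requires a reachable NOMAD API:

```python
from nomad.client import ArchiveQuery

# Authenticated query: only entries visible to this account are found.
# Replace the placeholder credentials with your NOMAD account details.
query = ArchiveQuery(
    query={'dft.code_name': 'VASP'},
    username='your_username',
    password='your_password')
```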

nomad.utils

Logging in nomad is structured. Structured logging means that log entries contain dictionaries with quantities related to the respective events, e.g. the code, parser, parser version, calc_id, mainfile, etc. for all events that happen during calculation processing. This means that get_logger() and all logger functions take keyword arguments for structured data. Otherwise, get_logger() can be used similarly to the standard logging.getLogger.

Depending on the configuration, all logs will also be sent to a central logstash.

tests

The nomad@FAIRDI tests are based on the pytest library. Pytest uses fixtures to modularize setup and teardown of mocks, infrastructure, and other context objects. The following depicts the used hierarchy of fixtures:

_images/test_fixtures.png

Otherwise the test submodules follow the names of the nomad code modules.