NOMAD Metainfo

Introduction

The NOMAD Metainfo stores descriptive and structured information about materials-science data contained in the NOMAD Archive. The Metainfo can be understood as the schema of the Archive. The NOMAD Archive data is structured to be independent of the electronic-structure theory or molecular-simulation code (or any other source) that produced it. The NOMAD Metainfo can be browsed as part of the NOMAD Repository and Archive web application.

Typically (meta-)data definitions are generated only for a predesigned and specific scientific field, application or code. In contrast, the NOMAD Metainfo considers all pertinent information in the input and output files of the supported electronic-structure theory, quantum chemistry, and molecular-dynamics (force-field) codes. This ensures a complete coverage of all material and molecule properties, even though some properties might not be as important as others, or are missing in some input/output files of electronic-structure programs.

[Figure: metainfo_example.png]

NOMAD Metainfo is kept independent of the actual storage format and is not bound to any specific storage method. In our practical implementation, we use msgpack, a binary form of JSON, on our servers and provide Archive data as JSON via our API. For NOMAD end-users the internal storage format is of little relevance, because the archive data is solely served by NOMAD’s API.

The NOMAD Metainfo started within the NOMAD Laboratory. It was discussed at the CECAM workshop Towards a Common Format for Computational Materials Science Data and is open to external contributions and extensions. More information can be found in Towards a Common Format for Computational Materials Science Data (Psi-K 2016 Highlight).

Metainfo Python Interface

The NOMAD meta-info allows you to define schemas for physics data independent of the storage format used. It lets you define physics quantities with types, complex shapes (vectors, matrices, etc.), units, links, and descriptions. It lets you organize large numbers of these quantities in containment hierarchies of extendable sections, with references between sections and additional quantity categories.

NOMAD uses the meta-info to define all archive data, repository meta-data (and encyclopedia data). The meta-info provides a convenient Python interface to create, manipulate, and access data. We also use it to map data to various storage formats, including JSON, (HDF5), MongoDB, and Elasticsearch.

Starting example

import ase.data

from nomad.metainfo import MEnum, MSection, Quantity, SubSection, Units

class System(MSection):
    '''
    A system section includes all quantities that describe a single simulated
    system (a.k.a. geometry).
    '''

    n_atoms = Quantity(
        type=int, description='''
        Defines the number of atoms in the system.
        ''')

    # One chemical symbol per atom; the shape refers to the n_atoms quantity.
    atom_labels = Quantity(type=MEnum(ase.data.chemical_symbols), shape=['n_atoms'])
    atom_positions = Quantity(type=float, shape=['n_atoms', 3], unit=Units.m)
    simulation_cell = Quantity(type=float, shape=[3, 3], unit=Units.m)
    pbc = Quantity(type=bool, shape=[3])

class Run(MSection):
    # A run can contain any number of systems.
    systems = SubSection(sub_section=System, repeats=True)

This defines a simple metainfo schema with two sections called System and Run. Sections organize related data. Each section can have two types of properties: quantities and sub-sections. Sections and their properties are defined with Python classes and their attributes.

Each quantity defines a piece of data. Basic quantity attributes are its type, shape, unit, and description.

Sub-sections allow you to place sections into each other and thereby form containment hierarchies of sections and the respective data in them. Basic sub-section attributes are sub_section (i.e. a reference to the section definition of the sub-section) and repeats (determines if a sub-section can be contained once or multiple times).

The above only defines a schema. To use the schema and create actual data, we have to instantiate the above classes:

run = Run()
system = run.m_create(System)
system.n_atoms = 3
system.atom_labels = ['H', 'H', 'O']

print(system.atom_labels)
print(system.n_atoms)

Section instances can be used like regular Python objects: quantities and sub-sections can be set and accessed like any other Python attribute. Special meta-info methods, starting with m_, allow us to realize more complex semantics. For example, m_create will instantiate a sub-section and add it to the parent section in one step.

Another example for an m_-method is:

run.m_to_json(indent=2)

This will serialize the data into JSON:

{
    "m_def": "Run",
    "systems": [
        {
            "n_atoms": 3,
            "atom_labels": [
                "H",
                "H",
                "O"
            ]
        }
    ]
}
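Because the serialized form is plain JSON, it can be read back with any JSON library. The following minimal sketch uses only Python's standard json module on the output shown above. It does not use the metainfo package, and the variable names are chosen for illustration only.

```python
import json

# The JSON document shown above, as it might be returned by m_to_json.
serialized = '''
{
    "m_def": "Run",
    "systems": [
        {
            "n_atoms": 3,
            "atom_labels": ["H", "H", "O"]
        }
    ]
}
'''

data = json.loads(serialized)

# The repeating systems sub-section appears as a JSON list of objects.
first_system = data["systems"][0]
n_atoms = first_system["n_atoms"]
labels = first_system["atom_labels"]

# The shape ['n_atoms'] of atom_labels means both values must agree.
labels_match_shape = len(labels) == n_atoms
print(data["m_def"], n_atoms, labels, labels_match_shape)
```

A consumer of the API could run a check like labels_match_shape to confirm that the received data is consistent with the schema.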

Definitions

class nomad.metainfo.Definition(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)

A common base for all metainfo definitions.

All metainfo definitions (sections, quantities, sub-sections, packages, …) share some common attributes. These are defined in a common base: all metainfo items extend this common base and inherit from Definition.

Parameters
  • name

    Each definition has a name. Names have to be valid Python identifiers. They can contain letters, numbers and _, but must not start with a number. This also qualifies them as identifiers in most storage formats and databases, makes them URL safe, etc.

    Names must be unique within the Package or Section that this definition is part of.

  • description – The description can be an arbitrary human readable text that explains what this definition is about.

  • links – Each definition can be accompanied by a list of URLs. These should point to resources that further explain the definition.

  • categories – All metainfo definitions can be put into one or more categories. Categories organize the definitions themselves. This differs from sections, which organize the data (e.g. quantity values) and not the definitions of data (e.g. quantity definitions). See Categories for more details.

__init__(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

Quantities

class nomad.metainfo.Quantity(*args, **kwargs)

Definition of an atomic piece of data.

Quantity definitions are the main building block of meta-info schemas. Each quantity represents a single piece of data.

To define quantities, use objects of this class as class attribute values in section classes. The name of a quantity is automatically taken from its section class attribute. You can provide all other attributes to the constructor with keyword arguments.

See Sections to learn about section classes. In Python terms, Quantity is a descriptor. Descriptors define how to get and set attributes in a Python object. This allows us to use sections like regular Python objects and quantities like regular Python attributes.
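The descriptor mechanism can be illustrated without the metainfo package. The following is a minimal sketch in plain Python. ToyQuantity and ToySystem are hypothetical names, and this is not how nomad.metainfo.Quantity is actually implemented; it only shows how a descriptor can pick up its attribute name and intercept reads and writes.

```python
class ToyQuantity:
    """Hypothetical, simplified stand-in for a quantity definition."""

    def __init__(self, type):
        self.type = type
        self.name = None

    def __set_name__(self, owner, name):
        # Python calls this when the owning class is created; this is how
        # a descriptor can learn the attribute name it was assigned to.
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class: return the definition itself.
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        if not isinstance(value, self.type):
            raise TypeError(
                f"{self.name} expects {self.type.__name__}, "
                f"got {type(value).__name__}")
        obj.__dict__[self.name] = value


class ToySystem:
    n_atoms = ToyQuantity(type=int)


system = ToySystem()
system.n_atoms = 3
value = system.n_atoms

# Values of the wrong type are rejected by the descriptor.
try:
    system.n_atoms = "three"
    rejected = False
except TypeError:
    rejected = True
```

Accessing ToySystem.n_atoms on the class returns the definition object, while accessing it on an instance returns the stored value.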

Beyond basic Definition attributes, Quantities are defined with the following attributes.

Parameters
  • type

    Defines the datatype of quantity values. This is the type of individual elements in a potentially complex shape. If you define a list of integers for example, the shape would be list and the type integer: Quantity(type=int, shape=['0..*']).

    The type can be one of:

    • a built-in primitive Python type: int, str, bool, float

    • an instance of MEnum, e.g. MEnum('one', 'two', 'three')

    • a section to define references to other sections as quantity values

    • a custom meta-info DataType, see Custom data types

    • a numpy dtype, e.g. np.dtype('float32')

    • typing.Any to support any value

    If set to a numpy dtype, this quantity will use a numpy array or scalar to store values internally. If a regular (nested) Python list or Python scalar is given, it will be automatically converted. The given dtype will be used in the numpy value.

    To define a reference, either a section class or an instance of Section can be given. See Sections for details. Instances of the given section constitute valid values for this type. Upon serialization, referenced section instances will be represented with metainfo URLs. See References and metainfo URLs.

    For quantities with more than one dimension, only numpy arrays and dtypes are allowed.

  • shape

    The shape of the quantity. It defines its dimensionality.

    A shape is a list, where each item defines one dimension. Each dimension can be:

    • an integer that defines the exact size of the dimension, e.g. [3] is the shape of a 3D spatial vector

    • a string that specifies a possible range, e.g. 0..*, 1..*, 3..6

    • the name of an int-typed and shapeless quantity in the same section whose value defines the length of this dimension, e.g. number_of_atoms defines the length of atom_positions

    Range specifications define lower and upper bounds for the possible dimension length. The * can be used to denote an arbitrarily high upper bound.

    Quantities with dimensionality (length of the shape) higher than 1 must be numpy arrays. Their type must be a dtype.

  • unit

    The physics unit for this quantity. It is optional.

    Units are represented with the pint Python package. Pint defines units and their algebra. The metainfo provides a preconfigured pint unit registry, units. You can either use pint units directly, e.g. units.m / units.s, or provide the unit as a pint-parsable string, e.g. 'meter / seconds' or 'm/s'.

  • default

    The default value for this quantity. The value must match type and shape.

    Be careful with a default value like [] as it will be the default value for all occurrences of this quantity.

  • synonym_for – The name of a quantity defined in the same section as string. This will make this quantity a synonym for the other quantity. All other properties (type, shape, unit, etc.) are ignored. Getting or setting from/to this quantity will be delegated to the other quantity. Synonyms are always virtual.

  • derived – A Python callable that takes the containing section as input and outputs the value for this quantity. This quantity cannot be set directly; its value is only derived by the given callable. The callable is executed when this quantity is read. Derived quantities are always virtual.

  • cached – A bool indicating that derived values should be cached unless the underlying section has changed.

  • virtual – A boolean that determines if this quantity is virtual. Virtual quantities can be get/set like regular quantities, but their values are not (de-)serialized, hence never permanently stored.

  • is_scalar – A derived quantity that is True if and only if this quantity has a shape of length 0.
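To make the three kinds of shape dimensions concrete, here is a small sketch of how a dimension specification could be checked against an actual length. The helpers dimension_allows and shape_allows are hypothetical and written for illustration only; the metainfo's own validation may work differently.

```python
import re


def dimension_allows(dimension, length, section_values=None):
    """Return True if `length` is valid for one shape dimension.

    Hypothetical helper; `section_values` maps quantity names to values.
    """
    if isinstance(dimension, int):
        # Exact size, e.g. 3.
        return length == dimension

    match = re.fullmatch(r"(\d+)\.\.(\d+|\*)", dimension)
    if match:
        # Range specification, e.g. '0..*' or '3..6'.
        lower, upper = match.groups()
        if length < int(lower):
            return False
        return upper == "*" or length <= int(upper)

    # Otherwise treat the string as the name of an int quantity.
    return length == (section_values or {})[dimension]


def shape_allows(shape, lengths, section_values=None):
    """Check the length of every dimension against the shape."""
    return len(shape) == len(lengths) and all(
        dimension_allows(d, n, section_values)
        for d, n in zip(shape, lengths))


ok_positions = shape_allows(["n_atoms", 3], [3, 3], {"n_atoms": 3})
bad_positions = shape_allows(["n_atoms", 3], [2, 3], {"n_atoms": 3})
ok_range = shape_allows(["3..6"], [4])
bad_range = shape_allows(["1..*"], [0])
```

In this sketch a dimension name that is missing from section_values raises a KeyError, which keeps an unknown reference from silently passing.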

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.
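The caution about default values such as [] reflects general Python behavior: a mutable default object is shared by everything that reads it. The sketch below shows the effect with a hypothetical ToyDefinition class in plain Python. It assumes the default object is handed out without copying and does not use the metainfo package.

```python
class ToyDefinition:
    """Hypothetical quantity definition that holds one default value."""

    def __init__(self, default):
        self.default = default


labels_def = ToyDefinition(default=[])

# Two "section instances" that both fall back to the same default object.
first = labels_def.default
second = labels_def.default

first.append("H")

# The change is visible through the other reference as well.
shared = second == ["H"] and first is second

# Copying the default on read avoids the sharing.
independent = list(labels_def.default)
independent.append("O")
default_unchanged = labels_def.default == ["H"]
```

Using an immutable default, or setting the value explicitly on each section instance, sidesteps the problem.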

Sections

With sections it is important to always be clear about what is meant. The loose term section can refer to one of the following three:

  • section definition

    Which is a Python object that represents the definition of a section, its sub-sections and quantities. Section definitions should not be written directly. Section definitions are objects of Section.

  • section class

    Which is a Python class and MSection descendant that is used to express a section definition in Python. Each section class is tightly associated with its section definition. The section definition can be accessed with the class attribute m_def. The section definition is automatically created from the section class upon defining the class through metaclass voodoo.

  • section instance

    The instance (object) of a section class. It follows the definition associated with the instantiated section class. This section definition can be accessed with the object attribute m_def.

A section class looks like this:

class SectionName(BaseSection):
    ''' Section description '''
    m_def = Section(**section_attributes)

    quantity_name = Quantity(**quantity_attributes)
    sub_section_name = SubSection(**sub_section_attributes)

The various Python elements of this class are mapped to a respective section definition. The SectionName becomes the name. The BaseSection is either MSection or another section class. The section_attributes become additional attributes of the section definition. The various Quantity and SubSection become the quantities and sub sections.

Each section class has to directly or indirectly extend MSection. This will provide certain class and object features to all section classes and all section instances. Read Reflection and custom data storage to learn more.
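How a section definition can be derived from a section class at class-definition time is sketched below with a plain Python metaclass. All names prefixed with Toy are hypothetical and the logic is heavily simplified; it is meant as an illustration of the idea, not as the actual nomad.metainfo implementation.

```python
class ToyQuantity:
    """Hypothetical marker for a quantity attribute."""


class ToySectionDef:
    """Hypothetical section definition: name, description, quantities."""

    def __init__(self, name, description, quantities):
        self.name = name
        self.description = description
        self.quantities = quantities


class ToySectionMeta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Collect quantity attributes declared directly on this class.
        quantities = {
            attr: value for attr, value in namespace.items()
            if isinstance(value, ToyQuantity)}
        # Inherit the quantities of base section classes.
        for base in bases:
            base_def = getattr(base, "m_def", None)
            if base_def is not None:
                quantities = {**base_def.quantities, **quantities}
        # The class name and docstring become name and description.
        cls.m_def = ToySectionDef(name, cls.__doc__, quantities)


class ToyMSection(metaclass=ToySectionMeta):
    pass


class System(ToyMSection):
    """A simulated system."""
    n_atoms = ToyQuantity()


class PeriodicSystem(System):
    """A system with periodic boundary conditions."""
    pbc = ToyQuantity()


system_def = System.m_def
instance_def = PeriodicSystem().m_def
```

Both the class and its instances expose m_def, which mirrors the distinction between section class and section instance made above.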

class nomad.metainfo.Section(*args, validate: bool = True, **kwargs)

Sections define blocks of related quantities and allow hierarchical data.

Section definitions determine what quantities and sub-sections can appear in a section instance that follows the definition.

Parameters
  • quantities – The quantity definitions of this section definition as list of Quantity. Will be automatically set from the section class.

  • sub_sections – The sub-section definitions of this section definition as list of SubSection. Will be automatically set from the section class.

  • base_sections – A list of section definitions (Section). By default this definition will inherit all quantity and sub section definitions from the given section definitions. This behavior might be altered with extends_base_section.

  • extends_base_section

    If True, this definition must have exactly one base section. Instead of inheriting properties, the quantity and sub-section definitions of this section will be added to the base section.

    This allows you to add further properties to an existing section definition. To use such an extension on section instances in a type-safe manner, MSection.m_as() can be used to cast the base section to the extending section.

  • extending_sections – A list of section definitions (Section). These are those sections that add their properties to this section via extends_base_section. This quantity will be set automatically.

  • constraints – Constraints are rules that a section must fulfil to be valid. This allows you to implement semantic checks that go beyond mere type or shape checks. This quantity takes the names of constraints as strings. Constraints have to be implemented as methods with the constraint() decorator. They can raise ConstraintVialated or an AssertionError to indicate that the constraint is not fulfilled for the self section. This quantity will be set automatically from all constraint methods in the respective section class. To run validation of a section use MSection.m_validate().

  • event_handlers – Event handlers are functions that get called when the section data is changed. There are two types of events: set and add_sub_section. The handler type is determined by the handler (i.e. function) name: on_set and on_add_sub_section. The handler arguments correspond to MSection.m_set() (section, quantity_def, value) and MSection.m_add_sub_section() (section, sub_section_def, sub_section). Handlers are called after the respective action was performed. This quantity is automatically populated with handlers from the section class's methods. If there is a method on_set or on_add_sub_section, it will be added as a handler.

  • section_cls – A helper attribute that gives the section class as a Python class object.

  • inherited_sections – A helper attribute that gives direct and indirect base sections and extending sections including this section. These are all sections that this section gets its properties from.

  • all_base_sections – A helper attribute that gives direct and indirect base sections.

  • all_properties – A helper attribute that gives all properties (sub section and quantity) definitions including inherited properties and properties from extending sections as a dictionary with names and definitions.

  • all_quantities – A helper attribute that gives all quantity definitions including inherited ones and ones from extending sections as a dictionary that maps names (strings) to Quantity.

  • all_sub_sections – A helper attribute that gives all sub-section definitions including inherited ones and ones from extending sections as a dictionary that maps names (strings) to SubSection.

  • all_sub_sections_by_section – A helper attribute that gives all sub-section definitions including inherited ones and ones from extending sections as a dictionary that maps section classes (i.e. Python class objects) to lists of SubSection.

  • errors – A list of errors. These issues prevent the section definition from being usable.

  • warnings – A list of warnings. The section definition can still be used despite these.
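The idea behind constraints, i.e. methods that are marked with a decorator and run during validation, can be sketched in plain Python as follows. The constraint decorator, ToySection and its validate method are hypothetical stand-ins written for this example. They are not the metainfo's constraint() decorator or MSection.m_validate(), whose behavior and return values may differ.

```python
def constraint(method):
    """Hypothetical decorator that marks a method as a constraint."""
    method.is_constraint = True
    return method


class ToySection:
    def validate(self):
        """Run all constraint methods and collect error messages."""
        errors = []
        for name in dir(type(self)):
            method = getattr(type(self), name)
            if getattr(method, "is_constraint", False):
                try:
                    method(self)
                except AssertionError as error:
                    errors.append(f"{name}: {error}")
        return errors


class System(ToySection):
    def __init__(self, n_atoms, atom_labels):
        self.n_atoms = n_atoms
        self.atom_labels = atom_labels

    @constraint
    def labels_match_n_atoms(self):
        # A semantic check that goes beyond type and shape checks.
        assert len(self.atom_labels) == self.n_atoms, \
            "atom_labels must have n_atoms entries"


valid_errors = System(3, ["H", "H", "O"]).validate()
invalid_errors = System(3, ["H", "O"]).validate()
```

Collecting messages instead of raising on the first failure lets a caller report all violated constraints of a section at once.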

__init__(*args, validate: bool = True, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

Sub-Sections

class nomad.metainfo.SubSection(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)

Defines what sections can appear as sub-sections of another section.

Like quantities, sub-sections are defined in a section class as attributes of this class. And like quantities, each sub-section definition becomes a property of the corresponding section definition (parent). A sub-section definition references another section definition as the sub-section (child). As a consequence, parent section instances can contain child section instances as sub-sections.

Contrary to the old NOMAD metainfo, we distinguish between sub-section the section and sub-section the property. This allows you to use one child section definition as a sub-section of many different parent section definitions.

Parameters
  • sub_section – A Section or Python class object for a section class. This will be the child section definition. The section that defines this sub-section is the parent section definition.

  • repeats – A boolean that determines whether this sub-section can appear multiple times in the parent section.

__init__(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

Categories

Categories were known as abstract types in the old meta-info.

Categories are defined with Python classes that have MCategory as base class. Their name and description are taken from the class’s name and docstring. An example category looks like this:

class CategoryName(MCategory):
    ''' Category description '''
    m_def = Category(links=['http://further.explanation.eu'], categories=[ParentCategory])

Packages

class nomad.metainfo.Package(*args, **kwargs)

Packages organize metainfo definitions alongside Python modules.

Each Python module with metainfo Definition (explicitly or implicitly) has a member m_package with an instance of this class. Definitions (categories, sections) in Python modules are automatically added to the module’s Package. Packages are not nested and instead have the fully qualified Python module name as their name.

This allows you to inspect all definitions in a Python module and automatically uses the module name and docstring as Package name and description.

Besides the regular Definition attributes, packages can have the following attributes:

Parameters
  • section_definitions – All section definitions in this package as Section objects.

  • category_definitions – All category definitions in this package as Category objects.

  • all_definitions – A helper attribute that provides all section definitions by name.

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

Environments

class nomad.metainfo.Environment(*args, **kwargs)

Environments allow you to manage many metainfo packages and quickly access all definitions.

Environments provide a name table for large sets of metainfo definitions that span multiple packages. They provide various functions to resolve metainfo definitions by their names, legacy names, and qualified names.

Parameters

packages – Packages in this environment.
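A name table over several packages can be pictured as two dictionaries, one keyed by qualified name and one by plain name. The sketch below is a hypothetical ToyEnvironment in plain Python, with package contents invented for the example. It does not reproduce the actual Environment API, and legacy names are left out.

```python
class ToyEnvironment:
    """Hypothetical name table over definitions from several packages."""

    def __init__(self, packages):
        # packages: mapping of package name -> {definition name: definition}
        self.by_qualified_name = {}
        self.by_name = {}
        for package_name, definitions in packages.items():
            for name, definition in definitions.items():
                self.by_qualified_name[f"{package_name}.{name}"] = definition
                self.by_name.setdefault(name, []).append(definition)

    def resolve(self, name):
        """Resolve a qualified or plain name to exactly one definition."""
        if name in self.by_qualified_name:
            return self.by_qualified_name[name]
        candidates = self.by_name.get(name, [])
        if len(candidates) != 1:
            raise KeyError(f"{name!r} matches {len(candidates)} definitions")
        return candidates[0]


env = ToyEnvironment({
    "common": {"Run": "common Run", "System": "common System"},
    "vasp": {"Run": "vasp Run"},
})

system = env.resolve("System")
vasp_run = env.resolve("vasp.Run")

# A plain name that exists in two packages cannot be resolved uniquely.
try:
    env.resolve("Run")
    ambiguous = False
except KeyError:
    ambiguous = True
```

Refusing ambiguous plain names is a design choice of this sketch: callers then have to fall back to the qualified name.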

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

Custom data types

class nomad.metainfo.DataType

Allows you to define custom data types that can be used in the meta-info.

The metainfo supports most types out of the box. These include the Python built-in primitive types (int, bool, str, float, …), references to sections, and enums. However, on some occasions you need to add custom data types.

This base class lets you customize various aspects of value treatment. This includes type checks and various value transformations. This allows you to store values in the section differently from how the user might set/get them, and it allows you to have non-serializable values that are transformed on de-/serialization.

set_normalize(section: nomad.metainfo.metainfo.MSection, quantity_def: nomad.metainfo.metainfo.Quantity, value: Any) → Any

Transforms the given value before it is set and checks its type.

get_normalize(section: nomad.metainfo.metainfo.MSection, quantity_def: nomad.metainfo.metainfo.Quantity, value: Any) → Any

Transforms the given value when it is retrieved.

serialize(section: nomad.metainfo.metainfo.MSection, quantity_def: nomad.metainfo.metainfo.Quantity, value: Any) → Any

Transforms the given value when making the section serializable.

deserialize(section: nomad.metainfo.metainfo.MSection, quantity_def: nomad.metainfo.metainfo.Quantity, value: Any) → Any

Transforms the given value from its serializable form.
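The interplay of the four hooks can be sketched with a date-time type that is stored as a datetime object but serialized as an ISO 8601 string. ToyDataType and ToyDatetime are hypothetical classes that only mirror the method signatures listed above. They do not derive from nomad.metainfo.DataType, and the section and quantity_def arguments are simply passed as None here.

```python
from datetime import datetime


class ToyDataType:
    """Hypothetical base class with the four hooks described above."""

    def set_normalize(self, section, quantity_def, value):
        return value

    def get_normalize(self, section, quantity_def, value):
        return value

    def serialize(self, section, quantity_def, value):
        return value

    def deserialize(self, section, quantity_def, value):
        return value


class ToyDatetime(ToyDataType):
    """Stores datetime objects, serializes them as ISO 8601 strings."""

    def set_normalize(self, section, quantity_def, value):
        # Accept strings for convenience, but always store a datetime.
        if isinstance(value, str):
            value = datetime.fromisoformat(value)
        if not isinstance(value, datetime):
            raise TypeError(f"{value!r} is not a datetime")
        return value

    def serialize(self, section, quantity_def, value):
        return value.isoformat()

    def deserialize(self, section, quantity_def, value):
        return datetime.fromisoformat(value)


data_type = ToyDatetime()
stored = data_type.set_normalize(None, None, "2020-01-31T12:00:00")
serialized = data_type.serialize(None, None, stored)
restored = data_type.deserialize(None, None, serialized)
```

The value kept in the section (stored) is a rich Python object, while only the string form ever reaches JSON.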

class nomad.metainfo.MEnum(*args, **kwargs)

Allows you to define str types with values limited to a pre-set list of possible values.

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

Reflection and custom data storage

When manipulating metainfo data in Python, all data is represented as Python objects, where objects correspond to section instances and their attributes to quantity values or section instances of sub-sections. By defining sections with section classes, each of these Python objects already has an interface that allows you to get/set quantities and sub-sections. But often this interface is too limited, or the specific section and quantity definitions are unknown when writing code.

class nomad.metainfo.MSection(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)

Base class for all section instances on all meta-info levels.

All section instances indirectly instantiate MSection, and therefore all members of MSection are available on all section instances. MSection provides many special attributes and functions (they all start with m_) that allow you to reflect on a section’s definition and to manipulate the section instance without a priori knowledge of the section definition.

It also carries all the data for each section. All sub-classes only define specific sections in terms of possible sub-sections and quantities. The data is managed here.

m_def

The section definition that this section instance follows as a Section object.

m_parent

If this section is a sub-section, this references the parent section instance.

m_parent_sub_section

If this section is a sub-section, this is the SubSection that defines this relationship.

m_parent_index

For repeatable sections, the parent keeps a list of sub-sections. This is the index of this section in the respective parent sub-section list.

m_resource

The MResource that contains and manages this section.

__init__(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

m_def: Section = None
m_set(quantity_def: nomad.metainfo.metainfo.Quantity, value: Any) → None

Set the given value for the given quantity.

m_get(quantity_def: nomad.metainfo.metainfo.Quantity) → Any

Retrieve the given value for the given quantity.

m_is_set(quantity_def: nomad.metainfo.metainfo.Quantity) → bool

True if the given quantity is set.

m_add_values(quantity_def: nomad.metainfo.metainfo.Quantity, values: Any, offset: int) → None

Add (partial) values for the given quantity of higher dimensionality.

m_add_sub_section(sub_section_def: nomad.metainfo.metainfo.SubSection, sub_section: nomad.metainfo.metainfo.MSection) → None

Adds the given section instance as a sub section of the given sub section definition.

m_remove_sub_section(sub_section_def: nomad.metainfo.metainfo.SubSection, index: int) → None

Removes the existing section for a non-repeatable sub-section.

m_get_sub_section(sub_section_def: nomad.metainfo.metainfo.SubSection, index: int) → nomad.metainfo.metainfo.MSection

Retrieves a single sub section of the given sub section definition.

m_get_sub_sections(sub_section_def: nomad.metainfo.metainfo.SubSection) → List[nomad.metainfo.metainfo.MSection]

Retrieves all sub sections of the given sub section definition.

m_sub_section_count(sub_section_def: nomad.metainfo.metainfo.SubSection) → int

Returns the number of sub sections for the given sub section definition.

m_create(section_cls: Type[MSectionBound], sub_section_def: Optional[nomad.metainfo.metainfo.SubSection] = None, **kwargs) → MSectionBound

Creates a section instance and adds it to this section provided there is a corresponding sub section.

Parameters
  • section_cls – The section class for the sub-section to create.

  • sub_section_def – If there are multiple sub-sections for the given class, this must be used to explicitly state the sub-section definition.

m_update(safe: bool = True, **kwargs)

Updates all quantities and sub-sections with the given arguments.

m_as(section_cls: Type[MSectionBound]) → MSectionBound

‘Casts’ this section to the given extending sections.

m_follows(definition: nomad.metainfo.metainfo.Section) → bool

Determines if this section’s definition is or is derived from the given definition.

m_to_dict(with_meta: bool = False, include_defaults: bool = False, categories: List[Union[Category, Type[MCategory]]] = None, partial: Callable[[Definition, MSection], bool] = None) → Dict[str, Any]

Returns the data of this section as a JSON-serializable dictionary.

Parameters
  • with_meta – Include information about the section definition and the section’s position in its parent.

  • include_defaults – Include default values of unset quantities.

  • categories – A list of category classes or category definitions that is used to filter the included quantities and sub-sections. Only applied to properties of this section, not to sub-sections. Is overridden by partial.

  • partial – A function that determines if a definition should be included in the output dictionary. Takes a definition and the containing section as arguments. Partial is applied recursively on sub-sections. Overrides categories.
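The recursive filtering described for partial can be sketched on plain nested dictionaries. In the hypothetical to_dict helper below, dictionaries stand in for section instances and property names stand in for definitions. The real m_to_dict passes Definition and MSection objects to the predicate instead, and program_name is an invented property.

```python
def to_dict(section, partial=None):
    """Serialize a toy section (a plain dict) to a new dict.

    Hypothetical stand-in: property names play the role of definitions.
    """
    result = {}
    for name, value in section.items():
        if partial is not None and not partial(name, section):
            continue
        if isinstance(value, dict):
            # A single sub-section: apply the same filter recursively.
            result[name] = to_dict(value, partial)
        elif isinstance(value, list) and all(
                isinstance(item, dict) for item in value):
            # A repeating sub-section.
            result[name] = [to_dict(item, partial) for item in value]
        else:
            result[name] = value
    return result


run = {
    "program_name": "toy-code",
    "systems": [
        {"n_atoms": 3, "atom_labels": ["H", "H", "O"]},
    ],
}


def without_atom_labels(name, section):
    # Keep everything except atom_labels, wherever it occurs.
    return name != "atom_labels"


filtered = to_dict(run, partial=without_atom_labels)
unfiltered = to_dict(run)
```

Because the predicate is handed down to every sub-section, a single function controls the output at all levels of the hierarchy.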

m_update_from_dict(dct: Dict[str, Any]) → None

Updates this section with the serialized data from the given dict, e.g. data produced by m_to_dict().

classmethod m_from_dict(dct: Dict[str, Any]) → MSectionBound

Creates a section from the given serializable data dictionary.

This is the ‘opposite’ of m_to_dict(). It takes a deserialized dict, e.g. loaded from JSON, and turns it into a proper section, i.e. an instance of the given section class.

m_to_json(**kwargs)

Returns the data of this section as a JSON string.

m_all_contents(depth_first: bool = False, include_self: bool = False, stop: Callable[[MSection], bool] = None) → Iterable[nomad.metainfo.metainfo.MSection]

Returns an iterable over all sub-sections and their nested sub-sections.

Parameters
  • depth_first – A boolean indicating that children should be returned before parents.

  • include_self – A boolean indicating that the results should contain this section.

  • stop – A predicate that determines if the traversal should be stopped or if children should be returned. The sections for which this returns True are included in the results.
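The interplay of the three parameters can be illustrated with a small stand-alone sketch. It does not use nomad.metainfo: Node and all_contents are hypothetical stand-ins that only follow the parameter descriptions above, so the exact ordering of the real m_all_contents() may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a section; not part of nomad.metainfo."""
    name: str
    children: List["Node"] = field(default_factory=list)


def all_contents(
        node: Node, depth_first: bool = False, include_self: bool = False,
        stop: Optional[Callable[[Node], bool]] = None) -> Iterator[Node]:
    """Yield the descendants of node according to the documented parameters."""
    if include_self and not depth_first:
        yield node  # parents before children
    # A node matching `stop` is still part of the results,
    # but its children are not visited.
    if stop is None or not stop(node):
        for child in node.children:
            yield from all_contents(
                child, depth_first=depth_first, include_self=True, stop=stop)
    if include_self and depth_first:
        yield node  # children before parents


root = Node('run', [
    Node('system', [Node('symmetry')]),
    Node('scc'),
])

parents_first = [n.name for n in all_contents(root)]
children_first = [
    n.name for n in all_contents(root, depth_first=True, include_self=True)]
pruned = [
    n.name for n in all_contents(root, stop=lambda n: n.name == 'system')]
```

In this sketch, pruned contains system itself but not its symmetry child, which matches the description of stop.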

m_traverse()

Performs a depth-first traversal and yields tuples of (section, property definition, parent index) for all set properties.

m_pretty_print(indent=None)

Pretty-prints the containment hierarchy.

m_contents() → Iterable[nomad.metainfo.metainfo.MSection]

Returns an iterable over all direct sub-sections.

m_path(quantity_def: Optional[nomad.metainfo.metainfo.Quantity] = None) → str

Returns the path of this section or the given quantity within the section hierarchy.

m_root(cls: Type[MSectionBound] = None) → MSectionBound

Returns the root of the section hierarchy, i.e. the first ancestor section that has no parent.

m_parent_as(cls: Type[MSectionBound] = None) → MSectionBound

Returns the parent section with the given section class type.

m_resolved()

Returns the original resolved object, if this instance used to be a proxy.

For most purposes, a resolved proxy is equal to the section it was resolved to. The exception is hashing. If you want to use a potential former proxy in a hash table and have it behave exactly like the section it was resolved to, use the result of this method instead of the section/proxy itself.
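Why hashing matters here can be shown with a stand-alone sketch. Section, Proxy and resolved() below are hypothetical stand-ins, not the nomad.metainfo classes; they only model an object that compares equal to its target while hashing differently.

```python
class Section:
    """Hypothetical stand-in for a resolved section."""

    def __init__(self, name: str) -> None:
        self.name = name


class Proxy:
    """Hypothetical proxy: equal to its target, but with its own hash."""

    def __init__(self, target: Section) -> None:
        self._target = target

    def __eq__(self, other: object) -> bool:
        return other is self._target or other is self

    # Defining __eq__ sets __hash__ to None; restore identity-based hashing.
    __hash__ = object.__hash__

    def resolved(self) -> Section:
        return self._target


system = Section('system')
proxy = Proxy(system)
lookup = {system: 'found'}

equal = proxy == system                         # equality holds
via_proxy = lookup.get(proxy)                   # hash differs, lookup misses
via_resolved = lookup.get(proxy.resolved())     # lookup succeeds
```

Dictionaries and sets compare hashes before they compare for equality, so the proxy misses the entry even though it equals the key.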

m_resolve(path: str, cls: Type[MSectionBound] = None) → MSectionBound

Resolves the given path or dotted quantity name using this section as context and returns the sub_section or value.

m_get_annotations(key: Union[str, type], default=None, as_list: bool = False)

Convenience method to get annotations.

Parameters
  • key – Either the optional annotation name or an annotation class. In the first case, the annotation is returned regardless of its type. In the second case, all names and lists of annotations are iterated and all annotations of the given class are returned.

  • default – The value returned if no annotation is found. Defaults to None.

  • as_list – Returns a list, no matter how many annotations have been found.

m_validate() → Tuple[List[str], List[str]]

Evaluates all constraints and shapes of this section and returns the collected messages as two lists of strings (errors and warnings).

m_copy(deep=False, parent=None) → MSectionBound

m_all_validate()

Evaluates all constraints in the whole section hierarchy, incl. this section.

m_warning(*args, **kwargs)

get(key)

values()

m_xpath(expression: str)

Provides an interface to jmespath search functionality.

Parameters
  • expression – A string compatible with the JMESPath specification representing the search. See https://jmespath.org/ for a complete description.

Examples:

metainfo_section.m_xpath('code_name')
metainfo_section.m_xpath('systems[-1].system_type')
metainfo_section.m_xpath('sccs[0].system.atom_labels')
metainfo_section.m_xpath("systems[?system_type == 'molecule'].atom_labels")
metainfo_section.m_xpath('sccs[?energy_total < `1.0E-23`].system')

class nomad.metainfo.MetainfoError

Metainfo related errors.

class nomad.metainfo.DeriveError

An error occurred while computing a derived value.

class nomad.metainfo.MetainfoReferenceError

An error indicating that a reference could not be resolved.

References and metainfo URLs

When in Python memory, quantity values that reference other sections simply contain a Python reference to the respective section instance. However, upon serializing/storing metainfo data, these references have to be represented differently.

Currently, this metainfo implementation only supports references within a single section hierarchy (e.g. the same JSON file). References are stored as paths from the root section, over sub-sections, to the referenced section. Each path segment is the name of a sub-section or an index in a repeatable sub-section: /system/0/symmetry.
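How such a path addresses a section can be sketched on plain nested dicts and lists, as they would appear after loading a JSON file. This is only an illustration under stated assumptions: resolve_path and the space_group key are hypothetical and not part of nomad.metainfo.

```python
from typing import Any


def resolve_path(root: dict, path: str) -> Any:
    """Follow a '/'-separated reference path through nested dicts and lists."""
    current: Any = root
    for segment in (s for s in path.split('/') if s):
        if isinstance(current, list):
            # Numeric segment: index into a repeatable sub-section.
            current = current[int(segment)]
        else:
            # Named segment: the name of a sub-section.
            current = current[segment]
    return current


archive = {
    'system': [
        {'symmetry': {'space_group': 225}},
        {'symmetry': {'space_group': 1}},
    ],
    # A serialized reference is just the path string.
    'scc': [{'system': '/system/0'}],
}

symmetry = resolve_path(archive, '/system/0/symmetry')
referenced_system = resolve_path(archive, archive['scc'][0]['system'])
```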

References are automatically serialized by MSection.m_to_dict(). When de-serializing data with MSection.m_from_dict(), these references are not resolved right away, because the referenced section might not yet be available. Instead, references are stored as MProxy instances. These objects are automatically replaced by the referenced object when the respective quantity is accessed.
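The idea of deferring resolution until first access can be sketched without nomad.metainfo. LazyRef, Record and the flat registry below are hypothetical simplifications; the actual MProxy mechanics may differ.

```python
from typing import Any, Dict


class LazyRef:
    """Hypothetical placeholder for a reference that is not resolved yet."""

    def __init__(self, path: str) -> None:
        self.path = path


class Record:
    """Hypothetical container that resolves placeholders on first access."""

    def __init__(self, registry: Dict[str, Any], values: Dict[str, Any]) -> None:
        self._registry = registry  # maps reference paths to loaded objects
        self._values = values

    def get(self, name: str) -> Any:
        value = self._values[name]
        if isinstance(value, LazyRef):
            # Resolve only now and replace the placeholder with the real object.
            value = self._registry[value.path]
            self._values[name] = value
        return value


registry: Dict[str, Any] = {}

# The referencing record is loaded first; its target does not exist yet.
scc = Record(registry, {'system': LazyRef('/system/0')})

# The target becomes available later during loading.
registry['/system/0'] = {'atom_labels': ['H', 'H', 'O']}

first_access = scc.get('system')
second_access = scc.get('system')
```

Because the placeholder is swapped out on first access, later accesses return the same referenced object without resolving again.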

class nomad.metainfo.MProxy(m_proxy_value: Union[str, int, dict], m_proxy_section: Optional[nomad.metainfo.metainfo.MSection] = None, m_proxy_quantity: Optional[nomad.metainfo.metainfo.Quantity] = None)

A placeholder object that acts as reference to a value that is not yet resolved.

url

The reference represented as a URL string.

Resources

class nomad.metainfo.MResource(logger=None)

Represents a collection of related metainfo data, i.e. a set of MSection instances.

A more complex example

# Copyright 2018 Markus Scheidgen
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

''' An example metainfo package. '''

import numpy as np
from datetime import datetime

from nomad.units import ureg
from nomad.metainfo import (
    MSection, MCategory, Section, Quantity, Package, SubSection, MEnum,
    Datetime, constraint)

m_package = Package(links=['https://nomad-lab.eu/prod/rae/docs/metainfo.html'])


class SystemHash(MCategory):
    ''' All quantities that contribute to what makes a system unique. '''


class Parsing(MSection):
    ''' All data that describes the NOMAD parsing of this run.

    Quantities can also be documented like this:

    Args:
        parser_name: 'Name of the used parser'
        parser_version: 'Version of the used parser'
    '''

    parser_name = Quantity(type=str)
    parser_version = Quantity(type=str)
    nomad_version = Quantity(type=str, default='latest')
    warnings = Quantity(type=str, shape=['0..*'])
    parse_time = Quantity(type=Datetime)


class System(MSection):
    ''' All data that describes a simulated system. '''

    n_atoms = Quantity(
        type=int, derived=lambda system: len(system.atom_labels),
        description='Number of atoms in the simulated system.')

    atom_labels = Quantity(
        type=str, shape=['n_atoms'], categories=[SystemHash],
        description='The atoms in the simulated systems.')

    atom_positions = Quantity(
        type=np.dtype('f'), shape=['n_atoms', 3], unit=ureg.m, categories=[SystemHash],
        description='The atom positions in the simulated system.')

    lattice_vectors = Quantity(
        type=np.dtype('f'), shape=[3, 3], unit=ureg.m, categories=[SystemHash],
        description='The lattice vectors of the simulated unit cell.')

    unit_cell = Quantity(synonym_for='lattice_vectors')

    periodic_dimensions = Quantity(
        type=bool, shape=[3], default=[False, False, False], categories=[SystemHash],
        description='A vector of booleans indicating in which dimensions the unit cell is repeated.')

    system_type = Quantity(type=str)


class SCC(MSection):

    energy_total = Quantity(type=float, default=0.0, unit=ureg.J)
    energy_total_0 = Quantity(type=np.dtype(np.float32), default=0.0, unit=ureg.J)
    an_int = Quantity(type=np.dtype(np.int32))

    system = Quantity(type=System, description='The system that this calculation is based on.')


class Run(MSection):
    ''' All data that belongs to a single code run. '''

    code_name = Quantity(type=str, description='The name of the code that was run.')
    code_version = Quantity(type=str, description='The version of the code that was run.')

    parsing = SubSection(sub_section=Parsing)
    systems = SubSection(sub_section=System, repeats=True)
    sccs = SubSection(sub_section=SCC, repeats=True)

    @constraint
    def one_scc_per_system(self):
        assert self.m_sub_section_count(Run.systems) == self.m_sub_section_count(Run.sccs),\
            'Number of systems does not match number of calculations.'


class VaspRun(Run):
    ''' All VASP specific quantities for section Run. '''
    m_def = Section(extends_base_section=True)

    x_vasp_raw_format = Quantity(
        type=MEnum(['xml', 'outcar']),
        description='The file format of the parsed VASP mainfile.')


if __name__ == '__main__':
    # Demonstration of how to reflect on the definitions

    # All definitions are metainfo data themselves, and they can be accessed like any other
    # metainfo data. E.g. all section definitions are sections themselves.

    # To get quantities of a given section
    print(Run.m_def.m_get_sub_sections(Section.quantities))

    # Or all Sections in the package
    print(m_package.m_get_sub_sections(Package.section_definitions))

    # There are also some definition specific helper methods.
    # For example to get all attributes (Quantities and possible sub-sections) of a section.
    print(Run.m_def.all_properties)

    # Demonstration on how to use the definitions, e.g. to create a run with system:
    run = Run()
    run.code_name = 'VASP'
    run.code_version = '1.0.0'

    parsing = run.m_create(Parsing)
    parsing.parse_time = datetime.now()

    run.m_as(VaspRun).x_vasp_raw_format = 'outcar'
    # The same as
    run.x_vasp_raw_format = 'outcar'  # type: ignore

    system = run.m_create(System)
    system.atom_labels = ['H', 'H', 'O']

    calc = run.m_create(SCC)
    calc.energy_total = 1.23e-10
    calc.system = system

    # Or to read data from existing metainfo data:
    print(system.atom_labels)
    print(system.n_atoms)

    # To validate dimensions and custom constraints
    print('errors: %s' % run.m_all_validate())

    # To serialize the data:
    serializable = run.m_to_dict()
    # or
    print(run.m_to_json(indent=2))

    # To deserialize data
    run = Run.m_from_dict(serializable)
    print(run.sccs[0].system)

    # print(m_package.m_to_json(indent=2))  # type: ignore, pylint: disable=undefined-variable

Accessing the Metainfo

Above you learned what the metainfo is and how to create metainfo definitions and work with metainfo data in Python. But how do you get access to the existing metainfo definitions within NOMAD? We call the complete set of all metainfo definitions the NOMAD Metainfo.

The NOMAD Metainfo comprises definitions from various packages defined by all the parsers and converters (and their respective code outputs and formats) that NOMAD supports. In addition, there are common packages that contain definitions that might be relevant to different kinds of archive data.

Python

In the NOMAD source code, all metainfo definitions are materialized as Python source files that contain the definitions in the format described above. If you have installed the NOMAD Python package (see Install the NOMAD client library), you can simply import the respective Python modules:

from nomad.datamodel.metainfo.public import m_package
print(m_package.m_to_json(indent=2))

from nomad.datamodel.metainfo.public import section_run
my_run = section_run()

Many more examples about how to read the NOMAD Metainfo programmatically can be found here.

API

In addition, a JSON version of the NOMAD Metainfo is available through our API via the metainfo endpoint. You can get a single large JSON document with all definitions, or you can access the metainfo for specific packages, e.g. the VASP metainfo. The returned JSON will also contain all packages that the requested package depends on.

Legacy metainfo version

There are no metainfo files anymore. The old *.nomadmetainfo.json files are no longer maintained, as the Python definitions in each parser/converter implementation are now the normative artifact for the NOMAD Metainfo.

To get the NOMAD Metainfo in the format of the old NOMAD CoE project, you can use the metainfo/legacy endpoint; e.g. the VASP legacy metainfo.