Data schema (Metainfo)

Introduction

The NOMAD Metainfo stores descriptive and structured information about materials-science data contained in the NOMAD Archive. The Metainfo can be understood as the schema of the Archive. The NOMAD Archive data is structured to be independent of the particular electronic-structure theory or molecular-simulation code (and of methods beyond these) that produced it. The NOMAD Metainfo can be browsed as part of the NOMAD Repository and Archive web application.

Typically (meta-)data definitions are generated only for a predesigned and specific scientific field, application or code. In contrast, the NOMAD Metainfo considers all pertinent information in the input and output files of the supported electronic-structure theory, quantum chemistry, and molecular-dynamics (force-field) codes. This ensures a complete coverage of all material and molecule properties, even though some properties might not be as important as others, or are missing in some input/output files of electronic-structure programs.

[Figure: Metainfo example]

The NOMAD Metainfo is kept independent of the actual storage format and is not bound to any specific storage method. In our practical implementation, we use msgpack, a binary form of JSON, on our servers and provide Archive data as JSON via our API. For NOMAD end-users the internal storage format is of little relevance, because the archive data is served solely by NOMAD’s API.

The NOMAD Metainfo started within the NOMAD Laboratory. It was discussed at the CECAM workshop Towards a Common Format for Computational Materials Science Data and is open to external contributions and extensions. More information can be found in Towards a Common Format for Computational Materials Science Data (Psi-K 2016 Highlight).

Metainfo Python Interface

The NOMAD meta-info lets you define schemas for physics data independently of the storage format used. You can define physical quantities with types, complex shapes (vectors, matrices, etc.), units, links, and descriptions. Large numbers of these quantities can be organized in containment hierarchies of extendable sections, with references between sections and additional quantity categories.

NOMAD uses the meta-info to define all archive data and repository metadata (and encyclopedia data). The meta-info provides a convenient Python interface to create, manipulate, and access data. We also use it to map data to various storage formats, including JSON, (HDF5), MongoDB, and Elasticsearch.

Starting example

import ase.data

from nomad.metainfo import MEnum, MSection, Quantity, SubSection, Units

class System(MSection):
    '''
    A system section includes all quantities that describe a single simulated
    system (a.k.a. geometry).
    '''

    n_atoms = Quantity(
        type=int, description='''
        Defines the number of atoms in the system.
        ''')

    # one chemical symbol per atom; the length is given by n_atoms
    atom_labels = Quantity(type=MEnum(ase.data.chemical_symbols), shape=['n_atoms'])
    atom_positions = Quantity(type=float, shape=['n_atoms', 3], unit=Units.m)
    simulation_cell = Quantity(type=float, shape=[3, 3], unit=Units.m)
    pbc = Quantity(type=bool, shape=[3])

class Run(MSection):
    section_system = SubSection(sub_section=System, repeats=True)

We define a simple metainfo schema with two sections called System and Run. Sections let you organize related data into, well, sections. Each section can have two types of properties: quantities and sub-sections. Sections and their properties are defined with Python classes and their attributes.

Each quantity defines a piece of data. Basic quantity attributes are its type, shape, unit, and description.

Sub-sections let you place sections into each other and thereby form containment hierarchies of sections and the respective data in them. Basic sub-section attributes are sub_section (i.e. a reference to the section definition of the sub-section) and repeats (determines if a sub-section can be contained once or multiple times).

The above only defines a schema. To use the schema and create actual data, we have to instantiate the above classes:

run = Run()
system = run.m_create(System)
system.n_atoms = 3
system.atom_labels = ['H', 'H', 'O']

print(system.atom_labels)
print(system.n_atoms)

Section instances can be used like regular Python objects: quantities and sub-sections can be set and accessed like any other Python attribute. Special meta-info methods, starting with m_, allow us to realize more complex semantics. For example, m_create will instantiate a sub-section and add it to the parent section in one step.

Another example for an m_-method is:

run.m_to_json(indent=2)

This will serialize the data into JSON:

{
    "m_def": "Run",
    "section_system": [
        {
            "n_atoms": 3,
            "atom_labels": [
                "H",
                "H",
                "O"
            ]
        }
    ]
}

Definitions and Instances

As you already saw in the example, we first need to define what data can look like (schema) before we can actually program with data. Because schema and data are often discussed in the same context, it is paramount to clearly distinguish between the two. For example, if we just say “system”, it is unclear what we refer to. We could mean the idea of a system, i.e. all possible systems: a data structure that comprises a lattice and atoms with their elements and positions in the lattice. Or we could mean a specific system of a specific calculation, with a concrete set of atoms and real numbers for lattice vectors and atom positions as concrete data.

The NOMAD Metainfo is just a collection of definitions that describe what materials science data could be (a schema). The NOMAD Archive is all the data that we extract from all data provided to NOMAD. The data in the NOMAD Archive follows the definitions of the NOMAD Metainfo.

Similarly, we need to distinguish between the NOMAD Metainfo as a collection of definitions and the Metainfo system that defines how to define a section or a quantity. In this sense, we have a three-layer model, where the Archive (data) is an instance of the Metainfo (schema) and the Metainfo is an instance of the Metainfo system (schema of the schema).
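Plain Python has a comparable three-layer structure, which may help as an analogy: an object is an instance of a class, and the class is itself an instance of type. The sketch below uses only built-in Python; ToySystem and water are hypothetical names and not part of the Metainfo.

```python
class ToySystem:
    '''Schema layer: describes what any system looks like.'''

    def __init__(self, n_atoms):
        self.n_atoms = n_atoms


# Data layer: one concrete system with concrete values.
water = ToySystem(n_atoms=3)

# The data is an instance of the schema ...
print(isinstance(water, ToySystem))
# ... and the schema is an instance of the "schema of the schema".
print(isinstance(ToySystem, type))
```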

This documentation describes the Metainfo by explaining the means of how to write down definitions in Python. Conceptually we map the Metainfo system to Python language constructs, e.g. a section definition is a Python class, a quantity a Python property, etc. If you are familiar with databases, this is similar to what an object relational mapping (ORM) would do.

Common attributes of Metainfo Definitions

In the example, you already saw the basic Python interface to the Metainfo. Sections are represented in Python as objects. To define a section, you write a Python class that inherits from MSection. To define sub-sections and quantities, you use Python properties. The definitions themselves are also objects derived from classes. For sub-sections and quantities, you directly instantiate SubSection and Quantity. For sections, there is a generated object derived from Section that is available via m_def from each section class and section instance.

These Python classes that are used to represent metainfo definitions form an inheritance hierarchy to share common properties:

class nomad.metainfo.Definition(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)

Definition is the common base class for all metainfo definitions.

All metainfo definitions (sections, quantities, sub-sections, packages, …) share some common properties.

name

Each definition has a name. Names have to be valid Python identifiers. They can contain letters, numbers and _, but must not start with a number. This also qualifies them as identifiers in most storage formats and databases, makes them URL safe, etc.

Names must be unique within the Package or Section that this definition is part of.

By convention, we use capitalized CamelCase identifiers to refer to section definitions (i.e. section definitions are represented by Python classes), lower case snake_case identifiers for variables that hold sections, and lower case snake_case identifiers for properties (i.e. fields in a Python class). Sub-sections are often prefixed with section_ to clearly separate sub-sections from quantities.

Generally, you do not have to set this attribute manually, it will be derived from Python identifiers automatically.
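The naming rule can be checked with the standard library alone. This is a minimal sketch and not part of the Metainfo API: is_valid_definition_name is a hypothetical helper that assumes the rule stated above (letters, numbers and _, not starting with a number) and additionally rejects Python keywords, since those cannot be used as class attribute names.

```python
import keyword
import re

# Hypothetical helper, not part of nomad.metainfo.
_NAME_PATTERN = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')


def is_valid_definition_name(name: str) -> bool:
    '''Letters, digits and underscores only; must not start with a digit.'''
    if keyword.iskeyword(name):
        return False
    return _NAME_PATTERN.fullmatch(name) is not None


print(is_valid_definition_name('atom_positions'))
print(is_valid_definition_name('3d_lattice'))
```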

description

The description can be an arbitrary human-readable text that explains what a definition is about. For section definitions, you do not have to set this manually, as it will be derived from the class's docstring. Quantity and sub-section descriptions can also be taken from the Attributes: section of the containing section class’s docstring.

links

Each definition can be accompanied by a list of URLs. These should point to resources that further explain the definition.

aliases

A list of alternative names. For quantities and sub-sections these can be used to access the respective property with a different name from its containing section.

deprecated

If set this definition is marked deprecated. The value should be a string that describes how to replace the deprecated definition.

categories

All metainfo definitions can be put into one or more categories. Categories organize the definitions themselves. This is different from sections, which organize the data (e.g. quantity values) and not the definitions of data (e.g. quantity definitions). See Categories for more details.


Quantities

Quantity definitions are the main building block of meta-info schemas. Each quantity represents a single piece of data.

class nomad.metainfo.Quantity(*args, **kwargs)

To define quantities, instantiate Quantity as class attribute values in section classes. The name of a quantity is automatically taken from its section class attribute. You can provide all other attributes to the constructor as keyword arguments.

See Sections and Sub-Sections to learn about section classes. In Python terms, Quantity is a descriptor. Descriptors define how to get and set attributes in a Python object. This allows us to use sections like regular Python objects and quantities like regular Python attributes.
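For readers unfamiliar with descriptors, the following sketch shows the general Python mechanism in isolation. It is a toy stand-in for the idea and not the Quantity implementation; TypedAttribute and ToySystem are hypothetical names.

```python
class TypedAttribute:
    '''A toy descriptor that checks the type of assigned values.'''

    def __init__(self, value_type):
        self.value_type = value_type
        self.name = None

    def __set_name__(self, owner, name):
        # Python calls this when the owning class is created, so the
        # descriptor learns the name of the attribute it was assigned to.
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class: return the definition
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        if not isinstance(value, self.value_type):
            raise TypeError(f'{self.name} expects {self.value_type.__name__}')
        obj.__dict__[self.name] = value


class ToySystem:
    n_atoms = TypedAttribute(int)


system = ToySystem()
system.n_atoms = 3
print(system.n_atoms)
```

The attribute reads and writes like a plain attribute, while the definition object stays reachable on the class (ToySystem.n_atoms).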

Each quantity must define a basic data type and a shape. The values of a quantity must fulfil the given type. The default shape is a single value. Quantities can also have physical units. Units are applied to all values.

type

Defines the datatype of quantity values. This is the type of individual elements in a potentially complex shape. If you define a list of integers for example, the shape would be list and the type integer: Quantity(type=int, shape=['0..*']).

The type can be one of:

  • a built-in primitive Python type: int, str, bool, float

  • an instance of MEnum, e.g. MEnum('one', 'two', 'three')

  • a section to define references to other sections as quantity values

  • a custom meta-info DataType, see Custom data types

  • a numpy dtype, e.g. np.dtype('float32')

  • typing.Any to support any value

If set to a numpy dtype, this quantity will use a numpy array or scalar to store values internally. If a regular (nested) Python list or Python scalar is given, it will be automatically converted. The given dtype will be used in the numpy value.

To define a reference, either a section class or an instance of Section can be given. See Sections and Sub-Sections for details. Instances of the given section constitute valid values for this type. Upon serialization, referenced section instances will be represented with metainfo URLs. See Resources.

For quantities with more than one dimension, only numpy arrays and dtypes are allowed.

shape

The shape of the quantity. It defines its dimensionality.

A shape is a list, where each item defines one dimension. Each dimension can be:

  • an integer that defines the exact size of the dimension, e.g. [3] is the shape of a 3D spatial vector

  • a string that specifies a possible range, e.g. 0..*, 1..*, 3..6

  • the name of an int typed and shapeless quantity in the same section whose value defines the length of this dimension, e.g. number_of_atoms defines the length of atom_positions

Range specifications define lower and upper bounds for the possible dimension length. The * can be used to denote an arbitrarily high upper bound.
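To make the range notation concrete, here is a minimal sketch of how such a string could be interpreted. parse_range and length_in_range are hypothetical helpers written for illustration; they only assume the lower..upper syntax with * as an open upper bound, as described above.

```python
from typing import Optional, Tuple


def parse_range(spec: str) -> Tuple[int, Optional[int]]:
    '''Parse 'lower..upper' into (lower, upper); upper is None for '*'.'''
    lower, upper = spec.split('..')
    return int(lower), None if upper == '*' else int(upper)


def length_in_range(length: int, spec: str) -> bool:
    '''Check whether a dimension length satisfies the range specification.'''
    lower, upper = parse_range(spec)
    return length >= lower and (upper is None or length <= upper)


print(parse_range('3..6'))
print(length_in_range(1000, '1..*'))
```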

Quantities with dimensionality (length of the shape) higher than 1 must be numpy arrays. Their type must be a dtype.

is_scalar

Derived quantity that is True if and only if this quantity has a shape of length 0.

unit

The physics unit for this quantity. It is optional.

Units are represented with the Pint Python package. Pint defines units and their algebra. You can either use pint units directly, e.g. units.m / units.s, or provide the unit as a pint-parsable string, e.g. 'meter / seconds' or 'm/s'. The metainfo provides a preconfigured pint unit registry ureg.

default

The default value for this quantity. The value must match type and shape.

Be careful with a default value like [] as it will be the default value for all occurrences of this quantity.
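The pitfall is the usual one with shared mutable objects in Python. The sketch below illustrates it under the assumption that the default object is handed out without being copied, which is what the warning above suggests; ToyQuantity is a hypothetical stand-in and not the Metainfo class.

```python
class ToyQuantity:
    '''Stand-in for a quantity definition that returns its default as is.'''

    def __init__(self, default):
        self.default = default

    def value_for(self, stored: dict, key: str):
        # No copy is made: every reader receives the very same default object.
        return stored.get(key, self.default)


tags = ToyQuantity(default=[])
first = tags.value_for({}, 'tags')
second = tags.value_for({}, 'tags')

first.append('changed')
print(second)  # the change made through first is visible here as well
```

Using an immutable default such as None or a tuple avoids this kind of accidental sharing.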

Quantities are mapped to Python properties on all section objects that instantiate the Python class/section definition that has this quantity. This means quantity values can be read and set like normal Python attributes.

In some cases it might be desirable to have virtual and read only quantities that are not real quantities used for storing values, but rather define an interface to other quantities. Examples for this are synonyms and derived quantities.

derived

A Python callable that takes the containing section as input and outputs the value for this quantity. This quantity cannot be set directly; its value is only derived by the given callable. The callable is executed whenever this quantity is read. Derived quantities are always virtual.
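The behaviour of a derived value can be sketched with a small read-only descriptor. This is an illustration of the concept in plain Python, not the Metainfo implementation; ToyDerived and ToySystem are hypothetical names.

```python
class ToyDerived:
    '''Read-only attribute whose value is computed from its owner on access.'''

    def __init__(self, derive):
        self.derive = derive

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # The callable receives the containing object, as described above.
        return self.derive(obj)

    def __set__(self, obj, value):
        raise AttributeError('derived values cannot be set directly')


class ToySystem:
    n_atoms = ToyDerived(lambda section: len(section.atom_labels))

    def __init__(self, atom_labels):
        self.atom_labels = atom_labels


system = ToySystem(['H', 'H', 'O'])
print(system.n_atoms)
```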

cached

A bool indicating that derived values should be cached unless the underlying section has changed.

virtual

A boolean that determines if this quantity is virtual. Virtual quantities can be get/set like regular quantities, but their values are not (de-)serialized, hence never permanently stored.


Sections and Sub-Sections

The NOMAD Metainfo allows you to create hierarchical (meta-)data structures. A hierarchy is basically a tree, where sections make up the root and inner nodes of the tree, and quantities are the leaves of the tree. In this sense, a section can hold further sub-sections (more branches of the tree) and quantities (leaves). We say a section can have two types of properties: sub-sections and quantities.

There is a clear distinction between section and sub-section. The term section refers to an object that holds data (properties), whereas the term sub-section refers to a relation between sections, i.e. between the containing section and the contained sections of the same type. Furthermore, we have to distinguish between sections and section definitions, as well as between sub-sections and sub-section definitions. A section definition defines the possible properties of its instances, and an instantiating section can store quantities and sub-sections according to its definition. A sub-section definition defines a containment relationship that is possible in principle between two section definitions, and a sub-section is the set of sections (contents) contained in another section (container). Sub-section definitions are parts (properties) of the definition of the containing section.

class nomad.metainfo.Section(*args, validate: bool = True, **kwargs)

Instances of the class Section are created by writing Python classes that extend MSection like this:

class SectionName(BaseSection):
    ''' Section description '''
    m_def = Section(**section_attributes)

    quantity_name = Quantity(**quantity_attributes)
    sub_section_name = SubSection(**sub_section_attributes)

We call such classes section classes. They are not the section definition, but just a representation of it in Python syntax. The section definition (an instance of Section) will be created for each of these classes and stored in the m_def property. See Reflection and custom data storage for more details.

Most of the attributes for a Section instance will be set automatically from the section class:

quantities

The quantity definitions of this section definition as a list of Quantity. Will be automatically set from the section class.

sub_sections

The sub-section definitions of this section definition as list of SubSection. Will be automatically set from the section class.

base_sections

A list of section definitions (Section). By default this definition will inherit all quantity and sub section definitions from the given section definitions. This behavior might be altered with extends_base_section.

If there are no base sections to define, you have to use MSection.

The Metainfo supports two inheritance mechanisms. By default, it behaves like regular Python inheritance and the class inherits all its base classes’ properties. The other mode (enabled via extends_base_section=True) adds all sub-class properties to the base class. This is used throughout the NOMAD metainfo to add code-specific metadata to common section definitions. Here is an example:

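from nomad.metainfo import MSection, Quantity, Section

class Method(MSection):
    code_name = Quantity(type=str)

class VASPMethod(Method):
    # add the properties of this class to Method instead of inheriting them
    m_def = Section(extends_base_section=True)
    x_vasp_some_incar_parameter = Quantity(type=str)

method = Method()
method.x_vasp_some_incar_parameter = 'value'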

In this example, the section class VASPMethod defines a section definition that inherits from section definition Method. The quantity x_vasp_some_incar_parameter will be added to Method and can be used in regular Method instances.
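The direction of this mechanism (sub-class properties flowing into the base class) can be mimicked in plain Python with a class decorator. This is only an analogy under simplifying assumptions, not how the Metainfo implements it; extend_base, ToyMethod and ToyVaspMethod are hypothetical names.

```python
def extend_base(cls):
    '''Copy the public class attributes of cls onto its single base class.'''
    (base,) = cls.__bases__  # fails unless there is exactly one base class
    for name, value in vars(cls).items():
        if not name.startswith('_'):
            setattr(base, name, value)
    return cls


class ToyMethod:
    code_name = None


@extend_base
class ToyVaspMethod(ToyMethod):
    x_vasp_some_incar_parameter = 'default'


# The attribute is now available on plain ToyMethod instances.
method = ToyMethod()
print(method.x_vasp_some_incar_parameter)
```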

The following Section attributes manipulate the inheritance semantics:

extends_base_section

If True, this definition must have exactly one base section. Instead of inheriting properties, the quantity and sub-section definitions of this section will be added to the base section.

This allows you to add further properties to an existing section definition. To use such an extension on section instances in a type-safe manner, MSection.m_as() can be used to cast the base section to the extending section.

extending_sections

A list of section definitions (Section). These are those sections that add their properties to this section via extends_base_section. This quantity will be set automatically.

Besides defining quantities and sub-sections, a section definition can also provide constraints that are used to validate a section and its quantities and sub-sections. Constraints let you define more specific data structures beyond types and shapes. However, constraints are not enforced automatically: sections have to be explicitly validated in order to evaluate constraints.

Constraints can be defined via methods with the constraint decorator:

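import numpy as np

# assumes that the constraint decorator is exported by nomad.metainfo
from nomad.metainfo import MSection, Quantity, constraint

class System(MSection):
    lattice = Quantity(type=float, shape=[3, 3], unit='meter')

    @constraint
    def non_empty_lattice(self):
        # a lattice with zero volume is not a valid cell
        assert np.abs(np.linalg.det(self.lattice.magnitude)) > 0

system = System()
system.m_validate()

constraints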

Constraints are rules that a section must fulfil to be valid. This allows you to implement semantic checks that go beyond mere type or shape checks. This quantity takes the names of constraints as strings. Constraints have to be implemented as methods with the constraint() decorator. They can raise ConstraintVialated or an AssertionError to indicate that the constraint is not fulfilled for the self section. This quantity will be set automatically from all constraint methods in the respective section class. To run validation of a section, use MSection.m_validate().
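The pattern of marking methods with a decorator and collecting them during validation can be sketched in plain Python. This is a simplified illustration of the idea, not the Metainfo code; toy_constraint, ToySection and ToySystem are hypothetical names, and the sketch returns a list of messages instead of whatever MSection.m_validate() actually returns.

```python
def toy_constraint(func):
    '''Mark a method so that validation can find it later.'''
    func.is_constraint = True
    return func


class ToySection:
    def validate(self):
        '''Run all marked methods and collect messages of the failed ones.'''
        errors = []
        for name in dir(type(self)):
            method = getattr(type(self), name)
            if getattr(method, 'is_constraint', False):
                try:
                    method(self)
                except AssertionError as error:
                    errors.append(f'{name}: {error}')
        return errors


class ToySystem(ToySection):
    def __init__(self, n_atoms):
        self.n_atoms = n_atoms

    @toy_constraint
    def has_atoms(self):
        assert self.n_atoms > 0, 'a system needs at least one atom'


print(ToySystem(3).validate())
print(ToySystem(0).validate())
```

As in the description above, nothing is checked when n_atoms is assigned; the rule only runs when validation is requested explicitly.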

Other attributes and helper properties:

section_cls

A helper attribute that gives the section class as a Python class object.

inherited_sections

A helper attribute that gives direct and indirect base sections and extending sections, including this section. These are all sections that this section gets its properties from.

all_base_sections

A helper attribute that gives direct and indirect base sections.

all_properties

A helper attribute that gives all properties (sub section and quantity) definitions including inherited properties and properties from extending sections as a dictionary with names and definitions.

all_quantities

A helper attribute that gives all quantity definitions, including inherited ones and ones from extending sections, as a dictionary that maps names (strings) to Quantity.

all_sub_sections

A helper attribute that gives all sub-section definitions, including inherited ones and ones from extending sections, as a dictionary that maps names (strings) to SubSection.

all_sub_sections_by_section

A helper attribute that gives all sub-section definitions, including inherited ones and ones from extending sections, as a dictionary that maps section classes (i.e. Python class objects) to lists of SubSection.

all_aliases

A helper attribute that gives all aliases for all properties, including inherited properties and properties from extending sections, as a dictionary that maps aliases to definitions.

event_handlers

Event handlers are functions that get called when the section data is changed. There are two types of events: set and add_sub_section. The handler type is determined by the handler (i.e. function) name: on_set and on_add_sub_section. The handler arguments correspond to MSection.m_set() (section, quantity_def, value) and MSection.m_add_sub_section() (section, sub_section_def, sub_section). Handlers are called after the respective action was performed. This quantity is automatically populated with handlers from the section class's methods. If there is a method on_set or on_add_sub_section, it will be added as a handler.
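The name-based handler lookup can be illustrated with a small plain-Python sketch. It is not the Metainfo implementation: ToySection and LoggedSystem are hypothetical, and the handler here receives an attribute name instead of a quantity definition to keep the example short.

```python
class ToySection:
    def set(self, name, value):
        '''Store the value, then notify an on_set handler if one exists.'''
        self.__dict__[name] = value
        handler = getattr(self, 'on_set', None)
        if handler is not None:
            # Called after the value has been stored, as described above.
            handler(name, value)


class LoggedSystem(ToySection):
    def __init__(self):
        self.log = []

    def on_set(self, name, value):
        self.log.append((name, value))


system = LoggedSystem()
system.set('n_atoms', 3)
print(system.log)
```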

errors

A list of errors. These issues prevent the section definition from being usable.

warnings

A list of warnings. The section definition can still be used despite these.


class nomad.metainfo.SubSection(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)

Like quantities, sub-sections are defined in a section class as attributes of this class. And like quantities, each sub-section definition becomes a property of the corresponding section definition (parent). A sub-section definition references another section definition as the sub-section (child). As a consequence, parent section instances can contain child section instances as sub-sections.

Contrary to the old NOMAD metainfo, we distinguish between sub-section the section and sub-section the property. This allows you to use one child section definition as a sub-section of many different parent section definitions.

sub_section

A Section or Python class object for a section class. This will be the child section definition. The section that defines this sub-section is the parent section definition.

repeats

A boolean that determines whether this sub-section can appear multiple times in the parent section.


References and Proxies

Besides creating hierarchies (e.g. tree structures) with SubSection, the metainfo also allows you to create cross references between sections and other sections or quantity values:

class Calculation(MSection):
    system = Quantity(type=System.m_def)
    atom_labels = Quantity(type=System.atom_labels)

calc = Calculation()
calc.system = run.section_system[-1]
calc.atom_labels = run.section_system[-1]

To define a reference, you define a normal quantity and simply use the section or quantity that you want to reference as type. Then you can assign respective section instances as values.

When in Python memory, quantity values that reference other sections simply contain a Python reference to the respective section instance. However, upon serializing/storing metainfo data, these references have to be represented differently.

Value references are a little different. When you read a value reference, it behaves like the referenced value. Internally, we do not store the value, but a reference to the section that holds the referenced quantity. Therefore, when you want to assign a value reference, you use the section with the quantity and not the value itself.

Currently this metainfo implementation only supports references within a single section hierarchy (e.g. the same JSON file). References are stored as paths from the root section, over sub-sections, to the referenced section or quantity value. Each path segment is the name of the sub-section or an index in a repeatable sub-section: /system/0 or /system/0/atom_labels.
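The path format can be made concrete with a short sketch that resolves such a path against nested dictionaries and lists, i.e. against data as it might look after loading a JSON file. resolve_path and the archive dictionary are hypothetical and only illustrate the path semantics described above.

```python
def resolve_path(root, path):
    '''Follow a '/'-separated path through nested dicts and lists.'''
    current = root
    for segment in path.strip('/').split('/'):
        if isinstance(current, list):
            # index into a repeating sub-section
            current = current[int(segment)]
        else:
            # name of a sub-section or quantity
            current = current[segment]
    return current


archive = {'system': [{'n_atoms': 3, 'atom_labels': ['H', 'H', 'O']}]}

print(resolve_path(archive, '/system/0/atom_labels'))
```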

References are automatically serialized by MSection.m_to_dict(). When de-serializing data with MSection.m_from_dict(), these references are not resolved right away, because the referenced section might not yet be available. Instead, references are stored as MProxy instances. These objects are automatically replaced by the referenced object when the respective quantity is accessed.

class nomad.metainfo.MProxy(m_proxy_value: Union[str, int, dict], m_proxy_section: Optional[nomad.metainfo.metainfo.MSection] = None, m_proxy_quantity: Optional[nomad.metainfo.metainfo.Quantity] = None)

A placeholder object that acts as reference to a value that is not yet resolved.

url

The reference represented as a URL string.
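The idea of resolving a placeholder on first access can be sketched in a few lines of plain Python. This is a conceptual illustration only and does not reproduce MProxy; ToyProxy and ToyCalculation are hypothetical names, and the data is a plain nested dictionary.

```python
class ToyProxy:
    '''Placeholder that remembers the path of a value it does not hold yet.'''

    def __init__(self, path):
        self.path = path

    def resolve(self, root):
        current = root
        for segment in self.path.strip('/').split('/'):
            key = int(segment) if isinstance(current, list) else segment
            current = current[key]
        return current


class ToyCalculation:
    def __init__(self, root, system):
        self._root = root
        self._system = system  # may still be a ToyProxy right after loading

    @property
    def system(self):
        if isinstance(self._system, ToyProxy):
            # Replace the placeholder by the real object on first access.
            self._system = self._system.resolve(self._root)
        return self._system


root = {'system': [{'n_atoms': 3}]}
calc = ToyCalculation(root, ToyProxy('/system/0'))
print(calc.system)
```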


If you want to define references, it might not be possible to define the referenced section or quantity beforehand, due to how Python definitions and imports work. In these cases, you can use a proxy to reference the reference type:

class Calculation(MSection):
    system = Quantity(type=MProxy('System'))
    atom_labels = Quantity(type=MProxy('System/atom_labels'))

The strings given to MProxy are paths within the available definitions. The above example works, if System and System/atom_labels are eventually defined in the same package.

Categories

In the old meta-info this was known as abstract types.

Categories are defined with Python classes that have MCategory as base class. Their name and description are taken from the class’s name and docstring. An example category looks like this:

class CategoryName(MCategory):
    ''' Category description '''
    m_def = Category(links=['http://further.explanation.eu'], categories=[ParentCategory])

Packages

class nomad.metainfo.Package(*args, **kwargs)

Packages organize metainfo definitions alongside Python modules.

Each Python module with metainfo Definition (explicitly or implicitly) has a member m_package with an instance of this class. Definitions (categories, sections) in Python modules are automatically added to the module’s Package. Packages are not nested; instead, they have the fully qualified Python module name as their name.

This allows you to inspect all definitions in a Python module and automatically uses the module name and docstring as Package name and description.

Besides the regular Definition attributes, packages can have the following attributes:

section_definitions

All section definitions in this package as Section objects.

category_definitions

All category definitions in this package as Category objects.

all_definitions

A helper attribute that provides all section definitions by name.


Environments

class nomad.metainfo.Environment(*args, **kwargs)

Environments let you manage many metainfo packages and quickly access all definitions.

Environments provide a name table for large sets of metainfo definitions that span multiple packages. They provide various functions to resolve metainfo definitions by their names, legacy names, and qualified names.

Parameters

packages – Packages in this environment.


Custom data types

class nomad.metainfo.DataType

Lets you define custom data types that can be used in the meta-info.

The metainfo supports most types out of the box. These include the Python built-in primitive types (int, bool, str, float, …), references to sections, and enums. However, on some occasions you need to add custom data types.

This base class lets you customize various aspects of value treatment. This includes type checks and various value transformations. This makes it possible to store values in the section differently from how the user might set/get them, and to have non-serializable values that are transformed on (de-)serialization.

set_normalize(section: nomad.metainfo.metainfo.MSection, quantity_def: nomad.metainfo.metainfo.Quantity, value: Any) → Any

Transforms the given value before it is set and checks its type.

get_normalize(section: nomad.metainfo.metainfo.MSection, quantity_def: nomad.metainfo.metainfo.Quantity, value: Any) → Any

Transforms the given value when it is read.

serialize(section: nomad.metainfo.metainfo.MSection, quantity_def: nomad.metainfo.metainfo.Quantity, value: Any) → Any

Transforms the given value when making the section serializeable.

deserialize(section: nomad.metainfo.metainfo.MSection, quantity_def: nomad.metainfo.metainfo.Quantity, value: Any) → Any

Transforms the given value from its serializeable form.
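The division of labour between these hooks can be illustrated with a date-time type that keeps datetime objects in memory and ISO 8601 strings in serialized form. The sketch is standalone and hypothetical: ToyDatetimeType does not inherit from DataType, and its methods omit the section and quantity_def arguments of the signatures above to stay self-contained.

```python
from datetime import datetime, timezone


class ToyDatetimeType:
    '''In memory: datetime objects. Serialized: ISO 8601 strings.'''

    def set_normalize(self, value):
        # Accept strings for convenience, but always store datetime objects.
        if isinstance(value, str):
            value = datetime.fromisoformat(value)
        if not isinstance(value, datetime):
            raise TypeError('expected a datetime or an ISO 8601 string')
        return value

    def serialize(self, value):
        return value.isoformat()

    def deserialize(self, value):
        return datetime.fromisoformat(value)


toy_type = ToyDatetimeType()
stamp = toy_type.set_normalize(datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
serialized = toy_type.serialize(stamp)
print(serialized)
print(toy_type.deserialize(serialized) == stamp)
```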

class nomad.metainfo.MEnum(*args, **kwargs)

Lets you define str types with values limited to a pre-set list of possible values.


Reflection and custom data storage

When manipulating metainfo data in Python, all data is represented as Python objects, where objects correspond to section instances and their attributes to quantity values or section instances of sub-sections. By defining sections with section classes, each of these Python objects already has an interface that allows you to get/set quantities and sub-sections. But often this interface is too limited, or the specific section and quantity definitions are unknown when writing code.

class nomad.metainfo.MSection(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)

The base-class for all section defining classes and respectively the base-class for all section objects.

While we use section classes to define sections, it is important to note that the section class is a different Python object than the actual section definition. For each section class (a Python class), we automatically generate a section definition Python object that instantiates Section. MSection and Section are completely different classes. MSection is used as a base-class for all section defining classes and Section is a section class that defines the section Section.

m_def

Each section class (and also each section instance) has a built-in property m_def that refers to the actual section definition. While this is defined automatically, you can define it manually to provide additional characteristics that cannot be covered in a Python class definition.

All section instances indirectly instantiate MSection, and therefore all members of MSection are available on all section instances. MSection provides many special attributes and functions (they all start with m_) that allow you to reflect on a section’s definition and to manipulate the section instance without a priori knowledge of the section definition.

m_set(quantity_def: nomad.metainfo.metainfo.Quantity, value: Any) → None

Set the given value for the given quantity.

m_get(quantity_def: nomad.metainfo.metainfo.Quantity) → Any

Retrieve the given value for the given quantity.

m_add_values(quantity_def: nomad.metainfo.metainfo.Quantity, values: Any, offset: int) → None

Add (partial) values for the given quantity of higher dimensionality.

m_get_sub_section(sub_section_def: nomad.metainfo.metainfo.SubSection, index: int) → nomad.metainfo.metainfo.MSection

Retrieves a single sub section of the given sub section definition.

m_get_sub_sections(sub_section_def: nomad.metainfo.metainfo.SubSection) → List[nomad.metainfo.metainfo.MSection]

Retrieves all sub sections of the given sub section definition.

m_create(section_cls: Type[MSectionBound], sub_section_def: Optional[nomad.metainfo.metainfo.SubSection] = None, **kwargs) → MSectionBound

Creates a section instance and adds it to this section provided there is a corresponding sub section.

Parameters
  • section_cls – The section class for the sub-section to create.

  • sub_section_def – If there are multiple sub-sections for the given class, this must be used to explicitly state the sub-section definition.

m_add_sub_section(sub_section_def: nomad.metainfo.metainfo.SubSection, sub_section: nomad.metainfo.metainfo.MSection) → None

Adds the given section instance as a sub section of the given sub section definition.

m_remove_sub_section(sub_section_def: nomad.metainfo.metainfo.SubSection, index: int) → None

Removes the existing section for a non-repeatable sub section.

There are some specific attributes for section instances that are sub-sections of another section. While sub-sections are directly accessible from the containing section by using the Python property that represents the sub-section (e.g. run.section_system), there is also a way to navigate from the sub-section to the containing section (parent section) using these Python properties:

m_parent

If this section is a sub-section, this references the parent section instance.

m_parent_sub_section

If this section is a sub-section, this is the SubSection that defines this relationship.

m_parent_index

For repeatable sections, the parent keeps a list of sub-sections. This is the index of this section in the respective parent sub-section list.
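The three parent-related properties above can be pictured with a small stand-in tree. This is a sketch under assumptions, not NOMAD code: Node, its add method, and root_of are hypothetical, the sub-section is identified by a plain string rather than a SubSection definition, and the -1 default index is chosen only for this illustration.

```python
class Node:
    """Hypothetical stand-in for a section instance; not NOMAD's MSection."""

    def __init__(self, name):
        self.name = name
        self.m_parent = None
        self.m_parent_sub_section = None  # here just the sub-section name
        self.m_parent_index = -1  # assumption: -1 while not in a parent list
        self.sub_sections = {}

    def add(self, sub_section_name, child):
        # The parent keeps one list per sub-section; the child remembers
        # its parent, the sub-section it belongs to, and its list index.
        children = self.sub_sections.setdefault(sub_section_name, [])
        child.m_parent = self
        child.m_parent_sub_section = sub_section_name
        child.m_parent_index = len(children)
        children.append(child)
        return child


def root_of(node):
    # Follow parent links until a node without a parent is reached.
    while node.m_parent is not None:
        node = node.m_parent
    return node


run = Node('run')
first = run.add('systems', Node('system'))
second = run.add('systems', Node('system'))
```

After this, second.m_parent is run, second.m_parent_sub_section is 'systems', and second.m_parent_index is 1.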

m_resource

The MResource that contains and manages this section.

Often some general tasks have to be performed on a whole tree of sections without knowing about the definitions in advance. The following methods provide reflective access to sub-sections.

m_traverse()

Performs a depth-first traversal and yields tuples of section, property def, parent index for all set properties.

m_all_contents(depth_first: bool = False, include_self: bool = False, stop: Callable[[MSection], bool] = None) → Iterable[nomad.metainfo.metainfo.MSection]

Returns an iterable over all sub-sections and, recursively, their sub-sections.

Parameters
  • depth_first – A boolean indicating that children should be returned before parents.

  • include_self – A boolean indicating that the results should contain this section.

  • stop – A predicate that determines if the traversal should be stopped or if children should be returned. The sections for which this returns True are included in the results.
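How the three parameters interact is easiest to see on a toy tree. The following sketch follows the parameter descriptions above but is not taken from the NOMAD source: Node and all_contents are hypothetical names, and children are kept in a plain list.

```python
class Node:
    """Hypothetical stand-in for a section with sub-sections."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def all_contents(node, depth_first=False, include_self=False, stop=None):
    if include_self and not depth_first:
        yield node
    # A node for which stop() is true is still reported by its parent,
    # but its children are not visited.
    if stop is None or not stop(node):
        for child in node.children:
            if not depth_first:
                yield child
            yield from all_contents(child, depth_first=depth_first, stop=stop)
            if depth_first:
                yield child
    if include_self and depth_first:
        yield node


def names(nodes):
    return [node.name for node in nodes]


run = Node('run', [Node('system'), Node('scc', [Node('method')])])

parents_first = names(all_contents(run))
children_first = names(all_contents(run, depth_first=True))
with_self = names(all_contents(run, include_self=True))
stopped = names(all_contents(run, stop=lambda node: node.name == 'scc'))
```

With this tree, the default order reports 'scc' before 'method', depth_first=True reverses that pair, and the stop predicate keeps 'scc' in the result while leaving out 'method'.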

m_contents() → Iterable[nomad.metainfo.metainfo.MSection]

Returns an iterable over all direct sub-sections.

m_xpath(expression: str)

Provides an interface to jmespath search functionality.

Parameters

expression – A string compatible with the jmespath specs representing the search. See https://jmespath.org/ for complete description.

metainfo_section.m_xpath('code_name')
metainfo_section.m_xpath('systems[-1].system_type')
metainfo_section.m_xpath('sccs[0].system.atom_labels')
metainfo_section.m_xpath('systems[?system_type == `molecule`].atom_labels')
metainfo_section.m_xpath('sccs[?energy_total < `1.0E-23`].system')
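To make the selections in these expressions concrete, the sketch below spells out four of them as plain Python over a dictionary. The data is made up, the 'system' values are placeholder strings rather than real references, and the code only mirrors JMESPath semantics by hand; it does not call jmespath or m_xpath.

```python
# Made-up data, shaped like a serialized run from the example package.
run = {
    'code_name': 'VASP',
    'systems': [
        {'system_type': 'bulk', 'atom_labels': ['Si', 'Si']},
        {'system_type': 'molecule', 'atom_labels': ['H', 'H', 'O']},
    ],
    'sccs': [
        {'energy_total': 1.23e-10, 'system': '/systems/0'},
        {'energy_total': 5.0e-25, 'system': '/systems/1'},
    ],
}

# 'code_name': a plain key lookup.
code_name = run['code_name']

# 'systems[-1].system_type': a negative index counts from the end.
last_system_type = run['systems'][-1]['system_type']

# 'systems[?system_type == `molecule`].atom_labels': filter, then project.
molecule_labels = [
    system['atom_labels'] for system in run['systems']
    if system['system_type'] == 'molecule']

# 'sccs[?energy_total < `1.0E-23`].system': numeric comparison in the filter.
low_energy_systems = [
    scc['system'] for scc in run['sccs']
    if scc['energy_total'] < 1.0e-23]
```

Note that a filter expression always produces a list, so the third selection yields a list of label lists even when only one system matches.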

Each section and all its quantities and contents can be transformed into a general JSON-serializable Python dictionary. Similarly, a section can be instantiated from such a Python dictionary. This makes it possible to save sections to JSON files and load them again, or to use other compatible means (e.g. document databases, binary JSON flavours).

m_to_dict(with_meta: bool = False, include_defaults: bool = False, include_derived: bool = False, categories: List[Union[Category, Type[MCategory]]] = None, partial: Callable[[Definition, MSection], bool] = None) → Dict[str, Any]

Returns the data of this section as a JSON-serializable dictionary.

Parameters
  • with_meta – Include information about the section definition and the section’s position in its parent.

  • include_defaults – Include default values of unset quantities.

  • include_derived – Include values of derived quantities.

  • categories – A list of category classes or category definitions that is used to filter the included quantities and sub sections. Only applied to properties of this section, not on sub-sections. Is overwritten by partial.

  • partial

    A function that determines if a definition should be included in the output dictionary. Takes a definition and the containing section as arguments. Two default functions can be used by providing a string instead:

    • 'mongo': Only include quantities that have an a_mongo annotation.

    • 'es': Only include quantities that have an a_elastic or an a_search annotation.

    Partial is applied recursively on sub-sections. Overrides categories.
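The role of the partial argument, a predicate that receives a definition and the containing section, can be sketched with dictionaries. Everything here is a hypothetical stand-in: definitions are plain dicts carrying a set of annotation names, the to_dict function is not NOMAD's m_to_dict, the upload_id entry is invented for the illustration, and recursion into sub-sections is left out.

```python
# Hypothetical stand-ins: each definition is a dict with a name and a set of
# annotation names; a section is a dict of values keyed by definition name.
definitions = {
    'code_name': {'name': 'code_name', 'annotations': {'a_mongo'}},
    'code_version': {'name': 'code_version', 'annotations': set()},
    'upload_id': {'name': 'upload_id', 'annotations': {'a_mongo', 'a_search'}},
}

section = {'code_name': 'VASP', 'code_version': '1.0.0', 'upload_id': 'abc'}


def to_dict(section, definitions, partial=None):
    """Copy only the values whose definition is accepted by ``partial``."""
    return {
        name: value for name, value in section.items()
        if partial is None or partial(definitions[name], section)}


def only_mongo(definition, section):
    # Takes the definition and the containing section, returns a bool.
    return 'a_mongo' in definition['annotations']


everything = to_dict(section, definitions)
mongo_only = to_dict(section, definitions, partial=only_mongo)
```

The only_mongo predicate plays the role that the 'mongo' shortcut string is described as having above.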

classmethod m_from_dict(dct: Dict[str, Any]) → MSectionBound

Creates a section from the given serializable data dictionary.

This is the ‘opposite’ of m_to_dict(). It takes a deserialized dict, e.g. loaded from JSON, and turns it into a proper section, i.e. an instance of the given section class.

m_update_from_dict(dct: Dict[str, Any]) → None

Updates this section with the serialized data from the given dict, e.g. data produced by m_to_dict().

m_to_json(**kwargs)

Returns the data of this section as a JSON string.
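Because the serialized form is an ordinary JSON-serializable dictionary, saving and loading needs nothing beyond the standard library. The sketch below uses made-up data in place of a real m_to_dict() result; the file name run.json is arbitrary.

```python
import json
import os
import tempfile

# Made-up data in the shape of a serialized section.
serializable = {
    'code_name': 'VASP',
    'systems': [{'atom_labels': ['H', 'H', 'O']}],
}

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'run.json')

    # Save the dictionary as a JSON file.
    with open(path, 'w') as f:
        json.dump(serializable, f, indent=2)

    # Load it back into a plain Python dictionary.
    with open(path) as f:
        loaded = json.load(f)
```

With NOMAD installed, the dictionary would come from m_to_dict() and the loaded dictionary would be passed to m_from_dict() on the matching section class.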

__init__(m_def: Optional[nomad.metainfo.metainfo.Section] = None, m_resource: nomad.metainfo.metainfo.MResource = None, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

m_def: Section = None
m_is_set(quantity_def: nomad.metainfo.metainfo.Quantity) → bool

True if the given quantity is set.

m_sub_section_count(sub_section_def: nomad.metainfo.metainfo.SubSection) → int

Returns the number of sub sections for the given sub section definition.

m_update(safe: bool = True, **kwargs)

Updates all quantities and sub-sections with the given arguments.

m_as(section_cls: Type[MSectionBound]) → MSectionBound

‘Casts’ this section to the given extending sections.

m_follows(definition: nomad.metainfo.metainfo.Section) → bool

Determines if this section’s definition is or is derived from the given definition.

m_pretty_print(indent=None)

Pretty prints the containment hierarchy

m_path(quantity_def: Optional[nomad.metainfo.metainfo.Quantity] = None) → str

Returns the path of this section or the given quantity within the section hierarchy.

m_root(cls: Type[MSectionBound] = None) → MSectionBound

Returns the first parent of the parent section that has no parent; the root.

m_parent_as(cls: Type[MSectionBound] = None) → MSectionBound

Returns the parent section with the given section class type.

m_resolved()

Returns the original resolved object, if this instance used to be a proxy.

For most purposes a resolved proxy is equal to the section it was resolved to. The exception is hashing: if you want to use a potential former proxy in a hash table and have it be truly equal to the section it was resolved to, use the result of this method instead of the section/proxy itself.
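The hashing caveat can be reproduced with a toy proxy. This is a sketch of the general Python behaviour only; Target, Proxy, and resolved are hypothetical names and do not reflect how NOMAD implements proxies.

```python
class Target:
    """Hypothetical stand-in for a section."""

    def __init__(self, name):
        self.name = name


class Proxy:
    """Hypothetical stand-in for a resolved proxy; not NOMAD's implementation."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Only called for attributes not found on the proxy itself.
        return getattr(self._target, name)

    def __eq__(self, other):
        return other is self._target or other is self

    # Defining __eq__ removes the inherited __hash__; restore the identity hash.
    __hash__ = object.__hash__

    def resolved(self):
        return self._target


section = Target('system')
proxy = Proxy(section)

lookup = {section: 'stored value'}

equal = proxy == section
found_via_proxy = proxy in lookup
found_via_resolved = proxy.resolved() in lookup
```

The proxy compares equal to its target and forwards attribute access, yet it is not found as a dictionary key because its hash differs; the resolved object is found.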

m_resolve(path: str, cls: Type[MSectionBound] = None) → MSectionBound

Resolves the given path or dotted quantity name using this section as context and returns the sub_section or value.

m_get_annotations(key: Union[str, type], default=None, as_list: bool = False)

Convenience method to get annotations

Parameters
  • key – Either the optional annotation name or an annotation class. In the first case the annotation is returned, regardless of its type. In the second case, all names and list for names are iterated and all annotations of the given class are returned.

  • default – The value returned if no annotation is found. Defaults to None.

  • as_list – Returns a list, no matter how many annotations have been found.

m_validate() → Tuple[List[str], List[str]]

Evaluates all constraints and shapes of this section and returns a list of errors.

m_copy(deep=False, parent=None) → MSectionBound
m_all_validate()

Evaluates all constraints in the whole section hierarchy, incl. this section.
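Validation over a hierarchy can be pictured as collecting the messages of failed assertions from every section. The sketch below is an assumption-laden illustration, not the NOMAD implementation: the constraint decorator, Node, validate, and all_validate are hypothetical, and shape checks are left out.

```python
def constraint(func):
    """Hypothetical decorator that marks a method as a constraint."""
    func.is_constraint = True
    return func


class Node:
    """Hypothetical stand-in for a section with children."""

    def __init__(self, children=()):
        self.children = list(children)

    def validate(self):
        # Run every marked method and collect the messages of failed asserts.
        errors = []
        for name in dir(type(self)):
            method = getattr(type(self), name)
            if getattr(method, 'is_constraint', False):
                try:
                    method(self)
                except AssertionError as error:
                    errors.append(str(error))
        return errors

    def all_validate(self):
        # Validate this node first, then every node below it.
        errors = self.validate()
        for child in self.children:
            errors.extend(child.all_validate())
        return errors


class Run(Node):
    def __init__(self, n_systems, n_sccs, children=()):
        super().__init__(children)
        self.n_systems = n_systems
        self.n_sccs = n_sccs

    @constraint
    def one_scc_per_system(self):
        assert self.n_systems == self.n_sccs, 'Numbers of systems and calculations differ.'


good_run = Run(2, 2)
bad_run = Run(2, 1)
parent = Node(children=[good_run, bad_run])

all_errors = parent.all_validate()
```

The constraint mirrors the one_scc_per_system method in the example package below, where the check is likewise written as an assert with a message.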

m_warning(*args, **kwargs)
get(key)
values()
class nomad.metainfo.MetainfoError

Metainfo related errors.

__init__()

Initialize self. See help(type(self)) for accurate signature.

class nomad.metainfo.DeriveError

An error occurred while computing a derived value.

__init__()

Initialize self. See help(type(self)) for accurate signature.

class nomad.metainfo.MetainfoReferenceError

An error indicating that a reference could not be resolved.

__init__()

Initialize self. See help(type(self)) for accurate signature.

Resources

class nomad.metainfo.MResource(logger=None)

Represents a collection of related metainfo data, i.e. a set of MSection instances.

__init__(logger=None)

Initialize self. See help(type(self)) for accurate signature.

A more complex example

#
# Copyright The NOMAD Authors.
#
# This file is part of NOMAD. See https://nomad-lab.eu for further info.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

''' An example metainfo package. '''

import numpy as np
from datetime import datetime

from nomad.units import ureg
from nomad.metainfo import (
    MSection, MCategory, Section, Quantity, Package, SubSection, MEnum,
    Datetime, constraint)

m_package = Package(links=['https://nomad-lab.eu/prod/rae/docs/metainfo.html'])


class SystemHash(MCategory):
    ''' All quantities that contribute to what makes a system unique. '''


class Parsing(MSection):
    ''' All data that describes the NOMAD parsing of this run.

    Quantities can also be documented like this:

    Args:
        parser_name: 'Name of the used parser'
        parser_version: 'Version of the used parser'
    '''

    parser_name = Quantity(type=str)
    parser_version = Quantity(type=str)
    nomad_version = Quantity(type=str, default='latest')
    warnings = Quantity(type=str, shape=['0..*'])
    parse_time = Quantity(type=Datetime)


class System(MSection):
    ''' All data that describes a simulated system. '''

    n_atoms = Quantity(
        type=int, derived=lambda system: len(system.atom_labels),
        description='Number of atoms in the simulated system.')

    atom_labels = Quantity(
        type=str, shape=['n_atoms'], categories=[SystemHash],
        description='The atoms in the simulated systems.')

    atom_positions = Quantity(
        type=np.dtype('f'), shape=['n_atoms', 3], unit=ureg.m, categories=[SystemHash],
        description='The atom positions in the simulated system.')

    lattice_vectors = Quantity(
        type=np.dtype('f'), shape=[3, 3], unit=ureg.m, categories=[SystemHash],
        aliases=['unit_cell'],
        description='The lattice vectors of the simulated unit cell.')

    periodic_dimensions = Quantity(
        type=bool, shape=[3], default=[False, False, False], categories=[SystemHash],
        description='A vector of booleans indicating in which dimensions the unit cell is repeated.')

    system_type = Quantity(type=str)


class SCC(MSection):

    energy_total = Quantity(type=float, default=0.0, unit=ureg.J)
    energy_total_0 = Quantity(type=np.dtype(np.float32), default=0.0, unit=ureg.J)
    an_int = Quantity(type=np.dtype(np.int32))

    system = Quantity(type=System, description='The system that this calculation is based on.')


class Run(MSection):
    ''' All data that belongs to a single code run. '''

    code_name = Quantity(type=str, description='The name of the code that was run.')
    code_version = Quantity(type=str, description='The version of the code that was run.')

    parsing = SubSection(sub_section=Parsing)
    systems = SubSection(sub_section=System, repeats=True)
    sccs = SubSection(sub_section=SCC, repeats=True)

    @constraint
    def one_scc_per_system(self):
        assert self.m_sub_section_count(Run.systems) == self.m_sub_section_count(Run.sccs),\
            'Numbers of system does not match numbers of calculations.'


class VaspRun(Run):
    ''' All VASP specific quantities for section Run. '''
    m_def = Section(extends_base_section=True)

    x_vasp_raw_format = Quantity(
        type=MEnum(['xml', 'outcar']),
        description='The file format of the parsed VASP mainfile.')


if __name__ == '__main__':
    # Demonstration of how to reflect on the definitions

    # All definitions are metainfo data themselves, and they can be accessed like any other
    # metainfo data. E.g. all section definitions are sections themselves.

    # To get quantities of a given section
    print(Run.m_def.m_get_sub_sections(Section.quantities))

    # Or all Sections in the package
    print(m_package.m_get_sub_sections(Package.section_definitions))

    # There are also some definition specific helper methods.
    # For example to get all attributes (Quantities and possible sub-sections) of a section.
    print(Run.m_def.all_properties)

    # Demonstration on how to use the definitions, e.g. to create a run with system:
    run = Run()
    run.code_name = 'VASP'
    run.code_version = '1.0.0'

    parsing = run.m_create(Parsing)
    parsing.parse_time = datetime.now()

    run.m_as(VaspRun).x_vasp_raw_format = 'outcar'
    # The same as
    run.x_vasp_raw_format = 'outcar'  # type: ignore

    system = run.m_create(System)
    system.atom_labels = ['H', 'H', 'O']

    calc = run.m_create(SCC)
    calc.energy_total = 1.23e-10
    calc.system = system

    # Or to read data from existing metainfo data:
    print(system.atom_labels)
    print(system.n_atoms)

    # To validate dimensions and custom constraints
    print('errors: %s' % run.m_all_validate())

    # To serialize the data:
    serializable = run.m_to_dict()
    # or
    print(run.m_to_json(indent=2))

    # To deserialize data
    run = Run.m_from_dict(serializable)
    print(run.sccs[0].system)

    # print(m_package.m_to_json(indent=2))  # type: ignore, pylint: disable=undefined-variable

Accessing the Metainfo

Above you learned what the metainfo is and how to create metainfo definitions and work with metainfo data in Python. But how do you get access to the existing metainfo definitions within NOMAD? We call the complete set of all metainfo definitions the NOMAD Metainfo.

This NOMAD Metainfo comprises definitions from various packages defined by all the parsers and converters (and respective code outputs and formats) that NOMAD supports. In addition there are common packages that contain definitions that might be relevant to different kinds of archive data.

Python

In the NOMAD source-code all metainfo definitions are materialized as Python source files that contain the definitions in the format described above. If you have installed the NOMAD Python package (see Install the NOMAD client library), you can simply import the respective Python modules:

from nomad.datamodel.metainfo.public import m_package
print(m_package.m_to_json(indent=2))

from nomad.datamodel.metainfo.public import section_run
my_run = section_run()

Many more examples about how to read the NOMAD Metainfo programmatically can be found here.

API

In addition, a JSON version of the NOMAD Metainfo is available through our API via the metainfo endpoint. You can get one giant JSON with all definitions, or you can access the metainfo for specific packages, e.g. the VASP metainfo. The returned JSON will also contain all packages that the requested package depends on.

Legacy metainfo version

There are no metainfo files anymore. The old *.nomadmetainfo.json files are no longer maintained, as the Python definitions in each parser/converter implementation are now the normative artifact for the NOMAD Metainfo.

To get the NOMAD Metainfo in the format of the old NOMAD CoE project, you can use the metainfo/legacy endpoint; e.g. the VASP legacy metainfo.