Glossary

This is a list of terms that have a specific meaning for NOMAD and are used throughout the application and this documentation.

Annotation

Annotations are part of data schemas and describe aspects that do not directly define the type or shape of data. They often control how certain data is managed, represented, or edited. See annotations in the schema documentation.

App

Apps allow you to build customized user interfaces for specific research domains, making it easier to navigate and understand the data. This typically means that certain domain-specific properties are highlighted, different units may be used for physical properties, and specialized dashboards may be presented. This is crucial for NOMAD installations to scale to data that mixes experiments and simulations, different techniques, and physical properties spanning different time and length scales.

Archive

NOMAD processes (parses and normalizes) all data. The entirety of all processed data is referred to as the Archive. Sometimes the term archive is used to refer to the processed data of a particular entry, e.g. "the archive of that entry".

The term archive is an old synonym for processed data, dating from when NOMAD was implemented as several services (e.g. NOMAD Repository and NOMAD Archive). Since the term means different things to different people and is very abstract, we are gradually deprecating it in favor of processed data.

Author

An author is typically a natural person that has uploaded a piece of data into NOMAD and has authorship over it. Often authors are users, but not always. Therefore, we have to distinguish between authors and users.

Dataset

Users can organize entries into datasets. Datasets are not created automatically; do not confuse them with uploads. Datasets can be compared to albums, labels, or tags on other platforms. They are used to reference a collection of data, and users can get a DOI for their datasets.

Deployment (NOMAD Oasis)

NOMAD Deployment refers to a live instance of a NOMAD distribution running on some hardware. A deployment is also known as an Oasis.

Distribution (distro)

A NOMAD Distribution is a Git repository containing the configuration for instantiating a customized NOMAD instance. Distributions define the plugins that should be installed, the configuration files to use (e.g. nomad.yaml), the CI pipeline steps for building the final Docker images, and a docker-compose.yaml file that can be used to launch the instance.

ELN

Electronic Lab Notebooks (ELNs) are a specific kind of entry in NOMAD. In contrast to entries that are created by uploading and processing data, these entries can be edited in NOMAD. ELNs offer form fields and other widgets to modify the contents of an entry. Like all entries, ELNs are based on a schema; how quantities are edited (e.g. which type of widget is used) can be controlled through annotations.

Entry

Data in NOMAD is organized in entries (as in "database entry"). Each entry has an entry id. Entries can be searched for, and each entry has its own page in the NOMAD GUI. Entries are always associated with raw files, one of which is the mainfile. Raw files are processed to create the processed data (or the archive) for an entry.

Example upload

Example uploads are prepared uploads containing data that typically showcases certain features of a plugin. The contents of example uploads can be fixed, created programmatically, or fetched from online sources. Example uploads can be instantiated with the "Example uploads" button on the "Uploads" page of the GUI. They are defined by creating an example upload plugin entry point.

Mainfile

Each entry has one raw file that defines it. This is called the mainfile of that entry. Typically, most, if not all, of an entry's processed data is retrieved from that mainfile.

Metadata

In NOMAD, metadata refers to a specific technical subset of processed data. The metadata of an entry comprises ids, timestamps, hashes, authors, datasets, references, the schema used, and other information.

Metainfo

The term metainfo refers to the sum of all schemas. In particular, it is associated with all pre-defined schemas that are used to represent all processed data in a standardized way. Similar to an ontology, the metainfo provides additional meaning by associating each piece of data with a name, description, categories, type, shape, units, and more.

Normalizer

A normalizer is a small tool that refines the processed data of an entry. Normalizers can read and modify processed data and thereby either normalize (change) some of the data or add derived data. Normalizers run after parsers and are often used for processing steps that must be applied to the output of many parsers and therefore do not belong in the parsers themselves.

There are normalizer classes and normalize functions. Normalizer classes run after parsing in a particular order, each only if certain conditions are fulfilled. Normalize functions are part of schemas (i.e. section definitions). They run at the end of processing on all sections that instantiate the respective section definition.
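The two kinds of normalization can be sketched in plain Python. This is a minimal illustration under stated assumptions, not NOMAD's implementation: processed data is modelled as a dictionary, and `Normalizer`, `LengthNormalizer`, `AtomCountNormalizer`, `Sample`, and `normalize_entry` are hypothetical names.

```python
from typing import Any, Optional


class Sample:
    """Hypothetical section definition with its own normalize function."""

    def __init__(self, mass_g: float, volume_cm3: float):
        self.mass_g = mass_g
        self.volume_cm3 = volume_cm3
        self.density_g_cm3: Optional[float] = None

    def normalize(self) -> None:
        # Derive a new quantity from other quantities of the same section.
        self.density_g_cm3 = self.mass_g / self.volume_cm3


class Normalizer:
    """Hypothetical base class for normalizers that run after parsing."""

    def applies_to(self, data: dict[str, Any]) -> bool:
        return True

    def normalize(self, data: dict[str, Any]) -> None:
        raise NotImplementedError


class LengthNormalizer(Normalizer):
    """Changes existing data: converts a length from angstrom to meters."""

    def applies_to(self, data: dict[str, Any]) -> bool:
        return "length_angstrom" in data

    def normalize(self, data: dict[str, Any]) -> None:
        data["length_m"] = data.pop("length_angstrom") * 1e-10


class AtomCountNormalizer(Normalizer):
    """Adds derived data: the number of atoms in a list of elements."""

    def applies_to(self, data: dict[str, Any]) -> bool:
        return "elements" in data

    def normalize(self, data: dict[str, Any]) -> None:
        data["n_atoms"] = len(data["elements"])


def normalize_entry(data: dict[str, Any], normalizers: list[Normalizer]) -> dict[str, Any]:
    # Normalizer classes run in a fixed order, each only if its condition holds.
    for normalizer in normalizers:
        if normalizer.applies_to(data):
            normalizer.normalize(data)
    # Normalize functions run last, on every instance of their section definition.
    for section in data.get("sections", []):
        section.normalize()
    return data
```

For example, `normalize_entry({"elements": ["H", "H", "O"], "sections": [Sample(10.0, 2.0)]}, [LengthNormalizer(), AtomCountNormalizer()])` adds `n_atoms` and sets the sample density, while `LengthNormalizer` is skipped because its condition does not hold.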

Parser

A parser is a small program that takes a mainfile as input and produces processed data. Parsers transform information from a particular source format into NOMAD's structured, schema-based format. Parsers start with a mainfile but can open and read data from other files (e.g. those referenced in the mainfile). Typically, a parser is associated with a certain file format and is only applied to files of that format.
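A minimal sketch of what a parser does, assuming a made-up `*.kv` format of `key = value` lines. `KeyValueParser`, its `matches` and `parse` methods, and the `include` key are hypothetical; unlike this sketch, real NOMAD parsers produce schema-based processed data rather than plain dictionaries.

```python
from pathlib import Path
from typing import Any


class KeyValueParser:
    """Hypothetical parser for a made-up '*.kv' format of 'key = value' lines."""

    mainfile_suffix = ".kv"

    def matches(self, path: Path) -> bool:
        # A parser is only applied to files of the format it understands.
        return path.suffix == self.mainfile_suffix

    def parse(self, mainfile: Path) -> dict[str, Any]:
        data: dict[str, Any] = {}
        for line in mainfile.read_text().splitlines():
            if "=" not in line:
                continue
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "include":
                # Parsers may also read other files referenced by the mainfile.
                data.update(self.parse(mainfile.parent / value))
            else:
                data[key] = self._convert(value)
        return data

    @staticmethod
    def _convert(value: str) -> Any:
        # Turn raw text into typed values where possible.
        for cast in (int, float):
            try:
                return cast(value)
            except ValueError:
                pass
        return value
```

Parsing a `main.kv` that contains `n_atoms = 3` and `include = extra.kv` yields one dictionary with the typed values from both files.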

Plugin

NOMAD installations can be customized through plugins: Git repositories containing an installable Python package that adds new features once installed. Plugins can contain one or more plugin entry points, each representing an individual customization.

Plugin entry point

Plugin entry points are used to configure and load different types of NOMAD customizations. There are several entry point types, including entry points for parsers, schema packages and apps. A single plugin may contain multiple entry points.
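Python's standard mechanism for advertising such hooks from an installed package is package entry points, which can be read with `importlib.metadata`. The sketch below groups a plugin's customizations by kind and lists installed entry points for a group name. `EntryPointSpec`, the kind strings, and any group name you pass in are assumptions for illustration; NOMAD's own entry point classes and group name are described in its plugin documentation.

```python
from dataclasses import dataclass
from importlib.metadata import entry_points


@dataclass(frozen=True)
class EntryPointSpec:
    """Hypothetical record for one customization offered by a plugin."""

    name: str
    kind: str  # e.g. "parser", "schema_package", "app", "example_upload"


def group_by_kind(specs: list[EntryPointSpec]) -> dict[str, list[str]]:
    # A single plugin may contribute several entry points of different kinds.
    grouped: dict[str, list[str]] = {}
    for spec in specs:
        grouped.setdefault(spec.kind, []).append(spec.name)
    return grouped


def installed_entry_point_names(group: str) -> list[str]:
    # Installed Python packages advertise their entry points under a group name.
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9 return a plain dict keyed by group
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)
```

The version check keeps the lookup working on both the older dictionary-based and the newer `select`-based `importlib.metadata` APIs.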

Processed data

NOMAD processes (parses and normalizes) all data. The processed data is the outcome of this process. Therefore, each NOMAD entry is associated with processed data that contains all the parsed and normalized information. Processed data always follows a schema. It can be retrieved via the API or downloaded as JSON.
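Because downloaded processed data is JSON, it can be explored with standard tools. The sketch below assumes a hypothetical `/`-separated path notation and made-up example content; it follows such a path through nested sections and repeated sections (JSON arrays). It is not NOMAD's API client.

```python
import json
from typing import Any


def load_processed_data(text: str) -> dict[str, Any]:
    # Downloaded processed data is plain JSON: nested objects and arrays.
    return json.loads(text)


def get_value(processed: Any, path: str) -> Any:
    """Follow a hypothetical '/'-separated path through processed data.

    Numeric segments index into repeated sections, which appear as JSON arrays.
    """
    node = processed
    for segment in path.split("/"):
        node = node[int(segment)] if isinstance(node, list) else node[segment]
    return node
```

With made-up content such as `{"data": {"samples": [{"name": "A"}, {"name": "B"}]}}`, the path `data/samples/1/name` selects the second sample's name.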

Processing

NOMAD processes (parses and normalizes) all data. During processing, all provided files are considered. First, files are matched to parsers. Second, each file that matches a parser becomes a mainfile, and an entry is created for it. Third, the parser is run to create processed data. Fourth, the processed data is further refined by running normalizers. Last, the processed data is saved and indexed. The processing time depends on the size of the uploaded data, and users can track the processing state of each entry in the GUI.
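The processing steps above can be sketched as a small, self-contained pipeline. Everything here is hypothetical: raw files are an in-memory dictionary mapping paths to text, `SketchParser` matches by file suffix, normalizers are plain functions, and "saving and indexing" is a dictionary keyed by mainfile.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class SketchParser:
    """Hypothetical parser: matches by file suffix, reads 'key=value' lines."""

    name: str
    suffix: str

    def matches(self, path: str) -> bool:
        return path.endswith(self.suffix)

    def parse(self, content: str) -> dict[str, Any]:
        return dict(line.split("=", 1) for line in content.splitlines() if "=" in line)


@dataclass
class Entry:
    mainfile: str
    parser: SketchParser
    processed: dict[str, Any] = field(default_factory=dict)
    state: str = "pending"


def process_upload(
    raw_files: dict[str, str],
    parsers: list[SketchParser],
    normalizers: list[Callable[[dict[str, Any]], None]],
) -> tuple[list[Entry], dict[str, dict[str, Any]]]:
    # Steps 1 and 2: match files to parsers; each match becomes the mainfile of a new entry.
    entries = []
    for path in sorted(raw_files):
        parser = next((p for p in parsers if p.matches(path)), None)
        if parser is not None:  # other raw files produce no entry
            entries.append(Entry(mainfile=path, parser=parser))

    index: dict[str, dict[str, Any]] = {}
    for entry in entries:
        # Step 3: run the parser on the mainfile to create processed data.
        entry.processed = entry.parser.parse(raw_files[entry.mainfile])
        # Step 4: refine the processed data with normalizers.
        for normalize in normalizers:
            normalize(entry.processed)
        # Step 5: "save" and "index" (here: a plain dict keyed by mainfile).
        entry.state = "success"
        index[entry.mainfile] = entry.processed
    return entries, index
```

Matching all files before parsing any of them mirrors the order described above: entries exist (and can be tracked) before their processed data does.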

Quantity

All processed data is structured into sections and quantities. Sections provide hierarchy and organization; quantities are the actual pieces of data. In NOMAD, a quantity is the smallest referable unit of processed data. Quantities can have many types and shapes, e.g. strings, numbers, lists, or matrices.

In a schema, quantities are defined by their name, description, type, shape, and unit. Each quantity in processed data is associated with the corresponding quantity definition in the schema.
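A quantity definition can be modelled as a small record that also checks values against its type and shape. `QuantityDefinition` and its fields `dtype`, `shape`, and `unit` are hypothetical simplifications for illustration, not the quantity definitions NOMAD itself uses.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class QuantityDefinition:
    """Hypothetical, simplified quantity definition."""

    name: str
    description: str
    dtype: type
    shape: tuple[int, ...] = ()  # () is a scalar, (3,) a vector, (3, 3) a matrix
    unit: Optional[str] = None

    def validate(self, value: Any) -> bool:
        """Check that a value matches this definition's type and shape."""
        return self._check(value, self.shape)

    def _check(self, value: Any, shape: tuple[int, ...]) -> bool:
        if not shape:
            return isinstance(value, self.dtype)
        return (
            isinstance(value, list)
            and len(value) == shape[0]
            and all(self._check(item, shape[1:]) for item in value)
        )
```

For example, a `(3, 3)` float definition accepts a 3×3 nested list of floats and rejects a 2×2 one.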

Raw file

A raw file is any file that was provided by a NOMAD user. A raw file might produce an entry if it is of a supported file format, but it does not have to. Raw files always belong to an upload and might be associated with an entry; the raw file that produces an entry is that entry's mainfile.

The sum of all raw files is also referred to as the Repository. This is an old term from when NOMAD was implemented as several services (e.g. NOMAD Repository and NOMAD Archive).

Results (section `results`)

The results are a particular section of processed data. They comprise a summary of the most relevant data for an entry.

While all processed data can be downloaded and is accessible via the API, an entry's results (combined with its metadata) are also searchable and can be read faster and in larger amounts.

Schema

Schemas define possible data structures for processed data. Like a book, they organize data hierarchically in sections and subsections. Schemas are similar to ontologies, as they define possible relationships between the data organized within them.

A schema is a collection of section and quantity definitions. Schemas are organized in schema packages, i.e. collections of definitions. All schemas combined form the metainfo.

Schema package

Schema packages contain a collection of schema definitions. Schema packages may be defined as YAML files or in Python as plugin entry points.

Section and Subsection

All processed data is structured into sections and quantities. Sections provide hierarchy and organization; quantities are the actual pieces of data.

In a schema, a section is defined by its name, description, possible subsections, and quantities. Section definitions can also inherit all properties (subsections, quantities) from other section definitions by using them as base sections.
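Inheritance from base sections can be illustrated with a simplified, hypothetical `SectionDefinition` record (not NOMAD's own definition classes), in which quantities are reduced to their names and subsections map a name to another section definition.

```python
from dataclasses import dataclass, field


@dataclass
class SectionDefinition:
    """Hypothetical, simplified section definition."""

    name: str
    description: str = ""
    quantities: list[str] = field(default_factory=list)
    subsections: dict[str, "SectionDefinition"] = field(default_factory=dict)
    base_sections: list["SectionDefinition"] = field(default_factory=list)

    def all_quantities(self) -> list[str]:
        # Inherited quantities first, then the section's own; duplicates dropped.
        names: list[str] = []
        for base in self.base_sections:
            names.extend(base.all_quantities())
        names.extend(self.quantities)
        return list(dict.fromkeys(names))

    def all_subsections(self) -> dict[str, "SectionDefinition"]:
        # Subsections of base sections are inherited; own definitions take precedence.
        merged: dict[str, SectionDefinition] = {}
        for base in self.base_sections:
            merged.update(base.all_subsections())
        merged.update(self.subsections)
        return merged
```

A `ThinFilm` section that uses a `BaseSample` section as its base, for instance, gains the base's `name` quantity and `composition` subsection without redefining them.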

Upload

NOMAD organizes raw files (and all entries created from them) in uploads. An upload consists of a directory structure of raw files and a list of the respective entries.

Uploads are created by a single user, the owner. Uploads have two states. Initially, they are mutable and have limited visibility. Owners can invite others to collaborate, and those users can add, remove, or change data. At some point, the owner can publish the upload, after which it becomes immutable and visible to everyone. Uploads are the smallest unit of data that can be individually shared and published.
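The two upload states and the owner and collaborator roles described above can be captured in a small state sketch. `Upload`, `UploadPublishedError`, and the method names are hypothetical and only model the rules stated here, not NOMAD's upload API.

```python
class UploadPublishedError(Exception):
    """Raised when someone tries to change a published, immutable upload."""


class Upload:
    """Hypothetical sketch of an upload's two states: unpublished and published."""

    def __init__(self, owner: str):
        self.owner = owner
        self.collaborators: set[str] = set()
        self.raw_files: dict[str, bytes] = {}
        self.published = False

    def invite(self, by: str, user: str) -> None:
        if by != self.owner:
            raise PermissionError("only the owner can invite collaborators")
        self.collaborators.add(user)

    def add_file(self, by: str, path: str, content: bytes) -> None:
        # Unpublished uploads are mutable, but only for the owner and collaborators.
        if self.published:
            raise UploadPublishedError("published uploads are immutable")
        if by != self.owner and by not in self.collaborators:
            raise PermissionError(f"{by} may not change this upload")
        self.raw_files[path] = content

    def publish(self, by: str) -> None:
        if by != self.owner:
            raise PermissionError("only the owner can publish")
        self.published = True

    def can_view(self, user: str) -> bool:
        # Limited visibility before publishing, visible to everyone afterwards.
        return self.published or user == self.owner or user in self.collaborators
```

After `publish`, every call to `add_file` raises `UploadPublishedError`, even for the owner, while `can_view` returns `True` for anyone.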

User

A user is anyone with a NOMAD account. Users differ from authors: all users can be authors, but not all authors have to be users. All data in NOMAD is always owned by a single user (others can be collaborators and co-authors).