# Schemas for ELNs

A schema defines all possible data structures. With small additions to our schemas, we can instruct NOMAD to provide matching editors for the data. This allows us to build Electronic Lab Notebooks (ELNs) as tools to acquire data in a formal and structured way. For schemas with ELN annotations, users can create new entries in the NOMAD GUI and edit the archive (structured data) of these entries directly in the GUI.

## Annotations

Definitions in a schema can have annotations. These annotations provide additional information that NOMAD can use to alter its behavior around these definitions. Annotations are named blocks of key-value pairs:

    definitions:
      sections:
        MyAnnotatedSection:
          m_annotations:
            annotation_name:
              key1: value
              key2: value


Many annotations control the representation of data in the GUI. This can be for plots or data entry/editing capabilities. There are three main categories of annotations relevant to ELNs.

• eln annotations shape the data editor, i.e. they let you control which type of form field is used to edit each quantity.
• With the tabular annotations, data from linked .csv or Excel files can be parsed and added to the data.
• The plot annotation allows you to plot numerical data (e.g. added via tables) directly in the ELN.

## Example ELN

This is the commented ELN schema from our ELN example upload that can be created from NOMAD's upload page:

    # Schemas can be defined as yaml files like this. The archive.yaml format will be
    # interpreted by nomad as a nomad archive. Therefore, all definitions have to be
    # put in a top-level section called "definitions"
    definitions:
      # The "definitions" section is interpreted as a nomad schema package
      # Schema packages can have a name:
      name: 'Electronic Lab Notebook example schema'
      # Schema packages contain section definitions. This is where the interesting schema
      # information begins.
      sections:
        # Here we define a section called "Chemical":
        Chemical:
          # Section definitions can have base_sections. Base sections are other schema
          # definitions and all properties of these will be inherited.
          base_sections:
            - 'nomad.datamodel.metainfo.eln.Chemical'  # Provides typical quantities like name, descriptions, chemical_formula and makes those available for search
            - 'nomad.datamodel.data.EntryData'  # Declares this as a top-level entry section. This determines the types of entries you can create. With this we will be able to create a "Chemical" entry.
          # All definitions, sections, sub_sections, quantities, can provide a description.
          description: |
            This is an example description for Chemical.
            A description can contain **markdown** markup and TeX formulas, like $\sum\limits_{i=0}^{n}$.
          # Sections define quantities. Quantities allow to manage actual data. Quantities
          # can have various types, shapes, and units.
          quantities:
            # Here we define a quantity called "form"
            form:
              # This defines an Enum type with pre-defined possible values.
              type:
                type_kind: Enum
                type_data:
                  - crystalline solid
                  - powder
              # Annotations allow to provide additional information that is beyond just defining
              # the possible data.
              m_annotations:
                # The eln annotation allows adding the quantity to an ELN
                eln:
                  component: EnumEditQuantity  # A form field component for EnumQuantities that uses a pull down menu.
            cas_number:
              type: str
              m_annotations:
                eln:
                  component: StringEditQuantity
            ec_number:
              type: str
              m_annotations:
                eln:
                  component: StringEditQuantity
        Instrument:
          base_sections:
        Process:
          quantities:
            instrument:
              type: Instrument
              m_annotations:
                eln:
                  component: ReferenceEditQuantity
        Sample:
          m_annotations:
            # The template annotation allows to define what freshly created entries (instances of this schema) will look like.
            # In this example we create a sample with an empty pvd_evaporation process.
            template:
              processes:
                pvd_evaporation: {}
          base_sections:
          quantities:
            name:
              type: str  # The simple string type
              default: Default Sample Name
              m_annotations:
                eln:
                  component: StringEditQuantity  # A simple text edit form field
            tags:
              type:
                type_kind: Enum
                type_data:
                  - internal
                  - collaboration
                  - project
                  - other
              shape: ['*']  # Shapes define non scalar values, like lists ['*'], vectors ['*', 3], etc.
              m_annotations:
                eln:
                  component: AutocompleteEditQuantity  # Allows to edit enums with an auto complete text form field
            chemicals:
              type: Chemical  # Types can also be other sections. This allows to reference a different section.
              shape: ['*']
              m_annotations:
                eln:
                  component: ReferenceEditQuantity  # An editor component that allows to select from available "Chemical"s
            substrate_type:
              type:
                type_kind: Enum
                type_data:
                  - Fused quartz glass
                  - SLG
                  - other
              m_annotations:
                eln:
            substrate_thickness:
              type: np.float64
              unit: m
              m_annotations:
                eln:
                  component: NumberEditQuantity
            sample_is_from_collaboration:
              type: bool
              m_annotations:
                eln:
                  component: BoolEditQuantity
          # Besides quantities, a section can define sub_sections. This allows hierarchies
          # of information.
          sub_sections:
            # Here we define a sub_section of "Sample" called "processes"
            processes:
              section:
                # The sub-section's section is itself a section definition
                m_annotations:
                  eln:  # adds the sub-section to the eln and allows users to create new instances of this sub-section
                # We can also nest sub_sections. It goes arbitrarily deep.
                sub_sections:
                  pvd_evaporation:
                    section:
                      m_annotations:
                        # We can use the eln annotations to put the section to the overview
                        # page, and hide unwanted inherited quantities.
                        eln:
                          overview: true
                          hide: ['name', 'lab_id', 'description', 'method']
                        # Plots are shown in the eln. Currently we only support simple x,y
                        # line plots
                        plot:
                          title: Pressure and Temperature over Time
                          x: time
                          y:
                            - chamber_pressure
                            - substrate_temperature
                      quantities:
                        data_file:
                          type: str
                          description: |
                            A reference to an uploaded .csv produced by the PVD evaporation instruments
                            control software.
                          m_annotations:
                            # The tabular_parser annotation will treat the values of this
                            # quantity as files. It will try to interpret the files and fill
                            # quantities in this section (and sub_section) with the column
                            # data of .csv or .xlsx files. There is also a mode option that, by default, is set to column.
                            tabular_parser:
                              sep: '\t'
                              comment: '#'
                            browser:
                              adaptor: RawFileAdaptor  # Allows to navigate to files in the data browser
                            eln:
                              component: FileEditQuantity  # A form field that allows to drop and select files.
                        time:
                          type: np.float64
                          shape: ['*']
                          unit: s
                          m_annotations:
                            # The tabular annotation defines a mapping to column headers used in
                            # tabular data files
                            tabular:
                              name: Process Time in seconds
                        chamber_pressure:
                          type: np.float64
                          shape: ['*']
                          unit: mbar
                          m_annotations:
                            tabular:
                              name: Vacuum Pressure1
                            plot:
                              x: time
                              y: chamber_pressure
                        substrate_temperature:
                          type: np.float64
                          shape: ['*']
                          unit: kelvin
                          m_annotations:
                            tabular:
                              name: Substrate PV
                              unit: degC
                            plot:
                              x: time
                              y: substrate_temperature
                  hotplate_annealing:
                    section:
                      base_section: Process
                      m_annotations:
                        # We can use the eln annotations to put the section to the overview
                        # page, and hide unwanted inherited quantities.
                        eln:
                          overview: true
                          hide: ['name', 'lab_id', 'description']
                      quantities:
                        set_temperature:
                          type: np.float64  # For actual numbers, we use numpy datatypes
                          unit: K  # The unit system is based on Pint and allows all kinds of abbreviations, prefixes, and complex units
                          m_annotations:
                            eln:
                              component: NumberEditQuantity  # A component to enter numbers (with units)
                        duration:
                          type: np.float64
                          unit: s
                          m_annotations:
                            eln:
                              component: NumberEditQuantity


## ELN Annotation

These annotations control how data can be entered and edited. Use the key eln to add these annotations. For example:

    class Sample(EntryData):
        sample_id = Quantity(type=str, a_eln=dict(component='StringEditQuantity'))


or in YAML schemas:

    Sample:
      quantities:
        sample_id:
          type: str
          m_annotations:
            eln:
              component: StringEditQuantity


An eln annotation can be added to section and quantity definitions to different effects. In both cases, it controls how sections and quantities are represented in the GUI with different parameters; see below.

The UI gives an overview of all ELN edit annotations and components here.

| name | type | description |
|---|---|---|
| component | str | The form field component that is used to make the annotated quantity editable. If no component is given, the quantity won't be editable. This can be used on quantities only.<br>The supported values are:<br>StringEditQuantity: For editing simple short string values.<br>URLEditQuantity: For editing strings that are validated to be URLs.<br>EnumEditQuantity: For editing enum values. Uses a dropdown list with enum values. This component may be used for short enumerations.<br>RadioEnumEditQuantity: For editing enum values. Uses radio buttons.<br>AutocompleteEditQuantity: For editing enum values. Uses an autocomplete form with dropdown list. This component may be used for longer enumerations.<br>FileEditQuantity: For editing a reference to a file. Will allow to choose a file or upload a file.<br>BoolEditQuantity: For editing boolean choices.<br>NumberEditQuantity: For editing numbers with or without unit.<br>SliderEditQuantity: For editing numbers with a horizontal slider widget.<br>DateTimeEditQuantity: For editing datetimes.<br>RichTextEditQuantity: For editing long styled text with a rich text editor.<br>ReferenceEditQuantity: For editing references to other sections.<br>UserEditQuantity: For entering user information. Lets you choose a nomad user or enter information manually.<br>AuthorEditQuantity: For entering author information manually. |
| label | str | Custom label for the quantity shown on the form field. |
| props | Dict[str, typing.Any] | A dictionary with additional props that are passed to the edit component. |
| default | typing.Any | Prefills any set form field component with the given value. This is different from the quantity's default property. The quantity's default is not stored in the data; the default value is assumed if no other value is given. The ELN form field default value will be stored, even if not changed. |
| defaultDisplayUnit | str | Allows to define a default unit to initialize a NumberEditQuantity with. The unit has to be compatible with the unit of the annotated quantity, and the annotated quantity must have a unit. Only applies to quantities and with component=NumberEditQuantity. |
| minValue | Union[int, float] | Allows to specify a minimum value for quantity annotations with number type. Will show an error if numbers outside the range are entered. Only works on quantities and in conjunction with component=NumberEditQuantity. |
| maxValue | Union[int, float] | Allows to specify a maximum value for quantity annotations with number type. Will show an error if numbers outside the range are entered. Only works on quantities and in conjunction with component=NumberEditQuantity. |
| hide | List[str] | Deprecated: use the "visible" key of the "properties" annotation instead. Allows you to hide certain quantities from a section editor. Give a list of quantity names. Quantities must exist in the section that this annotation is added to. Can only be used in section annotations. |
| overview | int | Shows the annotated section on the entry's overview page. Can only be used on section annotations. |
| lane_width | Union[str, int] | Value to overwrite the CSS width of the lane used to render the annotated section and its editor. |
| properties | SectionProperties | The value to customize the quantities and sub sections of the annotated section. The supported keys:<br>visible: To determine the visible quantities and sub sections by their names<br>editable: To render things visible but not editable, e.g. in inheritance situations<br>order: To order things, properties listed in that order first, then the rest |
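
The difference between a quantity's own default and the ELN `default` described above can be illustrated with plain dictionaries. This is only a sketch of the described behavior, not NOMAD code: `read_value`, `prefill_form` and the dict standing in for the stored archive data are hypothetical names chosen for this example.

```python
from typing import Any, Dict


def read_value(stored: Dict[str, Any], name: str, quantity_default: Any) -> Any:
    """Quantity default: assumed on read when nothing is stored; never written."""
    return stored.get(name, quantity_default)


def prefill_form(stored: Dict[str, Any], name: str, eln_default: Any) -> Dict[str, Any]:
    """ELN default: written into the data when the form field is prefilled."""
    prefilled = dict(stored)
    prefilled.setdefault(name, eln_default)
    return prefilled


archive_data: Dict[str, Any] = {}

# The quantity default is visible when reading, but the stored data stays empty.
print(read_value(archive_data, 'name', 'Default Sample Name'), archive_data)

# The ELN default ends up in the stored data, even if the user changes nothing.
print(prefill_form(archive_data, 'name', 'Default Sample Name'))
```

In the first case the stored data remains `{}`; in the second the value becomes part of the data that is saved.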

### SectionProperties

A filter defined by an include list or an exclude list of the quantities and subsections.

| name | type | description |
|---|---|---|
| visible | Filter | Defines the visible quantities and subsections.<br>default: 1 |
| editable | Filter | Defines the editable quantities and subsections. |
| order | List[str] | To customize the order of the quantities and subsections. |

### Filter

A filter defined by an include list or an exclude list of the quantities or subsections.

| name | type | description |
|---|---|---|
| include | List[str] | The list of quantity or subsection names to be included. |
| exclude | List[str] | The list of quantity or subsection names to be excluded. |
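
How an include or exclude list narrows a set of property names can be sketched as follows. This is a minimal illustration and not NOMAD's implementation: the helper `apply_filter` is hypothetical, and the rule that an include list is applied first and an exclude list second is an assumption made for this example.

```python
from typing import List, Optional


def apply_filter(
    names: List[str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: keep the names that pass an include/exclude filter.

    Assumption: without an include list every name is a candidate; names in
    the exclude list are then dropped. The order of `names` is preserved.
    """
    candidates = names if include is None else [n for n in names if n in include]
    excluded = set(exclude or [])
    return [n for n in candidates if n not in excluded]


properties = ['name', 'lab_id', 'description', 'method']

# Hide two inherited quantities, similar in spirit to visible.exclude.
print(apply_filter(properties, exclude=['lab_id', 'method']))

# Show only an explicit selection, similar in spirit to visible.include.
print(apply_filter(properties, include=['name', 'method']))
```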

## Browser Annotation

The browser annotation lets you specify whether the processed data browser should display a quantity differently. It can be applied to quantities. For example:

    class Experiment(EntryData):
        description = Quantity(type=str, a_browser=dict(render_value='HtmlValue'))


or in YAML:

    Experiment:
      quantities:
        description:
          type: str
          m_annotations:
            browser:
              render_value: HtmlValue

| name | type | description |
|---|---|---|
| adaptor | str | Allows to change the Adaptor implementation that is used to render the lane for this quantity. Possible values are:<br>RawFileAdaptor: An adaptor that is used to show files, including all file actions, like file preview. |
| render_value | str | Allows to change the Component used to render the value of the quantity. Possible values are:<br>HtmlValue: Renders a string as HTML.<br>JsonValue: Renders a dict or list in a collapsible tree. |

## Tabular Annotations

To import your data from a .csv or Excel file, NOMAD provides three distinct modes (column, row, and entry), each with its own options for importing and interacting with your data. To show how to use the tabular parser, three commented sample schemas are presented below. Each sample follows and extends a general example that is explained alongside it.

Any tabular parser schema has two main components: 1) inheriting from the correct base section(s), and 2) providing a data_file Quantity with the correct m_annotations (the entry mode is the only exception).

Please bear in mind that schema files should 1) follow the NOMAD naming convention (i.e. My_Name.archive.yaml), and 2) be accompanied by your data file so that NOMAD can parse them. The examples below assume that an Excel file contains all the data, as both NOMAD and Excel support importing and manipulating data across multiple sheets. Note that the file name given in each schema must match the name of the uploaded data file; if you use a .csv data file instead, put the .csv file name there.

TableData (and any other section that inherits from TableData) has a customizable checkbox Quantity (i.e. fill_archive_from_datafile) to turn the tabular parser on or off. If you do not want the parser to run every time you change your archive data, uncheck the checkbox. It is customizable in the sense that, if you do not wish to see this checkbox at all, you can configure the hide parameter of the section's m_annotations to hide it. This in turn makes the parser run every time you save your archive.

Be cautious though! With the tabular parser turned on (i.e. the box checked), saving your data causes the parser to overwrite your manually entered data!

#### Column-mode Sample:

The following sample schema creates one quantity from an entire column of an Excel file (column mode). For example, suppose that several rows of an Excel sheet contain information about a chemical product (e.g. purity in one column). To list all the purities under the column purity and import them into NOMAD, you can use the following schema, substituting My_Quantity with any name of your choice (e.g. Purity), tabular-parser.data.xlsx with the name of the csv/Excel file that holds the data, and My_Sheet/My_Column with the sheet_name/column_name of your targeted data. Tabular_Parser can also be changed to any name of your choice.

Important notes:

• shape: ['*'] under My_Quantity is essential to parse the entire column of the data file.
• The data_file Quantity can have any arbitrary name (e.g. xlsx_file) and can be referenced within the tabular_parser annotation of other sections which are of type TableData via path_to_data_file (please see the Tabular Parser section).
• My_Quantity can also be defined within another subsection (see the next sample schema).

    # This schema is specially made for demonstration of implementing a tabular parser with
    # column mode.
    definitions:
      name: 'Tabular Parser example schema'
      sections:
        Tabular_Parser: # The main section that contains the quantities to be read from an excel file.
          # This name can be changed freely.
          base_sections:
          quantities:
            data_file:
              type: str
              m_annotations:
                tabular_parser: # The tabular_parser annotation will treat the values of this
                  # quantity as files. It will try to interpret the files and fill
                  # quantities in this section (and sub_sections) with the column
                  # data of .csv or .xlsx files.
                  comment: '#' # Skipping lines in csv or excel file that start with the sign #
                  mode: column # Here the mode can be set. If removed, by default,
                  # the parser assumes mode to be column
            My_Quantity:
              type: str
              shape: ['*']
              m_annotations:
                tabular: # The tabular annotation defines a mapping to column headers used in tabular data files
                  name: My_Sheet/My_Column # Here you can define where the data for the given quantity is to be taken from
                  # The convention for selecting the name is if the data is to be taken from an excel file,
                  # you can specify the sheet_name followed by a forward slash and the column_name to target the desired quantity.
                  # If only a column name is provided, then the first sheet in the excel file (or the .csv file)
                  # is assumed to contain the targeted data.
    data:
      m_def: Tabular_Parser # this is a reference to the section definition above
      data_file: tabular-parser.data.xlsx # name of the excel/csv file to be uploaded along with this schema yaml file


#### Row-mode Sample:

The sample schema provided below creates separate instances of a repeated section from each row of an Excel file (row mode). For example, suppose that an Excel sheet holds the information for a chemical product (e.g. name in one column), and each row contains one entry of that chemical product. Since each row is separate from the others, you can create instances of the same product from all rows and import them into NOMAD with the following schema, substituting My_Subsection, My_Section and My_Quantity with any appropriate names (e.g. Substance, Chemical_product and Name respectively).

Important notes:

• This schema demonstrates how to import data within a subsection of another subsection, meaning the targeted quantity does not necessarily have to go into the main quantities.
• Setting mode to row signals that for each row in the sheet_name (provided in My_Quantity), one instance of the corresponding (sub-)section (in this example, the My_Section sub-section, as it has the repeats option set to true) will be appended. Please bear in mind that if this mode is selected, then all other quantities should exist in the same sheet_name.

    # This schema is specially made for demonstration of implementing a tabular parser with
    # row mode.
    definitions:
      name: 'Tabular Parser example schema'
      sections:
        Tabular_Parser: # The main section that contains the quantities to be read from an excel file
          # This name can be changed freely.
          base_sections:
            - nomad.parsing.tabular.TableData # Here we specify that we need to acquire the data from a .xlsx or a .csv file
          quantities:
            data_file:
              type: str
              m_annotations:
                tabular_parser:
                  comment: '#' # Skipping lines in csv or excel file that start with the sign #
                  mode: row # Setting mode to row signals that for each row in the sheet_name (provided in quantity)
                  target_sub_section: # This is the reference to where the targeted (sub-)section lies within this example schema file
                    - My_Subsection/My_Section
          sub_sections:
            My_Subsection:
              section:
                sub_sections:
                  My_Section:
                    repeats: true # The repeats option set to true means there can be multiple instances of this
                    # section
                    section:
                      quantities:
                        My_Quantity:
                          type: str
                          m_annotations:
                            tabular: # The tabular annotation defines a mapping to column headers used in tabular data files
                              name: My_Sheet/My_Column # sheet_name and column name of the targeted data in csv/xlsx file
    data:
      m_def: Tabular_Parser # this is a reference to the section definition above
      data_file: tabular-parser.data.xlsx # name of the excel/csv file to be uploaded along with this schema yaml file


#### Entry-mode Sample:

The following sample schema creates one entry for each row of an Excel file (entry mode). For example, suppose that an Excel sheet holds the information for a chemical product (e.g. name in one column), and each row contains one entry of that chemical product. Since each row is separate from the others, you can create multiple archives of the same product from all rows and import them into NOMAD with the following schema, substituting My_Quantity with any appropriate name (e.g. Name).

Important note:

• For entry mode, the convention for reading data from a csv/Excel file is to provide only the column name; the data are assumed to exist in the first sheet.

    # This schema is specially made for demonstration of implementing a tabular parser with
    # entry mode.
    definitions:
      name: 'Tabular Parser example schema' # The main section that contains the quantities to be read from an excel file
      # This name can be changed freely.
      sections:
        Tabular_Parser:
          base_sections:
            - nomad.parsing.tabular.TableRow # To create entries from each row in the excel file
            # the base section should inherit from nomad.parsing.tabular.TableRow. For this specific case,
            # the datafile should be accompanied
          quantities:
            My_Quantity:
              type: str
              m_annotations:
                tabular:
                  name: My_Column


Here are all parameters for the two annotations Tabular Parser and Tabular.

### Tabular Parser

Instructs NOMAD to treat a string-valued scalar quantity as a file path and to interpret the contents of this file as tabular data. Supports both .csv and Excel files.

| name | type | description |
|---|---|---|
| comment | str | The character denoting the commented lines in .csv files. This is passed to pandas to parse the file. Has to be used to annotate the quantity that holds the path to the .csv or Excel file. |
| sep | str | The character used to separate cells in a .csv file. This is passed to pandas to parse the file. Has to be used to annotate the quantity that holds the path to the .csv or Excel file. |
| skiprows | int | Number of .csv file rows that are skipped. This is passed to pandas to parse the file. Has to be used to annotate the quantity that holds the path to the .csv or Excel file. |
| separator | str | An alias for sep. |
| mode | str | Either column or row. With column the whole column is mapped into a quantity (needs to be a list). With row each row (and its cells) are mapped into instances of a repeating sub section, where each section represents a row (quantities need to be scalars). Has to be used to annotate the quantity that holds the path to the .csv or Excel file.<br>default: column |
| target_sub_section | List[str] | A list of paths to sub-sections of the annotated quantity's section. Each path is a / separated list of nested sub-sections. The targeted sub-sections will be considered when mapping table columns to quantities. Has to be used to annotate the quantity that holds the path to the .csv or Excel file.<br>default: [] |
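
The difference between the column and row modes can be made concrete with a small stand-alone sketch. It does not use NOMAD or pandas: the standard-library `csv` module stands in for the real parser, and the function names and the sample data are made up for this illustration.

```python
import csv
import io
from typing import Dict, List

# Made-up sample data standing in for an uploaded .csv file.
CSV_TEXT = "name,purity\nNaCl,0.99\nKCl,0.95\n"


def read_column_mode(text: str) -> Dict[str, List[str]]:
    """Column mode: each column becomes one list-valued quantity."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    headers = reader.fieldnames or []
    return {header: [row[header] for row in rows] for header in headers}


def read_row_mode(text: str) -> List[Dict[str, str]]:
    """Row mode: each row becomes one section instance with scalar quantities."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


print(read_column_mode(CSV_TEXT))
print(read_row_mode(CSV_TEXT))
```

In column mode the result has one entry per column, each holding a list, which is why the target quantity needs `shape: ['*']`. In row mode the result has one entry per row, which corresponds to one instance of a repeating sub-section.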

### Tabular

Allows to map a quantity to a column of a tabular data-file. Should only be used in conjunction with tabular_parser.

| name | type | description |
|---|---|---|
| name | str | The column name that should be mapped to the annotated quantity. Has to be the same string that is used in the header, i.e. first .csv line or first Excel file row. For Excel files with multiple sheets, the name can have the form `<sheet name>/<column name>`. Otherwise, only the first sheet is used. Has to be applied to the quantity that a column should be mapped to. |
| unit | str | The unit of the value in the file. Has to be compatible with the annotated quantity's unit. Will be used to automatically convert the value. If this is not defined, the values will not be converted. Has to be applied to the quantity that a column should be mapped to. |
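
The naming convention for the `name` key (an optional sheet name, a forward slash, then the column name) can be sketched with a small helper. `split_tabular_name` is a hypothetical function written for this illustration; it is not part of NOMAD, and treating only the first `/` as the separator is an assumption.

```python
from typing import Optional, Tuple


def split_tabular_name(name: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: split 'sheet/column' into (sheet, column).

    A name without '/' has no sheet part; None then stands for
    "use the first sheet". Assumption: only the first '/' separates
    the sheet name from the column name.
    """
    sheet, separator, column = name.partition('/')
    if not separator:
        return None, name
    return sheet, column


print(split_tabular_name('My_Sheet/My_Column'))
print(split_tabular_name('Process Time in seconds'))
```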

## Plot Annotation

This annotation can be used to add a plot to a section or quantity. Example:

    from nomad.metainfo import MSection, Quantity, Section

    class Evaporation(MSection):
        m_def = Section(a_plot={
            'label': 'Temperature and Pressure',
            'x': 'time',  # must name a quantity of this section
            'y': ['./substrate_temperature', './chamber_pressure'],
            'config': {
                'editable': True,
                'scrollZoom': False
            }
        })
        time = Quantity(type=float, shape=['*'], unit='s')
        substrate_temperature = Quantity(type=float, shape=['*'], unit='K')
        chamber_pressure = Quantity(type=float, shape=['*'], unit='Pa')


You can create multi-line plots by using lists for the properties y (and x). You can either plot multiple sets of y-values over a single set of x-values, or plot pairs of x and y values. For this purpose, the annotation properties x and y can each reference a single quantity or a list of quantities. For repeating sub sections, the section instance can be selected with an index, e.g. "sub_section_name/2/parameter_name", or with a slice notation start:stop, where negative values index from the end of the array, e.g. "sub_section_name/1:-5/parameter_name".
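
The path notation with an index or a slice can be illustrated with a small resolver over plain Python data. This is a sketch and not NOMAD's path handling: `resolve_path` is a hypothetical function, dicts stand in for sections, lists stand in for repeating sub-sections, and zero-based Python indexing is assumed.

```python
from typing import Any, List


def _resolve(node: Any, segments: List[str]) -> Any:
    """Walk the remaining path segments starting at `node`."""
    if not segments:
        return node
    head, rest = segments[0], segments[1:]
    if isinstance(node, list):
        # A segment after a repeating sub-section is an index or a start:stop slice.
        if ':' in head:
            start, stop = head.split(':')
            chosen = node[int(start) if start else None:int(stop) if stop else None]
            return [_resolve(item, rest) for item in chosen]
        return _resolve(node[int(head)], rest)
    return _resolve(node[head], rest)


def resolve_path(section: dict, path: str) -> Any:
    """Hypothetical resolver for '/'-separated paths such as 'steps/1:-1/temperature'."""
    segments = [s for s in path.split('/') if s not in ('', '.')]
    return _resolve(section, segments)


# Made-up data: one list-valued quantity and a repeating sub-section "steps".
evaporation = {
    'time': [0.0, 1.0, 2.0],
    'steps': [
        {'temperature': 300.0},
        {'temperature': 350.0},
        {'temperature': 400.0},
        {'temperature': 450.0},
    ],
}

print(resolve_path(evaporation, './time'))
print(resolve_path(evaporation, 'steps/2/temperature'))
print(resolve_path(evaporation, 'steps/1:-1/temperature'))
```

A slice selects several instances at once, so the remaining path is applied to each of them and the result is a list.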

| name | type | description |
|---|---|---|
| label | str | Is passed to plotly to define the label of the plot. |
| x | Union[List[str], str] | A path or list of paths to the x-axes values. Each path is a / separated list of sub-section and quantity names that leads from the annotated section to the quantity. Repeating sub sections are indexed between two /s with an integer or a slice start:stop. |
| y | Union[List[str], str] | A path or list of paths to the y-axes values. Each path is a / separated list of sub-section and quantity names that leads from the annotated section to the quantity. Repeating sub sections are indexed between two /s with an integer or a slice start:stop. |
| lines | List[dict] | A list of dicts passed as traces to plotly to configure the lines of the plot. See https://plotly.com/javascript/reference/scatter/ for details. |
| layout | dict | A dict passed as layout to plotly to configure the plot layout. See https://plotly.com/javascript/reference/layout/ for details. |
| config | dict | A dict passed as config to plotly to configure the plot functionality. See https://plotly.com/javascript/configuration-options/ for details. |


## Custom normalizers

For custom schemas, you might want to add custom normalizers. All files are parsed and normalized when they are uploaded or changed. The NOMAD metainfo Python interface allows you to add functions that are called when your data is normalized.

Here is an example:

    from nomad.datamodel import EntryData, ArchiveSection
    from nomad.metainfo.metainfo import Quantity, Datetime, SubSection

    class Sample(ArchiveSection):
        formula = Quantity(type=str)

        sample_id = Quantity(type=str)

        def normalize(self, archive, logger):
            # Call the base implementation first so inherited normalization still runs.
            super(Sample, self).normalize(archive, logger)

            if self.sample_id is None:
                ...

    class SampleDatabase(EntryData):
        samples = SubSection(section=Sample, repeats=True)


To add a normalize function, your section has to inherit from ArchiveSection, which provides the base for this functionality. Now you can override the normalize function and add your own behavior. Make sure to call the super implementation properly to support schemas with multiple inheritance.
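
Why the super call matters with multiple inheritance can be seen in a sketch that runs without NOMAD. All classes below are stand-ins invented for this illustration: `ArchiveSectionStub` plays the role of ArchiveSection, and `WithId` and `WithFormula` are hypothetical base sections.

```python
class ArchiveSectionStub:
    """Stand-in for ArchiveSection so that the sketch runs without NOMAD."""

    def normalize(self, archive, logger):
        # Record the order in which the normalize implementations finish.
        self.calls = ['ArchiveSectionStub']


class WithId(ArchiveSectionStub):
    def normalize(self, archive, logger):
        super().normalize(archive, logger)
        self.calls.append('WithId')


class WithFormula(ArchiveSectionStub):
    def normalize(self, archive, logger):
        super().normalize(archive, logger)
        self.calls.append('WithFormula')


class Sample(WithId, WithFormula):
    def normalize(self, archive, logger):
        # super() follows the method resolution order, so every base runs once.
        super().normalize(archive, logger)
        self.calls.append('Sample')


sample = Sample()
sample.normalize(archive=None, logger=None)
print(sample.calls)
```

If `WithId.normalize` skipped its super call, `WithFormula.normalize` would never run for `Sample`, because it is reached only through the chain of super calls.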

If we parse an archive like this:

    data:
      m_def: 'examples.archive.custom_schema.SampleDatabase'
      samples:
        - formula: NaCl


we will get a final normalized archive that contains our data like this:

    {
      "data": {
        "m_def": "examples.archive.custom_schema.SampleDatabase",
        "samples": [
          {