How to navigate the code

NOMAD is a complex project with many parts. This guide gives you a rough overview of the codebase and suggests what to look at first.

Git Projects

There is one main NOMAD project (and its fork on GitHub). This project contains all the framework and infrastructure code. It triggers all checks, builds, and deployments for the public NOMAD service, the NOMAD Oasis, and the nomad-lab Python package. All contributions to NOMAD have to go through this project eventually.

All (Git) projects that NOMAD depends on are either Git submodules (you find them all in the dependencies directory or its subdirectories) or listed as PyPI packages in the pyproject.toml of the main project (or one of its submodules).
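
To see which submodules a checkout declares, you can read the .gitmodules file at the repository root. The following is a minimal sketch using only the Python standard library. The function name list_submodules and the sample entries are hypothetical and only illustrate the file format; they are not taken from the NOMAD repository.

```python
import configparser


def list_submodules(gitmodules_text):
    """Return (path, url) pairs for every submodule declared in .gitmodules text."""
    # Git indents keys with tabs; strip each line so configparser never
    # mistakes an indented key for a continuation of the previous value.
    cleaned = "\n".join(line.strip() for line in gitmodules_text.splitlines())
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cleaned)
    return [
        (parser[section]["path"], parser[section].get("url", ""))
        for section in parser.sections()
        if section.startswith("submodule ")
    ]


# Hypothetical sample content, not the real NOMAD submodule list.
SAMPLE = """
[submodule "dependencies/example-parser"]
    path = dependencies/example-parser
    url = https://example.org/example-parser.git
[submodule "dependencies/example-plugin"]
    path = dependencies/example-plugin
    url = https://example.org/example-plugin.git
"""

for path, url in list_submodules(SAMPLE):
    print(path, url)
```

In a real checkout you would pass the content of the actual file, for example `list_submodules(open(".gitmodules").read())`. Running `git submodule status` in the repository gives a comparable overview without any script.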

You can also have a look at the list of parsers and built-in plugins that constitute the majority of these projects. The only other projects are MatID, DOS fingerprints, and the NOMAD Remote Tools Hub.

Note

The GitLab organization nomad-lab and the GitHub organizations for FAIRmat and the NOMAD CoE all represent larger infrastructure and research projects, and they include many other Git projects that are unrelated to this codebase. When navigating the codebase, only follow the submodules.

Python code

There are three main directories with Python code:

  • nomad: The actual NOMAD code. It is structured into further subdirectories and modules.

  • tests: Tests (pytest) for the NOMAD code. It follows the same module structure, but Python files are prefixed with test_.

  • examples: A few small Python scripts that might be linked in the documentation.
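
The naming convention for tests described above can be expressed as a small path mapping. This is a sketch of the convention only: expected_test_path is a hypothetical helper, nomad/parsing/example.py is a made-up module name, and a matching test file is not guaranteed to exist for every module.

```python
from pathlib import PurePosixPath


def expected_test_path(source_path):
    """Map a module under nomad/ to the test file the naming convention suggests."""
    source = PurePosixPath(source_path)
    if not source.parts or source.parts[0] != "nomad" or source.suffix != ".py":
        raise ValueError(f"not a Python module under nomad/: {source_path}")
    # Swap the top-level directory and prefix the file name with test_.
    relative = source.relative_to("nomad")
    return PurePosixPath("tests") / relative.with_name(f"test_{relative.name}")


print(expected_test_path("nomad/search.py"))           # tests/test_search.py
print(expected_test_path("nomad/parsing/example.py"))  # tests/parsing/test_example.py
```

When you look for the tests that cover a module, this mapping is a good first guess. If no file exists at the suggested location, searching the tests directory for the module name is the next step.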

The nomad directory contains the following "main" modules. This list is not exhaustive but should help you to navigate the codebase:

  • app: The FastAPI APIs: v1 and v1.2 NOMAD APIs, OPTIMADE, DCAT, h5grove, and more.

  • archive: Functionality to store and access archive files. This is the storage format for all processed data in NOMAD. See also the docs on structured data.

  • cli: The command line interface (based on Click). Subcommands are structured into submodules.

  • config: NOMAD is configured through the nomad.yaml file. This contains all the (Pydantic) models and default config parameters.

  • datamodel: The built-in schemas (e.g. nomad.datamodel.metainfo.workflow used to construct workflows), the base sections, and the sections for the shared entry structure. See also the docs on the datamodel and processing.

  • metainfo: The Metainfo system, i.e. the schema language that NOMAD uses.

  • normalizing: All the normalizers. See also the docs on processing.

  • parsing: The base classes for parsers, matching functionality, parser initialization, some fundamental parsers like the archive parser. See also the docs on processing.

  • processing: Everything related to processing uploads and entries. The interface to Celery and MongoDB.

  • units: The unit and unit conversion system based on Pint.

  • utils: Utility modules, e.g. the structured logging system (structlog), id generation, and hashes.

  • files.py: Functionality to maintain the files of staged and published uploads. The interface to the file system.

  • search.py: The interface to Elasticsearch.
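
Because the list above is only a selection, it can help to enumerate what a checkout actually contains. The sketch below lists the subpackages and single-file modules directly inside a package directory. The helper name top_level_modules is hypothetical, and the sketch assumes that subpackages carry an __init__.py file, so namespace packages without one would be skipped.

```python
from pathlib import Path


def top_level_modules(package_dir):
    """Return sorted names of subpackages and modules directly inside package_dir."""
    names = []
    for child in Path(package_dir).iterdir():
        if child.is_dir() and (child / "__init__.py").is_file():
            names.append(child.name)  # subpackage such as app or cli
        elif child.suffix == ".py" and child.name != "__init__.py":
            names.append(child.name)  # single-file module such as files.py
    return sorted(names)


if __name__ == "__main__":
    # Assumes the script runs from the root of a NOMAD checkout.
    if Path("nomad").is_dir():
        print("\n".join(top_level_modules("nomad")))
```

Comparing the output with the list above shows which modules this guide does not cover.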

GUI code

The NOMAD UI is written as a React single-page application (SPA). It uses (among many other libraries) MUI, Plotly, and D3. The GUI code is maintained in the gui directory. Most relevant code can be found in gui/src/components. The application entry point is gui/src/index.js.

Documentation

The documentation is based on MkDocs. The important files and directories are:

  • docs: Contains all the Markdown files that contribute to the documentation system.

  • mkdocs.yml: The index and configuration of the documentation. New files have to be added here as well.

  • nomad/mkdocs.py: Python code that defines macros which can be used in Markdown.
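
Since new Markdown files also have to be registered in mkdocs.yml, a quick check for forgotten entries can be useful. The following sketch reports Markdown files whose relative path does not occur anywhere in the configuration text. The function unreferenced_pages is hypothetical, and the plain substring search is a deliberate simplification that avoids a YAML parser, so treat the result as a hint rather than a definitive answer.

```python
from pathlib import Path


def unreferenced_pages(docs_dir, mkdocs_config_text):
    """Return Markdown files under docs_dir whose relative path never appears in the config text."""
    docs_root = Path(docs_dir)
    missing = []
    for page in sorted(docs_root.rglob("*.md")):
        relative = page.relative_to(docs_root).as_posix()
        # Plain substring search: crude, but it needs no YAML parser.
        if relative not in mkdocs_config_text:
            missing.append(relative)
    return missing
```

From the root of a checkout, a call could look like `unreferenced_pages("docs", Path("mkdocs.yml").read_text())`. Some files may be left out of the index on purpose, for example snippets that are included from other pages, so review each reported path before adding it.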

Other top-level directories

  • dependencies: Contains all the submodules, e.g. the parsers.

  • ops: Contains artifacts to run NOMAD components, e.g. docker-compose.yaml files, and our Kubernetes Helm chart.

  • scripts: Contains scripts used during the build or for certain development tasks.