A containerized cloud enabled architecture¶
NOMAD is a modern web-application that requires a lot of services to run. Some are NOMAD specific, others are 3rd party products. While all services can be traditionally installed and run on a single sever, NOMAD advocates the use of containers and operating NOMAD in a cloud environment.
NOMAD comprises two main services, its app and the worker. The app services our API, graphical user interface, and documentation. It is the outward facing part of NOMAD. The worker runs all the processing (parsing, normalization). Their separation allows to scale the system for various use-cases.
Other services are:
- rabbitmq: a task queue that we use to distribute tasks for the worker containers
- mongodb: a no-sql database used to maintain processing state and user-metadata
- elasticsearch: a no-sql database and search engine that drives our search
- a regular file system to maintain all the files (raw and archive)
- jupyterhub: run ai toolkit notebooks
- keycloak: our SSO user management system (can be used by all Oasises)
- a content management system to provide other web-page content (not part of the Oasis)
All NOMAD software is bundled in a single NOMAD docker image and a Python package (nomad-lab on pypi). The NOMAD docker image can be downloaded from our public registry. NOMAD software is organized in multiple git repositories. We use continuous integration to constantly provide the latest version of docker image and Python package.
NOMAD uses a modern and rich stack frameworks, systems, and libraries¶
Besides various scientific computing, machine learning, and computational material science libraries (e.g. numpy, skikitlearn, tensorflow, ase, spglib, matid, and many more), Nomad uses a set of freely available technologies that already solve most of its processing, storage, availability, and scaling goals. The following is a non comprehensive overview of used languages, libraries, frameworks, and services.
The backend of nomad is written in Python. This includes all parsers, normalizers, and other data processing. We only use Python 3 and there is no compatibility with Python 2. Code is formatted close to pep8, critical parts use pep484 type-hints. Pycodestyle, pylint, and mypy (static type checker) are used to ensure quality. Tests are written with pytest. Logging is done with structlog and logstash (see Elasticstack below). Documentation is driven by Sphinx.
Celery (+ rabbitmq) is a popular combination for realizing long running tasks in internet applications. We use it to drive the processing of uploaded files. It allows us to transparently distribute processing load while keeping processing state available to inform the user.
Elasticsearch is used to store repository data (not the raw files). Elasticsearch enables flexible, scalable search and analytics.
Keycloak is used for user management. It manages users and provides functions for registration, forgetting passwords, editing user accounts, and single sign-on to fairdi@nomad and other related services.
The elastic stack (previously ELK stack) is a centralized logging, metrics, and monitoring solution that collects data within the cluster and provides a flexible analytics front end for that data.
To run a nomad@FAIRDI instance, many services have to be orchestrated: the nomad app, nomad worker, mongodb, Elasticsearch, Keycloak, RabbitMQ, Elasticstack (logging), the nomad GUI, and a reverse proxy to keep everything together. Further services might be needed (e.g. JypiterHUB), when nomad grows. The container platform Docker allows us to provide all services as pre-build images that can be run flexibly on all types of platforms, networks, and storage solutions. Docker-compose allows us to provide configuration to run the whole nomad stack on a single server node.
kubernetes + helm¶
To run and scale nomad on a cluster, you can use kubernetes to orchestrated the necessary containers. We provide a helm chart with all necessary service and deployment descriptors that allow you to set up and update nomad with only a few commands.
Nomad as a software project is managed via GitLab. The nomad@FAIRDI project is hosted here. GitLab is used to manage versions, different branches of development, tasks and issues, as a registry for Docker images, and CI/CD platform.