### NOMAD Artificial Intelligence Toolkit

### The NOMAD Artificial Intelligence Toolkit contains powerful tools for finding completely new patterns and information in materials science Big Data

We develop and implement methods that identify correlations and structure in big data of materials. This enables scientists and engineers to decide which materials are useful for specific applications and which new materials should be the focus of future studies. The following tutorials are designed to help you get started with the Artificial Intelligence Toolkit.

### How to get started!

### Access to the Tutorials

To access any tutorial, click on a title below to unveil a short description and the access button.

All tutorial notebooks are accessible anonymously, without need to log in.

*created by:* Sbailò, Luigi | Scheffler, Matthias | Ghiringhelli, Luca M.

In this tutorial, we demonstrate how to query the NOMAD Archive from the NOMAD Analytics toolkit. We then show examples of machine learning analysis performed on the retrieved data set.

*Keywords:* Materials properties prediction | Data visualization

*Method:* Clustering | Dimension reduction | Random forest

*created by:* Leitherer, Andreas | Ziletti, Angelo | Ghiringhelli, Luca M.

In this tutorial, we give an introduction to ARISE (Leitherer, Ziletti, Ghiringhelli, arXiv:2103.09777).

*Keywords:* Bayesian deep learning | Unsupervised learning | SOAP | grain boundaries | binaries | ternaries | low-dimensional materials

*Method:* Bayesian deep learning | Unsupervised learning

*created by:* Arif, Mohammad-Yasin | Sbailò, Luigi | Ghiringhelli, Luca M.

In this tutorial, we present a method, based on subgroup discovery, for detecting domains of applicability (DA) of ML models within a materials class. The domain of applicability of an ML model is the region of input space where the model predicts the target property with the smallest uncertainty. The utility of this approach is demonstrated by analyzing three state-of-the-art ML models for predicting the formation energy of transparent conducting oxides.

*Keywords:* SOAP | MBTR | n-gram | Formation energy prediction | Transparent Conducting Oxides | heterogeneous catalysis

*Method:* Subgroup discovery | Kernel ridge regression

*created by:* Sbailò, Luigi | Purcell, Thomas A. R. | Ghiringhelli, Luca M. | Scheffler, Matthias

Learn how to find descriptive parameters (short formulas) that predict whether alloyed materials are topological or trivial insulators, using the example of tetradymites. This notebook is based on the 'sure independence screening and sparsifying operator' (SISSO) algorithm, which searches for an optimal descriptor by scanning huge feature spaces.

*Keywords:* Tetradymites | Topological insulators

*Method:* SISSO | Classification

*created by:* Arif, Mohammad-Yasin | Sbailò, Luigi | Purcell, Thomas A. R. | Ghiringhelli, Luca M. | Scheffler, Matthias

A tool for predicting the difference in total energy between different polymorphs for 82 octet binary compounds, which gives an indication of the stability of the material. This is accomplished by identifying a set of descriptive parameters (a descriptor) from the free-atom data of the atomic species comprising each binary material, using the Sure Independence Screening (SIS) + ℓ0-norm minimization approach.

*Keywords:* Octet binaries

*Method:* SISSO
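The screening step mentioned in the description above can be illustrated with a minimal sketch. It builds a few candidate features from two primary features and ranks them by the absolute Pearson correlation with the target. The feature names (`rA`, `rB`), the data values, the operator set, and all function names are hypothetical and not taken from the tutorial. Only the screening (SIS) step is shown; the subsequent ℓ0 search over feature subsets is omitted.

```python
import math
from itertools import combinations

def pearson(u, v):
    """Pearson correlation of two equally long lists (0.0 for constant input)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    std_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    std_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if std_u == 0.0 or std_v == 0.0:
        return 0.0
    return cov / (std_u * std_v)

def build_candidates(primary):
    """Combine primary features pairwise with a few simple operators."""
    candidates = dict(primary)
    for (name_a, a), (name_b, b) in combinations(primary.items(), 2):
        candidates[f"({name_a}+{name_b})"] = [x + y for x, y in zip(a, b)]
        candidates[f"|{name_a}-{name_b}|"] = [abs(x - y) for x, y in zip(a, b)]
        candidates[f"({name_a}*{name_b})"] = [x * y for x, y in zip(a, b)]
    return candidates

def sure_independence_screening(candidates, target, n_keep):
    """Keep the n_keep candidates most correlated with the target."""
    ranked = sorted(
        candidates,
        key=lambda name: abs(pearson(candidates[name], target)),
        reverse=True,
    )
    return ranked[:n_keep]

# Hypothetical primary features for five materials
primary = {
    "rA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "rB": [2.0, 1.0, 4.0, 2.0, 6.0],
}
# Hypothetical target, constructed here as the product rA * rB
target = [2.0, 2.0, 12.0, 8.0, 30.0]

top = sure_independence_screening(build_candidates(primary), target, n_keep=2)
```

Because the toy target is constructed as a product of the two primary features, the product feature ranks first. In a real application the candidate space is far larger, which is why a cheap screening step precedes the combinatorial ℓ0 search.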

*created by:* Mazheika, Aliaksei | Sbailò, Luigi | Ghiringhelli, Luca | Levchenko, Sergey | Scheffler, Matthias

In this interactive tutorial, we show how subgroup discovery can be applied to search for indicators of carbon-dioxide activation, with the aim of its further conversion.

*Keywords:* CO2 activation | heterogeneous catalysis

*Method:* Subgroup discovery

*created by:* Bieniek, Björn | Strange, Mikkel | Carbogno, Christian | Arif, Mohammad-Yasin | Sbailò, Luigi | Scheffler, Matthias

A set of tools to analyze the error in electronic structure calculations due to the choice of numerical settings. We use the NOMAD infrastructure to systematically investigate the deviations in total and relative energies as a function of typical settings for basis sets, k-grids, etc. for 71 elemental and 81 binary solids in three different electronic-structure codes.

*Keywords:* Binaries | Elemental solids

*Method:* Linear Least-squares Regression
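As a minimal sketch of the regression method named above, the snippet below fits a straight line to a handful of made-up (setting, error) pairs with ordinary least squares. The data values, variable names, and function name are hypothetical and not taken from the tutorial.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: log10 of an energy error versus a k-grid density setting
kgrid_density = [2.0, 4.0, 6.0, 8.0]
log_error = [-1.0, -2.0, -3.0, -4.0]

slope, intercept = fit_line(kgrid_density, log_error)
```

For this toy data the points lie exactly on a line, so the fit recovers a slope of -0.5 and an intercept of 0.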

*created by:* Liu, Xiangyue | Sutton, Christopher | Yamamoto, Takenori | Lysogorskiy, Yury | Blumenthal, Lars | Hammerschmidt, Thomas | Golebiowski, Jacek | Ziletti, Angelo | Scheffler, Matthias | Ghiringhelli, Luca M.

In this tutorial, we will explore the best results of the NOMAD 2018 Kaggle research competition. The goal of this competition was to develop machine-learning models for the prediction of two target properties: the formation energy and the bandgap energy of transparent semiconducting oxides. The purpose of the modelling is to facilitate the discovery of new such materials and allow for advancements in (opto)electronic technologies.

*Keywords:* Formation energy prediction | Band gap energy prediction

*Method:* Kernel ridge regression | Neural networks | SOAP | n-gram

*created by:* Fekete, Ádám | Stella, Martina | Lambert, Henry | De Vita, Alessandro | Csányi, Gábor

In this tutorial, we will use a machine learning method (clustering) to analyse the results of grain boundary (GB) calculations of alpha-iron. Along the way, we will learn about different methods to describe the local atomic environment in order to calculate properties of GBs. We will use these properties to separate the different regions of the GB using clustering methods. Finally, we will determine how the energy of the GB changes with the angle difference between the regions.

*Keywords:* Grain boundaries

*Method:* Clustering | K-means | Gaussian mixture

*created by:* Sbailò, Luigi | Ghiringhelli, Luca M.

Exploratory analyses make use of unsupervised learning techniques to extract information from unknown datasets. In this tutorial, we make use of some of the most popular clustering and dimension reduction algorithms to analyze a dataset composed of 82 octet-binary compounds.

*Keywords:* k-means | Hierarchical clustering | DBSCAN | HDBSCAN | DenPeak | PCA | t-SNE | MDS | Octet binaries

*Method:* Clustering | Dimension reduction

*created by:* Ahmetcik, Emre | Ziletti, Angelo | Ouyang, Runhai | Sbailò, Luigi | Scheffler, Matthias | Ghiringhelli, Luca M.

In this tutorial, we will show how to find descriptive parameters to predict materials properties using symbolic regression combined with compressed sensing tools. The relative stability of the zincblende (ZB) versus rocksalt (RS) structure of binary materials is predicted and compared against a model trained with kernel ridge regression.

*Keywords:* Compressed sensing | Symbolic regression | Descriptors

*Method:* LASSO | SISSO | Kernel ridge regression

*created by:* Sbailò, Luigi | Ghiringhelli, Luca M.

In this tutorial, we introduce the most popular clustering algorithms. We focus on partitioning, hierarchical and density-based clustering algorithms, and the methods are tested on artificial datasets of increasing complexity.

*Keywords:* k-means | Hierarchical clustering | DBSCAN | HDBSCAN

*Method:* Clustering
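To make the partitioning approach concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python on a toy two-dimensional dataset. The data points, the fixed initial centroids, and the function name are hypothetical; the tutorial itself may rely on library implementations instead.

```python
import math

def kmeans(points, centroids, n_iter=10):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    labels = []
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for each point
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # leave empty clusters in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids

# Hypothetical artificial dataset with two well-separated groups
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (5.0, 5.0)])
```

With well-separated groups like these, the first three points end up in one cluster and the last three in the other. On datasets with non-convex or unevenly dense clusters, k-means tends to fail, which is where density-based methods such as DBSCAN become relevant.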

*created by:* Speckhard, Daniel | Leitherer, Andreas | Ghiringhelli, Luca M.

In this tutorial, we will introduce decision trees. We go through a toy model introducing the scikit-learn API. We then discuss, piece by piece, the different theoretical aspects of trees. Next, we train a regression tree and a classification tree on different datasets related to materials science. We end the tutorial by covering random forests and bagging classifiers.

*Keywords:* Images

*Method:* Decision tree | Random forest | Bagging classifier
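The core operation of a classification tree, choosing the split that minimizes impurity, can be sketched in a few lines. The example below searches for the best threshold on a single feature using the Gini impurity. The feature name, the data values, and the function names are hypothetical; a full tree applies this search recursively to each resulting subset.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Return (threshold, weighted impurity) of the best split x <= threshold."""
    best = (None, float("inf"))
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2  # candidate thresholds are midpoints
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical toy data: one feature, two classes
band_gap = [0.0, 0.1, 0.2, 1.5, 2.0, 3.1]
is_insulator = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_split(band_gap, is_insulator)
```

Here the best threshold falls midway between 0.2 and 1.5 and separates the two classes perfectly, so the weighted impurity is zero.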

*created by:* Langer, Marcel F.

In this tutorial, we'll explore the application of kernel ridge regression to the prediction of materials properties. We will begin with a largely informal, pragmatic introduction to kernel ridge regression, including a rudimentary implementation, in order to become familiar with the basic terminology and considerations. We will then discuss representations, and re-trace the NOMAD 2018 Kaggle challenge.

*Keywords:* Formation energy prediction

*Method:* Kernel ridge regression | SOAP
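In the spirit of the rudimentary implementation mentioned above, the following is a minimal sketch of kernel ridge regression with a Gaussian kernel in plain Python. It is not the tutorial's code: the function names, the toy data, the length scale, and the regularization strength are all assumptions chosen for illustration.

```python
import math

def gaussian_kernel(a, b, length_scale=1.0):
    """Gaussian (RBF) kernel between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0.0] * n
    for row in reversed(range(n)):
        known = sum(aug[row][j] * x[j] for j in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x

def krr_fit(xs, ys, regularization=1e-6):
    """Solve (K + regularization * I) alpha = y for the weights alpha."""
    kernel = [[gaussian_kernel(a, b) for b in xs] for a in xs]
    for i in range(len(xs)):
        kernel[i][i] += regularization
    return solve(kernel, ys)

def krr_predict(x_new, xs, alpha):
    """The prediction is a kernel-weighted sum over the training points."""
    return sum(a * gaussian_kernel(x_new, x) for a, x in zip(alpha, xs))

# Hypothetical one-dimensional training set
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.0, 1.0, 4.0, 9.0]

alpha = krr_fit(train_x, train_y)
prediction = krr_predict(1.0, train_x, alpha)
```

With such a small regularization the model nearly interpolates the training data, so the prediction at a training input is close to its training target. For materials, the scalar inputs would be replaced by a representation such as SOAP, and the kernel would act on those vectors.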

*created by:* Leitherer, Andreas | Ziletti, Angelo | Sbailò, Luigi | Scheffler, Matthias | Ghiringhelli, Luca M.

In this tutorial, we will use the ElemNet neural network architecture (https://github.com/NU-CUCIS/ElemNet) to predict the volume per atom of inorganic compounds, using the Open Quantum Materials Database (OQMD) as a resource (specifically, the data is taken from Ward et al., npj Comput. Mater. 2, 16028 (2016)).

*Keywords:* Deep neural networks | Descriptors

*Method:* Neural networks

*created by:* Langer, Marcel F.

In this tutorial, we will get to know cmlkit, a Python package for specifying, evaluating, and optimising machine learning models, and use it to compete in the NOMAD 2018 Kaggle challenge.

*Keywords:* Formation energy prediction

*Method:* Kernel ridge regression | SOAP | MBTR | Symmetry Functions

*created by:* Regler, Benjamin | Scheffler, Matthias | Ghiringhelli, Luca M.

This interactive notebook includes the original implementation of total cumulative mutual information (TCMI) to reproduce the main results presented in the publication.

*Keywords:* information theory | mutual information | cumulative entropy | feature selection

*Method:* Clustering | TCMI

*Language(s):* python | javascript

*created by:* Csányi, Gábor | Kermode, James R.

In this tutorial, we will use Gaussian process regression, GPR (or equivalently, Kernel Ridge Regression, KRR) to train and predict charges of atoms in small organic molecules.

*Keywords:* GDB molecular database

*Method:* Gaussian-process regression | Kernel ridge regression
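The equivalence between GPR and KRR noted in the description above can be written out explicitly. The notation below is introduced here for illustration and is not taken from the tutorial: $K_{ij} = k(x_i, x_j)$ is the kernel matrix over the $n$ training inputs, $\mathbf{y}$ the vector of training targets, $\mathbf{k}_*$ the vector of kernel values between a new input $x_*$ and the training inputs, $\sigma^2$ the Gaussian noise variance, and $\lambda$ the ridge regularization strength.

```latex
% GPR posterior mean at a new input x_*
\mu_{\mathrm{GPR}}(x_*) = \mathbf{k}_*^{\top} \left( K + \sigma^2 I \right)^{-1} \mathbf{y}

% KRR prediction at the same input
f_{\mathrm{KRR}}(x_*) = \mathbf{k}_*^{\top} \left( K + \lambda I \right)^{-1} \mathbf{y}

% The two coincide when the regularization equals the noise variance
\lambda = \sigma^2 \;\Longrightarrow\; \mu_{\mathrm{GPR}}(x_*) = f_{\mathrm{KRR}}(x_*)
```

This holds for the same kernel and a zero prior mean, and under the convention that the KRR penalty is not rescaled by the number of training points. GPR additionally provides a predictive variance, which KRR does not.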

*created by:* Fekete, Ádám | Stella, Martina | Lambert, Henry | De Vita, Alessandro | Csányi, Gábor

In this tutorial, we will use Gaussian Approximation Potentials to analyse the results of TB DFT calculations of a Si surface. Along the way, we will learn about different descriptors (2b, 3b, SOAP) that describe the local atomic environment in order to predict energies and forces of the Si surface.

*Keywords:* SOAP descriptor | Gaussian Approximation Potentials (GAP)

*Method:* Gaussian-process regression | Kernel ridge regression

*created by:* Ziletti, Angelo | Leitherer, Andreas | Ghiringhelli, Luca M.

In this tutorial, we briefly introduce the main ideas behind convolutional neural networks, build a neural network model with Keras, and explain the classification decision process using attentive response maps.

*Keywords:* Classification | Neural Networks

*Method:* Convolutional Neural networks | Attentive response map
