### Artificial Intelligence Toolkit - Tutorials

### Get started with the Artificial Intelligence Toolkit Tutorials!

We develop and implement methods that identify correlations and structure in big data of materials. This enables scientists and engineers to decide which materials are useful for specific applications and which new materials should be the focus of future studies. The following tutorials are designed to help you get started with the Analytics Toolkit.


*created by:* Sbailò, Luigi | Scheffler, Matthias | Ghiringhelli, Luca M.

In this tutorial, we demonstrate how to query the NOMAD Archive from the NOMAD Analytics Toolkit. We then show examples of machine-learning analyses performed on the retrieved dataset.

*Keywords:* Materials properties prediction | Data visualization

*Method:* Clustering | Dimension reduction | Random forest

*created by:* Sbailò, Luigi | Purcell, Thomas A. R. | Ghiringhelli, Luca M. | Scheffler, Matthias

Learn how to find descriptive parameters (short formulas) that predict whether alloyed materials are topological or trivial insulators, using tetradymites as an example. This notebook is based on the sure independence screening and sparsifying operator (SISSO) algorithm, which searches for an optimal descriptor by scanning huge feature spaces.

*Keywords:* Tetradymites | Topological insulators

*Method:* SISSO | Classification

*created by:* Arif, Mohammad-Yasin | Sbailò, Luigi | Purcell, Thomas A. R. | Ghiringhelli, Luca M. | Scheffler, Matthias

A tool for predicting the total-energy difference between polymorphs of 82 octet binary compounds, which gives an indication of the stability of each material. This is accomplished by identifying a set of descriptive parameters (a descriptor) from free-atom data of the two atomic species that make up each binary material, using the sure independence screening (SIS) + ℓ0-norm minimization approach.

*Keywords:* Octet binaries

*Method:* SISSO
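To give a feel for the SIS + ℓ0 idea, here is a toy sketch in plain Python; it is not the SISSO code used in the tutorial. It assumes two hypothetical primary features `a` and `b`, builds a tiny feature space from a handful of algebraic operations, keeps the features most correlated with the target (SIS), and then fits every pair of surviving features by least squares, keeping the pair with the lowest RMSE (the ℓ0 step, restricted here to two-dimensional descriptors). The data are invented so that the target is exactly `2*a*b - 0.5*b^2`.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def build_features(a, b):
    """Hypothetical toy feature space built from two primary features."""
    return {
        "a": a,
        "b": b,
        "a+b": [p + q for p, q in zip(a, b)],
        "a*b": [p * q for p, q in zip(a, b)],
        "a/b": [p / q for p, q in zip(a, b)],
        "a^2": [p * p for p in a],
        "b^2": [q * q for q in b],
    }


def sis(features, target, n_keep):
    """Sure independence screening: keep the n_keep features most correlated with the target."""
    ranked = sorted(features, key=lambda name: abs(pearson(features[name], target)), reverse=True)
    return ranked[:n_keep]


def rmse_2d(f1, f2, y):
    """RMSE of the least-squares fit y ~ c0 + c1*f1 + c2*f2 (intercept removed by centring)."""
    n = len(y)
    m1, m2, my = sum(f1) / n, sum(f2) / n, sum(y) / n
    u = [v - m1 for v in f1]
    w = [v - m2 for v in f2]
    t = [v - my for v in y]
    s11 = sum(p * p for p in u)
    s22 = sum(q * q for q in w)
    s12 = sum(p * q for p, q in zip(u, w))
    s1y = sum(p * r for p, r in zip(u, t))
    s2y = sum(q * r for q, r in zip(w, t))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:  # (near-)collinear pair: no unique fit
        return math.inf
    # Solve the 2x2 normal equations by Cramer's rule.
    c1 = (s1y * s22 - s12 * s2y) / det
    c2 = (s11 * s2y - s12 * s1y) / det
    residuals = [r - c1 * p - c2 * q for p, q, r in zip(u, w, t)]
    return math.sqrt(sum(e * e for e in residuals) / n)


def l0_pairs(features, target, candidates):
    """l0 step for 2D descriptors: try every pair of screened features, keep the lowest RMSE."""
    return min(
        (rmse_2d(features[f], features[g], target), (f, g))
        for f, g in itertools.combinations(candidates, 2)
    )


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [3.0, 1.0, 4.0, 1.0, 5.0, 2.0]
feats = build_features(a, b)
y = [2 * p - 0.5 * q for p, q in zip(feats["a*b"], feats["b^2"])]
screened = sis(feats, y, n_keep=5)
best_rmse, best_pair = l0_pairs(feats, y, screened)
print(screened, best_pair, best_rmse)
```

Real SISSO applies its operators iteratively to build far larger feature spaces; because the ℓ0 search is combinatorial, the SIS step is what keeps it tractable.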

*created by:* Mazheika, Aliaksei | Sbailò, Luigi | Ghiringhelli, Luca | Levchenko, Sergey | Scheffler, Matthias

In this interactive tutorial, we show how subgroup discovery can be applied to search for indicators of carbon dioxide activation, with a view to its further conversion.

*Keywords:* CO2 activation | heterogeneous catalysis

*Method:* Subgroup discovery

*created by:* Bieniek, Björn | Strange, Mikkel | Carbogno, Christian | Arif, Mohammad-Yasin | Sbailò, Luigi | Scheffler, Matthias

A set of tools to analyze the error in electronic-structure calculations that is due to the choice of numerical settings. We use the NOMAD infrastructure to systematically investigate the deviations in total and relative energies as a function of typical settings for basis sets, k-grids, etc., for 71 elemental and 81 binary solids in three different electronic-structure codes.

*Keywords:* Binaries | Elemental solids

*Method:* Linear Least-squares Regression

*created by:* Liu, Xiangyue | Sutton, Christopher | Yamamoto, Takenori | Lysogorskiy, Yury | Blumenthal, Lars | Hammerschmidt, Thomas | Golebiowski, Jacek | Ziletti, Angelo | Scheffler, Matthias | Ghiringhelli, Luca M.

In this tutorial, we will explore the best results of the NOMAD 2018 Kaggle research competition. The goal of this competition was to develop machine-learning models for predicting two target properties of transparent semiconducting oxides: the formation energy and the band-gap energy. The purpose of the modelling is to facilitate the discovery of new materials of this kind and to enable advances in (opto)electronic technologies.

*Keywords:* Formation energy prediction | Band gap energy prediction

*Method:* Kernel ridge regression | Neural networks | SOAP | n-gram

*created by:* Fekete, Ádám | Stella, Martina | Lambert, Henry | De Vita, Alessandro | Csányi, Gábor

In this tutorial, we will use a machine-learning method (clustering) to analyse the results of grain boundary (GB) calculations for alpha-iron. Along the way, we will learn about different methods for describing the local atomic environment in order to calculate properties of GBs. We will use these properties to separate the different regions of the GB with clustering methods. Finally, we will determine how the energy of the GB changes with the angle difference between the regions.

*Keywords:* Grain boundaries

*Method:* Clustering | K-means | Gaussian mixture

*created by:* Sbailò, Luigi | Ghiringhelli, Luca M.

Exploratory analyses make use of unsupervised learning techniques to extract information from unknown datasets. In this tutorial, we make use of some of the most popular clustering and dimension reduction algorithms to analyze a dataset composed of 82 octet-binary compounds.

*Keywords:* k-means | Hierarchical clustering | DBSCAN | HDBSCAN | DenPeak | PCA | t-SNE | MDS | Octet binaries

*Method:* Clustering | Dimension reduction
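As a minimal illustration of one of the dimension-reduction methods listed (PCA), the sketch below finds the leading principal axis by power iteration on the sample covariance matrix and projects the centred data onto it. It is plain Python on hypothetical 2D points lying on the line y = 2x; libraries such as scikit-learn compute PCA from an SVD instead, which also yields the remaining components.

```python
import math


def first_principal_component(data, n_iter=200):
    """Leading principal axis of row-vector data via power iteration on the covariance matrix."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # starting vector, assumed not orthogonal to the leading eigenvector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # 1D embedding: projection of each centred point onto the principal axis.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centred]
    return v, scores


data = [[float(t), 2.0 * t] for t in range(5)]
axis, scores = first_principal_component(data)
print(axis, scores)
```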

*created by:* Ahmetcik, Emre | Ziletti, Angelo | Ouyang, Runhai | Sbailò, Luigi | Scheffler, Matthias | Ghiringhelli, Luca M.

In this tutorial, we will show how to find descriptive parameters that predict materials properties, using symbolic regression combined with compressed-sensing tools. The relative stability of the zincblende (ZB) versus rocksalt (RS) structure of binary materials is predicted and compared against a model trained with kernel ridge regression.

*Keywords:* Compressed sensing | Symbolic regression | Descriptors

*Method:* LASSO | SISSO | Kernel ridge regression

*created by:* Sbailò, Luigi | Ghiringhelli, Luca M.

In this tutorial, we introduce the most popular clustering algorithms. We focus on partitioning, hierarchical, and density-based clustering algorithms, and test the methods on artificial datasets of increasing complexity.

*Keywords:* k-means | Hierarchical clustering | DBSCAN | HDBSCAN

*Method:* Clustering
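The partitioning family is easiest to see in k-means. Below is a minimal sketch of Lloyd's algorithm in plain Python, assuming hypothetical 2D points and a naive random initialisation; practical implementations typically use smarter seeding (such as k-means++) and several restarts.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # naive initialisation: k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (ties -> lowest index).
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return labels, centroids


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Because each point is assigned to its nearest centroid, k-means implicitly favours compact, roughly spherical clusters, which is one reason hierarchical and density-based methods are worth comparing on harder datasets.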

*created by:* Speckhard, Daniel | Leitherer, Andreas | Ghiringhelli, Luca

In this tutorial, we introduce decision trees. We go through a toy model that introduces the scikit-learn (SKLearn) API, and then discuss the theoretical aspects of trees piece by piece. Next, we train a regression tree and a classification tree on different materials-science datasets. We end the tutorial by covering random forests and bagging classifiers.

*Keywords:* Images

*Method:* Decision tree | Random forest | Bagging classifier
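At the heart of growing a classification tree is choosing the split that most reduces impurity. Below is a minimal sketch for a single numeric feature, assuming Gini impurity (the default criterion of scikit-learn's `DecisionTreeClassifier`) and made-up data with two classes `"A"` and `"B"`; a full tree repeats this search over all features and recurses into each child node.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(x, y):
    """Return (threshold, weighted_impurity) of the best split 'x <= threshold' on one feature."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = (None, gini(y))  # baseline: no split, impurity of the parent node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Impurity of the children, weighted by how many samples fall on each side.
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


x = [0.1, 0.35, 0.9, 0.4, 0.8, 0.75]
y = ["A", "A", "B", "A", "B", "B"]
threshold, impurity = best_split(x, y)
print(threshold, impurity)
```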

*created by:* Langer, Marcel F.

In this tutorial, we'll explore the application of kernel ridge regression to the prediction of materials properties. We will begin with a largely informal, pragmatic introduction to kernel ridge regression, including a rudimentary implementation, in order to become familiar with the basic terminology and considerations. We will then discuss representations, and re-trace the NOMAD 2018 Kaggle challenge.

*Keywords:* Formation energy prediction

*Method:* Kernel ridge regression | SOAP
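In the spirit of the rudimentary implementation mentioned above, here is a from-scratch sketch of KRR in plain Python. It assumes a Gaussian (RBF) kernel, solves the regularised system (K + λI)α = y by Gaussian elimination, and predicts f(x) = Σ_i α_i k(x, x_i). The kernel, length scale, λ, and the toy sin(x) data are illustrative choices, not the tutorial's.

```python
import math


def rbf_kernel(x1, x2, length_scale=1.0):
    """Gaussian kernel exp(-|x1 - x2|^2 / (2 * length_scale^2)) for feature vectors."""
    sq = sum((p - q) ** 2 for p, q in zip(x1, x2))
    return math.exp(-sq / (2.0 * length_scale ** 2))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (a and b are modified)."""
    n = len(b)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (b[r] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def krr_fit(X, y, lam=1e-6, length_scale=1.0):
    """Return the dual coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], length_scale) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(K, list(y))


def krr_predict(X_train, alpha, x, length_scale=1.0):
    """Prediction f(x) = sum_i alpha_i * k(x, x_i)."""
    return sum(a * rbf_kernel(x, xi, length_scale) for a, xi in zip(alpha, X_train))


X = [[0.0], [0.5], [1.0], [1.5], [2.0]]
y = [math.sin(row[0]) for row in X]
alpha = krr_fit(X, y, lam=1e-8, length_scale=0.5)
print(krr_predict(X, alpha, [0.75], length_scale=0.5))
```

In practice one would use a Cholesky-based solver from a numerical library and tune λ and the length scale by cross-validation; the representation of each material (e.g. SOAP) enters only through the kernel.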

*created by:* Langer, Marcel F.

In this tutorial, we will get to know cmlkit, a Python package for specifying, evaluating, and optimising machine-learning models, and use it to compete in the NOMAD 2018 Kaggle challenge.

*Keywords:* Formation energy prediction

*Method:* Kernel ridge regression | SOAP | MBTR | Symmetry Functions

*created by:* Regler, Benjamin | Scheffler, Matthias | Ghiringhelli, Luca M.

This interactive notebook includes the original implementation of total cumulative mutual information (TCMI) to reproduce the main results presented in the publication.

*Keywords:* information theory | mutual information | cumulative entropy | feature selection

*Method:* Clustering | TCMI

*Language(s):* python | javascript
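As background for the feature-selection idea, here is the plain Shannon mutual information between a discrete feature and a discrete target, estimated from counts and used to rank two hypothetical features. This is not TCMI, which is built on cumulative entropy as the keywords indicate; the sketch only illustrates the basic quantity of "how much a feature tells us about the target".

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats for two equal-length sequences of discrete values."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts.
        mi += c / n * math.log(c * n / (px[a] * py[b]))
    return mi


target = [0, 0, 1, 1, 0, 1, 1, 0]
features = {
    "copy_of_target": list(target),          # fully informative
    "independent": [0, 1, 0, 1, 0, 1, 0, 1],  # carries no information about the target
}
ranking = sorted(features, key=lambda k: mutual_information(features[k], target), reverse=True)
print(ranking)
```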

*created by:* Csányi, Gábor | Kermode, James R.

In this tutorial, we will use Gaussian process regression (GPR), or equivalently kernel ridge regression (KRR), to train a model that predicts the charges of atoms in small organic molecules.

*Keywords:* GDB molecular database

*Method:* Gaussian-process regression | Kernel ridge regression

*created by:* Fekete, Ádám | Stella, Martina | Lambert, Henry | De Vita, Alessandro | Csányi, Gábor

In this tutorial, we will use Gaussian Approximation Potentials (GAP) to analyse the results of TB DFT calculations of a Si surface. Along the way, we will learn about different descriptors of the local atomic environment (2b, 3b, SOAP), which are used to predict the energies and forces of the Si surface.

*Keywords:* SOAP descriptor | Gaussian Approximation Potentials (GAP)

*Method:* Gaussian-process regression | Kernel ridge regression

*created by:* Ziletti, Angelo | Leitherer, Andreas | Ghiringhelli, Luca M.

In this tutorial, we briefly introduce the main ideas behind convolutional neural networks, build a neural network model with Keras, and explain the classification decision process using attentive response maps.

*Keywords:* Classification | Neural Networks

*Method:* Convolutional Neural networks | Attentive response map
