Artificial Intelligence Toolkit - Tutorials
Get started with the Artificial Intelligence Toolkit Tutorials!
We develop and implement methods that identify correlations and structure in large materials datasets. This will enable scientists and engineers to decide which materials are useful for specific applications and which new materials should be the focus of future studies. The following tutorials are designed to help you get started with the Artificial Intelligence Toolkit.
How to get started!
Access to the Tutorials
To access any tutorial, click on a title below to reveal a short description and the access button.
All tutorial notebooks are accessible anonymously, without need to log in.
In this tutorial, we demonstrate how to query the NOMAD Archive from the NOMAD Analytics toolkit. We then show examples of machine learning analysis performed on the retrieved data set.
In this tutorial we will give an introduction to ARISE (Leitherer, Ziletti, Ghiringhelli arXiv:2103.09777).
In this tutorial, we present a method, based on subgroup discovery, for detecting domains of applicability (DA) of ML models within a materials class. The domain of applicability of an ML model is the region of input space where the model predicts the target property with the smallest uncertainty. The utility of this approach is demonstrated by analyzing three state-of-the-art ML models for predicting the formation energy of transparent conducting oxides.
Learn how to find descriptive parameters (short formulas) that predict whether alloyed materials are topological or trivial insulators, using the example of tetradymites. This notebook is based on the 'sure independence screening and sparsifying operator' (SISSO) algorithm, which searches for optimal descriptors by scanning huge feature spaces.
A tool for predicting the difference in total energy between different polymorphs for 82 octet binary compounds, which gives an indication of the stability of the material. This is accomplished by identifying a set of descriptive parameters (a descriptor) from the free-atom data for the binary atomic species comprising the material, using the sure independence screening (SIS) + l0-norm minimization approach.
Keywords: Octet binaries
A set of tools to analyze the error in electronic structure calculations due to the choice of numerical settings. We use the NOMAD infrastructure to systematically investigate the deviations in total and relative energies as a function of typical settings for basis sets, k-grids, etc. for 71 elemental and 81 binary solids in three different electronic-structure codes.
Method: Linear Least-squares Regression
created by: Liu, Xiangyue | Sutton, Christopher | Yamamoto, Takenori | Lysogorskiy, Yury | Blumenthal, Lars | Hammerschmidt, Thomas | Golebiowski, Jacek | Ziletti, Angelo | Scheffler, Matthias | Ghiringhelli, Luca M.
In this tutorial, we will explore the best results of the NOMAD 2018 Kaggle research competition. The goal of this competition was to develop machine-learning models for the prediction of two target properties: the formation energy and the bandgap energy of transparent semiconducting oxides. The purpose of the modelling is to facilitate the discovery of new such materials and allow for advancements in (opto)electronic technologies.
In this tutorial we will use a machine learning method (clustering) to analyse the results of grain boundary (GB) calculations of alpha-iron. Along the way we will learn about different methods for describing the local atomic environment in order to calculate properties of GBs. We will use these properties to separate the different regions of the GB using clustering methods. Finally, we will determine how the energy of the GB changes with the angle difference between the regions.
Keywords: Grain boundaries
Exploratory analyses make use of unsupervised learning techniques to extract information from unknown datasets. In this tutorial, we make use of some of the most popular clustering and dimension reduction algorithms to analyze a dataset composed of 82 octet-binary compounds.
In this tutorial we will show how to find descriptive parameters to predict materials properties using symbolic regression combined with compressed sensing tools. The relative stability of the zincblende (ZB) versus rocksalt (RS) structure of binary materials is predicted and compared against a model trained with kernel ridge regression.
In this tutorial we introduce the most popular clustering algorithms. We focus on partitioning, hierarchical and density-based clustering algorithms, and the methods are tested on artificial datasets of increasing complexity.
In this tutorial we will introduce decision trees. We go through a toy model introducing the scikit-learn API. We then discuss, piece by piece, the different theoretical aspects of trees. Next, we train a regression tree and a classification tree on different datasets related to materials science. We end the tutorial by covering random forests and bagging classifiers.
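To make the partitioning family mentioned above concrete, here is a minimal k-means sketch in plain Python on a tiny artificial dataset. It is not taken from the tutorial notebook: the function name `kmeans`, the six toy points, and the deterministic choice of the first `k` points as initial centroids are all assumptions made for illustration.

```python
import math


def kmeans(points, k, n_iterations=20):
    """Cluster points into k groups with the basic k-means (Lloyd) iteration."""
    # Assumption for reproducibility: start from the first k points
    # instead of a random initialisation.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids


# Hypothetical artificial dataset: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Hierarchical and density-based methods differ mainly in how they define a cluster (a merge tree and a connected dense region, respectively), so they need no fixed number of centroids.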
created by: Langer, Marcel F.
In this tutorial, we'll explore the application of kernel ridge regression to the prediction of materials properties. We will begin with a largely informal, pragmatic introduction to kernel ridge regression, including a rudimentary implementation, in order to become familiar with the basic terminology and considerations. We will then discuss representations, and re-trace the NOMAD 2018 Kaggle challenge.
Keywords: Formation energy prediction
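As a rough illustration of what a rudimentary kernel ridge regression implementation can look like, the sketch below fits a Gaussian-kernel model in plain Python by solving (K + λI)α = y. It is not the tutorial's code: the function names, the toy sine data, and the hyperparameter values are assumptions chosen for illustration.

```python
import math


def gaussian_kernel(a, b, length_scale=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs stay untouched.
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def krr_fit(train_x, train_y, regularization=1e-6, length_scale=1.0):
    """Return the weights alpha that solve (K + lambda * I) alpha = y."""
    n = len(train_x)
    kernel_matrix = [
        [gaussian_kernel(train_x[i], train_x[j], length_scale) for j in range(n)]
        for i in range(n)
    ]
    for i in range(n):
        kernel_matrix[i][i] += regularization
    return solve_linear_system(kernel_matrix, train_y)


def krr_predict(train_x, alpha, query, length_scale=1.0):
    """Predict as a kernel-weighted sum over the training points."""
    return sum(
        a * gaussian_kernel(x, query, length_scale)
        for a, x in zip(alpha, train_x)
    )


# Hypothetical toy data: five samples of sin(x).
train_x = [[0.0], [0.5], [1.0], [1.5], [2.0]]
train_y = [math.sin(x[0]) for x in train_x]
alpha = krr_fit(train_x, train_y, regularization=1e-6, length_scale=0.5)
prediction = krr_predict(train_x, alpha, [1.0], length_scale=0.5)
print(prediction, math.sin(1.0))
```

With a small regularization strength the model nearly interpolates the training data; larger values trade training accuracy for smoother predictions. For real materials data, the scalar inputs would be replaced by a structural representation.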
In this tutorial we will use the ElemNet neural network architecture (https://github.com/NU-CUCIS/ElemNet) to predict the volume per atom of inorganic compounds, using the Open Quantum Materials Database (OQMD) as a resource (specifically, the data is taken from Ward et al., npj Comput. Mater. 2, 16028 (2016)).
Method: Neural networks
created by: Langer, Marcel F.
In this tutorial we will get to know cmlkit, a Python package for specifying, evaluating, and optimising machine learning models, and use it to compete in the NOMAD 2018 Kaggle challenge.
Keywords: Formation energy prediction
This interactive notebook includes the original implementation of total cumulative mutual information (TCMI) to reproduce the main results presented in the publication.
In this tutorial, we will use Gaussian process regression (GPR), or equivalently kernel ridge regression (KRR), to train a model that predicts the charges of atoms in small organic molecules.
Keywords: GDB molecular database
In this tutorial we will use Gaussian Approximation Potentials to analyse the results of TB DFT calculations of a Si surface. Along the way we will learn about different descriptors (2b, 3b, SOAP) for describing the local atomic environment in order to predict the energies and forces of the Si surface.
In this tutorial, we briefly introduce the main ideas behind convolutional neural networks, build a neural network model with Keras, and explain the classification decision process using attentive response maps.