Introduction to Big-Data Analytics
We develop and implement methods that identify correlations and structure in big data of materials.
The preparation, synthesis, and characterization of new materials are complex and costly aspects of materials design. About 200,000 materials are “known” to exist, but basic properties (e.g., optical gap, elastic constants, plasticity, piezoelectric tensors, conductivity) have been determined for very few of them. Considering organic and inorganic materials, surfaces, interfaces, and nanostructures, as well as inorganic/organic hybrids, the number of possible materials is practically infinite. It is therefore highly likely that materials with superior (but currently unknown) properties exist but have yet to be identified. Finding them could help address fundamental issues in fields such as energy storage and transformation, mobility, safety, information, and health.
Despite the huge number of possible materials, “the chemical compound space” is sparsely populated when the focus is on selected properties or functions. Our aim is to develop big-data analytics tools that help sort the available materials data and identify trends and anomalies.
This is a significant challenge that will require a diverse set of domain- or even property-specific big-data analytics tools. The overarching topics that will be addressed in the NOMAD Laboratory are:
- Crystal-structure prediction with the capability of quantifying the energy difference between different (metastable) structures of the same composition,
- Scanning for good thermoelectric materials,
- Finding better materials for heterogeneous catalysis, e.g. focusing on CO2 activation and methane oxidation,
- Searching for better optoelectronic and better photovoltaic materials,
- Analyzing steels and their plasticity,
- … and more.
People who are not part of the NOMAD team, be they in industry or academia, are encouraged to collaborate on the developments and to employ the developed methods for their topics of interest. It will be possible to visualize the results of an analysis using advanced graphics. Any study can be kept fully private or shared with others.
The Artificial Intelligence Toolkit will provide a simple and comprehensive interface for advanced searching of the code-independent NOMAD database and for performing sophisticated analysis on the retrieved data.
The code-independent data is described using NOMAD Meta Info, an open, flexible, and hierarchical metadata classification system that we developed and to which anybody can contribute. The NOMAD Meta Info aims at defining a conceptual model for storing the values connected to atomistic or ab initio calculations. A clear and usable metadata definition is a prerequisite for preparing the data for analysis.
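To make the idea of a hierarchical metadata classification more concrete, here is a minimal Python sketch. It is not the NOMAD Meta Info schema: the class `MetaEntry`, the helper functions, and the entry names (`calculation`, `energy`, `total_energy`) are hypothetical and chosen only for illustration. The sketch assumes that each metadata entry has a name, a description, and at most one parent, so that any entry can be traced back to its root category.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class MetaEntry:
    """One metadata definition (hypothetical structure, not the NOMAD schema)."""
    name: str
    description: str
    parent: Optional[str] = None  # name of the parent entry; None for a root


def build_registry(entries: Iterable[MetaEntry]) -> Dict[str, MetaEntry]:
    """Index entries by name, rejecting duplicate names."""
    registry: Dict[str, MetaEntry] = {}
    for entry in entries:
        if entry.name in registry:
            raise ValueError(f"duplicate metadata name: {entry.name}")
        registry[entry.name] = entry
    return registry


def lineage(registry: Dict[str, MetaEntry], name: str) -> List[str]:
    """Return the names from the root category down to the given entry."""
    path: List[str] = []
    seen = set()
    current: Optional[str] = name
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in hierarchy at: {current}")
        seen.add(current)
        entry = registry[current]  # raises KeyError for an unknown name
        path.append(entry.name)
        current = entry.parent
    return list(reversed(path))


# Hypothetical entries, for illustration only.
REGISTRY = build_registry([
    MetaEntry("calculation", "Root category for one calculation"),
    MetaEntry("energy", "Any energy-like quantity", parent="calculation"),
    MetaEntry("total_energy", "Total energy of the system", parent="energy"),
])

print(lineage(REGISTRY, "total_energy"))
```

A flat name-to-entry registry with parent links keeps the sketch open to contributions: adding a new entry does not require modifying existing ones.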
In collaboration with the Berlin Big Data Center (BBDC), we use the Apache Flink infrastructure, which supports and goes beyond the standard MapReduce model, to enable rapid and complex queries.
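For readers unfamiliar with the MapReduce model mentioned here, the following plain-Python sketch shows its three phases (map, shuffle, reduce) on a toy query: finding the lowest energy per composition. It does not use the Apache Flink API and is not the NOMAD implementation. The record fields, the function names, and all numerical values are assumptions made up for illustration.

```python
from collections import defaultdict

# Toy records with placeholder compositions and energies (not real data).
RECORDS = [
    {"composition": "A2B", "structure": "s1", "energy": -3.5},
    {"composition": "A2B", "structure": "s2", "energy": -3.1},
    {"composition": "AB", "structure": "s1", "energy": -1.2},
    {"composition": "AB", "structure": "s2", "energy": -0.9},
]


def map_phase(records):
    """Map: emit one (key, value) pair per record."""
    for record in records:
        yield record["composition"], record["energy"]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each group to a single value (here, the minimum)."""
    return {key: min(values) for key, values in groups.items()}


LOWEST = reduce_phase(shuffle(map_phase(RECORDS)))
print(LOWEST)
```

In a distributed framework, the map and reduce steps run in parallel across many machines and the shuffle moves data between them; the single-process version above only illustrates the data flow.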