How to extend the search¶
The search indices¶
NOMAD uses Elasticsearch as the underlying search engine. The respective indices
are automatically populated during processing and other NOMAD operations. The indices
are built from some of the archive information of each entry. These are mostly the
sections metadata (ids, user metadata, other "administrative" and "internal" metadata)
and results (a summary of all extracted (meta)data). However, these sections are not
indexed verbatim. What exactly is indexed, and how, is determined by the Metainfo
and the elasticsearch Metainfo extension.
The elasticsearch Metainfo extension¶
Here is the definition of results.material.elements as an example:
class Material(MSection):
    ...
    elements = Quantity(
        type=MEnum(chemical_symbols),
        shape=["0..*"],
        default=[],
        description='Names of the different elements present in the structure.',
        a_elasticsearch=[
            Elasticsearch(material_type, many_all=True),
            Elasticsearch(suggestion="simple")
        ]
    )
Extensions are denoted with the a_ prefix as in a_elasticsearch.
Since extensions can have all kinds of values, the elasticsearch extension is rather
complex and uses the Elasticsearch class.
There can be multiple values. Each Elasticsearch instance configures a different part
of the index. This means that the same quantity can be indexed multiple times, for example
if you need both a text-based and a keyword-based search for the same data. Here
is a version of the metadata.mainfile definition as another example:
mainfile = metainfo.Quantity(
    type=str, categories=[MongoEntryMetadata, MongoSystemMetadata],
    description='The path to the mainfile from the root directory of the uploaded files',
    a_elasticsearch=[
        Elasticsearch(_es_field='keyword'),
        Elasticsearch(
            mapping=dict(type='text', analyzer=path_analyzer.to_dict()),
            field='path', _es_field='')
    ]
)
The different indices¶
The first (optional) argument for Elasticsearch determines where the data is indexed.
There are three principal places:
- the entry index (entry_type, default)
- the materials index (material_type)
- the entries within the materials index (material_entry_type)
Entry index¶
This is the default and is used even if another (additional) value is given. All data is put into the entry index.
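The rule above can be illustrated with a small, self-contained sketch. It does not use NOMAD's code: the string constants and the target_indices helper are hypothetical stand-ins for the real entry_type, material_type, and material_entry_type values, and the sketch only models the statement that the entry index is always populated.

```python
# Hypothetical stand-ins for the three index targets named in this section.
ENTRY = "entry_type"
MATERIAL = "material_type"
MATERIAL_ENTRY = "material_entry_type"


def target_indices(annotation_targets):
    """Return the places a quantity is indexed in, given the first argument
    of each of its annotations (None means the argument was omitted)."""
    targets = {ENTRY}  # the entry index is used even if another value is given
    for target in annotation_targets:
        if target is not None:
            targets.add(target)
    return targets


# Mirrors the elements example: one annotation names the materials index,
# the other omits the first argument.
print(sorted(target_indices([MATERIAL, None])))
```

In this model, a quantity annotated only for the materials index still appears in the entry index, which matches the behavior described for the default.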
Materials index¶
This is a separate index from the entry index and contains aggregated material information. Each document in this index represents a material. We use a hash over some material properties (elements, system type, symmetry) to define what a material is and which entries belong to which material.
Some parts of the material documents contain the material information that is always the same for all entries of this material. Examples are elements, formulas, symmetry.
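To make the idea of a hash-based material identity more concrete, here is a minimal sketch. It is not NOMAD's implementation: the material_id function, the choice of SHA-256, and the use of a space group number to represent symmetry are all assumptions made for illustration. The only point it demonstrates is that entries with the same properties map to the same material.

```python
import hashlib


def material_id(elements, system_type, space_group):
    """Derive a stable identifier from a few material properties.

    Hypothetical helper: the properties and the hash function that NOMAD
    actually uses are not specified here.
    """
    # Sort the elements so that their order in the input does not matter.
    key = "|".join([",".join(sorted(elements)), system_type, str(space_group)])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


a = material_id(["O", "Ti"], "bulk", 136)
b = material_id(["Ti", "O"], "bulk", 136)
c = material_id(["Ti", "O"], "bulk", 141)
print(a == b)  # True: same properties, same material
print(a == c)  # False: different symmetry, different material
```

Entries that produce the same identifier would be grouped into one material document, and the shared properties would be stored once for that material.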
Material entries¶
The materials index also contains entry-specific information that allows you to filter materials by the existence of entries that match certain criteria. Examples are publish status, user metadata, used method, or property data.
Adding quantities¶
In principle, all quantities could be added to the index, but by convention and for
simplicity, only quantities defined in the sections metadata and results should be
added. This means that if you want to add custom quantities from your parser, for example,
you will also need to customize the results normalizer to copy or reference parsed data.
The search API¶
The search API does not have to change. It automatically supports all quantities with the
elasticsearch extension. The keys that you can use in the API are the Metainfo paths of
the respective quantities, e.g. results.material.elements or mainfile (note that the
metadata. prefix is always omitted). If there are multiple elasticsearch annotations
for the same quantity, all but one define a field parameter, which is added to the
quantity path, e.g. mainfile.path.
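The naming rule for API keys can be summarized in a short sketch. The search_key function is a hypothetical helper written for this page, not part of NOMAD. It only restates the two rules given above: the metadata. prefix is dropped, and a field parameter is appended to the quantity path.

```python
def search_key(quantity_path, field=None):
    """Build the API key for a quantity from its Metainfo path and the
    optional field parameter of one of its annotations."""
    prefix = "metadata."
    if quantity_path.startswith(prefix):
        # The metadata. prefix is always omitted from API keys.
        quantity_path = quantity_path[len(prefix):]
    # An annotation with a field parameter gets its own key below the quantity.
    return f"{quantity_path}.{field}" if field else quantity_path


print(search_key("results.material.elements"))  # results.material.elements
print(search_key("metadata.mainfile"))          # mainfile
print(search_key("metadata.mainfile", "path"))  # mainfile.path
```

For the mainfile definition shown earlier, the keyword annotation is therefore reachable as mainfile and the text annotation as mainfile.path.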
The search web interface¶
Attention
Coming soon ...