Transform JSON Data Structures with Transformer¶
Who is this how-to guide for?¶
This guide is designed for mid-level NOMAD users who need to parse JSON-formatted data structures into another JSON
format. An example use case is transforming an external API response, partially or wholly, onto a NOMAD archive in a
standardized format. This document will cover the basic principles and common applications of the JsonToJson Transformer
from
the nomad-lab
package.
What should you know before this how-to guide?¶
Before diving into this guide, you should be familiar with the following:
- A basic understanding of the
nomad-lab
package. Follow the How to install nomad-lab guide
What you will know at the end of this how-to guide?¶
By the end of this how-to guide, you will:
- Understand how to use the
Transformer
class to transform JSON data structures. - Be able to apply transformation rules to transform data from one format to another.
- Learn how to customize and extend data conversion rules for specific needs.
Steps¶
1. Define Your Transformation Rules¶
Create a Python file and define the rules that specify how to transform your JSON data. These rules dictate where to
source
data in the input JSON and where and how to place it in the output JSON. Set the following JSON data to a
variable json_example
and create rules as:
{
"data": {
"a": 1,
"b": 2
},
"schema": {
"rules": {
"rule_a": {
"source": "a",
"target": "a"
},
"rule_b": {
"source": "b",
"target": "b"
}
}
}
}
Use this script to load the rules:
from nomad.datamodel.metainfo.annotations import Rules
rules = {
"example_transformation": Rules(
json_example['schema']
)
}
2. Initialize the Transformer¶
In your Python script, initialize the Transformer
with the rules you defined.
3. Prepare Your Source JSON¶
Prepare the JSON data that needs to be transformed. This data can come from files, API responses, or other sources. Here
we are reading from the json_example
from above:
4. Transform the Data¶
Use the transform
method of your transformer instance to apply the transformation rules to your source JSON.
transformed_json = transformer.transform(source_json, "example_transformation")
print(transformed_json)
this should produce:
Advanced Usage of JsonToJson Transformer¶
Handling Complex JSON Transformations¶
In more complex scenarios, you may need to handle nested JSON structures, apply conditional logic, or resolve
references. This section will guide you through using advanced features of the Transformer
class to address these
challenges.
Prerequisites¶
Before proceeding, it is better to have read the basic instructions explained above as it contains information on how to load the transformer.
Complex Rules Definition¶
In more sophisticated environments, transformation rules may require the evaluation of conditions, the manipulation of lists, or the resolution of nested structures. Below are examples of such advanced usage:
Conditional Copy Based on Regex¶
You can use regular expressions to conditionally copy data based on pattern matching. This is useful when you need to filter or format data before placing it in the target JSON.
{
"data": {
"a": 30
},
"schema": {
"rules": {
"rule_age": {
"source": "a",
"target": "age",
"conditions": [
{
"regex_condition": {
"regex_path": "a",
"regex_pattern": "^3\\d$"
}
}
],
"default_value": "default_age"
}
}
}
}
transformer = Transformer(mapping_dict=rules)
transformed_json = transformer.transform(source_json, "conditional_transformation_met")
print(transformed_json)
The rule checks if the value at path "a" matches the regex pattern (i.e., starts with '3' followed by any digit). If the condition is not met, the target "age" is set to "default_age".
this should produce:
Resolving References¶
When dealing with complex data structures, it might be necessary to use the structure of another rule
defined in a
different part of the transformation. For this you can set the reference
value to the path of your interest (keep in
mind that this path should be started with #
sign).
Important
The referenced rule's values will be overwritten by the local rule; meaning you can partly import the referenced rule and overwrite other fields onto your desired values.
source_json = {
"users": [
{"name": "user_1", "role": "manager", "manager_id": "101"},
{"name": "user_2", "role": "employee", "manager_id": "102"},
{"name": "user_3", "role": "manager", "manager_id": "103"}
],
"details": [
{"id": "101", "name": "user_4", "department": "A"},
{"id": "102", "name": "user_5", "department": "B"},
{"id": "103", "name": "user_6", "department": "C"}
]
}
rules = {
"employee_info": Rules(
name="Employee Info Mapping",
rules={
"rule_manager": Rule(
source="users[?role=='manager'].manager_id | [0]",
target="manager_details",
use_rule="#employee_info.details"
),
"details": Rule(
source="details[?id=='101'] | [0]",
target="specific_manager"
)
}
)
}
In this example, the first item in the rule
list, is a reference to the second item in the list. This setup extracts
manager details a specific manager from the list.
Implementing Advanced Transformations¶
Nested Structure Manipulation¶
You can manipulate nested structures by specifying deeper paths and using lists or dictionaries as intermediary storage.
{
"data": {
"f": {
"nested": {
"key": "value"
}
}
},
"schema": {
"rules": {
"rule_f_nested_key": {
"source": "f.nested.key",
"target": "nested_value"
}
}
}
}
The Transformer can handle deeply nested JSON structures. Define rules that navigate through nested paths to extract or set data.