# How to upload data To contribute your data to the repository, please, login to our [upload page](https://nomad-lab.eu/prod/rae/gui/uploads) (you need to register first, if you do not have a NOMAD account yet). *A note for returning NOMAD users!* We revised the upload process with browser based upload alongside new shell commands. The new Upload page allows you to monitor upload processing and verify processing results before publishing your data to the Repository. The [upload page](https://nomad-lab.eu/prod/rae/gui/uploads) acts as a staging area for your data. It allows you to upload data, to supervise the processing of your data, and to examine all metadata that NOMAD extracts from your uploads. The data on the upload page will be private and can be deleted again. If you are satisfied with our processing, you can publish the data. Only then, data will become publicly available and cannot be deleted anymore. You will always be able to access, share, and download your data. You may curate your data and create datasets to give them a hierarchical structure. These functions are available from the Your data page by selecting and editing data. You should upload many files at the same time by creating .zip or .tar files of your folder structures. Ideally, input and output files are accompanied by relevant auxiliary files. NOMAD will consider everything within a single directory as related. Once published, data cannot be erased. Linking a corrected version to a corresponding older one ("erratum") will be possible soon. Files from an improved calculation, even for the same material, will be handled as a new entry. You can publish data as being open access or restricted for up to three years (with embargo). For the latter you may choose with whom you want to share your data. We strongly support the idea of open access and thus suggest to impose as few restrictions as possible from the very beginning. In case of open access data, all uploaded files are downloadable by any user. Additional information, e.g. pointing to publications or how your data should be cited, can be provided after the upload. Also DOIs can be requested. The restriction on data can be lifted at any time. You cannot restrict data that was published as open access. Unless published without an embargo, all your information will be private and only visible to you (or NOMAD users you explicitly shared your data with). Viewing private data will always require a login. By uploading you confirm authorship of the uploaded calculations. Co-authors must be specified after the upload process. This procedure is very much analogous to the submission of a publication to a scientific journal. Upload of data is free of charge. ## Limits The following limitations apply to uploading: - One upload cannot exceed 32 GB in size - Only 10 non published uploads are allowed per user ## On the supported codes NOMAD is interpreting your files. It will check each file and recognize if it is the main output file of one of the supported codes. NOMAD will create a entry for this *mainfile* that represents the respective data of this code run, experiment, etc. NOMAD only shows that for such recognized entries. If you uploads do not contain any files that NOMAD recognizes, you upload will be shown as empty and no data can be published. However, all files that are associated to a recognized *mainfile* by residing in the same directory, will be presented as *auxiliary* files along side the entry represented by the *mainfile*. ### A note for VASP users On the handling of **POTCAR** files: NOMAD takes care of it; you don't need to worry about it. We understand that according to your VASP license, POTCAR files are not supposed to be visible to the public. Thus, in agreement with Georg Kresse, NOMAD will extract the most important information of POTCAR files and store it in the files named `POTCAR.stripped`. These files can be assessed and downloaded by anyone, while the original POTCAR files are only available to the uploader and assigned co-authors. This is done automatically; you don't need to do anything. ## Preparing an upload file You can upload .zip and .tar.gz files to NOMAD. The directory structure within can be arbitrary. Keep in mind that files in a single directory are all associated (see above). Ideally you only keep the files of a single (or closely related) code runs, experiments, etc. in one directory. You should not place files in additional archives within the upload file. NOMAD will not extract any zips in zips and similar entrapments. ## Uploading large amounts of data This problem is many fold. In the remainder the following topics are discussed. - NOMAD restrictions about upload size and number of unpublished simultaneous uploads - Managing metadata (comments, references, co-authors, datasets) for a large number of entries - Safely transferring the data to NOMAD ### General strategy Before you attempt to upload large amounts of data, do some experiments with a representative and small subset of your data. Use this to simulate a larger upload, checking and editing it the normal way. You do not have to publish this test upload; simply delete it before publish, once you are satisfied with the results. Ask for assistance. [Contact us](https://nomad-lab.eu/about/contact) in advance. This will allow us to react to your specific situation and eventually prepare additional measures. Keep enough time before you need your data to be published. Adding multiple hundreds of GBs to NOMAD isn't a trivial feat and will take some time and effort from all sides. ### Upload restrictions The upload restrictions are necessary to keep NOMAD data in manageable chunks and we cannot simply grant exceptions to these rules. This means you have to split your data into 32 GB uploads. Uploading these files, observing the processing, and publishing the data can be automatized through NOMAD APIs. When splitting your data, it is important to not split sub-directories if they represent all files of a single entry. NOMAD can only bundle those related files to an entry if they are part of the same upload (and directory). Therefore, there is no single recipe to follow and a script to split your data will depend on how your data is organized. ### Avoid additional operations on your data Changing the metadata of a large amounts of entries can be expensive and will also mean more work with our APIs. A simpler solution is to add the metadata directly to your uploads. This way NOMAD can pick it up automatically, no further actions required. Each NOMAD upload can contain a `nomad.json` file at the root. This file can contain metadata that you want to apply to all your entries. Here is an example: ``` { "comment": "Data from a cool research project", "references": ['http://archivex.org/mypaper'], "co_authors": [ '', '' ] "datasets": [ '' ], "entries": { "path/to/calcs/vasp.xml": { "commit": "An entry specific comment." } } } ``` Another measure is to directly publish your data upon upload. After performing some smaller test upload, you should consider to skip our staging and publish the upload right away. This can save you some time and additional API calls. The upload endpoint has a parameter `publish_directly`. You can modify the upload command that you get from the upload page like this: ``` curl "http://nomad-lab.eu/prod/rae/api/uploads/?token=&publish_directly=true" -T ``` ### Save transfer of files HTTP makes it easy for you to upload files via browser and curl, but it is not an ideal protocol for the stable transfer of large and many files. Alternatively, we can organize a separate manual file transfer to our servers. We will put your prepared upload files (.zip or .tag.gz) on a predefined path on the NOMAD servers. NOMAD allows to *"upload"* files directly from its servers via an additional `local_path` parameter: ``` curl -X PUT "http://nomad-lab.eu/prod/rae/api/uploads/?token=&local_path=" ```