**Documentation outdated**

This section is partially outdated. The current version of the platform organises imports with revisions to enable updates for 'living' maps or reviews. Please get in touch if you would be interested in helping to document and update this section.
# Loading data into the platform
**Never run more than one import at a time**

You should never run more than one import at a time. The system has to keep track of duplicates, which is very inefficient to do across multiple simultaneously running imports. Unfortunately, the system will not prevent you from doing this, so you must only start an import after the previous one has completed successfully.
If you have many imports to run, you can set everything up and use this script to initiate the imports automatically and sequentially.
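Such a sequential runner can be sketched in a few lines of Python. The `initiate` and `is_finished` callables below are placeholders for whatever API calls or status checks your setup uses; they are not part of the platform's actual API:

```python
import time
from typing import Callable


def run_imports_sequentially(
    initiate: Callable[[str], None],
    is_finished: Callable[[str], bool],
    import_ids: list[str],
    poll_seconds: float = 60.0,
) -> list[str]:
    """Start each import only after the previous one has finished.

    `initiate` and `is_finished` are hypothetical hooks: plug in your own
    API request or database check for the platform you are running against.
    """
    completed = []
    for import_id in import_ids:
        initiate(import_id)
        # Poll until this import is done before touching the next one.
        while not is_finished(import_id):
            time.sleep(poll_seconds)
        completed.append(import_id)
    return completed
```

The point of the sketch is simply that the loop blocks on each import before initiating the next, which enforces the "one import at a time" rule above.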
## Introduction
In the main view, you see the list of all imports and can create new ones. A row in the table shows the name, when it was created, and when the import finished.[^1] The three icons/buttons let you duplicate an import (e.g. if you'd like to update an OpenAlex query), delete the import,[^2] or jump to the associated pipeline task.[^3] Below the table, you'll find the button to create a new import.
When creating a new import, always set a descriptive name and text. It may seem redundant at first, but it is very useful later on. The description (and/or title) usually contains the title, when the data was retrieved, and from where. If you have multiple (sub-)queries across imports, it may also help to provide a brief higher-level description.
**Before 'initiating an import', always save first!**
Below, you can select an import type.
The `nacsos_data` library currently supports imports from:
- Web of Science "Plain text file" exports
- Scopus CSV exports
- OpenAlex
- JSONL files, one document per line, from our Solr mirror
- Solr exporter that downloads documents matching a query
- LexisNexis API responses (not via the web interface)
- Native JSONL: you can upload files that are already in the correct format of `AcademicItemModel`, `TwitterItemModel`, or `GenericItemModel`
- NACSOS-legacy helper functions for transferring documents and annotations; needs customisation per project, not recommended!
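As a rough illustration of the native JSONL format, the snippet below writes one JSON object per line. The field names (`title`, `doi`, `publication_year`, `abstract`) are illustrative guesses; check `AcademicItemModel` in the `nacsos_data` library for the actual schema:

```python
import json

# Illustrative records only -- field names are assumptions, not the
# verified AcademicItemModel schema.
records = [
    {
        "title": "Example paper",
        "doi": "10.1000/xyz",
        "publication_year": 2020,
        "abstract": "An example abstract.",
    },
]

# JSONL = one JSON object per line, no enclosing array.
with open("items.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```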
The most common imports are OpenAlex (Solr), Scopus (CSV), and Web of Science (text). Please refer to the respective detail pages in this documentation for instructions.
For imports where you need to upload files, the import procedure is as follows (see screenshots):
- Fill title/description
- Choose import type (scopus or wos) from the dropdown
- Click "Choose files" and select the file from your computer
- Click "Upload"
- Click "Save" (top right)
- Click "Initiate import"
Not following this order may cause a lot of issues! In particular, clicking the 'Initiate import' button more than once, or before fully completing the form and saving, will cause problems!
## Duplicates
Our goal is to keep each project free of duplicate publications.
We do not try to link publications across the platform.
During import, when using the proper functions (e.g. `import_academic_items`), we try to insert only new publications and keep track of duplicates.
This means that you can later compute overlaps between imports and see how many documents each import contained.
The number listed on the platform might differ from what you expect, since it is counted after deduplication.
For further information on that, please read this, the source code in the nacsos_data library, or our working paper.
This might also be interesting for the future.
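Conceptually, once duplicates are tracked per project, the overlap between two imports reduces to set operations over item IDs. A minimal sketch (the IDs and the helper are made up; on the platform this information lives in the import/item association tables):

```python
def overlap(items_a: set[str], items_b: set[str]) -> dict[str, int]:
    """Summarise how two imports relate after deduplication."""
    return {
        "only_a": len(items_a - items_b),   # items unique to import A
        "only_b": len(items_b - items_a),   # items unique to import B
        "shared": len(items_a & items_b),   # deduplicated items in both
    }


# Hypothetical item IDs for two imports in the same project:
import_a = {"item-1", "item-2", "item-3"}
import_b = {"item-2", "item-3", "item-4"}
print(overlap(import_a, import_b))  # {'only_a': 1, 'only_b': 1, 'shared': 2}
```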
[^1]: An import can take several minutes to hours! In the beginning, the number of items will not change while data is prepared. Later, the number of items will start growing. When it has been stagnant for several minutes, the import is likely done.

[^2]: Deleting an import will delete all associated items and the data associated with those (e.g. annotations). If an item is part of another still-existing import, the respective data is kept.

[^3]: On the platform, all long-running "tasks" run in the background. The state of the task is not the same as the status of the import, but the task will usually update the import information when it is done. The task (see the supplemental data view) also contains a full log of the import.




