Importing Scopus

There is currently no automated way to import directly from Scopus. Once you have finalised your query, use the "Export" button at the top of the results table in Scopus to download the results, up to 20,000 records at a time.
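Because of the 20,000-record limit, larger result sets end up as several CSV files. The following is a minimal sketch of how such chunks could be merged into one file before uploading. It is not part of the platform. It assumes that every chunk was exported with the same fields and that the export contains an `EID` column to identify duplicates. The function name `merge_scopus_exports` is hypothetical.

```python
import csv
from pathlib import Path


def merge_scopus_exports(chunk_paths: list[Path], merged_path: Path) -> int:
    """Concatenate Scopus CSV chunks, keeping the first row seen per EID.

    Returns the number of rows written to `merged_path`.
    """
    seen_eids: set[str] = set()
    fieldnames: list[str] = []
    rows: list[dict] = []

    for path in chunk_paths:
        # 'utf-8-sig' drops a leading byte-order mark if the file has one
        with open(path, newline='', encoding='utf-8-sig') as f:
            reader = csv.DictReader(f)
            if not fieldnames:
                # The header of the first chunk defines the output columns
                fieldnames = list(reader.fieldnames or [])
            for row in reader:
                # Assumption: the 'EID' column identifies a record
                eid = row.get('EID') or ''
                if eid and eid in seen_eids:
                    continue  # the same record appeared in an earlier chunk
                if eid:
                    seen_eids.add(eid)
                rows.append(row)

    with open(merged_path, 'w', newline='', encoding='utf-8') as f:
        # Columns that are not in the first chunk's header are dropped
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction='ignore')
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

With hypothetical file names, a call could look like `merge_scopus_exports(sorted(Path('exports').glob('scopus_*.csv')), Path('scopus.csv'))`. Rows without an EID are always kept, since there is nothing to compare them by.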

Scopus export settings

Please make sure to select all fields as shown above. Unless you explicitly need them, do not include references: they massively increase the file size and are currently not supported by the platform anyway. After you click "Export", it takes a moment for the file to be prepared.
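Before uploading, it can help to check that the downloaded file has the fields you expect. The sketch below is a hedged illustration, not a platform tool. The column names in `EXPECTED_COLUMNS` and the `References` column name are assumptions about the Scopus CSV header and should be adjusted to match your own export. The function name `summarise_export` is hypothetical.

```python
import csv
from pathlib import Path

# Assumed column names; adjust them to the header of your own export
EXPECTED_COLUMNS = ('EID', 'Title', 'Abstract', 'DOI')


def summarise_export(path: Path) -> dict:
    """Count the records in a CSV export and report missing columns."""
    # 'utf-8-sig' drops a leading byte-order mark if the file has one
    with open(path, newline='', encoding='utf-8-sig') as f:
        reader = csv.DictReader(f)
        header = list(reader.fieldnames or [])
        n_rows = sum(1 for _ in reader)
    return {
        'rows': n_rows,
        'missing_columns': [c for c in EXPECTED_COLUMNS if c not in header],
        # Assumption: exported references appear in a 'References' column
        'has_references': 'References' in header,
    }
```

An export with 20,000 rows may mean that the download limit was reached and that further chunks are still needed.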

To upload the file, you can use the web interface or the script below.

Script for importing Scopus
import asyncio

from nacsos_data.db import get_engine_async
from nacsos_data.util.academic.scopus import read_scopus_csv_file
from nacsos_data.util.academic.importer import import_academic_items


async def main() -> None:
    # The project to import data to
    PROJECT_ID = '???'
    # The user this import will be attached to
    USER_ID = '???'

    # Name of the import
    IMPORT_NAME = '???'
    # Description of the import (e.g. the query used and when it was downloaded)
    IMPORT_DESC = '???'

    db_engine = get_engine_async(conf_file='/path/to/remote.env')
    scopus_works = list(read_scopus_csv_file('/path/to/scopus.csv',
                                             project_id=PROJECT_ID))
    await import_academic_items(items=scopus_works,
                                db_engine=db_engine,
                                project_id=PROJECT_ID,
                                import_name=IMPORT_NAME,
                                description=IMPORT_DESC,
                                # Set this to the ID of an existing import if you would
                                # rather add new documents to that import
                                import_id=None,
                                user_id=USER_ID,
                                # Set this to True to simulate the import without actually doing it
                                dry_run=False,
                                # Set these to False if you don't want to overwrite values of duplicates
                                trust_new_authors=True,
                                trust_new_keywords=True)


if __name__ == '__main__':
    asyncio.run(main())