SARS-CoV-2 (Severe acute respiratory syndrome coronavirus 2) Submissions

The instructions below describe how to submit SARS-CoV-2 (Severe acute respiratory syndrome coronavirus 2) and other COVID-19-related data. If you have any queries or need help with your submission, please contact us at virus-dataflow@ebi.ac.uk.

Registering Studies

Data submissions to the ENA require that you register a study to contextualise and group your data. Details of how to do this can be found in our Study Registration Guide. Please give your study an informative title and describe it adequately.
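For programmatic registration, a study can be described as a small XML document. The sketch below builds one with Python's standard library, assuming the ENA Webin PROJECT_SET layout (PROJECT, TITLE, DESCRIPTION, SUBMISSION_PROJECT/SEQUENCING_PROJECT); the alias, title and description are hypothetical, and the Study Registration Guide remains the authoritative reference.

```python
import xml.etree.ElementTree as ET


def build_project_xml(alias: str, title: str, description: str) -> str:
    """Return a PROJECT_SET XML string registering one sequencing study.

    Assumes the ENA Webin project XML layout; verify against the
    Study Registration Guide before submitting.
    """
    # Refuse empty text: the study needs an informative title and description.
    if not title.strip() or not description.strip():
        raise ValueError("title and description must not be empty")
    project_set = ET.Element("PROJECT_SET")
    # The alias is a submitter-chosen unique name for the study.
    project = ET.SubElement(project_set, "PROJECT", alias=alias)
    ET.SubElement(project, "TITLE").text = title
    ET.SubElement(project, "DESCRIPTION").text = description
    # Declare the study as a sequencing project.
    submission_project = ET.SubElement(project, "SUBMISSION_PROJECT")
    ET.SubElement(submission_project, "SEQUENCING_PROJECT")
    return ET.tostring(project_set, encoding="unicode")


if __name__ == "__main__":
    print(build_project_xml(
        alias="sarscov2_surveillance_example",  # hypothetical alias
        title="SARS-CoV-2 genome sequencing from outbreak surveillance",
        description="Whole-genome amplicon sequencing of SARS-CoV-2 "
                    "from clinical swabs collected during the outbreak.",
    ))
```

The empty-text check is only a crude guard; it cannot judge whether a title is actually informative.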

Registering Samples

Having registered a study, please proceed to register your samples. These are metadata objects that describe the source biological material of your experiments. Following this, the sequence data can be registered (as described in later sections).

Instructions for sample registration can be found in our Sample Registration Guide. As part of this process, you must select a sample checklist to describe metadata. The most appropriate checklist for coronavirus submissions is the “ENA virus pathogen reporting standard checklist” - ERC000033. This checklist presents 9 mandatory, 15 recommended and 11 optional fields (along with any additional user-defined fields).

Please use the organism name “Severe acute respiratory syndrome coronavirus 2” and taxonomic ID 2697049. As a minimum, we recommend that you specify the collection date and geographic location (e.g. country), and that you set the sample capture status field to ‘active surveillance in response to outbreak’. If you need support with sample metadata, please contact virus-dataflow@ebi.ac.uk.
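Before registering samples in bulk, it can help to check each record against the recommended minimum. This is a minimal sketch; it assumes the ERC000033 field names ‘collection date’, ‘geographic location (country and/or sea)’ and ‘sample capture status’, and assumes the checklist's controlled value is ‘active surveillance in response to outbreak’. Confirm both against the checklist definition.

```python
from __future__ import annotations

TAXON_ID = 2697049
SCIENTIFIC_NAME = "Severe acute respiratory syndrome coronavirus 2"
# Assumed ERC000033 controlled value for outbreak samples.
RECOMMENDED_CAPTURE_STATUS = "active surveillance in response to outbreak"
# Assumed ERC000033 field names for the recommended minimum metadata.
MINIMUM_FIELDS = (
    "collection date",
    "geographic location (country and/or sea)",
    "sample capture status",
)


def check_minimum_metadata(taxon_id: int, scientific_name: str,
                           attributes: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the minimum is met."""
    problems: list[str] = []
    if taxon_id != TAXON_ID:
        problems.append(f"taxon ID should be {TAXON_ID}, got {taxon_id}")
    if scientific_name != SCIENTIFIC_NAME:
        problems.append(f"organism name should be '{SCIENTIFIC_NAME}'")
    # Every minimum field must be present with a non-blank value.
    for field in MINIMUM_FIELDS:
        if not attributes.get(field, "").strip():
            problems.append(f"missing value for '{field}'")
    # Only compare the capture status if one was supplied.
    status = attributes.get("sample capture status", "").strip()
    if status and status != RECOMMENDED_CAPTURE_STATUS:
        problems.append(f"sample capture status is '{status}', "
                        f"expected '{RECOMMENDED_CAPTURE_STATUS}'")
    return problems


if __name__ == "__main__":
    sample = {
        "collection date": "2020-03-15",  # hypothetical values
        "geographic location (country and/or sea)": "United Kingdom",
        "sample capture status": RECOMMENDED_CAPTURE_STATUS,
    }
    print(check_minimum_metadata(TAXON_ID, SCIENTIFIC_NAME, sample))
```

The check does not validate date formats or country names; ENA's own validation still applies at registration.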

If you have already submitted data to the GISAID database, a corresponding GISAID ID can be specified when using the ERC000033 checklist by creating a user-defined field named ‘GISAID Accession ID’.
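When registering samples programmatically, the checklist and any user-defined field such as ‘GISAID Accession ID’ travel as sample attributes. A minimal sketch, assuming the Webin SAMPLE_SET XML layout in which the checklist is declared through an ENA-CHECKLIST attribute; the alias, title pattern and GISAID ID shown are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def build_sample_xml(alias: str, attributes: dict[str, str],
                     gisaid_id: str | None = None) -> str:
    """Return a SAMPLE_SET XML string for one SARS-CoV-2 sample."""
    sample_set = ET.Element("SAMPLE_SET")
    sample = ET.SubElement(sample_set, "SAMPLE", alias=alias)
    ET.SubElement(sample, "TITLE").text = f"SARS-CoV-2 sample {alias}"  # hypothetical title pattern
    name = ET.SubElement(sample, "SAMPLE_NAME")
    ET.SubElement(name, "TAXON_ID").text = "2697049"
    ET.SubElement(name, "SCIENTIFIC_NAME").text = (
        "Severe acute respiratory syndrome coronavirus 2")
    attrs = ET.SubElement(sample, "SAMPLE_ATTRIBUTES")

    def add(tag: str, value: str) -> None:
        # Each attribute is a TAG/VALUE pair.
        attr = ET.SubElement(attrs, "SAMPLE_ATTRIBUTE")
        ET.SubElement(attr, "TAG").text = tag
        ET.SubElement(attr, "VALUE").text = value

    add("ENA-CHECKLIST", "ERC000033")  # selects the virus pathogen checklist
    for tag, value in attributes.items():
        add(tag, value)
    if gisaid_id is not None:
        # User-defined field linking the sample to an existing GISAID record.
        add("GISAID Accession ID", gisaid_id)
    return ET.tostring(sample_set, encoding="unicode")


if __name__ == "__main__":
    print(build_sample_xml(
        "sample1",  # hypothetical alias
        {"collection date": "2020-03-15"},
        gisaid_id="EPI_ISL_000000",  # hypothetical GISAID ID
    ))
```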

Submitting Reads

After registering your study and samples, you can submit your read files along with experimental (library-related) metadata. See our Read Submission Guide for detailed instructions on submitting reads.

We encourage submitters to include information on the specific protocols used in the experiment. Please provide this in the library description, for example as the name of the protocol and/or a URL pointing to it. View our listing of the available full experimental metadata dictionaries.
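With Webin-CLI, read metadata is supplied in a manifest file, and the protocol can go in the free-text DESCRIPTION field. The sketch below writes such a manifest; the set of required fields is our assumption based on the Webin-CLI reads manifest, and the accessions, protocol text and URL are placeholders.

```python
from __future__ import annotations

# Fields we assume Webin-CLI requires in a reads manifest (check the
# Read Submission Guide for the authoritative list).
REQUIRED_FIELDS = ("STUDY", "SAMPLE", "NAME", "INSTRUMENT",
                   "LIBRARY_SOURCE", "LIBRARY_SELECTION", "LIBRARY_STRATEGY")


def build_reads_manifest(fields: dict[str, str], fastq_files: list[str]) -> str:
    """Return a tab-separated reads manifest, one 'FIELD<TAB>value' per line."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"missing manifest fields: {', '.join(missing)}")
    if not fastq_files:
        raise ValueError("at least one FASTQ file is required")
    for value in list(fields.values()) + fastq_files:
        # A tab or newline inside a value would corrupt the manifest layout.
        if "\t" in value or "\n" in value:
            raise ValueError(f"value contains a tab or newline: {value!r}")
    lines = [f"{name}\t{value}" for name, value in fields.items()]
    lines += [f"FASTQ\t{path}" for path in fastq_files]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    manifest = build_reads_manifest(
        {
            "STUDY": "PRJEB00000",      # hypothetical study accession
            "SAMPLE": "ERS0000000",     # hypothetical sample accession
            "NAME": "sample1_run1",
            "INSTRUMENT": "Illumina MiSeq",
            "LIBRARY_SOURCE": "VIRAL RNA",
            "LIBRARY_SELECTION": "RT-PCR",
            "LIBRARY_STRATEGY": "AMPLICON",
            # Free-text library description naming the protocol used.
            "DESCRIPTION": "ARTIC nCoV-2019 amplicon protocol, "
                           "https://example.org/protocol",  # placeholder URL
        },
        ["sample1_R1.fastq.gz", "sample1_R2.fastq.gz"],
    )
    print(manifest, end="")
```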

Submitting Assemblies

If submitting assemblies, you must have registered a study and a sample beforehand. We also advise submitting the associated read data. For instructions on assembly submission view our Assembly Submission Guide.

Assemblies can only be submitted with Webin-CLI, using the option -context genome. During this process you must define metadata in the manifest file(s). Please specify ‘COVID-19 outbreak’ as the ‘ASSEMBLY_TYPE’.
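The sketch below writes a genome manifest with ASSEMBLY_TYPE set to ‘COVID-19 outbreak’ and builds, without running, the matching Webin-CLI command line. The other manifest field names reflect our understanding of the Webin-CLI genome context; the jar filename, accessions, assembler name and credentials are hypothetical, and the Assembly Submission Guide should be checked for the full field list.

```python
from __future__ import annotations

# Hypothetical: the actual jar name depends on the Webin-CLI release you download.
WEBIN_CLI_JAR = "webin-cli.jar"


def build_genome_manifest(study: str, sample: str, assembly_name: str,
                          coverage: str, program: str, platform: str,
                          fasta: str) -> str:
    """Return a tab-separated genome manifest for a SARS-CoV-2 assembly."""
    fields = [
        ("STUDY", study),
        ("SAMPLE", sample),
        ("ASSEMBLYNAME", assembly_name),
        ("ASSEMBLY_TYPE", "COVID-19 outbreak"),  # value required by this guide
        ("COVERAGE", coverage),
        ("PROGRAM", program),
        ("PLATFORM", platform),
        ("FASTA", fasta),
    ]
    return "".join(f"{name}\t{value}\n" for name, value in fields)


def webin_cli_command(manifest_path: str, username: str, password: str,
                      submit: bool = False) -> list[str]:
    """Build (but do not run) a Webin-CLI command line in the genome context."""
    return [
        "java", "-jar", WEBIN_CLI_JAR,
        "-context", "genome",
        "-manifest", manifest_path,
        "-userName", username,
        "-password", password,
        # Validate first; switch to -submit only once validation passes.
        "-submit" if submit else "-validate",
    ]


if __name__ == "__main__":
    print(build_genome_manifest(
        "PRJEB00000", "ERS0000000", "sample1_assembly",  # hypothetical IDs
        "500", "example-assembler v1.0", "Illumina MiSeq", "sample1.fasta.gz",
    ), end="")
    print(" ".join(webin_cli_command("manifest.txt", "Webin-00000", "********")))
```

Once the manifest is saved to disk and real credentials are supplied, the returned list can be passed to subprocess.run(..., check=True). Running -validate before -submit surfaces manifest errors without creating a submission.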

Any annotations, where provided, are captured according to INSDC Feature Table Definitions.
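To illustrate the INSDC feature table layout as it appears in an EMBL-format flat file, the sketch below formats one feature as FT lines: the feature key starts in column 6, and the location and each qualifier start in column 22. The coordinates are hypothetical, and long qualifiers are not wrapped at 80 characters as a real flat file requires.

```python
from __future__ import annotations

from typing import Union

QualifierValue = Union[str, int, None]


def format_feature(key: str, location: str,
                   qualifiers: list[tuple[str, QualifierValue]]) -> list[str]:
    """Format one feature as EMBL flat-file 'FT' lines (no line wrapping)."""
    # "FT" plus three spaces puts the key in column 6; padding the key to
    # 16 characters puts the location in column 22.
    lines = [f"FT   {key:<16}{location}"]
    for name, value in qualifiers:
        if value is None:
            text = f"/{name}"                   # valueless, e.g. /pseudo
        elif isinstance(value, int):
            text = f"/{name}={value}"           # numeric, e.g. /codon_start=1
        else:
            escaped = value.replace('"', '""')  # INSDC doubles inner quotes
            text = f'/{name}="{escaped}"'
        # Qualifiers also start in column 22.
        lines.append(f"FT{' ' * 19}{text}")
    return lines


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    for line in format_feature("CDS", "1..120", [
        ("gene", "S"),
        ("product", "surface glycoprotein"),
        ("codon_start", 1),
    ]):
        print(line)
```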

Submitting Targeted Sequences

If submitting targeted or annotated sequences, you must register a study as described above. See our Targeted Sequence Submission Guide for submission instructions. When submitting annotated sequences, you must select an appropriate annotation checklist. Several virus-specific annotation checklists are available; “Single Viral CDS” is the most appropriate for complete or partial coding sequences from a viral gene. If your sequences do not fit any of these annotation checklists, please contact us at virus-dataflow@ebi.ac.uk.

Any annotations, where provided, are captured according to INSDC Feature Table Definitions.

If you are submitting single-contig assemblies, or need any other support with SARS-CoV-2 submissions, please contact virus-dataflow@ebi.ac.uk.

Release of Data

We recommend that submitted data be made public as soon as possible so that it appears early in ENA and on the COVID-19 data platform. Users are responsible for releasing the data they submit to ENA, which they do by setting an appropriate release date, as detailed in our Data Release Policies.
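For programmatic submissions, the release date is typically expressed in the submission XML that accompanies the metadata. This sketch assumes the Webin submission XML actions ADD, HOLD (with a HoldUntilDate attribute) and RELEASE (with a target accession); check the Data Release Policies before relying on it. The dates and accession shown are hypothetical.

```python
from __future__ import annotations

import datetime as dt
import xml.etree.ElementTree as ET


def build_submission_xml(hold_until: dt.date | None = None) -> str:
    """Return a SUBMISSION XML string that adds new objects.

    If hold_until is given, a HOLD action sets the release date.
    """
    submission = ET.Element("SUBMISSION")
    actions = ET.SubElement(submission, "ACTIONS")
    ET.SubElement(ET.SubElement(actions, "ACTION"), "ADD")
    if hold_until is not None:
        # Keep the release date as early as possible.
        ET.SubElement(ET.SubElement(actions, "ACTION"), "HOLD",
                      HoldUntilDate=hold_until.isoformat())
    return ET.tostring(submission, encoding="unicode")


def build_release_xml(accession: str) -> str:
    """Return a SUBMISSION XML string requesting release of an existing object."""
    submission = ET.Element("SUBMISSION")
    actions = ET.SubElement(submission, "ACTIONS")
    ET.SubElement(ET.SubElement(actions, "ACTION"), "RELEASE", target=accession)
    return ET.tostring(submission, encoding="unicode")


if __name__ == "__main__":
    print(build_submission_xml(dt.date.today() + dt.timedelta(days=7)))
    print(build_release_xml("PRJEB00000"))  # hypothetical study accession
```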