Data Standards

Harmonization of data and metadata collection has become essential in an age when generating data is often easier and more affordable than organizing and storing them.

Compliance of submitted data with the relevant reporting standards promotes:

consistent and adequate data description
thorough data validation
data discoverability
data reproducibility
data interoperability and usability

ENA/INSDC reporting standards

The European Nucleotide Archive requires, where appropriate, use of the following reporting standards:

Feature Table – Description of nucleotide sequence provenance and functional annotation of nucleotide sequence domains.
Third Party Data – ENA uses the INSDC-agreed standards for capturing and presenting TPA data. Contact us if you intend to submit data that consist of assemblies or annotations of existing INSDC records.
Missing values – Guidelines for registering metadata that is missing or under restricted access.

Community-developed reporting standards

The European Nucleotide Archive supports the use of many community-developed reporting standards in the form of sample checklists. Each sample checklist is a defined set of minimum information that is required and validated during ENA sample registration. The checklists have been developed with different research communities so that submissions can comply with the standards of the relevant community.

The full list can be viewed and explored here.
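To illustrate what "minimum information required and validated" can mean in practice, the following is a minimal sketch of checking a sample record against a checklist before submission. It is not ENA's validator or checklist schema: the `Checklist` class, the `missing_fields` helper, the checklist name and the field names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checklist:
    """A hypothetical checklist: a name plus the set of mandatory fields."""
    name: str
    mandatory: frozenset[str]


def missing_fields(sample: dict[str, str], checklist: Checklist) -> list[str]:
    """Return the mandatory fields that are absent or empty in the sample."""
    return sorted(
        field
        for field in checklist.mandatory
        if not str(sample.get(field, "")).strip()
    )


# Hypothetical checklist and field names (assumptions, not a real ENA checklist).
EXAMPLE_CHECKLIST = Checklist(
    name="example-checklist",
    mandatory=frozenset({"collection date", "geographic location"}),
)

# A sample record in which one mandatory field has been left empty.
sample = {"collection date": "2020-01-15", "geographic location": ""}

print(missing_fields(sample, EXAMPLE_CHECKLIST))
```

A pre-submission check of this kind only catches absent or empty values. The checklist definitions themselves remain the authority on which fields are mandatory and which values are permitted.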

As part of our community engagement and standards development, the European Nucleotide Archive has a long-standing collaboration with the Genomic Standards Consortium (GSC). The GSC is an initiative of experts building or using genome collections and developing standards for harmonised metadata collection and analysis efforts across the wider genomics community.

The GSC supports a range of activities, including sequencing projects and the development of ontologies, metadata standards, software tools and data formats. Minimum information about any (x) nucleotide sequence (MIxS; Yilmaz et al., 2011) is the core GSC standard, consisting of checklists for describing genomes (MIGS), metagenomes (MIMS) and marker sequences (MIMARKS).