Abstract: This data set comprises records extracted from the European Nucleotide Archive (ENA) and linked to citations in open-access publications aggregated at Europe PubMed Central (ePMC). To build it, ENA records were parsed, filtered for a valid country tag, and submitted to the ePMC RESTful API to retrieve secondary publications that match by ENA accession or project accession number. The resulting data are normalized into the tables ENA_SEQUENCES and PMC_REFERENCES, alongside a curated list of the world's countries in table CONTRIES and economic groups in table COUNTRY2GRP. These tables are the basis for a data warehouse and a web application, which make it possible to join literature and sequence databases in a multidimensional fashion. A concrete use case, in the context of the United Nations Convention on Biological Diversity, is the analysis of countries with respect to their use of and contribution to nucleotide sequence data.
License: CC BY 4.0 (Creative Commons Attribution)
DOI: 10.5447/ipk/2021/8
Content: 0 Directories 4 Files (4.6 GB)
CONTRIBUTOR: |
Matthias Lange,
Mehmood Ghaffar,
Jens Freitag,
Amber Scholz,
Upneet Hillebrand
|
CREATOR: |
Guy Cochrane,
Blaise Alako
|
PUBLISHER: | e!DAL - Plant Genomics and Phenomics Research Data Repository (PGP), IPK Gatersleben, Seeland OT Gatersleben, Corrensstraße 3, 06466, Germany |
SIZE: | 4.6 GB |
SUBJECT: | data citation, text mining, European Nucleotide Archive, nucleotide sequence data, Convention on Biological Diversity, Europe PMC |
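EXAMPLE: The join between sequence and literature tables described in the abstract could be sketched roughly as follows. This is a minimal illustration only, not the data set's actual schema: the table names ENA_SEQUENCES, PMC_REFERENCES and CONTRIES come from the abstract, but every column name (country_code, name, accession, pmcid) and every row is a hypothetical placeholder, and COUNTRY2GRP is left out for brevity.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with placeholder rows.

    Table names follow the abstract; all column names and all
    values are assumptions made up for this illustration.
    """
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE CONTRIES (country_code TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE ENA_SEQUENCES (accession TEXT PRIMARY KEY, country_code TEXT);
        CREATE TABLE PMC_REFERENCES (pmcid TEXT, accession TEXT);
        """
    )
    con.executemany(
        "INSERT INTO CONTRIES VALUES (?, ?)",
        [("AA", "Country A"), ("BB", "Country B")],
    )
    con.executemany(
        "INSERT INTO ENA_SEQUENCES VALUES (?, ?)",
        [("ACC0001", "AA"), ("ACC0002", "AA"), ("ACC0003", "BB")],
    )
    con.executemany(
        "INSERT INTO PMC_REFERENCES VALUES (?, ?)",
        [("PUB-1", "ACC0001"), ("PUB-2", "ACC0001"), ("PUB-1", "ACC0002")],
    )
    return con


def citations_per_country(con):
    """Count distinct sequences and distinct citing publications per country."""
    return con.execute(
        """
        SELECT c.name,
               COUNT(DISTINCT s.accession) AS n_sequences,
               COUNT(DISTINCT r.pmcid)     AS n_publications
        FROM CONTRIES c
        LEFT JOIN ENA_SEQUENCES s ON s.country_code = c.country_code
        LEFT JOIN PMC_REFERENCES r ON r.accession = s.accession
        GROUP BY c.country_code, c.name
        ORDER BY c.name
        """
    ).fetchall()


if __name__ == "__main__":
    for row in citations_per_country(build_demo_db()):
        print(row)
```

The LEFT JOINs keep countries whose sequences have no matching publication, so such a country is reported with a publication count of zero instead of being dropped. The real column names should be taken from the published files before adapting the query.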