Citation: S. Twardziok (2018-08-28): Coding sequences and GFF3 of high confidence wild emmer genes. DOI:10.5447/ipk/2019/0

Abstract: This dataset contains an updated gene annotation (version 2) of 67,182 high confidence (HC) gene loci for the wild emmer genome assembly (Avni, R. et al. Wild emmer genome architecture and diversity elucidate wheat evolution and domestication. Science 357, 93-97 (2017)). The annotation was produced at PGSB (Helmholtz Center Munich) by Sven Twardziok for a comparative study between durum wheat (DW), Triticum turgidum L. subsp. durum, and wild emmer wheat (WEW), T. turgidum subsp. dicoccoides. To provide a consistent basis for this study, both genome assemblies were annotated with the same pipeline and parameters, avoiding technical differences that stem solely from different detection methods. The corresponding results are part of a DW genome paper which is currently under review (Maccaferri, M. et al. Durum wheat genome reveals past domestication signatures and future improvement targets (submitted)). The annotation pipeline, described in detail in Maccaferri, M. et al. and Avni et al., 2017, used evidence from plant reference protein sets as well as comprehensive transcriptome data (different tissues, developmental stages and treatments) to predict an extensive set of candidate gene models. Subsequent filter steps divide the whole set of gene-like sequences into (i) high confidence (HC) genes, (ii) low confidence (LC) genes and (iii) transposon genes. The dataset contains the following files:
Wild_Emmer_HC_genes_v2_PGSB_Mar2017_CDS.gff3: GFF3 for all splice variants (205,916)
Wild_Emmer_HC_genes_v2_PGSB_Mar2017_CDS.fasta: coding sequence for all splice variants
Wild_Emmer_HC_genes_v2_PGSB_Mar2017_PROTEIN.fasta: protein sequence for all splice variants
Wild_Emmer_HC_genes_v2_PGSB_Mar2017_Representative_CDS.fasta: coding sequence for the representative (= longest) splice variant of each gene (67,182)
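
The abstract defines the representative splice variant as the longest one per gene. A minimal Python sketch of that selection is given here for illustration only; it is not the pipeline used to produce the dataset. It assumes that each FASTA header starts with a transcript ID of the form <gene ID>.<variant number> (a hypothetical convention, not confirmed by this record) and that ties are resolved by keeping the first variant encountered. The example sequences and the names read_fasta and longest_per_gene are likewise made up.

```python
def read_fasta(lines):
    """Yield (record_id, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            # Keep only the first whitespace-delimited token as the ID.
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def longest_per_gene(records):
    """Map gene ID -> (transcript ID, sequence) of its longest variant.

    ASSUMPTION: the gene ID is the transcript ID without its final
    ".<number>" suffix. On equal length the first variant seen is kept.
    """
    best = {}
    for transcript_id, seq in records:
        gene_id = transcript_id.rsplit(".", 1)[0]
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (transcript_id, seq)
    return best


# Toy input with invented IDs and sequences, standing in for a CDS FASTA file.
example = """>geneA.1
ATGAAATAG
>geneA.2
ATGAAACCCTAG
>geneB.1
ATGTAG
""".splitlines()

representatives = longest_per_gene(read_fasta(example))
for gene_id, (transcript_id, seq) in representatives.items():
    print(gene_id, transcript_id, len(seq))
```

To apply the sketch to a real file, the toy input could be replaced by an open file handle, for example read_fasta(open(path)), after checking the header format actually used in the FASTA files.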

License: CC BY-SA 4.0 (Creative Commons Attribution-ShareAlike)

DOI: 10.5447/ipk/2019/0

Content: 1 directory, 5 files (535.2 MB)

Metadata
CONTRIBUTOR:
CREATOR: Sven Twardziok
PUBLISHER: e!DAL - Plant Genomics and Phenomics Research Data Repository (PGP), IPK Gatersleben, Seeland OT Gatersleben, Corrensstraße 3, 06466, Germany
SIZE: 535.2 MB
SUBJECT: Wild Emmer, Triticum turgidum subsp. dicoccoides, gene annotation
COVERAGE: none
DATE: Event: event
UPDATED: TimePoint: Tue Aug 28 17:17:53 CEST 2018
CREATED: TimePoint: Tue Aug 28 17:16:25 CEST 2018
LANGUAGE: en
RELATION: none
SOURCE: none
Revision: 2 - CreationDate: Tue Aug 28 17:16:25 CEST 2018 - RevisionDate: Tue Aug 28 17:17:53 CEST 2018