Abstract: This dataset contains an updated gene annotation (version 2) of 67,182 high-confidence (HC) gene loci for the wild emmer genome assembly (Avni, R. et al. Wild emmer genome architecture and diversity elucidate wheat evolution and domestication. Science 357, 93-97 (2017)). The annotation was produced at PGSB (Helmholtz Center Munich) by Sven Twardziok for a comparative study of durum wheat (DW), Triticum turgidum L. subsp. durum, and wild emmer wheat (WEW), T. turgidum subsp. dicoccoides. To give this study a consistent basis, both genome assemblies were annotated with the same pipeline and parameters, which avoids differences that arise solely from different detection methods. The corresponding results are part of a DW genome paper that is currently under review (Maccaferri, M. et al. Durum wheat genome reveals past domestication signatures and future improvement targets (submitted)). The annotation pipeline, described in detail in Maccaferri, M. et al. and Avni et al., 2017, used evidence from plant reference protein sets as well as comprehensive transcriptome data (different tissues, developmental stages and treatments) to predict an extensive set of candidate gene models. Subsequent filtering steps divide the whole set of gene-like sequences into (i) high-confidence (HC) genes, (ii) low-confidence (LC) genes and (iii) transposon genes. The dataset contains the following files: Wild_Emmer_HC_genes_v2_PGSB_Mar2017_CDS.gff3: GFF3 annotation for all splice variants (205,916); Wild_Emmer_HC_genes_v2_PGSB_Mar2017_CDS.fasta: coding sequences for all splice variants; Wild_Emmer_HC_genes_v2_PGSB_Mar2017_PROTEIN.fasta: protein sequences for all splice variants; Wild_Emmer_HC_genes_v2_PGSB_Mar2017_Representative_CDS.fasta: coding sequences for the representative (longest) splice variant of each gene (67,182).
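The representative file keeps one splice variant (the longest) per gene. The sketch below illustrates that selection step on a multi-isoform CDS FASTA such as Wild_Emmer_HC_genes_v2_PGSB_Mar2017_CDS.fasta. It is not the PGSB procedure. It assumes that transcript IDs have the form `<gene_id>.<isoform_number>` and that "longest" means longest coding sequence; this page confirms neither. The function names (`read_fasta`, `gene_id_of`, `longest_per_gene`) and the demo IDs are hypothetical.

```python
"""Hypothetical sketch: keep the longest CDS per gene from a multi-isoform FASTA.

ASSUMPTION: transcript IDs look like <gene_id>.<isoform_number>; this is not
documented for this dataset and should be checked against the actual headers.
"""
import io
from typing import Dict, Iterable, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence); record_id is the first token of the header."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            record_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if record_id is not None:
        yield record_id, "".join(chunks)


def gene_id_of(transcript_id: str) -> str:
    """Strip the assumed '.<n>' isoform suffix to recover the gene locus ID."""
    return transcript_id.rsplit(".", 1)[0]


def longest_per_gene(records: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[str, str]]:
    """Map gene ID -> (transcript ID, CDS) of its longest isoform.

    On equal lengths the isoform seen first in the file is kept.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for tid, seq in records:
        gid = gene_id_of(tid)
        if gid not in best or len(seq) > len(best[gid][1]):
            best[gid] = (tid, seq)
    return best


if __name__ == "__main__":
    # Toy input with made-up IDs; GENE1.2 is wrapped over two lines.
    demo = io.StringIO(">GENE1.1\nATGAAA\n>GENE1.2\nATGAAACCC\nTAA\n>GENE2.1\nATGTAA\n")
    for gid, (tid, seq) in sorted(longest_per_gene(read_fasta(demo)).items()):
        print(gid, tid, len(seq))
```

Under the stated ID assumption, running this on the all-variant CDS FASTA should yield one entry per HC gene. The count can then be compared with the 67,182 sequences in the representative file as a sanity check. If the IDs use a different isoform convention, only `gene_id_of` needs to change.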
License: CC BY-SA 4.0 (Creative Commons Attribution-ShareAlike)
DOI: 10.5447/ipk/2019/0
Content: 1 directory, 5 files (535.2 MB)
CREATOR: | Sven Twardziok |
PUBLISHER: | e!DAL - Plant Genomics and Phenomics Research Data Repository (PGP), IPK Gatersleben, Seeland OT Gatersleben, Corrensstraße 3, 06466, Germany |
SIZE: | 535.2 MB |
SUBJECT: | Wild Emmer, Triticum turgidum subsp. dicoccoides, gene annotation |