Citation: M. Mascher (2020-06-22): Assembly, annotation and analysis of the barley (Hordeum vulgare L.) pan-genome. DOI:10.5447/ipk/2020/24

Abstract: Annotated chromosome-scale sequence assemblies (a "pan-genome") of 20 barley genotypes. Sequence assembly was performed using the TRITEX pipeline (https://tritexassembly.bitbucket.io, Monat et al. 2019, [doi: 10.1186/s13059-019-1899-5]). Protein-coding gene models were predicted for all twenty genotypes by projecting the de novo gene models of three genotypes (including "self-projections"). The folder 'Denovo_gene_annotation' contains de novo gene annotations in GFF3 format and the corresponding CDS and protein sequences in FASTA format for the three genotypes Morex, Barke and HOR 10350. Gene projection results for the twenty genomes are provided in the folder 'Gene_projectionn'. The folder 'Pseudomolecules' contains the twenty genome assemblies in FASTA format. Whole-genome alignments of the twenty genotypes revealed a large number of presence/absence variants (PAVs); a catalog of these PAVs (in BED format) is supplied in the folder 'PAVs'. SNP calling against the Morex genome was performed using whole-genome shotgun (WGS) data of 300 genotypes as well as genotyping-by-sequencing (GBS) data from Milner et al. 2019, Nature Genetics [doi:10.1038/s41588-018-0266-x]. The two resulting SNP matrices are provided in the folder 'SNP_matrices'. The folder 'kmer_counts_GWAS_PCA' holds matrices of normalized k-mer counts used for genetic analyses (genome-wide association studies and principal component analysis). The folder 'TE_annotation' holds GFF files specifying the positions of transposable elements annotated using two complementary methods. The folder 'Morex_Pacbio_CLR_assembly' contains a long-read assembly of cv. Morex, which was used to validate the short-read assemblies. Genome annotation and data management were supported by a grant from the German Federal Ministry of Education and Research (FKZ 031A536, 'de.NBI', www.denbi.de).
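As a minimal sketch of working with the PAV catalog, the Python snippet below sums interval lengths per chromosome from BED-formatted lines. It assumes only the standard BED layout (chromosome, 0-based start and exclusive end in the first three tab-separated columns). The function name, the chromosome names and the example records are hypothetical; the actual files in 'PAVs' may use different sequence names and carry additional columns.

```python
from collections import defaultdict
from typing import Dict, Iterable


def pav_bp_per_chrom(lines: Iterable[str]) -> Dict[str, int]:
    """Sum interval lengths per chromosome from BED-formatted lines (hypothetical helper).

    BED coordinates are 0-based and half-open, so an interval's length is end - start.
    Blank lines and header lines ("#", "track", "browser") are skipped.
    """
    totals: Dict[str, int] = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        totals[chrom] += end - start
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical records; real PAV files may differ in naming and extra columns.
    example = [
        "chr1H\t100\t250\tPAV_1",
        "chr1H\t1000\t1100\tPAV_2",
        "chr2H\t0\t50\tPAV_3",
    ]
    print(pav_bp_per_chrom(example))  # {'chr1H': 250, 'chr2H': 50}
```

Overlapping intervals are counted twice in this sketch; if a non-redundant total is needed, merge overlapping intervals first (for example with `bedtools merge`).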

License: CC BY 4.0 (Creative Commons Attribution)

DOI: 10.5447/ipk/2020/24

Metadata
CONTRIBUTOR:
Murukarthick Jayakodi, Sudharsan Padmarasu, Georg Haberer, Venkatesh Suresh Bonthala, Heidrun Gundlach, Cécile Monat, Thomas Lux, Nadia Kamal, Daniel Lang, Axel Himmelbach, Jennifer Ens, Xiao-Qi Zhang, Tefera Angessa, Gaofeng Zhou, Cong Tan, Camilla Hill, Penghao Wang, Miriam Schreiber, Anne Fiebig, Hikmet Budak, Dongdong Xu, Jing Zhang, Chunchao Wang, Ganggang Guo, Guoping Zhang, Keiichi Mochida, Takashi Hirayama, Kazuhiro Sato, Kenneth Chalmers, Peter Langridge, Robbie Waugh, Curtis Pozniak, Uwe Scholz, Klaus Mayer, Manuel Spannagl, Chengdao Li, Nils Stein, Yu Guo, Jeremy Schmutz, Jane Grimwood, Christopher Plott, Jerry Jenkins, Lori Boston
CREATOR:
Martin Mascher
PUBLISHER: e!DAL - Plant Genomics and Phenomics Research Data Repository (PGP), IPK Gatersleben, Seeland OT Gatersleben, Corrensstraße 3, 06466, Germany
SIZE: 91.2 GB
SUBJECT: genome sequence assembly, reference genome sequence, short-read assembly, pan-genome, barley, Hordeum vulgare, genetic diversity, structural variation, Triticeae, genome annotation
COVERAGE: none
DATE: Event: event
CREATED: TimePoint: Mon Jun 22 12:01:09 CEST 2020
UPDATED: TimePoint: Mon Jun 22 12:42:16 CEST 2020
LANGUAGE: en
RELATION: none
SOURCE: none
Revision: 2 - CreationDate: Mon Jun 22 12:01:09 CEST 2020 - RevisionDate: Mon Jun 22 12:42:16 CEST 2020