Citation: M. Mascher (2020-06-22): Assembly, annotation and analysis of the barley (Hordeum vulgare L.) pan-genome. DOI:10.5447/ipk/2020/24

Abstract: Annotated chromosome-scale sequence assemblies (a "pan-genome") of 20 barley genotypes. Sequence assembly was performed using the TRITEX pipeline (Monat et al. 2019, [doi:10.1186/s13059-019-1899-5]). Protein-coding gene models were predicted for all twenty genotypes by gene projection with de novo gene models of three genotypes (including “self-projections”). The folder 'Denovo_gene_annotation' contains de novo gene annotations in GFF3 format and their corresponding CDS and protein sequences in FASTA format for the three genotypes Morex, Barke and HOR 10350. Gene projection results for the twenty genomes are provided in the folder 'Gene_projection'. The folder 'Pseudomolecules' includes the twenty genome assemblies in FASTA format. Whole-genome alignments of the twenty genotypes detected a large number of presence/absence variants (PAVs). A catalog of PAVs (in BED format) is supplied in the folder 'PAVs'. SNP calling was performed using 300 whole-genome shotgun (WGS) data as well as genotyping-by-sequencing (GBS) data from Milner et al. 2019, Nature Genetics [doi:10.1038/s41588-018-0266-x] against the Morex genome. These two SNP matrices are provided in the folder 'SNP_matrices'. The folder 'kmer_counts_GWAS_PCA' holds matrices of normalized k-mer counts used for genetic analyses. The folder 'TE_annotation' holds GFF files specifying the positions of transposable elements annotated using two complementary methods. The folder 'Morex_Pacbio_CLR_assembly' contains a long-read assembly of cv. Morex, which was used to validate the short-read assemblies. Genome annotation and data management were supported by a grant of the German Ministry of Education and Research (FKZ 031A536, 'de.NBI',

License: CC BY 4.0 (Creative Commons Attribution)

DOI: 10.5447/ipk/2020/24

PUBLISHER: e!DAL - Plant Genomics and Phenomics Research Data Repository (PGP), IPK Gatersleben, Seeland OT Gatersleben, Corrensstraße 3, 06466, Germany
SIZE: 91.2 GB
SUBJECT: genome sequence assembly, reference genome sequence, short-read assembly, pan-genome, barley, Hordeum vulgare, genetic diversity, structural variation, Triticeae, genome annotation
CREATED: TimePoint: Mon Jun 22 12:01:09 CEST 2020
UPDATED: TimePoint: Mon Jun 22 12:42:16 CEST 2020
