Highlights include: a predicted genome size of ~650 Mbp; a scaffold N50 of ~30 kbp, with most ordered gaps being extremely short; ~57,000 scaffolds; 3.5 million novel high-quality SNPs between 9 genomes and the reference, Khalas; ~25,000 gene predictions (excluding transposable elements); 38% GC content in the nuclear genome; 381 Mbp of assembled sequence representing ~90% of genes and 60% of the genome sequence (the remaining unassembled sequence is mostly highly repetitive); and draft chloroplast gene sequences.
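For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: sort the scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the assembled bases. The function name n50 and the example lengths are illustrative assumptions, not part of the project's own pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths given")


# Made-up example lengths (not date palm data): the total is 54, and the
# three longest scaffolds (10 + 9 + 8 = 27) reach half of it.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))
```

Note that N50 describes contiguity only; it says nothing about whether the scaffolds are assembled correctly.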
The scientific name is Phoenix dactylifera L., and the variety name is 'Khalas'. Combining the two, we get 'PdactyK' for short.
It is our hope that the results provided here will be a starting point for researchers doing genetic studies of date palm. The assembly is a draft built from next-generation sequencing reads and, as such, should be used with caution. While short-range contiguity is of high quality, longer-range contiguity (spanning gaps) is less certain. Manual inspection of contigs based on mate pair validity showed that contigs/scaffolds up to 12 kb were consistently assembled correctly. The quality of these scaffolds is roughly equivalent to that of other plant draft sequences such as rice and papaya. Larger scaffolds are more likely to contain errors. Researchers should therefore take care with operations such as PCR primer design when spanning gaps in the assembly (denoted by N's in the sequence).
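As an illustration of the caution about gaps, here is a small sketch that locates runs of N in a scaffold sequence and checks whether a planned amplicon overlaps any of them. The function names find_gaps and amplicon_spans_gap, the coordinate convention (0-based, end-exclusive), and the toy sequence are assumptions made for this example; they are not part of the released data or tools.

```python
import re


def find_gaps(sequence, min_len=1):
    """Return (start, end) pairs for each run of N of at least min_len bases.

    Coordinates are 0-based and end-exclusive, like Python slices.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]


def amplicon_spans_gap(gaps, start, end):
    """Return True if the interval [start, end) overlaps any gap."""
    return any(g_start < end and start < g_end for g_start, g_end in gaps)


# Toy scaffold (not real date palm sequence) with two gaps.
scaffold = "ACGTNNNNACGTNACGT"
gaps = find_gaps(scaffold)
print(gaps)
print(amplicon_spans_gap(gaps, 2, 6))   # overlaps the first gap
print(amplicon_spans_gap(gaps, 0, 4))   # lies entirely before any gap
```

An amplicon that overlaps a gap depends on the scaffolder's estimate of the gap length, so its expected product size is uncertain.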
DNA for this project was obtained from leaves kindly provided by the Qatar Plant Tissue Culture Lab in the Department of Agriculture and Water Research (Qatar Ministry of Municipal Affairs and Agriculture) and by the USDA Palm and Citrus Collection, Riverside, California, USA.
Click here to download Date Palm Research Data