###########################
# Thibault Leroy - 180118 #
###########################

A summary of the analyses performed to estimate the heterozygosity of the oak genotype “3P” (Plomion et al. 2018).

All the short Illumina paired-end reads used to produce the “3P” oak reference genome were mapped against the haplome assembly with bowtie2, using the standard parameters of the “fast end-to-end” mode. Duplicate reads were then removed with Picard (http://broadinstitute.github.io/picard/).

Samtools/bcftools scripts were used to calculate coverage and to estimate allele frequencies from the “DP4” tag of the VCF file (available as "3P.haplov2.3.bam.pisorted.dedup.pileup.maxcov250.vcf"). We only considered SNPs with a coverage between 100X and 200X. We also discarded all SNPs with a MAF < 0.25 and all INDELs (all retained sites are available in "3P.haplov2.3.bam.pisorted.dedup.pileup.mincov100-maxcov200.vcf.withoutindel.sed.siteshz").

The proportion of heterozygous sites along the chromosomes was then calculated with a sliding-window approach. For each window, this proportion was weighted by the percentage of N bases in the window and by the fraction covered, defined here as the proportion of bases within the window satisfying the same sequencing-depth criterion as used for SNP calling (100X-200X).

Source: Plomion et al. 2018, Nature Plants, Online Methods, section "Estimation of heterozygosity of the reference genotype “3P”".
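The site filter described above (coverage and allele frequency taken from DP4, 100X to 200X coverage, MAF >= 0.25, no INDELs) can be sketched in Python as follows. This is an illustration only, not the samtools/bcftools scripts actually used. It assumes that coverage is the sum of the four DP4 counts (ref-forward, ref-reverse, alt-forward, alt-reverse), that INDELs carry the `INDEL` INFO flag as in bcftools output, and that the INFO column is the 8th tab-separated field. The names `parse_info` and `filter_site` are hypothetical.

```python
from typing import Optional, Tuple

# Thresholds taken from the text above.
MIN_COV = 100
MAX_COV = 200
MIN_MAF = 0.25


def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a dict; flag fields (e.g. INDEL) map to True."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        else:
            fields[item] = True
    return fields


def filter_site(vcf_line: str) -> Optional[Tuple[str, int, float]]:
    """Return (chrom, pos, alt_freq) for a retained SNP, or None if the site is discarded.

    Assumption: coverage = sum of the four DP4 counts.
    """
    if vcf_line.startswith("#"):
        return None  # header line
    cols = vcf_line.rstrip("\n").split("\t")
    chrom, pos, info = cols[0], int(cols[1]), cols[7]
    fields = parse_info(info)
    if "INDEL" in fields or "DP4" not in fields:
        return None  # INDELs are discarded; DP4 is required
    ref_f, ref_r, alt_f, alt_r = (int(x) for x in fields["DP4"].split(","))
    coverage = ref_f + ref_r + alt_f + alt_r
    if not (MIN_COV <= coverage <= MAX_COV):
        return None  # outside the 100X-200X depth window (also avoids division by zero)
    alt_freq = (alt_f + alt_r) / coverage
    maf = min(alt_freq, 1.0 - alt_freq)
    if maf < MIN_MAF:
        return None  # minor allele too rare to call the site heterozygous
    return chrom, pos, alt_freq
```

For example, a site with `DP4=40,35,38,37` has a coverage of 150 and an alternative-allele frequency of 0.5, so it is kept; the same counts on an `INDEL` record are discarded.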
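One possible reading of the window weighting is to divide the number of heterozygous SNPs in a window by its effective length, L × (1 − N fraction) × covered fraction, so that assembly Ns and bases outside the 100X-200X depth range do not dilute the estimate. The sketch below assumes that interpretation, computes the covered fraction over the non-N bases only (so the product equals the number of callable bases), and takes a per-base depth array as input. The function name `window_heterozygosity` and the `window`/`step` parameters are hypothetical; the window size and step used in the original analysis are not given here.

```python
from typing import List, Sequence, Tuple

# Same depth criterion as used for SNP calling.
MIN_DEPTH = 100
MAX_DEPTH = 200


def window_heterozygosity(
    seq: str,
    depth: Sequence[int],
    het_positions: Sequence[int],
    window: int,
    step: int,
) -> List[Tuple[int, int, float]]:
    """Return (start, end, heterozygosity) per window; 0-based, end exclusive.

    Assumption: heterozygosity = n_het / (window * (1 - n_frac) * cov_frac),
    with cov_frac computed over the non-N bases of the window.
    """
    if len(seq) != len(depth):
        raise ValueError("seq and depth must have the same length")
    het = set(het_positions)
    results = []
    for start in range(0, len(seq) - window + 1, step):
        end = start + window
        non_n = [i for i in range(start, end) if seq[i].upper() != "N"]
        n_frac = 1.0 - len(non_n) / window  # fraction of N bases
        covered = sum(1 for i in non_n if MIN_DEPTH <= depth[i] <= MAX_DEPTH)
        cov_frac = covered / len(non_n) if non_n else 0.0
        effective = window * (1.0 - n_frac) * cov_frac  # callable bases
        n_het = sum(1 for i in range(start, end) if i in het)
        h = n_het / effective if effective > 0 else float("nan")
        results.append((start, end, h))
    return results
```

With `seq = "ACGTNNACGT"`, a depth of 150 at every non-N base except the last (50), and heterozygous SNPs at positions 2 and 7, a single 10-bp window has 8 non-N bases, 7 of them covered, and therefore a weighted heterozygosity of 2/7. Windows made only of Ns return NaN rather than zero.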