Process a VCF file, create a GDS file from it, and return a table in long format. See Value for the column list.
Usage
vcf_to_qtlseqr_table(
vcf_path,
gds_outdir,
depth_thres = 5,
ref_freq_thres = 0.9,
parent_ref_sample = NULL,
parent_alt_sample = NULL,
parent_filter = FALSE,
single_allele_loci_only = TRUE,
overwrite = FALSE,
verbose = FALSE
)
Arguments
- vcf_path
Path to a VCF file, typically one containing multiple samples
- gds_outdir
Directory in which to write the GDS file
- depth_thres
Minimum (filtered) depth required to call a genotype. Below this depth the genotype is labelled lowDepth. Default: 5
- ref_freq_thres
Minimum proportion (inclusive, i.e. greater than or equal to) required for a genotype to be called Reference or Alternative. Default: 0.9
- parent_ref_sample
Name of the sample which identifies the 'reference' parent strain, e.g. KN99alpha for Cryptococcus. Default: NULL
- parent_alt_sample
Name of the sample which identifies the 'alternate' parent strain, e.g. TDY1993 for Cryptococcus. Default: NULL
- parent_filter
Boolean. Set to TRUE to retain only those loci in which parent_ref_sample is labelled Reference and parent_alt_sample is labelled Alternate. Default: FALSE
- single_allele_loci_only
Boolean. Set to TRUE to exclude all multi-allelic loci; set to FALSE to keep all variants. Default: TRUE
- overwrite
Boolean. If FALSE and the GDS file already exists, creation is skipped and the existing file is simply opened. Default: FALSE
- verbose
Boolean. Set to TRUE to run the SeqArray functions in verbose mode. Default: FALSE
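As a rough illustration of how depth_thres and ref_freq_thres might interact, the sketch below labels a locus from its reference and alternate read counts. It is not the package's implementation: call_genotype is a hypothetical helper, depth is assumed to be the sum of the two counts, and the "undefined" label for loci that meet neither threshold is a placeholder chosen for this example.

```r
# Hypothetical helper, for illustration only; not part of the package.
# Assumes depth = ref_count + alt_count and that the same threshold
# (ref_freq_thres) is applied to both the reference and alternate fractions.
call_genotype <- function(ref_count, alt_count,
                          depth_thres = 5, ref_freq_thres = 0.9) {
  depth <- ref_count + alt_count
  # Guard against division by zero at loci with no reads
  ref_frac <- ifelse(depth > 0, ref_count / depth, NA_real_)
  alt_frac <- ifelse(depth > 0, alt_count / depth, NA_real_)
  ifelse(
    depth < depth_thres, "lowDepth",
    ifelse(
      ref_frac >= ref_freq_thres, "Reference",
      # "undefined" is a placeholder label, not taken from the package
      ifelse(alt_frac >= ref_freq_thres, "Alternative", "undefined")
    )
  )
}

# Four loci: mostly reference, mostly alternate, too shallow, and mixed
call_genotype(ref_count = c(19, 1, 2, 6), alt_count = c(1, 19, 1, 6))
```

The depth check is applied first, so a shallow locus is labelled lowDepth regardless of its allele fractions.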
Value
A data frame with the following columns: CHR, POS, REF_Allele, ALT1, QUAL, Depth, variant, sample, RealDepth, Reference, Alternative1, genotype, Ref_percentage, Alt1_percentage, Filt_Genotype
Details
This function is a re-interpretation of Daniel's script /scratch/mblab/daniel.agustinho/tools/VCF_tabler.sh, which parsed VCF files through a series of awk commands. Critically, the FORMAT column of the VCF is expected to read GT:DP:AD:RO:QR:AO:QA:GL. In that script, the awk command extracts columns 1-9 (CHROM POS ID REF ALT QUAL FILTER INFO FORMAT) and, in a loop, one sample column at a time. The layout of each sample column is given by the FORMAT column, and is GT:DP:AD:RO:QR:AO:QA:GL, as stated above. Real depth was calculated by replacing the colons with field separators and extracting columns 13 and 15, which correspond to RO and AO respectively. This function mimics that behaviour.
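The field extraction described above can be sketched in base R. This is a minimal illustration, not the package's code: parse_sample_field is a hypothetical helper, the sample string is made up, and taking real depth as RO + AO is an assumption (the text only says it is derived from those two fields). The sketch handles a single alternate allele only.

```r
# Hypothetical helper, for illustration only; not part of the package.
# Splits one sample column using the keys named in the FORMAT column.
parse_sample_field <- function(format_str, sample_str) {
  keys <- strsplit(format_str, ":", fixed = TRUE)[[1]]
  vals <- strsplit(sample_str, ":", fixed = TRUE)[[1]]
  stopifnot(length(keys) == length(vals))
  fields <- setNames(vals, keys)
  ro <- as.integer(fields[["RO"]])  # reference observation count
  ao <- as.integer(fields[["AO"]])  # alternate observation count
  # Assumption: real depth is the sum of the two counts
  list(RO = ro, AO = ao, RealDepth = ro + ao)
}

# Made-up sample column matching GT:DP:AD:RO:QR:AO:QA:GL
parsed <- parse_sample_field(
  "GT:DP:AD:RO:QR:AO:QA:GL",
  "0/1:20:15,5:15:540:5:180:-10,0,-40"
)
parsed$RealDepth
```

Looking fields up by name rather than by position (columns 13 and 15 in the awk version) keeps the sketch from silently reading the wrong field if the FORMAT order differs.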
Examples
if (FALSE) {
if(interactive()){
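# Hypothetical path and sample names, for illustration only
tbl <- vcf_to_qtlseqr_table(
  vcf_path = "path/to/samples.vcf.gz",
  gds_outdir = tempdir(),
  parent_ref_sample = "KN99alpha",
  parent_alt_sample = "TDY1993"
)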
}
}