Process a VCF file, create a GDS file from it, and return a long-format table. See the Value section for the column list.

Usage

vcf_to_qtlseqr_table(
  vcf_path,
  gds_outdir,
  depth_thres = 5,
  ref_freq_thres = 0.9,
  parent_ref_sample = NULL,
  parent_alt_sample = NULL,
  parent_filter = FALSE,
  single_allele_loci_only = TRUE,
  overwrite = FALSE,
  verbose = FALSE
)

Arguments

vcf_path

Path to a VCF file, typically one containing multiple samples.

gds_outdir

Directory in which to write the GDS file.

depth_thres

Minimum (filtered) depth required to call a genotype. Genotypes with a depth below this value are labelled lowDepth. Default: 5

ref_freq_thres

Minimum allele frequency, given as a proportion, required for a genotype to be called Reference or Alternative. The comparison is inclusive (greater than or equal to). Default: 0.9

parent_ref_sample

Name of the sample that identifies the 'reference' parent strain, e.g. KN99alpha for Cryptococcus. Default: NULL

parent_alt_sample

Name of the sample that identifies the 'alternate' parent strain, e.g. TDY1993 for Cryptococcus. Default: NULL

parent_filter

Boolean. Set to TRUE to retain only those loci at which parent_ref_sample is labelled Reference and parent_alt_sample is labelled Alternative. Default: FALSE

single_allele_loci_only

Boolean. Set to TRUE to exclude all multi-allelic loci, or to FALSE to keep all variants. Default: TRUE

overwrite

Boolean. When FALSE, an existing GDS file is not recreated; it is simply opened. Set to TRUE to overwrite an existing GDS file. Default: FALSE

verbose

Boolean. Set to TRUE to run the SeqArray functions in verbose mode. Default: FALSE

Value

A data frame with the following columns: CHR, POS, REF_Allele, ALT1, QUAL, Depth, variant, sample, RealDepth, Reference, Alternative1, genotype, Ref_percentage, Alt1_percentage, Filt_Genotype
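
The way depth_thres and ref_freq_thres could combine to produce a filtered genotype label can be sketched as below. This is only an illustration, not the package's implementation: the helper name call_filtered_genotype is hypothetical, the depth is assumed to be the sum of the reference and alternative read counts, and returning NA when neither allele reaches the threshold is an assumption, since the documentation does not say how such genotypes are labelled.

```r
# Hypothetical helper, not part of the package: label genotypes from
# reference and alternative read counts (vectorised over loci).
call_filtered_genotype <- function(ref_count, alt_count,
                                   depth_thres = 5, ref_freq_thres = 0.9) {
  real_depth <- ref_count + alt_count   # assumption: depth = ref + alt reads
  ref_freq <- ref_count / real_depth    # NaN when real_depth is 0
  alt_freq <- alt_count / real_depth
  ifelse(
    real_depth < depth_thres, "lowDepth",
    ifelse(
      ref_freq >= ref_freq_thres, "Reference",
      # assumption: NA when neither allele reaches the threshold
      ifelse(alt_freq >= ref_freq_thres, "Alternative", NA_character_)
    )
  )
}

labels <- call_filtered_genotype(
  ref_count = c(2, 19, 1, 6),
  alt_count = c(1, 1, 19, 6)
)
print(labels)
```

With the default thresholds, the first locus has a depth of 3 and is labelled lowDepth, the second and third reach a frequency of 0.95 for one allele, and the fourth (6 reads each) reaches the threshold for neither.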

Details

This function is a re-interpretation of Daniel's script /scratch/mblab/daniel.agustinho/tools/VCF_tabler.sh, which parsed VCF files through a series of awk commands. Critically, the FORMAT column of the VCF is expected to be GT:DP:AD:RO:QR:AO:QA:GL. In that script, the awk command extracts columns 1-9, which are CHROM POS ID REF ALT QUAL FILTER INFO FORMAT, and then, in a loop, extracts a given sample column. The layout of each sample column is given by the FORMAT column and is GT:DP:AD:RO:QR:AO:QA:GL, as stated above. Real depth was calculated by replacing the colons with field separators and extracting columns 13 and 15, which correspond to RO and AO respectively. This function mimics that behavior.
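
The column arithmetic above can be checked with a small sketch. It splits one sample field using the expected FORMAT string and looks up RO and AO by name rather than by position. The sample values are made up for illustration, and treating real depth as the sum of RO and AO is an assumption, since the text only says which columns were extracted.

```r
# Illustrative values only; not taken from a real VCF.
format_field <- "GT:DP:AD:RO:QR:AO:QA:GL"
sample_field <- "0/1:20:12,8:12:450:8:300:-20.5,0,-30.1"

# Split both fields on ":" and pair each value with its FORMAT key
keys <- strsplit(format_field, ":", fixed = TRUE)[[1]]
values <- strsplit(sample_field, ":", fixed = TRUE)[[1]]
parsed <- setNames(values, keys)

# After the 9 fixed VCF columns, the keys occupy awk columns 10 to 17,
# so RO is column 13 and AO is column 15
awk_columns <- setNames(seq_along(keys) + 9L, keys)

ref_obs <- as.integer(parsed[["RO"]])
alt_obs <- as.integer(parsed[["AO"]])

# Assumption: real depth is the sum of the two observation counts
real_depth <- ref_obs + alt_obs
print(c(RO = ref_obs, AO = alt_obs, RealDepth = real_depth))
```

Looking fields up by key avoids depending on the exact field order, although the function itself expects the order given above. At multi-allelic sites the AO field may hold several comma-separated counts, which this sketch does not handle.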

Examples

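if (FALSE) {
if (interactive()) {
  # Hypothetical input path; replace with your own VCF file
  qtl_table <- vcf_to_qtlseqr_table(
    vcf_path = "path/to/samples.vcf.gz",
    gds_outdir = tempdir()
  )
  head(qtl_table)
}
}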