Skip to contents

A wrapper for all the functions that perform the full G prime analysis to identify QTL. The following steps are performed:
1) Genome-wide G statistics are calculated by getG.
G is defined by the equation: $$G = 2*\sum_{i=1}^{4} n_{i}*ln\frac{obs(n_i)}{exp(n_i)}$$ Where for each SNP, \(n_i\) from i = 1 to 4 corresponds to the reference and alternate allele depths for each bulk, as described in the following table:

AlleleHigh BulkLow Bulk
Reference\(n_1\)\(n_2\)
Alternate\(n_3\)\(n_4\)

...and \(obs(n_i)\) are the observed allele depths as described in the data frame. getG calculates the G statistic using expected values assuming read depth is equal for all alleles in both bulks: $$exp(n_1) = ((n_1 + n_2)*(n_1 + n_3))/(n_1 + n_2 + n_3 + n_4)$$ $$exp(n_2) = ((n_2 + n_1)*(n_2 + n_4))/(n_1 + n_2 + n_3 + n_4)$$ $$exp(n_3) = ((n_3 + n_1)*(n_3 + n_4))/(n_1 + n_2 + n_3 + n_4)$$ $$exp(n_4) = ((n_4 + n_2)*(n_4 + n_3))/(n_1 + n_2 + n_3 + n_4)$$
2) G'

  • A tricube-smoothed G statistic is predicted by local regression within each chromosome using tricubeStat. This works as a weighted average across neighboring SNPs that accounts for Linkage disequilibrium (LD) while minizing noise attributed to SNP calling errors. G values for neighboring SNPs within the window are weighted by physical distance from the focal SNP.

    3) P-values are estimated based using the non-parametric method described by Magwene et al. 2011 with the function getPvals. Breifly, using the natural log of Gprime a median absolute deviation (MAD) is calculated. The Gprime set is trimmed to exclude outlier regions (i.e. QTL) based on Hampel's rule. An alternate method for filtering out QTL is proposed using absolute delta SNP indeces greater than 0.1 to filter out potential QTL. An estimation of the mode of the trimmed set is calculated using the mlv function from the package modeest. Finally, the mean and variance of the set are estimated using the median and mode and p-values are estimated from a log normal distribution.

    4) Negative Log10- and Benjamini-Hochberg adjusted p-values are calculated using p.adjust

Usage

runQTLseqAnalysis_local(
  SNPset,
  windowSize = 1e+06,
  popStruc = "F2",
  bulkSize,
  depth = NULL,
  replications = 10000,
  filter = 0.3,
  intervals = c(95, 99),
  chrom_col = "CHROM",
  coordinate_col = "POS",
  delta_snp_col = "deltaSNP",
  dp_low_col = "DP.LOW",
  dp_high_col = "DP.HIGH"
)

Arguments

SNPset

Data frame SNP set containing previously filtered SNPs

windowSize

the window size (in base pairs) bracketing each SNP for which to calculate the statitics. Magwene et. al recommend a window size of ~25 cM, but also recommend optionally trying several window sizes to test if peaks are over- or undersmoothed.

popStruc

the population structure. Defaults to "F2" and assumes "RIL" otherwise.

bulkSize

non-negative integer vector. The number of individuals in each simulated bulk. Can be of length 1, then both bulks are set to the same size. Assumes the first value in the vector is the simulated high bulk.

depth

This is calculated in the script, usually. It is input to a QTLSeqR function simulateConfInt_local

replications

integer. The number of bootstrap replications.

filter

numeric. A minimum SNP-index filter

intervals

confidence intervals -- note this is part of daniel's additional code to the QTLseqR package

chrom_col

name of the column which stores the chromosome label. Default is CHROM

coordinate_col

name of the column which stores the snp coordinate. Default is POS

delta_snp_col

name of the column which stores the change in allele frequency. Default is deltaSNP

dp_low_col

name of the column which stores the dp_low value. Default DP.LOW

dp_high_col

name of the column which stores the dp_high value. Default is DP.HIGH

Value

The supplied SNP set tibble after G' analysis. Includes five new columns:

  • Gprime - The tricube smoothed G statistic based on the supplied window size

  • pvalue - the pvalue at each SNP calculatd by non-parametric estimation

  • negLog10Pval - the -Log10(pvalue) supplied for quick plotting

  • qvalue - the Benajamini-Hochberg adjusted p-value

Details

DETAILS

Note

THIS IS NEARLY A VERBATIM COPY OF QTLseqR