
This website contains current versions of the software packages and working code resulting from our published research. Please contact us if you have any questions or concerns regarding the use of the methods or if you encounter any errors or bugs. Early releases of our code will be periodically updated.

Available Software:

Software for Analysis of Genotype Data

Sequence Kernel Association Test (SKAT)

    Description:

    The Sequence Kernel Association Test (SKAT) is a tool for region-based testing of rare variants from sequencing data. In particular, SKAT is designed for testing the association of rare (and common) variants from sequence data with a dichotomous or quantitative trait. We also provide tools for estimating power and sample size to aid the design of future sequencing studies. Although we focus on rare variants within a region, the method is applicable to any set of rare variants and can accurately estimate even very small p-values (e.g. 10^-6).
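    As a rough illustration of the region-level statistic, the Python sketch below computes a SKAT-style score statistic for a quantitative trait, Q = sum_j (w_j * S_j)^2, where S_j = sum_i g_ij * (y_i - ybar) is the score for variant j and w_j is the Beta(1, 25) density evaluated at the variant's minor allele frequency (the SKAT package's default weighting), which up-weights rarer variants. This is a minimal sketch, not the SKAT package: it assumes an intercept-only null model (no covariates) and genotypes coded as 0/1/2 copies of the minor allele, and it replaces the package's analytic p-value calculation with a simple permutation p-value. The function names are hypothetical.

```python
import math
import random


def beta_weight(maf, a=1.0, b=25.0):
    """Beta(a, b) density at the minor allele frequency; requires 0 < maf < 1."""
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(maf) + (b - 1) * math.log(1 - maf) - log_beta_fn)


def skat_q(genotypes, residuals, weights):
    """Q = sum_j (w_j * S_j)^2 with S_j = sum_i g_ij * r_i (per-variant score)."""
    q = 0.0
    for j, w in enumerate(weights):
        score = sum(row[j] * r for row, r in zip(genotypes, residuals))
        q += (w * score) ** 2
    return q


def skat_permutation_test(genotypes, y, n_perm=999, seed=1):
    """Return (Q, permutation p-value) for a quantitative trait with no covariates.

    genotypes: n x p list of 0/1/2 minor-allele counts (monomorphic variants removed).
    """
    n, p = len(y), len(genotypes[0])
    mafs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(p)]
    weights = [beta_weight(m) for m in mafs]
    y_bar = sum(y) / n
    resid = [v - y_bar for v in y]  # residuals under the intercept-only null model
    q_obs = skat_q(genotypes, resid, weights)
    rng = random.Random(seed)
    perm = resid[:]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # permuting residuals breaks any genotype-trait link
        if skat_q(genotypes, perm, weights) >= q_obs:
            exceed += 1
    return q_obs, (exceed + 1) / (n_perm + 1)
```

    A permutation p-value can never fall below 1/(n_perm + 1), so this shortcut cannot reach the very small p-values mentioned above; that is what the analytic calculation in the package is for.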

    The method was developed for, and tailored to, rare variants. It can be applied to other types of data, e.g. gene expression data or common variants, but the tests can be slightly conservative. For other types of data, we recommend using the kernel machine (KM) test described below.

    Downloads:

    R packages:
    Windows: Download
    Linux: Download
    Manual: Download

    Reference:

    Wu, M.C.#, Lee, S.#, Cai, T., Li, Y., Boehnke, M., Lin, X. (2011). "Rare variant association testing for sequencing data with the sequence kernel association test (SKAT)". The American Journal of Human Genetics, 89, 82-93. PDF

    Additional Resources:

    Most recent versions of the code as well as some examples can be found here.

Multi-Kernel Sequence Kernel Association Test (MK-SKAT)

    Description:

    The Multi-Kernel SKAT is a practical framework built on the Sequence Kernel Association Test (SKAT) for conducting region-based testing of rare variants from sequencing data. Specifically, MK-SKAT takes a pragmatic approach to answering two questions: (1) which group of variants in the region should I test, and (2) which of the many existing rare variant tests should I use? Since the answer to both questions depends on the true probabilistic genetic model underlying the trait (which is never known), MK-SKAT tests across a range of candidate groupings and candidate rare variant tests and uses perturbation to generate a single p-value for the significance of the region. The method allows for covariates and either quantitative or dichotomous traits.
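    The idea of testing across several candidates and reporting one region-level p-value can be sketched as a minimum-p combination. In the hypothetical Python code below, each candidate is a weight vector over the variants in the region (a zero weight drops a variant, so variant subsets act as candidate groupings), each candidate gets its own SKAT-style statistic, and the smallest per-candidate p-value is calibrated against its own resampling distribution so that trying several candidates does not inflate the type I error. This is only an illustration under simplifying assumptions: it uses permutation of residuals rather than the perturbation procedure described above, assumes an intercept-only null model, and uses made-up function names; it is not the MK-SKAT implementation.

```python
import random


def weighted_q(genotypes, resid, weights):
    """SKAT-style statistic sum_j (w_j * sum_i g_ij * r_i)^2 for one candidate."""
    q = 0.0
    for j, w in enumerate(weights):
        score = sum(row[j] * r for row, r in zip(genotypes, resid))
        q += (w * score) ** 2
    return q


def min_p_combined_test(genotypes, y, candidates, n_perm=499, seed=7):
    """Single region p-value from the minimum p-value over candidate weightings."""
    n = len(y)
    y_bar = sum(y) / n
    resid = [v - y_bar for v in y]  # residuals under an intercept-only null model
    rng = random.Random(seed)
    # Row 0 holds the observed statistics; rows 1..n_perm hold permuted ones.
    table = [[weighted_q(genotypes, resid, w) for w in candidates]]
    perm = resid[:]
    for _ in range(n_perm):
        rng.shuffle(perm)
        table.append([weighted_q(genotypes, perm, w) for w in candidates])
    total = n_perm + 1
    # Per-candidate p-value of every row, judged against that candidate's column.
    min_p = []
    for row in table:
        p_vals = [
            sum(1 for other in table if other[k] >= row[k]) / total
            for k in range(len(candidates))
        ]
        min_p.append(min(p_vals))
    # Fraction of rows whose minimum p-value is at least as extreme as the observed one.
    return sum(1 for m in min_p if m <= min_p[0]) / total
```

    For example, candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]] compares testing the first variant alone, the second alone, and both together, and the returned value is a single p-value for the region.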

    Downloads:

    R packages: Coming soon!

    Reference:

    Coming soon!

Logistic Kernel Machine Test

    Description:

    The logistic kernel machine test is used for testing the association of a SNP set with a dichotomous outcome. Here, we define a SNP set as multiple SNPs that have been grouped based on some criterion: proximity to a gene, pathway or functional group membership, or location within a window of the genome. The method is developed for SNP data but can, in principle, be applied to a wide range of genomic data types.
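    To make the notion of a SNP set concrete, the short Python sketch below forms sets in two of the ways listed above: by proximity to a gene (any SNP within a flanking distance of the gene's start or end) and by fixed, non-overlapping genomic windows. The 20 kb flank, the 50 kb window, the tuple layouts, and the function names are illustrative assumptions, not values used by the software. Under gene-based grouping, a SNP near two genes lands in both sets.

```python
from collections import defaultdict


def snp_sets_by_gene(snps, genes, flank=20_000):
    """Group SNPs by proximity to genes.

    snps:  list of (snp_id, chromosome, position)
    genes: list of (gene_name, chromosome, start, end)
    A SNP joins a gene's set if it lies within `flank` bp of the gene body.
    """
    sets = defaultdict(list)
    for snp_id, chrom, pos in snps:
        for gene, g_chrom, start, end in genes:
            if chrom == g_chrom and start - flank <= pos <= end + flank:
                sets[gene].append(snp_id)
    return dict(sets)


def snp_sets_by_window(snps, window=50_000):
    """Group SNPs into fixed, non-overlapping windows keyed by (chromosome, window index)."""
    sets = defaultdict(list)
    for snp_id, chrom, pos in snps:
        sets[(chrom, pos // window)].append(snp_id)
    return dict(sets)
```

    Each resulting list of SNP identifiers selects the genotype columns that enter one set-level test.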

    Note that the SKAT method (above) is built on the same framework, but is tailored towards rare variants and may be slightly conservative for common variants at larger alpha-levels.

    The software for conducting the logistic KMT has been superseded by the SKAT software (above), but modifications to the default SKAT parameters are necessary.

    Downloads:

    The previous software for the Logistic Kernel Machine Test has been superseded by the Sequence Kernel Association Test (SKAT) software (above). IMPORTANT: modifications to the default SKAT settings are needed since the defaults are aimed towards rare variants. (1) Please change the "kernel" parameter to "linear" or "IBS" since the weighted versions are primarily designed for rare variants. (2) One can set "method" equal to "liu" in order to more closely mimic the results of the original Logistic Kernel Machine Test.
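    For reference, the two unweighted kernels mentioned above measure genetic similarity between individuals i and j across the p SNPs in the set. The linear kernel is the inner product K(i, j) = sum_k g_ik * g_jk, and the IBS kernel averages the number of alleles shared identical by state, K(i, j) = sum_k (2 - |g_ik - g_jk|) / (2p). The Python sketch below builds both from a 0/1/2 genotype matrix and evaluates the quadratic form Q = (y - p_hat)' K (y - p_hat), where p_hat is the fitted case probability under the null model; with no covariates the null logistic model fits p_hat = ybar (the case proportion), which is what the sketch assumes. It is an illustration with hypothetical function names, not the SKAT code, and any constant scaling factor in the published statistic is omitted.

```python
def linear_kernel(genotypes):
    """K(i, j) = sum_k g_ik * g_jk."""
    return [[sum(a * b for a, b in zip(gi, gj)) for gj in genotypes] for gi in genotypes]


def ibs_kernel(genotypes):
    """K(i, j) = sum_k (2 - |g_ik - g_jk|) / (2p): average alleles shared identical by state."""
    p = len(genotypes[0])
    return [
        [sum(2 - abs(a - b) for a, b in zip(gi, gj)) / (2.0 * p) for gj in genotypes]
        for gi in genotypes
    ]


def kernel_score_q(kernel, y):
    """Q = r' K r with r = y - mean(y), the residual under an intercept-only null model."""
    y_bar = sum(y) / len(y)
    r = [v - y_bar for v in y]
    n = len(r)
    return sum(r[i] * kernel[i][j] * r[j] for i in range(n) for j in range(n))
```

    Individuals with identical genotypes get an IBS similarity of 1, while opposite homozygotes at every SNP get 0, which keeps the IBS kernel on a fixed scale regardless of how many SNPs are in the set.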

    Reference:

    Wu, M.C., Kraft, P., Epstein, M.P., Taylor, D.M., Chanock, S.J., Hunter, D.J., and Lin, X. (2010). "Powerful SNP set analysis for case-control genome wide association studies". The American Journal of Human Genetics, 86, 929-942. PDF

    Additional Resources:

    500 Simulated data sets based on Model 1: Download


Software for Analysis of DNA Methylation Data

Global Analysis of Methylation Profiles (GAMP)

    Description:

    This package is designed to conduct "global analysis" of DNA methylation data, particularly from the Illumina 450k Infinium platform. Instead of examining the effect of individual CpGs, the idea is to compare the overall profile or distribution of CpG measurements across individuals.

    Briefly, each individual's methylation profile is summarized by approximating the density of the methylation distribution OR the cumulative distribution function (CDF) of the methylation distribution using B-splines. The B-spline coefficients are used to represent each individual's overall methylation distribution. To test for association between the overall distribution and a continuous or dichotomous variable of interest, we apply the SKAT test (above) to the spline coefficients. A single p-value is generated.
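    The spline summary step can be sketched in plain Python. The hypothetical code below evaluates a clamped cubic B-spline basis with the Cox-de Boor recursion, computes one individual's empirical CDF on an evenly spaced grid over [0, 1], and fits the basis to that CDF by ordinary least squares; the fitted coefficients form the feature vector that would then be passed to the SKAT test. The interior knots (0.25, 0.5, 0.75), the 50-point grid, the use of the CDF rather than the density, and all function names are assumptions for illustration. This is a hand-rolled stand-in; the GAMP package itself depends on the fda R package.

```python
def bspline_basis(x, knots, degree):
    """Values of all B-spline basis functions at x (Cox-de Boor recursion)."""
    last = knots[-1]
    # Degree 0: indicator of the knot span containing x; x == last goes to the final span.
    basis = [
        1.0 if (knots[i] <= x < knots[i + 1]) or (x == last and knots[i] < knots[i + 1] == last) else 0.0
        for i in range(len(knots) - 1)
    ]
    for k in range(1, degree + 1):
        nxt = []
        for i in range(len(knots) - k - 1):
            value = 0.0
            left_den = knots[i + k] - knots[i]
            if left_den > 0:
                value += (x - knots[i]) / left_den * basis[i]
            right_den = knots[i + k + 1] - knots[i + 1]
            if right_den > 0:
                value += (knots[i + k + 1] - x) / right_den * basis[i + 1]
            nxt.append(value)
        basis = nxt
    return basis


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_spline(grid, targets, knots, degree=3):
    """Least-squares B-spline coefficients via the normal equations (B'B) c = B'f."""
    design = [bspline_basis(t, knots, degree) for t in grid]
    nb = len(design[0])
    btb = [[sum(row[i] * row[j] for row in design) for j in range(nb)] for i in range(nb)]
    btf = [sum(row[i] * f for row, f in zip(design, targets)) for i in range(nb)]
    return solve_linear(btb, btf)


KNOTS = [0.0] * 4 + [0.25, 0.5, 0.75] + [1.0] * 4  # clamped cubic knots -> 7 basis functions


def cdf_spline_coefficients(values, knots=KNOTS, n_grid=50):
    """Summarize one individual's methylation values (each in [0, 1]) by CDF spline coefficients."""
    if any(v < 0.0 or v > 1.0 for v in values):
        raise ValueError("methylation values must lie in [0, 1]")
    grid = [g / (n_grid - 1) for g in range(n_grid)]
    ecdf = [sum(1 for v in values if v <= t) / len(values) for t in grid]
    return fit_spline(grid, ecdf, knots)
```

    Stacking the coefficient vectors of all n individuals gives an n x 7 matrix that plays the role of the genotype matrix in the SKAT test.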

    Although the method is developed for DNA methylation data, it can be adapted to other types of data as well; however, the current software assumes that input values are between 0 and 1 (corresponding to percent methylation).

    This package depends on the fda and SKAT R packages.

    Downloads:

    R packages:
    Windows/Linux: Download
    Manual: Download

    Reference:

    Zhao, N., Maity, A., Staicu, A.-M., Joubert, B.R., London, S.J., Wu, M.C. (2013). "Global analysis of methylation profiles via a functional regression approach". Submitted.


Software for Analysis of Gene Expression Data

Sparse Linear Discriminant Analysis (sLDA)

    Description:

    We apply Sparse Linear Discriminant Analysis (sLDA) to test the significance of gene pathways when the signal is relatively weak. Also included is general code for running two-group L1-penalized linear discriminant analysis. The current software is working code only. Please contact us if you have any questions or concerns.
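    For intuition about two-group L1-penalized LDA, one well-known shortcut is that, with two classes, the LDA direction is proportional to the least-squares coefficients obtained by regressing a coded class label on the features; adding an L1 (lasso) penalty to that regression then gives a sparse direction in which most genes get a coefficient of exactly zero. The Python sketch below takes that route with coordinate descent. It is a swapped-in technique for illustration, not the sLDA algorithm of the reference or the provided working code; the label coding (+n/n1 and -n/n0), the penalty value, and the function names are assumptions.

```python
def soft_threshold(z, lam):
    """Lasso shrinkage operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def sparse_discriminant_direction(x, labels, lam=0.1, n_iter=200):
    """L1-penalized least squares on coded two-class labels (0/1), by coordinate descent.

    Minimizes (1/2n) * ||y - Xc beta||^2 + lam * ||beta||_1, with Xc the column-centered features.
    """
    n, p = len(x), len(x[0])
    n1 = sum(1 for lab in labels if lab == 1)
    n0 = n - n1
    # Coded response: +n/n1 for class 1 and -n/n0 for class 0, so it already has mean zero.
    y = [n / n1 if lab == 1 else -n / n0 for lab in labels]
    means = [sum(row[j] for row in x) / n for j in range(p)]
    xc = [[row[j] - means[j] for j in range(p)] for row in x]
    col_ss = [sum(row[j] ** 2 for row in xc) / n for j in range(p)]
    beta = [0.0] * p
    resid = y[:]  # residual y - Xc beta with beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # constant feature: nothing to fit
            rho = sum(row[j] * (r + row[j] * beta[j]) for row, r in zip(xc, resid)) / n
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= xc[i][j] * delta
                beta[j] = new
    return beta
```

    The genes with nonzero entries in the returned direction are the ones selected; larger values of lam select fewer genes.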

    Download:

    Working code (in R): Download

    Reference:

    Wu, M.C., Zhang, L., Wang, Z., Christiani, D.C., Lin, X. (2009). "Sparse linear discriminant analysis for simultaneous gene set/pathway significance test and gene selection". Bioinformatics, 25, 1145-1151. PDF