R/BPrimm: a set of tools for Bayesian and penalized regression in multiple loci mapping

 

Despite many recent methodological developments, variable selection in the high dimension, low sample size (HDLSS) setting, where the number of covariates (p) is larger than the sample size (n), remains a difficult problem, especially when covariates with zero coefficients are correlated with covariates with nonzero coefficients. One typical example is genome-wide multiple loci mapping with dense genetic markers, where the number of covariates (e.g., the genotype profiles of genetic markers) is often much larger than the sample size and nearby markers often share similar genotype profiles. In this paper, we propose two variable selection methods: the Bayesian adaptive Lasso and the iterative adaptive Lasso. A Bayesian Information Criterion plus backward filtering approach is also designed for the iterative adaptive Lasso to control the false positive rate. These two methods extend the adaptive Lasso to the HDLSS setting. We evaluate these two methods, as well as several existing methods, in the application of genome-wide multiple loci mapping in experimental crosses. Both large-scale simulation and real data analysis show that the proposed methods have improved variable selection performance. The iterative adaptive Lasso is also computationally much more efficient than the commonly used marginal regression and step-wise regression methods.

 

The manuscript can be downloaded here

 

Genome-wide Multiple Loci Mapping in Experimental Crosses by the Iterative Adaptive Penalized Regression

 

By Wei Sun, Joseph G. Ibrahim and Fei Zou

 

Download and Installation

 

The source can be downloaded here.

 

Since the R package contains C code, a C compiler is required for installation.

   

    Windows: Rtools needs to be installed in order to compile C code in an R package. Please follow the Readme file in Rtools to set up the PATH variable. Most often, R needs to be installed in c:/R.

    Mac OS X: The C compiler can be installed as part of the Xcode developer tools.

    Linux: The gcc compiler needs to be installed.

 

With both R and an appropriate C compiler installed, this R package can be installed as follows:

(1) Open a terminal on Mac or Linux, or a command prompt on Windows (Start menu -> Run -> cmd).

(2) Change to the directory where the source code is located. Untar the package using the command: tar -xzf BPrimm.tar.gz

(3) Install the package using the following command:

 

R CMD INSTALL BPrimm

 

Using BPrimm for multiple loci mapping

 

First, read both the trait data and the genetic data into R. The trait, denoted by y, should be a vector of length n, and the genetic data, denoted by X, should be a matrix of size n x p.
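With real data, y and X would be read from files; for a quick check that the input format is right, a small simulated data set can be built in base R. The 0/1 genotype coding (as in a backcross) and the three "true" loci below are assumptions for illustration only.

```r
# Simulate data in the format BPrimm expects: a numeric trait vector y of
# length n and an n x p genotype matrix X (one column per marker).
set.seed(1)
n <- 200
p <- 500
X <- matrix(rbinom(n * p, size = 1, prob = 0.5), nrow = n, ncol = p)

# Three markers with nonzero effects; all others have zero coefficients.
beta <- rep(0, p)
beta[c(50, 250, 400)] <- c(1, -1, 0.8)

y <- as.numeric(X %*% beta + rnorm(n))
```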

 

1. Marginal regression

 

For marginal regression, we need to specify the chromosome of each marker (corresponding to each column of the matrix X), so that the significance of the strongest association within each chromosome can be evaluated by a permutation p-value.
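The chrs argument is simply a vector with one chromosome label per column of X. As a sketch, the layout below assumes a hypothetical map of 500 markers spread evenly over 5 chromosomes; with real data these labels come from the genetic map.

```r
# One chromosome label per marker (i.e., per column of X).
p <- 500
chrs <- rep(1:5, each = p / 5)  # chromosomes 1-5, 100 markers each
```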

 

library(BPrimm)

m1 = marginal(y, X, chrs, nper=10000)

 

2.  Forward regression

 

In forward regression, in addition to the number of permutations, a permutation p-value cutoff needs to be specified.

 

f1 = forward(y, X, nper=10000, p.cut=0.05)

 

3. Bayesian Lasso

 

Two parameters of the Bayesian Lasso, namely r and s, need to be specified. As suggested by Yi and Xu (2008), r and s should take small positive values such as 0.01. In all the Bayesian methods, n.burn indicates the number of burn-in MCMC iterations, n.iter is the number of samples to keep, and n.thin = k means we keep one sample every k iterations.

 

BL = Bayes.Lasso(y, X, n.burn=10000, n.thin=10, n.iter=1000, r=0.01, s=0.01)

 

4. Bayesian Adaptive Lasso and Bayesian t

 

In both the Bayesian Adaptive Lasso and the Bayesian t, the hyper-parameters delta and tau can either be specified or set to NULL, in which case the hyper-prior p(delta, tau) = 1/tau is used.

 

BAL = Bayes.AL(y, X, n.burn=10000, n.thin=10, n.iter=1000, delta=NULL, tau=NULL)

Bt  = Bayes.t(y, X, n.burn=10000, n.thin=10, n.iter=1000, delta=NULL, tau=NULL)

 

 

5. Iterative Adaptive Lasso

 

In the following code, we first run IAL with the BIC criterion to select a set of hyper-parameters delta and tau, together with a subset model. Then we use backward filtering to filter out the coefficients with insignificant effects. Here the significance cutoff is chosen as 0.05/nE, where nE is the number of effective tests.

 

I1 = IAL(y, X, criterion="BIC")

I2 = backward(y, X, I1, pcut=0.05/nE)
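The text above leaves the computation of nE open. One common heuristic for correlated markers is the eigenvalue-based estimate of Li and Ji (2005); the sketch below (with a made-up function name, effective.tests, and simulated genotypes) shows that approach, which is not necessarily the definition used in the BPrimm manuscript.

```r
# Estimate the number of effective tests nE from the marker matrix X using
# the Li & Ji (2005) heuristic: for each eigenvalue lambda_i of the marker
# correlation matrix, add I(lambda_i >= 1) + (lambda_i - floor(lambda_i)).
effective.tests <- function(X) {
  ev <- abs(eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values)
  sum((ev >= 1) + (ev - floor(ev)))
}

set.seed(1)
X  <- matrix(rbinom(100 * 50, size = 1, prob = 0.5), nrow = 100)  # 100 samples, 50 markers
nE <- effective.tests(X)   # near 50 when markers are roughly independent
pcut <- 0.05 / nE          # Bonferroni-style cutoff for the backward step
```

For strongly correlated neighboring markers, nE is smaller than the raw marker count, so the cutoff 0.05/nE is less conservative than a full Bonferroni correction.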