SPREG :: Regression Analysis of Secondary Phenotype Data in Case-Control Association Studies

SPREG is a computer program for performing regression analysis of secondary phenotype data in case-control association studies. Secondary phenotypes are quantitative or qualitative traits other than the case-control status. Because the case-control sample is not a random sample of the general population, standard statistical analysis of secondary phenotype data can yield very misleading results. SPREG implements valid and efficient statistical methods, as described in Lin and Zeng (submitted for publication, 2008).

General information

The program is written in FORTRAN-77; the source file is spreg.f. The current release performs linear regression analysis of quantitative traits under the additive mode of inheritance without environmental factors. Other capabilities will be added soon. Please check back frequently for updates.

Input

The program requires two separate groups of input: the data input and the control parameters input. The data input file contains text data in a tabular (row-column) format, with one column per study subject. The first row gives the disease status (1=case; 0=control) of each subject. The second row gives the trait value of each subject. Each remaining row gives the genotype scores (i.e., numbers of minor alleles) of the subjects for one SNP. Missing trait values are indicated by “-999”, and missing genotype scores are indicated by any number greater than 3. The control parameters described below are given in the file spreg.dat.
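As an illustration of the layout described above, the following Python sketch builds the text of a data input file from in-memory lists. It is not part of SPREG. The function name, the use of single spaces as the column separator, and the choice of 9 as the missing-genotype code are assumptions made for this example; the description above only requires a number greater than 3 for missing genotypes and does not state the delimiter.

```python
MISSING_TRAIT = -999      # missing-trait code given in the description above
MISSING_GENOTYPE = 9      # assumption: any number greater than 3 would do


def format_spreg_data(status, traits, genotypes):
    """Return data-file text with one column per subject.

    status    -- list of 1 (case) or 0 (control), one entry per subject
    traits    -- list of trait values, None where missing
    genotypes -- list of rows, one per SNP; each row holds the
                 minor-allele count (0, 1, 2) per subject, None where missing
    """
    n = len(status)
    if len(traits) != n or any(len(row) != n for row in genotypes):
        raise ValueError("every row must have one entry per subject")

    rows = [[str(s) for s in status]]
    rows.append([str(MISSING_TRAIT) if t is None else str(t) for t in traits])
    for snp in genotypes:
        rows.append([str(MISSING_GENOTYPE) if g is None else str(g) for g in snp])

    # Assumption: columns are separated by single spaces.
    return "\n".join(" ".join(row) for row in rows) + "\n"


if __name__ == "__main__":
    print(format_spreg_data([1, 0, 1], [2.5, None, 0.1], [[0, 1, 2], [None, 2, 0]]), end="")
```

The returned text can be written to whatever file name is given as the data input in spreg.dat.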

Parameter                       Type
file name of data input         character
file name of program output     character
total number of subjects        integer
total number of SNPs            integer
disease rate                    real

If the disease is rare, enter any number less than 0.01 for the disease rate.
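The two integer parameters must agree with the data input file: the number of subjects is the number of columns, and the number of SNPs is the number of rows after the status and trait rows. The sketch below shows one way to derive both values before filling in spreg.dat. It is a hypothetical helper, not part of SPREG, and it assumes the data file is whitespace-delimited.

```python
def count_subjects_and_snps(data_text):
    """Return (number of subjects, number of SNPs) for data-file text.

    Assumes whitespace-delimited columns: row 1 is disease status,
    row 2 is the trait, and every later row is one SNP.
    """
    rows = [line.split() for line in data_text.splitlines() if line.strip()]
    if len(rows) < 3:
        raise ValueError("expected a status row, a trait row, and at least one SNP row")

    n_subjects = len(rows[0])
    for i, row in enumerate(rows, start=1):
        if len(row) != n_subjects:
            raise ValueError(f"row {i} has {len(row)} columns, expected {n_subjects}")

    # The first two rows are status and trait; the rest are SNPs.
    return n_subjects, len(rows) - 2


if __name__ == "__main__":
    sample = "1 0 1\n2.5 -999 0.1\n0 1 2\n9 2 0\n"
    print(count_subjects_and_snps(sample))
```

A mismatch between these counts and the values in spreg.dat is a likely source of read errors, so checking them first is inexpensive.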

Output

Computational results are written to the output file specified by the user. For each SNP, the output shows the maximum likelihood estimate of the genetic effect (i.e., slope parameter in the linear model), its standard error, the standard-normal test statistic and the (two-sided) p-value.
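To show how the reported quantities relate to one another, the sketch below recomputes a test statistic and a two-sided p-value from an estimate and its standard error. It assumes the standard-normal test statistic is the estimate divided by its standard error (a Wald statistic), which is the usual construction but is not spelled out above. The function name is hypothetical and the code is not taken from spreg.f.

```python
import math


def wald_test(estimate, std_error):
    """Return (z, two-sided p-value) under a standard normal reference.

    Assumption: z = estimate / std_error.
    """
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    z = estimate / std_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    z, p = wald_test(0.30, 0.12)   # made-up numbers for illustration only
    print(f"z = {z:.3f}, two-sided p = {p:.4g}")
```

Using the complementary error function avoids the loss of precision that comes from computing 1 minus a cumulative probability when the p-value is very small, as is common in association studies.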

Example

The file test.dat contains a simulated data set with 100 SNPs. The control parameters are given in spreg.dat and the output file is test.out.

Download

» spreg.zip [updated 22 April 2008]

Reference

Lin, D. Y. and Zeng, D. (2008). Proper analysis of secondary phenotype data in case-control association studies. Submitted for publication.


