SPREG :: Regression Analysis of Secondary Phenotype Data in Case-Control Association Studies
SPREG is a computer program for performing regression analysis of secondary phenotype data in case-control association studies. Secondary phenotypes are quantitative or qualitative traits other than the case-control status. Because the case-control sample is not a random sample of the general population, standard statistical analysis of secondary phenotype data can yield very misleading results. SPREG implements valid and efficient statistical methods, as described in Lin and Zeng (submitted for publication, 2008).
General information
The program is written in FORTRAN-77 with the source file spreg.f. The current release performs linear regression analysis of quantitative traits under the additive mode of inheritance without environmental factors. Other capabilities will be added soon. Please check back frequently for updates.
Input
The program requires two inputs: a data file and a control parameter file. The data file is a text file in tabular (row-column) format in which each column corresponds to one study subject. The first row gives the disease status (1 = case; 0 = control), the second row gives the trait values, and each remaining row gives the genotype scores (i.e., numbers of minor alleles) for one SNP. Missing trait values are coded as “-999”, and missing genotype scores are coded as any number greater than 3. The control parameters are read from the file spreg.dat and are listed below.
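The data layout can be checked before running the program with a short script. The following is a minimal Python sketch, not part of SPREG; it assumes the values in each row are separated by whitespace (the delimiter is not stated above), and the function name `parse_data` and the sample values are hypothetical.

```python
MISSING_TRAIT = -999.0  # missing-trait code stated in the text


def parse_data(text):
    """Parse the row-oriented layout: row 1 status, row 2 trait, rows 3+ one SNP each.

    Assumes whitespace-separated values (an assumption, not documented).
    Missing values are returned as None.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) < 3:
        raise ValueError("expected a status row, a trait row, and at least one SNP row")
    n_subjects = len(rows[0])
    if any(len(r) != n_subjects for r in rows):
        raise ValueError("every row must have one value per subject")

    status = [int(v) for v in rows[0]]  # 1 = case, 0 = control
    trait = [None if float(v) == MISSING_TRAIT else float(v) for v in rows[1]]
    # Any genotype score greater than 3 is treated as missing.
    snps = [[None if float(v) > 3 else int(float(v)) for v in r] for r in rows[2:]]
    return status, trait, snps


# Hypothetical data: 4 subjects, 2 SNPs, one missing trait, one missing genotype.
sample = """\
1 1 0 0
2.5 -999 1.1 0.7
0 1 2 9
1 1 0 0
"""
status, trait, snps = parse_data(sample)
print(status)
print(trait)
print(snps)
```

The number of columns gives the total number of subjects and the number of rows after the second gives the total number of SNPs, which are two of the control parameters.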
| Parameter | Type |
|---|---|
| file name of data input | character |
| file name of program output | character |
| total number of subjects | integer |
| total number of SNPs | integer |
| disease rate | real |
If the disease is rare, enter any number less than 0.01 for the disease rate.
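As an illustration, the five control parameters could be generated with a small script. This Python sketch assumes that spreg.dat holds one parameter per line in the order of the table; the exact layout read by spreg.f is not stated here, so compare against the spreg.dat shipped in the download. The function name `render_control`, the subject count, and the disease rate are hypothetical.

```python
def render_control(data_file, out_file, n_subjects, n_snps, disease_rate):
    """Return control-file text, one parameter per line (assumed layout)."""
    if n_subjects <= 0 or n_snps <= 0:
        raise ValueError("subject and SNP counts must be positive")
    if not 0.0 < disease_rate < 1.0:
        raise ValueError("disease rate must lie strictly between 0 and 1")
    fields = [data_file, out_file, str(n_subjects), str(n_snps), repr(disease_rate)]
    return "\n".join(fields) + "\n"


# Hypothetical values: a rare disease, so any rate below 0.01 is entered.
text = render_control("test.dat", "test.out", 2000, 100, 0.005)
print(text, end="")
```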
Output
Computational results are written to the output file specified by the user. For each SNP, the output shows the maximum likelihood estimate of the genetic effect (i.e., the slope parameter in the linear model), its standard error, the standard-normal test statistic, and the two-sided p-value.
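The relationship among the reported quantities can be sketched in Python. This assumes the test statistic is the estimate divided by its standard error (a Wald-type statistic, which is not stated explicitly above) and uses the standard-normal distribution for the two-sided p-value. The function name `wald_test` and the numbers are hypothetical and are not SPREG output.

```python
import math


def wald_test(estimate, std_err):
    """Return (z, two-sided p-value) under a standard-normal reference distribution."""
    if std_err <= 0:
        raise ValueError("standard error must be positive")
    z = estimate / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical estimate and standard error for one SNP.
z, p = wald_test(0.30, 0.10)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A check of this kind is useful for confirming that a parsed output file has its columns in the expected order.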
Example
The file test.dat contains a simulated data set with 100 SNPs. The control parameters are given in spreg.dat and the output file is test.out.
Download
spreg.zip [updated 22 April 2008]
Reference
Lin, D. Y. and Zeng, D. (2008). Proper analysis of secondary phenotype data in case-control association studies. Submitted for publication.