lm()
, anova()
) in R Matrix eQTL calculations are based on the linear regression model. The following R code samples confirm that for various scenarios and randomly generated data the estimates, statistics, and p-values produced by Matrix eQTL match those produced by the existing linear regression function lm
in R. All five test code files are also available via R command:
• demo( package="MatrixEQTL" )
The artificial dataset used in performance testing is generated to have a strongly correlated first gene-SNP pair. Both R and Matlab version of Matrix eQTL produce for following results for it:
no covariates | 10 covariates | |||
Additive model | t = 23.816 | pv = 3.768e-096 | t = 23.927 | pv = 1.386e-096 |
ANOVA model | F = 284.479 | pv = 5.423e-095 | F = 287.112 | pv = 2.031e-095 |
We ran the analysis in Plink with and without covariates. The output file for the first gene (Assoc.txt.P1.qassoc
) shows results matching those above.
CHR | SNP | BP | NMISS | BETA | SE | R2 | T | P |
1 | Snp_00001 | 1 | 840 | 1.006 | 0.04224 | 0.4036 | 23.82 | 3.768e-096 |
The output for the model with covariates (Linear.txt.P1.assoc.linear
) also agrees with the output of Matrix eQTL.
CHR | SNP | BP | A1 | TEST | NMISS | BETA | STAT | P |
1 | Snp_00001 | 1 | C | ADD | 840 | 1.006 | 23.93 | 1.386e-096 |
The analysis with no covariates was performed using the following command in R:
snpMatrix::snp.rhs.estimates( d ~ 1, family = "Gaussian", snp.data = snps1 )
The output for the first SNP agrees with the previous results (although the statistic is called z-value, not t-statistic).
Model | Y-variable | Parameter | Estimate | S.E. | z-value |
Snp_00001 | d | Snp_00001 | 1.006 | 0.042242 | 23.816 |
The analysis with 10 covariates was performed using a similar command:
snpMatrix::snp.rhs.estimates( d ~ cvrt, family = "Gaussian", snp.data = snps1 )
Again, the results agree with those presented before.
Model | Y-variable | Parameter | Estimate | S.E. | z-value |
Snp_00001 | d | Snp_00001 | 1.0063 | 0.042057 | 23.927 |
Although FastMap does not support covariates, it can estimate both additive and ANOVA models.
It output files contain -log10(p-value)
.
For the additive linear model, the p-value is 10 -95.423908 = 3.76784E-96.
For the ANOVA model, the p-value is 10 -94.265731 = 5.42337E-95.
R/qtl and Merlin use other models so I can not directly match their output.