
Effects estimation

HAPSTAT estimates the effects of haplotypes, environmental covariates, and haplotype-environment interactions through regression modeling. For quantitative traits, it uses the linear regression model. For binary traits, it uses the logistic regression model, and the regression parameters are log odds ratios. For age-at-onset data, it uses the Cox proportional hazards model, and the regression parameters are log hazard ratios.

The mode of inheritance can be additive, dominant, recessive or codominant. Under the additive model, two copies of a causal haplotype have twice the effect on the trait of a single copy. Under the dominant model, one copy and two copies have the same effect. Under the recessive model, the haplotype affects the trait only when two copies are present. Under the codominant model, the effect of two copies can differ arbitrarily from that of a single copy. In HAPSTAT, the codominant effects are decomposed into additive and recessive components.
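The modes of inheritance described above can be read as different codings of the number of copies (0, 1, or 2) of a haplotype. The sketch below is illustrative only: the function name `inheritance_covariates` is hypothetical and not part of HAPSTAT, and the codominant coding (copy count plus a two-copy indicator) is an assumption about how the additive and recessive components combine.

```python
def inheritance_covariates(copies, mode):
    """Map the number of copies of a haplotype to regression covariate(s).

    Hypothetical helper for illustration; not a HAPSTAT function.
    """
    if copies not in (0, 1, 2):
        raise ValueError("copies must be 0, 1, or 2")
    two_copies = 1.0 if copies == 2 else 0.0
    if mode == "additive":
        # Two copies count twice as much as one copy.
        return (float(copies),)
    if mode == "dominant":
        # One or two copies have the same effect.
        return (1.0 if copies >= 1 else 0.0,)
    if mode == "recessive":
        # Only two copies have an effect.
        return (two_copies,)
    if mode == "codominant":
        # Assumed decomposition: additive component plus recessive component.
        return (float(copies), two_copies)
    raise ValueError("unknown mode of inheritance: " + mode)


for mode in ("additive", "dominant", "recessive", "codominant"):
    print(mode, [inheritance_covariates(c, mode) for c in (0, 1, 2)])
```

Under this coding, a codominant model with coefficients b_add and b_rec gives an effect of b_add for one copy and 2·b_add + b_rec for two copies, so the two-copy effect is not tied to the one-copy effect.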

Navigation

Estimate haplotype effects by selecting the tab in the left panel labeled Additive effects. The available options are displayed in the right panel. The additive genetic model is the default; changing the model in the options panel changes the label of the selected tab accordingly. After you click Calculate, the results are displayed on the left; see Figure 3.1.

Convergence criteria

HAPSTAT uses the EM and Newton-Raphson algorithms to estimate haplotype effects. The convergence criteria are the same as those used for the estimation of haplotype frequencies, described in the previous section. The maximization is taken over all parameters in the likelihood. The default tolerance is 10⁻⁴ and the default maximum number of iterations is 500.
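To show how a tolerance and an iteration limit interact, here is a minimal Newton-Raphson sketch for a one-parameter likelihood. It is not HAPSTAT's implementation: the stopping rule (absolute step size below the tolerance) is an assumption, since the actual criteria are defined in the previous section, and the function names and toy binomial example are hypothetical.

```python
import math


def newton_raphson(score, information, start, tol=1e-4, max_iter=500):
    """Maximize a one-parameter log-likelihood by Newton-Raphson.

    Stops when the absolute step is below tol (assumed criterion)
    or after max_iter iterations.
    Returns (estimate, iterations used, converged flag).
    """
    theta = start
    for iteration in range(1, max_iter + 1):
        step = score(theta) / information(theta)
        theta += step
        if abs(step) < tol:
            return theta, iteration, True
    return theta, max_iter, False


# Toy example: log odds for 30 events among 100 subjects.
events, total = 30, 100


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def score(theta):
    # First derivative of the binomial log-likelihood in theta.
    return events - total * expit(theta)


def information(theta):
    # Negative second derivative of the binomial log-likelihood.
    p = expit(theta)
    return total * p * (1.0 - p)


estimate, iterations, converged = newton_raphson(score, information, start=0.0)
print(estimate, iterations, converged)
```

If the iteration limit is reached first, the returned flag is False, which is the situation in which a looser tolerance or a larger iteration limit would be worth trying.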

Assumptions

Select the additive, recessive, dominant or codominant mode of inheritance from the left dropdown. Use the right dropdown to estimate haplotype effects under Hardy-Weinberg equilibrium (default) or disequilibrium. For Hardy-Weinberg disequilibrium, HAPSTAT will return an estimate for the inbreeding coefficient (ρ).
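The role of the inbreeding coefficient can be illustrated with a common one-parameter model of Hardy-Weinberg disequilibrium, in which ρ shifts probability toward homozygous haplotype pairs. This section does not state the formula HAPSTAT uses, so the parameterization below is an assumption, and the function name and frequencies are hypothetical.

```python
def diplotype_probability(h, k, freqs, rho=0.0):
    """Probability of the ordered haplotype pair (h, k).

    Assumed model: (1 - rho) * pi_h * pi_k, plus rho * pi_h when h == k.
    With rho = 0 this reduces to Hardy-Weinberg equilibrium.
    """
    prob = (1.0 - rho) * freqs[h] * freqs[k]
    if h == k:
        prob += rho * freqs[h]
    return prob


# Hypothetical haplotype frequencies, for illustration only.
freqs = {"10011": 0.5, "01011": 0.3, "01100": 0.2}

for rho in (0.0, 0.1):
    total = sum(
        diplotype_probability(h, k, freqs, rho) for h in freqs for k in freqs
    )
    homozygous = sum(diplotype_probability(h, h, freqs, rho) for h in freqs)
    print(rho, round(total, 10), round(homozygous, 4))
```

Under this model the pair probabilities still sum to one for any ρ, while a positive ρ raises the proportion of homozygous pairs above the equilibrium value.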

Effects

The box labeled Effects is a static display of the main effects and interactions selected for estimation. By default, HAPSTAT selects the haplotype with the highest frequency in the default sample and all covariates, as well as the interactions between them. The selected haplotypes are compared to a reference group, which includes all unselected haplotypes.

Select effects

To change the default selection, click the icon on the toolbar to activate the Select effects dialog, shown in Figure 3.2. The panel labeled Effects shows the current selection.

Haplotypes whose frequencies are no greater than the value specified by Threshold are excluded from the calculation. The default threshold is given by

max(2/n, 0.001),

where n is the total sample size. For case-control and cohort studies, frequencies are determined by the sample chosen from the adjacent dropdown. For case-control studies, the control sample is chosen by default; for cohort studies, the subcohort is the default. The default values of the Select effects dialog when using external data are discussed in the External data section under Examples.
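The default threshold rule can be checked with a few lines of code. The sketch below assumes nothing beyond the formula above; the function names and the example frequencies are hypothetical.

```python
def default_threshold(n):
    """Default frequency threshold for a total sample size n."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return max(2.0 / n, 0.001)


def haplotypes_above_threshold(freqs, threshold):
    """Keep haplotypes whose frequency is strictly greater than the threshold."""
    return {h: f for h, f in freqs.items() if f > threshold}


# Hypothetical frequencies, for illustration only.
freqs = {"10011": 0.60, "01011": 0.25, "01100": 0.147, "11011": 0.003}

for n in (500, 5000):
    threshold = default_threshold(n)
    print(n, threshold, sorted(haplotypes_above_threshold(freqs, threshold)))
```

For samples of up to 2000 subjects the 2/n term determines the threshold; for larger samples the floor of 0.001 applies.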

The haplotypes above the threshold along with their frequencies are listed in the Gene panel. The Reference dropdown lists haplotypes whose frequencies are below the threshold along with those haplotypes that are above the threshold but are not selected for effects estimation. Covariates are listed in the Environment panel.

To add a main effect, click on the desired variable in the Gene or Environment list followed by the Add button. The selected variable now appears in the Effects panel under the heading Gene or Environment, respectively. To add an interaction, select the appropriate variables from the Gene and Environment lists and click the Add button. You can select multiple variables from the Environment list by using the Shift/Ctrl key. To remove a specific effect from the selection, click on that effect on the Effects panel followed by the Remove button. Clicking on a heading on the Effects panel will remove all associated effects.

In Figure 3.3, we remove the haplotype effect 10011 and the interactions 10011×Age and 10011×Gender by clicking on the Gene heading followed by the Remove button. Next, we add haplotype effects 01011, 01100 and 11011 and the interactions 01011×Age, 01100×Age, 11011×Age, 01011×Gender, 01100×Gender, 11011×Gender, and 01011×Age×Gender. Figure 3.4 illustrates the addition of interaction 01011×Age×Gender. The results are shown in Figure 3.5.
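To make the meaning of main effects, interactions, and the reference group concrete, the sketch below builds the covariate values for one subject from a list of selected terms. It is a simplification under stated assumptions: the haplotype pair is treated as known (HAPSTAT works with unphased genotype data), additive coding is used, and the function name, covariate values, and coding of Gender are hypothetical.

```python
def design_row(diplotype, covariates, terms):
    """Covariate values for one subject.

    diplotype:  tuple of the subject's two haplotypes (assumed known here)
    covariates: dict of environmental covariate values
    terms:      list of tuples; one name is a main effect, several names
                form an interaction (product of the individual values)
    """
    row = {}
    for term in terms:
        value = 1.0
        for name in term:
            if name in covariates:
                value *= covariates[name]
            else:
                # Additive coding: number of copies of the haplotype.
                value *= diplotype.count(name)
        row["x".join(term)] = value
    return row


terms = [
    ("01011",), ("01100",), ("11011",), ("Age",), ("Gender",),
    ("01011", "Age"), ("01011", "Gender"), ("01011", "Age", "Gender"),
]

# One copy of 01011 and one copy of an unselected (reference) haplotype.
subject = design_row(("01011", "10011"), {"Age": 50.0, "Gender": 1.0}, terms)
for name, value in subject.items():
    print(name, value)
```

Haplotype 10011 is not among the selected terms, so it contributes to no column and falls into the reference group with all other unselected haplotypes.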

For longitudinal studies, the user can specify both fixed and random effects through the Select effects dialog. See the Longitudinal data section under Examples for further detail.

Multiple genes

Consider the multiple gene selection illustrated in Figure 1.3. Click Continue and then click the icon to activate the Select effects dialog. Frequencies are estimated over all genes, and haplotypes whose frequencies in the joint distribution are no greater than the Threshold are excluded from computation. For each gene, haplotypes and their frequencies from the marginal distribution are listed in the corresponding Gene panel. In the Select effects dialog, select haplotype 100 from Gene A and haplotype 11 from Gene B to add the gene-gene interaction 100×11 to the Effects selection. Clicking Calculate gives the result in Figure 3.6.

SNP Analysis

HAPSTAT can be used to analyze the effects of individual SNPs by treating each SNP as a separate gene. By using the linkage disequilibrium information of multiple SNPs to infer the missing SNP values, HAPSTAT provides efficient estimation of SNP effects in the presence of missing data. Figure 3.7 and Figure 3.8 show the estimation results from two models, one including all five SNPs and one including only SNP4. In Figure 3.7, the model includes the main effects of SNP1-SNP5 and their interactions with Age and Gender, as well as the interactions between SNP1 and SNP2-SNP5.

Summary

In the left panel, HAPSTAT displays the estimates of regression parameters and their standard errors, together with the Wald statistics and two-sided p-values. The lower panel displays the log-likelihood value(s). You can calculate the likelihood ratio statistic to test a set of parameters by fitting the models with and without the set of parameters of interest.
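The likelihood ratio calculation mentioned above is done by hand from the two reported log-likelihood values. The sketch below shows one way to do it, alongside a Wald test for a single parameter. It assumes the Wald statistic is the estimate divided by its standard error and that the likelihood ratio statistic is referred to a chi-square distribution with degrees of freedom equal to the number of parameters tested; the function names and numeric inputs are hypothetical.

```python
import math


def wald_test(estimate, std_error):
    """Wald z statistic and two-sided normal p-value."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


def chi_square_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence for the regularized upper incomplete gamma function.
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_full, loglik_reduced, df):
    """Likelihood ratio statistic and p-value; df = number of parameters tested."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi_square_sf(statistic, df)


# Hypothetical values read from two fitted models.
print(wald_test(0.5, 0.25))
print(likelihood_ratio_test(-100.0, -103.0, df=2))
```

Here the full model is the one that includes the parameters of interest and the reduced model is the one fitted without them.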

Precision

You may change the decimal precision of the displayed results via the menu option Settings»Precision or the icon on the toolbar. To change the decimal precision for an individual column, right-click on the column header and select Precision from the drop-down menu. In the Precision dialog box, enter the number of digits to follow the decimal point for fixed notation (default) or the maximum number of significant digits for scientific notation. The default precision is 4.

Saving

Select the menu option File»Save to save the effects estimates. To save both frequency and effect estimates, select the menu option File»Save All or click the icon on the toolbar. The results for the case-control data using the options shown in Figures 3.1-3.8 are given in case-control.out.
