Documentation of EG13PC  (v0.5-pc)
                                          last update: BQ 7/10/1991


Purpose: fitting multivariate binary regression models that allow
-------
  more than one class in each cluster, with a different
  regression for each class and for the dependence within and
  between classes. The user can choose between GEE1 and GEE2.

References:
-----------
Liang, Zeger and Qaqish (1989). "Multivariate Regression Models using
  Generalized Estimating Equations" Technical Report, Department
  of Biostatistics, The Johns Hopkins University, School of Hygiene and
  Public Health. 
  Later published in 1992, JRSS-B (with discussion).

Qaqish and Liang (1990). "Marginal Models for Correlated Binary Data with
  Multiple Classes and more than one Level of Nesting"
  Technical Report, Department of Biostatistics, The Johns Hopkins
  University, School of Hygiene and Public Health.
  Later published in 1992, Biometrics.


Version: 0.5-pc Beta test version.
-------

Environment: IBM-PC running DOS
-----------
  The program does not require a math processor. However, if one is
  installed in the system, it will be used.
  The program was compiled with Turbo Pascal 5.5
  ((C) Borland International)

Necessary files: To run the program, the following files are needed:
---------------
  EG13PC.EXE    : the executable code
  X.DAT         : a data file
  X.CTL         : a control file

Output: X.LST  : the output
------

To run the program: From the DOS command line issue:
------------------
       EG13PC   X.CTL   X.LST

  Notice that the control file X.CTL tells the program the name of the data
file. This allows several control files to be used with the same data file.

Data file format: Free format with one record per observation.
----------------
  The variables are: Cluster id
                     The class number
                     The response variable (y = 0/1)
                     The regressor(s)
Records in the data and control files must be at most 255 characters long.
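
A minimal Python sketch of reading such a file, for checking the data
before running EG13PC. It assumes whitespace-separated fields and that
the number of regressors equals the count given in the fifth control
record; the function name read_data is hypothetical and not part of the
program.

```python
from collections import defaultdict


def read_data(lines, n_vars):
    """Group free-format data records by cluster id.

    Each record is assumed to hold, separated by whitespace:
    cluster id, class number, response (0/1), then n_vars regressors.
    """
    clusters = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are simply skipped
        if len(fields) != 3 + n_vars:
            raise ValueError(
                f"record {lineno}: expected {3 + n_vars} fields, got {len(fields)}")
        cluster_id, klass, y = fields[0], int(fields[1]), int(fields[2])
        if y not in (0, 1):
            raise ValueError(f"record {lineno}: response must be 0 or 1, got {y}")
        x = [float(v) for v in fields[3:]]
        clusters[cluster_id].append((klass, y, x))
    return dict(clusters)


# Two hypothetical members of cluster 101, one in each class, 6 regressors.
records = [
    "101 1 0  1 1 0 -5 1 0",
    "101 2 1  1 1 0 -5 1 0",
]
print(read_data(records, n_vars=6))
```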


Control file format:
-------------------
  The control file can have RECFM F or V (fixed- or variable-length
  records); the maximum LRECL (record length) is 80.


The first  and second records are titles that will be printed on the
  output file.

The third  record is the data file name.

The fourth record contains an integer, the number of classes. The
  maximum allowed is 12.

The next record contains an integer, the number of variables that follow
  the response in the data file. Not all of these variables need to be
  used in the regressions. The maximum is 64.

The next record contains two integers, i1 and i2:
 i1 = total number of parameters.
 i2 = number of parameters for main effects.
 Naturally, i1 must be greater than or equal to i2 (not checked).
 The parameters must be arranged so that the odds-ratio parameters
 are the last in the parameter vector, i.e. parameters i2+1, ..., i1
 (not checked).

The next record contains a real number, the convergence criterion.
  Iteration stops when the sum of the absolute changes in all
  parameters between two iterations is less than that number or
  the maximum number of iterations is reached, whichever occurs
  first.
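
The stopping rule can be sketched in Python as follows. The update
argument is a hypothetical stand-in for one GEE iteration; only the
convergence test and the iteration cap follow the description above.

```python
def converged(old, new, criterion):
    """True when the sum of absolute parameter changes is below criterion."""
    return sum(abs(b - a) for a, b in zip(old, new)) < criterion


def iterate(update, beta0, criterion, max_iter):
    """Repeat update() until convergence or max_iter, whichever comes first.

    Returns (estimates, iterations_used, converged_flag).
    """
    beta = list(beta0)
    for it in range(1, max_iter + 1):
        new = update(beta)
        if converged(beta, new, criterion):
            return new, it, True
        beta = new
    return beta, max_iter, False


# Toy update that halves each parameter: changes are 1/2, 1/4, 1/8, ...
estimates, n_iter, ok = iterate(lambda b: [v / 2 for v in b], [1.0], 0.001, 50)
print(estimates, n_iter, ok)
```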

The next record contains an integer, the maximum number of iterations.

The next record contains an integer i1, say. If i1 = 1, the
  current estimates of the parameters will be printed at each iteration.

The next record contains an integer, i.
  i = 1 : GEE1 (has a bug, do not use)
  i = 2 : GEE2

(7/10/1991:
The PC Pascal version has a bug in the GEE1 implementation, so this
option should not be used. The bug will be fixed in the upcoming C version.)

The next record contains an integer i1, say. If i1 = 1, the
 Zhao and Prentice formulae for the third- and fourth-order moments will
 be used. If i1 = 2, the exact solution will be used for these moments.

The next record is ignored.

The next records, one for each parameter, specify labels for the
  parameters. These labels are used in the output. Only the first
  16 characters of each label are used.

The next record is ignored.

The next record(s) contain initial values for the parameters.
  These may span one or more records.
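
A Python sketch of collecting the initial values across records,
assuming exactly i1 numbers (the number of parameters) must be supplied.
It treats a surplus or missing value as an error; how EG13PC itself
handles surplus values is not stated here, and read_initial_values is a
hypothetical name.

```python
def read_initial_values(records, n_params):
    """Collect n_params numbers from consecutive records.

    Returns (values, records_used). Every token must parse as a number,
    since extra text is not allowed on these records.
    """
    values = []
    used = 0
    for rec in records:
        if len(values) >= n_params:
            break  # enough values: stop before the next record
        values.extend(float(tok) for tok in rec.split())
        used += 1
    if len(values) != n_params:
        # Assumption: too few or too many values is reported as an error.
        raise ValueError(f"expected {n_params} initial values, found {len(values)}")
    return values, used


# The three initial-value records of the example control file.
example = ["-0.83188 -0.80439 -0.91741",
           "0.03796 1.14924 0.39144",
           "0.93362 0.934 0.934",
           "model specification:"]
print(read_initial_values(example, 9))
```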

The next record is ignored.

The next records specify the regressions. If the number of
  classes is C, then   C + C + {C * (C-1) / 2}     records are required.
  C specifications for the regressions for each class.
  C specifications for the regressions for the within class odds
    ratios.
  C * (C-1) / 2   specifications for the regressions for the
                  between class odds ratios.
examples: C          number of specifications
          1          2
          2          5
          3          9
          4         14
          5         20
          6         27
          7         35
          8         44
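
The count simplifies to C + C + C*(C-1)/2 = C*(C+3)/2. The short
Python check below reproduces the table; n_specifications is an
illustrative name.

```python
def n_specifications(c):
    """C main-effect + C within-class + C*(C-1)/2 between-class records."""
    return c + c + c * (c - 1) // 2  # equals C*(C+3)/2


for c in range(1, 9):
    print(c, n_specifications(c))
```

At the program limit of 12 classes this gives 90 specification records.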
Each regression is specified by a sequence of integers as follows:
       i1 i2 i3 i4 i5 i6 ...
where i1 and i2 are class numbers. To specify the main-effects
regression for a class, set i2 = 0. i3, i5, ... are parameter
indices and i4, i6, ... are regressor indices.
If B is the vector of regression parameters and x is the vector of
regressors in the data file, then the regression will be
  B(i3)*x(i4) + B(i5)*x(i6) + ...
If i3 = 0, that regression is set to 0.
 The parameters must be arranged so that the odds-ratio parameters
 are the last in the parameter vector (not checked).
Each parameter should appear at least once in the regression
specifications.
The order of the specifications is not important.
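
The following Python sketch parses one specification record, evaluates
B(i3)*x(i4) + B(i5)*x(i6) + ..., and checks the two conditions the
program leaves to the user. It reads "odds-ratio parameters last" as:
main-effect regressions (i2 = 0) use only parameters 1..n_main, and
association regressions use only parameters n_main+1..n_params. That
reading and all function names are assumptions.

```python
def parse_spec(line):
    """Split 'i1 i2 i3 i4 ...' into (class1, class2, [(param, regressor), ...]).

    An empty term list represents i3 = 0, i.e. a regression fixed at 0.
    """
    ints = [int(tok) for tok in line.split()]
    c1, c2, rest = ints[0], ints[1], ints[2:]
    if not rest:
        raise ValueError(f"no parameter indices in {line!r}")
    if rest[0] == 0:
        return c1, c2, []
    if len(rest) % 2 != 0:
        raise ValueError(f"unpaired index in {line!r}")
    return c1, c2, list(zip(rest[0::2], rest[1::2]))


def check_specs(specs, n_params, n_main):
    """Raise ValueError if a parameter is misplaced or never used."""
    used = set()
    for c1, c2, terms in specs:
        for p, _ in terms:
            if c2 == 0 and not 1 <= p <= n_main:
                raise ValueError(f"main effects for class {c1} use parameter {p}")
            if c2 != 0 and not n_main < p <= n_params:
                raise ValueError(f"odds ratio {c1} {c2} uses parameter {p}")
            used.add(p)
    missing = set(range(1, n_params + 1)) - used
    if missing:
        raise ValueError(f"parameters never used: {sorted(missing)}")


def linear_predictor(terms, beta, x):
    """Sum of B(p)*x(r) over the terms, with 1-based indices as in the file."""
    return sum(beta[p - 1] * x[r - 1] for p, r in terms)


example = ["1 0    1 1 2 2 3 3 4 4 5 5 6 6",
           "2 0    1 1 2 2 3 3 4 4 5 5 6 6",
           "1 1    7 1",
           "2 2    8 1",
           "1 2    9 1"]
specs = [parse_spec(s) for s in example]
check_specs(specs, n_params=9, n_main=6)  # passes for the example file
```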

Any additional records will be ignored.

Note: extra text following the numbers in the control file is allowed,
 except on the following records:
  record number 3: the data file name
  the record(s) specifying the initial parameter values
  the record(s) specifying the regressions.
 This is demonstrated by the example control file.
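
Because text may follow the numbers on most records, a reader needs only
the leading token(s) of each record. A hypothetical Python sketch is
shown below. Presumably the regression and initial-value records are
excluded because they hold a variable number of values, so trailing
text could not be told apart from data; that is an inference, not
something the program documents.

```python
def leading_numbers(record, count=1, kind=int):
    """Convert the first `count` whitespace-separated tokens of a record.

    Anything after them (e.g. '= number of classes') is ignored.
    """
    tokens = record.split()
    if len(tokens) < count:
        raise ValueError(f"expected {count} number(s) in {record!r}")
    return [kind(tok) for tok in tokens[:count]]


print(leading_numbers("2         = number of classes"))
print(leading_numbers("9 6       = total=9, main effects=6", count=2))
print(leading_numbers("0.001     = convergence criterion", kind=float))
```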


Current program limits:
--------------
maximum number of classes = 12
maximum cluster size = 8
maximum number of observations: the sum of
  n + n * (n-1) / 2
 over all clusters must be <= 1000, where n is the cluster size.
maximum number of parameters = 10
maximum number of potential regressors in the input file = 64
maximum number of iterations that could be specified = 100
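
A Python sketch for checking a data set against the cluster-size and
observation limits before running. For example, 100 clusters of size 4
give 100 x (4 + 6) = 1000, exactly at the limit. The defaults are taken
from the list above; check_limits is a hypothetical name.

```python
def check_limits(cluster_sizes, max_cluster=8, max_total=1000):
    """Return the sum of n + n(n-1)/2, or raise if a limit is exceeded."""
    total = 0
    for n in cluster_sizes:
        if n > max_cluster:
            raise ValueError(f"cluster of size {n} exceeds the maximum of {max_cluster}")
        total += n + n * (n - 1) // 2  # n responses plus n(n-1)/2 pairs
    if total > max_total:
        raise ValueError(f"sum of n + n(n-1)/2 is {total}, above {max_total}")
    return total


print(check_limits([4] * 100))
```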

Technical notes:
---------------
The values of the regressors used in the regression for the
within and between class associations should be the same for all
members of any given cluster. The program currently uses the
values from the last member in each cluster. Don't rely on this
"feature". This will change in future versions of the program.

The program does a fair amount of checking of the control file
and the data file. However, the checking is not exhaustive.

The model specification is very flexible. Completely ridiculous
models can be specified. The program has no way of recognizing these.
Care is needed here.
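
Since the program uses the values from the last member of each cluster,
it may be worth verifying beforehand that the regressors entering the
within- and between-class regressions are constant within each cluster.
A hypothetical Python sketch, assuming each cluster is a list of
(class, y, x) tuples and specifications are (class1, class2,
[(param, regressor), ...]) tuples, with 1-based indices as in the
control file:

```python
def association_regressors(specs):
    """Regressor indices used by within/between-class regressions (i2 != 0)."""
    return sorted({r for _, c2, terms in specs if c2 != 0 for _, r in terms})


def nonconstant_regressors(cluster_records, regressor_indices):
    """Return the 1-based indices whose values differ across a cluster's members."""
    bad = []
    for r in regressor_indices:
        values = {x[r - 1] for _, _, x in cluster_records}
        if len(values) > 1:
            bad.append(r)
    return bad


specs = [(1, 0, [(1, 1), (2, 2)]), (1, 1, [(7, 1)]), (1, 2, [(9, 3)])]
cluster = [(1, 0, [1.0, 0.0, 2.0]), (2, 1, [1.0, 1.0, 2.0])]
print(nonconstant_regressors(cluster, association_regressors(specs)))  # → []
```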


Example control file:
--------------------
--  Title1: example control file --
--  Title2:  --
COPD1.DAT
2         = number of classes. Suppose class 1 = P, class 2 = S
6         = dim (x): x1 x2 x3 x4 x5 x6
9 6       = total=9, main effects=6
0.001     = convergence criterion
50        = maximum number of iterations
1         = print current estimates each iteration. 1=yes, 0=no
2         = 1=GEE1, 2=GEE2
2         = exact    2=exact  1=Z&P approx.
labels for beta:  these will appear on the output file
1 Intercept
2 Sex (F)
3 Race (B)
4 Age-50
5 Smoker
6 Ex smoker
7 P.P
8 S.S
9 P.S
Initial estimates:
-0.83188 -0.80439 -0.91741
0.03796 1.14924 0.39144
0.93362 0.934 0.934
model specification:
1 0    1 1 2 2 3 3 4 4 5 5 6 6
2 0    1 1 2 2 3 3 4 4 5 5 6 6
1 1    7 1
2 2    8 1
1 2    9 1
-- End of the control File --

The model specified above is:
 Main effects:
   For class 1 (P):
     logit(Pr{Y=1}) = B1*x1 + B2*x2 + B3*x3 + B4*x4 + B5*x5 + B6*x6
   For class 2 (S):
     logit(Pr{Y=1}) = B1*x1 + B2*x2 + B3*x3 + B4*x4 + B5*x5 + B6*x6
 Odds ratios:
   Within class 1 (P.P):
     log(odds ratio) = B7*x1
   Within class 2 (S.S):
     log(odds ratio) = B8*x1
   Between classes 1 and 2 (P.S):
     log(odds ratio) = B9*x1
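
To read the model on the original scales, invert the logit and
exponentiate the log odds ratio. The Python sketch below does this with
the example's initial estimates (not fitted values). It assumes x1 is a
constant 1 (the intercept column, matching the label for B1) and uses a
hypothetical regressor vector whose codings are guessed from the labels.
Since both classes share B1..B6 in this model, the marginal probability
is the same for P and S at a given x.

```python
import math


def prob_from_logit(eta):
    """Invert logit(Pr{Y=1}) = eta."""
    return 1.0 / (1.0 + math.exp(-eta))


# Initial estimates B1..B9 from the example control file.
beta = [-0.83188, -0.80439, -0.91741, 0.03796, 1.14924, 0.39144,
        0.93362, 0.934, 0.934]
# Hypothetical subject: x1 = 1 (assumed intercept), x2 = 0, x3 = 0,
# Age-50 = 0, Smoker = 1, Ex smoker = 0.
x = [1, 0, 0, 0, 1, 0]

eta_main = sum(b * xi for b, xi in zip(beta[:6], x))
p = prob_from_logit(eta_main)     # Pr{Y=1} for either class
or_pp = math.exp(beta[6] * x[0])  # within-class odds ratio P.P
print(round(p, 4), round(or_pp, 4))
```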

Suppose that a model with no association between classes 1 and 2 is
required. Then the last line of the model specification should be:
       1 2  0
the sixth record becomes
       8 6       = total=8, main effects=6
the label for B9
       9 P.S
should be deleted, and the last initial value (0.934) should be removed
so that only 8 initial values are given.
