%REDMON USER'S GUIDE (TEXT VERSION)

                       Version 1.0


             Prepared by:

	                  Sean O'Brien 		
                        Michael Schell
						
                        Department of Biostatistics
                        University of North Carolina
                        Chapel Hill, NC 27514
	

REDMON6.TXT (5/14/97)


                               A: OVERVIEW

Isotonic regression is a nonparametric method appropriate when a 
dependent response variable is monotonically related to an independent 
predictor variable. The regression estimate is a step function which 
reduces the description of n points to L (<= n) level sets. This method 
yields a model consisting of L more-or-less homogeneous subpopulations. 
The estimate for each group (an interval in the domain) is equal to the 
average of the response values for points in the group. 

Under isotonic regression, the number of level sets is often large, 
preventing simple description. The reduced monotonic regression and 
reduced isotonic regression procedures performed by %REDMON improve the 
parsimony of such models by reducing the number of level sets. This is 
accomplished using a backward elimination algorithm to combine groups 
that do not differ significantly from one another. 

The independent variable is assumed to be observed without error. The 
errors in the dependent variable, estimated by the residuals obtained by 
subtracting the reduced isotonic fit from the observed values, are 
assumed to be independent and identically distributed Gaussian random 
variables with zero mean and constant variance.

REDUCED MONOTONIC REGRESSION VERSUS REDUCED ISOTONIC REGRESSION

Isotonic regression forces the regression estimate to increase or 
decrease in the direction specified by the user. It is appropriate when 
the direction of the association is known with certainty. Reduced 
monotonic regression is a two-sided extension of the reduced isotonic 
method. The direction of the trend is determined by the data. When the 
direction is known, the one-sided version is more powerful for detecting 
differences between adjacent groups. 

CHOOSING A SIGNIFICANCE LEVEL

When the predictor and response variable are monotonically related, the 
appropriate estimate is a step function with at least one step. When the 
predictor and response variables are unrelated, however, the correct 
model is a single flat line. %REDMON will choose the flat line model 
with probability 1-ALPHA under the null. The value ALPHA is the overall type-I 
error probability. It corresponds to the test H0: no trend versus H1: 
isotonic or monotonic trend. The user may specify ALPHA using the ALPHA= 
option or may use the default ALPHA=.05.

The actual number of level sets in the reduced monotonic regression 
model depends on the data and on the significance criterion 
("Significance Level to Stay") used to determine when the elimination 
algorithm ends. The macro chooses this value automatically such that all 
groups will be collapsed with probability 1-ALPHA under the null. Because 
this value is set internally, users do not need to be aware of it. 
Nonetheless, a short description of how this value is chosen is provided 
in the DETAILS section. Interested users may override the automatic 
selection of this value using the SLS= option.

References:

Robertson, T., Wright, F. T., and Dykstra, R. L. (1988), Order-Restricted 
Statistical Inference, New York: Wiley.

Schell, M. and Singh, B. (1997), "The Reduced Monotonic Regression 
Method", JASA 92:128-135.


                              B: GETTING STARTED

Before %REDMON can be used it has to appear in your SAS program. It is 
not necessary to re-type the program. Instead use the %INCLUDE statement 
to read the program from a file. For example, if the macro is stored in 
the 'c:\' directory use the command:
 
		%INCLUDE 'c:\redmon.sas'; 

After the %INCLUDE statement, the program may be invoked wherever a PROC 
statement could appear. To do so, submit the command %REDMON, followed 
by arguments in parentheses. For example:

		%REDMON(DATA=work.mydata, Y=weight, X=height);


                               ARGUMENTS

Arguments, appearing in parentheses after the word %REDMON, specify the 
model, request special output, and change defaults. The following table 
lists them:

NAME        PURPOSE                                 DEFAULT
--------    ------------------------------------    ----------------------

DATA=       Specify the SAS data set                use last created
                                                    data set

X=          Specify predictor variable(s)           [required]

Y=          Specify response variable               [required]

Z=          Specify by-group variable               no by-groups

METHOD=     Specify isotonic increasing,            monotonic (2-sided)
            isotonic decreasing, or
            monotonic method

ALPHA=      Specify target overall type-I           overall alpha = .05
            error level

SLS=        Specify significance level to           corresponds to
            stay in backward elimination            alpha = .05
            of level sets

FREQ=       Name a variable containing              no weights
            frequencies

PLOT=       Request a high-resolution graphics      no plots
            plot and specify location

OUT=        Request output data sets                no output data sets


                          EXPLANATION OF PARAMETERS


DATA=

The DATA= argument specifies the name of the SAS data set containing 
your variables. If this argument is omitted, %REDMON uses the most 
recently created data set (_LAST_).

Data set specified: 	%REDMON(DATA=work.mydata, X=height, Y=weight);
Data set unspecified:	%REDMON(X=height, Y=weight);

X= 

The X= argument specifies the name of the predictor variable. %REDMON 
syntax allows more than one predictor to be specified. However, this 
does not result in a multiple regression model. Instead, %REDMON fits a 
separate model for each predictor in the X= list. 

Single predictor:       %REDMON(DATA=work.mydata, X=height, Y=weight);
Multiple predictors:    %REDMON(DATA=work.mydata, X=height wingspan 
                                shoesize, Y=weight);

Note that X= appears only once and the predictor names are separated by 
blanks, not commas.

NOTE: %REDMON does not currently handle missing predictor values. 
Including observations with missing values will yield unpredictable 
results. 

Y=

Y= specifies the name of the response variable. Only one response 
variable is allowed. 

NOTE: Observations with missing response values are eliminated from 
the analysis. 

Z= 	

The "Z=" argument allows separate models to be fit for observations at 
each level of a given stratification variable. If more than one Z-
variable is specified, %REDMON fits separate models for each level 
formed by cross-classifying them.

Stratify by gender:         %REDMON(DATA=work.mydata, Y=weight, X=height, 
                                    Z=gender);
Stratify by gender*race:    %REDMON(DATA=work.mydata, Y=weight, X=height, 
                                    Z=gender race);

METHOD= 

RECOGNIZED OPTIONS:  METHOD=up   METHOD=down   METHOD=best

%REDMON performs reduced monotonic regression by default. This means 
that the macro determines the direction of the trend from the data. When 
the direction of the trend is known, reduced isotonic (antitonic) 
regression is more appropriate. This one-sided method uses lower 
critical values than reduced monotonic regression, corresponding to 
greater power.

The METHOD= argument is used to request the method and, for isotonic 
regression, the direction of the trend. The following values are 
allowed: 'up' (for increasing trend, often called isotonic), 'down' (for 
decreasing trend, often called antitonic), and 'best' (for monotonic). 
When multiple predictors are included, it is possible to specify a 
different method for each. The first method in the list corresponds to 
the first predictor, the second method to the second predictor, and so 
on. 

Single predictor:		%REDMON(DATA=work.mydata, X=height, Y=weight, 
					METHOD=up);
Multiple predictors:	%REDMON(DATA=work.mydata, X=height wingspan 
					shoesize, Y=weight, METHOD=up best down);

ALPHA= 

Reduced monotonic (isotonic) regression improves the parsimony of the 
conventional isotonic regression model by combining groups that do not 
significantly differ. When the predictor and response variables are 
unrelated, the correct model collapses all groups into a single one. 
This occurs with probability 1-ALPHA under the null. The value ALPHA is 
the type-I error rate for the test H0: no trend versus H1: isotonic or 
monotonic trend. By default, the target ALPHA = .05. The ALPHA= option 
is used to specify other values for ALPHA.

Overall ALPHA specified:    %REDMON(DATA=work.mydata, X=height, 
                                    Y=weight, ALPHA=.1);

NOTE: ALPHA values are approximate, not exact. The approximation is 
accurate for .01 < ALPHA < .10 and sample sizes 20 < n < 800. (See 
DETAILS.)


SLS=  

Level sets are eliminated using a backward elimination algorithm which 
combines adjacent groups one at a time. The algorithm ends when each 
group in the model produces F statistics significant at the SLS= level. 
By default, the SLS= value is chosen internally as a function of the 
desired overall type-I error probability ALPHA, i.e. the probability 
that the groups are not all combined into a single one under the null 
hypothesis. Unless the user wishes to have direct control over the 
number of level sets eliminated, this option should not be used. ALPHA= 
and SLS= should never both be specified. 

SLS specified:		%REDMON(DATA=work.mydata, X=height, Y=weight, 
					SLS=.001);

NOTE: When SLS= is specified directly, the overall type-I error rate is 
no longer controlled. The SLS value that corresponds to a given ALPHA is 
always smaller than ALPHA, since SLS is a comparison-wise significance 
level and ALPHA is an overall error rate which accounts for multiple 
comparisons. (See DETAILS.)

FREQ= 

Like the FREQ statement in PROC REG or PROC LOGISTIC, the FREQ= 
argument specifies a variable whose values represent frequencies. When 
this option is used, each observation in the input data set is assumed 
to represent n observations, where n is the value of the FREQ variable 
(SAS/STAT User's Guide, Version 6). The analysis produced using FREQ= is 
the same as an analysis produced using a data set that contains n 
observations in place of each observation in the input data set. Note 
that the sample size used for determining SLS is considered to be equal 
to the sum of the values of the FREQ variable. Using the ALPHA= option 
will yield a conservative test due to the tied observations.


FREQ var named 'frq':		%REDMON(DATA=work.mydata, X=height, 
						Y=weight,FREQ=frq);
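
The equivalence described above can be illustrated by expanding the 
frequencies by hand. The following is a minimal sketch, not part of the 
macro. It assumes that 'frq' holds positive whole numbers with no 
missing values; the data set name WORK.EXPANDED and the loop counter 
_copy are hypothetical names introduced only for this example.

```sas
/* Write each input row frq times, so that the expanded data set    */
/* contains one row per represented observation.                     */
/* Assumption: frq is a positive integer on every row.               */
data work.expanded;
    set work.mydata;
    do _copy = 1 to frq;
        output;
    end;
    drop _copy frq;
run;

/* According to the description above, this analysis should match    */
/* the one obtained from WORK.MYDATA with FREQ=frq.                  */
%REDMON(DATA=work.expanded, X=height, Y=weight);
```

Using FREQ= avoids creating the larger data set and is normally the 
more convenient choice.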

PLOT=  

RECOGNIZED OPTIONS: PLOT=screen    PLOT=FILE    PLOT=FILE directory

The PLOT= argument requests a high-resolution plot, sent either to a 
PostScript file or to the display manager default device. To plot to 
the display manager, use PLOT=screen. To plot to a file, use PLOT=file. 
This creates a file named '_PLOT1.PS'. If such a file exists already, it 
is overwritten. If multiple plots are produced in a single %REDMON 
invocation, the files are numbered sequentially, i.e. '_PLOT1.PS', 
'_PLOT2.PS', etc. If PLOT=file is used, it is also possible to specify 
the directory in which to store the files. This is done by including the 
name of the directory after the keyword 'file'. 

Plot to screen:             %REDMON(DATA=work.mydata, X=height, Y=weight, 
                                    PLOT=screen);

Plot to file:               %REDMON(DATA=work.mydata, X=height, Y=weight, 
                                    PLOT=file);

Plot to file in 
'c:\plots\' directory:      %REDMON(DATA=work.mydata, X=height, Y=weight, 
                                    PLOT=file c:\plots\);


OUT=

OUT=yes requests that two data sets be output for each model. _FINAL1 
contains one observation for each level set. It provides the sample 
size, range of predictor values, predicted response, and standard 
deviation for each level set. _FIT1 contains one observation for each 
observation in the input data set. It provides the isotonic fit, reduced 
isotonic (monotonic) fit and residual for each observation. If multiple 
models are specified the files are numbered sequentially,  _FIT1, _FIT2, 
... and _FINAL1, _FINAL2,... If files with the same names exist, they 
are overwritten.

Request output data sets:	%REDMON(DATA=work.mydata, X=height, 
						Y=weight, OUT=yes);


                                DETAILS

Isotonic regression minimizes the sum of squares of deviations from the 
model to the data under the restriction that the fit is non-decreasing, 
i.e. E(Y|X = x) is monotonic in X. This is accomplished using the pooled 
adjacent violators algorithm (PAVA). %REDMON implements this algorithm 
in a DATA STEP. 
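
The pooling idea can be illustrated with a short DATA step. This is a 
minimal sketch of the general pooled adjacent violators approach for an 
increasing fit, and is not the code used inside %REDMON. The data set 
WORK.SORTED, the variables x and y, the output data set WORK.LEVELSETS, 
and the limit of 1000 rows are all assumptions made for illustration. 
The input is assumed to be sorted by x, with distinct x values and no 
missing values.

```sas
/* Sketch only. Each block is stored as a sum and a count; a new    */
/* observation starts its own block, and adjacent blocks are pooled  */
/* whenever the earlier block mean is not below the later one.       */
data work.levelsets(keep=block n_in_block fit);
    array s{1000} _temporary_;      /* block sums   (assumed limit)  */
    array c{1000} _temporary_;      /* block counts (assumed limit)  */
    retain nb 0;                    /* current number of blocks      */
    set work.sorted end=last;

    /* start a new block holding the current observation */
    nb = nb + 1;
    s{nb} = y;
    c{nb} = 1;

    /* pool the last two blocks while they violate the ordering */
    pooled = 1;
    do while (pooled);
        pooled = 0;
        if nb > 1 then do;
            if s{nb-1} / c{nb-1} >= s{nb} / c{nb} then do;
                s{nb-1} = s{nb-1} + s{nb};
                c{nb-1} = c{nb-1} + c{nb};
                nb = nb - 1;
                pooled = 1;
            end;
        end;
    end;

    /* after the last row, write one row per block (level set) */
    if last then do block = 1 to nb;
        n_in_block = c{block};
        fit = s{block} / c{block};
        output;
    end;
run;
```

Each row of the output gives a block number, the number of observations 
pooled into it, and the block mean, which is the isotonic estimate for 
every observation in that block. A decreasing fit can be obtained the 
same way after reversing the sign of y.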

%REDMON implements the level sets backward elimination algorithm using 
PROC REG with SELECTION=BACKWARD. L-1 dummy variables are used to 
identify groups. The model is parameterized such that elimination of a 
predictor variable corresponds to replacing two adjacent level sets with 
a single one.
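
One coding with the property described above uses cumulative indicator 
variables, so that each coefficient is the step between two adjacent 
level sets. The sketch below assumes four level sets. Whether %REDMON 
uses exactly this coding is an assumption; the data set WORK.ISOFIT, 
the level-set index 'ls' (values 1 to 4), the response 'y', and the 
SLSTAY= value are hypothetical and chosen only for illustration.

```sas
/* Sketch only. d1-d3 are cumulative indicators: d_k = 1 when the   */
/* observation lies in a level set above the k-th one. The fitted   */
/* coefficient of d_k is then the jump from level set k to k+1, so  */
/* dropping d_k from the model merges those two level sets.          */
data work.dummies;
    set work.isofit;            /* hypothetical input data set       */
    d1 = (ls > 1);
    d2 = (ls > 2);
    d3 = (ls > 3);
run;

/* Backward elimination: at each step the least significant dummy   */
/* is removed until all remaining dummies meet the SLSTAY= level.    */
proc reg data=work.dummies;
    model y = d1 d2 d3 / selection=backward slstay=0.01;
run;
quit;
```

The tests reported by PROC REG here are the usual two-sided ones; this 
sketch does not reproduce the critical values that %REDMON applies.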

MISSING PREDICTOR AND RESPONSE VALUES

%REDMON does not currently handle missing predictor values. For best 
results, eliminate observations with missing predictor values in a data 
step prior to invoking the macro. Missing response values are allowed; 
observations with missing response values are eliminated from the 
analysis.  
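
A data step of the kind recommended above might look like the following 
sketch. It assumes a single numeric predictor named 'height', as in the 
earlier examples; the data set name WORK.COMPLETE is a hypothetical name 
introduced for this example.

```sas
/* Keep only rows whose predictor is present. Every SAS numeric     */
/* missing value (., ._, .A through .Z) sorts at or below .Z, so     */
/* the comparison is true only for non-missing values.               */
data work.complete;
    set work.mydata;
    if height > .Z;             /* subsetting IF */
run;

%REDMON(DATA=work.complete, X=height, Y=weight);
```

When several predictors are listed in X=, a separate model is fitted for 
each, so it may be preferable to filter and invoke the macro once per 
predictor rather than discard rows that are missing any one of them.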

ALPHA and SLS

The SLS value chosen by %REDMON to yield a given ALPHA is based on 
simulation results described in Schell and Singh, 1997. Table 1 of this 
article provides estimates of SLS for three values of ALPHA 
(ALPHA=.01,.05,.10) and five sample sizes (n=10,20,50,200,800). To 
handle other values of ALPHA and other sample sizes, %REDMON uses 
interpolation.

The interpolation method used by the macro is appropriate for 
.01 < ALPHA < .10 and 20 < n < 800. When ALPHA or n is outside this 
range, the same formulas are used. However, no simulations have been 
conducted to determine their accuracy. For sample sizes where 
10 < n < 20, some guidance is given in Table 1 of Schell and Singh, 1997.

MEMORY AND SOFTWARE REQUIREMENTS

%REDMON requires SAS version 6.11 or later. An implementation for older 
SAS versions is available upon request.

%REDMON's implementation of the pooled adjacent violators algorithm 
(PAVA) is optimized for the case where the entire data set (predictor, 
response, and weight variables plus other temporary variables) fits in 
memory. We have not experienced any problems with this limitation. An 
implementation for large data sets is available upon request.

The high resolution graphics option was developed on PC SAS version 6.11 
for Windows and tested successfully on OS/2 and UNIX platforms. Comments 
on how to improve this and other aspects of the program are welcome and 
appreciated.


Please contact one of the authors to request the macro or report bugs. 

Sean O'Brien
sobrien@bios.unc.edu

Michael Schell
mschell@bios.unc.edu