library(survival)
source("RISTfunctions.txt")

K=5      # number of covariates considered per split
nmin=6   # minimum number of observed data in each node
M=50     # number of trees in each fold
L=2      # number of folds
tao=6    # length of study
n=200    # training sample size
nval=200 # test sample size
P=25     # number of covariates; not set in the original script, value assumed here (needs P >= K)

dataX = data.frame(replicate(P, runif(n+nval)))
colnames(dataX) <- paste("X", c(1:P), sep="")
mu=sin(dataX$X1*pi)+2*abs(dataX$X2-0.5)+dataX$X3^3
y=rexp(n+nval, 1/mu)
C=runif(n+nval, 0, tao)
censor=(y<=C)
obtime=apply(cbind(y,C), 1, min)
dataset=data.frame(cbind(dataX, censor, obtime)[1:n,])
testset=data.frame(cbind(dataX, censor, obtime)[(n+1):(n+nval),])

R_Muti_ERT_build = Muti_ERT_fit(dataset, M, K, L, nmin, SupLogRank=1, tao, impute="random")
R_Muti_ERT_predict= Muti_ERT_Predict(testset, R_Muti_ERT_build$Forest_seq[[L]], R_Muti_ERT_build$SurvMat_seq[[L]], R_Muti_ERT_build$time_intrest)

###############################################
#############   read me !!!  ###################
###############################################

1. Use "Muti_ERT_fit" to fit the model, and use "Muti_ERT_Predict" to predict new subjects.

   Dataset and testset must be arranged in the following order:
   (X, censoring indicator, time), where "censor = 1" means failure and "0" means censored.

   "R_Muti_ERT_predict" (the object returned by "Muti_ERT_Predict" in the example above) gives three outputs:
   1) Predicted min(T, tao).
   2) Predicted survival function for each subject (one row for each subject, one column for each time point).
   3) All time points.

2. Specify parameters:

   K           # number of covariates considered per split, usually sqrt(p) or p/3.
   nmin        # minimum number of observed data in each node, default is 6.
   M           # number of trees in each fold, default is 50.
   L           # number of folds, 1-5 are recommended.
   tao         # length of study. Must be larger than all survival times; beyond that, its exact value makes no difference.
   SupLogRank  # "1" is the default and uses the log-rank test to find the best split.
               # "2" uses the sup log-rank test to find the best split. This can be time consuming.
               # "0" uses a t-test to compare the two groups; not recommended.
   impute      # imputation method, always use "random". Please do not change this.

3. Since all computations are done in R (I'm still working on the C version), it can be a little bit slow for large data sets. You could first try a small value of M and L=1 to see how much time would be needed. The total computational time should grow linearly as M*L grows.
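
The linear scaling described in item 3 can be used to extrapolate from a short pilot run. The following is only a minimal sketch, assuming the M*L scaling holds exactly; "estimate_runtime" and its argument names are hypothetical and not part of RISTfunctions.txt, and the 12-second pilot time is made up for illustration.

```r
# Hypothetical helper (not part of RISTfunctions.txt): extrapolate the run time
# of a full fit from a pilot fit, assuming time grows linearly in M*L.
estimate_runtime <- function(pilot_seconds, pilot_M, pilot_L, target_M, target_L) {
  stopifnot(pilot_seconds >= 0, pilot_M > 0, pilot_L > 0, target_M > 0, target_L > 0)
  pilot_seconds * (target_M * target_L) / (pilot_M * pilot_L)
}

# Pilot timing with the example objects above
# (uncomment once the RIST functions have been sourced):
# pilot <- system.time(
#   Muti_ERT_fit(dataset, 5, K, 1, nmin, SupLogRank=1, tao, impute="random")
# )["elapsed"]
# estimate_runtime(pilot, pilot_M=5, pilot_L=1, target_M=M, target_L=L)

# Worked example with a made-up pilot time of 12 seconds for M=5, L=1:
est <- estimate_runtime(12, pilot_M=5, pilot_L=1, target_M=50, target_L=2)
print(est)  # 240 (seconds)
```

The estimate is only as good as the linearity assumption; overhead that does not depend on M or L would make the extrapolation somewhat pessimistic.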
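
The column order required in item 1 and the condition on tao in item 2 can be checked before fitting. This is a sketch under stated assumptions, not part of the RIST code: "check_rist_data" is a hypothetical helper, and it assumes the last two columns of the data frame are the censoring indicator and the observed time, as described above.

```r
# Hypothetical helper (not part of RISTfunctions.txt): check that a data frame
# is arranged as (X, censoring indicator, time) and that tao exceeds all times.
check_rist_data <- function(data, tao) {
  p <- ncol(data) - 2                      # number of covariate columns
  if (p < 1) stop("need at least one covariate plus indicator and time columns")
  status <- data[[p + 1]]                  # second-to-last column: 1 = failure, 0 = censored
  time   <- data[[p + 2]]                  # last column: observed time
  if (!all(status %in% c(0, 1))) stop("censoring indicator must be 0 or 1")
  if (!is.numeric(time) || anyNA(time) || any(time < 0)) {
    stop("time must be non-negative numbers")
  }
  if (any(time >= tao)) stop("tao must be larger than all survival times")
  invisible(TRUE)
}

# Small made-up data set in the required order.
toy <- data.frame(X1 = c(0.1, 0.4, 0.7, 0.2, 0.9),
                  X2 = c(0.5, 0.3, 0.8, 0.6, 0.1),
                  censor = c(1, 0, 1, 1, 0),
                  obtime = c(0.5, 1.2, 3.0, 0.1, 2.4))
check_rist_data(toy, tao = 6)              # passes silently

# Swapping the last two columns is caught, because the times are not 0/1.
swapped <- toy[, c("X1", "X2", "obtime", "censor")]
res <- tryCatch(check_rist_data(swapped, tao = 6),
                error = function(e) conditionMessage(e))
print(res)
```

A logical indicator such as the "censor" column built in the example script also passes the check, since TRUE and FALSE match 1 and 0.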