Donglin Zeng
Professor
CoDirector, Carolina Survey Research Laboratory
Department of Biostatistics
Gillings School of Global Public Health, CB#7420
University of North Carolina at Chapel Hill, 275997420
I am a statistician, or a mathematical statistician, or a biostatistician.
I can also be called a "data scientist" if this is more attractive to you
(not to me thoughsounds as if I were a lab guy who manufactures numbers).
Education
B.S.(major in pure mathematics), Department of Mathematics, University of Science and Technology of China,
1993
M.S. (major in
partial differential equations), Department of Mathematics, University of Science and Technology of
China, 1995
Ph.D., Department of Statistics, University of Michigan, 2001
Teaching Activities
Course I. Semiparametric Models in Health Science, Fall 2002
Course II. Advanced Probability and Statistical Inference (I), Fall
2004
Course III. Advanced Probability and Statistical Inference (I), Fall
2005
Course IV. Advanced Probability and Statistical Inference (I), Fall
2006
Course V. Advanced Probability and Statistical Inference (I), Fall
2007
Workshop. Empirical Processes and Statistical Applications, JulOct
2008
Course VI. Advanced Probability and Statistical Inference (I), Fall
2009
Course VII. Statistical Learning and
HighDimensional Data, Spring 2011
Course VIII. Advanced Probability and Statistical Inference (I), Fall
2011
Course VIV. Advanced Survey Sampling Methods,
Spring
2013
Course X. Advanced Probability and Statistical Inference (I), Fall
2015
Course XI. Advanced Probability and Statistical
Inference (I), Fall
2016
Course XII. Statistical Learning and Personalized Medicine, Spring 2018
Research Interest
My current interest is method development on machine learning, personalized medicine, semiparametric
inference and high dimensional inference, with particular applications to EHR data, clinical trials, survival
data and genetics in biomedical studies. Other applied areas I involve
include medical diagnostics in breast imaging and AIDS.
Some
recent work can be found in my curriculum vitae.
Some
algorithms and codes related to
my research on semiparametric transformation models are also available.
Another page displays simulations for the
metaanalysis.
My personal research interest, probably future interest, also includes linear/nonlinear
functional analysis, which I respectfully view as the most pretty pearl in
abstract sciences.
Selected Personal Views
 I often think about where we, as statisticians, are in the era of Big Data. As more and more data emerge these days, researchers from all different fields including us are certainly excited to seek more data evidence to improve and test their own models and methods. However, Big Data does not necessarily mean Big Evidence and it is usually not; moreover, Cheap Data are often Poor Data. Therefore, it is important for us to ensure scientific quality of data, representativeness of data, and generalizability of findings, which is exactly the whole driving force of Statistics in terms of handling random errors, sampling and probabilistic inference. By saying that, I think that we should maintain this important value and unique identity of our researches, instead of being torn away by casespecific methods purely driven by individual data and specific tasks. Evidencebased decision making should always be combined with theoryguided decision making! At the same time, we need to learn from different disciplines, especially computer science, to effectively handle data storage, processing and analysis, combine global analytic objectives and local analytic objectives, and eventually efficiently translate our work into practice.
I create this pyramid for data science (I am sure that workforce in data science has the same pyramid shapehope salary too :)).
Data Science Pyramid
!!! As statisticians, we should be proud where we are in the pyramid  it is us who can directly interact with and influence decision makers and clinical usersHOORAY!!!
 Highdimensional variable selection has been the most eyecatching topic in journal publications. However, only till recently do we start to look into postestimation inference concerning nonregularity of inference. I think that this should be the most important goal from now on, given plenty of regularization methods in this areamaybe every single statistical paper on highdimensional estimation should have a big component on obtaining correct inference (not pointwise but should be locally regular). For prediction purpose, certain oracle selection may be a good choice; however, concerning inference, nonoracle selection should be used, which exactly reflects the fact of Uncertainty Principle (tradeoff between precision and stability). I notice there have been interesting development from Zhang, van der Geer and Belloni's groups using orthogonalization ideas. However, one of my general thoughts is whether we can cast this problem into semiparametric inference problems (for semiparametric models, nuisance parameters are usually infinitedimensional so certainly highdimensional), treating the vector of coefficients in some l_2 space (infinite dimensional vector) so the sparsity being some compact set of this space. There have been wellknown results in semiparametric framework to construct efficient estimation for parameter of interest and obtain valid inference even if the nuisance parameter is estimated far slower than parametricrate. I am sure that those results can be useful. Furthermore, we can not only obtain correct inference for one single coefficient but also some "smooth" functional of the whole vector of coefficients (this is known in semiparametric inference). This may be a good direction towards to highdimensional inference with local regularity, and certainly, it will revive interest of learning semiparametric inference from our students.

I wrote a short blog regarding causal inference and I know the same concept has been discussed by other before. I am not trying to change any existing causality framework but just think it may be useful to accelerate drug development, given the difficulty of conducting a perfect randomization study but with the increasing knowledge of patient's information in the Big Data era.
Selected Publications
1. D. Zeng, Estimating Marginal Survival Function by Adjusting for
Dependent Censoring Using Many Covariates, Annals of Statistics, 2004.
2.
D. Zeng, Likelihood Approach for Marginal Proportional Hazards Regression
in the Presence of Dependent Censoring, Annals of Statistics, 2005.
3.
D. Zeng and J. Cai, Asymptotic Results for Maximum Likelihood Estimators
in Joint Analysis of Repeated Measurements and Survival Time, Annals of Statistics, 2005.
4.
D. Zeng, D.Y. Lin and G. Yin, Maximum Likelihood Estimation in the
Proportional Odds Model with Random Effects, Journal of the American Statistical Association, 2005.
5.
D. Zeng, G. Yin and J. Ibrahim, Inference for a Class of Transformed Hazards Model,
Journal of the American Statistical Association, 2005.
6.
D.Y. Lin and D. Zeng, LikelihoodBased
Inference on Haplotype Effects in Genetic Association (with
discussions),
Journal of the American Statistical Association, 2006.
7.
D. Zeng, G. Yin and J. Ibrahim, Semiparametric Transformation Models for Survival Data
with a Cure Fraction,
Journal of the American Statistical Association, 2006.
8.
D. Zeng and D.Y. Lin, Maximum likelihood
estimation in semiparametric
transformation models for counting processes
, Biometrika, 2006 and its technical report.
9.
D. Zeng and D.Y. Lin, Semiparametric Transformation
Models With Random
Effects for Recurrent Events,
Journal of the American Statistical Association, 2007.
10.
D. Zeng and D.Y. Lin,
Maximum Likelihood Estimation
in Semiparametric Models with Censored Data (with discussion).,
Journal of the Royal Statistical Society B, 2007.
11.
D. Zeng and D.Y. Lin,
Efficient Estimation in the Accelerated Failure
Time Model,
Journal of the American Statistical Association, 2007.
12.
Q. Chen, D. Zeng and J. Ibrahim, Sieve Maximum Likelihood
Estimation for Regression Models with
Covariates Missing at Random, Journal of the American Statistical
Assoication, 2007.
13.
B. Johnson, D. Y. Lin, and D. Zeng, Penalized Estimating
Functions and Variable Selection in Semiparametric Regression
Models, Journal of the American Statistical
Assoication, 2008.
14.
G. Yin, H. Li, and D. Zeng, Partially Linear
Additive Hazards Regression with Varying Coefficients, Journal of the
American Statistical
Association, 2008.
15. D. Zeng, Q. Chen and J. Ibrahim, GammaFrailty Transformation Models for Multivariate
Survival Times, Biometrika, 2009.
16. D. Zeng and D.Y. Lin, A Generalized Asymptotic
Theory for Maximum Likelihood Estimation in Semiparametric Regression Models
with Censored Data, Statistica Sinica, 2009.
17. D. Zeng and J. Cai Semiparametric Additive Rate Model for
Recurrent Events with Informative Terminal Event, Biometrika,
2010.
18. D.Y. Lin and D. Zeng On The Relative Efficiency of Using Summary
Statistics versus Individual Level Data in MetaAnalysis
, Biometrika, 2010.
19. Y. Wang, T. Chen and D. ZengSupport Vector Hazards Machine: A Counting Process Framework for Learning Risk Scores for Censored Outcomes, Journal of Machine Learning Research, 2016.
20. Y. Wang, P. Wu, Y. Liu, C., C. Weng and D. Zeng Learning Optimal Individualized Treatment Rules form Electronic Health Record Data, IEEE International Conference on Healthcare Informatics: ICHI 2016 proceedings, 2016.