
Chixiang Chen, PhD

Academic Title:

Assistant Professor

Primary Appointment:

Epidemiology & Public Health

Secondary Appointment(s):


Education and Training

Postdoctoral researcher, University of Pennsylvania, 2021.

Ph.D. in Biostatistics, Pennsylvania State University, 2020.


Dr. Chixiang Chen is an Assistant Professor in Biostatistics who joined the Division of Biostatistics and Bioinformatics, Department of Epidemiology and Public Health, University of Maryland School of Medicine in 2021. Dr. Chen has worked across both theoretical and applied areas of statistics and has extensive interdisciplinary collaborations, including clinical trial design and analysis in neuroscience, Medicare claims data, electronic health records, imaging data analysis, and oncology research, among others. This research has resulted in more than 15 first-author or senior-author peer-reviewed publications (more than 55 publications in total).

Dr. Chen has been devoted to advancing statistical methods for large-scale observational studies and real-world data (e.g., Medicare claims, UKB), encompassing diverse areas including causal inference, data integration, unsupervised clustering, machine learning, single-cell data analysis, longitudinal data analysis, missing data analysis, and biological age, among others. Applications of these methods aim to better depict patient-level heterogeneity in treatment effects, identify high-risk sub-cohorts prone to poor health outcomes, and predict health outcomes to promote early intervention.

Please find details from my Personal Website:

Research/Clinical Keywords

Statistical Methods: Data/information integration; Causal inference and machine learning; Trajectory analysis; Federated learning; Cell-type deconvolution; Single-cell data analysis.

Biomedical Studies: Clinical studies in neuroscience; Biological aging; Alzheimer's disease; Post-fracture recovery; Single-cell transcriptomics.

Highlighted Publications

Selected Publications in Biostatistics and Bioinformatics (* corresponding author):

  • Chen, C.*, Wang, M., and Chen, S. (2023). An efficient data integration scheme to synthesize information from multiple secondary outcomes to the main data analysis. Biometrics. 79(4):2947–2960.
  • Chen, C.*, Leung, Y., Ionita, M., Wang, L.-S., and Li, M. (2022). Omnibus and Robust Deconvolution Scheme of Bulk RNA Sequencing Data via Integrating Multiple Single-cell reference sets and Prior Biological Knowledge. Bioinformatics. 38(19), 4530–4536.

Selected Publications in interdisciplinary science:

  • Carney, C. P., Pandey, N., Kapur, A., Saadi, H., Ong, H. L., Chen, C., ... & Kim, A. J. (2023). Impact of Targeting Moiety Type and Protein Corona Formation on the Uptake of Fn14-Targeted Nanoparticles by Cancer Cells. ACS Nano, 17(20), 19667–19684.


Research Interests

Dr. Chen has broad interests in both theoretical methodology development and statistical applications in multidisciplinary areas. Recently, he has been especially interested in developing robust statistical frameworks that integrate information from multiple data sources, with applications in both clinical and genomics studies. Many ongoing projects under the umbrella of this framework involve survival analysis, causal inference, and single-cell data analysis, among others.

  1. REAL-WORLD DATA/INFORMATION INTEGRATION: In the era of big data, it is imperative to have methods that can incorporate real-world information to improve the statistical power and reproducibility of results in a local primary study. One data integration scheme we have developed is vertical integration (VI), which borrows information from numerous secondary variables within one study, such as longitudinal records of biomarkers and surrogate outcomes, for the analysis of the main trait. In practice, these measurements are often relegated to secondary analyses and so do not directly inform the principal analysis. We proposed an efficient and robust scheme for borrowing information from multiple secondary outcomes, named MinBo, which is shown to be robust to mis-specification of any secondary model and can substantially improve statistical power for the primary endpoint analysis. MinBo is particularly suited to clinical trials and epidemiological studies where multiple outcomes are of interest, including the work proposed in this application. In addition to VI, we are currently developing a horizontal integration (HI) scheme that synthesizes summary information from external studies. The HI framework can be flexibly adapted to many promising applications, such as causal inference, imaging studies, and multiomics studies.
  2. STATISTICAL TOOLS IN OMICS DATA ANALYSIS AND AGING: Cell-type deconvolution in bulk tissue RNA sequencing (RNA-seq) data is essential to understanding how cell-type composition varies between aging-related conditions. Owing to recent advances in single-cell RNA sequencing (scRNA-seq) and the availability of large amounts of bulk RNA-seq data in disease-relevant tissues, increasingly sophisticated deconvolution methods have been developed. However, the performance of existing methods relies heavily on the quality of information provided by external data, such as the choice of scRNA-seq reference data and prior biological information. A motivation for developing new methods is to make maximal use of publicly available data to perform more effective and robust cell-type deconvolution. Specifically, I have developed a unified method, named InteRD, that integrates information from marker genes, scRNA-seq data, and prior biological knowledge from other studies to better deconvolve cell-type information. Many critical downstream analyses have been based on the deconvolved cell-type proportions, including association analyses with the HbA1c marker and cognitive conditions, and these proportions can also be used to recover biological age. This methodologic expertise will allow me to contribute maximally to the proposed research, leveraging several different analytic approaches to conduct sophisticated and novel longitudinal analyses of multiple linked data types and sources.

  3. Both measured and unmeasured confounding and limited sample size are significant challenges for epidemiological and biological investigations in observational studies. Confounding can lead to biased inference, while limited sample size raises concerns of insufficient power and consequently reduced reliability of findings. Because of these limitations, the findings of many observational studies may become questionable under scrutiny. I have developed tools to address these issues in various contexts and using different data sources. These include a new ensemble-learning-assisted causal inference framework in brain aging studies, unsupervised-learning-assisted semiparametric estimation in rehabilitation studies using Medicare claims, robust marginal structural models in the study of Alzheimer's disease, and high-dimensional mediation analysis in multi-modal imaging studies, among others. These advanced causal inference methods are essential for the proposed work, which is highly innovative and significant in its clinical and methodologic arenas.

  4. NETWORK DETECTION AND FEATURE SELECTION: One aspect we are working on is a statistical framework to recover dynamic networks from steady-state data. Collecting temporal or perturbed data is ordinarily a prerequisite for reconstructing dynamic networks. However, these data types are seldom available for genomic studies in medicine, significantly limiting the use of dynamic networks to characterize the biological principles underlying human health and disease. Our proposed framework, named DRDNet, makes reconstruction possible from steady-state data by introducing an agent and incorporating a varying coefficient model with ordinary differential equations. Multiple networks can be inferred corresponding to covariate effects that are linked to known or latent agents, such as disease risk. This high-dimensional feature selection framework can be leveraged in a wide range of theoretical and practical applications across disparate fields and disciplines. Our team has developed and released multiple R packages in these fields to facilitate broad dissemination and sharing of our expertise.
  5. COLLABORATION IN SCIENCE: Besides statistical methodology development, I have worked on multiple projects with collaborators in various disciplines, including gerontology, cardiology, biochemistry, and neuroscience. I have experience in data management and preprocessing (which allows me to supervise new analysts and mentor students who oversee this aspect of the research), modeling and statistical analysis, result evaluation and interpretation, and dissemination of study findings in peer-reviewed manuscripts, presentations, and grant applications. I am thrilled to be a collaborative biostatistician contributing to innovation and discovery in public health, clinical research, and other scientific fields.


Awards and Affiliations

RCCN Travel Awards for Early Career Investigators, RCCN. Dec. 2023.

Flexible, High Value Early Career Faculty Award, OAIC CC. Nov. 2023.

Dean's Award for Scholarly Achievement, College of Medicine, Penn State Univ. May 2021.

Alumni Society Award, College of Medicine, Penn State Univ. Sep. 2019.

JSM Student Paper Award, ASA Nonparametric Section. Jul. 2019.

Scholarship Award, the 24th Summer Institute in Statistical Genetics, Univ. of Washington. Jul. 2019.

Biopharm-Deming Student Scholar Award, ASA Biopharmaceutical Section and the 74th Annual Deming Conference on Applied Statistics. Dec. 2018.

Student Paper Award, ICSA Applied Statistics Symposium. Jun. 2018.

Grants and Contracts

Selected Grants:

  • Funded (01/2023-01/2028) (Co-Inv; PI-Graeme Woodworth) CA269995 "Nanotherapeutic enhancement of interstitial thermal therapy for glioblastoma" R01, NIH/NCI.
  • Funded (09/2021-05/2026) (Co-Inv; PI-Michelle Shardell) AG069915 "Methods to Test Lifestyle, Vaginal Microenvironment, and Genitourinary Symptoms across Menopause Transition" R01, NIH/NIA.
  • Funded (03/2024-02/2028) (Co-Inv; PI-Rozalina McCoy) DK135515 "Comparative Effectiveness and Safety of Metabolic/Bariatric Surgery, GLP-1, and SGLT-2 Medications for Patients with Obesity and Type 2 Diabetes" R01, NIH/NIDDK.
  • Funded (07/2022-06/2024) (Co-Inv; PI-Daniel Harrison) MS210103 "Adaptive Optics Retinal Imaging in Multiple Sclerosis" CDMRP, DOD.
  • Funded (06/2022-05/2025) (Co-Inv; PI-Daniel Harrison) RG-2110-38460 "Development of a Convolutional Neural Network for MRI Prediction of Progression and Treatment Response in Progressive Forms of Multiple Sclerosis" National Multiple Sclerosis Society.

Community Service

Conference Service:

Editor, ICSA Bulletin. Jan. 2024-Present

Session Chair, ENAR 2024 Spring Meeting. Mar. 2024

Session Chair, 2023 JSM. Aug. 2023

Editorial Assistant, ICSA Bulletin 2021 Issue. Dec. 2020-Jan. 2023

Session Chair, ENAR 2021 Spring Meeting, Online. Mar. 2021

Session Chair, ENAR 2019 Spring Meeting, Philadelphia, PA. Mar. 2019

Organizer for Biostatistics Student Seminar, Penn State Univ. Sep. 2018-May 2020

Chair, Hershey Chinese Students and Scholars Association. Jan. 2018-Sep. 2019


Paper Review Service:

Bioinformatics, Biometrics, Annals of Applied Statistics, Statistics in Medicine, Journal of Biopharmaceutical Statistics, Biometrical Journal, BMC Medical Research Methodology, European Journal of Neuroscience, Computers in Biology and Medicine, among others.

Professional Activity

  • Invited talk. The Effect of Alcohol Consumption on Brain Ageing: A New Causal Inference Framework for Incomplete and Massive Phenomic Data, National Cancer Institute, Bethesda, Mar. 2024.
  • Invited talk. A Joint Learning for Analyzing a National Geriatric Centralized Network: A New Toolbox Deciphering a Real-world Complexity, ENAR, Baltimore, Mar. 2024.
  • Invited talk. Robust Machine Learner for Mean Potential Outcome With Information Integration From Auxiliary Data. JSM, Toronto, Aug. 2023.
  • Invited talk. An Efficient Data Integration Scheme to Synthesize Information from Multiple Secondary Outcomes into the Main Data Analysis, SLAM, JHU, April 2022.
  • Invited talk. Robust Schemes for Incorporating Multiple Auxiliary Information in Biomedical Studies, BRIGHT, UMD, Nov. 2021.
  • Contributed paper presentation. A Generalized Weighting Scheme to Integrate Information from Multiple Auxiliary Records to the Main Study, ENAR Spring Meeting, Baltimore, MD. Mar. 2021.
  • Contributed paper presentation. Informative dynamic ODE-based-network learning (IDOL) from steady data, ENAR Spring Meeting, Nashville, TN. Mar. 2020.
  • Paper award presentation. A robust consistent information criterion for model selection based on empirical likelihood, JSM, Denver, CO. Jul. 2019.
  • Contributed paper presentation. A robust consistent information criterion for model selection based on empirical likelihood, ENAR Spring Meeting, Philadelphia, PA. Mar. 2019.
  • Scholar award poster. A robust consistent information criterion for model selection based on empirical likelihood, The 74th Annual Deming Conference, Atlantic City, NJ. Dec. 2018.
  • Paper award presentation. Empirical likelihood based criteria for model selection on marginal analysis of longitudinal data with dropout missingness, ICSA Applied Statistics Symposium, New Brunswick, NJ. Jun. 2018.
  • Contributed paper presentation. Empirical likelihood based criteria for model selection on marginal analysis of longitudinal data with dropout missingness, ENAR Spring Meeting, Atlanta, GA. Mar. 2018.