Arun Das, PhD, MSc, BS

Academic Title:

Research Fellow

Primary Appointment:

Institute for Genome Sciences

Education and Training

Ph.D., Computer Science, Johns Hopkins University, Baltimore, MD (2025)

M.Sc., Computer Science, Johns Hopkins University, Baltimore, MD (2021)

Sc.B., Computational Biology and Computer Science, Brown University, Providence, RI (2018)

Biosketch

Dr. Arun Das develops algorithms that improve and accelerate analyses of large genomic datasets, with a focus on making large genomic data more accessible, understanding how particular groups are represented in existing genomic data, and developing novel approaches to analyzing new and existing data.

His past work includes a first-of-its-kind approach to accelerating search in large genomic data structures using auxiliary machine learning models, lower-overhead sketching and approximation-based approaches to read classification in metagenomic experiments, and investigations into the hidden variation in South Asian individuals and populations through the analysis of unmapped reads against linear and pangenome references. He has also developed tools for visualizing relatedness in genomic datasets, reviewed the state and evolution of linear and pangenome references, and contributed to the HG002 Q100 effort to create the first complete human diploid genome.

He currently works on algorithmic approaches for cancer genomics, seeking to tie together insights from spatial transcriptomics, single-cell genomics, and next-generation sequencing to better understand the cancer landscape.

Research/Clinical Keywords

Algorithms, algorithms for genomic analysis, machine learning, sketching, data analysis

Highlighted Publications

Kirsche, Melanie, Arun Das, and Michael C. Schatz. "Sapling: accelerating suffix array queries with learned data models." Bioinformatics 37, no. 6 (2021): 744-749.

Das, Arun, and Michael C. Schatz. "Sketching and sampling approaches for fast and accurate long read classification." BMC Bioinformatics 23, no. 1 (2022): 452.

Das, Arun, Arjun Biddanda, Rajiv C. McCoy, and Michael C. Schatz. "Assembling unmapped reads reveals hidden variation in South Asian genomes." bioRxiv (2025): 2025-05.

Taylor, Dylan J., Jordan M. Eizenga, Qiuhui Li, Arun Das, Katharine M. Jenike, Eimear E. Kenny, Karen H. Miga et al. "Beyond the human genome project: the age of complete human genome sequences and pangenome references." Annual Review of Genomics and Human Genetics 25 (2024).

Hansen, Nancy F., Nathan Dwarshuis, Hyun Joo Ji, Arang Rhie, Hailey Loucks, Glennis A. Logsdon, Mitchell R. Vollger et al. "A complete diploid human genome benchmark for personalized genomics." bioRxiv (2025): 2025-09.

Additional Publication Citations

For a full list of publications, please see my Google Scholar page: https://scholar.google.com/citations?user=OxAcXusAAAAJ&hl=en 

Research Interests

My work focuses on developing algorithmic approaches to improve genomic analysis. This may involve discovering novel methods to search, index, or analyze existing data, developing integrated approaches that consider multiple data sources or analyses simultaneously, or identifying the key aspects of a dataset to allow more focused, lower-overhead analysis.

Links of Interest

Please find my personal website here: https://arundas.org/