Brief Bio
I’m an Assistant Professor in the Department of Computer Science at Cornell University. I received my PhD in Computer Science from UC Berkeley. I work in the field of high-performance computing (HPC) for large-scale computational sciences and lead the Cornell HPC group. I’m interested in developing algorithms and software infrastructures on parallel machines to speed up data processing without sacrificing programming productivity, and to make high-performance computing more accessible. I’m a big fan of sparse linear algebra and believe in it as a computational abstraction for tackling large-scale computational challenges.
I received the 2024 SIAG/Supercomputing Early Career Prize, the 2023 ISSNAF Young Investigator Mario Gerla Award, and the 2020 SIGHPC Computational & Data Science Fellowship.
I’m an Affiliate Faculty in the Applied Math and Computational Sciences Division (Performance and Algorithms Group) at Lawrence Berkeley National Laboratory, and a Graduate Field Faculty member in the School of Electrical and Computer Engineering, the Department of Computational Biology, and the Center for Applied Math at Cornell.
I’m also a faculty member of the Computer Systems Laboratory (CSL) at Cornell University.
I’ll be recruiting PhD students for Fall 2026 to join my group at Cornell CS. Our research spans parallel computation, sparse linear algebra, programming systems, and algorithms for emerging hardware architectures, with projects centered on parallel scientific computation, including real-world large-scale challenges in computational biology and emerging hardware technologies. Prior biology knowledge is not required, but a background in parallel computing and C/C++ programming is highly encouraged.
The group fosters a collegial, collaborative culture in the beautiful natural surroundings of Ithaca, NY. You’ll likely enjoy Ithaca much more than you expect!
Due to limited time, I’m mostly unable to respond to individual email inquiries. If you’re interested, please apply to the Cornell CS PhD program and mention my name in your application materials, along with why you’re interested in working with me. Applications are submitted through the department.
Publications, Talks, Teaching
For a complete list of publications, talks, and teaching information, please see my CV (I’m fairly good at keeping it up to date) or my Google Scholar account. The PDFs of most of my articles can be found on arXiv.
If you’re interested in course or research talk slides, please feel free to email me. I’ll add them here eventually.
Recent Updates
4/24/2025 I’m very pleased to announce that our NSF proposal “ACED: Fast and Scalable Whole Genome Analysis on Emerging Hardware Technologies” was awarded. This project, conducted in collaboration with Professor April Wei’s Lab, will address major computational challenges in population genetics through parallel computation, sparse linear algebra, and new hardware technologies.
Selected Publication & Software
Popcorn: Accelerating Kernel K-means on GPU using Sparse Linear Algebra
Our PPoPP 2025 paper introduces a new sparse-matrix formulation of Kernel K-means that enables an efficient, high-performance GPU implementation. Our open-source tool, Popcorn, achieves up to 123.8× speedup over a CPU version and 2.6× over a dense GPU implementation.
Read the Popcorn Paper
Popcorn GitHub
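The core idea behind the sparse formulation can be sketched in a few lines: encode cluster membership as a sparse one-hot matrix, and the centroid-distance terms of kernel k-means become sparse–dense matrix products. This is a minimal serial illustration of that general idea, not Popcorn’s GPU implementation; all names here are mine.

```python
import numpy as np
from scipy.sparse import csr_array

def kernel_kmeans_assign(K, labels, k):
    """One assignment update of kernel k-means, phrased as matrix products
    with a sparse membership matrix. Assumes no cluster is empty."""
    n = K.shape[0]
    # Sparse one-hot membership matrix A (n x k): A[i, c] = 1 iff point i is in cluster c.
    A = csr_array((np.ones(n), (np.arange(n), labels)), shape=(n, k))
    sizes = A.sum(axis=0)                 # cluster sizes |c|
    KA = A.T @ K                          # (k x n): entry [c, i] = sum_{j in c} K[j, i]
    # Per-cluster self-similarity sum_{j,l in c} K[j, l], via A^T K A (K symmetric).
    intra = (A.T @ KA.T).diagonal()
    # Squared feature-space distance of each point to each centroid; the
    # K[i, i] term is constant per point, so it does not affect the argmin.
    dist = -2.0 * KA.T / sizes + intra / sizes**2
    return dist.argmin(axis=1)
```

The membership matrix has exactly one nonzero per row, so the products above touch far less data than a dense formulation would.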
GPU-Accelerated Distributed 2D SpGEMM
Our ICPE 2025 paper introduces a GPU-based distributed-memory SpGEMM implementation built on CombBLAS, achieving over 2× speedup compared to the CPU-only version and outperforming PETSc on large sparse matrices. A hybrid communication strategy dynamically selects a host- or device-level data path based on message size, reducing overhead and improving scalability across multi-GPU clusters.
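A message-size-based path selector of this kind can be sketched as below. The function name and the 64 KiB cutoff are purely illustrative assumptions, not the policy or threshold used in the paper; the general trade-off is that host staging tends to win on latency for small messages, while a direct device path (e.g. GPU-aware MPI) tends to win on bandwidth for large ones.

```python
# Illustrative cutoff only; a real implementation would tune this per system.
HOST_STAGING_THRESHOLD = 64 * 1024  # bytes

def pick_comm_path(msg_bytes: int, gpu_aware_mpi: bool) -> str:
    """Choose whether a message is staged through host memory or sent
    directly between device buffers."""
    if not gpu_aware_mpi:
        return "host"    # no device path available: always stage via host
    if msg_bytes < HOST_STAGING_THRESHOLD:
        return "host"    # small messages: host staging has lower latency
    return "device"      # large messages: device path has higher bandwidth
```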
HySortK: High-Performance Sorting-Based K-mer Counting
Our ICPP 2024 paper describes HySortK, a new distributed-memory k-mer counting tool for genomics pipelines. Using a sorting-based approach and a flexible hybrid-parallelism layer, HySortK significantly reduces memory overhead and improves scalability, achieving 2–10× speedup over a GPU baseline on 4–8 nodes and up to 2× speedup over leading CPU tools on 16 nodes, while reducing peak memory usage by approximately 30%.
Read the HySortK Paper
HySortK GitHub
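The serial core of any sorting-based counter is sort-then-scan: extract k-mers, sort them, and count runs of equal neighbors. The sketch below shows only that core idea; it is not HySortK’s distributed, memory-optimized implementation, and the function name is mine.

```python
def count_kmers_by_sorting(seq: str, k: int):
    """Count k-mers of a sequence by sorting and scanning equal runs."""
    # Extract all overlapping k-mers, then sort so duplicates are adjacent.
    kmers = sorted(seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = []
    run_start = 0
    # One linear scan: each maximal run of equal k-mers yields one count.
    for i in range(1, len(kmers) + 1):
        if i == len(kmers) or kmers[i] != kmers[run_start]:
            counts.append((kmers[run_start], i - run_start))
            run_start = i
    return counts
```

Compared with a hash table, the sorted intermediate has predictable memory use and partitions naturally across processes, which is what makes the approach attractive at scale.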
GPU-Accelerated Pangenome Graph Layout
Our SC24 paper introduces a GPU-optimized layout tool for pangenome graphs. On 24 human whole-chromosome pangenomes, our implementation achieves up to 57.3× speedup over a multithreaded CPU baseline, reducing layout times from hours to minutes while maintaining layout quality.
