- ⸻ 2026-04-23
How I managed thousands of datasets to build the scPRINT family of scRNA-seq foundation models
At the start of my PhD, I was faced with what seemed like a mountain to climb: build, largely alone, a foundation model for single-cell RNA-seq data. As anyone in the field knows, building the model is not the hard part. Getting the data is.
- ⸻ 2024-04-03
MappedCollection: Weighted random sampling from large collections of scRNA-seq datasets
A few labs and companies now train models on large-scale scRNA-seq count matrices and related data modalities. But unlike for many other data types, there isn’t yet a playbook for working with collections too large to fit into memory.