Stefan G. Stark*, Joanna Ficek*, Francesco Locatello, Ximena Bonilla, Stéphane Chevrier, Franziska Singer, Tumor Profiler Consortium, Gunnar Rätsch**, Kjong-Van Lehmann**
* Equal contribution
** Correspondence to: Gunnar.Ratsch@ratschlab.org, Kjong.Lehmann@inf.ethz.ch
Motivation Recent technological advances have led
to an increase in the production and availability of single-cell data. The
ability to integrate a set of multi-technology measurements would allow the
identification of biologically or clinically meaningful observations through
the unification of the perspectives afforded by each technology. In most cases,
however, profiling technologies consume the cells they measure, and thus pairwise
correspondences between datasets are lost. Because single-cell datasets can
become very large, scalable algorithms are needed that can universally match
a measurement taken on a cell in one technology to its corresponding sibling
in another technology.
Results We propose Single-Cell data Integration via
Matching (SCIM), a scalable approach to recover such correspondences in two or
more technologies. SCIM assumes that cells share a common (low-dimensional)
underlying structure and that the underlying cell distribution is approximately
constant across technologies. It constructs a technology-invariant latent space
using an auto-encoder framework with an adversarial objective. Multi-modal
datasets are integrated by pairing cells across technologies using a bipartite
matching scheme that operates on the low-dimensional latent representations. We
evaluate SCIM on a simulated cellular branching process and show that the
cell-to-cell matches derived by SCIM pair cells that reflect the same
pseudotime. Moreover, we apply our method to two real-world scenarios, a
melanoma tumor sample and a human bone marrow sample, where we pair cells
from an scRNA dataset to their sibling cells in a CyTOF dataset, achieving
93% and 84% cell-matching accuracy, respectively.
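The matching step described above can be illustrated with a minimal sketch. It is not SCIM's implementation: it assumes the latent codes for the two technologies are already available as NumPy arrays, swaps in SciPy's generic assignment solver (scipy.optimize.linear_sum_assignment) for the paper's matching scheme, and scores matches by agreement of a hypothetical per-cell label. The names match_cells, matching_accuracy, latent_a and latent_b are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def match_cells(latent_a, latent_b):
    """Pair rows of latent_a with distinct rows of latent_b.

    Assumption: Euclidean distance in the shared latent space is the
    matching cost. The solver minimizes the total cost of the pairing.
    """
    cost = cdist(latent_a, latent_b, metric="euclidean")
    rows, cols = linear_sum_assignment(cost)
    return rows, cols


def matching_accuracy(rows, cols, labels_a, labels_b):
    """Fraction of matched pairs whose (hypothetical) labels agree."""
    return float(np.mean(labels_a[rows] == labels_b[cols]))


# Toy latent codes: two well-separated groups of cells per technology.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
labels_a = np.array([0, 0, 1, 1, 1])
labels_b = np.array([1, 0, 1, 0, 1])
latent_a = centers[labels_a] + 0.1 * rng.standard_normal((5, 2))
latent_b = centers[labels_b] + 0.1 * rng.standard_normal((5, 2))

rows, cols = match_cells(latent_a, latent_b)
accuracy = matching_accuracy(rows, cols, labels_a, labels_b)
print(list(zip(rows.tolist(), cols.tolist())), accuracy)
```

A dense cost matrix grows with the product of the two dataset sizes, so this sketch is only practical for small inputs; large datasets call for a more scalable matching strategy.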
Source code availability https://github.com/ratschlab/scim
Publications
Open access data [Manifest]