Multimodal analysis substantially improves our ability to resolve cell states, allowing us to identify and validate previously unreported lymphoid subpopulations. Multimodal single-cell technologies, which simultaneously profile multiple data types in the same cell, represent a new frontier for the discovery and characterization of cell states. While PCA will identify the directions that explain maximal variance in the source data, sPCA can help pinpoint sources of variation that are of the greatest interest. In Stuart et al. (B-D) UMAP visualization of 161,764 10x 3′ cells analyzed based on RNA data (B), protein data (C), or WNN analysis (D). Visualization methods for integrated multimodal single-cell data are still underdeveloped. (C) For each of our 57 clusters, we calculated the optimal surface marker enrichment panels based on our CITE-seq data. Moreover, these identified biomarkers were invariant across human volunteers and vaccination time points. Broad immune activation underlies shared set point signatures for vaccine responsiveness in healthy individuals and disease activity in patients with lupus. We observe only minor fluctuations when varying k within this range. Because the RNA neighbors represent a mixture of different T cell subsets, there is substantial error between predicted and measured protein expression levels for CD4 and CD8. Therefore, they subtract local connectivity from cellular distances when computing the exponential kernel. Production of IL-17 by MAIT cells is increased in multiple sclerosis and is associated with IL-7 receptor expression. (E) WNN analysis exhibits improved runtimes compared to competing methods. The WNN procedure begins by applying standard analytical workflows to each modality independently and constructing KNN graphs for each one. Cell 184, 3573-3587.e29 (2021).
Integrated analysis of multimodal single-cell data. Korsunsky I., Nathan A., Millard N., Raychaudhuri S. Presto scales Wilcoxon and auROC analyses to millions of observations. Samples were washed, resuspended in 2% PFA containing iridium intercalator (Fluidigm), and stored at 4°C until acquisition (within 3 days of staining). We encourage users to compute both to understand how their dataset can be interpreted in light of a reference, and also to flag any particular populations that may not be well represented. Tissue-Resident Memory CD8. Alignment and expression quantification: We applied standard pipelines to initially align and quantify the CITE-seq datasets newly generated for this manuscript. (2020) scRNA-seq dataset, which includes 44,721 PBMC from patients hospitalized with COVID-19 and healthy controls. The p value is computed using an unpaired Wilcoxon test. These ideas are a generalization of the methods described for two modalities, with a full mathematical description below for clarity. Suppose the single-cell dataset has M modalities; we define the following: We next calculate the within- and cross-modality affinities, $\theta_{m,m}(X_i^m, \hat{X}_{i,\mathrm{knn}_m}^m)$ and $\theta_{m,n}(X_i^m, \hat{X}_{i,\mathrm{knn}_n}^m)$, between the within- and cross-modality predicted values for each cell, $\hat{X}_{i,\mathrm{knn}_m}^m$ and $\hat{X}_{i,\mathrm{knn}_n}^m$, and the actual values $X_i^m$. Only CD14+ and CD16+ monocytes are shown. 3P and 5P libraries were then pooled together in equal amounts and sequenced on an Illumina NovaSeq S4 flowcell.
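The comparison of within- and cross-modality predictions described above can be illustrated with a small sketch. It is not the published implementation: the kernel in `affinity`, the `bandwidth` and `eps` parameters, the averaging of cross-modality affinities when there are more than two modalities, and the softmax used to turn ratios into weights are all assumptions made for illustration, and every function name is hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighbor_mean(profiles, neighbor_ids):
    """Predict a cell's profile as the average profile of the given neighbors."""
    dim = len(profiles[0])
    return [sum(profiles[j][d] for j in neighbor_ids) / len(neighbor_ids)
            for d in range(dim)]


def affinity(actual, predicted, bandwidth=1.0):
    """Assumed kernel: 1.0 for a perfect prediction, decaying with the error."""
    return math.exp(-euclidean(actual, predicted) / bandwidth)


def modality_weights(cell, profiles, knn, eps=1e-4):
    """Return per-modality weights for one cell.

    profiles: modality name -> list of per-cell profiles
    knn:      modality name -> list of neighbor-id lists (one per cell)
    """
    if len(profiles) < 2:
        raise ValueError("at least two modalities are required")
    scores = {}
    for m in profiles:
        actual = profiles[m][cell]
        # Within-modality: predict modality m from neighbors found in modality m.
        within = affinity(actual, neighbor_mean(profiles[m], knn[m][cell]))
        # Cross-modality: predict modality m from neighbors found in modality n.
        cross = [affinity(actual, neighbor_mean(profiles[m], knn[n][cell]))
                 for n in profiles if n != m]
        scores[m] = within / (sum(cross) / len(cross) + eps)
    # Numerically stable softmax over the ratio scores (an assumption).
    top = max(scores.values())
    exps = {m: math.exp(s - top) for m, s in scores.items()}
    total = sum(exps.values())
    return {m: e / total for m, e in exps.items()}


# Toy data: protein separates cells {0, 1} from {2, 3}; RNA neighbors mix the groups.
profiles = {
    "rna": [[1.0], [2.0], [1.1], [2.1]],
    "protein": [[0.0], [0.1], [5.0], [5.1]],
}
knn = {
    "rna": [[2], [3], [0], [1]],
    "protein": [[1], [0], [3], [2]],
}
weights = modality_weights(0, profiles, knn)
print(weights)
```

In this toy example the RNA neighbors of cell 0 predict its protein level poorly, so the protein modality receives nearly all of the weight for that cell.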
1 Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA, 2 New York Genome Center, New York, NY 10013, USA, 3 Technology Innovation Lab, New York Genome Center, New York, NY 10013, USA, 4 Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, 5 Cape Town HVTN Immunology Lab, Hutchinson Cancer Research Institute of South Africa, Cape Town 8001, South Africa, 6 Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA, 7 Center for Data Visualization, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, 8 BioLegend Inc., San Diego, CA 92121, USA, 9 Chan Zuckerberg Biohub, San Francisco, CA 94063, USA. We sorted cells based on the marker panels identified in (C), and performed bulk RNA-seq. We reasoned that we could quantify the performance of the different methods by comparing the similarity of each cell's molecular state to its closest neighbors in the integrated latent space. In this manuscript we analyze data falling into three categories: measurements of single-cell gene expression, single-cell surface protein expression, and single-cell chromatin accessibility (ATAC-seq). Stoeckius M., Zheng S., Houck-Loomis B., Hao S., Yeung B.Z., Mauck W.M., 3rd, Smibert P., Satija R. Cell Hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics. UMAP visualizations of RNA and ATAC-seq data, as well as integrated WNN analysis. As recommended in the MOFA+ tutorial (https://raw.githack.com/bioFAM/MOFA2_tutorials/master/R_tutorials/10x_scRNA_scATAC.html), we used the z-scored data (scaled data) from the two assays as view1 and view2 for MOFA+. For both the 10x v3 (3′ scRNA-seq) and 10x Immune Profiling Solution (5′ scRNA-seq), we used Cell Ranger 3.1.0 to align reads to the GRCh38 human genome with default settings.
However, joint analysis of two modalities without properly handling the noise often leads to overfitting of one modality by the other and worse clustering results than vanilla single-modality (L) chromVar motif activity scores for the p53 and CTCF motifs for all basal subpopulations. Characterization of directed differentiation by high-throughput single-cell RNA-Seq. We quantified the performance of the method using the correlation (Pearson, Figure 2D; Spearman, Figure S2) between predicted and measured values. Additional heatmaps are shown in Figure S4. Briefly, we perform within-modality comparisons for each modality, and extend the concept of cross-modality predictions to all pairwise combinations of modalities. Cells are labeled by their annotations from (Ma et al., 2020b). See also Figure S7. Here we introduce bridge integration, a method to integrate single-cell datasets across modalities using a multiomic dataset as a molecular bridge. For these analyses, we used a more recently generated CITE-seq dataset of human bone marrow, representing 30,672 mononuclear cells with a panel of 25 antibodies. Heatmap displays pseudobulk averages where cells are grouped by cell type, human donor, and technical replicate, and demonstrates that markers are repeatedly detected across samples and replicates. The scope of this package is to provide efficient access to a selection of curated, pre-integrated, publicly available landmark datasets for Keywords: single cell genomics, multimodal analysis, CITE-seq, immune system, T cell, reference mapping, COVID-19. We first apply a procedure known as supervised principal component analysis (sPCA) (Barshan et al., 2011) to the transcriptome measurements in our reference dataset.
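The benchmark described above, correlating predicted with measured values, can be sketched in a few lines. This is an illustrative sketch only: predicting a cell's protein level as the mean over its neighbors is an assumption about the setup, the toy values are invented, and the helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def predict_from_neighbors(values, neighbors):
    """Predict each cell's value as the mean value of its neighbors."""
    return [sum(values[j] for j in nbrs) / len(nbrs) for nbrs in neighbors]


# Toy protein levels for four cells: two low, two high.
measured = [0.0, 0.2, 5.0, 5.2]
good_graph = [[1], [0], [3], [2]]  # neighbors share the cell's state
poor_graph = [[2], [3], [0], [1]]  # neighbors come from the other state

good_r = pearson(measured, predict_from_neighbors(measured, good_graph))
poor_r = pearson(measured, predict_from_neighbors(measured, poor_graph))
print(good_r, poor_r)
```

A neighbor graph that groups cells of the same state yields predictions that correlate strongly with the measurements, while a graph that mixes states does not.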
While the full power of Cobolt is to integrate together data from single and multiple modality datasets, in its simplest form, Cobolt can be used for the analysis of data from solely a multi-modality technology. Formally, sPCA transforms the dataset to maximize the dependency with the response variable. Notably, we observed consistent trends when restricting our analysis only to individuals with either positive or negative CMV T cell responses (Figure S5). Cells are colored by their predicted level-2 annotations. The simultaneous measurement of multiple modalities, known as multimodal analysis, represents an exciting frontier for single-cell genomics and necessitates new computational methods that can define cellular states based on multiple data types. In Figures 5G, 5H, and S5, we identify genes whose expression level is correlated with a cell's position along a molecular gradient defined by a single protein. In Figure 4C we visualize the level of enrichment for each cluster based on panels of one to ten markers. Antibodies were pooled together and concentrated in a 50 kDa Amicon filter as per manufacturer's instructions in PBS. Essentially, in cases where two methods disagree based on an RNA classification, we attempt to classify the cell based on its protein levels to see if there is strong evidence for one annotation versus another. We plot a representative subset of these features in Figure 5G. In other contexts (particularly when important cell type markers are missing or not previously known), the unsupervised nature of a cell's transcriptome may be the most valuable. (J) Pseudobulk expression profiles of the Basal_4 and Basal_1 subpopulations demonstrate that the two groups exhibit similar transcriptomic profiles. R Foundation for Statistical Computing; 2013.
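To make the contrast between PCA and sPCA concrete, here is a minimal sketch of the simplest supervised case. It assumes a single numeric response vector and a linear kernel on that response; under those assumptions the leading supervised direction is proportional to the centered features multiplied by the centered response. The response used in the source analysis is not specified here, so `supervised_direction` and the toy data are purely illustrative.

```python
import math


def supervised_direction(X, y):
    """Leading supervised direction for one numeric response.

    X: list of cells, each a list of feature values
    y: one response value per cell
    Assumes a linear kernel on y, so the direction is the normalized
    covariance of each (centered) feature with the (centered) response.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    direction = [
        sum((X[i][j] - col_means[j]) * (y[i] - y_mean) for i in range(n))
        for j in range(p)
    ]
    norm = math.sqrt(sum(v * v for v in direction))
    if norm == 0:
        raise ValueError("no feature covaries with the response")
    return [v / norm for v in direction]


# Feature 0 has large variance but ignores the response;
# feature 1 has small variance but tracks the response.
X = [[10.0, 0.0], [-10.0, 0.1], [10.0, 1.0], [-10.0, 1.1]]
y = [0.0, 0.0, 1.0, 1.0]
direction = supervised_direction(X, y)
print(direction)
```

Unsupervised PCA on the same matrix would load almost entirely on feature 0 because it dominates the variance, whereas the supervised direction loads on feature 1, the feature that depends on the response.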
While the samples contained cells across the full spectrum of hematopoietic differentiation, the antibody panel was designed to separate groups of terminally differentiated cells. (2020) after reference-mapping. By contrast, protein neighbors were predominantly correctly identified as CD8+ T cells (in the protein KNN graph, 12 CD8+/CD4+ edges were identified). Our results indicate that this phenotype does not represent a strictly binary phenomenon and may not be specific to CMV response. Last, we note that the modality weights learned in our procedure serve not only as a proxy for the technical quality of a measurement type, but may also reflect the biological importance of each modality in determining cellular identity. We validated the presence of the populations in independent healthy PBMC samples by performing flow cytometry for the same markers (Figures 5C and 5D). We verified the accuracy of our predictions using the query protein data, which was held out of the reference mapping procedure, yet revealed expression patterns based on our predicted annotations that were fully concordant with our reference dataset. We obtained poor results with the default nb loss function, and as suggested in the tutorial, tried the sse loss function as an alternative. Features include the best RNA and protein features identified by differential expression. The integrated latent space defined by WNN most accurately reconstructs expression levels for all 25 proteins. Platelets are included as a positive control, as CD69 is constitutively expressed on these cells. Within CD8+ memory T cells, we identified distinct subpopulations defined by bimodal and mutually exclusive expression of the integrin proteins CD49a and CD103 (Figure 5A).
For example, cells annotated by Seurat as Treg express CD25 protein, while cells annotated by scArches as Treg do not. (J) Gene dropout curve for neighbors of regulatory T cells defined by RNA, ADT, and WNN analysis. The pool of cells was then washed 3 times in staining buffer and filtered using a 40 μm Flowmi filter in PBS. (D) Pathway enrichment (enrichR) of the top DE genes between day 0 and day 3 myeloid cells exhibits a clear enrichment for components of the interferon response. Additional benchmarking analyses in Figure S2. We did not observe significant changes during the time course in overall abundance of broad immune classes (Figure 6E); thus, we focused on identifying more subtle compositional changes. Willing A., Jäger J., Reinhardt S., Kursawe N., Friese M.A. (H) Same as in (F) but for B cell states. In addition to constructing a multimodal reference, we demonstrate the ability to map scRNA-seq data onto this dataset. (I) NK cells are ordered by their quantitative CD16 protein expression. Eight participants in this trial were selected for single cell analysis from Group 1 (no IL-12) and Group 3 (1000 mcg IL-12) based on sample availability. Our 57 clusters fall into subsets of these categories (i.e., CD8+ TCM_1, CD8+ TCM_2, etc.). (H) Same as in (G) but for progenitor cell states. In addition, scMM learns underlying relationships across modalities, enabling crossmodal generation of single-cell data. To accomplish this, we use the umap_transform functionality implemented in the R uwot package, which enables new points to be added to an existing embedding. Gayoso A., Lopez R., Steier Z., Regier J., Streets A., Yosef N. A Joint Model of RNA Expression and Surface Protein Abundance in Single Cells.
B cell states are subdivided by their mutually exclusive expression of kappa or lambda light chain, with distinguishing markers including IGKC, IGLC3, IGLC3. We next further explored the performance of our WNN integration, assessed its robustness to fluctuations in data quality, and performed benchmarking against other recently developed methods. In our final annotations, we considered 57 total clusters. We calculate the analogous ratio for protein affinities. We also detect rare populations of invariant NKT cells (defined by the use of TRAV10.TRAJ18). Integrated analysis of multimodal single-cell data. Highlights: Weighted nearest neighbor analysis integrates multimodal single-cell data. A multimodal reference atlas of the circulating human immune system. Identification and validation of novel sources of lymphoid heterogeneity. We next benchmarked WNN analysis against two recently introduced methods for multimodal integration: multi-omics factor analysis v2 (MOFA+) (Argelaguet et al., 2020), which uses a statistical framework based on factor analysis, and totalVI (Gayoso et al., 2019), which combines deep neural networks with a hierarchical Bayesian model. In 73.8% of cases, we observe stronger support for the Seurat annotation. We leverage this resource to characterize extensive lymphoid heterogeneity that has not been previously observed by scRNA-seq alone, including the heterogeneous expression of integrin proteins on circulating memory T cells, a gradient of adaptive-like responses in NK cells, and tightly clustered clonal populations within effector and cytotoxic groups. Hafemeister C., Satija R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Pott S. Simultaneous measurement of chromatin accessibility, DNA methylation, and nucleosome phasing in single cells. van Dijk D., Sharma R., Nainys J., Yim K., Kathail P., Carr A.J., Burdziak C., Moon K.R., Chaffer C.L., Pattabiraman D.
Recovering Gene Interactions from Single-Cell Data Using Data Diffusion. Computational flow cytometry: helping to make sense of high-dimensional immunology data. We removed cells with < 500 detected genes, but also removed cells where we detected an aberrantly high number of features (more than 6,000 genes, more than 50,000 ADT reads, or more than 10,000 ADT reads), particularly to avoid clumps of antibodies that can occasionally attach to cells. We repeated this analysis for all protein features, and found that WNN analysis consistently achieved the highest correlation. As each per-gene test is highly sensitive to the number of cells, we also calculated a perturbation score, which reflects the strength of the molecular response based on the whole transcriptome (color of each dot). Importantly, these strategies must be robust to potentially large differences in the data quality and information content for each modality. S.H., E.A.-N., W.M.M., M.J.L., A.J.W., M.S., E.P., E.P.M., L.B.F., B.Y., and A.J.R. For each cell, we consider the set $\{\mathrm{knn}_{1,i,1}, \ldots, \mathrm{knn}_{1,i,200}, \mathrm{knn}_{2,i,1}, \ldots, \mathrm{knn}_{2,i,200}, \ldots, \mathrm{knn}_{M,i,1}, \ldots, \mathrm{knn}_{M,i,200}\}$ and identify the k-most similar cells within this set based on the weighted similarity metric as weighted nearest neighbors. Moreover, CyTOF analysis of the larger sample set identified a depletion of MAIT cells in COVID-19 samples (Figures 7F and S7). Srivastava A., Malik L., Smith T., Sudbery I., Patro R. Alevin efficiently estimates accurate gene abundances from dscRNA-seq data.
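The neighbor selection step described above (pool the per-modality candidate neighbors, then keep the k cells with the highest weighted similarity) can be sketched as follows. This is an illustration under assumptions: the weighted similarity is taken to be a weight-scaled sum of per-modality similarities, the similarity values are toy numbers, and `weighted_nearest_neighbors` is a hypothetical name, not a function from any package.

```python
def weighted_nearest_neighbors(cell, candidates, similarity, weights, k):
    """Pick the k candidates most similar to `cell` under a weighted metric.

    candidates: modality -> list of candidate neighbor ids for this cell
    similarity: modality -> function(cell, other) returning a similarity
    weights:    modality -> weight of that modality for this cell
    """
    # Union of the candidate sets from every modality, excluding the cell itself.
    pool = set()
    for ids in candidates.values():
        pool.update(ids)
    pool.discard(cell)

    def weighted(other):
        # Assumed form of the weighted similarity: a weight-scaled sum.
        return sum(weights[m] * similarity[m](cell, other) for m in weights)

    # Highest weighted similarity first; ties broken by cell id for determinism.
    return sorted(pool, key=lambda other: (-weighted(other), other))[:k]


# Toy similarities of cell 0 to cells 1-3 in each modality.
rna_sim = {(0, 1): 0.9, (0, 2): 0.8, (0, 3): 0.1}
protein_sim = {(0, 1): 0.1, (0, 2): 0.7, (0, 3): 0.9}

neighbors = weighted_nearest_neighbors(
    cell=0,
    candidates={"rna": [1, 2], "protein": [2, 3]},
    similarity={
        "rna": lambda a, b: rna_sim[(a, b)],
        "protein": lambda a, b: protein_sim[(a, b)],
    },
    weights={"rna": 0.2, "protein": 0.8},
    k=2,
)
print(neighbors)
```

Because protein carries most of the weight for this cell, the selected neighbors are the ones that are close in protein space, even though cell 1 is the closest neighbor by RNA alone.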
(F) UMAP visualization of CITE-seq dataset of 49,147 PBMC analyzed with the 10x 5′ Immune Profiling kit, which also measures immune repertoires. Hensley T.R., Easter A.B., Gerdts S.E., De Rosa S.C., Heit A., McElrath M.J., Andersen-Nissen E. Enumeration of major peripheral blood leukocyte populations for multicenter clinical trials using a whole blood phenotyping assay.