Revealing cell fate decisions during reprogramming by scRNA-seq

Single-cell RNA sequencing (scRNA-seq) technologies are powerful tools for dissecting cellular heterogeneity comprehensively. With the rapid development of scRNA-seq, many previously unresolved questions have been answered. Cell reprogramming converts somatic cells into pluripotent stem cells by means of specific transcription factors or small molecules. However, because reprogramming is a highly heterogeneous process, its underlying mechanism remains unclear in some respects. Analyzing cell fate conversion at the single-cell level with scRNA-seq is therefore of great value for understanding the mechanism of reprogramming. In this review, we introduce the methods of scRNA-seq and the generation of iPSCs by reprogramming, and summarize the main studies that have used scRNA-seq to reveal reprogramming mechanisms.


INTRODUCTION OF SINGLE-CELL RNA SEQUENCING TECHNOLOGIES
With the development of next-generation sequencing (NGS) technology, our understanding of genomics, transcriptomics, and epigenetics has deepened. RNA-seq provides a far more accurate measurement of the levels of transcripts and their isoforms using deep-sequencing technologies [1]. However, RNA-seq was originally applied to large populations of cells and could not distinguish differences among the individual cells in a population. In 2009, Tang et al described a method of single-cell RNA sequencing (scRNA-seq) using optimized NGS [2,3]. With the establishment of scRNA-seq, single-cell analyses allow researchers to uncover new and potentially unexpected biological discoveries relative to traditional profiling methods that assess bulk populations [4]. scRNA-seq protocols are usually divided into the following steps: 1) single-cell isolation, 2) RNA capture and transcriptome amplification, 3) sequencing library preparation, and 4) sequencing and data analysis [4].
An individual cell can be isolated by several approaches: limiting dilution [5,6], micromanipulation, laser capture microdissection (LCM) [7,8], fluorescence-activated cell sorting (FACS) [9], or microfluidic technology. Limiting dilution with a pipette is a commonly used approach to obtain single cells; it is low in cost but also low in efficiency because of the statistical distribution of cells. Micromanipulation is the classical method for isolating cells from embryos; however, it is time-consuming and low-throughput [10]. LCM uses a laser system to isolate cells from solid samples, which preserves the cells but is expensive. FACS is the most commonly used method for isolating single cells in a high-throughput manner. It is especially preferred when a particular cell population with specific cell surface markers needs to be analyzed [11]. Recently, microfluidic technology has become a popular strategy for single-cell isolation because it enables precise fluid control and requires little sample while achieving high-throughput analysis [12,13]. Widely used commercial platforms based on microfluidic systems, such as the Fluidigm C1 and the Chromium system from 10x Genomics, provide a convenient, high-throughput way to study rare cell types in a heterogeneous population.
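The "statistical distribution of cells" that limits the efficiency of limiting dilution can be illustrated with a short calculation. The sketch below assumes, as is common but not stated in the text above, that the number of cells landing in a well follows a Poisson distribution with mean `lam` cells per well; the function names are hypothetical and chosen only for this example.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k cells in a well when the mean is lam cells per well."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def well_occupancy(lam: float) -> dict:
    """Split wells into empty, single-cell, and multi-cell fractions (Poisson assumption)."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}


if __name__ == "__main__":
    for lam in (0.1, 0.3, 0.5, 1.0, 2.0):
        occ = well_occupancy(lam)
        # Among wells that contain any cell, the fraction that holds exactly one.
        purity = occ["single"] / (occ["single"] + occ["multiple"])
        print(f"mean={lam:.1f}  single={occ['single']:.3f}  "
              f"empty={occ['empty']:.3f}  single among occupied={purity:.3f}")
```

Under this assumption the fraction of single-cell wells can never exceed about 37% (reached at a mean of one cell per well), and diluting further to reduce multi-cell wells leaves most wells empty, which is one way to read the low efficiency noted above.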
After isolation, single cells are lysed in a hypotonic buffer, and mRNA is captured by binding to a poly(dT) primer. Single-cell transcriptome amplification first requires reverse transcription of mRNA into cDNA, which is then amplified. Currently, both PCR and in vitro transcription (IVT) are widely used for amplification [14]. PCR offers fast, exponential amplification of cDNA but can generate primer dimers [14]. IVT uses T7 RNA polymerase to amplify RNA linearly from the first strand of cDNA, followed by an additional round of reverse transcription of the amplified RNA. Since both strategies introduce some bias, a challenge is to develop sensitive, precise, and reliable technologies for generating amplified transcripts for sequencing from individual cells [15].
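The practical difference between exponential and linear amplification can be sketched with a toy model. The code below is an illustration under simplifying assumptions that are not taken from the cited work: each PCR cycle multiplies a template by `1 + efficiency`, IVT multiplies it once by a fixed per-template yield, and the efficiency and yield values are made up for the example.

```python
def pcr_copies(n0: float, cycles: int, efficiency: float) -> float:
    """Exponential model: each cycle multiplies the copy number by (1 + efficiency)."""
    return n0 * (1.0 + efficiency) ** cycles


def ivt_copies(n0: float, yield_per_template: float) -> float:
    """Linear model: each template is transcribed a fixed number of times."""
    return n0 * yield_per_template


def abundance_ratio_after_pcr(cycles: int, eff_a: float, eff_b: float) -> float:
    """Apparent A/B ratio after PCR for two transcripts that started at equal abundance."""
    return pcr_copies(1.0, cycles, eff_a) / pcr_copies(1.0, cycles, eff_b)


def abundance_ratio_after_ivt(yield_a: float, yield_b: float) -> float:
    """Apparent A/B ratio after IVT for two transcripts that started at equal abundance."""
    return ivt_copies(1.0, yield_a) / ivt_copies(1.0, yield_b)


if __name__ == "__main__":
    # Hypothetical values: transcript A amplifies slightly better than transcript B.
    for cycles in (5, 10, 20):
        print(cycles, "PCR cycles ->", round(abundance_ratio_after_pcr(cycles, 0.95, 0.85), 3))
    print("IVT ->", round(abundance_ratio_after_ivt(950.0, 850.0), 3))
```

In this model a small per-cycle difference in efficiency compounds with every PCR cycle, whereas the same relative difference in IVT yield distorts the ratio only once. Real protocols have additional sources of bias that the sketch ignores.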
Prior to sequencing, a sequencing library is prepared following the protocol of a commercial kit such as the Nextera kit. Cell barcoding, the addition of a specific tag to each cell, is now widely used to increase sequencing throughput. Single-cell sequencing is compatible with most bulk RNA-seq platforms, such as the Illumina platform. scRNA-seq data are then processed and interpreted with appropriate computational and statistical methods; obtaining unbiased results from a large volume of single-cell data remains challenging.

METHODS OF SINGLE-CELL RNA-SEQ
A multitude of scRNA-seq methods have been developed in recent years, bringing dramatic advances in throughput and accuracy and enabling major discoveries. These methods differ in how single cells are sorted, how transcripts are tagged with their cell of origin, and how sequencing libraries are generated.
Here, we discuss the major scRNA-seq methods (Table 1). In 2009, Tang et al described a method of scRNA-seq for the first time. They modified a widely used single-cell whole-transcriptome amplification method to generate cDNA as long as 3 kilobases (kb) efficiently and without bias [2,16,17]. A poly(A) tail was added to the 3' end of the first-strand cDNA, which can be up to 3 kb long, to serve as the binding site for the poly(T) primer used in second-strand cDNA synthesis. They detected genes and splice junctions in mouse blastomeres that could not be identified previously. However, a limitation of Tang's method was that it could not determine the length and sequence of the mRNA or the strand of origin of transcripts.
Islam et al introduced single-cell tagged reverse transcription (STRT) in 2011 to specifically select and sequence the 5' end of RNA [18]. The key to this method is the addition of a CCC sequence to the 3' end of the first-strand cDNA. Furthermore, a helper oligo with distinct barcodes and a universal primer sequence is used. Because it selects the 5' end, STRT is suitable for identifying transcription start sites (TSSs), which are often lost in methods that show 3' bias, such as Tang's method.
Ramsköld et al developed Smart-seq in 2012. Smart-seq generates and amplifies full-length cDNA from a single cell by exploiting the template-switching capacity of M-MLV reverse transcriptase. Template switching and the terminal transferase activity of the enzyme are critical to the success of Smart-seq. When the reverse transcriptase reaches the 5' end of the mRNA, it adds a few non-templated C nucleotides, then switches templates and continues reverse transcription to the end of the template-switching oligonucleotide. Smart-seq improved read coverage across transcripts, which significantly enhances detailed analysis of alternative transcript isoforms and identification of SNPs [19]. Building on Smart-seq, the same group developed Smart-seq2 in 2013, which improved coverage and accuracy relative to Smart-seq libraries and can be performed with off-the-shelf reagents at lower cost [20]. The template-switching oligo (TSO) used in Smart-seq2 gives a two-fold increase in cDNA yield compared with the SMARTer IIA oligo used in Smart-seq. In addition, each step of the procedure was optimized in Smart-seq2 to achieve higher sensitivity and fewer technical biases.
CEL-Seq was developed to overcome the inefficiency of working with single cells by barcoding and pooling samples before linearly amplifying mRNA by in vitro transcription [21]. The reverse-transcription primer is designed with an anchored poly(T), a unique barcode, the 5' Illumina adaptor, and a T7 promoter. Compared with PCR-based amplification methods, CEL-Seq gives more reproducible, linear, and sensitive results. CEL-Seq2 was developed in 2016, with higher sensitivity, lower cost, and less hands-on time [22]. CEL-Seq2 uses shorter primers that incorporate a unique molecular identifier (UMI), a more efficient reverse transcriptase, and an optimized clean-up procedure. In a direct comparison with Smart-seq, CEL-Seq2 showed better sensitivity and reproducibility [23].
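The role of cell barcodes and UMIs in counting can be sketched in a few lines. This is a minimal illustration, not the CEL-Seq2 pipeline: it assumes reads have already been parsed and mapped into hypothetical `(cell_barcode, umi, gene)` tuples, and it performs no correction for sequencing errors in barcodes or UMIs.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per (cell barcode, gene).

    Reads that share cell barcode, gene, and UMI are treated as amplification
    copies of one original molecule and are counted once.
    """
    seen = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        seen[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Hypothetical reads: (cell barcode, UMI, gene)
reads = [
    ("ACGT", "AAAC", "Nanog"),
    ("ACGT", "AAAC", "Nanog"),  # amplification duplicate of the read above
    ("ACGT", "GGTA", "Nanog"),
    ("ACGT", "AAAC", "Sox2"),   # same UMI, different gene: a separate molecule
    ("TTGA", "CCAT", "Nanog"),
    ("TTGA", "CCAT", "Nanog"),
    ("TTGA", "CCAT", "Nanog"),
]

if __name__ == "__main__":
    for (cell, gene), n in sorted(count_umis(reads).items()):
        print(cell, gene, n)
```

Counting molecules rather than reads in this way removes duplicates introduced during amplification, which is the reason a UMI is built into the primer.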
In 2015, two different groups developed Drop-seq [12] and inDrop [13] by combining scRNA-seq with fluidic control of nanoliter-sized droplets. These droplet-based methods resolved the limitations of ease and scale that had prevented broad application of scRNA-seq. In Drop-seq, thousands of cells can be analyzed in a single reaction because each bead carries a unique barcode. In contrast to Drop-seq, which has 16 million barcodes in its bead library, the inDrop method generates only about 150,000 unique barcodes, which means it can process fewer cells in a single run.
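Why the size of the barcode library limits the number of cells per run can be seen from a birthday-problem estimate. The sketch below assumes that each cell receives a barcode drawn uniformly at random and independently from the library, which is a simplification rather than a description of either protocol; the library sizes are the approximate figures quoted above, and the run size of 10,000 cells is an arbitrary example.

```python
import math


def expected_collision_fraction(n_cells: int, n_barcodes: int) -> float:
    """Expected fraction of cells that share their barcode with at least one other cell.

    For one cell, the chance that none of the other (n_cells - 1) cells drew the
    same barcode is (1 - 1/n_barcodes) ** (n_cells - 1). log1p and expm1 keep the
    result accurate when that probability is very close to 1.
    """
    if n_cells < 1 or n_barcodes < 2:
        raise ValueError("need n_cells >= 1 and n_barcodes >= 2")
    log_no_collision = (n_cells - 1) * math.log1p(-1.0 / n_barcodes)
    return -math.expm1(log_no_collision)


if __name__ == "__main__":
    n_cells = 10_000  # arbitrary example run size
    for name, n_barcodes in (("Drop-seq", 16_000_000), ("inDrop", 150_000)):
        frac = expected_collision_fraction(n_cells, n_barcodes)
        print(f"{name}: {frac:.4%} of cells expected to share a barcode")
```

Under these assumptions, the smaller library reaches a given collision rate at a much lower number of cells, so fewer cells can be loaded per run if barcode sharing is to stay rare.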
To help researchers choose among the available scRNA-seq methods, one study systematically compared different methods in terms of sensitivity, accuracy, precision, and cost-efficiency. The results showed that Smart-seq on a microfluidic platform is the most sensitive, CEL-Seq is the most precise, and Drop-seq is the most cost-efficient. Accuracy is generally similar for all methods [24].

APPLICATION OF SCRNA-SEQ TO EXPLORE HETEROGENEITY
Recently, scRNA-seq has been widely used in tumor biology, developmental biology, neuroscience, and other fields. It can reveal differential gene expression at the single-cell level, which is helpful for studying cellular heterogeneity and the randomness of gene regulation. Cellular heterogeneity is an important characteristic of tumors, which comprise cells with diverse genotypes and phenotypes. Kim et al [25] applied scRNA-seq to analyze clonal evolution in breast cancer patients undergoing chemotherapy. Heterogeneity within the tumor was found at the single-cell level, and activation of different drug-target pathways was found in refractory patients. Importantly, therapeutic strategies were developed to target two independent pathways in metastatic cancer cells.
Because early mammalian embryos contain only a few cells, scRNA-seq is particularly important for analyzing embryonic development. Xue et al reported a comprehensive analysis of transcriptome dynamics from oocyte to morula in both human and mouse embryos using scRNA-seq as described by Tang et al [3,26]. They found that each developmental stage can be delineated concisely by a small number of functional modules of co-expressed genes, indicating a sequential order of transcriptional changes in several key pathways.
scRNA-seq has also been widely used in neuroscience. Recently, Spaethling et al developed a system for culturing adult human brain tissue resected during neurosurgery [27]. By scRNA-seq, they identified

GENERATION OF IPSCS BY REPROGRAMMING
Stem cells are cells with the potential for self-renewal and directed differentiation into other cell types. During development, uncommitted stem cells differentiate into various cell types. Lineage commitment and differentiation were traditionally considered unidirectional and irreversible, as represented by "Waddington's landscape model". However, this classic view of the cell fate hierarchy was challenged by studies of somatic cell nuclear transfer (SCNT), which showed that the somatic epigenome can be reprogrammed to pluripotency. Thereafter, numerous studies of cell fusion and transdifferentiation supported the idea that cell fates can be manipulated in vitro. In 2006, Shinya Yamanaka demonstrated that mature somatic cells can be reprogrammed to a pluripotent state by defined factors, generating induced pluripotent stem cells (iPSCs). Since then, remarkable advances have been achieved in optimizing the methodology and in applying iPSC technologies to regenerative medicine and disease modeling. Yamanaka's team introduced four key factors, Oct3/4, Sox2, Klf4, and c-Myc (referred to as OSKM), to generate embryonic stem cell (ESC)-like iPSCs from mouse embryonic and adult fibroblast cultures. Following this work, three groups succeeded in generating iPSCs that were indistinguishable from ESCs [28][29][30]. Reprogramming of human cells to iPSCs was then achieved in 2007 [31,32]. Yamanaka's group demonstrated the induction of pluripotent cells from human fibroblasts by OSKM using a retroviral system, while Thomson's group used a different set of four factors, Oct4, Sox2, Nanog, and Lin28, delivered by a lentiviral system. However, integrating viral approaches for introducing OSKM may cause insertional mutations, interfere with the regulation of endogenous gene expression, or lead to abnormal reactivation of the transgenes in terminally differentiated cells.
Therefore, non-integrating viral approaches that use Sendai virus or adenovirus to generate iPSCs were developed [33,34]. Transgene-free reprogramming techniques, such as delivering reprogramming factors in the form of mRNA or protein, are a safer approach: they avoid possible infection of patients and remove the need to handle viral particles.
Besides OSKM, a unique set of miRNAs has been found to be specific to human embryonic stem cells (hESCs) and to play important roles in reprogramming [35][36][37][38]. Other findings provided the first proof of concept that miRNAs are capable of directly converting fibroblasts to a cardiomyocyte-like phenotype in vitro [39]. Additionally, some antibodies have been identified that can catalyze cellular dedifferentiation and nuclear reprogramming by acting at the cell surface [40]. According to this study, generating iPSCs by manipulating specific pathways can eliminate the influence of exogenous induction and avoid the use of oncogenes.

UNDERSTANDING REPROGRAMMING TO THE INDUCED PLURIPOTENT STATE
Numerous studies have sought to understand the mechanism of iPSC reprogramming, particularly in terms of transcriptional events. Many reports now suggest that successful reprogramming requires stepwise transition through key intermediate stages and that only a few cells reach the final stage [41]. A study of mouse iPSC reprogramming found sequential expression of pluripotency markers during the reprogramming process. The authors found that alkaline phosphatase was activated first, followed by silencing of somatic-specific genes and expression of SSEA1 at an intermediate stage [42]. Finally, progressive silencing of the exogenous genes with concomitant upregulation of endogenous Oct4 and Nanog marked fully reprogrammed cells [42]. The early stage is usually accompanied by induction of proliferation and acquisition of epithelial characteristics [41]. Several models, such as the elite model and the stochastic/deterministic model, have been proposed to describe how cell fates and cell transitions are regulated during the reprogramming process [43]. However, these models explain the reprogramming mechanisms only partially.
Interestingly, somatic reprogramming through forced expression of the same exogenous OSKM transcription factors in human and mouse cells generates pluripotent stem cells equivalent to different embryonic counterparts [44]. Mouse iPSCs are reprogrammed into a 'naïve' state similar to that of mouse embryonic stem cells (mESCs), which are derived from the inner cell mass (ICM) of developing blastocysts [45,46]. Human iPSCs are in a 'primed' state similar to that of human ESCs and of mouse epiblast stem cells (mEpiSCs), which are derived from the post-implantation epiblast of murine embryos [47]. Efforts have been made to reprogram human cells to a 'naïve' state. Recently, several cell surface marker proteins that are 'naïve'-specific, but not 'primed'-specific, have been demonstrated in 'naïve' and 'primed' human pluripotent stem cells [48]. More mechanistic insight into the differences between mouse and human somatic cell reprogramming would be beneficial for generating human 'naïve' iPSCs.
Although our understanding of somatic cell reprogramming has improved greatly since its initial discovery, it is possible that only a small part of the complete picture has been revealed. Understanding the mechanism of reprogramming to pluripotency will have important implications not only for improving the method but also for advancing its therapeutic applications. With the great improvements in scRNA-seq technology, the mechanism of reprogramming to the pluripotent state should become better understood.

REVEALING REPROGRAMMING PATH BY SCRNA-SEQ
Despite the progress made through bulk analyses as outlined above, very little is known about reprogramming at the single-cell level. Dissecting sophisticated biological processes at single-cell resolution facilitates the capture of transcriptional snapshots of infrequent events. Researchers are increasingly using scRNA-seq technology to deconstruct cellular heterogeneity and the reprogramming trajectory (Figure 1). It is widely accepted that cells convert at different speeds and through a variety of paths during reprogramming. In earlier research, a single-cell transcriptional analysis of mouse fibroblast reprogramming suggested that the process has two phases: a prolonged stochastic phase followed by a rapid deterministic phase [49]. Subsequently, Chung et al demonstrated that the stochastic phase of reprogramming to pluripotency is an ordered probabilistic process, using single-cell transcript analysis of MRC5 human lung fibroblasts undergoing reprogramming by OSKM [50]. In addition, they found that cells followed two trajectories: one toward an ESC-like state (the "productive" trajectory) and the other away from both ESCs and fibroblasts (the "alternative" trajectory) [50].
To investigate the process of reprogramming, Francesconi et al employed MARS-Seq to analyze two types of cell fate conversion: the transdifferentiation of pre-B cells into macrophages induced by the TF C/EBPα [51], and the reprogramming of pre-B cells into iPSCs, based on transient expression of C/EBPα followed by induction of OSKM [52]. They found that differences in Myc activity influence the efficiency of transdifferentiation and reprogramming [53]. Furthermore, these results illustrate the advantages of scRNA-seq for characterizing heterogeneity in cell fate conversion processes and identifying its underlying causes.
Several studies have developed new mathematical algorithms for single-cell reprogramming analysis. Lin et al developed Single-cell Orientation Tracing (SOT) to analyze the cell fate continuum based on scRNA-seq data generated in two distinct reprogramming systems, OSK-mediated reprogramming and chemical reprogramming. They analyzed more than 150,000 single cells and found that cells bifurcate into two categories, reprogramming potential (RP) and non-reprogramming (NR) [54]. Another study introduced Waddington-OT to reconstruct the landscape of reprogramming from 315,000 scRNA-seq profiles, which revealed a wider range of developmental programs than previously characterized [55]. The authors found that cells that adopt a mesenchymal-to-epithelial transition state give rise to populations related to pluripotent, extra-embryonic, and neural cells, each harboring multiple finer subpopulations. This study also predicted transcription factors and paracrine signals that affect cell fates during reprogramming.
Chemical reprogramming uncovered an intermediate extraembryonic endoderm (XEN)-like state, which does not exist in OSKM reprogramming [56]. scRNA-seq analysis of 36,199 cells at multiple time points throughout the chemical reprogramming process revealed that a dynamic, early two-cell (2C) embryonic-like program is a key aspect of successful reprogramming from the XEN-like state to pluripotency [57]. Moreover, the reprogramming process is greatly accelerated when the 2C-like program is enhanced by fine-tuning the chemical treatment.
Collectively, these scRNA-seq studies of reprogramming provide previously unavailable mechanistic insights into the nature of induced pluripotency and are of great value for optimizing reprogramming methods.

SUMMARY
scRNA-seq technology has provided great insight into biological diversity and into rare cells that were previously difficult to resolve in bulk tissue samples. Great progress has been made with scRNA-seq in embryonic development, cancer, stem cells, the immune system, and the tracing of cell types in organs and tissues. Since the advent of reprogramming to generate iPSCs, numerous studies have focused on revealing its mechanism. scRNA-seq has served as a powerful instrument for uncovering previously unclear intermediate stages that are masked in mixed cell populations. The reprogramming mechanisms revealed by scRNA-seq provide more information on how the fate of a single cell is converted and point to further research on reprogramming with scRNA-seq. Further progress in understanding the reprogramming mechanism and in improving reprogramming methods can be made with scRNA-seq technology, which helps to identify differences between single-cell fates and the key factors that contribute to reprogramming. Furthermore, developing appropriate algorithms for analyzing high-throughput data is crucial for obtaining the desired answers. Different algorithms have been developed to address specific single-cell biological questions; however, more accurate, universal, and faster ones are still needed.