What is Single Cell Genomics?

Most bacteria in environments ranging from the human body to the ocean cannot be cloned in the laboratory and thus cannot be sequenced using existing Next Generation Sequencing (NGS) technologies. This is the key bottleneck for projects ranging from the Human Microbiome Project (HMP) [3, 6] to antibiotics discovery [9]. For example, the key question in the Human Microbiome Project is how bacteria interact with each other. These interactions are often mediated by peptides that bacteria produce either to communicate with other bacteria or to kill them. However, peptidomics studies of the human microbiome are currently limited because mass spectrometry (the key technology for such studies) requires knowledge of fairly complete proteomes. Similarly, while studies of new peptide antibiotics would greatly benefit from DNA sequencing of genes coding for Non-Ribosomal Peptide Synthetases (NRPS) [11, 13], existing metagenomics approaches are unable to sequence these exceptionally long genes (over 60,000 nucleotides).

HMP and discovery of new antibiotics are just two examples of many projects that would be revolutionized by Single Cell Sequencing (SCS). Recent improvements in both experimental [4, 7, 8, 10] and computational [1] aspects of SCS have opened the possibility of sequencing bacterial genomes from single cells. In particular, [1] demonstrated that SCS can capture a large number of genes, sufficient for inferring the organism’s metabolism. In many applications (including proteomics and antibiotics discovery), having a great majority of genes captured is almost as useful as having complete genomes.

Currently, Multiple Displacement Amplification (MDA) is the dominant technology for single cell amplification [2]. However, MDA introduces extreme amplification bias (orders-of-magnitude differences in coverage between regions) and gives rise to chimeric reads and read-pairs that complicate the ensuing assembly. Acknowledging that existing assemblers were not designed to handle these complications, Rodrigue et al., 2009 [12] remarked that the challenges facing SCS are increasingly computational rather than experimental. A recent paper [5] illustrates that existing assemblers produce inferior results for single cell projects even when the goal is to assemble a single NRPS, let alone a complete genome.
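To make "orders-of-magnitude difference in coverage" concrete, the following minimal sketch compares two small sets of per-region read depths. The numbers are invented for illustration and are not data from any of the cited studies; the function names and the fixed cutoff of 10 are likewise hypothetical. It assumes, purely for the sake of the example, a pipeline that treats regions below a fixed coverage cutoff as unreliable.

```python
import math


def coverage_spread_orders(coverage):
    """Orders of magnitude between the highest and lowest nonzero coverage."""
    nonzero = [c for c in coverage if c > 0]
    return math.log10(max(nonzero) / min(nonzero))


def fraction_below(coverage, cutoff):
    """Fraction of regions whose coverage falls below a fixed cutoff."""
    return sum(1 for c in coverage if c < cutoff) / len(coverage)


# Hypothetical per-region read depths, invented for illustration only.
multicell = [95, 102, 99, 110, 90, 104, 98, 101]   # roughly even coverage
single_cell = [2, 5, 40, 300, 4000, 9, 1, 750]     # strongly biased coverage

CUTOFF = 10  # hypothetical fixed low-coverage cutoff

for name, cov in (("multicell", multicell), ("single cell", single_cell)):
    print(
        f"{name}: spread = {coverage_spread_orders(cov):.2f} orders of magnitude, "
        f"fraction below cutoff = {fraction_below(cov, CUTOFF):.2f}"
    )
```

Under these made-up numbers, a cutoff that leaves the evenly covered dataset untouched flags a large share of the biased dataset, which is one way to see why tools tuned for even coverage may struggle on MDA-amplified data.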

Chitsaz et al., 2011 [1] introduced the E+V-SC assembler, which combines parts of EULER-SR with a modified Velvet, and achieved a significant improvement in the quality of SCS assemblies. However, as the authors of E+V-SC realized, fully utilizing the potential of SCS requires changing the algorithmic design rather than just modifying existing tools like Velvet.

We present the SPAdes assembler, introducing a number of new algorithmic solutions and improving on state-of-the-art assemblers for both SCS and standard (multicell) bacterial datasets.

References:

[1] H. Chitsaz, J.L. Yee-Greenbaum, G. Tesler, M.J. Lombardo, C.L. Dupont, J.H. Badger, M. Novotny, D.B. Rusch, L.J. Fraser, N.A. Gormley, O. Schulz-Trieglaff, G.P. Smith, D.J. Evers, P.A. Pevzner, and R.S. Lasken. Efficient de novo assembly of single-cell bacterial genomes from short-read data sets. Nat Biotechnol, 29(10):915–921, 2011.
[2] F.B. Dean, J.R. Nelson, T.L. Giesler, and R.S. Lasken. Rapid amplification of plasmid and phage DNA using phi 29 DNA polymerase and multiply-primed rolling circle amplification. Genome Res, 11(6):1095–1099, Jun 2001.
[3] S.R. Gill, M. Pop, R.T. Deboy, P.B. Eckburg, P.J. Turnbaugh, B.S. Samuel, J.I. Gordon, D.A. Relman, C.M. Fraser-Liggett, and K.E. Nelson. Metagenomic analysis of the human distal gut microbiome. Science, 312(5778):1355–1359, Jun 2006.
[4] J.P. Glotzbach, M. Januszyk, I.N. Vial, V.W. Wong, A. Gelbard, T. Kalisky, H. Thangarajah, M.T. Longaker, S.R. Quake, G. Chu, and G.C. Gurtner. An information theoretic, microfluidic-based single cell analysis permits identification of subpopulations among putatively homogeneous stem cells. PLoS One, 6(6):e21211, 2011.
[5] R.V. Grindberg, T. Ishoey, D. Brinza, E. Esquenazi, R.C. Coates, W.T. Liu, L. Gerwick, P.C. Dorrestein, P. Pevzner, R. Lasken, and W.H. Gerwick. Single cell genome amplification accelerates identification of the apratoxin biosynthetic pathway from a complex microbial assemblage. PLoS One, 6(4):e18565, 2011.
[6] M. Hamady and R. Knight. Microbial community profiling for human microbiome projects: Tools, techniques, and challenges. Genome Res, 19(7):1141–1152, Jul 2009.
[7] T. Ishoey, T. Woyke, R. Stepanauskas, M. Novotny, and R.S. Lasken. Genomic sequencing of single microbial cells from environmental samples. Current Opinion in Microbiology, 11(3):198–204, Jun 2008.
[8] S. Islam, U. Kjallquist, A. Moliner, P. Zajac, J.B. Fan, P. Lonnerberg, and S. Linnarsson. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res, 21(7):1160–1167, Jul 2011.
[9] J.W. Li and J.C. Vederas. Drug discovery and natural products: end of an era or an endless frontier? Science, 325(5937):161–165, Jul 2009.
[10] N. Navin, J. Kendall, J. Troge, P. Andrews, L. Rodgers, J. McIndoo, K. Cook, A. Stepansky, D. Levy, D. Esposito, L. Muthuswamy, A. Krasnitz, W.R. McCombie, J. Hicks, and M. Wigler. Tumour evolution inferred by single-cell sequencing. Nature, 472(7341):90–94, Apr 2011.
[11] C. Rausch, T. Weber, O. Kohlbacher, W. Wohlleben, and D.H. Huson. Specificity prediction of adenylation domains in nonribosomal peptide synthetases (NRPS) using transductive support vector machines (TSVMs). Nucleic Acids Res, 33(18):5799–5808, 2005.
[12] S. Rodrigue, R.R. Malmstrom, A.M. Berlin, B.W. Birren, M.R. Henn, and S.W. Chisholm. Whole genome amplification and de novo assembly of single bacterial cells. PLoS One, 4(9):e6864, 2009.
[13] S.A. Sieber and M.A. Marahiel. Molecular mechanisms underlying nonribosomal peptide synthesis: approaches to new antibiotics. Chem Rev, 105(2):715–738, Feb 2005.
