
SPAdes 2.2 released.

We have just released SPAdes 2.2, a new version of the SPAdes single-cell assembler!

Improvements of SPAdes 2.2 over previous versions:

  • simplified installation process,
  • decreased RAM usage (35 GB instead of 85 GB on the E. coli single-cell dataset),
  • increased speed (about three times faster),
  • better results (larger N50, longer largest contig, more genes found, and others).

 

You can download SPAdes 2.2 from this page.

Alla Lapidus

 

Alla Lapidus, Ph.D., joined the lab in 2012 from the Fox Chase Cancer Center, PA, USA. Dr. Lapidus received her master's degree (with honors) from the Department of Theoretical and Experimental Physics, Moscow Physics-Engineering Institute, Moscow, Russia, and carried out postdoctoral training at the Institute of Genetics and Selection of Industrial Microorganisms, Moscow, Russia. She received her Technical Project Management Certificate from the American Management Association.

Academic Appointments:

Associate Director, Algorithmic Biology Lab, St. Petersburg, Russia

Principal Investigator, Theodosius Dobzhansky Center for Genome Informatics, St. Petersburg, Russia (http://dobzhanskycenter.bio.spbu.ru/)

Associate Professor, Institute for Personalized Medicine, Fox Chase Cancer Center, PA, USA (http://www.fccc.edu/research/pid/lapidus/index.html)


Research focus:

Genomics, Genome sequencing, Genome assembly, Clinical sequencing using new sequencing technologies, Comparative sequence analysis, Bioinformatics

 

Selected Publications:

Books and Review

Lapidus AL. (2009) Genome Sequence Databases: Sequencing and Assembly. Encyclopedia of Microbiology (Moselio Schaechter, Editor), pp. 196-210. Oxford: Elsevier.

Kunin V, Copeland A, Lapidus A, Mavromatis K, Hugenholtz P. (2008) A bioinformatician's guide to metagenomics. Microbiol Mol Biol Rev. Dec;72(4):557-78. Review.

Chistoserdova L, Chen SW, Lapidus A, Lidstrom ME. (2003) Methylotrophy in Methylobacterium extorquens AM1 from a genomic point of view. J Bacteriol. May;185(10):2980-7. Review.

Other publications:

Fernandez-Fueyo E, Ruiz-Dueñas FJ, Ferreira P, Floudas D, Hibbett DS, Canessa P, Larrondo LF, James TY, Seelenfreund D, Lobos S, Polanco R, Tello M, Honda Y, Watanabe T, Watanabe T, San RJ, Kubicek CP, Schmoll M, Gaskell J, Hammel KE, St John FJ, Vanden Wymelenberg A, Sabat G, Splinter Bondurant S, Syed K, Yadav JS, Doddapaneni H, Subramanian V, Lavín JL, Oguiza JA, Perez G, Pisabarro AG, Ramirez L, Santoyo F, Master E, Coutinho PM, Henrissat B, Lombard V, Magnuson JK, Kües U, Hori C, Igarashi K, Samejima M, Held BW, Barry KW, Labutti KM, Lapidus A, Lindquist EA, Lucas SM, Riley R, Salamov AA, Hoffmeister D, Schwenk D, Hadar Y, Yarden O, de Vries RP, Wiebenga A, Stenlid J, Eastwood D, Grigoriev IV, Berka RM, Blanchette RA, Kersten P, Martinez AT, Vicuna R, Cullen D. (2012) Comparative genomics of Ceriporiopsis subvermispora and Phanerochaete chrysosporium provide insight into selective ligninolysis. Proc Natl Acad Sci U S A. Apr 3;109(14):5458-63. Epub 2012 Mar 20.

Padamsee M, Kumar TK, Riley R, Binder M, Boyd A, Calvo AM, Furukawa K, Hesse C, Hohmann S, James TY, LaButti K, Lapidus A, Lindquist E, Lucas S, Miller K, Shantappa S, Grigoriev IV, Hibbett DS, McLaughlin DJ, Spatafora JW, Aime MC. (2012) The genome of the xerotolerant mold Wallemia sebi reveals adaptations to osmotic stress and suggests cryptic sexual reproduction. Fungal Genet Biol. Mar;49(3):217-26. Epub 2012 Feb 4.

Frese SA, Benson AK, Tannock GW, Loach DM, Kim J, Zhang M, Oh PL, Heng NC, Patil PB, Juge N, Mackenzie DA, Pearson BM, Lapidus A, Dalin E, Tice H, Goltsman E, Land M, Hauser L, Ivanova N, Kyrpides NC, Walter J. (2011) The evolution of host specialization in the vertebrate gut symbiont Lactobacillus reuteri.  PLoS Genet.  Feb;7(2): e1001314. Epub 2011 Feb 17.

Ran L, Larsson J, Vigil-Stenman T, Nylander JA, Ininbergs K, Zheng WW, Lapidus A, Lowry S, Haselkorn R, Bergman B. Genome erosion in a nitrogen-fixing vertically transmitted endosymbiotic multicellular cyanobacterium. PLoS One. 2010 Jul 8;5(7):e11486. Erratum in: PLoS One. 2010; 5(9)

Sieber JR, Sims DR, Han C, Kim E, Lykidis A, Lapidus AL, McDonnald E, Rohlin L, Culley DE, Gunsalus R, McInerney MJ. (2010) The genome of Syntrophomonas wolfei: new insights into syntrophic metabolism and biohydrogen production. Environ Microbiol. May 12. PMID: 20482737

Janssen PJ, Van Houdt R, Moors H, Monsieurs P, Morin N, Michaux A, Benotmane MA, Leys N, Vallaeys T, Lapidus A, Monchy S, Médigue C, Taghavi S, McCorkle S, Dunn J, van der Lelie D, Mergeay M. (2010) The complete genome sequence of Cupriavidus metallidurans strain CH34, a master survivalist in harsh and anthropogenic environments. PLoS One. May 5;5(5):e10433.PMID: 20463976 

Woyke T, Tighe D, Mavromatis K, Clum A, Copeland A, Schackwitz W, Lapidus A, Wu D, McCutcheon JP, McDonald BR, Moran NA, Bristow J, Cheng JF. (2010) One bacterial cell, one complete genome. PLoS One. Apr 23;5(4):e10314. PMID: 20428247

Strnad H, Lapidus A, Paces J, Ulbrich P, Vlcek C, Paces V, Haselkorn R. (2010) Complete genome sequence of the photosynthetic purple nonsulfur bacterium Rhodobacter capsulatus SB 1003. J Bacteriol. Apr 23.

Wu D, Hugenholtz P, Mavromatis K, Pukall R, Dalin E, Ivanova NN, Kunin V, Goodwin L, Wu M, Tindall BJ, Hooper SD, Pati A, Lykidis A, Spring S, Anderson IJ, D'haeseleer P, Zemla A, Singer M, Lapidus A, Nolan M, Copeland A, Han C, Chen F, Cheng JF, Lucas S, Kerfeld C, Lang E, Gronow S, Chain P, Bruce D, Rubin EM, Kyrpides NC, Klenk HP, Eisen JA. (2009) A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature.  Dec 24;462(7276):1056-60.PMID: 20033048

Chain PS, Grafham DV, Fulton RS, Fitzgerald MG, Hostetler J, Muzny D, Ali J, Birren B, Bruce DC, Buhay C, Cole JR, Ding Y, Dugan S, Field D, Garrity GM, Gibbs R, Graves T, Han CS, Harrison SH, Highlander S, Hugenholtz P, Khouri HM, Kodira CD, Kolker E, Kyrpides NC, Lang D, Lapidus A, Malfatti SA, Markowitz V, Metha T, Nelson KE, Parkhill J, Pitluck S, Qin X, Read TD, Schmutz J, Sozhamannan S, Sterk P, Strausberg RL, Sutton G, Thomson NR, Tiedje JM, Weinstock G, Wollam A; Genomic Standards Consortium Human Microbiome Project Jumpstart Consortium, Detter JC. (2009) Genomics. Genome project standards in a new era of sequencing. Science.Oct 9;326(5950):236-7. No abstract available. PMID: 19815760

Lauro FM, McDougald D, Thomas T, Williams TJ, Egan S, Rice S, DeMaere MZ, Ting L, Ertan H, Johnson J, Ferriera S, Lapidus A, Anderson I, Kyrpides N, Munk AC, Detter C, Han CS, Brown MV, Robb FT, Kjelleberg S, Cavicchioli R. (2009) The genomic basis of trophic strategy in marine bacteria. Proc Natl Acad Sci U S A. 2009 Sep 15;106(37):15527-33. Epub 2009 Sep 8.PMID: 19805210

Salinero KK, Keller K, Feil WS, Feil H, Trong S, Di Bartolo G, Lapidus A. (2009) Metabolic analysis of the soil microbe Dechloromonas aromatica str. RCB: indications of a surprisingly complex life-style and cryptic anaerobic pathways for aromatic degradation. BMC Genomics. Aug 3;10:351.PMID: 19650930

Kislyuk A, Lomsadze A, Lapidus AL, Borodovsky M. (2009) Frameshift detection in prokaryotic genomic sequences. Int J Bioinform Res Appl. 2009;5(4):458-77.PMID: 19640832

Penn K, Jenkins C, Nett M, Udwary DW, Gontang EA, McGlinchey RP, Foster B, Lapidus A, Podell S, Allen EE, Moore BS, Jensen PR. (2009) Genomic islands link secondary metabolism to functional adaptation in marine Actinobacteria, ISME J. May 28.

Herlemann DP, Geissinger O, Ikeda-Ohtsubo W, Kunin V, Sun H, Lapidus A, Hugenholtz P, Brune A. (2009) Genomic analysis of "Elusimicrobium minutum," the first cultivated representative of the phylum "Elusimicrobia" (formerly termite group 1). Appl Environ Microbiol. May;75(9):2841-9.

Sela DA, Chapman J, Adeuya A, Kim JH, Chen F, Whitehead TR, Lapidus A, Rokhsar DS, Lebrilla CB, German JB, Price NP, Richardson PM, Mills DA. (2008) The genome sequence of Bifidobacterium longum subsp. infantis reveals adaptations for milk utilization within the infant microbiome. Proc Natl Acad Sci U S A. 2008 Dec 2;105(48):18964-9. Epub 2008 Nov 24.

Chivian D, Brodie EL, Alm EJ, Culley DE, Dehal PS, Desantis TZ, Gihring TM, Lapidus A, Lin LH, Lowry SR, Moser DP, Richardson PM, Southam G, Wanger G, Pratt LM, Andersen GL, Hazen TC, Brockman FJ, Arkin AP, Onstott TC. (2008) Environmental genomics reveals a single-species ecosystem deep within Earth. Science. Oct 10;322(5899):275-8.

Kalyuzhnaya MG, Lapidus A, Ivanova N, Copeland AC, McHardy AC, Szeto E, Salamov A, Grigoriev IV, Suciu D, Levine SR, Markowitz VM, Rigoutsos I, Tringe SG, Bruce DC, Richardson PM, Lidstrom ME, Chistoserdova L. (2008) High-resolution metagenomics targets specific functional types in complex microbial communities. Nat Biotechnol. Sep;26(9):1029-34.

Mavromatis K, Ivanova N, Shapiro H, Barry K, Goltsman E, McHardy AC, Rigoutsos I, Salamov A, Korzeniewski F, Land M, Lapidus A, Grigoriev I, Richardson P, Hugenholtz P, Kyrpides NC. Use of simulated data sets to evaluate the fidelity of metagenomic processing methods. Nature Methods. 2007 Jun;4(6):495-500.

Lapidus A, Goltsman E, Auger S, Galleron N, Segurens B, Dossat C, Land ML, Broussolle V, Brillard J, Guinebretiere MH, Sanchis V, Nguen-The C, Lereclus D, Richardson P, Wincker P, Weissenbach J, Ehrlich SD, Sorokin A. Extending the Bacillus cereus group genomics to putative food-borne pathogens of different toxicity. Chem Biol Interact. 2008 Jan 30;171(2):236-49.

Reslewic S, Zhou S, Place M, Zhang Y, Briska A, Goldstein S, Churas C, Runnheim R, Forrest D, Lim A, Lapidus A, Han CS, Roberts GP, Schwartz DC.  Whole-genome shotgun optical mapping of Rhodospirillum rubrum. Appl Environ Microbiol. 2005 Sep;71(9):5511-22.

Ivanova N, Sorokin A, Anderson I, Galleron N, Candelon B, Kapatral V, Bhattacharyya A, Reznik G, Mikhailova N, Lapidus A, Chu L, Mazur M, Goltsman E, Larsen N, D'Souza M, Walunas T, Grechkin Y, Pusch G, Haselkorn R, Fonstein M, Ehrlich SD, Overbeek R, Kyrpides N. Genome sequence of Bacillus cereus and comparative analysis with Bacillus anthracis. Nature. 2003 May 1;423(6935):87-91.

Lapidus A, Galleron N, Andersen JT, Jørgensen PL, Ehrlich SD, Sorokin A. Co-linear scaffold of the Bacillus licheniformis and Bacillus subtilis genomes and its use to compare their competence genes. FEMS Microbiol Lett. 2002 Mar 19;209(1):23-30.

Lapidus A, Galleron N, Sorokin A, Ehrlich SD. Sequencing and functional annotation of the Bacillus subtilis genes in the 200 kb rrnB-dnaB region. Microbiology. 1997 Nov;143(Pt 11):3431-41.

Sorokin A, Lapidus A, Capuano V, Galleron N, Pujic P, Ehrlich SD. A new approach using multiplex long accurate PCR and yeast artificial chromosomes for bacterial chromosome mapping and sequencing. Genome Res. 1996 May;6(5):448-53.

Lapidus AL, Mochul'skii AV, Podkovyrov SM, Lebedeva MI, Antipin AA, Izotova LS, Zagnit'ko OP, Komolova GS, Klesov AA, Veiko VP, et al. Expression of the hAng gene in Escherichia coli; isolation and characterization of human recombinant Ser-(-1) angiogenin. Biomed Sci. 1990;1(6):597-604.

 

 

 

SPAdes publication

 

Anton Bankevich, Sergey Nurk, Dmitry Antipov, Alexey A. Gurevich, Mikhail Dvorkin, Alexander S. Kulikov, Valery M. Lesin, Sergey I. Nikolenko, Son Pham, Andrey D. Prjibelski, Alexey V. Pyshkin, Alexander V. Sirotkin, Nikolay Vyahhi, Glenn Tesler, Max A. Alekseyev, and Pavel A. Pevzner.
SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing.
Journal of Computational Biology. May 2012, 19(5): 455-477. doi:10.1089/cmb.2012.0021.

Abstract

The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies.

Datasets

A single E. coli cell and a single marine cell (Deltaproteobacterium SAR324) were isolated by micromanipulation. Paired-end libraries were generated on an Illumina Genome Analyzer IIx from MDA-amplified single-cell DNA and from standard (multicell) genomic DNA prepared from cultured E. coli. We call these datasets ECOLI-SC, ECOLI-MC, and SAR324. They consist of 100 bp paired-end reads with average insert sizes of 266 bp for ECOLI-SC, 215 bp for ECOLI-MC, and 240 bp for SAR324. Both E. coli datasets have 600x coverage.
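A coverage figure such as the 600x quoted above is the total number of sequenced bases divided by the genome length. The following is a minimal sketch of that arithmetic, not part of SPAdes. The genome length of about 4.64 Mbp for E. coli and the read count in the usage note are assumptions used for illustration; they are not taken from the datasets described here.

```python
import math


def coverage(num_reads, read_length, genome_length):
    """Average depth of coverage: total sequenced bases / genome length."""
    return num_reads * read_length / genome_length


def reads_for_coverage(target_coverage, read_length, genome_length):
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_length / read_length)


# Assumed genome length for E. coli (about 4.64 Mbp); illustrative only.
ECOLI_GENOME_LENGTH = 4_640_000

if __name__ == "__main__":
    needed = reads_for_coverage(600, 100, ECOLI_GENOME_LENGTH)
    print("100 bp reads needed for 600x:", needed)
    print("coverage check:", coverage(needed, 100, ECOLI_GENOME_LENGTH))
```

Under these assumptions, 600x coverage with 100 bp reads corresponds to roughly 28 million reads.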

 
Results
 
We benchmarked seven assemblers (EULER-SR, IDBA, SOAPdenovo, Velvet, Velvet-SC, E+V-SC, and SPAdes) on three datasets (ECOLI-SC, ECOLI-MC, and SAR324). To provide unbiased benchmarking, we used the assembly evaluation tool Plantagora (http://www.plantagora.org).
 
Table 1 illustrates that SPAdes compares well to other assemblers on multicell and, particularly, single-cell datasets. SPAdes assembled ~96.1% of the E. coli genome from the ECOLI-SC dataset, with an N50 of 49,623 bp and a single misassembly. E+V-SC assembled ~93.8% of the E. coli genome with an N50 of 32,051 bp and two misassemblies. SPAdes captured ~100 more E. coli genes than E+V-SC, ~800 more than Velvet, and ~900 more than SOAPdenovo.
 
On the ECOLI-MC dataset, the EULER-SR assembly featured the largest N50 (110,153 bp) but was compromised by 10 misassemblies. All other assemblers generated a small number of misassembled contigs, ranging from 4 (IDBA and Velvet) to 0 (Velvet-SC, E+V-SC, and SPAdes-single reads). SPAdes and Velvet also had larger N50 (86,590 and 78,602 bp) than other assemblers except for EULER-SR. All assemblers but SOAPdenovo produced nearly 100% coverage of the genome. Table 1 reveals that the substitution error rate ranges over an order of magnitude for different assemblers, with Velvet (for ECOLI-SC) and SPAdes-single reads (for ECOLI-MC) the most accurate.
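For readers unfamiliar with the N50 statistic used in these comparisons: N50 is the largest contig length L such that contigs of length at least L together cover at least half of the total assembly length. The sketch below follows that common definition and is illustrative only. It is not the code used by Plantagora or SPAdes, and evaluation tools may differ in details (for example, measuring against the reference genome length rather than the assembly length).

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig where the running sum reaches half the total.
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    # Hypothetical contig lengths, not taken from any dataset above.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

A larger N50 means that more of the assembly sits in long contigs, which is why it is reported together with misassembly counts: joining contigs incorrectly also inflates N50.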
 
We further compared E+V-SC and SPAdes on the SAR324 dataset. SPAdes assembled contigs totaling 5,129,304 bp (vs. 4,255,983 bp for E+V-SC) and an N50 of 75,366 bp (as compared to 30,293 bp for E+V-SC). Since the complete genome of Deltaproteobacterium SAR324 is unknown, we used long ORFs to estimate the number of genes longer than 600 bp, as a proxy for assembly quality. There are 2603 long ORFs in the SPAdes assembly vs. 2377 for E+V-SC.
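The long-ORF proxy described above can be sketched as follows. The exact ORF definition used for the SAR324 comparison is not stated here, so this sketch makes its own assumptions: an ORF runs from the first ATG in a reading frame to the next in-frame stop codon (TAA, TAG, or TGA), both strands are scanned, and the length includes the stop codon. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_long_orfs_one_strand(seq, min_length=600):
    """Count ATG-to-stop ORFs of at least min_length bp in three frames."""
    count = 0
    for frame in range(3):
        start = None  # position of the first ATG since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_length:
                    count += 1
                start = None
    return count


def count_long_orfs(contigs, min_length=600):
    """Count long ORFs on both strands of every contig sequence."""
    total = 0
    for seq in contigs:
        seq = seq.upper()
        total += count_long_orfs_one_strand(seq, min_length)
        total += count_long_orfs_one_strand(reverse_complement(seq), min_length)
    return total
```

Because fragmented assemblies break genes across contig ends, a higher count of long ORFs suggests that more genes were recovered intact, which is the sense in which the count serves as a proxy for assembly quality.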
 


 

This work was supported by the Government of the Russian Federation (grant 11.G34.31.0018) and by the National Institutes of Health, USA (NIH grant 3P41RR024851-02S1). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the organizations or agencies that provided support for the project.

WABI accepted papers

The lab has had two papers accepted to WABI this year. The 12th Workshop on Algorithms in Bioinformatics will be held in Ljubljana (Slovenia) on September 10-12, 2012. The papers are:

  • Nikolay Vyahhi, Son K. Pham, Pavel Pevzner. "From de Bruijn Graphs to Rectangle Graphs for Genome Assembly".
  • Natalie E Castellana, Andrey Lushnikov, Piotr Rotkiewicz, Natasha Sefcovic, Pavel A Pevzner, Adam Godzik, and Kira Vyatkina. "MORPH-PRO: A Novel Algorithm and Web Server for Protein Morphing".

SPAdes Rectangles

From de Bruijn Graphs to Rectangle Graphs for Genome Assembly.

Nikolay Vyahhi, Son K. Pham, Pavel Pevzner.

(to appear)

Back from Barcelona

We visited RECOMB-seq and RECOMB (16th Annual International Conference on Research in Computational Molecular Biology) in Barcelona! It was an amazing conference, full of interesting people and fascinating talks on computational molecular biology, including our paper on PATH-SETS and Pavel's presentation of the SPAdes single-cell assembler. Most of the talks this year focused on sequencing and genotyping technologies and algorithms, evolution, structural and functional genomics, and molecular sequence analysis.

As a reminder, we will hold two RECOMB satellite conferences, RECOMB-AB and RECOMB-BE, in St. Petersburg in August 2012.

We'll be glad to see you there.

Released SPAdes 2.0.0 (with integrated BayesHammer)

We are glad to announce the release of our genome assembler SPAdes (with the integrated error-correction tool BayesHammer).

You can download the manual, which includes an installation guide, and try SPAdes. Please send all your comments to SPAdes support.

RECOMB-AB, RECOMB-BE

We have announced two RECOMB satellite conferences, RECOMB-AB and RECOMB-BE, to be held in St. Petersburg in August 2012.

RECOMB-BE is a continuation of the three previous conferences (2009, 2010, 2011), while RECOMB-AB is new and focused on open algorithmic problems in bioinformatics.

Join us at these conferences!

SPAdes version 2.0.0

Manual

Please read the manual and its installation guide before using SPAdes.

Requirements 

SPAdes requires a 64-bit Linux system. 
 
Assembling our test multi-cell E. coli dataset with SPAdes uses about 700 MB of peak memory, and assembling the single-cell E. coli dataset uses 6 GB of peak memory. Correcting errors in these datasets requires about 70 GB of RAM.

 

Installing SPAdes

3.1 Installing Debian package (for Debian, Mint, Ubuntu, etc.)

First, add the repository containing SPAdes by inserting the following line at the end of the file /etc/apt/sources.list
 
 
Note that the space before the last slash is required. Update the package list by typing
 
sudo apt-get update
 
After that, SPAdes can be installed just by typing
 
sudo apt-get install spades
 
3.2 Installing RPM package (for CentOS, Fedora, Mandriva, Red Hat, SUSE, etc.)

If your Linux system can install RPM packages, you can use spades.rpm at http://spades.bioinf.spbau.ru/release2.0.0/spades.rpm.

3.3 Running SPAdes without installation (building from source)

If you are unable to install a package (e.g. you don't have root privileges), you can download SPAdes as an archive and build it from source. In this case, you will have to take care of some dependencies beforehand:

  • g++ (version 4.4 or higher)
  • python (version 2.4 or higher)
  • cmake (version 2.6 or higher)
  • boost (version 1.42 or higher)
  • zlib

3.4 Running SPAdes without installation (even without building)

If you cannot install some of the packages listed above, we have prepared SPAdes binaries for multiple values of k (see section 4.4), in the range from 11 to 149. To use these binaries, download SPAdes as an archive (http://spades.bioinf.spbau.ru/release2.0.0/spades_2.0.0.tar.gz), extract it, and use the following scripts to download the binaries:

./spades_download_binary.py <space-separated values of k>
./spades_download_bayeshammer.py

3.5 Testing your installation

To check your installation, type

spades.py

if you installed SPAdes from a package (see 3.1 and 3.2), or

./spades.py

from the folder where you extracted the archive (see 3.3 and 3.4). This runs SPAdes on a toy dataset (the first 1,000 bp of E. coli) that comes with SPAdes for testing purposes. If the installation is successful, you will see lines like the following at the end of the log:

*Corrected reads are in .../spades_output/ECOLI_1K/corrected/
*Assembled contigs are .../spades_output/ECOLI_1K/spades_04.18_17.59.30/ECOLI_1K.fasta
*Assessment of their quality is in
.../spades_output/ECOLI_1K/spades_04.18_17.59.30/quality_results/quality.txt
Thank you for using SPAdes!
======= SPAdes pipeline finished
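
If you run the test from a script, the success check can be automated by looking for the closing line of the log shown above. This is a minimal sketch, not part of SPAdes. It assumes you have already captured the log text yourself (for example, from standard output), and the function name and the choice to inspect only the last ten non-empty lines are hypothetical.

```python
FINISH_MARKER = "SPAdes pipeline finished"


def pipeline_finished(log_text, tail_lines=10):
    """Return True if the finish marker appears near the end of the log."""
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    return any(FINISH_MARKER in line for line in lines[-tail_lines:])


if __name__ == "__main__":
    example_log = (
        "Thank you for using SPAdes!\n"
        "======= SPAdes pipeline finished\n"
    )
    print(pipeline_finished(example_log))
```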

 

SPAdes works best with the new error-correction tool BayesHammer (paper submitted).
SPAdes (version 1.0.0) was initially released only to our collaborators in February 2012, but BayesHammer has not been released yet. Please wait until we release both SPAdes (version 2.0.0) and BayesHammer (already integrated into the SPAdes pipeline) in the middle of April 2012. You can leave your contact details by sending an e-mail to our support, and we will notify you about our new release.

Collaboration

The lab's collaborative projects in the field of genomics focus on assembling genomes from the datasets provided by our collaborators, analyzing the obtained SPAdes assemblies, and comparing them to those produced by widely used assemblers, namely Velvet-SC, E+V-SC, ALLPATHS, and SOAPdenovo.

 

Our collaborator institutions:

University of California, San Diego

 

National Institutes of Health (NIH), USA: Human Microbiome Project (HMP)

 

J. Craig Venter Institute (JCVI)

 

Joint Genome Institute (JGI): Genomic Encyclopedia for Bacteria and Archaea (GEBA) and other projects

 

Scripps Institution of Oceanography 

 

Department of Ecology and Evolutionary Biology, Yale University

 

Broad Institute of MIT and Harvard
 
 
 
 
 
National Research University of Information Technologies, Mechanics and Optics
 
 
 
 
 
 
 
 
 
 