
Clone of SPAdes Genome Assembler (version 20.01.2014)

 

 

SPAdes 3.0 is out! 

Now with support for IonTorrent, PacBio, module for highly polymorphic diploid genomes and many other new features!

See all changes in changelog

SPAdes Assembler

 

SPAdes manual with installation guide (ver 3.0)

dipSPAdes manual

Download SPAdes

Assembling long Illumina paired-end reads (2x150 and 2x250) application note

SPAdes on GAGE-B data sets benchmark

Benchmark for other data sets

Support e-mail: spades.support@bioinf.spbau.ru

 

 

 

For the benchmarks we used:

E. coli K-12 MG1655 reference length is 4639675 bp with 4324 annotated genes. S. aureus USA300 FPR3757 (chromosome and three plasmids) reference length is 2917469 bp with 2622 annotated genes.

Only contigs of 500 bp and longer were taken into consideration. Tables were obtained using QUAST 2.3.

 

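Single-cell E. coli

| Assembly | NG50 (bp) | # contigs | Largest (bp) | Total length (bp) | MA | MM | IND | GF (%) | # genes |
|---|---|---|---|---|---|---|---|---|---|
| A5 | 14399 | 745 | 101584 | 4441145 | 8 | 12.01 | 0.17 | 89.880 | 3444 |
| ABySS | 68534 | 179 | 178720 | 4345617 | 6 | 3.32 | 1.68 | 88.268 | 3704 |
| CLC | 32506 | 503 | 113285 | 4656964 | 2 | 5.53 | 1.42 | 92.291 | 3768 |
| EULER-SR | 26662 | 429 | 140518 | 4248713 | 17 | 10.87 | 35.67 | 84.898 | 3416 |
| Ray | 45448 | 361 | 210820 | 4379139 | 17 | 6.29 | 2.83 | 88.372 | 3636 |
| SOAPdenovo | 1540 | 1166 | 51517 | 2958144 | 1 | 1.87 | 0.11 | 57.672 | 1766 |
| Velvet | 22648 | 261 | 132865 | 3501984 | 2 | 2.19 | 1.23 | 73.765 | 3080 |
| E+V-SC | 32051 | 344 | 132865 | 4540286 | 2 | 2.35 | 0.73 | 91.744 | 3771 |
| IDBA-UD contigs | 98306 | 244 | 284464 | 4814043 | 8 | 5.09 | 0.27 | 95.210 | 4045 |
| IDBA-UD scaffolds | 109057 | 229 | 284464 | 4813609 | 8 | 5.14 | 0.77 | 95.199 | 4052 |
| SPAdes2.5 contigs | 110081 | 240 | 268493 | 4797724 | 1 | 3.52 | 0.64 | 94.926 | 4037 |
| SPAdes2.5 scaffolds | 112393 | 234 | 268493 | 4799671 | 1 | 4.36 | 0.79 | 94.948 | 4042 |

Isolate E. coli

| Assembly | NG50 (bp) | # contigs | Largest (bp) | Total length (bp) | MA | MM | IND | GF (%) | # genes |
|---|---|---|---|---|---|---|---|---|---|
| A5 | 43651 | 176 | 181690 | 4551797 | 0 | 0.40 | 0.11 | 98.017 | 4163 |
| ABySS | 106155 | 96 | 221861 | 4619631 | 2 | 3.77 | 0.41 | 98.974 | 4241 |
| CLC | 86964 | 112 | 221549 | 4550314 | 1 | 1.96 | 0.33 | 98.094 | 4205 |
| EULER-SR | 110153 | 100 | 221409 | 4574240 | 8 | 3.16 | 10.33 | 98.102 | 4192 |
| Ray | 86246 | 98 | 221942 | 4634429 | 2 | 2.14 | 0.09 | 96.903 | 4136 |
| SOAPdenovo | 49626 | 181 | 165487 | 4535469 | 0 | 0.15 | 0.11 | 97.696 | 4132 |
| Velvet | 82776 | 120 | 242032 | 4554702 | 3 | 2.57 | 0.37 | 98.175 | 4196 |
| E+V-SC | 54856 | 171 | 166115 | 4539639 | 0 | 1.30 | 0.15 | 97.795 | 4134 |
| IDBA-UD contigs | 106844 | 110 | 221687 | 4565529 | 3 | 3.40 | 0.31 | 98.331 | 4206 |
| IDBA-UD scaffolds | 133098 | 93 | 284363 | 4565454 | 4 | 4.08 | 0.61 | 98.355 | 4216 |
| SPAdes2.5 contigs | 133088 | 92 | 285414 | 4558033 | 0 | 2.17 | 0.33 | 98.137 | 4208 |
| SPAdes2.5 scaffolds | 133309 | 90 | 285414 | 4558337 | 0 | 2.59 | 0.42 | 98.156 | 4212 |

Single-cell S. aureus

| Assembly | NG50 (bp) | # contigs | Largest (bp) | Total length (bp) | MA | MM | IND | GF (%) | # genes |
|---|---|---|---|---|---|---|---|---|---|
| A5 | 4829 | 937 | 41828 | 2770402 | 9 | 24.63 | 0.37 | 91.581 | 1815 |
| ABySS | 43173 | 185 | 175286 | 2899223 | 4 | 6.49 | 0.46 | 96.578 | 2456 |
| EULER-SR | 7247 | 750 | 66549 | 2988161 | 42 | 21.85 | 13.76 | 94.395 | 2008 |
| Ray | 62026 | 84 | 125177 | 2947717 | 13 | 2.29 | 0.96 | 92.936 | 2412 |
| SOAPdenovo | 510 | 1047 | 27317 | 1473402 | 0 | 1.32 | 0.29 | 46.717 | 595 |
| Velvet | 15656 | 347 | 67677 | 2746768 | 3 | 4.41 | 4.49 | 93.181 | 2274 |
| E+V-SC | 32296 | 215 | 107657 | 2932416 | 6 | 6.92 | 5.03 | 97.437 | 2477 |
| IDBA-UD contigs | 87549 | 114 | 175236 | 2996997 | 7 | 2.43 | 0.66 | 98.583 | 2567 |
| IDBA-UD scaffolds | 111392 | 99 | 210360 | 2996115 | 7 | 2.50 | 1.35 | 98.606 | 2573 |
| SPAdes2.5 contigs | 148260 | 101 | 284175 | 2996547 | 4 | 4.23 | 1.02 | 98.726 | 2544 |
| SPAdes2.5 scaffolds | 159252 | 99 | 429536 | 2997079 | 4 | 4.72 | 1.09 | 98.744 | 2544 |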
 

 


A5 and CLC 3.22.55708 were run with default parameters. ABySS 1.3.5, EULER-SR 2.0.1, Ray 2.2.0, SOAPdenovo 2.04, Velvet 1.2.07, and E+V-SC were run with vertex size 55.
IDBA-UD 1.1.0 was run in its default iterative mode.
 
The total assembly size may increase, and in some cases exceed the genome size, due to contaminants (see Chitsaz et al. (2011)), misassembled contigs, repeats, and hubs that contribute to multiple contigs. The genome fraction (GF (%) column), i.e. the percentage of the E. coli or S. aureus reference genome covered by the assembly, filters out these issues.
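
As a rough illustration of why genome fraction is insensitive to duplicated or overlapping contigs, the sketch below merges aligned reference intervals before counting covered bases. The function name, the 1-based inclusive interval convention, and the toy coordinates are assumptions made for this example; this is not QUAST's actual implementation.

```python
def genome_fraction(aligned_intervals, reference_length):
    """Percentage of reference bases covered by at least one alignment.

    aligned_intervals: iterable of (start, end) reference coordinates,
    1-based and inclusive (an assumption of this sketch).
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(aligned_intervals):
        if current_end is None or start > current_end + 1:
            # A new, disjoint block begins: close the previous one first.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent block: extend it, so each base counts once.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return 100.0 * covered / reference_length


# Toy example: two overlapping alignments and one separate alignment
# on a hypothetical 1000 bp reference.
toy_intervals = [(1, 400), (301, 600), (801, 900)]
print(genome_fraction(toy_intervals, 1000))  # 70.0
```

The overlapping 301..400 region is counted once, so extra copies of the same sequence add to the total assembly length but not to the genome fraction.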
 
The NG50 statistic is the same as the N50 except that the genome size is used rather than the assembly size. 
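
The difference between N50 and NG50 can be sketched in a few lines of Python. The function names and the toy contig lengths below are illustrative assumptions, not part of QUAST; the 500 bp cutoff mirrors the minimum contig length used for the tables.

```python
def nx50(contig_lengths, target_size):
    """Length of the contig at which the running total of contig lengths,
    taken from longest to shortest, first reaches half of target_size.
    Returns None if the contigs never reach that point."""
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if 2 * running_total >= target_size:
            return length
    return None


def n50(contig_lengths):
    # N50 uses the assembly size as the target.
    return nx50(contig_lengths, sum(contig_lengths))


def ng50(contig_lengths, genome_size):
    # NG50 uses the reference genome size as the target.
    return nx50(contig_lengths, genome_size)


MIN_CONTIG = 500  # contigs shorter than this are ignored
toy_lengths = [8000, 7000, 5000, 4000, 3000, 2000, 300]  # hypothetical contigs
kept = [length for length in toy_lengths if length >= MIN_CONTIG]

print(n50(kept))          # 7000
print(ng50(kept, 40000))  # 5000
```

Because NG50 is measured against the genome size, an assembly that covers less than half of the genome has no NG50 at all (the sketch returns None in that case), while its N50 is still defined.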
 
Misassemblies (MA) are locations on an assembled contig where the left flanking sequence aligns over 1 kb away from the right flanking sequence on the reference.
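
A deliberately simplified sketch of the 1 kb rule is given below. It assumes both flanks align to the same reference sequence on the same strand and compares only the reference distance between them; the `Flank` type, the function name, and the coordinates are hypothetical, and real evaluation tools also account for cases such as inversions and flanks on different reference sequences.

```python
from typing import NamedTuple


class Flank(NamedTuple):
    """Reference coordinates of one aligned flank (hypothetical type)."""
    ref_start: int
    ref_end: int


def is_misassembly(left, right, max_distance=1000):
    """True if the right flank starts more than max_distance bp away
    from where the left flank ends on the reference."""
    distance = abs(right.ref_start - left.ref_end)
    return distance > max_distance


left_flank = Flank(ref_start=10_000, ref_end=12_000)
consistent_right = Flank(ref_start=12_001, ref_end=15_000)
distant_right = Flank(ref_start=50_000, ref_end=53_000)

print(is_misassembly(left_flank, consistent_right))  # False
print(is_misassembly(left_flank, distant_right))     # True
```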
 
Mismatch (substitution) error rate (MM) and number of indels (IND) per 100 kbp are measured in aligned regions of the contigs. 
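
The per-100-kbp normalization is plain arithmetic; a minimal sketch follows. The function name and the counts are made up for illustration and are not taken from the tables.

```python
def per_100_kbp(event_count, aligned_bases):
    """Number of events (mismatches or indels) per 100 kbp of aligned sequence."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return event_count * 100_000 / aligned_bases


# Hypothetical counts over 4,000,000 aligned bases.
print(per_100_kbp(150, 4_000_000))  # 3.75
print(per_100_kbp(20, 4_000_000))   # 0.5
```

Normalizing by the aligned length, rather than by the total assembly length, keeps unaligned sequence such as contaminants from diluting the error rates.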
 
 
In each column, the best assembler by that criterion is indicated in bold.
 
 

Related publications

  • S. Nurk, A. Bankevich, D. Antipov, A. A. Gurevich, A. Korobeynikov, A. Lapidus, A. D. Prjibelsky, A. Pyshkin, A. Sirotkin, Y. Sirotkin, R. Stepanauskas, J. S. McLean, R. Lasken, S. R. Clingenpeel, T. Woyke, G. Tesler, M. A. Alekseyev, and P. A. Pevzner. Assembling Single-Cell Genomes and Mini-Metagenomes From Chimeric MDA Products. Journal of Computational Biology 20(10) (2013), 714-737. doi:10.1089/cmb.2013.0084

  • Anton Bankevich, Sergey Nurk, Dmitry Antipov, Alexey A. Gurevich, Mikhail Dvorkin, Alexander S. Kulikov, Valery M. Lesin, Sergey I. Nikolenko, Son Pham, Andrey D. Prjibelski, Alexey V. Pyshkin, Alexander V. Sirotkin, Nikolay Vyahhi, Glenn Tesler, Max A. Alekseyev, and Pavel A. Pevzner. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. Journal of Computational Biology 19(5) (2012), 455-477. doi:10.1089/cmb.2012.0021

  • Son K. Pham, Dmitry Antipov, Alexander Sirotkin, Glenn Tesler, Pavel A. Pevzner, and Max A. Alekseyev. Pathset Graphs: A Novel Approach for Comprehensive Utilization of Paired Reads in Genome Assembly. Journal of Computational Biology (2012). doi:10.1089/cmb.2012.0098

  • Nikolay Vyahhi, Son K. Pham, and Pavel A. Pevzner. From de Bruijn Graphs to Rectangle Graphs for Genome Assembly. Lecture Notes in Bioinformatics 7534 (2012), pp. 249-261. doi:10.1007/978-3-642-33122-0_20

  • Sergey I. Nikolenko, Anton I. Korobeynikov, and Max A. Alekseyev. BayesHammer: Bayesian clustering for error correction in single-cell sequencing. BMC Genomics (2013) 14(S1):S7. doi:10.1186/1471-2164-14-S1-S7

 

 


 
“I'd like to thank you for the great job you are doing with SPAdes. It's a very useful software!”
Lionel Guy
Uppsala University, Sweden
 
“Thanks for your great SPAdes assembler, we have successfully assembled several cultured organims and your assembler always performed best compared to other assemblers when run on the PE- and/or MP MiSeq data we generally use.”
Dr. Harald R. Gruber-Vodicka
Symbiosis Group
Max Planck Institute of Marine Microbiology, Bremen, Germany
 
“We are also getting good results with SPAdes for metagenomic samples, thanks to its effort to recover as much genomic sequence as it can.”
Amr Abouelleil
Bioinformatics Assembly Analyst at Broad Institute
 
“I have recently used SPAdes to assembly reads generated on an Illumina platform (2 x 250 bp). The assemblies look very good!”
Mark de Been
Department of Medical Microbiology
University Medical Center Utrecht (UMCU) The Netherlands

 

 

Acknowledgements

This work was supported by the Government of the Russian Federation (grant 11.G34.31.0018) and by the National Institutes of Health, USA (NIH grant 3P41RR024851-02S1). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the organizations or agencies that provided support for the project.