Clone of SPAdes Genome Assembler (version 20.01.2014)

SPAdes 3.0 is out!

Now with support for IonTorrent and PacBio reads, a module for highly polymorphic diploid genomes (dipSPAdes), and many other new features!

See all changes in changelog.

SPAdes Assembler

SPAdes manual with installation guide (ver 3.0)

dipSPAdes manual

Download SPAdes

Assembling long Illumina paired-end reads (2x150 and 2x250) application note

SPAdes on GAGE-B data sets benchmark

Benchmark for other data sets

Support e-mail: spades.support@bioinf.spbau.ru


For the benchmarks we used:

MDA single-cell E. coli: 6.3 Gb, 29M reads, 2x100 bp, insert size ~270 bp (Illumina Genome Analyzer IIx)
Standard isolate E. coli: 6.2 Gb, 28M reads, 2x100 bp, insert size ~215 bp (Illumina Genome Analyzer IIx)
MDA single-cell S. aureus: 14.6 Gb, 33M reads, 2x100 bp, insert size ~214 bp (Illumina Genome Analyzer IIx)

E. coli K-12 MG1655 reference length is 4639675 bp with 4324 annotated genes. S. aureus USA300 FPR3757 (chromosome and three plasmids) reference length is 2917469 bp with 2622 annotated genes.

Only contigs of 500 bp or longer were taken into consideration. The tables were produced with QUAST 2.3.

Assembly	NG50	# contigs	Largest	Total length	MA	MM	IND	GF (%)	# genes
Single-cell E. coli
A5	14399	745	101584	4441145	8	12.01	0.17	89.880	3444
ABySS	68534	179	178720	4345617	6	3.32	1.68	88.268	3704
CLC	32506	503	113285	4656964	2	5.53	1.42	92.291	3768
EULER-SR	26662	429	140518	4248713	17	10.87	35.67	84.898	3416
Ray	45448	361	210820	4379139	17	6.29	2.83	88.372	3636
SOAPdenovo	1540	1166	51517	2958144	1	1.87	0.11	57.672	1766
Velvet	22648	261	132865	3501984	2	2.19	1.23	73.765	3080
E+V-SC	32051	344	132865	4540286	2	2.35	0.73	91.744	3771
IDBA-UD contigs	98306	244	284464	4814043	8	5.09	0.27	95.210	4045
IDBA-UD scaffolds	109057	229	284464	4813609	8	5.14	0.77	95.199	4052
SPAdes2.5 contigs	110081	240	268493	4797724	1	3.52	0.64	94.926	4037
SPAdes2.5 scaffolds	112393	234	268493	4799671	1	4.36	0.79	94.948	4042

Isolate E. coli
A5	43651	176	181690	4551797	0	0.40	0.11	98.017	4163
ABySS	106155	96	221861	4619631	2	3.77	0.41	98.974	4241
CLC	86964	112	221549	4550314	1	1.96	0.33	98.094	4205
EULER-SR	110153	100	221409	4574240	8	3.16	10.33	98.102	4192
Ray	86246	98	221942	4634429	2	2.14	0.09	96.903	4136
SOAPdenovo	49626	181	165487	4535469	0	0.15	0.11	97.696	4132
Velvet	82776	120	242032	4554702	3	2.57	0.37	98.175	4196
E+V-SC	54856	171	166115	4539639	0	1.30	0.15	97.795	4134
IDBA-UD contigs	106844	110	221687	4565529	3	3.40	0.31	98.331	4206
IDBA-UD scaffolds	133098	93	284363	4565454	4	4.08	0.61	98.355	4216
SPAdes2.5 contigs	133088	92	285414	4558033	0	2.17	0.33	98.137	4208
SPAdes2.5 scaffolds	133309	90	285414	4558337	0	2.59	0.42	98.156	4212

Single-cell S. aureus
A5	4829	937	41828	2770402	9	24.63	0.37	91.581	1815
ABySS	43173	185	175286	2899223	4	6.49	0.46	96.578	2456
EULER-SR	7247	750	66549	2988161	42	21.85	13.76	94.395	2008
Ray	62026	84	125177	2947717	13	2.29	0.96	92.936	2412
SOAPdenovo	510	1047	27317	1473402	0	1.32	0.29	46.717	595
Velvet	15656	347	67677	2746768	3	4.41	4.49	93.181	2274
E+V-SC	32296	215	107657	2932416	6	6.92	5.03	97.437	2477
IDBA-UD contigs	87549	114	175236	2996997	7	2.43	0.66	98.583	2567
IDBA-UD scaffolds	111392	99	210360	2996115	7	2.50	1.35	98.606	2573
SPAdes2.5 contigs	148260	101	284175	2996547	4	4.23	1.02	98.726	2544
SPAdes2.5 scaffolds	159252	99	429536	2997079	4	4.72	1.09	98.744	2544

A5 and CLC 3.22.55708 were run with default parameters. ABySS 1.3.5, EULER-SR 2.0.1, Ray 2.2.0, SOAPdenovo 2.04, Velvet 1.2.07, and E+V-SC were run with vertex size 55.

IDBA-UD 1.1.0 was run in its default iterative mode.

The total assembly size may increase (and in some cases exceed the genome size) due to contaminants (see Chitsaz et al. (2011)), misassembled contigs, repeats, and hubs that contribute to multiple contigs. The percentage of the E. coli or S. aureus genome covered by the assembly (the GF (%), Genome fraction, column) filters out these effects.
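To see why the genome fraction is insensitive to repeats and duplicated contigs while the total length is not, here is a minimal Python sketch. It assumes contig alignments are already available as (start, end) intervals on the reference (0-based, end-exclusive); the function name `genome_fraction` and the toy numbers are illustrative, not QUAST's implementation.

```python
def genome_fraction(aligned_intervals, reference_length):
    """Percentage of reference bases covered by at least one aligned block.

    Intervals are (start, end) on the reference, 0-based and end-exclusive.
    Overlapping intervals are merged, so a region covered by several
    contigs (repeats, duplicated or chimeric contigs) is counted once.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(aligned_intervals):
        if current_end is None or start > current_end:
            # Disjoint from the running block: bank it and start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the running block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return 100.0 * covered / reference_length


# Toy example: the aligned blocks sum to 11 kbp, more than the 10 kbp
# reference, yet only 7 kbp of distinct reference sequence is covered.
blocks = [(0, 4000), (3000, 6000), (3000, 6000), (8000, 9000)]
print(genome_fraction(blocks, reference_length=10_000))  # 70.0
```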

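The NG50 statistic is the same as N50 (the length L such that contigs of length L or longer together cover at least half of the total), except that the reference genome size is used instead of the total assembly size.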
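A minimal Python sketch of both statistics, assuming contig lengths are given as a plain list of integers. The function names and toy lengths are illustrative only; the 500 bp cutoff mirrors the one used for the tables above.

```python
def ng50(contig_lengths, genome_size, min_len=500):
    """Length L such that contigs of length >= L cover at least half of
    genome_size. Returns None if the assembly never reaches that mark."""
    # Apply the same length cutoff as the benchmark tables (>= 500 bp).
    lengths = sorted((l for l in contig_lengths if l >= min_len), reverse=True)
    half = genome_size / 2
    covered = 0
    for length in lengths:
        covered += length
        if covered >= half:
            return length
    return None


def n50(contig_lengths, min_len=500):
    """Same computation, with the (filtered) assembly size as the target."""
    kept = [l for l in contig_lengths if l >= min_len]
    return ng50(kept, sum(kept), min_len)


# Toy example: 12 kbp genome; after dropping the 400 bp contig the
# assembly totals only 7.2 kbp, so NG50 comes out below N50.
contigs = [3000, 2000, 1500, 700, 400]
print(ng50(contigs, genome_size=12_000))  # 1500
print(n50(contigs))                       # 2000
```

Using the genome size makes NG50 comparable across assemblies of different total length: an assembly cannot raise its NG50 simply by leaving out hard-to-assemble regions.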

Misassemblies (MA) are locations on an assembled contig where the left flanking sequence aligns over 1 kb away from the right flanking sequence on the reference.
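The distance criterion can be sketched in a few lines of Python. This is a simplified illustration, assuming each flank's alignment is reduced to a hypothetical `Alignment(ref_start, ref_end)` record on a single reference sequence; it ignores strand changes (inversions) and flanks mapping to different reference sequences, which a real evaluation tool also has to handle.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record: where one contig block maps on the reference
    (0-based, end-exclusive)."""
    ref_start: int
    ref_end: int


def is_misassembly(left: Alignment, right: Alignment, threshold: int = 1000) -> bool:
    """Flag a breakpoint whose left and right flanking sequences align
    more than `threshold` bp apart on the reference."""
    if right.ref_start >= left.ref_end:
        gap = right.ref_start - left.ref_end      # right flank lies downstream
    elif left.ref_start >= right.ref_end:
        gap = left.ref_start - right.ref_end      # right flank lies upstream
    else:
        gap = 0                                   # flanks overlap on the reference
    return gap > threshold


left = Alignment(10_000, 15_000)
print(is_misassembly(left, Alignment(15_200, 20_000)))  # False: 200 bp apart
print(is_misassembly(left, Alignment(40_000, 45_000)))  # True: 25 kbp apart
```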

Mismatch (substitution) error rate (MM) and number of indels (IND) per 100 kbp are measured in aligned regions of the contigs.
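Normalising to 100 kbp is a simple rescaling of a raw count by the number of aligned bases. A minimal Python sketch, with hypothetical input numbers (not taken from the tables):

```python
def per_100kbp(event_count: int, aligned_bases: int) -> float:
    """Convert a raw count of mismatches or indels into events per
    100 kbp of aligned contig sequence."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return event_count * 100_000 / aligned_bases


# Hypothetical: 160 mismatches over 4.5 Mbp of aligned sequence.
print(round(per_100kbp(160, 4_500_000), 2))  # 3.56
```

Because the denominator is the aligned length rather than the total assembly length, unaligned sequence (for example contaminants) neither inflates nor dilutes these rates.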

In each column, the best assembler by that criterion is indicated in bold.
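Picking the best assembler per column can be reproduced from the tab-separated rows. The Python sketch below uses three rows copied from the single-cell S. aureus table. Which direction counts as "best" for each column is an assumption of this sketch (higher NG50, largest contig, GF and gene count; lower contig count, MA, MM and IND); Total length is skipped because its best value is arguably the one closest to the reference length.

```python
# Three rows copied from the single-cell S. aureus table (tab-separated).
ROWS = """\
Ray\t62026\t84\t125177\t2947717\t13\t2.29\t0.96\t92.936\t2412
IDBA-UD scaffolds\t111392\t99\t210360\t2996115\t7\t2.50\t1.35\t98.606\t2573
SPAdes2.5 scaffolds\t159252\t99\t429536\t2997079\t4\t4.72\t1.09\t98.744\t2544"""

COLUMNS = ["NG50", "# contigs", "Largest", "Total length",
           "MA", "MM", "IND", "GF (%)", "# genes"]

# Assumed direction of "better" for each scored column.
HIGHER_IS_BETTER = {"NG50", "Largest", "GF (%)", "# genes"}
LOWER_IS_BETTER = {"# contigs", "MA", "MM", "IND"}


def best_per_column(text):
    """Return {column: assembler name} for every scored column."""
    table = {}
    for line in text.splitlines():
        name, *values = line.split("\t")
        table[name] = dict(zip(COLUMNS, map(float, values)))
    best = {}
    for col in COLUMNS:
        if col in HIGHER_IS_BETTER:
            best[col] = max(table, key=lambda n: table[n][col])
        elif col in LOWER_IS_BETTER:
            best[col] = min(table, key=lambda n: table[n][col])
    return best


for column, assembler in best_per_column(ROWS).items():
    print(f"{column}: {assembler}")
```

On ties, `max`/`min` return the first row encountered, so tied values would need separate handling if every tied entry should be marked.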

Related publications

S. Nurk, A. Bankevich, D. Antipov, A. A. Gurevich, A. Korobeynikov, A. Lapidus, A. D. Prjibelsky, A. Pyshkin, A. Sirotkin, Y. Sirotkin, R. Stepanauskas, J. S. McLean, R. Lasken, S. R. Clingenpeel, T. Woyke, G. Tesler, M. A. Alekseyev, and P. A. Pevzner. Assembling Single-Cell Genomes and Mini-Metagenomes From Chimeric MDA Products. Journal of Computational Biology 20(10) (2013), 714-737. doi:10.1089/cmb.2013.0084
Anton Bankevich, Sergey Nurk, Dmitry Antipov, Alexey A. Gurevich, Mikhail Dvorkin, Alexander S. Kulikov, Valery M. Lesin, Sergey I. Nikolenko, Son Pham, Andrey D. Prjibelski, Alexey V. Pyshkin, Alexander V. Sirotkin, Nikolay Vyahhi, Glenn Tesler, Max A. Alekseyev, and Pavel A. Pevzner. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. Journal of Computational Biology 19(5) (2012), 455-477. doi:10.1089/cmb.2012.0021
Son K. Pham, Dmitry Antipov, Alexander Sirotkin, Glenn Tesler, Pavel A. Pevzner, and Max A. Alekseyev. Pathset Graphs: A Novel Approach for Comprehensive Utilization of Paired Reads in Genome Assembly. Journal of Computational Biology (2012). doi:10.1089/cmb.2012.0098
Nikolay Vyahhi, Son K. Pham, and Pavel A. Pevzner. From de Bruijn Graphs to Rectangle Graphs for Genome Assembly. Lecture Notes in Bioinformatics 7534 (2012), pp. 249-261. doi:10.1007/978-3-642-33122-0_20
Sergey I. Nikolenko, Anton I. Korobeynikov, and Max A. Alekseyev. BayesHammer: Bayesian clustering for error correction in single-cell sequencing. BMC Genomics (2013) 14(S1):S7. doi:10.1186/1471-2164-14-S1-S7

“I'd like to thank you for the great job you are doing with SPAdes. It's a very useful software!”

Lionel Guy

Uppsala University, Sweden

“Thanks for your great SPAdes assembler, we have successfully assembled several cultured organisms and your assembler always performed best compared to other assemblers when run on the PE- and/or MP MiSeq data we generally use.”

Dr. Harald R. Gruber-Vodicka

Symbiosis Group

Max Planck Institute of Marine Microbiology, Bremen, Germany

“We are also getting good results with SPAdes for metagenomic samples, thanks to its effort to recover as much genomic sequence as it can.”

Amr Abouelleil

Bioinformatics Assembly Analyst at Broad Institute

“I have recently used SPAdes to assemble reads generated on an Illumina platform (2 x 250 bp). The assemblies look very good!”

Mark de Been

Department of Medical Microbiology

University Medical Center Utrecht (UMCU) The Netherlands

Acknowledgements

This work was supported by the Government of the Russian Federation (grant 11.G34.31.0018) and by the National Institutes of Health, USA (NIH grant 3P41RR024851-02S1). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the organizations or agencies that provided support for the project.