SiBELia: A comparative genomics tool

Today is the first release of SiBELia, the Synteny Block ExpLoration tool! It finds synteny blocks in genomes represented as nucleotide sequences, using an approach based on de Bruijn graphs.
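
To give a feel for the de Bruijn graph idea mentioned above, here is a minimal sketch. It is not SiBELia's actual algorithm: the function names, the toy sequences, and the choice of k are all assumptions made for illustration. It records which genomes traverse each edge of the graph, so that edges used by every genome hint at shared regions.

```python
from collections import defaultdict


def build_de_bruijn(genomes, k):
    """Map each de Bruijn edge (prefix, suffix) to the set of genome indices that use it.

    Nodes are (k-1)-mers and every k-mer contributes one edge.
    Hypothetical helper for illustration only, not part of SiBELia.
    """
    edge_genomes = defaultdict(set)
    for index, sequence in enumerate(genomes):
        for start in range(len(sequence) - k + 1):
            kmer = sequence[start:start + k]
            edge_genomes[(kmer[:-1], kmer[1:])].add(index)
    return dict(edge_genomes)


def shared_edges(edge_genomes, genome_count):
    """Return the sorted list of edges traversed by every genome."""
    return sorted(edge for edge, owners in edge_genomes.items()
                  if len(owners) == genome_count)


genomes = ["ACGTA", "CCGTA"]  # toy sequences, not real data
graph = build_de_bruijn(genomes, k=3)
common = shared_edges(graph, len(genomes))
print(common)
```

In this toy case the two sequences differ only in their first letter, so the edges that follow it are shared by both. A real tool has to handle reverse complements, repeats, and small variations, which this sketch ignores.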

Congratulations to Ilya Minkin, author of SiBELia and a grad student at the St. Petersburg Academic University, and to Son Pham, who is supervising this project. For more information, please take a look at the poster. We'll be happy to hear your feedback.

 

Rectangle Graph for Repeat Resolution

Rectangle Graph for Repeat Resolution in Genome Assembly

Ultimate tool for resolving repeats in genome assemblies.

Although an implementation of the rectangle graph approach is already included in the current SPAdes distribution, we are also releasing the Rectangle Graph Module (RGM) as separate code that can be run independently of SPAdes. RGM differs from the implementation currently in SPAdes, but we plan to integrate it into SPAdes in the future. RGM can also be run with other genome assemblers, provided they use the same graph format as the SPAdes files.

For more details see: Nikolay Vyahhi, Son K. Pham, Pavel Pevzner. From de Bruijn Graphs to Rectangle Graphs for Genome Assembly, Lecture Notes in Bioinformatics 7534 (2012), pp. 249-261.

The code is provided "as is"; for questions, please contact Nikolay Vyahhi.

Requirements: Python 2.7.x (2.5 and 2.6 may also work)

Usage:

  1. Run SPAdes in debug mode (spades.py --debug) so that it produces a saves directory.
  2. Run the rectangle module from the command line: python rrr.py -s project_name/saves [options] [-o out_dir=out]
  3. Check out_dir for contigs, log files, and other useful debug information.

All rrr.py command line options:

  -h, --help       show this help message and exit
  -s SAVES_DIR     Name of directory with saves
  -g GENOME        File with genome (optional)
  -o OUT_DIR       Output directory, default = out (optional)
  -d DEBUG_LOGGER  File for debug logger (optional)
  --sc             Turn on if data is single-cell (optional)
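
When running RGM from a script, it can help to assemble the command line from the options listed above. The sketch below is only an illustration: the helper name build_rrr_command and its parameters are hypothetical and not part of RGM. Only the flags (-s, -g, -o, -d, --sc) come from the option list.

```python
def build_rrr_command(saves_dir, out_dir="out", genome=None,
                      debug_log=None, single_cell=False, python="python"):
    """Build the argument list for rrr.py from the documented options.

    Hypothetical convenience wrapper; only the flags themselves are
    taken from the rrr.py option list.
    """
    command = [python, "rrr.py", "-s", saves_dir, "-o", out_dir]
    if genome is not None:
        command += ["-g", genome]      # optional file with the genome
    if debug_log is not None:
        command += ["-d", debug_log]   # optional file for the debug logger
    if single_cell:
        command.append("--sc")         # turn on for single-cell data
    return command


command = build_rrr_command("project_name/saves", single_cell=True)
print(" ".join(command))
```

The resulting list can be handed to subprocess.call(command) from the directory that contains rrr.py. The python parameter is there so that a Python 2.7 interpreter can be named explicitly.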

 

Sibelia
(aka Synteny Block ExpLoration tool)

Sibelia is a comparative genomics tool. It assists biologists in analysing the genomic variations that correlate with pathogens, or the genomic changes that help microorganisms adapt to different environments. Sibelia will also be helpful for evolutionary and genome rearrangement studies of multiple strains of microorganisms.

Sibelia is useful for finding: (1) shared regions, (2) regions that are present in one group of genomes but not in others, (3) rearrangements that transform one genome into other genomes.

In version 2, Sibelia works with multiple strains of bacteria and partitions their genomes into synteny blocks, that is, blocks of highly conserved regions among all compared genomes. It represents genomes as Circos pictures (for publication) or in interactive form (for expert analysis).

Sibelia is under active development. If you see that Sibelia has the potential to support your research, please do not hesitate to contact us at vyahhi@bioinf.spbau.ru with a list of features you would like Sibelia to have.

Related publications:

Ilya Minkin, Nikolay Vyahhi, Son Pham. "SyntenyFinder: A Synteny Blocks Generation and Genome Comparison Tool" (WABI 2012 poster)

Ilya Minkin, Anand Patel, Mikhail Kolmogorov, Nikolay Vyahhi, Son Pham. Sibelia: A fast synteny blocks generation tool for many closely related microbial genomes (submitted)

Acknowledgements

This work was supported by the Government of the Russian Federation (grant 11.G34.31.0018) and by the National Institutes of Health, USA (NIH grant 3P41RR024851-02S1).


Circos visualization:


The figure illustrates the hierarchical structure of synteny blocks between two strains of Helicobacter pylori: F32 and Gambia94/24.

Staphylococcus aureus subsp. aureus,
strains JH1, N315, TW20 and MSSA476.
Minimum size of a block = 10 000 bp.

Pseudomonas aeruginosa,
strains UCBPP-PA14, PAO1 and NCGM2.S1.
Minimum size of a block = 10 000 bp.

Helicobacter pylori,
strains F32 and Gambia94/24.
Minimum size of a block = 5000 bp.

 

Science and Education of St. Petersburg

On September 20, the TVC (ТВЦ) channel aired a report on the Algorithmic Biology Lab, as well as on other scientific and educational projects in St. Petersburg.

RECOMB-AB and RECOMB-BE 2012 were organized in St. Petersburg

We organized and hosted the RECOMB-AB (Open Problems in Algorithmic Biology) and RECOMB-BE (Bioinformatics Education) satellite conferences this year.

It was awesome to see you all in St. Petersburg. We'll post all materials and photos a little later.

Interns

Summer 2013



Petar Ivanov
Moscow State University


Vitaliy Demyanuk
ITMO


Artem Tarasov
St. Petersburg State University

   

 

 

Summer 2012



Pavel Avdeev
ITMO
 

Irina Gorbunova
SPbSU
 

Aleksey Kladov
SPbSU
 

Tatyana Krivosheeva
MSU
 

Ilya Minkin
SPbAU
 

Ivan Mihajlin
SPbSTU
 

Alexander Opeykin
SPbAU
 

Taisia Peunova
SPbSU
 

Vladislav Saveliev
SPbAU
 

Yana Safonova
NNSU
 

Ilya Chernyavsky
SPbAU
 

 

 

Summer 2011

 

Antibody sequencing

 

Antibodies are proteins produced by the body’s immune system in response to antigens – potentially harmful substances. They consist of four polypeptide chains: two identical heavy chains and two identical light chains. A heavy chain is composed of four gene segments: V (variable), D (diversity), J (joining), and C (constant); similarly, a light chain consists of three gene segments: V, J, and C. Antibodies are not encoded directly in the genome, but are assembled from those gene segments, each chosen from hundreds of candidates. Moreover, some nucleotides may be inserted or deleted at the junctions, increasing antibody diversity, and somatic hypermutation further diversifies the antibody repertoire.

The effectiveness of an antibody in blocking a particular antigen strongly depends on its amino acid sequence, as well as on the presence (or absence) of certain modifications. This makes the task of antibody sequencing highly important. However, due to their diversity, no complete antibody database exists. As a consequence, MS/MS database search approaches to protein sequencing are inapplicable to this case, leaving de novo sequencing the most attractive alternative.

Just a few years ago, sequencing a single antibody was a heroic effort. "Digitizing" the $25 billion antibody industry is an important goal because antibodies are key diagnostic and therapeutic agents. Our experimental collaborators anticipate that as soon as the cost of antibody sequencing drops below $1,000, most diagnostic and therapeutic antibodies will be routinely sequenced. Sequencing thousands of antibodies will in turn drive digitization throughout the industry, a task that requires advanced software tools. Future applications will also focus on previously unsequenced polyclonal antibodies. If successful, this research would lead to disruptive computational technology for the antibody industry.

 

In the first stage of this project, we developed a de Bruijn graph approach for the de novo assembly of thousands of top-down spectra into a protein sequence.

 

 

QUAST: Quality Assessment Tool for Genome Assemblies

Dear users! Our website has moved to quast.sf.net, where the new QUAST page is now located.

 

QUAST evaluates genome assemblies. For metagenome assembly evaluation, see the MetaQUAST project. For contig alignment visualization, see the Icarus project.

QUAST works both with and without a reference genome.

The tool accepts multiple assemblies and is therefore suitable for comparing them.

 

Source code

Manual

QUAST web interface

Paper at Bioinformatics journal

Paper at PubMed

Poster at RECOMB-2013 (PDF)

Citation:

Alexey Gurevich, Vladislav Saveliev, Nikolay Vyahhi and Glenn Tesler,
QUAST: quality assessment tool for genome assemblies,
Bioinformatics (2013) 29 (8): 1072-1075.
doi: 10.1093/bioinformatics/btt086
First published online: February 19, 2013 

Citation of other formats

 

About us:

Latest news:

  • April 19, 2016: version 4.0 was released, now with the Icarus visualizer! Check out the full list of changes.
  • April 1, 2016: the MetaQUAST paper was published in Bioinformatics, volume 32, issue 7, pp. 1088-1090.
  • November 27, 2015: version 3.2 was released, now with raw reads support. Check out the full list of changes.
  • July 13, 2015: the QUAST repositories are open for public access on GitHub, both for the command-line tool and for the web interface.
  • June 2015: the number of QUAST downloads exceeded ten thousand (including more than 6,500 downloads of QUAST v.2.3)!
  • April 15, 2013: the paper was published in Bioinformatics, volume 29, issue 8, pp. 1072-1075.

Please help us to make QUAST better by sending your comments, bug reports, and suggestions to quast.support@bioinf.spbau.ru.

 

Samples of QUAST plots:

            

Interactive QUAST evaluation demos of single-cell E. coli, H. sapiens chr. 14, and B. impatiens (bumblebee) assemblies.


 

QUAST 1.0 released.

Today we've also released the first version of QUAST to the public! QUAST is the ultimate tool for evaluating genome assemblies: it computes various metrics (e.g., N50, number of ORFs, etc.), draws plots, and produces handy reports.

We use it every day to improve the SPAdes single-cell assembler. Enjoy, and we would be happy to hear your feedback.
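
As an illustration of one of the metrics named above, here is a minimal sketch of N50 under its common definition: the largest contig length L such that contigs of length at least L together cover at least half of the total assembly length. The function name n50 and the sample lengths are assumptions for this example. QUAST's own implementation may treat details differently, for instance by filtering out short contigs first.

```python
def n50(contig_lengths):
    """Return N50 for a list of contig lengths (common textbook definition).

    Illustrative helper only, not QUAST's code.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(contig_lengths) / 2.0
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


lengths = [2, 3, 4, 5, 6]  # toy contig lengths, total 20
print(n50(lengths))
```

With these toy lengths, the two longest contigs (6 and 5) already cover 11 of the 20 bases, so the N50 is 5.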

Talks

Lectures:
