What is an antibody repertoire?
An antibody repertoire is the set of circulating antibodies. Reconstructing the antibody repertoire is an important step in antibody drug development. We present a collection of tools for investigating antibody repertoires based on immunosequencing data:
IgRepertoireConstructor: an algorithm for construction of antibody repertoire and immunoproteogenomics analysis
IgSimulator: tool for simulation of antibody repertoire
IgQUAST: quality assessment tool for antibody repertoires (coming soon)
Antibody repertoire representation
We represent an antibody repertoire as a set of clusters that correspond to antibody clones. Each clone is a group of identical antibodies described by its nucleotide sequence, its frequency, and the set of IgSeq reads that make up the group. We use two files to describe an antibody repertoire: CLUSTERS.FA (a FASTA file containing antibody sequences) and RCM (ReadCluster Map). Examples of CLUSTERS.FA and RCM files for a toy repertoire are listed below.
CLUSTERS.FA is a FASTA file in which each sequence corresponds to an antibody clone. The header of each sequence contains the id and size of the corresponding cluster. The example shows a repertoire containing 3 clusters of sizes 3, 2, and 1.
Every line of the RCM file contains a read name and the id of the cluster that the read belongs to. For example, cluster 1 consists of reads MISEQ@:53:000000000A2BMW:1:2114:14345:28882, MISEQ@:53:000000000A2BMW:1:2114:14345:28882 and MISEQ@:53:000000000A2BMW:1:2114:14393:28886.
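As a rough illustration of how an RCM file relates reads to clusters, the sketch below groups read names by cluster id and derives cluster sizes. The function name `parse_rcm`, the tab separator, and the toy read names are assumptions made for this example; the actual RCM layout should be checked against the tool documentation.

```python
from collections import defaultdict

def parse_rcm(lines, sep="\t"):
    """Map each cluster id to the list of read names assigned to it.

    Assumption: one "read_name<sep>cluster_id" pair per line.
    """
    clusters = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the LAST separator so read names stay intact.
        read_name, cluster_id = line.rsplit(sep, 1)
        clusters[cluster_id].append(read_name)
    return dict(clusters)

# Hypothetical toy repertoire with 3 clusters of sizes 3, 2, and 1.
toy_rcm = [
    "read_a\t1",
    "read_b\t1",
    "read_c\t1",
    "read_d\t2",
    "read_e\t2",
    "read_f\t3",
]
clusters = parse_rcm(toy_rcm)
cluster_sizes = {cid: len(reads) for cid, reads in clusters.items()}
print(cluster_sizes)
```

The sizes derived this way should agree with the sizes stored in the CLUSTERS.FA headers for the same repertoire.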
IgRepertoireConstructor
IgRepertoireConstructor is a tool for constructing an antibody repertoire from an Illumina IgSeq library. It takes as input immunosequencing reads that cover the variable regions of antibodies and returns the antibody repertoire constructed from those reads.
Visit the IgRepertoireConstructor official page on GitHub for more details and to download the latest version!
IgSimulator
IgSimulator is a tool for simulating an antibody repertoire and an IgSeq library. It is designed for testing and benchmarking tools that reconstruct Ig repertoires.
Visit the IgSimulator official page on GitHub for more details and to download the latest version!
IgQUAST
 Single repertoire evaluation
 Multiple repertoire comparison
 Quality assessment against an ideal repertoire
Single repertoire evaluation
IgQUAST computes basic metrics such as # clusters, # singletons (clusters consisting of a single read), size of the largest cluster, average cluster size, and a set of metrics showing the number of clusters whose size is at least a given threshold (# clusters >= 10, # clusters >= 50, # clusters >= 100, etc.). It also draws plots, such as histograms of the cluster size and cluster length distributions:
[Figures: histogram of cluster size distribution; histogram of cluster length distribution]
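The basic metrics can be sketched in a few lines. This is a minimal illustration computed from a list of cluster sizes, not IgQUAST's own code; the function name `basic_metrics` and the toy sizes are assumptions.

```python
def basic_metrics(cluster_sizes, thresholds=(10, 50, 100)):
    """Compute simple summary metrics from a list of cluster sizes."""
    sizes = list(cluster_sizes)
    if not sizes:
        raise ValueError("repertoire contains no clusters")
    metrics = {
        "# clusters": len(sizes),
        # A singleton is a cluster consisting of a single read.
        "# singletons": sum(1 for s in sizes if s == 1),
        "max cluster size": max(sizes),
        "avg cluster size": sum(sizes) / len(sizes),
    }
    # Count clusters whose size is at least each threshold.
    for t in thresholds:
        metrics[f"# clusters >= {t}"] = sum(1 for s in sizes if s >= t)
    return metrics

# Hypothetical cluster sizes for a toy repertoire.
toy_sizes = [120, 55, 12, 3, 1, 1, 1]
metrics = basic_metrics(toy_sizes)
for name, value in metrics.items():
    print(name, value)
```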
IgQUAST additionally performs advanced analysis of mutated groups (groups of antibodies that possibly developed from the same antibody). An example of the advanced analysis performed by IgQUAST is shown below:


(a) Example visualization of an alignment of two clusters. Peaks correspond to positions of polymorphisms in the alignment. Red bars correspond to positions of CDRs computed by IgBlast.
(b) Example visualization of a summarized alignment of a cluster against similar clusters.
(c) Example histogram of relative positions of polymorphisms. Red bars correspond to theoretical positions of CDRs.
Multiple repertoire comparison
IgQUAST compares two or more repertoires constructed from the same IgSeq library and computes a set of metrics showing the similarity of the input repertoires.
General metrics for all compared repertoires
Metric name  Description 
# ideal groups  Number of clusters that are identical in all input repertoires, i.e. have similar sequences and are composed of the same set of reads. 
# trusted groups  Number of groups where clusters from different repertoires have similar sequences and share >90% of reads. Such groups occur when a cluster from one repertoire is represented by one big and several small clusters in other repertoires. These groups can result from inaccurate error correction in one of the input repertoires. 
# untrusted groups  Number of groups where clusters from different repertoires have nonsimilar sequences and share >90% of reads. The existence of such groups indicates that at least one cluster sequence in the untrusted group is erroneous and should be reconstructed. 
# nontrivial ideal/trusted/untrusted groups  Ideal/trusted/untrusted groups where at least one cluster is not singleton. 
# big untrusted groups  Number of groups of big clusters (only clusters of size at least as specified with option isolminsize) from different repertoires that have similar sequences and share >90% of reads. 
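The read-sharing part of these group definitions can be sketched as follows. For each cluster of one repertoire, the code finds the cluster of another repertoire that shares the most reads and reports the shared fraction. The function name `best_matches`, the choice of the first cluster's size as the denominator, and the toy data are assumptions for illustration; the sequence-similarity check that separates trusted from untrusted groups is not covered here.

```python
from collections import Counter, defaultdict

def best_matches(rcm_a, rcm_b):
    """For each cluster in A, find the B cluster sharing the most reads.

    rcm_a, rcm_b: dicts mapping read name -> cluster id.
    Returns {cluster_a: (cluster_b, shared_fraction)}, where the fraction
    is taken relative to the size of the cluster in A (an assumption).
    """
    reads_by_cluster = defaultdict(list)
    for read, cid in rcm_a.items():
        reads_by_cluster[cid].append(read)

    result = {}
    for cid, reads in reads_by_cluster.items():
        # Count how many of this cluster's reads land in each B cluster.
        hits = Counter(rcm_b[r] for r in reads if r in rcm_b)
        if not hits:
            result[cid] = (None, 0.0)
            continue
        best, shared = hits.most_common(1)[0]
        result[cid] = (best, shared / len(reads))
    return result

# Hypothetical toy repertoires built from the same five reads.
rcm_a = {"r1": "A1", "r2": "A1", "r3": "A1", "r4": "A1", "r5": "A2"}
rcm_b = {"r1": "B1", "r2": "B1", "r3": "B1", "r4": "B2", "r5": "B2"}
matches = best_matches(rcm_a, rcm_b)
print(matches)
```

In this toy case cluster A1 shares 75% of its reads with B1, so under a 90% threshold the pair would not qualify as a trusted or untrusted group.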
Individual metrics for each repertoire
Metric name  Description 
# isolated clusters  Number of clusters that are present in only one input repertoire and have no similar clusters in other repertoires. 
# short clusters  Number of clusters with length of sequence <300 nt. 
# short isolated clusters  Number of isolated clusters with length of sequence <300 nt. 
min/avg/max cluster size  Minimal/average/maximal size of isolated cluster. 
# trivial isolated clusters  Number of isolated singletons 
IgQUAST reports various plots showing comparative histograms of cluster size / antibody length distribution for input repertoires:
Quality assessment against an ideal repertoire
IgQUAST evaluates a repertoire with respect to an ideal repertoire (e.g., in the case of a simulated repertoire) in terms of sensitivity (a measure of how well the ideal clusters are represented by the constructed clusters) and specificity (the rate at which clusters of the ideal repertoire are incorrectly merged):
Metric name  Description 
# original clusters  Number of clusters in ideal repertoire. 
# not merged  Number of nontrivial clusters in the original repertoire that are split into multiple clusters in the constructed repertoire. For a correctly constructed repertoire, the value of this metric is 0. 
# not merged (not trivial + singletons)  Number of not merged clusters that are formed by a single nontrivial cluster and a number of singletons in the constructed repertoire. 
# original singletons  Number of singletons in the ideal repertoire. 
max original cluster  Size of the largest cluster in the ideal repertoire. 
# constructed clusters  Number of constructed clusters. 
# errors  Number of constructed clusters that contain reads from more than one original cluster. For a correctly constructed repertoire, this metric is 0. 
# constructed singletons  Number of constructed singleton clusters. 
max constructed cluster  Size of the largest constructed cluster. 
avg fillin  The fillin of an original cluster C is the ratio of the size of its largest nonerroneous subcluster in the constructed repertoire to the size of C. This metric is the average fillin over the original clusters. 
fillin of max cluster  The largest cluster of the original repertoire corresponds to the most frequent monoclonal antibodies. This metric is equal to the fillin of the largest original cluster. 
correct singletons (%)  Some singletons in the constructed repertoire can be false due to insufficient error correction. This metric shows percentage of true singletons in the constructed repertoire. 
used reads (%)  Percentage of reads used in the repertoire reconstruction. This metric shows how well the reads have been utilized for reconstructing repertoires. 
# lost clusters  Number of original clusters that were completely lost in the constructed repertoire. 
lost clusters size (%)  Total size of the lost clusters as a percentage of the full size of the original repertoire. 
min/avg/max percentage of identity (%)  Minimal/average/maximal percentage of identity between the sequences of clusters from the original repertoire and the corresponding clusters in the constructed repertoire. The corresponding constructed cluster is the one that shares the most reads with the original cluster. 
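Several of the metrics above depend only on how reads are distributed over clusters, so they can be illustrated from two read-to-cluster maps. The sketch below is one possible reading of the definitions, not IgQUAST's implementation; the names `invert` and `assess` and the toy data are assumptions, and sequence-based metrics such as percentage of identity are left out.

```python
from collections import defaultdict

def invert(rcm):
    """Turn {read: cluster_id} into {cluster_id: set_of_reads}."""
    clusters = defaultdict(set)
    for read, cid in rcm.items():
        clusters[cid].add(read)
    return clusters

def assess(original, constructed):
    """Compare a constructed repertoire against an ideal (original) one."""
    orig_clusters = invert(original)
    cons_clusters = invert(constructed)

    # Erroneous constructed clusters mix reads from several original clusters.
    erroneous = {
        cid for cid, reads in cons_clusters.items()
        if len({original[r] for r in reads if r in original}) > 1
    }

    not_merged, lost, fillins = 0, 0, []
    for reads in orig_clusters.values():
        # Constructed clusters that received at least one read of this cluster.
        hit = {constructed[r] for r in reads if r in constructed}
        if not hit:
            lost += 1
        if len(reads) > 1 and len(hit) > 1:
            not_merged += 1
        # Largest nonerroneous constructed subcluster of this original cluster.
        best = max(
            (len(cons_clusters[c] & reads) for c in hit if c not in erroneous),
            default=0,
        )
        fillins.append(best / len(reads))

    used = sum(1 for r in original if r in constructed)
    return {
        "# original clusters": len(orig_clusters),
        "# constructed clusters": len(cons_clusters),
        "# errors": len(erroneous),
        "# not merged": not_merged,
        "# lost clusters": lost,
        "avg fillin": sum(fillins) / len(fillins),
        "used reads (%)": 100.0 * used / len(original),
    }

# Hypothetical toy data: O1 is split, C3 mixes O1 and O2, O3 is lost.
original = {"r1": "O1", "r2": "O1", "r3": "O1", "r4": "O1",
            "r5": "O2", "r6": "O2", "r7": "O3"}
constructed = {"r1": "C1", "r2": "C1", "r3": "C1",
               "r4": "C3", "r5": "C3", "r6": "C3"}
report = assess(original, constructed)
print(report)
```

In the toy data, read r4 is merged with reads of another original cluster, which yields one erroneous constructed cluster and one not merged original cluster, while the unused read r7 makes its singleton cluster count as lost.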