Immunoproteogenomics: analysis of antibody repertoire

What is an antibody repertoire?

An antibody repertoire is the set of circulating antibodies. Reconstructing the antibody repertoire is an important step in antibody drug development. We present a collection of tools for investigating antibody repertoires based on immunosequencing data:

IgRepertoireConstructor: an algorithm for construction of antibody repertoire and immunoproteogenomics analysis

IgSimulator: tool for simulation of antibody repertoire

IgQUAST: quality assessment tool for antibody repertoires (coming soon)

Antibody repertoire representation

We represent an antibody repertoire as a set of clusters that correspond to antibody clones. Each cluster is a group of identical antibodies described by the antibody nucleotide sequence, its frequency, and the set of Ig-Seq reads that make up the group. We use two files to describe an antibody repertoire: CLUSTERS.FA (a FASTA file containing antibody sequences) and RCM (Read-Cluster Map). Examples of CLUSTERS.FA and RCM files for a toy repertoire are listed below.

CLUSTERS.FA is a FASTA file in which each sequence corresponds to an antibody clone.

The header of each sequence contains the id and size of the corresponding cluster.

The example shows a repertoire containing 3 clusters of sizes 3, 2, and 1.


Every line of the RCM file contains a read name and the id of the corresponding cluster.

For example, cluster 1 consists of reads MISEQ@:53:000000000-A2BMW:1:2114:14345:28882, MISEQ@:53:000000000-A2BMW:1:2114:14345:28882 and MISEQ@:53:000000000-A2BMW:1:2114:14393:28886.
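As a rough illustration of how a Read-Cluster Map can be turned back into clusters, the sketch below groups read names by cluster id. It assumes that each RCM line holds a read name and a cluster id separated by a tab; that separator, the function name parse_rcm, and the toy read names are assumptions made for this example, not part of the tools described here.

```python
from collections import defaultdict


def parse_rcm(lines, sep="\t"):
    """Group read names by cluster id.

    Assumption: every non-empty line is '<read name><sep><cluster id>'.
    """
    clusters = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last separator so that read names containing
        # ':' or other punctuation stay intact.
        read_name, cluster_id = line.rsplit(sep, 1)
        clusters[cluster_id].append(read_name)
    return dict(clusters)


# Hypothetical toy repertoire with 3 clusters of sizes 3, 2, and 1.
toy_rcm = [
    "read_a\t1",
    "read_b\t1",
    "read_c\t1",
    "read_d\t2",
    "read_e\t2",
    "read_f\t3",
]

clusters = parse_rcm(toy_rcm)
sizes = {cid: len(reads) for cid, reads in clusters.items()}
print(sizes)  # {'1': 3, '2': 2, '3': 1}
```

The cluster sizes recovered this way should agree with the sizes recorded in the CLUSTERS.FA headers.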


IgRepertoireConstructor is a tool for constructing an antibody repertoire from an Illumina Ig-Seq library. It takes as input immunosequencing reads that cover the variable regions of antibodies and returns the antibody repertoire constructed from those reads.

Visit the official IgRepertoireConstructor page on GitHub for more details and to download the latest version!



IgSimulator is a tool for simulating an antibody repertoire and an Ig-Seq library. It is designed for testing and benchmarking tools that reconstruct Ig repertoires.

Visit the official IgSimulator page on GitHub for more details and to download the latest version!



IgQUAST (Immunoglobulin QUality ASsessment Tool) is a tool for quality assessment of antibody repertoires. IgQUAST takes one or more antibody repertoires as input and evaluates them in several ways:
  • Single repertoire evaluation
  • Multiple repertoire comparison
  • Quality assessment against an ideal repertoire

Single repertoire evaluation

IgQUAST computes basic metrics such as # clusters, # singletons (clusters consisting of a single read), maximal cluster size, and average cluster size, together with a set of metrics showing the number of clusters whose size is at least a given threshold (# clusters >= 10, # clusters >= 50, # clusters >= 100, etc.). It also draws plots, such as histograms of the cluster size and cluster length distributions:

Figures: histogram of cluster size distribution (left) and histogram of cluster length distribution (right).
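The basic metrics listed above can be derived from nothing more than the list of cluster sizes. The following is a minimal sketch under that assumption; the function name basic_metrics and the dictionary keys are chosen for illustration and do not reflect IgQUAST's actual implementation or report format.

```python
def basic_metrics(cluster_sizes, thresholds=(10, 50, 100)):
    """Compute simple summary metrics from a list of cluster sizes."""
    sizes = list(cluster_sizes)
    if not sizes:
        raise ValueError("repertoire has no clusters")
    metrics = {
        "# clusters": len(sizes),
        # A singleton is a cluster consisting of a single read.
        "# singletons": sum(1 for s in sizes if s == 1),
        "max cluster size": max(sizes),
        "avg cluster size": sum(sizes) / len(sizes),
    }
    # Number of clusters whose size reaches each threshold.
    for t in thresholds:
        metrics[f"# clusters >= {t}"] = sum(1 for s in sizes if s >= t)
    return metrics


# Toy repertoire with clusters of sizes 3, 2, and 1.
metrics = basic_metrics([3, 2, 1])
for name, value in metrics.items():
    print(name, value)
```

The same list of sizes is also what a cluster size histogram would be drawn from.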

IgQUAST additionally performs an advanced analysis of mutated groups (groups of antibodies that possibly developed from the same antibody). An example of this advanced analysis is shown below:




(a) Example visualization of an alignment of two clusters. Peaks correspond to positions of polymorphisms in the alignment. Red bars correspond to positions of CDRs computed by IgBlast.

(b) Example visualization of a summarized alignment of a cluster against similar clusters.

(c) Example histogram of relative positions of polymorphisms. Red bars correspond to theoretical positions of CDRs.


Multiple repertoire comparison

IgQUAST compares two or more repertoires constructed from the same Ig-Seq library and computes a set of metrics showing the similarity of the input repertoires.

General metrics for all compared repertoires

Metric name | Description
# ideal groups | Number of clusters that are identical in all input repertoires, i.e. have similar sequences and were formed from the same set of reads.
# trusted groups | Number of groups where clusters from different repertoires have similar sequences and share >90% of reads. Such groups occur when a cluster from one repertoire is represented by one big and several small clusters in other repertoires. These groups can result from inaccurate error correction in one of the input repertoires.
# untrusted groups | Number of groups where clusters from different repertoires have non-similar sequences and share >90% of reads. The existence of such groups indicates that at least one cluster sequence in the untrusted group is erroneous and should be reconstructed.
# non-trivial ideal/trusted/untrusted groups | Ideal/trusted/untrusted groups in which at least one cluster is not a singleton.
# big untrusted groups | Number of groups of big clusters (only clusters of size at least the value specified with option --isol-min-size) from different repertoires that have similar sequences and share >90% of reads.
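The read-sharing part of these group definitions can be sketched as follows. The code matches each cluster of one repertoire to the cluster of another repertoire with which it shares the most reads and reports the shared fraction. It covers read sharing only: sequence similarity, which separates trusted from untrusted groups, is not computed. Measuring the shared fraction relative to the larger of the two clusters is an assumption of this sketch, as are the function names and the toy read and cluster names.

```python
from collections import defaultdict


def clusters_from_rcm(read_to_cluster):
    """Invert a read -> cluster id mapping into cluster id -> set of reads."""
    clusters = defaultdict(set)
    for read, cid in read_to_cluster.items():
        clusters[cid].add(read)
    return clusters


def best_matches(rcm_a, rcm_b):
    """For each cluster of repertoire A, find the cluster of repertoire B
    sharing the most reads, together with the shared fraction."""
    clusters_a = clusters_from_rcm(rcm_a)
    clusters_b = clusters_from_rcm(rcm_b)
    result = {}
    for cid_a, reads_a in clusters_a.items():
        overlap = defaultdict(int)
        for read in reads_a:
            if read in rcm_b:
                overlap[rcm_b[read]] += 1
        if not overlap:
            # No read of this cluster appears in repertoire B.
            result[cid_a] = (None, 0.0)
            continue
        cid_b = max(overlap, key=overlap.get)
        # Assumption: the fraction is taken relative to the larger cluster.
        denom = max(len(reads_a), len(clusters_b[cid_b]))
        result[cid_a] = (cid_b, overlap[cid_b] / denom)
    return result


# Hypothetical read -> cluster maps of two repertoires built from the same reads.
rcm_a = {"r1": "a1", "r2": "a1", "r3": "a1", "r4": "a2"}
rcm_b = {"r1": "b1", "r2": "b1", "r3": "b2", "r4": "b3"}

matches = best_matches(rcm_a, rcm_b)
for cid_a, (cid_b, fraction) in matches.items():
    print(cid_a, cid_b, round(fraction, 2))
```

Under these assumptions, a pair with a shared fraction of 1.0 is a candidate for an ideal group, and a pair above 0.9 is a candidate for a trusted or untrusted group, depending on whether the two sequences are similar.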

Individual metrics for each repertoire

Metric name | Description
# isolated clusters | Number of clusters that are present in only one input repertoire and have no similar clusters in other repertoires.
# short clusters | Number of clusters with sequence length <300 nt.
# short isolated clusters | Number of isolated clusters with sequence length <300 nt.
min/avg/max cluster size | Minimal/average/maximal size of an isolated cluster.
# trivial isolated clusters | Number of isolated singletons.


IgQUAST also reports plots showing comparative histograms of the cluster size and antibody length distributions for the input repertoires.

Quality assessment against an ideal repertoire

IgQUAST evaluates a repertoire with respect to an ideal repertoire (e.g., in the case of a simulated repertoire) in terms of sensitivity (a measure of how well the constructed clusters represent the ideal clusters) and specificity (the rate at which clusters of the ideal repertoire are incorrectly merged):

Metric name | Description
# original clusters | Number of clusters in the ideal repertoire.
# not merged | Number of non-trivial clusters in the original repertoire that are split into multiple clusters in the constructed repertoire. For a correctly constructed repertoire, the value of this metric is 0.
# not merged (not trivial + singletons) | Number of not merged clusters that are formed by a single non-trivial cluster and a number of singletons in the constructed repertoire.
# original singletons | Number of singletons in the ideal repertoire.
max original cluster | Size of the maximal cluster in the ideal repertoire.
# constructed clusters | Number of constructed clusters.
# errors | Number of constructed clusters that contain reads from more than one original cluster. For a correctly constructed repertoire, this metric is 0.
# constructed singletons | Number of constructed singleton clusters.
max constructed cluster | Size of the maximal constructed cluster.
avg fill-in | The fill-in of an original cluster C is the ratio of the size of its largest non-erroneous subcluster in the constructed repertoire to the size of C. Avg fill-in is this value averaged over the original clusters.
fill-in of max cluster | The maximal cluster of the original repertoire corresponds to the most frequent monoclonal antibodies. This metric is equal to the fill-in of the maximal original cluster.
correct singletons (%) | Some singletons in the constructed repertoire can be false due to insufficient error correction. This metric shows the percentage of true singletons in the constructed repertoire.
used reads (%) | Percentage of reads used in the repertoire reconstruction. This metric shows how well the reads have been utilized for reconstructing the repertoire.
# lost clusters | Number of original clusters that were completely lost in the constructed repertoire.
lost clusters size (%) | Total size of the lost clusters as a percentage of the full size of the original repertoire.
min/avg/max percentage of identity (%) | Minimal/average/maximal percentage of identity between the sequences of clusters from the original repertoire and the corresponding clusters in the constructed repertoire (the corresponding cluster is the constructed cluster that shares the most reads with the original cluster).
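Several of these metrics depend only on which reads ended up in which cluster, so they can be sketched from two read -> cluster maps. The code below is an illustrative reading of the definitions above, not IgQUAST's implementation. It assumes that every read of the constructed repertoire also appears in the original one, that the original repertoire is non-empty, that an original cluster is lost when none of its reads appears in the constructed repertoire, and that lost clusters count toward the average fill-in with a value of 0. All function, read, and cluster names are hypothetical.

```python
from collections import defaultdict


def group_reads(read_to_cluster):
    """Invert a read -> cluster id mapping into cluster id -> set of reads."""
    clusters = defaultdict(set)
    for read, cid in read_to_cluster.items():
        clusters[cid].add(read)
    return clusters


def compare_to_ideal(original, constructed):
    """Compute a few read-based metrics of a constructed repertoire
    against an original (ideal) one."""
    orig_clusters = group_reads(original)
    cons_clusters = group_reads(constructed)

    # A constructed cluster is erroneous if it mixes reads
    # from more than one original cluster.
    erroneous = {
        cid for cid, reads in cons_clusters.items()
        if len({original[r] for r in reads}) > 1
    }

    not_merged = 0
    lost = 0
    fill_ins = []
    for reads in orig_clusters.values():
        # Constructed clusters that received at least one read of this cluster.
        targets = {constructed[r] for r in reads if r in constructed}
        if not targets:
            lost += 1
        if len(reads) > 1 and len(targets) > 1:
            not_merged += 1
        # Largest non-erroneous subcluster of this original cluster.
        best = max(
            (len(reads & cons_clusters[t]) for t in targets if t not in erroneous),
            default=0,
        )
        fill_ins.append(best / len(reads))

    return {
        "# original clusters": len(orig_clusters),
        "# constructed clusters": len(cons_clusters),
        "# errors": len(erroneous),
        "# not merged": not_merged,
        "# lost clusters": lost,
        "avg fill-in": sum(fill_ins) / len(fill_ins),
        "used reads (%)": 100.0 * len(constructed) / len(original),
    }


# Hypothetical ideal repertoire: clusters A (3 reads), B (2 reads), C (1 read).
original = {"r1": "A", "r2": "A", "r3": "A", "r4": "B", "r5": "B", "r6": "C"}
# Hypothetical constructed repertoire: A is split into x and y, r6 is unused.
constructed = {"r1": "x", "r2": "x", "r3": "y", "r4": "z", "r5": "z"}

report = compare_to_ideal(original, constructed)
for name, value in report.items():
    print(name, value)
```

In this toy case cluster A is counted as not merged because it is split between x and y, and cluster C is counted as lost because its only read was not used.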