Tag Convolution

Based on the tag generation strategy applied by Twister, we developed the tag convolution approach to validating amino acid sequences, and implemented it in a standalone software tool.

Usage

java -Xmx2G -jar TagConvolution.jar -d=<input directory with the MS/MS spectra> -dStrings=<input directory with the amino acid strings> [options]
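For scripted or batch runs, the command line above can be assembled programmatically. The following is a minimal sketch only: the function name build_command and the directory paths are hypothetical, and any additional options are passed through verbatim, since the exact option syntax should be taken from the Options section below.

```python
import subprocess


def build_command(spectra_dir, strings_dir, extra_options=(),
                  jar="TagConvolution.jar", heap="2G"):
    """Assemble the java command line following the usage pattern above.

    extra_options is passed through unchanged (hypothetical helper; it does
    not validate option names or values).
    """
    return [
        "java", f"-Xmx{heap}", "-jar", jar,
        f"-d={spectra_dir}",
        f"-dStrings={strings_dir}",
        *extra_options,
    ]


if __name__ == "__main__":
    # Hypothetical directories, for illustration only.
    cmd = build_command("data/msalign", "data/strings")
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually launch the tool
```

Building the command as a list rather than a single string avoids shell quoting problems when directory names contain spaces.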

Input

The input comprises a set of deconvoluted high-resolution bottom-up MS/MS spectra stored in the msalign format supported by the tool MS-Deconv, which we therefore recommend for deconvolution.

All msalign files in the input directory will be used as input.
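To inspect the spectra before running the tool, an msalign file can be read with a few lines of code. The sketch below assumes the layout commonly written by MS-Deconv: each spectrum is enclosed between BEGIN IONS and END IONS, header lines have the form KEY=VALUE, and each peak line lists mass, intensity and charge separated by whitespace. The function names and the ".msalign" file extension are our assumptions, not part of TagConvolution.

```python
from pathlib import Path


def parse_msalign(text):
    """Parse msalign-formatted text into a list of spectra.

    Each spectrum is a dict with a 'header' dict (raw string values)
    and a 'peaks' list of (mass, intensity, charge) tuples.
    """
    spectra = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line == "BEGIN IONS":
            current = {"header": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line:
                key, value = line.split("=", 1)
                current["header"][key] = value
            else:
                fields = line.split()
                mass, intensity = float(fields[0]), float(fields[1])
                charge = int(fields[2]) if len(fields) > 2 else None
                current["peaks"].append((mass, intensity, charge))
    return spectra


def read_directory(directory):
    """Read every *.msalign file in a directory (assumed extension)."""
    return {
        path.name: parse_msalign(path.read_text())
        for path in sorted(Path(directory).glob("*.msalign"))
    }
```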

Options

-k, --tag-length <integer value>

Specify the length of the tags to be extracted from the input spectra.

Default: 4
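To illustrate what a tag of length k is, the sketch below enumerates k-letter tags as chains of k+1 peaks in which every pair of consecutive peaks differs by the mass of one amino acid residue, within a given tolerance. This is a generic illustration of peak-chain tag generation and not necessarily the procedure implemented in Twister or TagConvolution; the function names are hypothetical, the residue masses are standard monoisotopic values, and isoleucine is merged with leucine because the two have identical masses.

```python
# Monoisotopic residue masses in Da; "L" stands for both leucine and isoleucine.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
MAX_RESIDUE = max(RESIDUE_MASSES.values())


def generate_tags(masses, k=4, tol_mda=4.0):
    """Return the set of k-letter tags supported by the given peak masses."""
    tol = tol_mda / 1000.0  # tolerance is given in mDa, masses in Da
    peaks = sorted(masses)
    # edges[i] lists (j, residue) pairs: peak j follows peak i by one residue mass.
    edges = {i: [] for i in range(len(peaks))}
    for i in range(len(peaks)):
        for j in range(i + 1, len(peaks)):
            delta = peaks[j] - peaks[i]
            if delta > MAX_RESIDUE + tol:
                break  # peaks are sorted, so later peaks are even farther away
            for aa, mass in RESIDUE_MASSES.items():
                if abs(delta - mass) <= tol:
                    edges[i].append((j, aa))

    tags = set()

    def extend(i, prefix):
        # Depth-first extension of a chain until it spells k letters.
        if len(prefix) == k:
            tags.add(prefix)
            return
        for j, aa in edges[i]:
            extend(j, prefix + aa)

    for i in range(len(peaks)):
        extend(i, "")
    return tags
```

Larger values of k yield fewer but more specific tags, since a longer unbroken chain of peaks is required.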

-e, --mass-tolerance <float value>

Specify the mass tolerance in mDa (enter the number only, without the units).

Default: 4 (i.e., 4 mDa)

-r, --peak reflection <0|1>

If set to 1, the peak reflection procedure will be applied prior to tag generation. If set to 0, no peak reflection will be performed.

Default: 1
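As a rough illustration of peak reflection, the sketch below adds to each spectrum the complement of every peak with respect to the precursor mass, so that prefix and suffix fragments line up on a common mass axis. It assumes the simple relation m' = M - m for neutral deconvoluted masses; the exact offset used by TagConvolution is not stated here, so it is exposed as a hypothetical offset parameter, and the function name is likewise an assumption.

```python
def reflect_peaks(masses, precursor_mass, offset=0.0):
    """Return the original peaks together with their reflected counterparts.

    A peak of mass m is reflected to precursor_mass - m + offset.
    Reflected masses that are not positive are discarded.
    """
    reflected = [precursor_mass - m + offset for m in masses]
    return sorted(m for m in list(masses) + reflected if m > 0)
```

For example, with a precursor mass of 1000 Da, peaks at 100 Da and 300 Da gain reflected counterparts at 900 Da and 700 Da.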

-m, --modifications <0|1>

If set to 1, the peaks presumed to correspond to water-loss ions will be removed from each spectrum. If set to 0, all peaks will be kept.

Default: 1
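One simple way to detect a presumed water-loss peak is to check whether the spectrum also contains a peak heavier by the mass of a water molecule (about 18.0106 Da). The sketch below applies this rule; it is an assumption about how such filtering can be done, not a description of the exact criterion used by TagConvolution, and the function name is hypothetical.

```python
import bisect

WATER_MASS = 18.010565  # monoisotopic mass of H2O in Da


def remove_water_loss_peaks(masses, tol_mda=4.0):
    """Drop every peak that has a partner peak heavier by one water mass."""
    tol = tol_mda / 1000.0  # tolerance is given in mDa, masses in Da
    peaks = sorted(masses)
    kept = []
    for m in peaks:
        target = m + WATER_MASS
        # Find the first peak that is not below the tolerance window.
        i = bisect.bisect_left(peaks, target - tol)
        has_parent = i < len(peaks) and peaks[i] <= target + tol
        if not has_parent:
            kept.append(m)
    return kept
```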

-h, --threshold on the middle tag score <integer value>

Specify the threshold on the middle tag score for the amino acid strings of length (2k+1).

Default: 1

-pepnovo, --PepNovo-generated file <0|1>

If set to 1, the input file with the amino acid strings is expected to be a PepNovo output file. If set to 0, the input file is expected to be in the format of TagConvolutionSampleInput.txt.

Default: 1

Output

In the input directory, a subdirectory "results-Twister" is created, which contains the output files. The name of each output file starts with the short name "InputFolder" of the input folder. For each input file, two output files are generated, the names of which end with 'valid.txt' and 'scores.txt', respectively. The former contains the amino acid strings that passed the validation procedure, and the latter stores the tag and amino acid scores for each of those strings.

Test datasets (bottom-up)

Carbonic anhydrase 2 (CAH2): raw (436 MB), mzXML (720 MB), msalign (55 MB), pepnovo (16 MB), output (1 MB) (k=3, h=300; the remaining parameters have their default values).

Alemtuzumab: raw (152 MB), mzXML (38 MB), msalign (3 MB), pepnovo (703 KB), output (17 KB) (k=3; the remaining parameters have their default values).

Attachment	Size
TagConvolution.zip	8.27 MB