Parameter File

A parameter file is required for each CoMPaseD analysis. When using GUI mode, this file is automatically generated based on user settings and saved alongside the result files. In CLI mode, a saved parameter file can be modified, or a new one can be manually created. The file can be stored in any directory but must be referenced using the -p path/to/parameter.params option.

Each parameter is specified using a keyword, with a brief explanation of its function provided below. More detailed descriptions and theoretical background can be found in the main documentation.

List of Parameters

General Settings

Crux_path: Absolute path to the Crux executable.

Clips_path: Path to the Perl script used for peptide mapping (optional if Use_perl_mapping is False).

Promast_path: Path to the Perl script used for protein inference (optional if Use_perl_mapping is False).

Indexing_key_len: Length of the indexing key used in peptide mapping (default = 5).

Differentiate_I_L: If True, distinguishes between isoleucine (I) and leucine (L) in sequences (default = False).

Peptide Mapping & Protease Selection

Use_perl_mapping: If True, enables the original Perl-based peptide mapping (default = False).

Multi_Threads: Deprecated.

Sampling_output: If True, saves the peptide lists generated during random sampling (default = True).

Input & Output Files

Fasta: Path to the FASTA file containing protein sequences.

Output_directory: Directory where result files will be stored.

Protease Digestion Parameters

Proteases: Comma-separated list of proteases used for digestion. Only protease names available in the Crux toolkit are allowed (default = trypsin,lysarginase,glu-c,chymotrypsin,lys-c). The custom enzyme mode of Crux can be enabled by prefixing a protease name with custom and appending the cleavage rule in Crux syntax. For example, custom trypsin_new [KR]|{P} defines a custom protease trypsin_new that cleaves after K and R unless the next residue is P.
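
To make the cleavage rule syntax concrete, the following Python sketch applies a rule such as [KR]|{P} to a sequence. It is an illustration only and not the Crux implementation: the function names parse_side and digest are hypothetical, and the sketch assumes that square brackets list residues required at a position, curly braces list residues that prevent cleavage, and X stands for any residue.

```python
import re

def parse_side(token):
    """Return a test function for one side of a rule such as [KR] or {P}."""
    match = re.fullmatch(r"\[([A-Z]+)\]", token)
    if match:
        allowed = set(match.group(1))
        # Assumption: X inside the brackets matches every residue.
        return lambda aa: "X" in allowed or aa in allowed
    match = re.fullmatch(r"\{([A-Z]+)\}", token)
    if match:
        forbidden = set(match.group(1))
        return lambda aa: "X" not in forbidden and aa not in forbidden
    raise ValueError(f"unsupported rule token: {token!r}")

def digest(sequence, rule):
    """Split a sequence at every site allowed by the rule (no missed cleavages)."""
    before, after = rule.split("|")
    left_ok, right_ok = parse_side(before), parse_side(after)
    peptides, start = [], 0
    for i in range(len(sequence) - 1):
        # A site lies between residue i and residue i + 1.
        if left_ok(sequence[i]) and right_ok(sequence[i + 1]):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides

print(digest("AKRPGKA", "[KR]|{P}"))  # ['AK', 'RPGK', 'A']
```

The R in the example sequence is followed by P, so no cleavage occurs there and the fragment RPGK stays intact.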

Max_MCs: Maximum number of missed cleavages (MCs) allowed per protease. The order corresponds to Proteases (default = 2,2,5,5,2).
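
The effect of the missed cleavage limit can be sketched in Python as follows. This is a minimal illustration, not CoMPaseD or Crux code: the function name peptides_with_missed_cleavages is hypothetical, and the input is assumed to be the ordered list of fully cleaved fragments of one protein.

```python
def peptides_with_missed_cleavages(fragments, max_mc):
    """Join up to max_mc + 1 consecutive fragments.

    Returns (peptide, number_of_missed_cleavages) pairs.
    """
    result = []
    for start in range(len(fragments)):
        for n_mc in range(max_mc + 1):
            end = start + n_mc + 1
            if end > len(fragments):
                break  # not enough fragments left for this many MCs
            result.append(("".join(fragments[start:end]), n_mc))
    return result

# Fully cleaved fragments of a made-up sequence, at most one MC allowed.
for peptide, n_mc in peptides_with_missed_cleavages(["AK", "RPGK", "A"], 1):
    print(peptide, n_mc)
```

With a limit of 2, each fragment can additionally be joined with its next two neighbours, so the number of candidate peptides grows with every allowed MC.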

The following parameters are only available in the command line version of CoMPaseD:

Min_Pep_MW: Minimum peptide mass to be considered for digestion (default = 400).

Max_Pep_MW: Maximum peptide mass to be considered for digestion (default = 6000).

Min_Pep_Len: Minimum peptide length to be considered for digestion (default = 6).

Max_Pep_Len: Maximum peptide length to be considered for digestion (default = 55).

Peptide Sampling & Coverage

Freq_MCs: Frequency distribution of MCs for each protease. Enclose the values for each protease in square brackets. Each group must start with the value for peptides without MCs, followed by the values for an increasing number of MCs. The order of the groups corresponds to Proteases. Percentage values may be used directly, but CoMPaseD will display a warning when the values for any protease do not sum to 1 (default = [0.7415,0.2090,0.0484],[0.5757,0.2899,0.1336],[0.5620,0.2753,0.1110,0.0419,0.0086,0.0012],[0.2002,0.3369,0.2648,0.1471,0.0498,0.0012],[0.9102,0.0836,0.0058]).

Note

The default configuration indicates that tryptic peptides contain zero, one or two MCs: 74.15% of all tryptic peptides contain no MC, 20.90% contain one MC and 4.84% contain two MCs.
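
A Freq_MCs value can be checked before a run with a short Python sketch like the one below. It is not part of CoMPaseD: the function names and the tolerance of 0.01 are assumptions, and the rule that each group holds Max_MCs + 1 values is inferred from the default settings.

```python
import re

def parse_freq_mcs(value):
    """Turn '[a,b],[c,d,e]' into [[a, b], [c, d, e]] as floats."""
    groups = re.findall(r"\[([^\]]*)\]", value)
    return [[float(x) for x in group.split(",")] for group in groups]

def check_freq_mcs(freq_mcs, max_mcs, tolerance=0.01):
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    for idx, (freqs, max_mc) in enumerate(zip(freq_mcs, max_mcs)):
        # Assumption: one value per MC count from 0 to max_mc.
        if len(freqs) != max_mc + 1:
            problems.append(f"group {idx}: expected {max_mc + 1} values, got {len(freqs)}")
        # Assumption: sums within the tolerance of 1 are acceptable.
        if abs(sum(freqs) - 1.0) > tolerance:
            problems.append(f"group {idx}: values sum to {sum(freqs):.4f}")
    return problems

default = ("[0.7415,0.2090,0.0484],[0.5757,0.2899,0.1336],"
           "[0.5620,0.2753,0.1110,0.0419,0.0086,0.0012],"
           "[0.2002,0.3369,0.2648,0.1471,0.0498,0.0012],"
           "[0.9102,0.0836,0.0058]")
print(check_freq_mcs(parse_freq_mcs(default), [2, 2, 5, 5, 2]))
```

The default trypsin group sums to 0.9989 rather than exactly 1, which is why the sketch compares against a tolerance instead of testing for equality.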

Peptides_Sampling_Size: Number of peptides sampled per protease. Optional if Sampling_Size_Based_On is coverage. The order corresponds to Proteases (default = 10000,10000,10000,10000,10000).

Pep_Level_Proteome_Cov: Expected proteome coverage per protease at the peptide level. Optional if Sampling_Size_Based_On is number. The order corresponds to Proteases (default = 0.033051,0.031973356,0.014570424,0.009367681,0.053870932, which equals 10000 peptides per protease for the provided Bacillus subtilis database).

Sampling_Size_Based_On: Defines whether peptide sampling is based on number or coverage (default = number).

Bins: Protein length bins used to calculate group-wise protease scores (default = 0,50,100,99999).

Monte Carlo Simulation Parameters

Use_Unique_Peptides_Only: If True, only peptides that map uniquely to one protein are retained for every protease. If False, all peptides are used and CoMPaseD builds protein groups where required (default = True).

Number_of_Proteases: Maximal number of proteases concurrently used in the analysis (default = 5).

Sampling_Number: Number of times the Monte Carlo sampling is repeated (default = 10).

Protein Abundance & Expression

Protein_dynamic_range: Expected dynamic range of protein abundances in orders of magnitude (default = 6.5).

Not_expressed_fraction: Percentage of proteins assumed not to be expressed under the studied conditions. One value is provided per bin (default = 40,30,20).

Note

The default configuration for Bins and Not_expressed_fraction indicates that there are three protein length bins: small proteins with a length between 1 and 50 amino acids, medium proteins between 51 and 100 amino acids and large proteins between 101 and 99999 amino acids. 40% of the small proteins, 30% of the medium proteins and 20% of the large proteins in the FASTA database are assumed not to be expressed.
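
The relation between Bins and Not_expressed_fraction described in this note can be sketched in Python. The function name bin_index is hypothetical and the code only mirrors the bin boundaries stated above; it is not taken from CoMPaseD.

```python
from bisect import bisect_left

def bin_index(length, edges):
    """Return the bin of a protein length; bins are (lower, upper] intervals."""
    if not edges[0] < length <= edges[-1]:
        raise ValueError(f"length {length} is outside all bins")
    return bisect_left(edges, length) - 1

edges = [0, 50, 100, 99999]    # Bins
not_expressed = [40, 30, 20]   # Not_expressed_fraction, one value per bin

for length in (50, 51, 100, 101):
    idx = bin_index(length, edges)
    print(length, "-> bin", idx, "with", not_expressed[idx], "% not expressed")
```

A list of n bin edges defines n - 1 bins, so Not_expressed_fraction needs one value fewer than Bins.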

Scoring Weights

Protein_IDs_weight: Weight assigned to protein identifications in protease score calculation (default = 1).

Peptide_IDs_weight: Weight assigned to peptide identifications in protease score calculation (default = 1).

Coverage_weight: Weight assigned to proteome coverage in protease score calculation (default = 1).

Peptide Detectability Prediction by DeepMSPeptide

Use_DeepMSPeptide_Predictions: If True, enables the use of DeepMSPeptide predictions (default = True).

Weights_DeepMSPeptide_Predictions: Weight assigned to DeepMSPeptide-based scoring (default = 1).

Path_DeepMSPeptide_Model: Path to the DeepMSPeptide prediction model file. Located in the bin\ folder of CoMPaseD by default.

Additional Files

Protein_weight_file: Path to an existing file containing protein abundance values. This can be a previously generated protein abundance file or a file created by the user. A manually created file must contain at least three columns: Identifier, containing the protein identifier as used in the FASTA database; Group, containing an identical string for all proteins that should be treated as one group during protease score calculation; and Random_sampling_X, containing protein expression values, where X is an integer from 1 to the number of random samplings.
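
When a protein weight file is created manually, its header can be checked with a sketch like the following. Several points are assumptions rather than documented behaviour: the file is taken to be tab-separated, one Random_sampling_X column is expected for every sampling from 1 to Sampling_Number, and the function name, protein identifiers and values are made up for illustration.

```python
import csv
import io

def missing_columns(header, sampling_number):
    """List the required column names that are absent from the header."""
    required = ["Identifier", "Group"]
    required += [f"Random_sampling_{i}" for i in range(1, sampling_number + 1)]
    return [name for name in required if name not in header]

# Hypothetical file content with two proteins and two random samplings.
text = (
    "Identifier\tGroup\tRandom_sampling_1\tRandom_sampling_2\n"
    "P00001\tsmall\t1200.5\t0\n"
    "P00002\tlarge\t0\t87.3\n"
)
rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
header = rows[0]

print(missing_columns(header, 2))  # []
print(missing_columns(header, 3))  # ['Random_sampling_3']
```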

Digestion_result_file: Path to an existing digestion result file. Can be provided to avoid repeating the digestion if the settings for Proteases and Max_MCs have not changed.

Default Parameter File

[MPD-config]
Crux_path = C:/Programs/crux/bin/crux.exe
Clips_path = C:/Programs/CoMPaseD/bin/Perl/clips.pl
Promast_path = C:/Programs/CoMPaseD/bin/Perl/clips.pl
Indexing_key_len = 5
Differentiate_I_L = False
Use_perl_mapping = False
Multi_Threads = False
Sampling_output = True
Fasta = C:/Programs/CoMPaseD/TestData/Uniprot_BSU168.fasta
Output_directory = C:/CoMPaseD_Data/output
Proteases = trypsin,lysarginase,glu-c,chymotrypsin,lys-c
Max_MCs = 2,2,5,5,2
Freq_MCs = [0.7415,0.2090,0.0484],[0.5757,0.2899,0.1336],[0.5620,0.2753,0.1110,0.0419,0.0086,0.0012],[0.2002,0.3369,0.2648,0.1471,0.0498,0.0012],[0.9102,0.0836,0.0058]
Peptides_Sampling_Size = 10000,10000,10000,10000,10000
Pep_Level_Proteome_Cov = 0.033051,0.031973356,0.014570424,0.009367681,0.053870932
Min_Pep_MW = 400
Max_Pep_MW = 6000
Min_Pep_Len = 6
Max_Pep_Len = 55
Sampling_Size_Based_On = number
Bins = 0,50,100,99999
Number_of_Proteases = 5
Sampling_Number = 10
Protein_dynamic_range = 6.5
Not_expressed_fraction = 40,30,20
Protein_IDs_weight = 1.0
Peptide_IDs_weight = 1.0
Coverage_weight = 1.0
Use_DeepMSPeptide_Predictions = True
Weights_DeepMSPeptide_Predictions = 1.0
Use_Unique_Peptides_Only = True
Path_DeepMSPeptide_Model = C:/Programs/CoMPaseD/bin/CoMPaseDDMSPModel.h5
Protein_weight_file = 
Digestion_result_file = C:/CoMPaseD_Data/output/unique_peptides_table_filtered.tsv
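
Because the parameter file follows the common INI layout with a single [MPD-config] section, it can be inspected or modified with standard tools. The Python sketch below reads a shortened excerpt of the default file with the standard library module configparser. This shows one possible way to read the file and is not necessarily how CoMPaseD parses it; the variable names are illustrative.

```python
import configparser

# Shortened excerpt of the default parameter file.
example = """\
[MPD-config]
Proteases = trypsin,lysarginase,glu-c,chymotrypsin,lys-c
Max_MCs = 2,2,5,5,2
Sampling_Size_Based_On = number
Use_Unique_Peptides_Only = True
Protein_dynamic_range = 6.5
Protein_weight_file =
"""

parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str  # keep the capitalisation of the keywords
parser.read_string(example)
cfg = parser["MPD-config"]

# Comma-separated values are split by hand; the order follows Proteases.
proteases = cfg["Proteases"].split(",")
max_mcs = [int(x) for x in cfg["Max_MCs"].split(",")]
unique_only = cfg.getboolean("Use_Unique_Peptides_Only")
dynamic_range = cfg.getfloat("Protein_dynamic_range")

for protease, max_mc in zip(proteases, max_mcs):
    print(protease, max_mc)
```

To read a file from disk instead, parser.read("path/to/parameter.params") can replace the read_string call.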