Parameter File
A parameter file is required for each CoMPaseD analysis. When using GUI mode, this file is automatically generated based on user settings and saved alongside the result files. In CLI mode, a saved parameter file can be modified, or a new one can be manually created. The file can be stored in any directory but must be referenced using the -p path/to/parameter.params option.
Each parameter is specified using a keyword, with a brief explanation of its function provided below. More detailed descriptions and theoretical background can be found in the main documentation.
List of Parameters
General Settings
Crux_path: Absolute path to the Crux executable.
Clips_path: Path to the Perl script used for peptide mapping (optional if Use_perl_mapping is False).
Promast_path: Path to the Perl script used for protein inference (optional if Use_perl_mapping is False).
Indexing_key_len: Length of the indexing key used in peptide mapping (default = 5).
Differentiate_I_L: If True, distinguishes between isoleucine (I) and leucine (L) in sequences (default = False).
Peptide Mapping & Protease Selection
Use_perl_mapping: If True, enables the original Perl-based peptide mapping (default = False).
Multi_Threads: Deprecated.
Sampling_output: If True, saves the peptide lists generated during random sampling (default = True).
Input & Output Files
Fasta: Path to the FASTA file containing protein sequences.
Output_directory: Directory where result files will be stored.
Protease Digestion Parameters
Proteases: Comma-separated list of proteases used for digestion. Only protease names available in the Crux toolkit are allowed (default = trypsin,lysarginase,glu-c,chymotrypsin,lys-c). The custom enzyme mode of Crux can be enabled by prefixing a protease name with custom and appending the cleavage rule in Crux syntax, e.g. custom trypsin_new [KR]|{P} defines a custom protease trypsin_new that cleaves after K and R, but not if followed by P.
Max_MCs: Maximum number of missed cleavages (MCs) allowed per protease. The order corresponds to Proteases (default = 2,2,5,5,2).
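Because Max_MCs is matched to Proteases by position, it can help to check both values before starting an analysis. The following is a minimal sketch, not CoMPaseD's own code: the helper names parse_proteases and pair_with_max_mcs are hypothetical, and the sketch assumes that a custom entry always consists of exactly three space-separated parts (custom, name, rule).

```python
def parse_proteases(value: str) -> list[dict]:
    """Split a Proteases value into entries (hypothetical helper).

    Entries of the form 'custom NAME RULE' are split into name and rule;
    all other entries are treated as plain Crux protease names.
    """
    entries = []
    for item in value.split(","):
        parts = item.strip().split()
        if parts and parts[0] == "custom":
            # Assumption: exactly 'custom', a name, and a cleavage rule.
            if len(parts) != 3:
                raise ValueError(f"malformed custom protease: {item!r}")
            entries.append({"name": parts[1], "rule": parts[2]})
        else:
            entries.append({"name": item.strip(), "rule": None})
    return entries


def pair_with_max_mcs(proteases: str, max_mcs: str) -> dict[str, int]:
    """Pair each protease name with its Max_MCs value by position."""
    names = [entry["name"] for entry in parse_proteases(proteases)]
    limits = [int(v) for v in max_mcs.split(",")]
    if len(names) != len(limits):
        raise ValueError(
            f"{len(names)} proteases but {len(limits)} Max_MCs values"
        )
    return dict(zip(names, limits))


# Default values from the parameter list above.
defaults = pair_with_max_mcs(
    "trypsin,lysarginase,glu-c,chymotrypsin,lys-c", "2,2,5,5,2"
)
```

With the default values, each protease name is mapped to its own limit, for example glu-c to 5 and lys-c to 2.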
The following parameters are only available in the command line version of CoMPaseD:
Min_Pep_MW: Minimum peptide mass to be considered for digestion (default = 400).
Max_Pep_MW: Maximum peptide mass to be considered for digestion (default = 6000).
Min_Pep_Len: Minimum peptide length to be considered for digestion (default = 6).
Max_Pep_Len: Maximum peptide length to be considered for digestion (default = 55).
Peptide Sampling & Coverage
Freq_MCs: Probability distribution of MCs for each protease. Enclose the values for each protease in square brackets and start each group with the value for peptides without MCs, followed by the values for increasing numbers of MCs. Percentage values may be used directly, but CoMPaseD displays a warning when the values for any protease do not sum to 1. The order of the groups corresponds to Proteases (default = [0.7415,0.2090,0.0484],[0.5757,0.2899,0.1336],[0.5620,0.2753,0.1110,0.0419,0.0086,0.0012],[0.2002,0.3369,0.2648,0.1471,0.0498,0.0012],[0.9102,0.0836,0.0058]).
Note
The default configuration indicates that tryptic peptides contain zero, one, or two MCs: 74.15% of all tryptic peptides contain no MC, 20.90% contain one MC, and 4.84% contain two MCs.
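The bracketed Freq_MCs format can be checked against Max_MCs before a run. The sketch below is an illustration only: parse_freq_mcs and check_freq_mcs are hypothetical helpers, the tolerance of 0.01 is an assumption (the threshold CoMPaseD uses for its warning is not stated here), and the rule that each group holds Max_MCs + 1 values is inferred from the default configuration.

```python
import re


def parse_freq_mcs(value: str) -> list[list[float]]:
    """Turn '[a,b],[c,d,e]' into [[a, b], [c, d, e]] (hypothetical helper)."""
    groups = re.findall(r"\[([^\]]*)\]", value)
    return [[float(x) for x in group.split(",")] for group in groups]


def check_freq_mcs(freqs, max_mcs, tol=0.01):
    """Return a list of human-readable problems; empty if none were found.

    Assumption: a group needs one value per MC count from 0 to Max_MCs,
    and its values should sum to 1 within the tolerance 'tol'.
    """
    problems = []
    for index, (group, limit) in enumerate(zip(freqs, max_mcs)):
        if len(group) != limit + 1:
            problems.append(
                f"protease {index}: expected {limit + 1} values, got {len(group)}"
            )
        if abs(sum(group) - 1.0) > tol:
            problems.append(
                f"protease {index}: values sum to {sum(group):.4f}, not 1"
            )
    return problems


default_freqs = parse_freq_mcs(
    "[0.7415,0.2090,0.0484],[0.5757,0.2899,0.1336],"
    "[0.5620,0.2753,0.1110,0.0419,0.0086,0.0012],"
    "[0.2002,0.3369,0.2648,0.1471,0.0498,0.0012],"
    "[0.9102,0.0836,0.0058]"
)
default_problems = check_freq_mcs(default_freqs, [2, 2, 5, 5, 2])
```

For the default values, default_problems is an empty list: every group has Max_MCs + 1 entries and sums to 1 within the assumed tolerance.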
Peptides_Sampling_Size: Number of peptides sampled per protease. Optional if Sampling_Size_Based_On is coverage. The order corresponds to Proteases (default = 10000,10000,10000,10000,10000).
Pep_Level_Proteome_Cov: Expected proteome coverage per protease at the peptide level. Optional if Sampling_Size_Based_On is number. The order corresponds to Proteases (default = 0.033051,0.031973356,0.014570424,0.009367681,0.053870932, which equals 10000 peptides per protease for the provided Bacillus subtilis database).
Sampling_Size_Based_On: Defines whether peptide sampling is based on number or coverage (default = number).
Bins: Protein length bins used to calculate group-wise protease scores (default = 0,50,100,99999).
Monte Carlo Simulation Parameters
Use_Unique_Peptides_Only: If True, only peptides that map uniquely to one protein are retained for every protease. If False, all peptides are used and CoMPaseD builds protein groups where required (default = True).
Number_of_Proteases: Maximum number of proteases used concurrently in the analysis (default = 5).
Sampling_Number: Number of times the Monte Carlo sampling is repeated (default = 10).
Protein Abundance & Expression
Protein_dynamic_range: Expected dynamic range of protein abundances in orders of magnitude (default = 6.5).
Not_expressed_fraction: Percentage of proteins assumed not to be expressed under the studied conditions. One value is provided per bin (default = 40,30,20).
Note
The default configuration for Bins and Not_expressed_fraction defines three protein length bins: small proteins of 1 to 50 amino acids, medium proteins of 51 to 100 amino acids, and large proteins of 101 to 99999 amino acids. Of the proteins in the FASTA database, 40% of the small, 30% of the medium, and 20% of the large proteins are assumed not to be expressed.
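The relationship between the bin edges and the per-bin percentages in this note can be made concrete with a small sketch. It follows the interpretation given in the note (each bin excludes its lower edge and includes its upper edge); the function names length_bin and not_expressed_percent are hypothetical and not part of CoMPaseD.

```python
from bisect import bisect_left


def length_bin(length: int, edges: list[int]) -> int:
    """Return the zero-based bin index for a protein length.

    Bin i covers lengths from edges[i] + 1 up to and including edges[i + 1],
    so edges 0,50,100,99999 give the bins 1-50, 51-100 and 101-99999.
    """
    index = bisect_left(edges, length) - 1
    if index < 0 or index >= len(edges) - 1:
        raise ValueError(f"length {length} is outside the configured bins")
    return index


def not_expressed_percent(length: int, edges: list[int], percents: list[float]) -> float:
    """Look up the Not_expressed_fraction value for a protein length."""
    if len(percents) != len(edges) - 1:
        raise ValueError("one Not_expressed_fraction value is needed per bin")
    return percents[length_bin(length, edges)]


# Default Bins and Not_expressed_fraction values.
edges = [0, 50, 100, 99999]
percents = [40, 30, 20]
small = not_expressed_percent(50, edges, percents)
medium = not_expressed_percent(51, edges, percents)
large = not_expressed_percent(101, edges, percents)
```

Note that the number of Not_expressed_fraction values is always one less than the number of Bins edges.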
Scoring Weights
Protein_IDs_weight: Weight assigned to protein identifications in protease score calculation (default = 1).
Peptide_IDs_weight: Weight assigned to peptide identifications in protease score calculation (default = 1).
Coverage_weight: Weight assigned to proteome coverage in protease score calculation (default = 1).
Peptide Detectability Prediction by DeepMSPeptide
Use_DeepMSPeptide_Predictions: If True, enables the use of DeepMSPeptide predictions (default = True).
Weights_DeepMSPeptide_Predictions: Weight assigned to DeepMSPeptide-based scoring (default = 1).
Path_DeepMSPeptide_Model: Path to the DeepMSPeptide prediction model file. Located in the bin\ folder of CoMPaseD by default.
Additional Files
Protein_weight_file: Path to an existing file containing protein abundance values. This is either a previously generated protein abundance file or a file created by the user. If created manually, it must contain at least three columns: Identifier, containing the protein identifier as used in the FASTA database; Group, containing an identical string for all proteins that should be treated as one group during protease score calculation; and Random_sampling_X, where X is an integer from 1 to the number of random samplings, containing protein expression values.
Digestion_result_file: Path to an existing digestion result file. Can be provided to avoid duplicate digestion if the settings for Proteases and Max_MCs have not changed.
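A manually created protein weight file needs the columns Identifier, Group and one Random_sampling_X column per random sampling. The sketch below writes such a table; it is an illustration under stated assumptions, not the format definition. In particular, the tab delimiter is an assumption (the required delimiter is not stated here, so compare with a file generated by CoMPaseD), the function name write_protein_weights is hypothetical, and the identifiers and abundance values are placeholders.

```python
import csv
import io


def write_protein_weights(handle, proteins, n_samplings):
    """Write a protein weight table (hypothetical helper).

    proteins maps each identifier to a (group, values) pair, where values
    holds one expression value per random sampling.
    Assumption: tab-separated columns.
    """
    header = ["Identifier", "Group"] + [
        f"Random_sampling_{i}" for i in range(1, n_samplings + 1)
    ]
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for identifier, (group, values) in proteins.items():
        if len(values) != n_samplings:
            raise ValueError(
                f"{identifier}: expected {n_samplings} values, got {len(values)}"
            )
        writer.writerow([identifier, group, *values])


# Placeholder identifiers and values, for illustration only.
buffer = io.StringIO()
write_protein_weights(
    buffer,
    {
        "PROT_A": ("small", [1.5, 2.0]),
        "PROT_B": ("large", [0.0, 3.25]),
    },
    n_samplings=2,
)
table_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with open(path, "w", newline="") as the handle. The number of Random_sampling_X columns should match Sampling_Number.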
Default Parameter File
[MPD-config]
Crux_path = C:/Programs/crux/bin/crux.exe
Clips_path = C:/Programs/CoMPaseD/bin/Perl/clips.pl
Promast_path = C:/Programs/CoMPaseD/bin/Perl/clips.pl
Indexing_key_len = 5
Differentiate_I_L = False
Use_perl_mapping = False
Multi_Threads = False
Sampling_output = True
Fasta = C:/Programs/CoMPaseD/TestData/Uniprot_BSU168.fasta
Output_directory = C:/CoMPaseD_Data/output
Proteases = trypsin,lysarginase,glu-c,chymotrypsin,lys-c
Max_MCs = 2,2,5,5,2
Freq_MCs = [0.7415,0.2090,0.0484],[0.5757,0.2899,0.1336],[0.5620,0.2753,0.1110,0.0419,0.0086,0.0012],[0.2002,0.3369,0.2648,0.1471,0.0498,0.0012],[0.9102,0.0836,0.0058]
Peptides_Sampling_Size = 10000,10000,10000,10000,10000
Pep_Level_Proteome_Cov = 0.033051,0.031973356,0.014570424,0.009367681,0.053870932
Min_Pep_MW = 400
Max_Pep_MW = 6000
Min_Pep_Len = 6
Max_Pep_Len = 55
Sampling_Size_Based_On = number
Bins = 0,50,100,99999
Number_of_Proteases = 5
Sampling_Number = 10
Protein_dynamic_range = 6.5
Not_expressed_fraction = 40,30,20
Protein_IDs_weight = 1.0
Peptide_IDs_weight = 1.0
Coverage_weight = 1.0
Use_DeepMSPeptide_Predictions = True
Weights_DeepMSPeptide_Predictions = 1.0
Use_Unique_Peptides_Only = True
Path_DeepMSPeptide_Model = C:/Programs/CoMPaseD/bin/CoMPaseDDMSPModel.h5
Protein_weight_file =
Digestion_result_file = C:/CoMPaseD_Data/output/unique_peptides_table_filtered.tsv
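The default parameter file follows an INI-like layout with a single [MPD-config] section, so it can be inspected or adjusted with Python's standard configparser module when working in CLI mode. This is a minimal sketch under that assumption; load_params is a hypothetical helper, and CoMPaseD's own reader may treat the file differently.

```python
import configparser


def load_params(text: str) -> dict[str, str]:
    """Read the [MPD-config] section into a plain dictionary of strings."""
    # interpolation=None keeps any '%' characters in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    # Preserve keyword capitalisation (configparser lowercases by default).
    parser.optionxform = str
    parser.read_string(text)
    return dict(parser["MPD-config"])


# A shortened excerpt of the default parameter file.
sample = """[MPD-config]
Proteases = trypsin,lysarginase,glu-c,chymotrypsin,lys-c
Max_MCs = 2,2,5,5,2
Sampling_Number = 10
Protein_weight_file =
"""

params = load_params(sample)
sampling_number = int(params["Sampling_Number"])
max_mcs = [int(v) for v in params["Max_MCs"].split(",")]
```

All values are returned as strings, so list-like parameters such as Max_MCs have to be split and converted explicitly. An empty entry such as Protein_weight_file is returned as an empty string.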