(reference-params)=
# Parameter File

A parameter file is required for each CoMPaseD analysis. In GUI mode, this file is generated automatically from the user settings and saved alongside the result files. In CLI mode, a saved parameter file can be modified, or a new one can be created manually. The file can be stored in any directory but must be passed to CoMPaseD with the `-p path/to/parameter.params` option.

Each parameter is specified by a keyword, and a brief explanation of its function is provided below. More detailed descriptions and theoretical background can be found in the [main documentation](documentation-head).

## **List of Parameters**

### **General Settings**

(params_crux_dummy)=
- **Crux_path:** Absolute path to the Crux executable.

(params_clips_dummy)=
- **Clips_path:** Path to the Perl script used for peptide mapping *(optional if `Use_perl_mapping` is `False`)*.

(params_promast_dummy)=
- **Promast_path:** Path to the Perl script used for protein inference *(optional if `Use_perl_mapping` is `False`)*.

(params_idx_len_dummy)=
- **Indexing_key_len:** Length of the indexing key used in peptide mapping *(default = `5`)*.

(params_I_L_diff_dummy)=
- **Differentiate_I_L:** If `True`, distinguishes between isoleucine (I) and leucine (L) in sequences *(default = `False`)*.

### **Peptide Mapping & Protease Selection**

(sampling_output)=
- **Use_perl_mapping:** If `True`, enables the original Perl-based peptide mapping *(default = `False`)*.

(params_multi_dummy)=
- **Multi_Threads:** *Deprecated*.

(params_sampling_out_dummy)=
- **Sampling_output:** If `True`, saves the peptide lists generated during random sampling *(default = `True`)*.

### **Input & Output Files**

(params-fasta)=
- **Fasta:** Path to the FASTA file containing protein sequences.

(params-output_dir)=
- **Output_directory:** Directory where result files will be stored.
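`Indexing_key_len` and `Differentiate_I_L` both affect peptide mapping. The following is a minimal, hypothetical sketch of how a fixed-length indexing key and optional I/L equivalence could be combined when mapping peptides to proteins. It is not CoMPaseD's (or Crux's) mapping code: it assumes that the key is the first `Indexing_key_len` residues of a peptide and that, when `Differentiate_I_L` is `False`, every I is treated as L. The function names `normalise`, `build_index` and `map_peptide` are illustrative only.

```python
from collections import defaultdict


def normalise(seq: str, differentiate_i_l: bool) -> str:
    """Treat isoleucine (I) as leucine (L) unless the two should be differentiated."""
    return seq if differentiate_i_l else seq.replace("I", "L")


def build_index(proteins: dict[str, str], key_len: int, differentiate_i_l: bool):
    """Return normalised sequences and a map from every key_len-mer to its (protein, start) positions."""
    normalised = {pid: normalise(seq, differentiate_i_l) for pid, seq in proteins.items()}
    index = defaultdict(list)
    for pid, seq in normalised.items():
        for start in range(len(seq) - key_len + 1):
            index[seq[start:start + key_len]].append((pid, start))
    return normalised, index


def map_peptide(peptide: str, normalised: dict[str, str], index, key_len: int,
                differentiate_i_l: bool) -> list[str]:
    """Find candidates via the peptide's first key_len residues, then verify the full match."""
    pep = normalise(peptide, differentiate_i_l)
    if len(pep) < key_len:
        raise ValueError("peptide is shorter than the indexing key")
    hits = {
        pid
        for pid, start in index.get(pep[:key_len], [])
        if normalised[pid].startswith(pep, start)
    }
    return sorted(hits)


# Hypothetical example: two proteins that differ only by I/L at two positions.
proteins = {"P1": "MKLIPEPTIDEK", "P2": "MKLLPEPTLDEK"}
for flag in (False, True):
    norm, idx = build_index(proteins, key_len=5, differentiate_i_l=flag)
    print(flag, map_peptide("PEPTIDEK", norm, idx, 5, flag))
# False ['P1', 'P2']
# True ['P1']
```

In this sketch a longer key makes each lookup more selective at the cost of a larger index, and peptides shorter than the key cannot be looked up at all; with the default `Min_Pep_Len` of 6 and a key length of 5 that case does not arise.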
### **Protease Digestion Parameters**

(params-proteases)=
- **Proteases**: Comma-separated list of proteases used for digestion. Apart from custom enzymes, only protease names available in the Crux toolkit are allowed *(default = `trypsin,lysarginase,glu-c,chymotrypsin,lys-c`)*. Crux's custom enzyme mode can be enabled by prefixing a protease name with `custom` and appending the cleavage rule in Crux syntax; *e.g.*, `custom trypsin_new [KR]|{P}` defines a custom protease named `trypsin_new` that cleaves after K or R unless the next residue is P.

(params-max_mc)=
- **Max_MCs**: Maximum number of missed cleavages (MCs) allowed per protease, in the same order as `Proteases` *(default = `2,2,5,5,2`)*.

*The following parameters are only available in the command line version of CoMPaseD:*

(params-min_pep_mw)=
- **Min_Pep_MW**: Minimum peptide mass to be considered for digestion *(default = `400`)*.

(params-max_pep_mw)=
- **Max_Pep_MW**: Maximum peptide mass to be considered for digestion *(default = `6000`)*.

(params-min_pep_len)=
- **Min_Pep_Len**: Minimum peptide length to be considered for digestion *(default = `6`)*.

(params-max_pep_len)=
- **Max_Pep_Len**: Maximum peptide length to be considered for digestion *(default = `55`)*.

### **Peptide Sampling & Coverage**

(params-freq_mc)=
- **Freq_MCs**: Probability distribution of MCs for each protease. Enclose the values for each protease in square brackets, start each group with the value for peptides without MCs and list the remaining values in order of increasing number of MCs. The groups follow the same order as `Proteases`. Percentage values may be used directly, but CoMPaseD displays a warning when the values for any protease do not sum to 1 *(default = `[0.7415,0.2090,0.0484],[0.5757,0.2899,0.1336],[0.5620,0.2753,0.1110,0.0419,0.0086,0.0012],[0.2002,0.3369,0.2648,0.1471,0.0498,0.0012],[0.9102,0.0836,0.0058]`)*.

```{note}
The default configuration indicates that for trypsin there are peptides with zero, one or two MCs: 74.15% of all tryptic peptides contain no MC, 20.90% contain one MC and 4.84% contain two MCs.
```

(params-pep_number)=
- **Peptides_Sampling_Size**: Number of peptides sampled per protease, in the same order as `Proteases`. Optional if `Sampling_Size_Based_On` is `coverage` *(default = `10000,10000,10000,10000,10000`)*.

(params-pep_coverage)=
- **Pep_Level_Proteome_Cov**: Expected proteome coverage at the peptide level per protease, in the same order as `Proteases`. Optional if `Sampling_Size_Based_On` is `number` *(default = `0.033051,0.031973356,0.014570424,0.009367681,0.053870932`, which corresponds to 10000 peptides per protease for the provided* Bacillus subtilis *database)*.

(params-sampling_based_on)=
- **Sampling_Size_Based_On**: Defines whether peptide sampling is based on `number` (`Peptides_Sampling_Size`) or `coverage` (`Pep_Level_Proteome_Cov`) *(default = `number`)*.

(params_bins)=
- **Bins**: Protein length bins, given as bin edges in amino acids, used to calculate group-wise protease scores *(default = `0,50,100,99999`)*.

### **Monte Carlo Simulation Parameters**

(params-unique_peptides_only)=
- **Use_Unique_Peptides_Only**: If `True`, only peptides that map uniquely to one protein are retained for every protease. If `False`, all peptides are used and CoMPaseD builds protein groups where required *(default = `True`)*.

(params_max_proteases)=
- **Number_of_Proteases**: Maximum number of proteases used concurrently in the analysis *(default = `5`)*.

(params_n_samplings)=
- **Sampling_Number**: Number of times the Monte Carlo sampling is repeated *(default = `10`)*.

### **Protein Abundance & Expression**

(params_dynamic_range)=
- **Protein_dynamic_range**: Expected dynamic range of protein abundances in orders of magnitude *(default = `6.5`)*.

(params_undetectable)=
- **Not_expressed_fraction**: Fraction of proteins, in percent, assumed not to be expressed under the studied conditions. One value is provided per bin *(default = `40,30,20`)*.
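The interplay of `Bins` and `Not_expressed_fraction` can be illustrated with a short sketch. This is a hypothetical illustration rather than CoMPaseD's implementation: it assumes that each bin excludes its lower edge and includes its upper edge (so the default edges give the ranges 1-50, 51-100 and 101-99999 amino acids), and that the not-expressed proteins are drawn uniformly at random within each bin, with the count rounded to the nearest integer. The names `assign_bin` and `pick_not_expressed` are made up for this example.

```python
import bisect
import random
from collections import defaultdict


def assign_bin(length: int, edges: list[int]) -> int:
    """Index of the bin (lower edge exclusive, upper edge inclusive) containing length."""
    if not edges[0] < length <= edges[-1]:
        raise ValueError(f"length {length} lies outside the bins {edges}")
    return bisect.bisect_left(edges, length) - 1


def pick_not_expressed(lengths: dict[str, int], edges: list[int],
                       not_expressed_pct: list[float], seed=None) -> set[str]:
    """Randomly mark the given percentage of proteins in each length bin as not expressed."""
    if len(not_expressed_pct) != len(edges) - 1:
        raise ValueError("need one Not_expressed_fraction value per bin")
    by_bin = defaultdict(list)
    for pid, length in lengths.items():
        by_bin[assign_bin(length, edges)].append(pid)
    rng = random.Random(seed)
    silent = set()
    for b, members in by_bin.items():
        n = round(len(members) * not_expressed_pct[b] / 100)
        silent.update(rng.sample(sorted(members), n))
    return silent


edges = [0, 50, 100, 99999]  # Bins (default)
pct = [40, 30, 20]           # Not_expressed_fraction (default)
# Hypothetical database: 10 small, 10 medium and 10 large proteins.
lengths = ({f"S{i}": 30 for i in range(10)}
           | {f"M{i}": 80 for i in range(10)}
           | {f"L{i}": 500 for i in range(10)})
silent = pick_not_expressed(lengths, edges, pct, seed=1)
print(*(sum(p.startswith(k) for p in silent) for k in "SML"))  # 4 3 2
```

Which proteins end up in the not-expressed set depends on the seed, but the per-bin counts are fixed by the percentages.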
```{note}
The default configuration for `Bins` and `Not_expressed_fraction` defines three protein length bins: small proteins with a length between 1 and 50 amino acids, medium proteins between 51 and 100 amino acids, and large proteins between 101 and 99999 amino acids. 40% of the small proteins, 30% of the medium proteins and 20% of the large proteins in the *fasta* database are assumed not to be expressed.
```

### **Scoring Weights**

(params_weight_prot)=
- **Protein_IDs_weight**: Weight assigned to protein identifications in protease score calculation *(default = `1`)*.

(params_weight_pep)=
- **Peptide_IDs_weight**: Weight assigned to peptide identifications in protease score calculation *(default = `1`)*.

(params_weight_cov)=
- **Coverage_weight**: Weight assigned to proteome coverage in protease score calculation *(default = `1`)*.

### **Peptide Detectability Prediction by DeepMSPeptide**

(params_use_dmsp)=
- **Use_DeepMSPeptide_Predictions**: If `True`, enables the use of DeepMSPeptide predictions *(default = `True`)*.

(params_dmsp_weight)=
- **Weights_DeepMSPeptide_Predictions**: Weight assigned to DeepMSPeptide-based scoring *(default = `1`)*.

(params_dmsp_model)=
- **Path_DeepMSPeptide_Model**: Path to the DeepMSPeptide prediction model file. By default, the model is located in the `bin` folder of CoMPaseD.

### **Additional Files**

(params_protein_abundance_file)=
- **Protein_weight_file**: Path to an existing file containing protein abundance values. This is either a previously generated [protein abundance file](result-protein_abundance) or a file created by the user. A manually created file must contain at least three columns: `Identifier`, containing the protein identifier as used in the FASTA database; `Group`, containing an identical string for all proteins that should be treated as one group during protease score calculation; and `Random_sampling_X`, where X is an integer from 1 to the number of random samplings, containing the protein expression values.
(params_digestion_result_dummy)=
- **Digestion_result_file**: Path to an existing digestion result file. It can be provided to avoid repeating the digestion if the settings for `Proteases` and `Max_MCs` did not change.

## Default Parameter File

```
[MPD-config]
Crux_path = C:/Programs/crux/bin/crux.exe
Clips_path = C:/Programs/CoMPaseD/bin/Perl/clips.pl
Promast_path = C:/Programs/CoMPaseD/bin/Perl/clips.pl
Indexing_key_len = 5
Differentiate_I_L = False
Use_perl_mapping = False
Multi_Threads = False
Sampling_output = True
Fasta = C:/Programs/CoMPaseD/TestData/Uniprot_BSU168.fasta
Output_directory = C:/CoMPaseD_Data/output
Proteases = trypsin,lysarginase,glu-c,chymotrypsin,lys-c
Max_MCs = 2,2,5,5,2
Freq_MCs = [0.7415,0.2090,0.0484],[0.5757,0.2899,0.1336],[0.5620,0.2753,0.1110,0.0419,0.0086,0.0012],[0.2002,0.3369,0.2648,0.1471,0.0498,0.0012],[0.9102,0.0836,0.0058]
Peptides_Sampling_Size = 10000,10000,10000,10000,10000
Pep_Level_Proteome_Cov = 0.033051,0.031973356,0.014570424,0.009367681,0.053870932
Min_Pep_MW = 400
Max_Pep_MW = 6000
Min_Pep_Len = 6
Max_Pep_Len = 55
Sampling_Size_Based_On = number
Bins = 0,50,100,99999
Number_of_Proteases = 5
Sampling_Number = 10
Protein_dynamic_range = 6.5
Not_expressed_fraction = 40,30,20
Protein_IDs_weight = 1.0
Peptide_IDs_weight = 1.0
Coverage_weight = 1.0
Use_DeepMSPeptide_Predictions = True
Weights_DeepMSPeptide_Predictions = 1.0
Use_Unique_Peptides_Only = True
Path_DeepMSPeptide_Model = C:/Programs/CoMPaseD/bin/CoMPaseDDMSPModel.h5
Protein_weight_file =
Digestion_result_file = C:/CoMPaseD_Data/output/unique_peptides_table_filtered.tsv
```
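Because the parameter file uses an INI-like layout (a single `[MPD-config]` section with `key = value` lines), a manually edited file can be sanity-checked with Python's standard `configparser` before starting a run. The sketch below assumes that this layout is all that matters for the keys shown; it parses a few list-valued parameters and runs illustrative consistency checks on `Proteases`, `Max_MCs` and `Freq_MCs`. The function names and the tolerance of 0.01 are hypothetical and not taken from CoMPaseD, and the group-length check assumes that each `Freq_MCs` group lists values for 0 up to `Max_MCs` MCs, as in the default file.

```python
import configparser
import json

SECTION = "MPD-config"  # section name used in the default parameter file


def parse_params(text: str) -> dict:
    """Parse a few CoMPaseD-style parameters into typed Python values (illustrative subset)."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep keys case-sensitive, e.g. "Max_MCs"
    parser.read_string(text)
    raw = parser[SECTION]
    return {
        "Proteases": [p.strip() for p in raw["Proteases"].split(",")],
        "Max_MCs": [int(x) for x in raw["Max_MCs"].split(",")],
        # "[a,b],[c,d]" becomes valid JSON once wrapped in an outer pair of brackets
        "Freq_MCs": json.loads("[" + raw["Freq_MCs"] + "]"),
        "Use_Unique_Peptides_Only": raw.getboolean("Use_Unique_Peptides_Only"),
    }


def check_mc_settings(params: dict, tol: float = 0.01) -> list[str]:
    """Return human-readable problems with Proteases / Max_MCs / Freq_MCs (hypothetical checks)."""
    proteases, max_mcs, freqs = params["Proteases"], params["Max_MCs"], params["Freq_MCs"]
    if not len(proteases) == len(max_mcs) == len(freqs):
        return ["Proteases, Max_MCs and Freq_MCs differ in length"]
    problems = []
    for name, max_mc, group in zip(proteases, max_mcs, freqs):
        if len(group) != max_mc + 1:
            problems.append(f"{name}: expected {max_mc + 1} values (0..{max_mc} MCs), got {len(group)}")
        if abs(sum(group) - 1) > tol:
            problems.append(f"{name}: MC frequencies sum to {sum(group):.4f}, not 1")
    return problems


# Excerpt of the default parameter file shown above.
excerpt = "\n".join([
    "[MPD-config]",
    "Proteases = trypsin,lysarginase,glu-c,chymotrypsin,lys-c",
    "Max_MCs = 2,2,5,5,2",
    "Freq_MCs = [0.7415,0.2090,0.0484],[0.5757,0.2899,0.1336],"
    "[0.5620,0.2753,0.1110,0.0419,0.0086,0.0012],"
    "[0.2002,0.3369,0.2648,0.1471,0.0498,0.0012],[0.9102,0.0836,0.0058]",
    "Use_Unique_Peptides_Only = True",
])
print(check_mc_settings(parse_params(excerpt)))  # []
```

Setting `optionxform = str` keeps keys such as `Max_MCs` in their original case (`configparser` lowercases them by default), and restricting the delimiters to `=` ensures that only the equals sign separates a key from its value, even in lines containing Windows paths.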