(reference-params)=
# Parameter File  

A parameter file is required for each CoMPaseD analysis. When using GUI mode, this file is automatically generated based on user settings and saved alongside the result files. In CLI mode, a saved parameter file can be modified, or a new one can be manually created. The file can be stored in any directory but must be referenced using the `-p path/to/parameter.params` option.  
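The default file at the end of this page uses an INI-style layout with a single `[MPD-config]` section. Assuming CoMPaseD accepts any file in that layout, a saved parameter file can also be modified programmatically. The sketch below uses Python's standard `configparser`. The function name `modify_params` is hypothetical and not part of CoMPaseD.

```python
import configparser
from pathlib import Path


def modify_params(src: Path, dst: Path, changes: dict[str, str]) -> None:
    """Copy a CoMPaseD parameter file to dst, overriding the given keys (hypothetical helper)."""
    # interpolation=None keeps '%' characters in values (e.g. in paths) literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve key case such as "Crux_path" (configparser lowercases by default)
    with open(src, encoding="utf-8") as handle:
        parser.read_file(handle)
    section = parser["MPD-config"]  # section name as used in the default parameter file
    for key, value in changes.items():
        if key not in section:  # refuse typos instead of silently adding new keys
            raise KeyError(f"unknown parameter: {key}")
        section[key] = value
    with open(dst, "w", encoding="utf-8") as handle:
        parser.write(handle)
```

For example, `modify_params(Path("saved.params"), Path("modified.params"), {"Sampling_Number": "20"})` would write a copy with 20 Monte Carlo repetitions, which could then be passed with `-p modified.params`. Note that `configparser` drops comments when it rewrites a file.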

Each parameter is specified using a keyword, with a brief explanation of its function provided below. More detailed descriptions and theoretical background can be found in the [main documentation](documentation-head).  

## **List of Parameters**  

### **General Settings**
(params_crux_dummy)=  
- **Crux_path:** Absolute path to the Crux executable.  
(params_clips_dummy)=  
- **Clips_path:** Path to the Perl script used for peptide mapping *(optional if `Use_perl_mapping` is `False`)*.  
(params_promast_dummy)=  
- **Promast_path:** Path to the Perl script used for protein inference *(optional if `Use_perl_mapping` is `False`)*.  
(params_idx_len_dummy)=  
- **Indexing_key_len:** Length of the indexing key used in peptide mapping *(default = `5`)*.  
(params_I_L_diff_dummy)=  
- **Differentiate_I_L:** If `True`, distinguishes between isoleucine (I) and leucine (L) in sequences *(default = `False`)*.  
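The mapping code of CoMPaseD is not described on this page. The sketch below only illustrates, under stated assumptions, how an indexing key of length `Indexing_key_len` and the `Differentiate_I_L` switch could interact during peptide-to-protein mapping. Every protein substring of that length is indexed, a peptide is looked up by its first `key_len` residues, and each candidate hit is verified against the full peptide. When I and L are not distinguished, both are written as `L` before indexing, because the two residues have the same mass. All function names are hypothetical.

```python
from collections import defaultdict


def normalize(seq: str, differentiate_i_l: bool) -> str:
    """Replace I by L unless isoleucine and leucine are to be distinguished."""
    return seq if differentiate_i_l else seq.replace("I", "L")


def build_index(proteins: dict[str, str], key_len: int = 5,
                differentiate_i_l: bool = False) -> dict[str, list[tuple[str, int]]]:
    """Map every substring of length key_len to the (protein, position) pairs where it occurs."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for prot_id, seq in proteins.items():
        seq = normalize(seq, differentiate_i_l)
        for pos in range(len(seq) - key_len + 1):
            index[seq[pos:pos + key_len]].append((prot_id, pos))
    return index


def map_peptide(peptide: str, proteins: dict[str, str], index, key_len: int = 5,
                differentiate_i_l: bool = False) -> set[str]:
    """Return the identifiers of all proteins that contain the peptide."""
    peptide = normalize(peptide, differentiate_i_l)
    if len(peptide) < key_len:
        raise ValueError("peptide is shorter than the indexing key")
    hits = set()
    for prot_id, pos in index.get(peptide[:key_len], []):
        seq = normalize(proteins[prot_id], differentiate_i_l)
        if seq.startswith(peptide, pos):  # verify the full peptide, not just the key
            hits.add(prot_id)
    return hits
```

With the default `Indexing_key_len = 5` and `Min_Pep_Len = 6`, every retained peptide is longer than the key. In this sketch, a longer key yields fewer candidate hits per lookup at the cost of a larger index.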

### **Peptide Mapping & Protease Selection**  
(sampling_output)=
- **Use_perl_mapping:** If `True`, enables the original Perl-based peptide mapping *(default = `False`)*.  
(params_multi_dummy)=  
- **Multi_Threads:** *Deprecated*.  
(params_sampling_out_dummy)=  
- **Sampling_output:** If `True`, saves the peptide lists generated during random sampling *(default = `True`)*.  
   

### **Input & Output Files**  
(params-fasta)=
- **Fasta:** Path to the FASTA file containing protein sequences.  
(params-output_dir)=
- **Output_directory:** Directory where result files will be stored.  

### **Protease Digestion Parameters**  
(params-proteases)=
- **Proteases**: Comma-separated list of proteases used for digestion. Only protease names available in the Crux toolkit are allowed *(default = `trypsin,lysarginase,glu-c,chymotrypsin,lys-c`)*. Crux's custom enzyme mode can be used by prefixing an entry with `custom`, followed by a name and the cleavage rule in Crux syntax. For example, `custom trypsin_new [KR]|{P}` defines a custom protease `trypsin_new` that cleaves after K and R unless the next residue is P.  
(params-max_mc)=
- **Max_MCs**: Maximum number of missed cleavages (MCs) allowed per protease. The order corresponds to `Proteases` *(default = `2,2,5,5,2`)*.
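To illustrate how a cleavage rule such as `[KR]|{P}` and `Max_MCs` combine, the sketch below performs a simplified in silico digest. It follows the Crux custom-enzyme notation. Square brackets list residues required on that side of the cleavage site, curly braces list residues that prevent cleavage, and `X` stands for any residue. Only the plain `side|side` form is supported. This is not Crux's implementation, and the helper names are hypothetical.

```python
import re
from typing import Optional

# One side of a Crux-style rule: [..] lists required residues, {..} prohibited ones.
SIDE = r"(\[[A-Z]+\]|\{[A-Z]+\})"
RULE = re.compile(rf"^{SIDE}\|{SIDE}$")


def parse_protease(entry: str) -> tuple[str, Optional[str]]:
    """Return (name, rule); rule is None for built-in Crux protease names."""
    parts = entry.split()
    if parts and parts[0] == "custom":
        if len(parts) != 3 or not RULE.match(parts[2]):
            raise ValueError(f"expected 'custom NAME RULE', got {entry!r}")
        return parts[1], parts[2]
    return entry.strip(), None


def side_matches(spec: str, residue: str) -> bool:
    """True if the residue satisfies one side of the rule (X matches any residue)."""
    residues = spec[1:-1]
    listed = "X" in residues or residue in residues
    return listed if spec[0] == "[" else not listed


def cleavage_sites(seq: str, rule: str) -> list[int]:
    """Positions i where the sequence is cut between seq[i-1] and seq[i]."""
    before, after = RULE.match(rule).groups()
    return [i for i in range(1, len(seq))
            if side_matches(before, seq[i - 1]) and side_matches(after, seq[i])]


def digest(seq: str, rule: str, max_mcs: int) -> list[tuple[str, int]]:
    """Return (peptide, number of missed cleavages) for 0..max_mcs missed cleavages."""
    bounds = [0] + cleavage_sites(seq, rule) + [len(seq)]
    peptides = []
    for start in range(len(bounds) - 1):
        for mcs in range(max_mcs + 1):
            end = start + mcs + 1  # skipping mcs internal sites
            if end >= len(bounds):
                break
            peptides.append((seq[bounds[start]:bounds[end]], mcs))
    return peptides
```

For example, `digest("AKPGRKLR", "[KR]|{P}", 1)` does not cut between K and P and returns `[("AKPGR", 0), ("AKPGRK", 1), ("K", 0), ("KLR", 1), ("LR", 0)]`. In a real run, peptides outside the length and mass limits would still be filtered out afterwards.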

*The following parameters are only available in the command line version of CoMPaseD:*
(params-min_pep_mw)=
- **Min_Pep_MW**: Minimum mass (in Da) of peptides retained from the in silico digestion *(default = `400`)*.  
(params-max_pep_mw)=
- **Max_Pep_MW**: Maximum mass (in Da) of peptides retained from the in silico digestion *(default = `6000`)*.  
(params-min_pep_len)= 
- **Min_Pep_Len**: Minimum length (in amino acids) of peptides retained from the in silico digestion *(default = `6`)*.  
(params-max_pep_len)= 
- **Max_Pep_Len**: Maximum length (in amino acids) of peptides retained from the in silico digestion *(default = `55`)*.  
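These four limits act as a filter on the digestion products. The sketch below assumes neutral, unmodified monoisotopic peptide masses (sum of residue masses plus one water) and inclusive limits. Whether CoMPaseD or Crux uses monoisotopic or average masses for this filter is not stated on this page, and the helper names are hypothetical.

```python
# Monoisotopic residue masses in Da for the 20 standard, unmodified amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass in Da (raises KeyError for non-standard residues)."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def passes_filters(peptide: str, min_mw: float = 400, max_mw: float = 6000,
                   min_len: int = 6, max_len: int = 55) -> bool:
    """Apply Min/Max_Pep_MW and Min/Max_Pep_Len, assuming inclusive limits."""
    return (min_len <= len(peptide) <= max_len
            and min_mw <= peptide_mass(peptide) <= max_mw)
```

With the defaults, `PEPTIDE` (about 799.36 Da, 7 residues) passes. `GGGGGG` has an allowed length but is only about 360 Da and is therefore removed.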


### **Peptide Sampling & Coverage**  
(params-freq_mc)=  
- **Freq_MCs**: Probability distribution of the number of MCs for each protease. Enclose the values for each protease in square brackets and start each group with the value for peptides without MCs, followed by the values for increasing numbers of MCs. The groups follow the order of `Proteases`. Percentages may be entered directly, but CoMPaseD displays a warning if the values for any protease do not sum to 1 *(default = `[0.7415,0.2090,0.0484],[0.5757,0.2899,0.1336],[0.5620,0.2753,0.1110,0.0419,0.0086,0.0012],[0.2002,0.3369,0.2648,0.1471,0.0498,0.0012],[0.9102,0.0836,0.0058]`)*.  

```{note} 
The default configuration specifies that tryptic peptides contain zero, one or two MCs: 74.15% of all tryptic peptides contain no MC, 20.90% contain one MC and 4.84% contain two MCs.  
```  
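A `Freq_MCs` value can be read as a list of lists, one list per protease. The sketch below uses hypothetical helper names. It parses the value with `ast.literal_eval` and checks that each group holds `Max_MCs + 1` values. This check is an assumption, although it holds for every default group. The sketch also warns when a group does not sum to 1 and draws the number of MCs for a sampled peptide with `random.choices`. The default trypsin values sum to 0.9989, so an exact equality check would flag the defaults. The tolerance of 0.01 used here is an assumption, not CoMPaseD's documented threshold.

```python
import ast
import random
import warnings


def parse_freq_mcs(value: str, max_mcs: list[int], tol: float = 0.01) -> list[list[float]]:
    """Parse '[p0,p1,...],[...]' into one probability list per protease."""
    groups = ast.literal_eval(f"[{value}]")  # wrap so that one or many groups parse alike
    if len(groups) != len(max_mcs):
        raise ValueError("Freq_MCs needs one group per protease")
    for i, (probs, mcs) in enumerate(zip(groups, max_mcs)):
        if len(probs) != mcs + 1:  # assumed: values for 0, 1, ..., Max_MCs missed cleavages
            raise ValueError(f"group {i + 1}: expected {mcs + 1} values, got {len(probs)}")
        if abs(sum(probs) - 1) > tol:
            warnings.warn(f"group {i + 1}: values sum to {sum(probs):.4f}, not 1")
    return [[float(p) for p in probs] for probs in groups]


def draw_mc_count(probs: list[float], rng: random.Random) -> int:
    """Draw how many missed cleavages a sampled peptide carries."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Because `random.choices` treats the values as relative weights, percentages and fractions would lead to the same draws in this sketch. Only the warning distinguishes them.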

(params-pep_number)=  
- **Peptides_Sampling_Size**: Number of peptides sampled per protease. Optional if `Sampling_Size_Based_On` is `coverage`. The order corresponds to `Proteases` *(default = `10000,10000,10000,10000,10000`)*.  
(params-pep_coverage)=  
- **Pep_Level_Proteome_Cov**: Expected proteome coverage per protease at the peptide level. Optional if `Sampling_Size_Based_On` is `number`. The order corresponds to `Proteases` *(default = `0.033051,0.031973356,0.014570424,0.009367681,0.053870932`, which corresponds to 10000 peptides per protease for the provided Bacillus subtilis database)*.  
(params-sampling_based_on)=  
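- **Sampling_Size_Based_On**: Defines whether the sampling size is given as a number of peptides (`number`, uses `Peptides_Sampling_Size`) or as proteome coverage (`coverage`, uses `Pep_Level_Proteome_Cov`) *(default = `number`)*. 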
(params_bins)=  
- **Bins**: Protein length bins used to calculate group-wise protease scores *(default = `0,50,100,99999`)*.  
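`Sampling_Size_Based_On` decides which of the two per-protease lists is required. The sketch below shows that selection logic, assuming the raw comma-separated strings from the parameter file as input. The function name is hypothetical.

```python
def sampling_targets(params: dict[str, str]) -> tuple[str, list]:
    """Return the sampling mode and one target value per protease (hypothetical helper)."""
    n_proteases = len(params["Proteases"].split(","))
    mode = params["Sampling_Size_Based_On"].strip()
    if mode == "number":
        raw = params["Peptides_Sampling_Size"].split(",")
    elif mode == "coverage":
        raw = params["Pep_Level_Proteome_Cov"].split(",")
    else:
        raise ValueError(f"Sampling_Size_Based_On must be 'number' or 'coverage', not {mode!r}")
    if len(raw) != n_proteases:
        raise ValueError(f"expected {n_proteases} values (one per protease), got {len(raw)}")
    # Peptide counts are integers; coverages are fractions of the proteome.
    values = [int(v) for v in raw] if mode == "number" else [float(v) for v in raw]
    return mode, values
```

In this sketch, the list that belongs to the other mode is never read, which mirrors why one of the two parameters is optional.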

### **Monte Carlo Simulation Parameters**  
(params-unique_peptides_only)=
- **Use_Unique_Peptides_Only**: If `True`, only peptides that map uniquely to a single protein are retained for each protease. If `False`, all peptides are used and CoMPaseD builds protein groups where required *(default = `True`)*.
(params_max_proteases)=  
- **Number_of_Proteases**: Maximum number of proteases used in combination in the analysis *(default = `5`)*.  
(params_n_samplings)=  
- **Sampling_Number**: Number of times the Monte Carlo sampling is repeated *(default = `10`)*.  
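Two of these settings can be illustrated compactly. The sketch assumes two things: that `Number_of_Proteases` caps the size of the protease combinations that are evaluated, and that `Use_Unique_Peptides_Only = True` keeps only peptides that map to exactly one protein. It is an illustration, not CoMPaseD's code, and the function names are hypothetical. Under this assumption, the five default proteases with `Number_of_Proteases = 5` give 31 non-empty combinations (2^5 - 1).

```python
from itertools import combinations


def protease_combinations(proteases: list[str], max_proteases: int) -> list[tuple[str, ...]]:
    """All combinations of 1 .. max_proteases proteases, smallest first."""
    return [combo
            for size in range(1, max_proteases + 1)
            for combo in combinations(proteases, size)]


def unique_peptides(peptide_to_proteins: dict[str, set[str]]) -> dict[str, str]:
    """Keep peptides that map to exactly one protein and return peptide -> protein."""
    return {pep: next(iter(prots))
            for pep, prots in peptide_to_proteins.items() if len(prots) == 1}
```

Lowering `Number_of_Proteases` to 2 would reduce the five default proteases to 15 combinations in this sketch (5 single proteases and 10 pairs).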

### **Protein Abundance & Expression**  
(params_dynamic_range)=  
- **Protein_dynamic_range**: Expected dynamic range of protein abundances in orders of magnitude *(default = `6.5`)*.  
(params_undetectable)=
- **Not_expressed_fraction**: Percentage of proteins assumed not to be expressed under the studied conditions, given as one value per protein length bin *(default = `40,30,20`)*.  

```{note} 
The default configuration for `Bins` and `Not_expressed_fraction` defines three protein length bins: small proteins with a length of 1 to 50 amino acids, medium proteins with 51 to 100 amino acids and large proteins with 101 to 99999 amino acids. 40% of the small proteins, 30% of the medium proteins and 20% of the large proteins in the FASTA database are assumed not to be expressed.  
```  
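The bin assignment described in the note can be written with `bisect`, because each bin includes its upper boundary. The sketch also marks the per-bin percentage of proteins as not expressed. For the remaining proteins, it draws abundances that span `Protein_dynamic_range` orders of magnitude. The log-uniform abundance distribution and the random choice of non-expressed proteins are assumptions for illustration. CoMPaseD's actual model is described in the main documentation. The function names are hypothetical.

```python
import bisect
import random


def length_bin(length: int, bins: list[int]) -> int:
    """Index k of the bin (bins[k], bins[k+1]] that contains the protein length."""
    k = bisect.bisect_left(bins, length) - 1
    if not 0 <= k < len(bins) - 1:
        raise ValueError(f"length {length} lies outside the bins {bins}")
    return k


def simulate_abundances(lengths: dict[str, int], bins: list[int], not_expressed: list[float],
                        dynamic_range: float, rng: random.Random) -> dict[str, float]:
    """Set a per-bin percentage of proteins to 0; draw log-uniform abundances for the rest."""
    by_bin: dict[int, list[str]] = {}
    for prot, length in lengths.items():
        by_bin.setdefault(length_bin(length, bins), []).append(prot)
    abundances = {}
    for k, prots in by_bin.items():
        n_off = round(len(prots) * not_expressed[k] / 100)  # Not_expressed_fraction is in percent
        off = set(rng.sample(prots, n_off))
        for prot in prots:
            # Expressed proteins fall between 1 and 10**dynamic_range (assumed distribution).
            abundances[prot] = 0.0 if prot in off else 10 ** rng.uniform(0, dynamic_range)
    return abundances
```

With the default bins, lengths 1 and 50 fall into bin 0, lengths 51 and 100 into bin 1, and length 101 into bin 2, matching the note.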

### **Scoring Weights**  
(params_weight_prot)=
- **Protein_IDs_weight**: Weight assigned to protein identifications in protease score calculation *(default = `1`)*.  
(params_weight_pep)=
- **Peptide_IDs_weight**: Weight assigned to peptide identifications in protease score calculation *(default = `1`)*.  
(params_weight_cov)=
- **Coverage_weight**: Weight assigned to proteome coverage in protease score calculation *(default = `1`)*.  
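How the three weights enter the protease score is defined in the main documentation. Purely as an illustration, the sketch assumes a weighted mean of metrics that have already been normalized to a common scale, here by dividing each value by the best value among all protease combinations. Under that assumption, only the ratios of the weights matter. Setting all three to `1` gives each metric equal influence, and raising `Coverage_weight` to `2` makes coverage count twice as much as either identification metric. The scoring scheme and function names are hypothetical.

```python
def normalize_to_best(values: dict[str, float]) -> dict[str, float]:
    """Scale each combination's value by the best value (hypothetical normalization)."""
    best = max(values.values())
    return {combo: (v / best if best else 0.0) for combo, v in values.items()}


def weighted_score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of normalized metrics (hypothetical scoring scheme)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(weights[name] * metrics[name] for name in weights) / total
```

For example, normalized metrics of 0.9 (protein IDs), 0.6 (peptide IDs) and 0.3 (coverage) with all weights set to `1` give a score of 0.6 in this sketch.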

### **Peptide Detectability Prediction by DeepMSPeptide**  
(params_use_dmsp)=
- **Use_DeepMSPeptide_Predictions**: If `True`, enables the use of DeepMSPeptide predictions *(default = `True`)*.    
(params_dmsp_weight)=
- **Weights_DeepMSPeptide_Predictions**: Weight assigned to DeepMSPeptide-based scoring *(default = `1`)*.  
(params_dmsp_model)=
- **Path_DeepMSPeptide_Model**: Path to the DeepMSPeptide prediction model file. By default, the model is located in the `bin/` folder of CoMPaseD.

### **Additional Files**  
(params_protein_abundance_file)=  
- **Protein_weight_file**: Path to an existing file containing protein abundance values. This can be a previously generated [protein abundance file](result-protein_abundance) or a file created by the user. A manually created file must contain at least the following columns: `Identifier`, the protein identifier as used in the FASTA database; `Group`, an identical string for all proteins that should be treated as one group during protease score calculation; and one `Random_sampling_X` column per random sampling, where X is an integer from 1 to the number of random samplings, containing the protein expression values.  
(params_digestion_result_dummy)=  
- **Digestion_result_file**: Path to an existing digestion result file. Providing it avoids repeating the digestion when the settings for `Proteases` and `Max_MCs` have not changed. 
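A manually created protein abundance file could be produced and checked as sketched below. The column names follow the `Protein_weight_file` description above. The tab separator and the helper names are assumptions, so compare the output with a protein abundance file written by CoMPaseD before relying on this layout.

```python
import csv


def write_protein_weight_file(path: str, rows: dict[str, tuple[str, list[float]]]) -> None:
    """Write Identifier, Group and Random_sampling_1..N columns (N = values per protein)."""
    n_samplings = {len(values) for _, values in rows.values()}
    if len(n_samplings) != 1:
        raise ValueError("every protein needs the same number of expression values")
    n = n_samplings.pop()
    header = ["Identifier", "Group"] + [f"Random_sampling_{x}" for x in range(1, n + 1)]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")  # assumption: tab-separated layout
        writer.writerow(header)
        for identifier, (group, values) in rows.items():
            writer.writerow([identifier, group, *values])


def missing_columns(path: str, n_samplings: int) -> list[str]:
    """Return the required columns that are absent from an existing abundance file."""
    with open(path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle, delimiter="\t"))
    required = ["Identifier", "Group"] + [f"Random_sampling_{x}" for x in range(1, n_samplings + 1)]
    return [column for column in required if column not in header]
```

Here `rows` maps each FASTA identifier to its group string and one expression value per random sampling. `missing_columns` can be run on an existing file before it is referenced in the parameter file.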


## Default Parameter File  
```ini
[MPD-config]
Crux_path = C:/Programs/crux/bin/crux.exe
Clips_path = C:/Programs/CoMPaseD/bin/Perl/clips.pl
Promast_path = C:/Programs/CoMPaseD/bin/Perl/clips.pl
Indexing_key_len = 5
Differentiate_I_L = False
Use_perl_mapping = False
Multi_Threads = False
Sampling_output = True
Fasta = C:/Programs/CoMPaseD/TestData/Uniprot_BSU168.fasta
Output_directory = C:/CoMPaseD_Data/output
Proteases = trypsin,lysarginase,glu-c,chymotrypsin,lys-c
Max_MCs = 2,2,5,5,2
Freq_MCs = [0.7415,0.2090,0.0484],[0.5757,0.2899,0.1336],[0.5620,0.2753,0.1110,0.0419,0.0086,0.0012],[0.2002,0.3369,0.2648,0.1471,0.0498,0.0012],[0.9102,0.0836,0.0058]
Peptides_Sampling_Size = 10000,10000,10000,10000,10000
Pep_Level_Proteome_Cov = 0.033051,0.031973356,0.014570424,0.009367681,0.053870932
Min_Pep_MW = 400
Max_Pep_MW = 6000
Min_Pep_Len = 6
Max_Pep_Len = 55
Sampling_Size_Based_On = number
Bins = 0,50,100,99999
Number_of_Proteases = 5
Sampling_Number = 10
Protein_dynamic_range = 6.5
Not_expressed_fraction = 40,30,20
Protein_IDs_weight = 1.0
Peptide_IDs_weight = 1.0
Coverage_weight = 1.0
Use_DeepMSPeptide_Predictions = True
Weights_DeepMSPeptide_Predictions = 1.0
Use_Unique_Peptides_Only = True
Path_DeepMSPeptide_Model = C:/Programs/CoMPaseD/bin/CoMPaseDDMSPModel.h5
Protein_weight_file = 
Digestion_result_file = C:/CoMPaseD_Data/output/unique_peptides_table_filtered.tsv
```
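All values in the parameter file are plain text. The sketch below shows one way to read the file with Python's `configparser` and convert entries to typed values. `True`/`False` become booleans, numbers become `int` or `float`, the listed comma-separated keys become lists, and an empty value (as for `Protein_weight_file`) becomes `None`. These conversion rules and the function names are assumptions for illustration, not CoMPaseD's own parser. `Freq_MCs` is kept as a string because its bracketed groups also contain commas.

```python
import configparser

# Keys treated as comma-separated lists in this sketch.
LIST_KEYS = {"Proteases", "Max_MCs", "Peptides_Sampling_Size", "Pep_Level_Proteome_Cov",
             "Bins", "Not_expressed_fraction"}


def _scalar(text: str):
    """Convert one token to bool, int or float where possible, else keep the string."""
    if text in ("True", "False"):
        return text == "True"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def load_params(path: str) -> dict:
    """Read the [MPD-config] section into a dict of typed values (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the original key case
    with open(path, encoding="utf-8") as handle:
        parser.read_file(handle)
    params = {}
    for key, raw in parser["MPD-config"].items():
        raw = raw.strip()
        if raw == "":
            params[key] = None
        elif key in LIST_KEYS:
            params[key] = [_scalar(item.strip()) for item in raw.split(",")]
        else:
            params[key] = _scalar(raw)
    return params
```

Loaded this way, the default file yields for example `Max_MCs == [2, 2, 5, 5, 2]`, `Differentiate_I_L is False` and `Protein_weight_file is None`.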