This repository provides a Python pipeline for analyzing and optimizing gene-phosphorylation site dynamics. It combines data preprocessing, parameter estimation, and visualization to characterize kinase interactions and phosphorylation mechanisms. The pipeline is modular, so the workflow can be adapted to different experimental datasets, and it is built to handle the incomplete and noisy data that phosphorylation experiments typically produce.
---
## Features
### 1. Data Organization
- 📁 Automatically categorizes output files (`.xlsx`, `.csv`, `.png`, `.html`) into protein-specific subdirectories based on protein symbols.
- 🗂️ Establishes a structured file hierarchy that simplifies downstream analyses and minimizes manual effort in organizing outputs.
### 2. Data Preprocessing
- 🖇️ Constructs the observed data matrix (`P_initial`) and kinase-specific arrays (`K_array`) that serve as inputs to the optimization.
- 🔄 Handles incomplete kinase-phosphorylation site datasets by generating synthetic data for missing entries, so that all relationships are considered during optimization.
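
As a rough illustration of the file-sorting behaviour described under Data Organization, the sketch below moves output files into per-protein folders. It assumes a hypothetical naming convention in which each file name starts with the protein symbol followed by an underscore (for example `EGFR_fit.png`). The function name `organize_by_protein` and the demo file names are invented for this example and are not taken from the repository's scripts.

```python
import shutil
import tempfile
from pathlib import Path

# Extensions listed in the README as pipeline outputs.
OUTPUT_EXTENSIONS = {".xlsx", ".csv", ".png", ".html"}


def organize_by_protein(output_dir):
    """Move output files into subfolders named after their protein symbol.

    ASSUMPTION: file names look like '<SYMBOL>_<anything>.<ext>'.
    Returns a dict mapping each symbol to the file names that were moved.
    """
    output_dir = Path(output_dir)
    moved = {}
    # Materialize the listing first so new subfolders do not affect iteration.
    for path in sorted(output_dir.iterdir()):
        if not path.is_file() or path.suffix.lower() not in OUTPUT_EXTENSIONS:
            continue
        if "_" not in path.stem:
            continue  # no recognizable protein symbol; leave the file in place
        symbol = path.stem.split("_", 1)[0]
        target_dir = output_dir / symbol
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(target_dir / path.name))
        moved.setdefault(symbol, []).append(path.name)
    return moved


# Demo on a throwaway directory with made-up file names.
demo_dir = Path(tempfile.mkdtemp())
for name in ["EGFR_fit.png", "EGFR_params.csv", "MAPK1_report.html", "notes.txt"]:
    (demo_dir / name).touch()

moved = organize_by_protein(demo_dir)
```

Files with other extensions, such as `notes.txt` in the demo, are left untouched.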
### 3. Parameter Estimation
- 🛠️ Estimates alpha and beta parameters with two optimization algorithms: Sequential Least Squares Programming (SLSQP) and differential evolution (DE).
- 📊 Supports multiple loss functions, including mean squared error (MSE), mean absolute percentage error (MAPE), Huber loss, and weighted error metrics.
- ✅ Enforces biologically meaningful constraints, ensuring alpha values sum to unity for gene-phosphorylation site combinations and beta values reflect consistent kinase contributions.
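
The sum-to-one constraint on alpha values can be sketched with SciPy's `minimize` and `method="SLSQP"`. This is a minimal toy example, not the repository's implementation. The linear model (an observed series as a weighted sum of kinase series), the array shapes, the data values, and the helper names `mse_loss` and `mse_grad` are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# ASSUMPTION: toy kinase time series, 3 kinases (rows) x 5 time points.
K_array = np.array([
    [1.0, 1.2, 1.5, 1.7, 2.0],
    [0.5, 0.4, 0.6, 0.5, 0.4],
    [2.0, 1.8, 1.5, 1.1, 1.0],
])

# ASSUMPTION: the observed series is a weighted sum of the kinase series.
true_alpha = np.array([0.6, 0.3, 0.1])
P_initial = true_alpha @ K_array


def mse_loss(alpha):
    """Mean squared error between observed and modelled series."""
    residual = P_initial - alpha @ K_array
    return float(np.mean(residual ** 2))


def mse_grad(alpha):
    """Analytic gradient of mse_loss with respect to alpha."""
    residual = P_initial - alpha @ K_array
    return -2.0 / residual.size * (K_array @ residual)


n_kinases = K_array.shape[0]
constraints = [{"type": "eq", "fun": lambda a: np.sum(a) - 1.0}]  # alphas sum to 1
bounds = [(0.0, 1.0)] * n_kinases                                  # each alpha in [0, 1]
alpha_start = np.full(n_kinases, 1.0 / n_kinases)                  # uniform start

result = minimize(
    mse_loss,
    alpha_start,
    jac=mse_grad,
    method="SLSQP",
    bounds=bounds,
    constraints=constraints,
    options={"ftol": 1e-12, "maxiter": 200},
)
alpha_hat = result.x
```

SLSQP handles the equality constraint and the bounds directly. A global method such as differential evolution is usually run on bounds alone, so a constraint like this would have to be enforced separately, for example by renormalizing or penalizing candidate solutions.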
### 4. Scalable Optimization
- ⚡ Utilizes multi-core parallel processing to expedite computational workflows, making it suitable for large-scale datasets.
- 🔧 Provides configurable hyperparameter settings, including parameter bounds, regularization techniques (L1/L2), and data scaling methods tailored to experimental contexts.
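
To make the loss and regularization options concrete, here is a small pure-Python sketch of a Huber loss, a MAPE loss, and an L1/L2 penalty combined into one objective. The function names, default values, and the way the terms are combined are assumptions for illustration; the pipeline's own scripts may define them differently.

```python
def huber(observed, predicted, delta=1.0):
    """Mean Huber loss: quadratic for small residuals, linear for large ones."""
    total = 0.0
    for o, p in zip(observed, predicted):
        r = abs(o - p)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(observed)


def mape(observed, predicted):
    """Mean absolute percentage error; observed values must be non-zero."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / len(observed)


def penalty(params, l1=0.0, l2=0.0):
    """L1 (sum of absolute values) and L2 (sum of squares) regularization."""
    return l1 * sum(abs(x) for x in params) + l2 * sum(x * x for x in params)


def objective(observed, predicted, params, loss=huber, l1=0.0, l2=0.0):
    """Data-fit loss plus regularization penalty on the parameters."""
    return loss(observed, predicted) + penalty(params, l1=l1, l2=l2)


# Hypothetical values, only to show how the pieces fit together.
value = objective([0.0, 0.0], [0.5, 3.0], params=[1.0, -2.0], l1=0.1, l2=0.01)
```

Passing `loss=mape` swaps the data-fit term without touching the penalty, which is one simple way to keep the loss function and the regularization strength independently configurable.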
### 5. Advanced Visualization Tools
- 🌈 Generates a wide range of visualizations, including heatmaps, scatter plots, and convergence plots, to effectively communicate optimization results.
- 🔍 Produces residual analysis outputs, such as cumulative residual distributions and goodness-of-fit plots, for rigorous error evaluation.
- 📉 Incorporates dimensionality reduction techniques, such as PCA and t-SNE, to reveal hidden patterns in complex datasets.
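
As an illustration of the dimensionality reduction step, the sketch below projects a parameter table onto its first two principal components using NumPy's SVD. The random input matrix and the function name `pca_project` are assumptions for this example. The repository may well rely on a library implementation instead.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project rows of X onto the leading principal components.

    Returns the component scores and the explained-variance ratio
    of each retained component.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)            # PCA requires centered columns
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    scores = X_centered @ Vt[:n_components].T  # coordinates in component space
    explained = (S ** 2) / np.sum(S ** 2)      # variance share per component
    return scores, explained[:n_components]


# ASSUMPTION: 20 hypothetical parameter sets with 5 parameters each.
rng = np.random.default_rng(0)
param_table = rng.normal(size=(20, 5))

scores, ratio = pca_project(param_table, n_components=2)
```

The two columns of `scores` can then be passed to a scatter plot to look for clusters among the estimated parameter sets.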
---
## Structure
### Scripts
1. **`arrange_folders.py`**: 📂 Organizes output files into protein-specific subdirectories for streamlined result management.
2. **`metrics.py`**: 📈 Computes performance metrics (e.g., MSE, RMSE, R-squared) and generates comparative summaries across datasets.
3. **`optimization_posthoc.py`**: 🔄 Conducts post-optimization analyses, extracting parameters and visualizing error landscapes and parameter distributions.
4. **`optimization_1.py` and `optimization_2.py`**: 🧮 Serve as the core optimization scripts, supporting diverse algorithms and constraints for precise parameter estimation.
5. **`optimization_analysis.py`**: 🧠 Provides advanced statistical analyses, dimensionality reduction, and network visualizations to complement optimization results.
6. **`problem_diagram.py`**: 🎨 Generates network diagrams with labeled nodes, colored edges, and subscripted labels for multiple Graphviz layouts.
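
For reference, the metrics named for `metrics.py` (MSE, RMSE, R-squared) can be computed as in the sketch below. The function name `regression_metrics` and the sample values are hypothetical, and the script's actual interface may differ.

```python
import math


def regression_metrics(observed, predicted):
    """Return MSE, RMSE, and R-squared for paired observed/predicted values."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    mse = ss_res / n
    # R-squared is undefined when the observed values have no variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "r2": r2}


# Hypothetical observed and fitted values.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```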
### Files
- **Input**:
  - 📄 `input1.csv`: Contains phosphorylation time-series data for multiple genes and phosphorylation sites.
  - 📄 `input2.csv`: Contains mapped kinase-phosphorylation site interactions, forming the foundation for optimization.
---
## Affiliations
1. **Theoretical Biophysics Lab, Humboldt University of Berlin**
   - Research conducted under the supervision of [Prof. Edda Klipp](https://rumo.biologie.hu-berlin.de/tbp/index.php/en/).
   - Institution: Humboldt University of Berlin, Berlin, Germany.
2. **Freie Universität Berlin**
   - Author: MSc Bioinformatics student at [Freie Universität Berlin](https://www.fu-berlin.de/en/index.html).
   - Institution: Freie Universität Berlin, Berlin, Germany.
---
## Acknowledgements
🎓 Special thanks to collaborators and domain experts for their insights and guidance, which were instrumental in shaping this pipeline and improved both its analytical capabilities and its usability. I am also grateful to my family and friends, who were patient and kind enough to be there from start to end.