diff --git a/README.md b/README.md
index 3d1b2f05283798ef579901982715b503a15242fc..54667b0c9e25a5ee6fd1b8c74ac7192d34f22f30 100644
--- a/README.md
+++ b/README.md
@@ -1,55 +1,77 @@
-# DSLS Project
-## Shared genetic traits in psychiatric disorders
-
-In the project we analyze shared genetic traits between three psychiatric diseases, namely autism, depression and schizophrenia using publicly available RNA-Seq and DNA Methylation datasets.
-
-This repository contains the notebooks with conducted analyses of RNA-Seq and DNA Methylation datasets.
-
-All data necessary to run the notebooks can be downloaded from GEO database and this [link](TODO)
-
-### Statistical analysis
-
-#### Differential Expression Analysis (limma)
-
-`differential_expression_analysis.Rmd`
-
-In order to run the analysis, please download following datasets from GEO database:
-* Autism dataset - [GSE25507](https://ftp.ncbi.nlm.nih.gov/geo/series/GSE25nnn/GSE25507/matrix/)
-* Schizophrenia dataset - [GSE27383](https://ftp.ncbi.nlm.nih.gov/geo/series/GSE27nnn/GSE27383/matrix/)
-* Depression dataset - [GSE98793](https://ftp.ncbi.nlm.nih.gov/geo/series/GSE98nnn/GSE98793/matrix/)
-* Affymetrix chip annotation file - [Platform GPL570](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?mode=raw&is_datatable=true&acc=GPL570&id=55999&db=GeoDb_blob143)
-
-#### Differential Methylation Analysis (ChAMP)
-
-##### Preprocessing
-
-* `GSE80417_Preprocessing.ipynb`
-* `Methylation_Labeling.ipynb`
-
-`differential_methylation_analysis.Rmd`
-
-In order to run the analysis, please download following datasets from GEO database:
-
-TODO
-
-### Machine Learning
-
-Following notebooks contain code for multiclass classification based on RNA-Seq and DNA Methylation data:
-
-* `rnaseq_ml.ipynb` [RNA-Seq]
-* `methylation_ml.ipynb` [DNA Methylation]
-
-In order to run the notebooks, choose one of the two options:
-
-* Run `differential_expression_analysis.Rmd` and `differential_methylation_analysis.Rmd` in order to generate necessary input data
-* (recommended) Download the already generated input data from this [link](TODO)
-
-### Annotation
-
-`Methylation_Postprocessing.ipynb`
-
-TODO
-
-`annotation_gsea.ipynb`
-
-TODO
+# DSLS Project
+## Shared genetic traits in psychiatric disorders
+
+In the project we analyze shared genetic traits between three psychiatric diseases, namely autism, depression and schizophrenia using publicly available RNA-Seq and DNA Methylation datasets.
+
+This repository contains the notebooks with conducted analyses of RNA-Seq and DNA Methylation datasets.
+
+All data necessary to run the notebooks can be downloaded from GEO database and this [link](https://drive.google.com/drive/folders/1V1I6pUEiTr2J5Ixma6nM69cd_u1aqiFp?usp=drive_link).
+
+### Statistical analysis
+
+#### Differential Expression Analysis (limma)
+
+`differential_expression_analysis.Rmd`
+
+In order to run the analysis, please download following datasets from GEO database:
+* Autism dataset - [GSE25507 Series matrix](https://ftp.ncbi.nlm.nih.gov/geo/series/GSE25nnn/GSE25507/matrix/)
+* Schizophrenia dataset - [GSE27383 Series matrix](https://ftp.ncbi.nlm.nih.gov/geo/series/GSE27nnn/GSE27383/matrix/)
+* Depression dataset - [GSE98793 Series matrix](https://ftp.ncbi.nlm.nih.gov/geo/series/GSE98nnn/GSE98793/matrix/)
+* Affymetrix chip annotation file - [Platform GPL570](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?mode=raw&is_datatable=true&acc=GPL570&id=55999&db=GeoDb_blob143)
+
+#### Differential Methylation Analysis (ChAMP)
+
+##### Preprocessing
+
+* `GSE80417_Preprocessing.ipynb`
+    * [accession](https://drive.google.com/file/d/1sxfsygRn_wWUaImALhn2uRDrz7WtVQMc/view?usp=drive_link)
+    * [accession2](https://drive.google.com/file/d/1sxfsygRn_wWUaImALhn2uRDrz7WtVQMc/view?usp=drive_link)
+    * [labels](https://drive.google.com/file/d/1QEc0fb7gSLfYwWoTYUyxv_V08myP4nZf/view?usp=drive_link)
+    * Schizophrenia dataset - [GSE80417 Raw Betas](https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE80417&format=file&file=GSE80417%5FrawBetas%2Ecsv%2Egz)
+
+
+* `Methylation_Labeling.ipynb`
+    * [labels](https://drive.google.com/file/d/1QEc0fb7gSLfYwWoTYUyxv_V08myP4nZf/view?usp=drive_link)
+
+
+`differential_methylation_analysis.Rmd`
+
+In order to run the analysis, please download following datasets from GEO database:
+
+* Autism dataset - [GSE109905](https://ftp.ncbi.nlm.nih.gov/geo/series/GSE109nnn/GSE109905/matrix/)
+* Schizophrenia dataset - [GSE80417](https://ftp.ncbi.nlm.nih.gov/geo/series/GSE80nnn/GSE80417/matrix/)
+* Depression dataset - [GSE113725](https://ftp.ncbi.nlm.nih.gov/geo/series/GSE201nnn/GSE201016/matrix/)
+
+
+#### Machine Learning
+
+Following notebooks contain code for multiclass classification based on RNA-Seq and DNA Methylation data:
+
+* `rnaseq_ml.ipynb` [RNA-Seq]
+* `methylation_ml.ipynb` [DNA Methylation]
+
+In order to run the notebooks, choose one of the two options:
+
+* Run `differential_expression_analysis.Rmd` and `differential_methylation_analysis.Rmd` in order to generate necessary input data
+* (recommended) Download the already generated input data from this [link](https://drive.google.com/drive/folders/19xE-Op_HhuKsD_RzS7DQ4XFe3gN7WDv6?usp=drive_link)
+    * `/rna-seq` folder for `rnaseq_ml.ipynb`
+    * `/dna-methylation` folder for `methylation_ml.ipynb`
+
+#### Annotation
+
+`Methylation_Postprocessing.ipynb`
+
+The following files are declared in the notebook's DMP part as df_mdd, df_asd and df_scz:
+* [Depression DMPs](https://drive.google.com/file/d/1-fjNNhFld2ljCBb99UfjbD3NT4YL5VV2/view?usp=drive_link)
+* [Autism DMPS](https://drive.google.com/file/d/1tdVfNiz2Zo3TiH7eaqhDixAcnvmKmoW3/view?usp=drive_link)
+* [Schizophrenia DMPs](https://drive.google.com/file/d/1t1HFzL-MkvLZIGYYLvGhW0C-_JuZglpO/view?usp=drive_link)
+
+The following files are declared in the notebook's GSEA part as mdd, asd and scz:
+* [Depression GSEA](https://drive.google.com/file/d/1-ixarw_W3SKAgMxzlk6Kya_3SWNpDQQc/view?usp=drive_link)
+* [Autism GSEA](https://drive.google.com/file/d/1jQhpZ2XtwpgMXk6leIqdl8YzMAjtVGa3/view?usp=drive_link)
+* [Schizophrenia GSEA](https://drive.google.com/file/d/1O7bcGj97wlLpUwaV_vFNfMfhsTv-0h68/view?usp=drive_link)
+
+`annotation_gsea.ipynb`
+* [Depression dataset](https://drive.google.com/file/d/16AwrCi4ncI51pnR8NItBZdvZRNY4xcgZ/view?usp=drive_link)
+* [Schizophrenia dataset](https://drive.google.com/file/d/1_BYZnRuJznhfDuCA8i18p0g7fyKiyNrL/view?usp=drive_link)
+* [Autism dataset](https://drive.google.com/file/d/1PSrGnP5tZEJ4r3P2f948WjxX67fHJJH5/view?usp=drive_link)