Commit c1042f81 authored by james94

update readme and snakemake

parent 88786c0c
@@ -96,17 +96,17 @@ If `conda command not found` please close and re-open your terminal for conda i
## Creating our Snakemake conda environment
The pipeline requires the following software to run:
- snakemake (v6.6.1)
- python (v3.9.10)
- tabulate (v0.8.7)
- beautifulsoup4 (v4.9)
- mamba (v0.15.2) *[Newest version causes error]*
- pandoc (v2.15)
- tectonic (v0.8.2)
The pipeline requires the following software to run (some versions may not need to be exactly matching):
- snakemake (v7.6.2)
- beautifulsoup4 (v4.9.3)
- pandas (v1.4.2)
- numpy (v1.22.3)
- python (v3.10.4)
The easiest method to install this software stack is to create a GEP conda environment with the provided `installGEP.yaml` (see ***Note**)
You may need to close and reopen your terminal after the `conda env create ...` command, or use `source ~/.bashrc` depending on your system.
```
conda env create -f /<your_path_to>/GEP/installGEP.yaml
@@ -117,6 +117,15 @@ conda activate GEP
snakemake --version
```
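A quick way to confirm that the activated environment provides the pinned snakemake version is sketched below. The helper name `check_version` is hypothetical and not part of GEP, and an exact string comparison is only an assumption, since some versions may not need to match exactly.

```shell
# Hypothetical helper: report whether two version strings are identical.
check_version() {
    if [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

expected="7.6.2"   # snakemake version pinned in the list above
if command -v snakemake >/dev/null 2>&1; then
    installed="$(snakemake --version)"
else
    installed="not installed"
fi
echo "snakemake: expected ${expected}, found ${installed} ($(check_version "$expected" "$installed"))"
```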
Installing mamba within your GEP environment is recommended, as it makes conda package managing/installing quicker. To do so, run the following command after the GEP environment is installed and activated:
```
conda install -c conda-forge mamba
```
If you do not install mamba, you will need to make sure the following flag is included in your snakemake command at the time of execution: `--conda-frontend=conda`
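As a minimal sketch of that choice, the snippet below selects the frontend depending on whether `mamba` is found on your `PATH`. The function name `pick_conda_frontend` is hypothetical, and the snippet only prints the resulting command rather than running it.

```shell
# Hypothetical helper: use mamba if it is on PATH, otherwise fall back to conda.
pick_conda_frontend() {
    if command -v mamba >/dev/null 2>&1; then
        echo "mamba"
    else
        echo "conda"
    fi
}

frontend="$(pick_conda_frontend)"
# Print the command that would be run (local profile used as an example).
echo "snakemake --profile SUBMIT_CONFIG/local/ --conda-frontend=${frontend}"
```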
***Note** *If you already have a snakemake (or suitable Python) installation and would like to avoid installing again, ensure that all of the above software is in your `PATH`. If you do this instead of installing from the provided GEP environment (`installGEP.yaml`), you will still need at least the base conda installed/activated, as it is required to handle the software dependencies of all the tools used within the workflow itself*
<br>
@@ -308,7 +317,7 @@ keep-going: False
snakemake --profile SUBMIT_CONFIG/local/
```
#### SLURM
#### HPC (SLURM)
Modify the slurm snakemake parameters in the file located at [GEP/SUBMIT_CONFIG/**slurm**/config.yaml](GEP/SUBMIT_CONFIG/slurm/config.yaml). You can set your desired `partition` and `qos` under `default-resources:`, as well as any snakemake parameters.
@@ -318,8 +327,31 @@ Modify the slurm snakemake parameters in the file located at [GEP/SUBMIT_CONFIG/
snakemake --profile SUBMIT_CONFIG/slurm/
```
##### Running using singularity container (only on HPC)
GEP has been containerized using docker. Within this container, snakemake will install the necessary software stacks as contained conda environments (separate to those installed during *normal* execution). All you have to do is make sure singularity is installed (often installed by your respective HPC IT team), and in your PATH:
e.g.
```
module load Singularity
```
GEP is then executed by adding `--use-singularity` and `--singularity-args "--bind <path1>,<path2> --contain --workdir <userTempDirectory>"` to your snakemake command.
Where the `--bind` paths are any location/s in your system that you require singularity to be able to ***recursively*** access, commonly your /scratch/username/ directory where both your raw data and results are/will be stored. Multiple paths can be given separated by a comma.
The `--workdir` is the singularity tmp directory. Often on HPC systems this will be automatically set as `/tmp/...`, but this can lead to issues during initial installation of the container as this default `tmp` directory may be too small. Instead, provide a path to a new directory to be used as the `tmp` for snakemake + singularity.
The `--contain` flag ensures that the container is kept completely separate from the host system's files, so that the packages being installed are not shared with those that already exist (or are in the host system's conda cache).
#### EXAMPLE SINGULARITY EXECUTION
```
snakemake --profile SUBMIT_CONFIG/slurm/ --use-singularity --singularity-args "--bind /scratch/james94,/scratch2/james94 --contain --workdir /scratch/james94/tmp"
```
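The argument string can also be assembled from variables, which may help when several bind paths are involved. This is a sketch only: `build_singularity_args` is a hypothetical helper, the `/scratch/username` paths are placeholders, and creating the tmp directory beforehand assumes that it needs to exist before the run. The final command is printed, not executed.

```shell
# Hypothetical helper: first argument is the tmp directory for --workdir,
# remaining arguments are bind paths, which are joined with commas.
build_singularity_args() {
    workdir="$1"
    shift
    binds=""
    for path in "$@"; do
        if [ -z "$binds" ]; then
            binds="$path"
        else
            binds="${binds},${path}"
        fi
    done
    echo "--bind ${binds} --contain --workdir ${workdir}"
}

# Placeholder locations; replace with directories on your own system.
tmp_dir="${TMPDIR:-/tmp}/gep_singularity_tmp"
mkdir -p "$tmp_dir"   # assumption: the --workdir directory should already exist
args="$(build_singularity_args "$tmp_dir" /scratch/username /scratch2/username)"
echo "snakemake --profile SUBMIT_CONFIG/slurm/ --use-singularity --singularity-args \"${args}\""
```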
### Run in background
For those not aware, you can run GEP with `nohup` and `&` added to the command (both local and slurm)
For those not aware, you can run GEP with `nohup` and `&` added to the command (both local and slurm, and with or without singularity). `nohup` runs the snakemake command so that it is not interrupted if you lose connection to the server/cluster. The trailing `&` runs the command in the background of your current terminal, allowing you to freely use the terminal instance in the meantime.
```
nohup snakemake --profile SUBMIT_CONFIG/slurm/ &
```
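If you would rather collect the output in a file of your choosing than in the default `nohup.out`, a small wrapper along the following lines may be useful. It is a sketch: `run_in_background`, `BG_PID` and the log file name `gep.log` are hypothetical names, not part of GEP.

```shell
# Hypothetical wrapper: run a command under nohup in the background,
# send stdout and stderr to a log file, and remember the process ID.
run_in_background() {
    log_file="$1"
    shift
    nohup "$@" > "$log_file" 2>&1 &
    BG_PID=$!
}

# Example usage (not executed here):
#   run_in_background gep.log snakemake --profile SUBMIT_CONFIG/slurm/
#   tail -f gep.log      # follow progress
#   kill "$BG_PID"       # stop the run if needed
```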
@@ -4,8 +4,8 @@ channels:
- bioconda
- anaconda
dependencies:
- snakemake==7.3.8
- beautifulsoup4=4.9
- pandas
- numpy
- mamba
- snakemake==7.6.2
- beautifulsoup4==4.9.3
- pandas==1.4.2
- numpy==1.22.3
- python==3.10.4