From c287309bbbe526de7607efc7c7897755918eaf1a Mon Sep 17 00:00:00 2001
From: fisched99 <fisched99@mi.fu-berlin.de>
Date: Wed, 14 Jun 2023 08:31:25 +0000
Subject: [PATCH] update readme

---
 README.md          | 13 ++++---------
 project2/README.md |  2 +-
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index dcc3016..3512d56 100644
--- a/README.md
+++ b/README.md
@@ -2,13 +2,8 @@
 ## Group 2
 Project code for the masters course applied sequence analysis.
 
-## Run pipeline
-To run the pipeline first change into the project1 directory and then define either the `samples.tsv` containing the samplenames and paths to the different sequence files, or the directory containing the sequence files, either via the config file (`project1/config/config.yaml`) or as a command line argument using the respective flags `input_directory` and `samples`. You then also have to provide a reference genome file for the mapping rule:
+## Project 1
+This project contains the basics of Snakemake and was part of the Snakemake tutorial of the course.
 
-```
-snakemake --profile config/local --config samples=input.tsv ref=path/to/ref.fa
-```
-or
-```
-snakemake --profile config/local --config input_directory=path/to/sequence/files ref=path/to/ref.fa
-```
\ No newline at end of file
+## Project 2
+This project contains a workflow to process (viral) NGS data including quality control, trimming, decontamination, denovo assembly, assembly polishing, scaffolding, clustering of the assemblies and per base variance analysis.
diff --git a/project2/README.md b/project2/README.md
index 0e33337..a408e78 100644
--- a/project2/README.md
+++ b/project2/README.md
@@ -5,7 +5,7 @@ Project code for the masters course applied sequence analysis.
 ## Test data
 Test data (SARS-CoV2 sequencing data and human reference genome) can be found [here](https://box.fu-berlin.de/s/dt2d5MbwaxjfWtZ).
 
-The SARS-CoV2 reference genome can be found in the resources directory in `ref.fasta`.
+The SARS-CoV2 reference genomes (different variants for optimal reference selection in the scaffolding process) can be found under `resources/references` in the respective `.fasta` files.
 
 ## Run pipeline
 To run the pipeline define either the `samples.tsv` containing the samplenames and paths to the different sequence files, or the directory containing the sequence files, either via the config file (`project1/config/config.yaml`) or as a command line argument using the respective flags `input_directory` and `samples`. You then also have to provide a reference genome file of your target organism for the scaffolding rule and a kraken2 database for the screening (please provide a host sequence if you want to run the decontamination step):
-- 
GitLab