AppliedSequenceAnalysis

Group 2

Project code for the master's course Applied Sequence Analysis.

Test data

Test data (SARS-CoV-2 sequencing data and the human reference genome) can be found here.

The SARS-CoV-2 reference genomes (different variants, so that the best-matching reference can be selected during scaffolding) are located under resources/references in the respective .fasta files.

Run pipeline

To run the pipeline, specify the input in one of two ways: a samples.tsv file listing the sample names and the paths to their sequence files, or a directory containing the sequence files. Either can be set in the config file (project1/config/config.yaml) or on the command line via --config, using the samples and input_directory keys, respectively. You must also provide a reference genome file of your target organism for the scaffolding rule and a kraken2 database for the screening. A host sequence is only needed if you want to run the decontamination step:

snakemake --profile config/local --config samples=samples.tsv kraken2_db=k2_standard/ host_sequence=ncbi-genomes-2023-06-05/GCF_000001405.40_GRCh38.p14_genomic.fna ref=resources/ref.fasta
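
The same settings could instead be placed in the config file. The fragment below is only a sketch: it assumes that project1/config/config.yaml uses the same key names as the --config arguments above, and the input_directory path is a hypothetical placeholder. Check the actual config file in the repository before relying on it.

```yaml
# Sketch of project1/config/config.yaml.
# Assumption: key names match the --config keys used on the command line.

# Input: set ONE of the following two keys.
samples: samples.tsv
# input_directory: path/to/reads/   # hypothetical path, adjust to your data

# Reference genome of the target organism (used by the scaffolding rule).
ref: resources/ref.fasta

# kraken2 database used for the screening.
kraken2_db: k2_standard/

# Optional: host sequence, only needed for the decontamination step.
host_sequence: ncbi-genomes-2023-06-05/GCF_000001405.40_GRCh38.p14_genomic.fna
```

Snakemake gives values passed with --config precedence over those read from a config file, so individual keys can still be overridden for a single run.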