AppliedSequenceAnalysis
Group 2
Project code for the master's course Applied Sequence Analysis.
Test data
Test data (SARS-CoV-2 sequencing data and a human reference genome) can be found here.
The SARS-CoV-2 reference genomes (different variants, so that the best-matching reference can be selected in the scaffolding process) can be found under resources/references in the respective .fasta files.
Run pipeline
To run the pipeline, specify the input either as a samples.tsv file containing the sample names and the paths to their sequence files, or as a directory containing the sequence files. Set this in the config file (project1/config/config.yaml) or on the command line using the samples or input_directory key, respectively. You also have to provide a reference genome file of your target organism for the scaffolding rule and a Kraken2 database for the screening. Provide a host sequence as well if you want to run the decontamination step:
snakemake --profile config/local --config samples=samples.tsv kraken2_db=k2_standard/ host_sequence=ncbi-genomes-2023-06-05/GCF_000001405.40_GRCh38.p14_genomic.fna ref=resources/ref.fasta
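The same settings can presumably be kept in the config file instead of being passed on the command line. The following is a minimal sketch, assuming that project1/config/config.yaml uses the same top-level key names as the --config arguments in the command above. The values are copied from that command, except the input_directory path, which is a placeholder.

```yaml
# project1/config/config.yaml
# Sketch only: key names are assumed to match the --config keys used on the command line.

# Input: set ONE of the following two keys.
samples: samples.tsv                 # table of sample names and sequence file paths
# input_directory: path/to/reads/    # placeholder path: directory with the sequence files

# Reference genome of the target organism, used by the scaffolding rule.
ref: resources/ref.fasta

# Kraken2 database used for the screening.
kraken2_db: k2_standard/

# Host genome, only relevant if the decontamination step should run.
host_sequence: ncbi-genomes-2023-06-05/GCF_000001405.40_GRCh38.p14_genomic.fna
```

If the workflow loads this file as its configfile, the call should reduce to snakemake --profile config/local. Values passed with --config still take precedence over the file, so single settings can be overridden per run.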