mdriller authored
Stacks2R2SCOs
A series of scripts to help create a catalog of reduced-representation single-copy ortholog sequences (R2SCOs) from Stacks2 output. The expected input is the result of the de novo Stacks2 pipeline run on overlapping paired-end reads.
The bash script stacks2R2SCOS.sh works as a wrapper for the workflow. Paths and variables need to be adjusted before running the script.
The workflow consists of 3 steps:
- Plotting of the Stacks2 catalog length distribution and the coverage per length, which are needed to identify the size range to select in step 2.
  - inputs needed: gstacks output directory
- Filtering of the catalog: loci outside the selected size range, loci with internal restriction sites, and loci whose coverage falls outside the coverage range for their length are removed. The coverage range is the mean coverage ± 3 standard deviations, computed per length.
  - inputs needed: restriction sites of the enzymes used, min. and max. length of the size range to select (from step 1)
- Clustering of the filtered loci and selection of singleton loci.
  - inputs needed: clustering threshold (Tintra), filtered catalog (from step 2)
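
To make the three filters of step 2 concrete, here is a minimal Python sketch. It is not the code shipped in this repository. The `Locus` record, the function names, the order in which the filters are applied, and the use of the population standard deviation are all assumptions made for illustration. "Internal" is taken to mean any occurrence of a restriction site that does not sit at the very start or very end of the locus, and sites are matched on the given strand only.

```python
from collections import defaultdict, namedtuple
from statistics import mean, pstdev

# Hypothetical record type for one catalog locus (not part of the repository).
Locus = namedtuple("Locus", ["id", "seq", "coverage"])


def has_internal_site(seq, site):
    """Return True if `site` occurs in `seq` away from both sequence ends."""
    # Start the search at index 1 so a site at the very start is ignored.
    idx = seq.find(site, 1)
    # A hit that ends exactly at the sequence end is terminal, not internal.
    return idx != -1 and idx + len(site) < len(seq)


def filter_loci(loci, sites, min_len, max_len, n_sd=3.0):
    """Apply the length, restriction-site, and coverage filters in that order."""
    # 1) Keep loci whose length lies inside the selected size range.
    kept = [l for l in loci if min_len <= len(l.seq) <= max_len]

    # 2) Drop loci that contain any restriction site internally.
    sites = [s.upper() for s in sites]
    kept = [
        l for l in kept
        if not any(has_internal_site(l.seq.upper(), s) for s in sites)
    ]

    # 3) Compute mean +- n_sd * SD of the coverage separately for each length.
    coverage_by_length = defaultdict(list)
    for l in kept:
        coverage_by_length[len(l.seq)].append(l.coverage)
    bounds = {}
    for length, cov in coverage_by_length.items():
        # Population SD (assumption); it is 0 when a length has a single locus.
        m, sd = mean(cov), pstdev(cov)
        bounds[length] = (m - n_sd * sd, m + n_sd * sd)

    # Keep loci whose coverage lies inside the range for their own length.
    return [
        l for l in kept
        if bounds[len(l.seq)][0] <= l.coverage <= bounds[len(l.seq)][1]
    ]


if __name__ == "__main__":
    demo = [Locus(f"locus{i}", "A" * 20, 10) for i in range(20)]
    demo.append(Locus("high_cov", "A" * 20, 100))
    demo.append(Locus("internal", "AAAAGAATTCAAAAAAAAAA", 10))
    for locus in filter_loci(demo, ["GAATTC"], min_len=10, max_len=30):
        print(locus.id)
```

With only a few loci per length, a ± 3 standard deviation range can never exclude anything, so the coverage filter only becomes effective for lengths that are well represented in the catalog.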
Requirements:
- vsearch (installed globally; otherwise the path needs to be adjusted in the stacks2R2SCOS.sh wrapper)
- R with the packages ggplot2 and gridExtra
- Python 3 with the packages Biopython and NetworkX
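
Before running the wrapper, it can help to check that these requirements are reachable. The following is a small sketch and not part of the repository. The function name is hypothetical, and it assumes that R is invoked through the `Rscript` executable and that vsearch is on the `PATH`. It does not check the R packages ggplot2 and gridExtra.

```python
import importlib.util
import shutil

# Executables expected on PATH (Rscript is an assumption about how R is called).
EXECUTABLES = ("vsearch", "Rscript")
# Import names of the Python packages: Biopython is imported as "Bio".
PYTHON_MODULES = ("Bio", "networkx")


def missing_requirements(executables=EXECUTABLES, modules=PYTHON_MODULES):
    """Return the names of executables and Python modules that cannot be found."""
    missing = [exe for exe in executables if shutil.which(exe) is None]
    missing += [m for m in modules if importlib.util.find_spec(m) is None]
    return missing


if __name__ == "__main__":
    not_found = missing_requirements()
    if not_found:
        print("Missing requirements:", ", ".join(not_found))
    else:
        print("All checked requirements were found.")
```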