README.md
    mdriller authored
    dbd8b950

    Stacks2R2SCOs

    A series of scripts that help create a catalog of reduced-representation single-copy ortholog sequences (R2SCOs) from Stacks2 output. The expected input is the output of the de novo Stacks2 pipeline run on overlapping paired-end reads.

    The bash script stacks2R2SCOS.sh is a wrapper for the workflow. Paths and variables in it need to be adjusted before running the script.

    The workflow consists of 3 steps:

    1. Plotting of the Stacks2 catalog length distribution and the coverage per length, used to identify the size range to select in step 2.
      • inputs needed: gstacks output directory

    2. Filtering of the catalog: loci are kept if they fall within the selected size range, removed if they contain internal restriction sites, and removed if their coverage lies outside the coverage range for their length. Coverage range: mean coverage ± 3 standard deviations, computed per length.
      • inputs needed: restriction sites of enzymes used, min. and max. length of size distribution to select for (step 1)

    3. Clustering of the filtered loci and selection of singleton loci.
      • inputs needed: clustering threshold (Tintra), filtered catalog (step 2)
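
The filtering and singleton-selection logic of steps 2 and 3 can be illustrated with a minimal Python sketch. It is not the code shipped in this repository, and it uses only the standard library rather than biopython or networkx. The function names `filter_loci` and `singleton_loci`, the input layouts, the use of the population standard deviation, the order of the filters, and the definition of an "internal" site (any full occurrence that does not touch the first or last base of the locus) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def filter_loci(loci, sites, min_len, max_len, n_sd=3.0):
    """Return the sorted ids of loci that pass the three step-2 filters.

    loci  : dict mapping locus id -> (sequence, coverage)   (assumed layout)
    sites : restriction site sequences, e.g. ["GAATTC"]
    """
    sites = [s.upper() for s in sites]

    # Length filter and internal restriction-site filter.
    # Assumption: "internal" means the site does not touch either end base.
    kept = {}
    for locus_id, (seq, cov) in loci.items():
        seq = seq.upper()
        if not (min_len <= len(seq) <= max_len):
            continue
        if any(site in seq[1:-1] for site in sites):
            continue
        kept[locus_id] = (seq, cov)

    # Group coverage values by locus length.
    cov_by_len = defaultdict(list)
    for seq, cov in kept.values():
        cov_by_len[len(seq)].append(cov)

    # Coverage range per length: mean +- n_sd * standard deviation.
    # Assumption: population SD, so a length with one locus gets SD 0.
    bounds = {}
    for length, covs in cov_by_len.items():
        m, sd = mean(covs), pstdev(covs)
        bounds[length] = (m - n_sd * sd, m + n_sd * sd)

    return sorted(
        locus_id
        for locus_id, (seq, cov) in kept.items()
        if bounds[len(seq)][0] <= cov <= bounds[len(seq)][1]
    )


def singleton_loci(cluster_of):
    """Return the sorted ids of loci that are alone in their cluster.

    cluster_of : dict mapping locus id -> cluster id   (assumed layout)
    """
    sizes = Counter(cluster_of.values())
    return sorted(l for l, c in cluster_of.items() if sizes[c] == 1)
```

In this sketch, the cluster assignments passed to `singleton_loci` would come from clustering the filtered loci at the Tintra threshold; how those assignments are parsed from the clustering output is left out.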

    Requirements:

    • vsearch (installed globally, otherwise the path needs to be adjusted in the stacks2R2SCOS.sh wrapper)
    • R with libraries ggplot2 and gridExtra
    • python3 with libraries biopython and networkx