Commit ef1ad307 authored by mdriller (parent 4b62102a): Update README.md

# REAPRLong: A Tool to Scaffold and Quality-Control Genome Assemblies Using (Low-Coverage) Long Reads
<p align="center">
<img src="figures/simple_workflow.svg">
</p>
### Dependencies/Prerequisites:
Otherwise the script can be run using python directly: `python main.py ...`
The help message can be displayed with `./main.py -h` (or `./main.py --help`); it gives a general overview of how to use the tool and which parameters can be set.
REAPRLong requires three mandatory inputs: a genome assembly in FASTA format, long reads (e.g. PacBio or ONT) in FASTQ or FASTA format, and a path to the directory where the output files will be written. Additional parameters can be set, but the default values have been tested and generally give the best results.
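Because REAPRLong treats the reads as FASTQ unless the `-fa` flag is given, it can be worth checking the read file before starting a run. The sketch below is a hypothetical helper, not part of REAPRLong; it assumes an uncompressed, well-formed file and looks only at the first non-empty line, since FASTA records start with `>` and FASTQ records with `@`.

```python
from pathlib import Path
from typing import Iterable


def detect_format_from_lines(lines: Iterable[str]) -> str:
    """Return "fasta" or "fastq" from the first non-empty line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip leading blank lines
        if line.startswith(">"):
            return "fasta"  # FASTA header line
        if line.startswith("@"):
            return "fastq"  # FASTQ header line
        raise ValueError(f"unrecognised first record: {line[:20]!r}")
    raise ValueError("no records found")


def detect_read_format(path: str) -> str:
    """Open an (assumed uncompressed) read file and detect its format."""
    with Path(path).open() as handle:
        return detect_format_from_lines(handle)


def needs_fasta_flag(path: str) -> bool:
    """True if the reads are FASTA, i.e. REAPRLong's -fa flag should be passed."""
    return detect_read_format(path) == "fasta"
```

Only the first record is inspected, so a file that mixes formats or is gzip-compressed would need a more thorough check.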
REAPRLong can be used as follows:
```
usage: main.py [-h] -ge GENOME -fq FASTQ -out OUTDIR [-m MODE] [-t THREADS]
               [-ml MINLINKS] [-mo MINOVERLAP] [-mi MINIDENT] [-it ITERATIONS]
               [-s SIZE] [-fa]
```
<p align="center">
<img src="figures/helpfunction.png">
</p>
```
optional arguments:
  -h, --help            show this help message and exit
  -ge GENOME, --genome GENOME
                        Contigs in fasta format
  -fq FASTQ, --fastq FASTQ
                        Reads in fastq format
  -out OUTDIR, --outdir OUTDIR
                        Directory for output files
  -m MODE, --mode MODE  Mode for different types of input data. Options:
                        pb/ont [default=pb]
  -t THREADS, --threads THREADS
                        Threads to use (only during mapping for now)
                        [default=4]
  -mo MINOVERLAP, --minoverlap MINOVERLAP
                        Minimum overlap needed for two contigs to be merged
                        [default=100]
  -mi MINIDENT, --minident MINIDENT
                        Minimum identity in the overlap needed for two
                        contigs to be merged [default=90.0]
  -it ITERATIONS, --iterations ITERATIONS
                        Maximum number of iterations performed for breaking;
                        set to 0 to perform NO breaking
  -s SIZE, --size SIZE  Size of the pieces into which the input reads will
                        be split [default=500]
  -fa, --fasta          Enable if the input reads are in fasta format
```
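The help text above describes `-mo`/`-mi` as thresholds that an overlap must meet before two contigs are merged, and `-s` as the size into which the reads are split. The sketch below illustrates one plausible reading of these parameters; it is not taken from REAPRLong's source, and the function names are hypothetical. It assumes both merge thresholds are inclusive (`>=`), that identity is a percentage, and that a trailing piece shorter than `SIZE` is kept rather than discarded.

```python
from typing import List

MIN_OVERLAP = 100   # default of -mo / --minoverlap
MIN_IDENT = 90.0    # default of -mi / --minident (assumed to be a percentage)
SPLIT_SIZE = 500    # default of -s / --size


def passes_merge_thresholds(overlap_len: int, identity: float,
                            min_overlap: int = MIN_OVERLAP,
                            min_ident: float = MIN_IDENT) -> bool:
    """True if a contig overlap meets both thresholds (assumed inclusive)."""
    return overlap_len >= min_overlap and identity >= min_ident


def split_read(seq: str, size: int = SPLIT_SIZE) -> List[str]:
    """Cut a read into consecutive, non-overlapping pieces of length `size`.

    Assumption: the final, shorter piece is kept.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [seq[i:i + size] for i in range(0, len(seq), size)]
```

Under these assumptions, a 1,200 bp read split with the default size yields pieces of 500, 500 and 200 bp, and an overlap of 99 bp is rejected regardless of its identity.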
Example usage:
```
./main.py -s 500 -ge /home/max/tests/genome.fasta -fq /home/max/tests/pacbioReads.fastq -out /home/max/tests/output -it 3
```
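When REAPRLong is called from a larger Python pipeline, the example command above can be assembled as an argument list and handed to `subprocess.run`. `build_reaprlong_command` and `run_reaprlong` are hypothetical helpers; they use only the flags shown in the usage line, and the script path is a placeholder for wherever `main.py` lives.

```python
import subprocess
from typing import List, Optional


def build_reaprlong_command(genome: str, reads: str, outdir: str,
                            size: int = 500, iterations: Optional[int] = None,
                            threads: int = 4, mode: str = "pb",
                            reads_are_fasta: bool = False,
                            script: str = "./main.py") -> List[str]:
    """Assemble the argument list for one REAPRLong run (hypothetical helper)."""
    if mode not in ("pb", "ont"):
        raise ValueError("mode must be 'pb' or 'ont'")
    cmd = [script, "-ge", genome, "-fq", reads, "-out", outdir,
           "-m", mode, "-t", str(threads), "-s", str(size)]
    if iterations is not None:
        cmd += ["-it", str(iterations)]  # 0 means no breaking is performed
    if reads_are_fasta:
        cmd.append("-fa")  # reads supplied as FASTA instead of FASTQ
    return cmd


def run_reaprlong(**kwargs) -> None:
    """Run REAPRLong; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_reaprlong_command(**kwargs), check=True)
```

For instance, `run_reaprlong(genome="/home/max/tests/genome.fasta", reads="/home/max/tests/pacbioReads.fastq", outdir="/home/max/tests/output", iterations=3)` would be equivalent to the example command, with the defaults for `-m` and `-t` passed explicitly. Passing a list rather than a shell string avoids quoting problems with unusual paths.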
REAPRLong generates multiple output files in the specified output directory.
10. `misjoins_it*.txt`: misjoins identified in the assembly by comparing it with the reads. The `*` is an integer indicating the QC iteration, starting with 0, which represents the original assembly; every subsequent number refers to the `adjusted_contigs_it*.fa` produced by the previous iteration.
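Following the naming scheme described above, the misjoin report of iteration 0 refers to the original assembly, while the report of iteration *n* > 0 refers to `adjusted_contigs_it{n-1}.fa`. The sketch below lists the reports in an output directory in iteration order and pairs each with the assembly it describes. The function names are hypothetical, and the sketch assumes the reports sit directly in the output directory.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches e.g. "misjoins_it0.txt", "misjoins_it12.txt"
MISJOIN_RE = re.compile(r"^misjoins_it(\d+)\.txt$")


def assembly_for_iteration(iteration: int,
                           original: str = "original assembly") -> str:
    """Name of the assembly whose misjoins are reported for `iteration`."""
    if iteration < 0:
        raise ValueError("iteration must be >= 0")
    if iteration == 0:
        return original
    return f"adjusted_contigs_it{iteration - 1}.fa"


def misjoin_reports(outdir: str) -> List[Tuple[int, Path, str]]:
    """Return (iteration, report path, assembly) for each report, sorted by iteration."""
    reports = []
    for path in Path(outdir).iterdir():
        match = MISJOIN_RE.match(path.name)
        if match:
            iteration = int(match.group(1))
            reports.append((iteration, path, assembly_for_iteration(iteration)))
    return sorted(reports, key=lambda report: report[0])
```

Sorting on the parsed integer rather than on the file name keeps `misjoins_it10.txt` after `misjoins_it2.txt`.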
### Workflow
<p align="center">
<img src="figures/workflow_noOptional.svg">
</p>