diff --git a/README.md b/README.md
index 17968257e6a223457c203b6edef4c2496be23e71..b3e1f0430c532442116fbb6206741ef973eb22d1 100644
--- a/README.md
+++ b/README.md
@@ -1,165 +1,92 @@
-# Genome Evaluation Pipeline
+# Genome Evaluation Pipeline (GEP)
 
-This pipeline allows users the ability to produce a wide range of commonly used evaluation metrics for genome assemblies, no matter your level of command-line experience/.
+* User-friendly and **all-in-one** **quality control and evaluation** pipeline for genome assemblies
 
-By harnessing the capabilities of snakemake, we present a workflow that incorporates a number of command-line tools and can be run on multiple independent genome assemblies in parallel.  A streamlined user-experience is paramount to the devlopment process of this pipeline, as we strive for three key user-oriented components:
+* Run **multiple genome evaluations** in one go (as many as you want!)
 
-1. **Automate**:  Beginning to End with just a few clicks!
- - Reduce user interaction significantly when compared to running the individual tools by themselves.  
+* Seamlessly **scaled to server, cluster, grid and cloud environments** 
 
-2. **Scalability**
- - Seamlessly scaled to server, cluster, grid and cloud environments without the need to modify the workflow definition.
+* Required **software stack automatically deployed** to any execution environment using **snakemake** and **conda**
 
-3. **Portability**
- - Workflows entail a description of required software, which will be automatically deployed to any execution environment.
 
-Snakemake will use conda to both install and manage our software packages and required tools. This helps to avoid software dependency conflicts, which will prevent the analysis from being simple to use and easily applied to different hardware. It also means that you, the user, do not have to be concerned with this at all - it is done for you!
 
-#######################################################################
 
-The software/tools used as part of our genome evaluation are as follows:
-
-####    Pre-Processing (least biased short-read dataset available):
-* Trimmomatic (*Bolger, A. M., Lohse, M., & Usadel, B. (2014)*. http://www.usadellab.org/cms/?page=trimmomatic)
-* Trim_galore (*Felix Krueger* bioinformatics.babraham.ac.uk)
-* Fastqc (*Simon Andrews* https://github.com/s-andrews/FastQC
-* Multiqc (*Ewels, P., Magnusson, M., Lundin, S., Käller, M. (2016)*. https://doi.org/10.1093/bioinformatics/btw354)
-
-#### Reference-free Genome Profiling
-* GenomeScope2 		(*Ranallo-Benavidez, T.R., Jaron, K.S. & Schatz, M.C. (2020)* https://github.com/tbenavi1/genomescope2.0)
-
-#### K-mer distribution (copy-number spectra) analysis
-* meryl 			(*Rhie, A., Walenz, B.P., Koren, S. et al. (2020)*. https://doi.org/10.1186/s13059-020-02134-9)
-* merqury		(*Rhie, A., Walenz, B.P., Koren, S. et al. (2020)*. https://doi.org/10.1186/s13059-020-02134-9)
-
-
-#### Assessing quality and annotation completeness with Benchmarking Universal Single-Copy Orthologs (BUSCOs)
-* BUSCOv4 			(*Seppey M., Manni M., Zdobnov E.M. (2019)* https://busco.ezlab.org/ )
-
-#### Scaffold/contig statistics: N# and L# stats, scaffold metrics, sequence counts, GC content, Estimated genome size
-* Python scripts (*Mike Trizna. assembly_stats 0.1.4 (Version 0.1.4). Zenodo. (2020)*.  http://doi.org/10.5281/zenodo.3968775 )
-
-#######################################################################
-# How to choose your illumina libraries
-
-Variations in sequencing methods/protocols can lead to an increase in bias in the corresponding raw sequencing libraries.  Sequencing a biological sample may often consist of both mate-pair/long-insert (e.g. insert sizes of 5k, 10k, 20k bp, etc.) and short-insert (e.g. insert-sizes 180, 250, 500, 800bp) paired-end libraries, respectively.  Usually you can deduce the insert sizes and library types from the metadata found within NCBI or and SRA archive.  In order to maintain a little bias as possible whilst maintaining decent coverage, you should ideally use only short-insert paired-end libraries for this evaluation pipeline.
-
-If your library/s was sequenced using 10x barcodes (10X Genomics), you should remove the first 25-30bp of the forward read (R1) only.  This will remove all barcode content.
-
-**Use trimmomatic**
-
-*Will be incorporated automatically shortly*
-
-
-
-# Using the pipeline
 
+# Getting Started
 
 **Step 1. Downloading the workflow**
 -
-First things first, we want to download/install this workflow.  The easiest way to do this would be to clone the git repository, provided you have the git command line tool installed (for instructions on how to do this: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
 
 To clone the repository, use the following command:
 ```
 git clone https://git.imp.fu-berlin.de/cmazzoni/GEP.git
 ```
 
-*Here give an illustration (tree) of the folder structure for the project*
-
+---
 
 **Step 2. Conda management**
 -
-If you already have conda installed on your system, please skip to step 3
+- Conda (v4.10.3) *older versions may also work*
 
-We will use a minimal version of conda - miniconda3, it has everything we need.
+If you already have conda installed on your system, please **skip to step 3**
 
-Please downlaod the linux Miniconda3 installer from the following URL: https://docs.conda.io/en/latest/miniconda.html
+Download the Linux Miniconda3 installer from the following URL: https://docs.conda.io/en/latest/miniconda.html
 
 
-Run the miniconda3 installation using:
+Run the miniconda3 installation and check if it worked:
 
 ```
 bash /<your_path_to>/Miniconda3-latest-Linux-x86_64.sh
-```
-
-Hold enter for the licensing agreement to print in it's entirety and agree to is by typing yes then
-pressing ENTER once.
+## Follow the miniconda3 installation instructions ##
 
-Choose the location you wish to install miniconda3, or use the default-determined location by simply hitting ENTER.
+source ~/.bashrc
 
-It is a good idea to update conda to it's newest available version:
-```
 conda update conda
 ```
 
-If it says something like `conda command not found` please either close and re-open your terminal for conda installation to take effect, or run the command:
-
-On Linux
-```
-source ~/.bashrc
-```
-
+If you see `conda: command not found`, close and re-open your terminal for the conda installation to take effect, then run the update again.
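To confirm that the installation is visible in your current shell, a quick sanity check (a sketch; the exact path depends on where you installed miniconda3):

```shell
# Sanity check: is conda on PATH in this shell?
if command -v conda >/dev/null 2>&1; then
    echo "conda found at: $(command -v conda)"
else
    # The installer edits ~/.bashrc; re-source it so the change
    # takes effect in the current session
    echo "conda not found - try: source ~/.bashrc"
fi
```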
 
+---
 
 **Step 3. Creating our Snakemake conda environment**
 -
-Inside the main porject folder will be a file called `installGEP.yaml`
+The pipeline requires the following software to run:
+- snakemake (6.6.1+)
+- python (3.9.1+)
+- tabulate (0.8.7+)
+- beautifulsoup4 (4.9+)
+- mamba (0.15.2)
 
-i.e.
-`/<your_path_to>/GEP/installGEP.yaml`
+The easiest method to install this software stack is to create a GEP conda environment from the provided `installGEP.yaml` (**see Note below**)
 
-If you have followed the process of installing miniconda3 correctly (along with closing and
-re-opening your terminal or sourcing your bash profile), installing your GEP environment is very simple.
 
-To create/install GEP environment:
-```
-conda env create -f installGEP.yaml
 ```
+conda env create -f /<your_path_to>/GEP/installGEP.yaml
 
-The environment should install on it's own, press ENTER if prompted to install the list of
-packages.
-
-Your environment is now created and installed - we want to activate it by running the command:
-```
 conda activate GEP
-```
-
-
-**Step 4. Modifying our configuration files**
--
-There are only two files (`config.yaml` and `samples.tsv`) that you as the user are required to *modify*. These are found in the `configuration` folder.
-
-Firstly, we will modify the `samples.tsv`, which consists of the paths to your data files.
 
-The data required for the workflow to work as intended are the actual genome assembly we wish to evaluation (in `.fasta`,`.fa`, or `fna` format), and the set of illumina short-insert paired-end raw sequencing libraries used to assemble the genome in question (in `.fastq`, or `fq`, and can be gzipped. e.g. `.fastq.gz`)
+## Check that snakemake installed correctly
 
-An example of this samples.tsv file is as follows:
-
-```
-assemblyName        Library_R1		                     Library_R2			       assembly_fasta		         EstimatedSize(bps)
-
-AssemblyX	     path/to/AssemblyX_R1_library1.fastq    path/to/AssemblyX_R2_library1.fastq	path/to/AssemblyX_genome.fasta	  2136623189
-AssemblyX	     path/to/AssemblyX_R1_library2.fastq    path/to/AssemblyX_R2_library2.fastq	path/to/AssemblyX_genome.fasta	  
-AssemblyY	     path/to/AssemblyY_R1_library1.fastq    path/to/AssemblyY_R2_library1.fastq	path/to/AssemblyY_genome.fasta
-AssemblyZ	     path/to/AssemblyZ_R1_library1.fastq    path/to/AssemblyZ_R2_library1.fastq	path/to/AssemblyZ_genome.fasta	  1983101299
-AssemblyZ	     path/to/AssemblyZ_R1_library2.fastq    path/to/AssemblyZ_R2_library2.fastq	path/to/AssemblyZ_genome.fasta	  1983101299
+snakemake --version
 ```
+**Note** If you already have a snakemake installation and would like to avoid installing it again, ensure that the software listed above is in your `PATH` and that conda is installed and activated.
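If you go that route, a quick check of the stack (a sketch; compare the reported versions against the minimums listed above) might look like:

```shell
# Report which of the required tools are reachable and their versions.
# Any "missing" line means that tool still needs to be installed.
for tool in snakemake python mamba; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: $("$tool" --version 2>&1 | head -n 1)"
    else
        echo "$tool: missing"
    fi
done
```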
 
-This is a tab (or just white-space) separated document where you will fill out the paths to the relevant files.  Usually you will not have more than a handful (or sometimes only one-pair) of raw illumina reads, so this document will most of the time be rather 'clean'.
-
+**Step 4. Running the pipeline**
+-
+***BELOW NOT COMPLETE***
 
-Column1= Assembly name
-This column you can put whatever you want to identify the results by, so if you are running analysis on Homo Sapiens, you could put the sample name as HomoSapiens and find your results within the `HomoSapiens` folder **
+GEP can be run in two modes:
+1. Create meryl database 
+     - Input: Sample sheet outlining either Illumina PE or PacBio HiFi reads
+     - Output: (`.meryl`) k-mer database  
+     
+2. Run evaluation
+     - Input: Sample sheet outlining k-mer database (`.meryl`) and corresponding assembly (`.fa/fasta/fna`)
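As an illustration only (the column names here are hypothetical, not GEP's actual schema; check the example sheets shipped with the repository), a mode-2 sample sheet might pair each k-mer database with its assembly like so:

```
assemblyName    merylDB                     assembly_fasta
AssemblyX       path/to/AssemblyX.meryl     path/to/AssemblyX_genome.fasta
```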
 
-Column 2 and 3= paths to one pair of raw illumina libraryies. R1 (column2) and paired R2 (column3) (**Can be gzipped**)
 
-Column 4= path to the actual genome/assembly that you want to evaluate. (**NOTE: currently must be uncompressed; .fa, .fna, .fasta**) **
 
-Column 5= If you know the estimated/expected genome size of the species for which you wish to evaluate an assembly, you can provide it in this column.**
 
-**Important to note: As you can see in the above example file users can have multiple rows for the SAME sampleName and the SAME assembly.  This is often the case when you have multiple illumina libraries that we were used to build the assembly. In this case, the `assemblyName` (column 1) and `assembly_fasta` (column 4) remain the same, but the paths to the raw illumina libraries (column 2 and 3) will differ, as each row points to one *library* from a possible set of multiple libraries.
 
-For the estimated genome size (column 5), you can leave this blank as the pipeline will infer this from genomescope2.  Further, if you have multiple rows (libraries) for the same assembly, you can simple provide the estimated genome size for one of the rows (i.e. AssemblyX in the above example), leaving any other rows for the same assembly blank.  This is something I will add the ability to do in regards to column 1 and 4 as well.
 
 
 
@@ -237,6 +164,56 @@ For example, you can assign a total amount of memory (e.g. 100GB) allowed by the
 snakemake --cores 32 --use-conda --resources mem_mb=100000 && snakemake --report
 ```
 
+
+
+## Introduction
+This pipeline lets users produce a wide range of commonly used evaluation metrics for genome assemblies, regardless of their level of command-line experience.
+
+By harnessing the capabilities of snakemake, we present a workflow that incorporates a number of command-line tools and can be run on multiple independent genome assemblies in parallel. A streamlined user experience is paramount to the development of this pipeline, as we strive for three key user-oriented goals: automation, scalability, and portability.
+
+
+
+Snakemake uses conda to both install and manage the required software packages and tools. This avoids the software dependency conflicts that would otherwise stop the analysis from being simple to run and easy to apply across different hardware. It also means that you, the user, do not have to worry about any of this - it is done for you!
+
+---
+
+The software/tools used as part of our genome evaluation are as follows:
+
+#### Pre-Processing (least biased short-read dataset available):
+* Trimmomatic (*Bolger, A. M., Lohse, M., & Usadel, B. (2014)*. http://www.usadellab.org/cms/?page=trimmomatic)
+* Trim_galore (*Felix Krueger* bioinformatics.babraham.ac.uk)
+* Fastqc (*Simon Andrews* https://github.com/s-andrews/FastQC)
+* Multiqc (*Ewels, P., Magnusson, M., Lundin, S., Käller, M. (2016)*. https://doi.org/10.1093/bioinformatics/btw354)
+
+#### Reference-free Genome Profiling
+* GenomeScope2 		(*Ranallo-Benavidez, T.R., Jaron, K.S. & Schatz, M.C. (2020)* https://github.com/tbenavi1/genomescope2.0)
+
+#### K-mer distribution (copy-number spectra) analysis
+* meryl 			(*Rhie, A., Walenz, B.P., Koren, S. et al. (2020)*. https://doi.org/10.1186/s13059-020-02134-9)
+* merqury		(*Rhie, A., Walenz, B.P., Koren, S. et al. (2020)*. https://doi.org/10.1186/s13059-020-02134-9)
+
+
+#### Assessing quality and annotation completeness with Benchmarking Universal Single-Copy Orthologs (BUSCOs)
+* BUSCOv4 			(*Seppey M., Manni M., Zdobnov E.M. (2019)* https://busco.ezlab.org/ )
+
+#### Scaffold/contig statistics: N# and L# stats, scaffold metrics, sequence counts, GC content, Estimated genome size
+* Python scripts (*Mike Trizna. assembly_stats 0.1.4 (Version 0.1.4). Zenodo. (2020)*.  http://doi.org/10.5281/zenodo.3968775 )
+
+---
+# How to choose your illumina libraries
+
+Variations in sequencing methods/protocols can introduce bias into the corresponding raw sequencing libraries. Sequencing of a biological sample often comprises both mate-pair/long-insert (e.g. insert sizes of 5k, 10k, 20k bp) and short-insert (e.g. insert sizes of 180, 250, 500, 800 bp) paired-end libraries. Usually you can deduce the insert sizes and library types from the metadata found in the NCBI or SRA archives. To keep bias as low as possible while maintaining decent coverage, you should ideally use only short-insert paired-end libraries for this evaluation pipeline.
+
+If your libraries were sequenced using 10x barcodes (10X Genomics), you should remove the first 25-30 bp of the forward read (R1) only. This removes all barcode content.
+
+**Use trimmomatic**
+
+*Will be incorporated automatically shortly*
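Until that is automated, one way to strip the barcodes manually (a sketch, not GEP's built-in step; file names are placeholders and `trimmomatic` must be on your PATH) is to run Trimmomatic in single-end mode on R1 with `HEADCROP`, leaving R2 untouched:

```shell
# Remove the first 25 bp (10x barcode content) from the forward reads only.
if command -v trimmomatic >/dev/null 2>&1; then
    trimmomatic SE -threads 4 \
        reads_R1.fastq \
        reads_R1.barcodeTrimmed.fastq \
        HEADCROP:25
else
    echo "trimmomatic not found - install it first (e.g. via conda)"
fi
```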
+
+
+
+
+
 # Reporting
 
 Instead of, or as well as, retrieving the result files directly from the locations specified in the Results section (Step 6), the `&& snakemake --report` argument used when running will create an interactive .html report upon completion. This report contains all the relevant key files, along with the Directed Acyclic Graph (DAG) that snakemake uses to drive the order of execution, the run-times of each individual step, and more (work in progress).
@@ -288,3 +265,5 @@ The key result files are:
 There is a separately created folder within the main results directory (i.e. `/path/to/Results/allAssemblies_keyResults`)
 
 Within this folder you will find a combined aggregate file (`/path/to/Results/allAssemblies_keyResults/key_results.tsv`), a tsv that combines the aforementioned key values from each evaluated assembly into one single file. This is useful for plotting the key values across multiple assemblies.
+
+