

biotools:nf-rnaseqmetagen

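nf-rnaSeqMetagen is a Nextflow workflow for metagenomic analysis of RNA-seq data: it filters out host reads and classifies the remaining (exogenous) reads with Kraken2.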

To use the nf-rnaSeqMetagen pipeline, the following are required:

  1. Software dependencies:
  2. RNA-seq data (paired-end only for now; support for single-end reads will follow)
  3. Reference genome (FASTA sequences) and its annotation file (GTF)
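Before going further, it may help to confirm that the tools this guide calls directly are on your PATH. The sketch below is a hypothetical helper (`check_deps` is not part of nf-rnaSeqMetagen); checking `nextflow` and `singularity` is an assumption inferred from the commands used later in this guide, so the pipeline's actual dependency list may be longer.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nf-rnaSeqMetagen): report which of the
# given command-line tools are missing from PATH.
check_deps() {
    local tool missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every requested tool was found
    [ "$missing" -eq 0 ]
}
```

For example, `check_deps nextflow singularity` prints one line per tool and returns a non-zero status if either is missing.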


1. Obtaining the nf-rnaSeqMetagen Pipeline and Preparing Data

First, clone the nf-rnaSeqMetagen repository onto your machine. You can use either git or nextflow; I recommend nextflow. The rest of this documentation assumes that you have used nextflow to clone this workflow.

nextflow pull https://github.com/phelelani/nf-rnaSeqMetagen

Content of the repository (located in $HOME/.nextflow/assets/phelelani/nf-rnaSeqMetagen):

To get the help menu for the workflow, execute the following command from anywhere on your system:

nextflow run nf-rnaSeqMetagen --help

1.1. Download test datasets (optional)

We will now download the reference genome (along with its annotation file) from Ensembl. We will also download the FASTQ files from the H3ABioNet site, which we will analyse using the nf-rnaSeqMetagen workflow. NB: Skip this section if you have your own data to analyse using this workflow! This section is only for getting data to practice using the nf-rnaSeqMetagen workflow!

Make directories:

mkdir example
cd example
mkdir reference
mkdir data

Download and decompress the mouse reference genome along with its annotation:

wget -c -O reference/genome.fa.gz ftp://ftp.ensembl.org/pub/release-68/fasta/mus_musculus/dna/Mus_musculus.GRCm38.68.dna.toplevel.fa.gz
wget -c -O reference/genes.gtf.gz ftp://ftp.ensembl.org/pub/release-68/gtf/mus_musculus/Mus_musculus.GRCm38.68.gtf.gz
gunzip reference/genome.fa.gz
gunzip reference/genes.gtf.gz
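After decompressing, a quick sanity check can catch truncated or corrupted downloads before they reach the indexing step. This is a minimal sketch using a hypothetical helper, `check_reference`; it only verifies that the FASTA file has at least one `>` header and that every non-comment GTF line has the nine tab-separated columns the GTF format requires. It does not validate sequence content.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a decompressed FASTA genome and GTF file.
check_reference() {
    local fasta="$1" gtf="$2" f n_seqs bad
    for f in "$fasta" "$gtf"; do
        [ -s "$f" ] || { echo "missing or empty: $f"; return 1; }
    done
    # Count FASTA header lines (one per sequence)
    n_seqs=$(grep -c '^>' "$fasta")
    [ "$n_seqs" -gt 0 ] || { echo "no FASTA headers in $fasta"; return 1; }
    # GTF data lines must have exactly 9 tab-separated columns;
    # comment lines (starting with '#') and blank lines are skipped
    bad=$(awk -F'\t' '!/^#/ && NF > 0 && NF != 9 {n++} END {print n + 0}' "$gtf")
    [ "$bad" -eq 0 ] || { echo "$bad malformed line(s) in $gtf"; return 1; }
    echo "$fasta: $n_seqs sequence(s); $gtf: column count OK"
}
```

Run it from the `example` directory as `check_reference reference/genome.fa reference/genes.gtf`. On a full mouse genome the scan takes a little while, since both files are read end to end.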

Download the RNA-seq test dataset from H3ABioNet using the provided script:

cd data
wget https://phelelani.github.io/nf-rnaSeqMetagen/examples/data/get_data.sh
sh get_data.sh
ls -l 
cd ..
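Because the workflow currently expects paired-end data, it can be worth confirming that every read-1 file has a read-2 mate before launching a run. The sketch below uses a hypothetical helper, `check_pairs`. The `_R1`/`_R2` suffixes are an assumed default, not necessarily the naming produced by `get_data.sh`; pass the suffixes your files actually use as the second and third arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that every read-1 FASTQ in a directory has a
# read-2 mate. The _R1/_R2 suffixes are an assumed default, not the naming
# used by get_data.sh; pass your own suffixes as arguments 2 and 3.
check_pairs() {
    local dir="$1" r1="${2:-_R1}" r2="${3:-_R2}"
    local f base mate n=0 unpaired=0
    for f in "$dir"/*"$r1"*; do
        [ -e "$f" ] || continue            # pattern matched nothing
        n=$((n + 1))
        base=$(basename "$f")
        mate="$dir/${base/$r1/$r2}"        # swap the first read-1 tag for read-2
        if [ -e "$mate" ]; then
            echo "paired:   $base"
        else
            echo "unpaired: $base (expected $(basename "$mate"))"
            unpaired=$((unpaired + 1))
        fi
    done
    echo "$n read-1 file(s) checked, $unpaired without a mate"
    # Fail if no read-1 files were found or any lacked a mate
    [ "$n" -gt 0 ] && [ "$unpaired" -eq 0 ]
}
```

For example, `check_pairs data` uses the default suffixes, while `check_pairs data _1 _2` suits files named like `sample_1.fastq.gz` and `sample_2.fastq.gz`.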

1.2. Download the singularity containers (required to execute the pipeline):

nextflow run nf-rnaSeqMetagen -profile slurm --mode prep.Containers

1.3. Generating genome indexes.

To generate the STAR genome indexes, run the following command:

nextflow run nf-rnaSeqMetagen -profile slurm --mode prep.GenomeIndexes --genome "$PWD/reference/genome.fa" --genes "$PWD/reference/genes.gtf"

1.4. Creating the Kraken2 database:

To create the Kraken2 database, run the following command:

nextflow run nf-rnaSeqMetagen -profile slurm --mode prep.KrakenDB --db $PWD/K2DB
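A built Kraken2 database is a directory containing at least three files: `hash.k2d`, `opts.k2d` and `taxo.k2d`. The sketch below (a hypothetical helper, `check_k2db`) checks for them, assuming that `prep.KrakenDB` writes the finished database directly into the directory passed with `--db`; if the pipeline nests it in a subdirectory, point the helper there instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm a directory holds the three core Kraken2
# database files. Assumes the database was written directly into the
# directory given to --db.
check_k2db() {
    local db="$1" f missing=0
    for f in hash.k2d opts.k2d taxo.k2d; do
        if [ -s "$db/$f" ]; then
            echo "ok:      $db/$f"
        else
            echo "missing: $db/$f"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Usage: `check_k2db "$PWD/K2DB"`.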

We are now ready to execute the workflow!


2. Executing the Main nf-rnaSeqMetagen Pipeline

As seen in the help menu above, this workflow has several options. Specifying them every time you run a section of the analysis can become tedious and error-prone. To make this easier, we will create a configuration file for this tutorial and pass it to Nextflow with the -c option. You can name it whatever you want, but for now, let's call it myparams.config. We will add only the mandatory arguments for now; as you become more familiar with the workflow, you can experiment with other options. Use your favourite text editor to create the myparams.config file, then copy and paste the parameters below:

params {
    data   = "$PWD/data"
    out    = "$PWD/myresults"
    genome = "$PWD/reference/genome.fa"
    genes  = "$PWD/reference/genes.gtf"
    db     = "$PWD/K2DB"
}
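If you prefer not to type the file by hand, the following sketch writes the same parameters from the shell. Because the heredoc delimiter is unquoted, the shell substitutes the absolute path of the chosen base directory when the file is created, so the resulting file contains literal paths rather than relying on variable resolution at run time. `write_params` is a hypothetical helper; the parameter names are copied from the configuration above, and `genes.gtf` matches the annotation file downloaded in section 1.1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a params file whose paths are expanded by the
# shell to absolute paths at creation time. Run from the "example" directory
# (or pass that directory as the first argument).
write_params() {
    local base="${1:-$PWD}" out="${2:-myparams.config}"
    cat > "$out" <<EOF
params {
    data   = "$base/data"
    out    = "$base/myresults"
    genome = "$base/reference/genome.fa"
    genes  = "$base/reference/genes.gtf"
    db     = "$base/K2DB"
}
EOF
    echo "wrote $out"
}
```

Running `write_params` from the `example` directory produces `myparams.config` in place, ready to pass with `-c myparams.config`.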

To filter out host reads and classify the exogenous reads, use this command:

nextflow run nf-rnaSeqMetagen -profile slurm --mode run.FilterClassify -c myparams.config

3. Exploring nf-rnaSeqMetagen Results

- [1] Sample analysis directories  =>    `<output_directory>/<sample_1> .. <sample_N>`
- [2] MultiQC                      =>    `<output_directory>/MultiQC`
- [3] Upset tool                   =>    `<output_directory>/upset`
- [4] Workflow tracing             =>    `<output_directory>/workflow-tracing`
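Once a run finishes, a short check can confirm that the top-level result directories listed above were created. This is a sketch using a hypothetical helper, `check_outputs`; the directory names are taken from the list above, and per-sample directories are not checked because their names depend on your input data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the top-level result directories
# named in this guide exist under the output directory.
check_outputs() {
    local out="$1" d missing=0
    for d in MultiQC upset workflow-tracing; do
        if [ -d "$out/$d" ]; then
            echo "present: $out/$d"
        else
            echo "absent:  $out/$d"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
}
```

With the configuration used in this tutorial, the output directory is `myresults`, so the call would be `check_outputs "$PWD/myresults"`.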

3.1. MultiQC

View the full MultiQC report here.

3.2. Sample analysis directories

3.2.1. Krona report: raw reads (SRR5074528)

View full Krona chart for raw reads here.

3.2.2. Krona report: assembled reads (SRR5074528)

View full Krona chart for assembled reads here.

3.3. UpSet visualisation tool

View full UpSet plot here.

3.4. Workflow tracing

3.4.1. Report

View full Nextflow report here.

3.4.2. Timeline

View full timeline report here.