Don’t let your unmapped reads go to waste
ROP is a computational protocol aimed to discover the source of all reads, which originate from complex RNA molecules, recombinant B and T cell receptors and microbial communities. The ROP accounts for 98.8% of all reads across poly(A) and ribo-depletion protocols, compared to 83.8% by conventional reference-based protocols. ROP profiles repeats, circRNAs, gene fusions, trans-splicing events, recombined B and T cell receptor sequences and microbial communities. The ‘dumpster diving’ profile of unmapped reads output by our method is not limited to RNA-Seq technology and may be applied to whole-genome and whole-exome sequencing.
In general case, you need to map the reads with any of available high-throughput aligners (e.g. STAR, tophat2) and save unmapped reads in .bam (binary format, requires less space) or .fastq (text format). The instructions how to map the reads and save the unmapped reads are provided in the ROP tutorial.
ROP protocol consists of two (optional) modules to categorize the mapped reads:
- We developed gprofile, a tool to categorize the mapped reads into genomic categories (CDS, UTR, intons, etc).
- We developed rprofile, a tool to profile repetitive elements (e.g. SINEs, LINEs, LTRs).
ROP protocol consists of six steps to characterize the unmapped reads:
- Quality control. Exclude low-quality, low-complexity and rRNA reads (in-house code, SEQLEAN, Megablast )
- Identify lost human reads, which are missed (not mapped) due to the heuristics implemented for computational speed in conventional aligners. These include reads with mismatches and short gaps relative to the reference set, but can also include perfectly matched reads (Bowtie2, previously Megablast)
- Identify lost repeat sequences, by mapping unmapped reads onto the database of repeat sequences (Megablast )
- Identify ‘non-co-linear’ RNAs reads from circRNAs, gene fusions, and trans-splicing events, which combine sequences from distant elements (CIRI)
- Identify reads from recombinations of B and T cell receptors i.e. V(D)J recombinations (ImReP, previously IgBLAST)
- Profile taxonomic composition of microbial communities using the microbial reads mapped onto the microbial genomes and marker genes (Megablast, MetaPhlAn)
ROP Tutorial is available here
- Mangul, Serghei, et al. “Dumpster diving in RNA-sequencing to find the source of every last read.” bioRxiv (2016): 053041.
- Mangul, Serghei, et al. “Total RNA Sequencing reveals microbial communities in human blood and disease specific effects.” bioRxiv (2016): 057570.
- Mangul, Serghei and Mandric Igor, et al., “Profiling adaptive immune repertoires across multiple human tissues by RNA Sequencing“, bioRxiv (2016)
- Ensembl GRCh37/hg19
- Human ribosomal DNA complete repeating unit
- Repeat elements (RepBase20.07)
- V(D)J genes of B and T cell receptors
- Bacterial genomes
- Viral genomes
- Genomes of eukaryotic pathogens
- Repeat annotations
Third Party software
ROP is distributed with several open source components that were developed by other groups. These components are (c) their respective developers and are redistributed with ROP to provide ease-of-use. Please see the list of tools (not exhaustive) for licensing details:
This software was developed by Serghei Mangul, Harry Yang (Taegyun), Kevin Hsieh, and Linus Chen. If you have any questions, please send an email to Serghei Mangul (email@example.com). To report a bug please use github.
ROP 1.0.7 release 02/16/2017
ROP 1.0.7 is available for download at https://sourceforge.net/projects/rop2/files/1.0.7/ . This is a new release with the following fixes and changes:
- ROP now accepts bam file with mapped and unmapped reads to profile immune repertoires. Please use rop-imrep.sh in case reads were mapped to hg19 (GRCh37) or rop-imrepGRCh38.sh in case reads were mapped to GRCh38. rop-imrep will extract reads mapped to BCR and TCR loci and will merge those with unmapped reads to perform immune profiling. More details are here.
- We have added ggprofilePlus.py to randomly assign multi-mapped reads considering estimated gene expression levels. This is implemented in a new module gprofilePlus.py. Prior to ggprofilePlus.py, one need to run gprofile.py with –m option. More details are here.
ROP 1.0.6 release 01/10/2017
ROP 1.0.6 is available for download at https://sourceforge.net/projects/rop2/files/1.0.6/, thanks for contributions from Kevin Hsieh and Linus Chen. This is a new release with the following fixes and changes:
- ROP code was reorganized and split into three modules and functionalized. The modules include rop.py, rop_globals.py, rop_functions.py.
- Several bugs were fixed, including running out of reads crash, problems running specific steps only.
- We have switched from IgBLAST to ImReP to profile B and T cell receptor repertoires. ImReP (developed by Igor Mandric and Serghei Mangul) is able to quantify individual immune response based on a recombination landscape of genes encoding B and T cell receptors (BCR and TCR). ImreP shows superior accuracy compared to existing tools (see our manuscript “Profiling adaptive immune repertoires across multiple human tissues by RNA Sequencing” available at bioRxiv).
- Detailed instructions on how to submit qsub job array have been added to ROP Tutorial. Thanks to William Van. Instructions can be accessed here.
- ROP tutorial has been updated to reflect described changes
- Note: We are currently testing ImReP on mouse data. In case you planning to use ROP-ImReP for mouse data, consider contacting us (firstname.lastname@example.org). We are planning to have a new version of ROP-ImReP working for the mouse data in the next ROP release.
ROP 1.0.5 release 09/05/2016
- ROP can now be applied for mouse data. Thanks for contribution from Linus Chen and Kevin Hsieh
- gprofile can now be applied for mouse data
- the —-perCategory option (for gprofile) has been added to report assignment of every mapped read . It will also report the genomic profile per gene
- fix several bugs in the Step 1 (Quality Control)
- code improvements thanks to contribution from Linus Chen and Kevin Hsieh
- summary statistics for lost repeat elements has been added thanks for contribution from by Kevin Wesel. Currently it is available as a standalone application. We are planning to ingrate this functionality in the ROP in the next releases.
- we developed Microbial Scanner, a computational tool for identification microbes in the sequencing sample (including viruses, bacteria, and eukaryotic pathogens) based on the mapping of the unmapped reads onto the microbial genomes thanks for contribution from by Jeremy Rotman. The tool is available here
- instructions how to perform the functional profiling of candidate microbial reads using HUMAnN2 have been added, thanks for contribution from William Van Der Wey
- ROP tutorial has been improved and new pages has been added
ROP 1.0.4 release 07/23/2016
- ROP now is using custom script to identify low quality reads. Low quality reads are defined as reads with 75% of the bases with quality scores lower than 20
- the –-f option (force option) has been added to overwrite the content of analysis directory. Please use with caution
- details about the input ROP accepts has been added to the ROP tutorial. Please refer to ROP input details
- the gzip option has added to accept the input in
.gzformat. For example one can provide unmapped reads in
- ROP is distributed as
rop.tar.gz. Please refer to How to install ROP?
ROP 1.0.3 release 06/29/2016
ROP 1.0.3 is available for download here. This is a new release with the following fixes and changes:
- We switched to SourceForge to store official releases of ROP . The code at github is in permanent development and should be used for development only. Consider contacting email@example.com
- The possibility to download the reference database (refDB) of your choice has been added. This allows to save the time of downlaod and space in case you are interested in a particular step of ROP. Additionally it allows to connect new version of ROP to an existing refDB using –link2db option. Please use getDB.py (previously referred as instalation.py). More details are here
- The list of the tools, parameters and reference databases used by the ROP is now saved to a log file. Please check tools.log under the analysis directory.
- the –maui option allows to submit jobs via Maui scheduler, a job scheduler developed by Adaptive Computing
- Fix a bug that caused ROP to fail step 5a ( profiling of IGK : immunoglobulin kappa locus)
- Fix a bug that caused log file not to be displayed until ROP is finished. Now it shows the progress as ROP is running
- Output format of IgBlast has been modified to include the sequence of the read mapped to a gene segment (V, D or J) of BCR or TCR loci.
- Broken symbolic links were causing problem with the installation for EasyBuild. The problem has been fixed.
- The tab delimited output files were renamed to be .tsv (.csv in the previous version)
ROP 1.0.2 release 05/30/2016
ROP 1.0.2 is available for download here. ROP 1.0.2 is a maintenance release with the minor changes.
ROP 1.0.1 release 05/16/2016
- the –immune option allows to run the BCR/TCR profiling only
- the –microbiome option allows to run the microbiome profiling only
- the –repeat option now allows to run the lost repeat profiling only
- the –circRNA option allows to run the circular RNA profiling only
- the –metaphlan option allows to run Metaphlan2 only to obtain
taxonomic profile of microbial communities
Other changes include:
- the –b option now allows to accept the input (unmapped reads) in the .bam format
- the -z/–fastqGz option now allows to accept the input (unmapped reads) in the compress format (.fasta.gz)
- the –gzip option now allows to compress (gzip) fasta file produced after filtering the reads from (step 1) QC and (step 2) Remapping to human references (lost human reads)
- the –skipQC option now allows to skip the QC (filtering low-quality, low-complexity and rRNA reads)
- toy example with mapped reads (chr22 only) is now distributed with ROP. Toy example (mapped reads) is designed to help get you familiar with genomic profiling (gprofile.py) and repeat profiling (rprofile.py)
ROP 1.0 release 05/12/2016
- installation script was added, allowing to download the reference databases used by ROP
- ROP Tutorial has beed added and is available here
- toy example with 2508 unmapped reads is now distributed with ROP. Toy example is designed to help get familiar with ROP
- ROP now is using Bowtie2 to identify lost human reads
- ROP now is using CIRI to identify circular RNAs
- option to save the genomic category of each read was added to gpofile (genomic profile of RNA-Seq)
- functionality to map reads onto the database of eukaryotic pathogens was added
ROP 0.2 release 04/10/2016
This is a new release with the following changes:
- Genomic profile module has beed added
- Repeat profile module has been added
ROP 0.1 release 04/01/2016
The first public release of ROP. Because this is the first release, the manual is very limited. Only the basic options have been described, but we plan to update it frequently. If you have any questions about how ROP works, please contact Serghei Mangul (firstname.lastname@example.org).