Integrating Synapse in your RNA-Seq workflow

Goals

The goal is to demonstrate how to use Synapse in an RNA-Seq workflow to manage files and track processing steps.

Getting Raw Data

The first step is to download the data onto the computer where you will be processing it.

The data used in this example is a small RNA-Seq dataset of adrenal and brain tissue from ArrayExpress experiment E-MTAB-513 (https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-513/). A sub-sampled raw dataset has been stored in a public Synapse project.

Here, we access two FASTQ files directly by their Synapse IDs and download them to our local computer. The reference sequence, a small region of chr19 (positions 300,000 to 350,000) of the hg19 reference genome, is downloaded the same way when we build the genome index.

# Get brain fastq file
synapse get syn2468554

# Get adrenal fastq file
synapse get syn2468552
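Before mapping, it can be worth sanity-checking the downloads. The sketch below assumes the files land in the current directory as `brain.fastq` and `adrenal.fastq` (the names used in the mapping commands) and relies on the standard uncompressed FASTQ layout of four lines per read. The function name `count_fastq_reads` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report read counts for the downloaded FASTQ files.
# Assumes uncompressed FASTQ with exactly four lines per record.

count_fastq_reads() {
  local fq="$1"
  local lines
  lines=$(wc -l < "$fq")
  # A well-formed file has a line count that is a multiple of 4.
  if (( lines % 4 != 0 )); then
    echo "warning: $fq has $lines lines, not a multiple of 4" >&2
    return 1
  fi
  echo $(( lines / 4 ))
}

# Assumed local file names, matching those used in the STAR commands.
for fq in brain.fastq adrenal.fastq; do
  if [[ -f "$fq" ]]; then
    echo "$fq: $(count_fastq_reads "$fq") reads"
  else
    echo "$fq: not found" >&2
  fi
done
```

A line count that is not a multiple of four usually points to a truncated or interrupted download, in which case re-running `synapse get` is the simplest fix.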

Map the Raw Reads

Now that we have the data, you can map the reads with the alignment tool of your choice. Here we use STAR.

Setting Up the Local Environment

mkdir demo-rnaseq-workflow

# Directory to store the reference genome and its STAR genome index
mkdir ref-genome

Creating a Genome Index

# the reference genome
# downloads as hg19_chr19_subregion.fasta
synapse get --downloadLocation ref-genome/ syn2468557

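# Create a STAR genome index in ref-genome/.
# --genomeSAindexNbases is lowered for this ~50 kb region: the STAR manual
# recommends min(14, log2(GenomeLength)/2 - 1) for small genomes.
STAR --runMode genomeGenerate --genomeDir ref-genome --genomeFastaFiles ref-genome/hg19_chr19_subregion.fasta --genomeSAindexNbases 6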
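The STAR manual advises scaling `--genomeSAindexNbases` down for small genomes to roughly min(14, log2(GenomeLength)/2 - 1). As a sketch, the hypothetical helper `sa_index_nbases` below estimates that value from a FASTA file by summing the lengths of its sequence lines. The lower bound of 1 is our own guard, not part of the manual's rule.

```shell
#!/usr/bin/env bash
# Sketch: suggest a --genomeSAindexNbases value for STAR from a FASTA file,
# using the rule of thumb min(14, log2(GenomeLength)/2 - 1).

sa_index_nbases() {
  local fasta="$1"
  # Sum the lengths of all sequence lines, skipping '>' header lines.
  awk '!/^>/ { len += length($0) }
       END {
         if (len == 0) exit 1
         n = int(log(len) / log(2) / 2 - 1)
         if (n > 14) n = 14
         if (n < 1) n = 1   # our own lower bound, not from the manual
         print n
       }' "$fasta"
}

fasta=ref-genome/hg19_chr19_subregion.fasta
if [[ -f "$fasta" ]]; then
  echo "Suggested --genomeSAindexNbases: $(sa_index_nbases "$fasta")"
fi
```

For a 50 kb region the formula gives about 6.8, which the helper truncates to 6. Rounding down keeps the pre-indexing string within the recommended size.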

Map Adrenal and Brain Tissue Reads

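STAR --runThreadN 1 --genomeDir ref-genome/ --outFileNamePrefix brain --outSAMunmapped Within --readFilesIn brain.fastq
# STAR writes alignments to <prefix>Aligned.out.sam; rename for the upload step
mv brainAligned.out.sam brain.sam

STAR --runThreadN 1 --genomeDir ref-genome/ --outFileNamePrefix adrenal --outSAMunmapped Within --readFilesIn adrenal.fastq
mv adrenalAligned.out.sam adrenal.sam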
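Because both samples use identical STAR settings, the two commands can also be wrapped in a loop. The sketch below uses hypothetical function names (`map_sample`, `unique_mapping_rate`) and assumes the `<sample>.fastq` naming used above. After mapping, it reads the "Uniquely mapped reads %" line from the `<prefix>Log.final.out` summary that STAR writes, which is a quick first check of alignment quality.

```shell
#!/usr/bin/env bash
# Sketch: map each sample with the same STAR settings, then report the
# "Uniquely mapped reads %" value from <prefix>Log.final.out.

map_sample() {
  local sample="$1"
  STAR --runThreadN 1 --genomeDir ref-genome/ \
       --outFileNamePrefix "$sample" --outSAMunmapped Within \
       --readFilesIn "$sample.fastq" || return 1
  # STAR names its SAM output <prefix>Aligned.out.sam; rename it to <sample>.sam.
  mv "${sample}Aligned.out.sam" "$sample.sam"
}

unique_mapping_rate() {
  # Prints the numeric percentage, without spaces, tabs, or the '%' sign.
  awk -F'|' '/Uniquely mapped reads %/ { v = $2; gsub(/[ \t%]/, "", v); print v }' "$1"
}

# Only run the mapping if STAR is installed and on PATH.
if command -v STAR > /dev/null; then
  for sample in brain adrenal; do
    if map_sample "$sample"; then
      echo "$sample: $(unique_mapping_rate "${sample}Log.final.out")% uniquely mapped"
    fi
  done
fi
```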

Let’s store the results and provenance in Synapse.

# Create a project (project names must be unique across Synapse, so you may need a different name)
synapse create Project --name demo-rnaseq-workflow

# Use the Synapse ID reported by the command above as the parent ID
# (syn234567890 is a placeholder)
synapse store brain.sam --parentId syn234567890 --used brain.fastq ref-genome/hg19_chr19_subregion.fasta

synapse store adrenal.sam --parentId syn234567890 --used adrenal.fastq ref-genome/hg19_chr19_subregion.fasta
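To avoid copying the parent ID by hand, you could capture it from the output of `synapse create`. The sketch below assumes that command prints the new entity's ID (a `syn` followed by digits) somewhere in its output; the exact output format is an assumption, so check it against your client version. The helpers `extract_syn_id`, `run`, and `store_alignment`, and the `DRY_RUN` variable, are hypothetical conveniences, and the `synapse store` flags are the ones used above.

```shell
#!/usr/bin/env bash
# Sketch: capture a Synapse ID from command output and store each alignment
# with the same provenance as the commands above.
# Assumption: `synapse create` prints the new ID somewhere in its output.

extract_syn_id() {
  # Print the first "syn<digits>" token read from stdin.
  grep -o 'syn[0-9][0-9]*' | head -n 1
}

# Hypothetical dry-run switch: with DRY_RUN=1, print commands instead of running them.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

store_alignment() {
  local sample="$1" parent_id="$2"
  run synapse store "$sample.sam" --parentId "$parent_id" \
      --used "$sample.fastq" ref-genome/hg19_chr19_subregion.fasta
}

# Example usage (requires a configured Synapse login):
#   parent_id=$(synapse create Project --name my-unique-project-name | extract_syn_id)
#   for s in brain adrenal; do store_alignment "$s" "$parent_id"; done
```

Running once with `DRY_RUN=1` prints the exact `synapse store` commands, so you can confirm the file names and parent ID before anything is uploaded.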

Need More Help?

Try posting a question to our Forum.

Let us know what was unclear or what has not been covered. Reader feedback is key to making the documentation better, so please contact us or open an issue in our GitHub repository (Sage-Bionetworks/synapseDocs).