This getting started guide is for non-technical users who are interested in learning about Synapse. By following this guide, you’ll learn fundamental Synapse features by performing some simple tasks. You’ll learn how to:
Synapse is an open source software platform that data scientists can use to carry out, track, and communicate their research in real time. Synapse enables co-location of scientific content (data, code, results) and narrative descriptions of that work. Synapse has seeded a growing number of living research projects and resources including Sage/DREAM Challenges.
With Synapse, you can:
Synapse was created to encourage open science initiatives to advance our understanding of human health. Sage Bionetworks provides Synapse services free of charge to the scientific community through generous support from the National Cancer Institute (NCI), the Washington State Life Science Development Fund (LSDF), and the National Heart, Lung, and Blood Institute (NIH NHLBI).
Synapse operates under a complete governance process that includes well-documented Terms and Conditions of Use, guidelines and operating procedures, privacy enhancing technologies, as well as the right of audit and external reviews.
Synapse is built on a number of RESTful web APIs that allow users to interact with the system via a number of clients. One of these clients is the web client, i.e. the website www.synapse.org. Synapse also provides three programmatic clients (R, Python, and Command Line). Content can be uploaded, downloaded, annotated, and queried from any of these interfaces. In the getting started guide we will run through examples using all three programmatic interfaces. At any point you can pick the language you would like to see examples in by clicking the corresponding tab at the bottom of every example. Unless otherwise noted, the examples can be typed into the respective environment: a shell prompt for the command line examples, a Python session such as an IPython notebook or script, and an R session for the R examples.
Synapse credentials are required to use the programmatic clients. Register to create an account, and even if you log in with a Google account, make sure you go through the extra step of creating a Synapse username and password.
At the command line you can log in by specifying your Synapse username and password.
The login credentials can be specified for every Synapse client session, but this is not the recommended practice as your password will be visible. Instead, by passing the rememberMe parameter you can cache your credentials for use in future Synapse client sessions.
To log in with your username/email and password:
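For the Python client, the login step might look like the sketch below. It assumes the `synapseclient` package is installed and that your installed version still accepts a username and password together with `rememberMe`; newer releases may expect a different form of credential, so check the documentation for your version. The helper name `login_to_synapse` is hypothetical.

```python
def login_to_synapse(username, password, remember=True):
    """Return a logged-in Synapse client (hypothetical helper name)."""
    # Imported inside the function so the helper can be defined even when
    # the client library is not installed yet.
    import synapseclient

    syn = synapseclient.Synapse()
    # rememberMe caches the credentials for future client sessions.
    syn.login(username, password, rememberMe=remember)
    return syn


# Example call (replace the placeholders with your own credentials):
# syn = login_to_synapse("firstname.lastname@example.org", "secret")
```

Once the credentials are cached, later sessions should be able to call `syn.login()` without arguments.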
You can store your credentials in your home directory in a file called .synapseConfig (note the period at the beginning of the file name, which makes this a hidden file on Linux-like operating systems). The format is as follows:
    [authentication]
    username: firstname.lastname@example.org
    password: secret
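If you would rather generate this file from a script, the following sketch uses only Python's standard `configparser` module. The function names are illustrative and not part of any Synapse client. Note that `configparser` writes `key = value` pairs; parsers built on `configparser` read these the same way as the `key: value` form shown above.

```python
import configparser
from pathlib import Path


def write_synapse_config(username, password, path):
    """Write an [authentication] section to `path` and return the path."""
    # interpolation=None keeps characters such as "%" in a password literal.
    config = configparser.ConfigParser(interpolation=None)
    config["authentication"] = {"username": username, "password": password}
    path = Path(path)
    with path.open("w") as handle:
        config.write(handle)
    # The file holds a plain-text password, so restrict it to the owner.
    path.chmod(0o600)
    return path


def read_synapse_credentials(path):
    """Return (username, password) from an existing config file."""
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)
    section = config["authentication"]
    return section["username"], section["password"]


# Example call (writes to your home directory):
# write_synapse_config("firstname.lastname@example.org", "secret",
#                      Path.home() / ".synapseConfig")
```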
Anyone can browse public content in Synapse but in order to download and create content you will need to register for an account:
As Synapse can store human subject data that has sharing and use restrictions, you will also need to become certified and take a quiz about what kinds of items can be shared in Synapse. To start this process:
Explore our accounts, certification and profile validation page to find out more information on the different levels of users.
Now that you have your Synapse account you can start adding content. All Synapse content is organized according to user-created Projects. Select a unique name for your Project, such as “My uniquely named project”, and create your Project.
Projects are an organizational unit in which you can collaboratively access and share Files (a distributed file system to store data, code, and results from your work), and Tables (web-accessible, sharable, and queryable data where columns can have a user-specified structured schema). Each Project also contains a project-specific Wiki.
By default, your newly created Project is private; you are the only person who can access it and any content you include in it. To invite others to view or edit your Project, click on the Share icon in the upper right hand portion of the screen. For more information on Sharing, please see the Content Controls article.
As an exercise we are going to create an example Project to store some cell line analysis. Project names must be unique in Synapse; a suggested name for your project is Foo.
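With the Python client, creating the Project could be sketched as follows. This assumes `synapseclient` is installed and that `syn` is a client that has already logged in; `create_project` is a hypothetical helper name.

```python
def create_project(syn, name):
    """Store a Project called `name` and return the stored entity."""
    # Imported inside the function so the helper can be defined even when
    # the client library is not installed yet.
    from synapseclient import Project

    # store() sends the new Project to Synapse and returns the stored
    # entity, which then carries its assigned Synapse ID.
    return syn.store(Project(name=name))


# Example call, assuming `syn` is a logged-in client:
# project = create_project(syn, "Foo")
# print(project.id)
```

Because names must be unique across Synapse, expect an error if the name you pick is already taken by someone else's Project.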
By default, your newly created Project is private; you are the only person who can access it and any content you include in it. Later on we will share your created Project with other users.
Projects created in Synapse are assigned a unique identifier (a Synapse ID) with the format syn12345678, which is used to reference them. For example, your newly created Project will be assigned a Synapse ID.
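When Synapse IDs are passed around in scripts, a quick format check can catch typos early. The sketch below assumes, based only on the example format above, that an ID is the prefix `syn` followed by one or more digits; the function name is illustrative.

```python
import re

# Assumption: a Synapse ID is the literal prefix "syn" followed by digits.
SYNAPSE_ID_PATTERN = re.compile(r"syn\d+")


def looks_like_synapse_id(text):
    """Return True if `text` has the shape of a Synapse ID."""
    return SYNAPSE_ID_PATTERN.fullmatch(text) is not None
```

This only checks the shape of the string; it does not confirm that the ID exists in Synapse.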
You can view what you have created in Synapse on the web with:
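In the Python client this is a single call. The sketch assumes `syn` is a logged-in `synapseclient` client that provides the `onweb` method; the wrapper name is hypothetical.

```python
def view_on_web(syn, entity):
    """Open the Synapse web page for an entity object or Synapse ID."""
    # onweb opens the page for `entity` in the default web browser.
    syn.onweb(entity)


# Example call, assuming `syn` and `project` already exist:
# view_on_web(syn, project)
```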
The Wiki tab in a Project provides a space for you to build narrative content to describe your research. These Wikis can also be nested as subpages to build up a hierarchy of content within your Project as well as be attached to specific Files and Folders in your Project. Examples of content that you may want to include are project descriptions, specific aims, progress updates of data generation or analysis, analysis results (either in prose or via markdown-based notebooks such as knitr or IPython notebook), or web-accessible publication-like summaries of your research.
Wiki pages can contain highly customized content including, but not limited to, images, tables, code blocks, LaTeX formatted equations, and scholarly references. Synapse-specific widgets also allow users to embed dynamic content based on other resources stored in Synapse (e.g., Entity List, User/Team badge, Query Table, or Provenance Graph).
Here we will create a small Wiki for the Project.
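A small Wiki could be created from Python roughly as follows. The sketch assumes `synapseclient` is installed, that `syn` is a logged-in client, and that `project` is a stored Project; the helper name and the sample title and markdown text are made up for illustration.

```python
def add_wiki(syn, owner, title, markdown):
    """Attach a Wiki page with the given markdown to `owner`."""
    # Imported inside the function so the helper can be defined even when
    # the client library is not installed yet.
    from synapseclient import Wiki

    wiki = Wiki(title=title, owner=owner, markdown=markdown)
    return syn.store(wiki)


# Example call, assuming `syn` and `project` already exist:
# add_wiki(syn, project, "Cell line analysis",
#          "This project stores an example cell line analysis.")
```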
The Files tab houses a remote file system that you can use to share your project’s data, code, results, and any other information pertinent to your research. Unlike the file system on your local computer, Synapse Files and Folders are identified by a unique identifier, are versioned, and can be linked to one another using the Synapse Provenance services. These Files and Folders, like all Synapse content, can be accessed either through the web or through one of our analytical clients using their unique Synapse ID.
Folders are used just as folders are on a local file system: to organize and segment content. Folders can also contain (or be parents of) any number of other Files and Folders.
To add a Folder to your Project:
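In Python, adding a Folder might look like this sketch. It assumes `synapseclient` is installed, `syn` is a logged-in client, and `project` is the Project created earlier; the helper name and the folder name in the example call are hypothetical.

```python
def add_folder(syn, name, parent):
    """Create a Folder called `name` inside `parent` and return it."""
    # Imported inside the function so the helper can be defined even when
    # the client library is not installed yet.
    from synapseclient import Folder

    # `parent` may be a Project or another Folder, since Folders can nest.
    return syn.store(Folder(name=name, parent=parent))


# Example call, assuming `syn` and `project` already exist:
# data_folder = add_folder(syn, "Data", project)
```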
Files are also much like files on a local file system, except that they are web-accessible to anyone who has access, can be richly annotated (and queried on), can be embedded as links or images within a Synapse Wiki, and can be associated with a DOI. Files carry the Conditions for Use of the Folder they are placed into, in addition to any specific Conditions for Use of their own.
Let’s upload a local file, data/cell_lines_raw_data.csv, into this newly created Folder. To follow along you can pick any file you have and replace the name with your chosen file. We will also attach some annotations to this file describing its content. In the example, we will associate the key foo with the value bar along with two numerical annotations.
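The upload and annotation step could be sketched in Python as below. It assumes `synapseclient` is installed, `syn` is a logged-in client, and `data_folder` is the stored Folder. The key `foo` and value `bar` come from the text above; the two numerical keys and their values are invented for illustration, as is the helper name.

```python
# "foo": "bar" is from the text; the numerical annotations are made-up examples.
EXAMPLE_ANNOTATIONS = {"foo": "bar", "number": 42, "pi": 3.14159}


def upload_file(syn, path, parent, annotations):
    """Upload the local file at `path` into `parent` with annotations."""
    # Imported inside the function so the helper can be defined even when
    # the client library is not installed yet.
    from synapseclient import File

    entity = File(path=path, parent=parent, annotations=annotations)
    return syn.store(entity)


# Example call, assuming `syn` and `data_folder` already exist:
# raw_data_file = upload_file(syn, "data/cell_lines_raw_data.csv",
#                             data_folder, EXAMPLE_ANNOTATIONS)
```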
Access to Files and Folders is controlled by the Sharing setting that you select for your project. You may also set individual Sharing settings for specific Files and Folders within a Project.
Synapse provides advanced capabilities for formally tracking the relationship between digital assets (e.g. data, code, analytical results) stored within the system through the Synapse provenance system, in order to aid users in disseminating their work in ways that others can reproduce and reuse. The Synapse provenance system allows users to formally track their analysis history by aiding in the communication and sharing of a sequence of processing steps. Provenance relationships can, for example, be specified between raw data, analysis code, and results that occur in a complex processing pipeline, regardless of where it is run.
Synapse’s web services for managing provenance expose a very general data model based on the W3C PROV specification. Central to the design, users are not required to use a particular execution environment or workflow tool. Instead, provenance can be specified by inserting calls to the Synapse web service layer into their normal workflows to record activity; pipelines may be created through simple scripting or by using workflow execution engines. The provenance system allows users to branch off workflows at any point in prior analyses, while maintaining detailed records of data, code, and environment versions needed to reproduce the work.
Provenance is most easily specified when you are uploading or editing a file in Synapse. To specify the provenance, you list the files used as input and any files that were executed to generate the File. Both used and executed can be either an arbitrary URL, such as a reference to a code commit on GitHub or a file stored on an FTP site, or references to items in Synapse. Here we are going to add a figure to Synapse and indicate that the code https://github.com/Sage-Bionetworks/synapseTutorials was used to generate the figure from the data in the file data/cell_lines_raw_data.csv that we uploaded previously.
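A Python sketch of this step is shown below. It assumes `synapseclient` is installed, `syn` is a logged-in client, and `raw_data_file` and `data_folder` are the entities stored earlier. The figure path in the example call and the helper name are hypothetical; the code URL is the one given above.

```python
CODE_URL = "https://github.com/Sage-Bionetworks/synapseTutorials"


def upload_with_provenance(syn, path, parent, used, executed):
    """Upload `path` into `parent`, recording what was used and executed."""
    # Imported inside the function so the helper can be defined even when
    # the client library is not installed yet.
    from synapseclient import File

    # `used` and `executed` may each hold Synapse entities, Synapse IDs,
    # or plain URLs.
    return syn.store(File(path=path, parent=parent), used=used, executed=executed)


# Example call, assuming the earlier entities exist and a figure was saved locally:
# figure = upload_with_provenance(syn, "plots/cell_lines.png", data_folder,
#                                 used=[raw_data_file], executed=[CODE_URL])
```

Recording provenance at upload time keeps the link between the figure, its input data, and the code in a single call.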
Find additional information and tutorials through our User Guide.
Try posting a question to our Forum.
Let us know what was unclear or what has not been covered. Reader feedback is key to making the documentation better, so please let us know or open an issue in our GitHub repository (Sage-Bionetworks/synapseDocs).