This getting started guide is for new users who want to learn about Synapse. By performing a few simple tasks, you will learn fundamental Synapse features, including how to register and log in, create a Project, add a Wiki, organize Files in Folders, and record provenance.
Synapse is an open source software platform that data scientists can use to carry out, track, and communicate their research in real time. Synapse enables co-location of scientific content (data, code, results) and narrative descriptions of that work. Synapse has seeded a growing number of living research projects and resources including Sage/DREAM Challenges.
Sage Bionetworks provides Synapse services free of charge to the scientific community through generous support from various funding sources.
Synapse operates under a complete governance process that includes well-documented Terms and Conditions of Use, guidelines and operating procedures, privacy-enhancing technologies, and the right of audit and external review.
Anyone can browse public content on the Synapse website, but to download or create content you need to register for an account using an email address. You will receive a verification email to complete the registration process.
Because Synapse can store human subject data that has sharing and use restrictions, you must become a Certified User before you can create content. Certification involves a quiz about what kinds of items can be shared in Synapse.
To learn more about this process and the different levels of users, explore our accounts, certification and profile validation page.
Synapse is built on a number of RESTful web APIs that allow users to interact with the system through several clients. One of these is the web client, i.e. the website www.synapse.org. Synapse also provides three programmatic clients (R, Python, and Command Line). Content can be uploaded, downloaded, annotated, and queried from any of these interfaces. In this getting started guide we will run through examples using all three programmatic interfaces. At any point you can pick the language you would like to see examples in by clicking the corresponding tab at the bottom of every example. Unless otherwise noted, the examples can be typed into the respective environment: a shell prompt for the command line examples, a Python session (such as an IPython notebook or a script) for the Python examples, and an R session for the R examples.
Synapse credentials are required to use the programmatic clients. Register to create an account. Even if you log in with a Google account, make sure you complete the extra step of creating a Synapse username and password.
At the command line you can log in by specifying your Synapse username and password.
You can specify your login credentials in every Synapse client session, but this is not recommended because your password will be visible. Instead, pass the rememberMe parameter to cache your credentials for use in future Synapse client sessions.
To log in with your username/email and password:
You can store your credentials in a file called .synapseConfig in your home directory (note the leading period, which makes this a hidden file on Linux-like operating systems). The format is:
[authentication]
username: email@example.com
password: secret
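The Synapse clients read this file themselves, so you never need to parse it by hand. The sketch below shows that the format above is an ordinary INI file with an [authentication] section, which Python's standard configparser module can read. The function name read_synapse_credentials is hypothetical and is not part of any Synapse client. The file holds your password, so make it readable only by you (for example with chmod 600 on Linux-like systems).

```python
import configparser
from pathlib import Path


def read_synapse_credentials(config_path=None):
    """Return (username, password) from a .synapseConfig file, or None.

    Hypothetical helper for illustration only; the real Synapse clients
    read this file on their own.
    """
    # Default to ~/.synapseConfig, the location described above.
    path = Path(config_path) if config_path else Path.home() / ".synapseConfig"
    # interpolation=None keeps a '%' inside a password from being treated
    # as a configparser interpolation marker.
    parser = configparser.ConfigParser(interpolation=None)
    # read() skips missing files and returns the list of files it parsed.
    if not parser.read(path):
        return None
    if not parser.has_section("authentication"):
        return None
    section = parser["authentication"]
    # "username: ..." parses because ':' is one of configparser's default delimiters.
    return section.get("username"), section.get("password")
```

For example, `read_synapse_credentials()` returns `("email@example.com", "secret")` for the file shown above, and `None` if the file is missing or has no [authentication] section.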
Once you have a Synapse account and have installed a client (or navigated to the Synapse website), you can start adding content. All Synapse content is organized in user-created Projects: organizational units in which you can collaboratively access and share Files (a distributed file system to store data, code, and results from your work) and Tables (web-accessible, sharable, and queryable data whose columns can have a user-specified structured schema). Each Project also contains a project-specific Wiki.
By default, your newly created Project is private; you are the only person who can access it and any content you include in it. To invite others to view or edit your Project, click the Share icon in the upper right-hand corner of the screen. For more information on Sharing, see the Content Controls article.
As an exercise, we are going to create an example Project to store a cell line analysis.
Decide on a name for your Project. Project names must be unique across Synapse. Use this project name in the example scripts below.
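One way to get a name that probably has not been taken is to combine your username, the date, and a short random suffix. The helper below is a hypothetical sketch, not part of any Synapse client. Synapse itself checks uniqueness only when the Project is stored, so this makes a collision unlikely but not impossible.

```python
import uuid
from datetime import date


def suggest_project_name(username, topic="Cell Line Analysis"):
    """Build a project name that is unlikely to clash with existing ones.

    Hypothetical helper: combines a topic, the username, today's date and
    eight random hex characters.
    """
    suffix = uuid.uuid4().hex[:8]
    return f"{topic} {username} {date.today().isoformat()} {suffix}"
```

Calling `suggest_project_name("jdoe")` returns a string like "Cell Line Analysis jdoe" followed by today's date and a random suffix. The output differs on every call, so save the name you use.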
Each Project created in Synapse is assigned a unique identifier, its Synapse ID, with the format syn12345678. Your newly created Project will also be assigned a Synapse ID.
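Scripts often take Synapse IDs as input, so a quick format check can catch typos before any request is sent. The sketch below assumes the syn12345678 format described above: a lower-case "syn" followed by digits. It checks only the shape of the string and cannot tell whether the ID exists or whether you have access to it. Both function names are hypothetical.

```python
import re

# Assumed format, taken from the example above: "syn" followed by digits.
_SYNAPSE_ID = re.compile(r"syn[0-9]+")


def is_synapse_id(value):
    """Return True if value has the shape of a Synapse ID such as syn12345678."""
    return _SYNAPSE_ID.fullmatch(value) is not None


def synapse_id_number(value):
    """Return the numeric part of a Synapse ID, e.g. 12345678 for syn12345678."""
    if not is_synapse_id(value):
        raise ValueError(f"not a Synapse ID: {value!r}")
    return int(value[3:])
```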
You can view what you have created in Synapse on the web with:
The Wiki tab in a Project provides a space for you to build narrative content describing your research. Wikis can be nested as subpages to build up a hierarchy of content within your Project, and they can also be attached to specific Folders in your Project. Examples of content you may want to include are project descriptions, specific aims, progress updates on data generation or analysis, analysis results (either in prose or via markdown-based notebooks such as knitr or IPython notebooks), and web-accessible, publication-like summaries of your research.
Wiki pages can contain highly customized content including, but not limited to, images, tables, code blocks, LaTeX-formatted equations, and scholarly references. Synapse-specific widgets also allow users to embed dynamic content based on other resources stored in Synapse (e.g., Entity List, User/Team badge, Query Table, or Provenance Graph).
See the Wiki user guide for more information and examples.
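A nested Wiki behaves like a tree: each page can have any number of subpages. The sketch below models only that nesting, using a hypothetical WikiPage class that is not the Synapse clients' own Wiki type. It prints the resulting table of contents for a small example page hierarchy whose titles are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WikiPage:
    """Hypothetical in-memory model of a Wiki page and its subpages."""
    title: str
    markdown: str = ""
    subpages: List["WikiPage"] = field(default_factory=list)

    def add_subpage(self, page):
        """Nest page under this one and return it so it can be nested further."""
        self.subpages.append(page)
        return page


def outline(page, depth=0):
    """List titles depth-first, indenting each level of subpages by two spaces."""
    lines = ["  " * depth + page.title]
    for child in page.subpages:
        lines.extend(outline(child, depth + 1))
    return lines


root = WikiPage("Cell line analysis", markdown="Project description goes here.")
aims = root.add_subpage(WikiPage("Specific aims"))
root.add_subpage(WikiPage("Progress updates"))
aims.add_subpage(WikiPage("Aim 1"))
print("\n".join(outline(root)))
```

The printed outline lists "Cell line analysis" first, then "Specific aims" with "Aim 1" indented beneath it, and then "Progress updates".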
Here we will create a small Wiki for your Project.
The Files tab houses a remote file system that you can use to share your project’s data, code, results, and any other information pertinent to your research. Unlike the file system on your local computer, Synapse Folders are identified by a unique identifier, are versioned, and can be linked to one another using the Synapse Provenance services. Like all Synapse content, these Folders can be accessed either through the web or through one of the programmatic clients using their unique Synapse ID.
Folders are used just as folders are on a local file system: to organize and segment content. Folders can also contain (or be parents of) any number of other Folders and Files.
To add a Folder to your Project:
Synapse Files are also much like files on a local file system, except that they are web-accessible to anyone who has access, can be richly annotated (and queried on), can be embedded as links or images within a Synapse Wiki, and can be associated with a DOI.
Files carry the Conditions for Use of the Folder they are placed in, in addition to any specific Conditions for Use set on the File itself.
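The rule above is a set union: a File's effective Conditions for Use are its own plus those of its Folder. The sketch below uses a hypothetical Entity class, not a Synapse client type. It walks up through every enclosing Folder, on the assumption that nested Folders pass their conditions down the same way. The condition labels are invented placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Entity:
    """Hypothetical stand-in for a Synapse Folder or File."""
    name: str
    parent: Optional["Entity"] = None
    conditions: Set[str] = field(default_factory=set)


def effective_conditions(entity):
    """Collect an entity's own Conditions for Use plus those of all its ancestors.

    Assumption: conditions accumulate through nested Folders in the same way
    they pass from a Folder to the Files placed in it.
    """
    result = set()
    while entity is not None:
        result |= entity.conditions
        entity = entity.parent
    return result


folder = Entity("raw_data", conditions={"no re-identification"})
csv = Entity("cell_lines_raw_data.csv", parent=folder, conditions={"cite the source study"})
print(sorted(effective_conditions(csv)))
```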
Let’s upload a local file, data/cell_lines_raw_data.csv, into this newly created Folder. To follow along, you can pick any file you have and replace the name with your chosen file. We will also attach some annotations describing the content of the file: we will associate the key foo with the value bar, along with two numerical annotations.
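Annotations are key/value pairs attached to a File, and Synapse can query on them. The plain-Python sketch below only mimics that matching locally; it does not call Synapse. The pair foo = bar comes from the text above. The two numerical annotation names and values, the second file, and the find_files helper are all hypothetical.

```python
# "foo": "bar" comes from the text; the two numerical annotations are
# hypothetical placeholders.
annotations = {"foo": "bar", "n_cell_lines": 12, "mean_viability": 0.87}

# A hypothetical local catalogue of (file name, annotations) pairs.
files = [
    ("cell_lines_raw_data.csv", annotations),
    ("other_results.csv", {"foo": "baz", "n_cell_lines": 4}),
]


def find_files(records, **criteria):
    """Return names of files whose annotations match every key=value given."""
    return [
        name
        for name, ann in records
        if all(ann.get(key) == value for key, value in criteria.items())
    ]


print(find_files(files, foo="bar"))  # ['cell_lines_raw_data.csv']
```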
Access to Folders is controlled by the Sharing setting that you select for your Project. You may also set individual Sharing settings for specific Folders within a Project.
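This amounts to "inherit unless overridden": a Folder without its own Sharing settings uses those of the nearest enclosing container that has them. The sketch below uses a hypothetical Container class. It assumes that a Folder's own settings replace the Project's settings rather than merging with them. The principal names and permission labels are invented.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Container:
    """Hypothetical Project or Folder with optional Sharing settings of its own."""
    name: str
    parent: Optional["Container"] = None
    # None means "no local setting": inherit from the enclosing container.
    sharing: Optional[Dict[str, str]] = None


def effective_sharing(item):
    """Return the Sharing settings of the nearest container that defines them."""
    while item is not None:
        if item.sharing is not None:
            return item.sharing
        item = item.parent
    return {}


project = Container("Cell Line Analysis", sharing={"me": "administrator"})
inherited = Container("raw_data", parent=project)
overridden = Container(
    "shared_with_collaborator",
    parent=project,
    sharing={"me": "administrator", "collaborator": "view"},
)
print(effective_sharing(inherited))   # {'me': 'administrator'}
print(effective_sharing(overridden))  # {'me': 'administrator', 'collaborator': 'view'}
```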
Synapse provides advanced capabilities for formally tracking the relationships between digital assets (e.g. data, code, analytical results) through the Synapse provenance system. This aids in disseminating data and results in ways that other users can reproduce and reuse. Provenance lets you track your analysis history and communicate and share a sequence of processing steps. Provenance relationships can, for example, be specified between the raw data, analysis code, and results of a complex processing pipeline, regardless of where the pipeline is run. The Synapse web services for managing provenance expose a general data model based on the W3C PROV specification. A central design principle is that users are not required to use a particular execution environment or workflow tool. Instead, provenance can be recorded by inserting calls to the Synapse web service layer into existing workflows, whether pipelines are built through simple scripting or with workflow execution engines. The provenance system allows users to branch off from any point in a prior analysis, while maintaining detailed records of the data, code, and environment versions needed to reproduce the work.
Provenance is most easily specified when you upload or edit a file in Synapse. To specify provenance, you list the files used as input and any files that were executed to generate the File. Both used and executed can be either an arbitrary URL (such as a reference to a code commit on GitHub or a file stored on an FTP site) or a reference to an item in Synapse. Here we demonstrate how to add a figure to Synapse and indicate that code from https://github.com/Sage-Bionetworks/synapseTutorials was used to generate the figure from the data in the file data/cell_lines_raw_data.csv that we uploaded previously:
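A single provenance step records which inputs were used and which code was executed, and each reference is either a Synapse item or an external URL. In the Python client, the used and executed arguments of syn.store play this role. The offline sketch below does not call Synapse. It uses a hypothetical Activity class, loosely modeled on W3C PROV, to show the shape of the record. The ID syn12345678 is a placeholder standing in for the uploaded CSV file, not its real ID.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

_SYNAPSE_ID = re.compile(r"syn[0-9]+")


def classify(reference):
    """Label a provenance reference as a Synapse item or an external URL."""
    if _SYNAPSE_ID.fullmatch(reference):
        return "synapse"
    if reference.startswith(("http://", "https://", "ftp://")):
        return "url"
    raise ValueError(f"unrecognized reference: {reference!r}")


@dataclass
class Activity:
    """Hypothetical record of one processing step, loosely modeled on W3C PROV."""
    name: str
    used: List[str] = field(default_factory=list)
    executed: List[str] = field(default_factory=list)

    def edges(self) -> List[Tuple[str, str, str]]:
        """Return (role, reference, kind) triples for every input and program."""
        return [
            (role, ref, classify(ref))
            for role, refs in (("used", self.used), ("executed", self.executed))
            for ref in refs
        ]


figure_step = Activity(
    name="Plot cell line figure",
    # Placeholder ID standing in for data/cell_lines_raw_data.csv in Synapse.
    used=["syn12345678"],
    executed=["https://github.com/Sage-Bionetworks/synapseTutorials"],
)
for edge in figure_step.edges():
    print(edge)
```

Keeping used and executed as separate lists mirrors the distinction the text draws between input data and the code that processed it.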
Find additional information and tutorials through our User Guide.
Try posting a question to our Forum.
Reader feedback is key to making the documentation better, so please tell us what was unclear or not covered, or open an issue in our GitHub repository (Sage-Bionetworks/synapseDocs).