Synapse Python Client documentation

Overview

The synapseclient package provides an interface to Synapse, a collaborative workspace for reproducible, data-intensive research projects. Synapse provides support for:

  • integrated presentation of data, code and text
  • fine-grained access control
  • provenance tracking

The synapseclient package lets you communicate with the cloud-hosted Synapse service to access data and create shared data analysis projects from within Python scripts or at the interactive Python console. Other Synapse clients exist for R, Java, and the web. The Python client can also be used from the command line.

If you’re just getting started with Synapse, have a look at the Getting Started guides for Synapse and the Python client.

Good example projects are:

Installation

The synapseclient package is available from PyPI. It can be installed or upgraded with pip:

(sudo) pip install (--upgrade) synapseclient[pandas,pysftp]

The dependencies on pandas and pysftp are optional. The synapseclient.table feature integrates with pandas. Support for SFTP is required for users of SFTP file storage. Both dependencies rely on native libraries, which must either be compiled or installed separately from prebuilt binaries.

Source code and development versions are available on GitHub. Installing from source:

git clone https://github.com/Sage-Bionetworks/synapsePythonClient.git
cd synapsePythonClient

You can stay on the master branch to get the latest stable release or check out the develop branch or a tagged revision:

git checkout <branch or tag>

Next, either install the package into the site-packages directory with python setup.py install, or run python setup.py develop so that the installation follows the head of your checkout without having to reinstall:

python setup.py <install or develop>
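
After either installation route, it can help to confirm which version of the package Python actually sees. The following is a minimal sketch using only the standard library (Python 3.8 or later); the helper name installed_version is an illustration and not part of synapseclient:

```python
from importlib import metadata


def installed_version(package="synapseclient"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "synapseclient is not installed")
```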

Connecting to Synapse

To use Synapse, you’ll need to register for an account. The Synapse website can authenticate using a Google account, but you’ll need to take the extra step of creating a Synapse password to use the programmatic clients.

Once that’s done, you’ll be able to load the library, create a Synapse object and log in:

import synapseclient
syn = synapseclient.Synapse()

syn.login('me@nowhere.com', 'secret')
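
To keep credentials out of scripts, the same syn.login(email, password) call can be fed from environment variables. This is a sketch under stated assumptions: the variable names SYNAPSE_EMAIL and SYNAPSE_PASSWORD and the helper login_from_env are hypothetical, chosen for this example, and are not defined by the client:

```python
import os


def login_from_env(syn, environ=os.environ):
    """Log in with credentials read from environment variables.

    SYNAPSE_EMAIL and SYNAPSE_PASSWORD are hypothetical names used only
    in this sketch; the client itself does not define them.
    """
    email = environ.get("SYNAPSE_EMAIL")
    password = environ.get("SYNAPSE_PASSWORD")
    if not email or not password:
        raise RuntimeError("Set SYNAPSE_EMAIL and SYNAPSE_PASSWORD first")
    # Same call as in the example above, with the values supplied externally
    syn.login(email, password)
    return syn
```

Typical use would be syn = login_from_env(synapseclient.Synapse()).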

For more information, see:

Imports

Several components of the synapseclient can be imported as needed:

from synapseclient import Activity
from synapseclient import Entity, Project, Folder, File, Link
from synapseclient import Evaluation, Submission, SubmissionStatus
from synapseclient import Wiki

Accessing Data

Synapse identifiers are used to refer to projects and data, which are represented by synapseclient.entity objects. For example, the entity syn1899498 represents a tab-delimited file containing a 100 by 4 matrix. Getting the entity retrieves an object that holds metadata describing the matrix, and also downloads the file to a local cache:

entity = syn.get('syn1899498')

View the entity’s metadata in the Python console:

print(entity)

This is one simple way to read in a small matrix:

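rows = []
with open(entity.path) as f:
    # First line holds the column names; drop the line ending before splitting
    header = f.readline().rstrip('\n').split('\t')
    for line in f:
        # float() ignores surrounding whitespace, including the line ending
        row = [float(x) for x in line.split('\t')]
        rows.append(row)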
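
The same parsing can be sketched with the standard csv module, which handles line endings itself and makes it easy to skip blank lines. The helper name read_matrix is an illustration; it assumes one header row followed by purely numeric rows, as in the matrix described above:

```python
import csv


def read_matrix(path):
    """Read a tab-delimited file with one header row into (header, rows)."""
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        header = next(reader)
        # Skip empty records so a trailing blank line does not raise an error
        rows = [[float(x) for x in record] for record in reader if record]
    return header, rows
```

With the entity retrieved earlier, this would be called as header, rows = read_matrix(entity.path).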

View the entity in the browser:

syn.onweb('syn1899498')

Organizing data in a Project

You can create your own projects and upload your own data sets. Synapse stores entities in a hierarchical or tree structure. Projects are at the top level and must be uniquely named:

import synapseclient
from synapseclient import Project, Folder, File, Link

project = Project('My uniquely named project')
project = syn.store(project)

Creating a folder:

data_folder = Folder('Data', parent=project)
data_folder = syn.store(data_folder)

Adding files to the project:

test_entity = File('/path/to/data/file.xyz', description='Fancy new data', parent=data_folder)
test_entity = syn.store(test_entity)
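
Because each stored entity can serve as the parent of the next, a nested folder hierarchy can be built in a loop. The sketch below repeats the Folder(name, parent=...) and syn.store(...) calls shown above for each component of a slash-separated path. The helper store_folder_path and its make_folder parameter are hypothetical; the parameter exists only so the function can be exercised without a Synapse connection:

```python
def store_folder_path(syn, parent, path, make_folder=None):
    """Create one folder per component of `path` beneath `parent`.

    Returns the innermost stored folder. `make_folder` defaults to
    synapseclient.Folder (imported lazily).
    """
    if make_folder is None:
        from synapseclient import Folder as make_folder
    # filter(None, ...) drops empty components from leading or doubled slashes
    for name in filter(None, path.split("/")):
        parent = syn.store(make_folder(name, parent=parent))
    return parent
```

For example, raw_folder = store_folder_path(syn, project, 'Data/raw') would leave raw_folder usable as the parent of new File objects.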

In addition to simple data storage, Synapse entities can be annotated with key/value metadata, described in markdown documents (wikis), and linked together in provenance graphs to create a reproducible record of a data analysis pipeline.

See also:

Annotating Synapse entities

Annotations are arbitrary metadata attached to Synapse entities, for example:

test_entity.genome_assembly = "hg19"
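
An annotation assigned this way lives only on the local object until the entity is stored again. A minimal sketch that sets several annotations and saves them in one step follows; the helper name annotate_and_store is hypothetical:

```python
def annotate_and_store(syn, entity, **annotations):
    """Set each keyword as an annotation on `entity`, then store it."""
    for key, value in annotations.items():
        # Equivalent to writing entity.<key> = value by hand
        setattr(entity, key, value)
    return syn.store(entity)
```

For example: test_entity = annotate_and_store(syn, test_entity, genome_assembly="hg19").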

See:

Provenance

Synapse provides tools for tracking ‘provenance’, or the transformation of raw data into processed results, by linking derived data objects to source data and the code used to perform the transformation.
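
As a rough illustration, provenance can be recorded at the moment a derived entity is stored. This sketch assumes the used and executed keyword arguments of syn.store; check the provenance reference for the client version you have installed. The helper name store_with_provenance is hypothetical:

```python
def store_with_provenance(syn, entity, used, executed):
    """Store `entity`, recording its source data and the code that made it.

    `used` lists the inputs (for example Synapse IDs) and `executed` lists
    the code (for example URLs of scripts).
    """
    return syn.store(entity, used=list(used), executed=list(executed))
```

A call might look like store_with_provenance(syn, result_file, used=['syn1899498'], executed=['https://example.org/analysis.py']), where result_file and the URL are placeholders.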

See:

Tables

Tables can be built up by adding sets of rows that follow a user-defined schema and queried using a SQL-like syntax.
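
A query against an existing table might be sketched as follows. This assumes syn.tableQuery accepts a SQL-like string and returns an object that can be iterated row by row; the helper name table_rows is hypothetical, and the where argument is inserted into the query verbatim, so it should never come from untrusted input:

```python
def table_rows(syn, table_id, where=None):
    """Return all rows of the table `table_id`, optionally filtered."""
    sql = "select * from %s" % table_id
    if where:
        sql += " where " + where
    # Materialize the iterable result into a plain list of rows
    return list(syn.tableQuery(sql))
```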

See:

Wikis

Wiki pages can be attached to a Synapse entity (e.g. a project, folder, or file). Text and graphics can be composed in markdown and rendered in the web view of the object.
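
Attaching a page could look like the sketch below. It assumes the Wiki class accepts title, owner and markdown keyword arguments; the helper attach_wiki and its make_wiki parameter are hypothetical, the latter present only so the function can be tried without a Synapse connection:

```python
def attach_wiki(syn, owner, title, markdown, make_wiki=None):
    """Create a wiki page on `owner` from a markdown string and store it."""
    if make_wiki is None:
        from synapseclient import Wiki as make_wiki
    wiki = make_wiki(title=title, owner=owner, markdown=markdown)
    return syn.store(wiki)
```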

See:

Evaluations

An evaluation is a Synapse construct useful for building processing pipelines and for scoring predictive modelling and data analysis challenges.

See:

Querying

Synapse supports a SQL-like query language:

results = syn.query('SELECT id, name FROM entity WHERE parentId=="syn1899495"')

for result in results['results']:
    print(result['entity.id'], result['entity.name'])
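
When the parent ID comes from user input, it is worth validating it before building the query string. The sketch below wraps the query above in a hypothetical helper, list_children, and rejects anything that does not look like a Synapse ID ("syn" followed by digits):

```python
import re


def list_children(syn, parent_id):
    """Return (id, name) pairs for the entities directly under `parent_id`."""
    if not re.fullmatch(r"syn\d+", parent_id):
        raise ValueError("not a Synapse ID: %r" % (parent_id,))
    results = syn.query(
        'SELECT id, name FROM entity WHERE parentId=="%s"' % parent_id)
    return [(r['entity.id'], r['entity.name']) for r in results['results']]
```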

Finding projects owned by the current user:

profile = syn.getUserProfile()
results = syn.query('SELECT id, name FROM project WHERE project.createdByPrincipalId==%s' % profile['ownerId'])

for result in results['results']:
    print(result['project.id'], result['project.name'])

See:

Access control

By default, data sets in Synapse are private to your user account, but they can easily be shared with specific users, groups, or the public.
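
Assuming you already know the principal ID of the user or group, sharing might be sketched with syn.setPermissions. The helper grant_access is hypothetical, and the 'READ' access type is an assumption; check which access types your Synapse version expects:

```python
def grant_access(syn, entity, principal_id, access_types=("READ",)):
    """Grant `access_types` on `entity` to the user or group `principal_id`."""
    return syn.setPermissions(
        entity, principalId=principal_id, accessType=list(access_types))
```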

TODO: finish this once there is a reasonable way to find principalIds.

See:

Accessing the API directly

These methods enable access to the Synapse REST(ish) API, taking care of details like endpoints and authentication. See the REST API documentation.
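
For instance, the raw JSON for an entity could be fetched as in this sketch, which assumes the client's restGET method and the /entity/{id} REST path; the helper name entity_json is hypothetical:

```python
def entity_json(syn, entity_id):
    """Fetch the raw REST representation of an entity as a dictionary."""
    # The client supplies the endpoint and authentication headers
    return syn.restGET('/entity/%s' % entity_id)
```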

See:

Synapse utilities

There is a companion module called synapseutils that provides higher-level functionality such as recursive copying of content, syncing with Synapse and additional query functionality.
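
As an illustration of syncing, a whole project or folder might be downloaded as sketched below. This assumes synapseutils.syncFromSynapse takes the client, an entity or ID, and a path keyword argument; the helper download_tree and its sync parameter are hypothetical, the latter present only so the function can be tried offline:

```python
def download_tree(syn, entity_id, dest_dir, sync=None):
    """Recursively download everything under `entity_id` into `dest_dir`."""
    if sync is None:
        from synapseutils import syncFromSynapse as sync
    return sync(syn, entity_id, path=dest_dir)
```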

See: synapseutils

More information

For more information see the Synapse User Guide. These API docs are browsable online at http://docs.synapse.org/python/.

Getting updates

To get information about new versions of the client, including development versions, see synapseclient.check_for_updates() and synapseclient.release_notes().