Tables

Tables

Synapse Tables is a feature designed to provide users the ability to create web-accessible, sharable, and queryable data where columns can have a user-specified structured schema. Users may define table columns to contain common primitive data types (text, numbers, dates, etc.) or references to other Synapse objects (e.g., Files). Therefore, it is possible to organize sample metadata into tables that include columns for desired sample fields, as well as columns that link the sample information to a heterogeneous collection of files. Tables may be queried and edited through both the Synapse web portal as well as through our analytical clients that enable direct access to these data from analysis pipeline code. Unlike most NoSQL systems, the data in Synapse Tables is strongly consistent, not eventually consistent. This is an important design consideration for scientific data processing, as analysis on eventually-consistent data sources can limit the types of analysis performed, and may require special coding strategies to ensure reasonable accuracy.

Table Schema

Synapse Tables contain metadata that specifies the types of columns included in the Table. These columns can be specified manually, or Synapse can recommend column types when a user uploads a data file.

Table Data

The data contained within a Synapse Table can be retrieved by using a SQL-like query language either through the web portal or through the analytical clients. See the API docs for an enumeration of the types of queries that can be performed. Here are a couple of simple examples.

To get all of the columns from a Table with id syn3079449, the following query would be used:

SELECT * FROM syn3079449

To get only the two columns called “age” and “gender”:

SELECT age, gender FROM syn3079449

To count the number of rows:

SELECT count(*) FROM syn3079449

To get all columns, but only rows where age is greater that 50:

SELECT * FROM syn3079449 WHERE age > 50

To get all columns, but only rows where age is greater that 50 - and sort by treatmentArm:

SELECT * FROM syn3079449 WHERE age > 50 ORDER BY "treatmentArm" ASC

Using Tables

A Schema is defined in terms of Column objects that specify types from the following choices:

<columnType>
STRING
DOUBLE
INTEGER
BOOLEAN
DATE
ENTITYID
FILEHANDLEID


For the following code examples, we’ll be using the Wondrous Research Project (syn1901847). The .csv file used in the example below can be downloaded here.

Creating a Table

1. Start with a dataframe, .csv, or .tsv:

import synapseclient
syn = synapseclient.login()

# Convert a .csv to a Pandas DataFrame (this requires that you have Pandas installed)
import pandas as pd
df = pd.read_csv('path/to/jazzAlbums.csv', index_col=False)


2. Create table schema: Each client has a utility function to create Columns from a data frame. Python uses synapseclient.as_table_columns and R uses as.tableColumns.

project = syn.get('syn1901847')
cols = synapseclient.as_table_columns(df)

schema = synapseclient.Schema(name='Jazz Albums', columns=cols, parent=project)


The Table() function takes two arguments, a schema object and data in some form, which can be:

  • a path to a CSV file
  • a dataframe
  • a RowSet object
  • a list of lists where each of the inner lists is a row


3. Store table in Synapse:

table = synapseclient.Table(schema, df)
table = syn.store(table)


4. Query the table: Python returns an iterator and R returns a data frame by default. To return a data frame in Python use Pandas results.asDataFrame().

results = syn.tableQuery('select * from syn7264701')
df = results.asDataFrame()


Making changes to tables

Once the schema is settled, changes can be made by adding, appending, and deleting.

When updating, begin by querying the table to ensure you have the latest schema and values.

Updating existing values

# Change the album value 'Vol. 2' to 'Volume 2' 
schema = syn.get(table.schema.id)
query = syn.tableQuery("select * from %s where album='Vol. 2'" %table.schema.id)
df = query.asDataFrame()
df['album'] = 'Volume 2'
syn.store(Table(schema, df))

# Alternatively, this can be done as:
query = syn.tableQuery("select * from syn7264701 where album='Vol. 2'")
df = query.asDataFrame()
df['album'] = 'Volume 2'
syn.store(Table(results.tableId, df))


Changing Columns

synapse logo Note: To be compatible across multiple languages the common practice of using dots (.) in column names in R is not supported in Synapse Tables.

Adding new columns

#Define a new column
new_column = syn.store(synapseclient.Column(name='purchased', columnType='STRING'))
#Add the new column to existing schema
schema.addColumn(new_column)
schema = syn.store(schema)
# Query for the newest schema
results = syn.tableQuery('select * from syn7264701')
df = results.asDataFrame()
# Add values into the new column
df['purchased'] = ['yes', 'yes', 'no', 'yes']
# Store the new table
syn.store(Table(schema, df))


Deleting columns

cols = syn.getTableColumns(schema)
for col in cols:
    if col.name == 'purchased':
        schema.removeColumn(col)
schema = syn.store(schema)


Modifying existing columns

# Renaming or otherwise modifying a column involves removing the column and adding a new column
cols = syn.getTableColumns(schema)
for col in cols:
    if col.name == 'purchased':
        schema.removeColumn(col)
new_column2 = syn.store(synapseclient.Column(name='sold', columnType='STRING'))
schema.addColumn(new_column2)
schema = syn.store(schema)


Changing Rows

Adding new rows

new_rows = [['Charles Mingus', 'Blues & Roots', 1960, 'SD 1305'],
            ['Eugen Cicero', 'Rokoko-Jazz', 1965, 'SB 15027']]
table = syn.store(synapseclient.Table(schema, new_rows))


Deleting rows

# Query for the rows you want to delete and call syn.delete on the results:
rowsToDelete = syn.tableQuery("select * from %s where artist='Sonny Rollins'" %table.schema.id)
a = syn.delete(rowsToDelete.asRowSet())
#or if you have already converted your query to data frame you can use:
schema = syn.get(table.schema.id)
tabledf = rowsToDelete.asDataFrame()
a = syn.delete(synapseclient.Table(schema,tabledf))


Modifying existing rows

results = syn.tableQuery("select * from %s where artist='Sonny Rollins'" %table.schema.id)
df = results.asDataFrame()
df['purchased'] = ['yes', 'yes']
table = syn.store(synapseclient.Table(schema, df))


Deleting the whole table

Delete the entire table

#Deleting the schema deletes the whole table and all rows
syn.delete(schema)


Working with Files in a Table

Table attached files

Synapse Tables support a special column type called File which contain a file handle, an identifier of a file stored in Synapse. Here’s an example of how to upload files into Synapse, associate them with a table and read them back later.

1. Add a new column for files in the table we’re currently working with

# Add a column for files
col = syn.store(Column(name='covers', columnType='FILEHANDLEID'))
schema.addColumn(col)
schema = syn.store(schema)


2. Retrieve the most current table and save as a data frame

# retrieve the most current table
results = syn.tableQuery('select * from syn7264701')
df = results.asDataFrame()


3. Upload the files: The files used in this example can be downloaded here. It is assumed that the files are in your current working directory.

## the actual data
files = ['./coltraneBlueTrain.jpg', './rollinsBN1558.jpg', 
		'./rollinsBN4001.jpg','./burrellWarholBN1543.jpg']

# Upload to filehandle service
files = [syn._uploadToFileHandleService(f) for f in files]

# get the filehandle ids
fileHandleIds = [i['id'] for i in files]

# assign the filehandle ids to the new column
df['covers'] = fileHandleIds

# store the new column
syn.store(Table(schema, df))


4. Query the table and download the album cover files

results = syn.tableQuery("select covers from %s where artist = 'Sonny Rollins'" % schema.id)
cover_files = syn.downloadTableColumns(results, ['cover'])

Table Facets

The faceted navigation on Tables (also known as simple search) can be used to simplify your search without having to use SQL-like queries. Simple search uses radio buttons and sliders to show all available facets in a menu to the left of the Table whereas advanced search employs a SQL-like query to filter the Table. To use table facets, navigate to a Table or a File View. For this example, we will be using the Synapse Table Demo found in the Wondrous Research Example. Simple and advanced search both allow you to query for features of interest in aTable using different methods.

Set facets

In order to use simple search, you must first set columns to be facets in the schema editor. Select Schema in the upper right of your table and click on Edit Schema. In the resulting pop-up, select Values or Range from the dropdowns under the Facet option. Values can be thought of as categories whereas Range is a date or number.

synapse logo Note: If you change Column Type in the schema, you have to set its facet selection again.

To see all the facets, click on Show simple search found above the SQL-query bar:

Select the features you are interested in to filter the table.

You can toggle from the simple search to the advanced search without losing the query results. For example, if in the simple search you had selected treatmentArm A, age of 23:64, and gender as female, the query will be preserved in the advanced search bar. However, this is unidirectional because the advance search allows parameters that are not available with facets. Therefore switching from advanced to simple search will result in resetting the search query.

synapse logo Note: The slider for range in simple search is inclusive.


synapse logo Warning: When toggling back to simple search, the query will be reset.

More on tables

There are additional docs available for Tables in both Python and R that cover more advanced topics.

For Python check out our Python Docs.

For R, open up your R session and check out our vignettes by typing vignette("tables", package="synapseClient") into your console.


See Also

Annotations and Queries, Downloading Data, Files and Versioning

Need More Help!

Try posting a question to our Forum.

Let us know what was unclear or what has not been covered. Reader feedback is key to making the documentation better, so please let us know or open an issue in our Github repository (Sage-Bionetworks/synapseDocs).

2016 Sage Bionetworks Contact us Creative Commons License