Annotations in Synapse are semi-structured metadata that can be added to Projects, Files, Folders, and Tables. Annotations can be based on an existing ontology (a controlled vocabulary such as the Gene Ontology (GO)), an agreed-upon set of terms (e.g., describing the results of a sequencing pipeline), or be completely free-form (like a tag system). These annotations can be used to systematically describe groups of entities, which provides a way to search and discover Synapse entities.
Synapse annotations are structured as key-value pairs: the key is the name of the annotation group, while the value provides specifics. For example, suppose you have a collection of alignment files in the BAM file format from an RNA sequencing experiment, each representing a sample and replicate.
As is common, much of this information may be encoded in the file name (e.g.,
However, this makes it difficult to find specific groups of files, such as all replicates of
Adding this information as Synapse annotations enables a more complete description of the contents of the File and allows for discovery.
Continuing this example, the annotations you may want to add are:
assay = RNA-seq
fileType = bam
sample = 1
condition = A
All files you want to be able to search for should have a consistent set of annotations (i.e., they should all have the same keys:
condition.) If these samples were part of a dataset with multiple assays, such as RNA-seq and ATAC-seq, you would annotate the file entities with either assay = RNA-seq or assay = ATAC-seq. Searching for a specific assay would therefore return only the assay-specific files.
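As a rough illustration of the consistency advice above, the following Python sketch checks that every file carries the same annotation keys before upload. It does not call Synapse at all. The file names, the second file's values, and the helper name missing_keys are hypothetical and are used only for this example.

```python
from typing import Dict, List, Set

# Keys taken from the example annotations above.
REQUIRED_KEYS: Set[str] = {"assay", "fileType", "sample", "condition"}

# Hypothetical file names and annotations, for illustration only.
files: Dict[str, Dict[str, str]] = {
    "example_rep1.bam": {
        "assay": "RNA-seq", "fileType": "bam", "sample": "1", "condition": "A",
    },
    "example_rep2.bam": {
        "assay": "RNA-seq", "fileType": "bam", "sample": "2",  # no "condition"
    },
}


def missing_keys(annotations_by_file: Dict[str, Dict[str, str]],
                 required: Set[str] = REQUIRED_KEYS) -> Dict[str, List[str]]:
    """Return the sorted missing keys for each file that lacks a required key."""
    problems: Dict[str, List[str]] = {}
    for name, annotations in annotations_by_file.items():
        missing = sorted(required - set(annotations))
        if missing:
            problems[name] = missing
    return problems


print(missing_keys(files))
```

Running a check like this before annotating in bulk makes it easier to catch files that would otherwise be invisible to a query on the missing key.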
Annotations can be one of four basic types
It is easiest to add annotations when initially uploading a file. This can be done using the command line client, the Python client, the R client, or from the Web. Using the analytical clients facilitates batch and automated population of annotations across many files. The Web client is useful when uploading a single file, or if a minor change needs to be made to annotations on a few files.
If you have not yet decided which annotations to add, you can add them at a later time, and you can modify annotations that have already been uploaded.
Queries in Synapse look SQL-like:
SELECT * FROM <data type> WHERE <expression>
where currently supported
<data type> are:
If you know the entity type you are looking for, searching in
Folder is what you want.
To search over annotations of all entity types, use entity.
The <expression> section contains the conditions for limiting the search. The examples below demonstrate some ways of limiting searches.
For complete information on how to form queries and the types of limiting that can be performed, see the Synapse Query API.
Along with annotations added by users, every entity has a number of fields useful for searching. For a complete list, see here.
Some of the most useful are:
projectId: which project the entity is associated with
parentId: where the entity resides (inside a folder/sub-folder; may also be the project if in the root folder) - useful for finding all files if you know they are in a specific folder
To find all files in a specific
Project, you need to know the
projectId, which is a Synapse identifier (looks like
For example, the TCGA Pan-Cancer Project has a projectId of syn300013. So, the query to find all files and all annotations associated with this Project would be:
SELECT * FROM file WHERE projectId=="syn300013"
To list the Files and annotations in a specific Folder, you need to know the Synapse ID of the Folder (for example syn1524884, which has data from TCGA related to melanoma). All entities in this Folder will have a parentId of syn1524884. So, the query to find all Files and all annotations in this Folder would be:
SELECT * FROM file WHERE parentId=="syn1524884"
If you wanted to find all the sub-folders in this Folder, you would do:
SELECT * FROM folder WHERE parentId=="syn1524884"
If you wanted both Files and Folders, this would work:
SELECT * FROM entity WHERE parentId=="syn1524884"
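To make the difference between the file, folder, and entity data types concrete, here is a toy in-memory model in Python. It only mimics the filtering behaviour of the three queries above and says nothing about how Synapse implements them. Apart from syn1524884, every ID is a made-up placeholder, and the Entity class and children function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    id: str
    type: str       # "file" or "folder"
    parent_id: str  # Synapse ID of the containing folder or project


# Placeholder contents; only syn1524884 appears in the text above.
entities: List[Entity] = [
    Entity("synAAA", "file", "syn1524884"),
    Entity("synBBB", "folder", "syn1524884"),
    Entity("synCCC", "file", "synBBB"),  # sits inside the sub-folder
]


def children(items: List[Entity], parent_id: str,
             data_type: str = "entity") -> List[str]:
    """Mimic SELECT * FROM <data_type> WHERE parentId=="<parent_id>"."""
    return [
        e.id for e in items
        if e.parent_id == parent_id and data_type in ("entity", e.type)
    ]


print(children(entities, "syn1524884", "file"))
print(children(entities, "syn1524884", "folder"))
print(children(entities, "syn1524884", "entity"))
```

In this toy model, the file inside the sub-folder is not returned by any of the three calls, because its parentId is the sub-folder rather than syn1524884.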
Annotations are most useful for discovery of similar types of entities. Essentially, all annotations across Synapse are stored in a large table that can be queried to find entities with annotations matching your own criteria. While it can be useful to search for files that exist within a known project or folder, this requires that you know ahead of time where the files are.
If annotations have been diligently added to Files, they can be used to discover files of interest.
For example, you can identify all Files annotated as bam files (fileType = bam) uploaded to Synapse with the following query:
SELECT * FROM file WHERE fileType=="bam"
Likewise, if you had put the RNA-seq related files described in the section above into a project (for example syn12345) with the described annotations, then you could find all of the files for Condition A for sample 1:
SELECT * FROM file WHERE projectId=="syn12345" AND sample=="1" AND condition=="A"
Lastly, you can query on a subset of entities that have a specific annotation and limit which annotations are displayed by listing them after SELECT, as follows:
SELECT id,name,dataType,fileType FROM file WHERE projectId=="syn12345" AND sample=="1" AND condition=="A"
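When the same kind of query is needed for many samples or conditions, it can help to assemble the query string programmatically. The sketch below is plain Python string handling, not part of any Synapse client. The function name build_query is hypothetical, and it assumes annotation values contain no double quotes, since it performs no escaping.

```python
from typing import Dict, Optional, Sequence


def build_query(data_type: str,
                conditions: Dict[str, str],
                columns: Optional[Sequence[str]] = None) -> str:
    """Assemble a query string in the form used by the examples above.

    Assumption: values contain no double quotes (no escaping is done).
    """
    select = ",".join(columns) if columns else "*"
    query = f"SELECT {select} FROM {data_type}"
    # Join the key/value pairs with AND, in the order they were given.
    where = " AND ".join(f'{key}=="{value}"' for key, value in conditions.items())
    if where:
        query += f" WHERE {where}"
    return query


query = build_query(
    "file",
    {"projectId": "syn12345", "sample": "1", "condition": "A"},
    columns=["id", "name", "dataType", "fileType"],
)
print(query)
```

With these arguments the function reproduces the last query shown above. Omitting the columns argument falls back to SELECT *.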
Queries can be constructed using one of the analytical clients (command line, Python, or R). On the web client, query results can be displayed in a table on a wiki page. Using the example above, this can be done as follows:
You can download files in a folder using queries. Currently this feature is only available in the command line client. For example, if you want to download all files in a folder that has a Synapse ID of syn00123:
synapse get -q 'SELECT * FROM file WHERE parentId == "syn00123"'
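If the download needs to run for several folders from a script, one option is to build the command line shown above in Python and pass it to the command line client. This is only a sketch: the helper name download_command is hypothetical, and the call that actually runs the client is left commented out because it assumes the synapse command is installed and logged in.

```python
import shlex
from typing import List


def download_command(folder_id: str) -> List[str]:
    """Return the argument list for the 'synapse get -q' call shown above."""
    query = f'SELECT * FROM file WHERE parentId == "{folder_id}"'
    return ["synapse", "get", "-q", query]


args = download_command("syn00123")

# Show the command as it would be typed in a shell.
print(shlex.join(args))

# To actually run it (assumes the synapse CLI is installed and configured):
# import subprocess
# subprocess.run(args, check=True)
```

Passing the arguments as a list avoids having to quote the query for the shell by hand.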