Annotations in Synapse are semi-structured custom metadata that can be added to
Tables to further describe the object in question. Annotations can be based on an existing ontology or controlled vocabulary, or can be created in an ad hoc manner and modified later as the metadata evolves. Annotations can be a powerful tool used to systematically group and/or describe things (files or data, etc.) in Synapse, which then provides a way for those things to be searched for and discovered.
Annotations are structured as key-value pairs: the key is the name of the annotation while the value provides specifics. For example, if you have uploaded a collection of alignment files in the BAM file format from an RNA sequencing experiment, each representing a sample and experimental replicate, you can use annotations to add this information to each file in a structured way. Sometimes, users encode this information in file names, e.g.,
Sample1_ConditionA.bam, which makes it “human-readable” but difficult to search in a systematic way, such as finding all replicates of Sample1. Adding this information as Synapse annotations enables a more complete description of the contents of the file.
In this case, the annotations you may want to add might look like this:
assay = RNA-seq
fileType = bam
sample = 1
condition = A
For the second sample in the same experimental set with the same conditions, the annotations would have the same “keys” but slightly different “values”, such as:
assay = RNA-seq
fileType = bam
sample = 2 (note that we’ve changed the value here but the rest are the same)
condition = A
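When the sample and condition are already encoded in file names such as Sample1_ConditionA.bam, the annotations above can be derived programmatically before upload. The following is a minimal Python sketch, assuming file names follow a Sample<N>_Condition<X>.<ext> pattern. The function name annotations_from_filename and the regular expression are illustrative assumptions, not part of any Synapse client, and assay is supplied by the caller because it is not encoded in the name.

```python
import re

# Hypothetical pattern for names like "Sample1_ConditionA.bam".
FILENAME_PATTERN = re.compile(
    r"^Sample(?P<sample>\d+)_Condition(?P<condition>[A-Za-z0-9]+)\.(?P<fileType>\w+)$"
)


def annotations_from_filename(filename, assay):
    """Return a dict of annotation key-value pairs parsed from a file name."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized file name: {filename!r}")
    annotations = match.groupdict()
    annotations["assay"] = assay  # not encoded in the name, so passed in
    return annotations


for name in ["Sample1_ConditionA.bam", "Sample2_ConditionA.bam"]:
    print(name, annotations_from_filename(name, assay="RNA-seq"))
```

The resulting dictionaries have the same keys for both samples and differ only in the value of sample, mirroring the two annotation sets listed above.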
If these samples were part of a dataset with multiple assays, such as RNA-seq and ATAC-seq, you would annotate the file entities with either assay = RNA-seq or assay = ATAC-seq.
Annotations can be one of four basic types
You cannot have duplicate annotation keys on the same object in Synapse.
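Because keys must be unique per object, a set of annotations maps naturally onto a dictionary. The sketch below is a local illustration only: the helper build_annotations is hypothetical and does not call Synapse. It rejects a repeated key when annotations are assembled from a list of pairs, for instance rows read from a spreadsheet.

```python
def build_annotations(pairs):
    """Build an annotation dict from (key, value) pairs, rejecting repeated keys."""
    annotations = {}
    for key, value in pairs:
        if key in annotations:
            raise ValueError(f"Duplicate annotation key: {key!r}")
        annotations[key] = value
    return annotations


print(build_annotations([("assay", "RNA-seq"), ("fileType", "bam")]))

# A second "assay" key on the same object is rejected.
try:
    build_annotations([("assay", "RNA-seq"), ("assay", "ATAC-seq")])
except ValueError as error:
    print(error)
```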
Annotations may be added when initially uploading a file, or at a later date. This can be done using the command line client, the Python client, the R client, or from the Web. Using the analytical clients facilitates batch and automated population of annotations across many files. The Web client is useful when uploading a single file, or if a minor change needs to be made to annotations on a few files.
If you have not decided on the annotations to add yet, you can add and modify the annotations at a later time as well, and you can manipulate annotations that have already been uploaded.
Queries in Synapse look SQL-like:
SELECT * FROM <data type> WHERE <expression>
where <data type> is the kind of entity to search. The examples in this document use file, folder, and entity.
If you know the entity type you are looking for, searching in file or folder is what you want.
To search over annotations of all entity types, use entity.
The <expression> section contains the conditions for limiting the search. The examples below demonstrate some ways of limiting searches.
For complete information on how to form queries and the types of limiting that can be performed, see the Synapse Query API.
Along with annotations added by users, every entity has a number of fields useful for searching. For a complete list, see here.
Here are some really useful ones:
projectId: which project the entity is associated with
parentId: where the entity resides (inside a folder/sub-folder; may also be the project if in the root folder) - useful for finding all files if you know they are in a specific folder
To find all files in a specific Project, you need to know the projectId, which is a Synapse identifier. For example, the TCGA Pan-Cancer Project has a projectId of syn300013.
So, the query to find all files and all annotations associated with this Project would be:
SELECT * FROM file WHERE projectId=="syn300013"
To list the Files and annotations in a specific Folder, you need to know the Synapse ID of the Folder (for example syn1524884, which has data from TCGA related to melanoma). All entities in this Folder will have a parentId of syn1524884.
So, the query to find all Files and all annotations in this Folder would be:
SELECT * FROM file WHERE parentId=="syn1524884"
If you wanted to find all the sub-folders in this Folder, you would do:
SELECT * FROM folder WHERE parentId=="syn1524884"
If you wanted both Files and Folders, this would work:
SELECT * FROM entity WHERE parentId=="syn1524884"
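The difference between the file, folder, and entity data types in the three queries above can be pictured with a small in-memory model. This is a sketch for intuition only: the select helper, the record layout, and the child IDs (syn1 through syn4) are made up for illustration and do not reflect how Synapse stores or evaluates queries.

```python
# Hypothetical records standing in for the contents of a folder.
ENTITIES = [
    {"id": "syn1", "type": "file", "parentId": "syn1524884"},
    {"id": "syn2", "type": "file", "parentId": "syn1524884"},
    {"id": "syn3", "type": "folder", "parentId": "syn1524884"},
    # This file sits inside the sub-folder syn3, so its parentId differs.
    {"id": "syn4", "type": "file", "parentId": "syn3"},
]


def select(entities, data_type, **where):
    """Return records of the given type whose fields equal every condition.

    A data_type of "entity" matches records of any type.
    """
    return [
        record
        for record in entities
        if data_type in ("entity", record["type"])
        and all(record.get(key) == value for key, value in where.items())
    ]


for data_type in ("file", "folder", "entity"):
    ids = [r["id"] for r in select(ENTITIES, data_type, parentId="syn1524884")]
    print(data_type, ids)
```

In this model, file returns the two files directly in the folder, folder returns the one sub-folder, and entity returns all three.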
Annotations are most useful for discovery of similar types of entities. Essentially, all annotations across Synapse are stored in a large table that can be queried to find entities with annotations matching your own criteria. While it can be useful to search for files that exist within a known project or folder, this requires that you know ahead of time where the files are.
If annotations have been diligently added to Files, they can be used to discover files of interest.
For example, you can identify all Files annotated as bam files (fileType = bam) uploaded to Synapse with the following query:
SELECT * FROM file WHERE fileType=="bam"
Likewise, if you had put the RNA-seq related files described in the section above into a project (for example syn12345) with the described annotations, then you could find all of the files for sample 1 under condition A with:
SELECT * FROM file WHERE projectId=="syn12345" AND sample=="1" AND condition=="A"
Lastly, you can query on a subset of entities that have a specific annotation, and limit which annotations are displayed by listing them in place of *, as follows:
SELECT id,name,dataType,fileType FROM file WHERE projectId=="syn12345" AND sample=="1" AND condition=="A"
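When many similar queries are needed, the query text can be assembled from the desired columns and conditions. The following Python sketch only builds the string in the syntax shown above. The helper build_query is hypothetical; it handles only equality conditions joined by AND, does not escape quotes inside values, and does not send anything to Synapse.

```python
def build_query(data_type, columns=None, **conditions):
    """Assemble a query string using == conditions joined by AND."""
    select_clause = ",".join(columns) if columns else "*"
    query = f"SELECT {select_clause} FROM {data_type}"
    if conditions:
        where_clause = " AND ".join(
            f'{key}=="{value}"' for key, value in conditions.items()
        )
        query += f" WHERE {where_clause}"
    return query


# Select every annotation for all files annotated as bam.
print(build_query("file", fileType="bam"))

# Select only a few columns for one sample and condition in a project.
print(
    build_query(
        "file",
        columns=["id", "name", "dataType", "fileType"],
        projectId="syn12345",
        sample="1",
        condition="A",
    )
)
```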
Queries can be constructed using one of the analytical clients (command line, Python, and R). On the web client, query results can be displayed in a table on a wiki page. Using the example above, this can be done as follows:
You can download files in a folder using queries. Currently this feature is only available in the command line client. For example, to download all files in a folder that has a Synapse ID of syn00123:
synapse get -q 'SELECT * FROM file WHERE parentId == "syn00123"'
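To script this download from Python, the same command can be built as an argument list so that the quoted query is passed as a single argument. This is a minimal sketch, assuming the synapse command line client is installed and logged in. The helper build_get_command is hypothetical, and the call that would actually run the command is left commented out.

```python
import shlex


def build_get_command(folder_id):
    """Return the argv list for downloading every file in a folder by query."""
    query = f'SELECT * FROM file WHERE parentId == "{folder_id}"'
    return ["synapse", "get", "-q", query]


command = build_get_command("syn00123")
# Show the command as it would be typed in a shell.
print(shlex.join(command))

# To run it (requires the synapse client on PATH):
# import subprocess
# subprocess.run(command, check=True)
```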