Annotations are key-value pairs stored as metadata for
Tables that help users to find and query data. Annotations can be based on an existing ontology or controlled vocabulary, or created ad hoc and modified later as the metadata evolves. They are a powerful way to systematically group and describe items in Synapse (files, data, and so on), which in turn makes those items searchable and discoverable.
For example, if you have uploaded a collection of alignment files in the BAM file format from an RNA-sequencing experiment, each representing a sample and experimental replicate, you can use annotations to add this information to each file in a structured way. Sometimes, users encode this information in file names, e.g., sampleA_conditionB.bam, which makes it “human-readable” but difficult to search in a systematic way, such as finding all replicates of sampleA_conditionB. Adding this information as Synapse annotations enables a more complete description of the contents of the
In this case, the annotations you may want to add might look like this:
Annotations can be one of four types:
Annotations may be added when initially uploading a file or at a later date. This can be done using the command line client, the Python client, the R client, or from the web. Using the programmatic clients facilitates batch and automated population of annotations across many files. The web client can be used to bulk update many files using file views.
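As a minimal sketch of batch annotation with the Python client: the hypothetical helper below merges new key-value pairs into each file's existing annotations. It assumes a logged-in `synapseclient.Synapse` object and the client's `get_annotations` and `set_annotations` methods (check that your client version provides them). The Synapse ID and annotation values in the usage comment are placeholders.

```python
def annotate_files(syn, annotations_by_id):
    """Merge new annotations into each entity's existing annotations.

    syn: a logged-in synapseclient.Synapse instance (assumed).
    annotations_by_id: dict mapping a Synapse ID to a dict of annotations.
    Returns the list of Synapse IDs that were updated.
    """
    updated = []
    for syn_id, new_values in annotations_by_id.items():
        current = syn.get_annotations(syn_id)  # dict-like annotations object
        current.update(new_values)             # keep existing keys, add or overwrite new ones
        syn.set_annotations(current)           # store the merged annotations
        updated.append(syn_id)
    return updated

# Hypothetical usage (requires the synapseclient package and credentials):
# import synapseclient
# syn = synapseclient.login()
# annotate_files(syn, {"syn0000001": {"specimenID": "sampleA_conditionB",
#                                     "fileFormat": "bam"}})
```

Reading the existing annotations first and then updating them avoids overwriting keys that were set earlier.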
Queries in Synapse look SQL-like and you can query any
SELECT * FROM <synId> WHERE <expression>
The <expression> section holds the conditions that limit the search. The examples below show some ways of limiting searches.
SELECT * FROM syn123456 WHERE "fileFormat"='fastq'
SELECT * FROM syn123456 WHERE "RIN"<=6.1
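The query form above can be assembled programmatically. The sketch below is illustrative only: `build_query` is a hypothetical helper, not part of any Synapse client, and it simply joins the conditions you pass with AND.

```python
def build_query(view_id, conditions=None, columns=None):
    """Return a Synapse SQL-like query string for a table or file view.

    view_id: Synapse ID of the table or view, e.g. "syn123456".
    conditions: optional list of condition strings, combined with AND.
    columns: optional list of column names; defaults to "*".
    """
    select = ", ".join(columns) if columns else "*"
    query = f"SELECT {select} FROM {view_id}"
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query

print(build_query("syn123456", ["\"fileFormat\"='fastq'"]))
# SELECT * FROM syn123456 WHERE "fileFormat"='fastq'
print(build_query("syn123456", ['"RIN"<=6.1']))
# SELECT * FROM syn123456 WHERE "RIN"<=6.1
```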
Along with annotations added by users, every entity has a number of fields useful for searching. For a complete list, see:
To find all files in a specific Project, create a File View in the web client. For example, if you’d like to see all files in a Project, navigate to your project and then the Tables tab. From there, click Tables Tools and Add File View. Click Add container and Enter Synapse Id to create a tabular file view that contains every file in the project, which you can now query. Importantly, if you want to later query on annotations, you must select Add All Annotations. For a more in-depth look at this feature, please read our articles on File Views.
To list the Files in a specific Folder, you need to know the synID of the Folder (for example syn1524884, which has data from TCGA related to melanoma). All entities in this Folder will have a parentId of syn1524884.
The function to find all Files in this Folder is called “getChildren”:
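A minimal sketch with the Python client, assuming a logged-in `synapseclient.Synapse` object whose `getChildren` method accepts an `includeTypes` argument and yields dictionaries with `id` and `name` keys. The helper name `list_files_in_folder` is hypothetical.

```python
def list_files_in_folder(syn, folder_id):
    """Return (id, name) pairs for the Files directly inside a Folder.

    syn: a logged-in synapseclient.Synapse instance (assumed).
    folder_id: Synapse ID of the Folder, e.g. "syn1524884".
    """
    children = syn.getChildren(folder_id, includeTypes=["file"])
    return [(child["id"], child["name"]) for child in children]

# Hypothetical usage (requires the synapseclient package and credentials):
# import synapseclient
# syn = synapseclient.login()
# for syn_id, name in list_files_in_folder(syn, "syn1524884"):
#     print(syn_id, name)
```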
If annotations have been added to Files, they can be used to discover files of interest from File View syn123456.
For example, you can identify all Files annotated as bam files (fileFormat = bam) with the following query:
SELECT * FROM syn123456 WHERE "fileFormat"='bam'
Likewise, if you had put the RNA-seq files described in the section above into the project syn00123 with the described annotations, then you could find all of the files for
SELECT * FROM syn123456 WHERE "projectId"='syn00123' AND "specimenID"='sampleA_conditionB'
Lastly, you can query a subset of entities that have a specific annotation and limit which annotations are displayed, as follows.
SELECT specimenID, genomeBuild, fileFormat, platform FROM syn123456 WHERE "projectId"='syn00123' AND "specimenID"='sampleA_conditionB'
Reproducible queries can be constructed using one of the analytical clients (command line, Python, or R). On the web client, query results can be displayed in a table on a wiki page.
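One way to keep a query reproducible is to store the query string in a script and run it through the Python client. The sketch below assumes a logged-in `synapseclient.Synapse` object, its `tableQuery` method, and the `asDataFrame` method on the result (which needs pandas installed). The helper name and the view ID syn123456 are illustrative.

```python
# Keeping the query text in one place lets it be rerun or shared unchanged.
QUERY = (
    "SELECT specimenID, genomeBuild, fileFormat, platform "
    "FROM syn123456 "
    "WHERE \"projectId\"='syn00123' AND \"specimenID\"='sampleA_conditionB'"
)

def query_to_dataframe(syn, query):
    """Run a query against a table or file view and return the rows.

    syn: a logged-in synapseclient.Synapse instance (assumed).
    query: a Synapse SQL-like query string.
    """
    results = syn.tableQuery(query)  # runs the query on the Synapse server
    return results.asDataFrame()     # converts the result to a pandas DataFrame

# Hypothetical usage (requires synapseclient, pandas, and credentials):
# import synapseclient
# syn = synapseclient.login()
# df = query_to_dataframe(syn, QUERY)
```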
You can download files in a folder using queries. Currently this feature is only available in the command line client. For example, if you want to download all files in a folder that has a synapse id of
synapse get -q "SELECT * FROM file WHERE parentId = 'syn00123'"
A single quote inside a queried value must be replaced by a double quote or by two single quotes. In order to query for the
SELECT * FROM syn123 WHERE "chemicalStructure" = '4"-chemical'
#OR
SELECT * FROM syn123 WHERE "chemicalStructure" = '4''-chemical'
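When queries are built in code, the doubled-single-quote form can be applied automatically. The sketch below uses a hypothetical helper, `quote_value`, and assumes the value being searched for is 4'-chemical.

```python
def quote_value(value):
    """Wrap a string in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

value = "4'-chemical"  # assumed example value containing a single quote
query = 'SELECT * FROM syn123 WHERE "chemicalStructure" = ' + quote_value(value)
print(query)
# SELECT * FROM syn123 WHERE "chemicalStructure" = '4''-chemical'
```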