Uploading a large number of files one at a time is tedious, especially if you want to set annotations and provenance. The command line and Python clients provide bulk upload and download functionality called sync. Uploads are specified in a tab-delimited manifest file, in which each row describes one file to upload. This article covers how to create a manifest file, validate it, upload files in bulk, download files in bulk, and edit files in bulk.
Read more about the helper functions on the synapseutils page.
Bulk file uploads are specified using a tab-separated (.tsv) manifest file. This file has columns of information about each file to be uploaded, along with annotations that will be added in Synapse. Specifically, the manifest has a set of required columns (path and parent), columns for provenance, and columns for annotations. A simple example manifest that uploads a single file:
path	parent	name	used	executed	emotion	species
file.csv	syn123	Tardar Sauce	syn654	https://github.com/your/code/repo	grumpy	cat
The above manifest describes a file, “file.csv”, to upload to the Synapse folder syn123 under the name “Tardar Sauce”. It would have provenance indicating that the code (https://github.com/your/code/repo) was executed and used input data in syn654. Additionally, it would have the annotations emotion = 'grumpy' and species = 'cat'. Additional annotations are added by specifying more columns: the column name becomes the name of the annotation, and the value entered in each row becomes the annotation value for that file.
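When many files are involved, the manifest can be generated rather than typed by hand. The following is a minimal sketch using only the Python standard library; `write_manifest` and `CORE_COLUMNS` are hypothetical helper names, and the second row (“other.csv”) is invented purely to show how an extra annotation column appears.

```python
import csv

# Core manifest columns used in this article's example; every other key in a
# row is treated as an annotation. This list is an illustrative assumption,
# not the complete set of columns that synapseutils accepts.
CORE_COLUMNS = ["path", "parent", "name", "used", "executed"]


def write_manifest(rows, manifest_path):
    """Write upload rows to a tab-separated manifest file.

    Any key that is not a core column becomes its own annotation column,
    named after the annotation.
    """
    annotation_columns = []
    for row in rows:
        for key in row:
            if key not in CORE_COLUMNS and key not in annotation_columns:
                annotation_columns.append(key)

    with open(manifest_path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=CORE_COLUMNS + annotation_columns,
            delimiter="\t",
            restval="",  # leave the cell empty when a row lacks a value
        )
        writer.writeheader()
        writer.writerows(rows)


rows = [
    {
        "path": "file.csv",
        "parent": "syn123",
        "name": "Tardar Sauce",
        "used": "syn654",
        "executed": "https://github.com/your/code/repo",
        "emotion": "grumpy",
        "species": "cat",
    },
    # Hypothetical second file that introduces one more annotation ("age").
    {"path": "other.csv", "parent": "syn123", "species": "dog", "age": "3"},
]
write_manifest(rows, "filesToUpload.tsv")
```

Because annotation columns are collected from the rows themselves, adding a new annotation only requires adding a key to a row.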
Download the template.
The format of the manifest file (called ‘filesToUpload.tsv’ in this example) can be validated before uploading by passing dryRun = True to the syncToSynapse function.
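A dry run might look like the sketch below. The wrapper `validate_manifest` is a hypothetical name, and `syn` is assumed to be a logged-in `synapseclient.Synapse` object; only the `synapseutils.syncToSynapse` call with `dryRun=True` comes from the client itself.

```python
def validate_manifest(syn, manifest_path):
    """Check a manifest without uploading anything.

    `syn` is assumed to be a logged-in synapseclient.Synapse object.
    """
    # Imported inside the function so that defining it does not require
    # the Synapse client to be installed.
    import synapseutils

    # dryRun=True performs validation only; no files are uploaded.
    synapseutils.syncToSynapse(syn, manifest_path, dryRun=True)


# Example usage (requires synapseclient and valid credentials):
# import synapseclient
# syn = synapseclient.login()
# validate_manifest(syn, "filesToUpload.tsv")
```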
Using the validated manifest above, we can now upload the files to Synapse. Once all the files have uploaded, you will receive an email notification that the upload is complete. The email also reports any errors encountered during the upload.
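The upload itself could be sketched as follows. `upload_manifest` and its `notify` argument are hypothetical names; `sendMessages` is the `syncToSynapse` parameter that controls the completion message, and `syn` is again assumed to be a logged-in `synapseclient.Synapse` object.

```python
def upload_manifest(syn, manifest_path, notify=True):
    """Upload every file listed in the manifest to Synapse.

    `syn` is assumed to be a logged-in synapseclient.Synapse object.
    Set `notify` to False to skip the completion message.
    """
    # Imported inside the function so that defining it does not require
    # the Synapse client to be installed.
    import synapseutils

    synapseutils.syncToSynapse(
        syn,
        manifest_path,
        dryRun=False,          # perform the real upload
        sendMessages=notify,   # send a message when the upload finishes
    )


# Example usage (requires synapseclient and valid credentials):
# import synapseclient
# syn = synapseclient.login()
# upload_manifest(syn, "filesToUpload.tsv")
```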
Files can be downloaded in bulk using the syncFromSynapse function found in synapseutils. This function downloads all the files in a folder or project along with the annotations and provenance on those files. If the files are downloaded to a specific location outside of the Synapse cache, it also creates a manifest that can be used to upload those files back to Synapse.
For example, you can download all the files in syn123 to a folder called “myFolder” on your local computer.
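That download could be written roughly as below. `download_folder` is a hypothetical wrapper, and `syn` is assumed to be a logged-in `synapseclient.Synapse` object; the `path` argument of `syncFromSynapse` is what directs the files to a location outside the cache.

```python
def download_folder(syn, entity_id="syn123", local_dir="myFolder"):
    """Download every file under a Synapse folder or project.

    `syn` is assumed to be a logged-in synapseclient.Synapse object.
    Returns the local paths of the downloaded files.
    """
    # Imported inside the function so that defining it does not require
    # the Synapse client to be installed.
    import synapseutils

    # path= sends the files to local_dir instead of the Synapse cache.
    files = synapseutils.syncFromSynapse(syn, entity_id, path=local_dir)
    return [f.path for f in files]


# Example usage (requires synapseclient and valid credentials):
# import synapseclient
# syn = synapseclient.login()
# local_paths = download_folder(syn, "syn123", "myFolder")
```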
You can edit files in bulk by changing the values in the manifest and pushing it back to Synapse with the syncToSynapse function. The manifest lets you modify file paths, provenance, annotations, and versions. However, if only annotations are being updated, we recommend using our File Views feature.
Please note that you cannot move files with a manifest. If the parentId is changed, a copy is created and the file then exists in two different locations.
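Editing a manifest before pushing it back can also be scripted. The sketch below uses only the Python standard library; `set_annotation` and its `only_if` argument are hypothetical names, and the file names in the usage comment are placeholders. It refuses to edit the parent column, since changing the parent copies the file rather than moving it.

```python
import csv


def set_annotation(manifest_in, manifest_out, column, new_value, only_if=None):
    """Copy a manifest, setting `column` to `new_value` on matching rows.

    `only_if` is an optional (column, value) pair; when given, only rows
    whose cell in that column equals the value are changed.
    """
    if column == "parent":
        raise ValueError("changing 'parent' would copy the file, not move it")

    with open(manifest_in, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        fieldnames = list(reader.fieldnames)
        rows = list(reader)

    if column not in fieldnames:
        fieldnames.append(column)  # a new column adds a new annotation

    for row in rows:
        if only_if is None or row.get(only_if[0]) == only_if[1]:
            row[column] = new_value

    with open(manifest_out, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, delimiter="\t", restval=""
        )
        writer.writeheader()
        writer.writerows(rows)


# Example usage with placeholder file names:
# set_annotation("manifest.tsv", "manifest_edited.tsv",
#                "emotion", "happy", only_if=("species", "cat"))
```

The edited manifest is written to a separate file so the original remains available for comparison before anything is pushed to Synapse.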
Try posting a question to our Forum.
Let us know what was unclear or what has not been covered. Reader feedback is key to making the documentation better, so please get in touch or open an issue in our GitHub repository (Sage-Bionetworks/synapseDocs).