Tutorial
1.Introduction

The Genome Sequence Archive (GSA) is a data repository for archiving raw sequence reads. It accepts data submissions from all over the world and provides free access to all publicly available data for global scientific communities.

2.Log in to the BIG Submission Portal

Log in to the BIG Submission Portal (BIG Sub, https://bigd.big.ac.cn/gsub/): click the ‘login’ tab and sign in. If you do not have an account yet, click the ‘Register’ tab to create one. If you have any problems with your account, please contact bigd-admin@big.ac.cn for assistance.

Notice: After logging in to the BIG Submission Portal, you can follow the steps below to complete the submission.

3.Create a GSA Submission

The BIG Sub provides a browser-based user interface for submitting GSA metadata as well as various options for uploading data files.

The page tabs presented by the Submission wizard are:

  • Submitter – your name and email are auto-filled once you have logged in, so the system can identify the person entering the data in the form.
  • General information – this page collects general descriptive information about the GSA: Release Date, Title and Description, Project Information and Sample Information. This step links your existing project (PRJCA#) or samples (SAMC#) with your GSA data.
    Notice: If you have already created GSA related Biological Sample(s) in the BioSample database, please select ‘GSA related BioSample information has been created’. Then follow the wizard to complete the submission.

    If you have not created GSA related Biological Sample(s), please select ‘No GSA related BioSample information was created’. Then follow the wizard to complete the submission.

    Notice: If you select ‘Release immediately following curation’, the records will be released as soon as they pass curation. If you select ‘Release on a specified date’, the GSA will be released on the date you specify.

  • Sample Type – this page provides a preview of the sample type that the submitter is asked to supply during the submission process.
    Notice: If you determine that your human data must be submitted via the GSA Human database, please delete your current submission and contact us at gsa@big.ac.cn.

  • Attributes – this page allows you to upload attribute information for multiple samples in a single table. To complete this step:
    1) Download the BioSample submission template table Plant.us.xlsx. For column explanations and examples, please see the example file e.g.Plant.us.xlsx. For more information, please see the Help.

    Notice: Downloading a new template ensures that you get the most current and correct version.

    2) Fill in the template table and double-check it before uploading. Use the Selection box to select your completed table.

    3) Then click the ‘Check’ button to verify the submitted batch information online.

    4) If the file passes the check, click the ‘Save and forward’ button to continue to the next step. Otherwise, click the ‘Delete’ button, then edit and re-upload the file until it is correct.

  • Metadata – this page allows you to upload metadata about the raw sequences, including Experiments and Runs. To complete this step:
    1) Download the GSA submission template table GSA_Template.us.xlsx. For column explanations and examples, please see the example file e.g.GSA_Template.us.xlsx. For more information, please see the Help.

    Notice: Downloading a new template ensures that you get the most current and correct version.

    2) Fill in the template table and double-check it before uploading. Use the Selection box to select your completed table.

    3) Then click the ‘Check’ button to verify the submitted batch information online.

    4) If the file passes the check, click the ‘Save and forward’ button to go to the next step of the submission. Otherwise, click the ‘Delete’ button, then edit and re-upload the file until it is correct.

  • File Upload – this page allows you to select the file upload method: FTP, Aspera Command Line (recommended), or web browser upload via the Aspera Connect plugin. If you choose FTP or Aspera Command Line, you can also upload your files before the metadata submission.
    Notice:

    1) Please check the names and MD5 checksums of the sequence files: they must match the values you entered in the batch submission table. Otherwise, your files cannot be archived correctly.

    2) If you choose the Aspera Command Line to upload files, please write down the Aspera Command Line information.

  • Overview – this page presents a summary of the information you provided. At this step, please check the details of your submission carefully. If you need to make changes, click the relevant tab to go back and edit. If everything is correct, click the Submit button to complete the submission.

    After completing the submission, please wait for data curation. We will check both the metadata and the sequence files and, if anything needs to be corrected, send feedback to your registered email address, so please watch your mailbox. After curation, your data will be archived as a single GSA set and the assigned accession number will be shown in your GSA list.

    Notice:

    1) Each new submission receives a temporary Submission ID in the form of sub#, such as subCRA019091. Please provide this ID when contacting the GSA Working Team. DO NOT use the temporary Submission ID in a publication or in BIG Search.

    2) After the submission, you will receive a GSA Accession number in the form of CRA#, such as CRA012226. Please use this number in a publication or in BIG Search.
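
The difference between the temporary Submission ID and the final accession number is easy to overlook when preparing a manuscript. The following is a minimal sketch of a check that tells the two apart. The patterns are an assumption inferred only from the two examples given above (subCRA019091 and CRA012226), and the function names are hypothetical; they are not part of any GSA tool.

```python
import re

# Assumed patterns, inferred only from the two examples in this tutorial
# (subCRA019091 and CRA012226); real IDs may follow stricter rules.
TEMPORARY_ID = re.compile(r"^subCRA\d+$")
ACCESSION = re.compile(r"^CRA\d+$")


def classify_gsa_id(identifier: str) -> str:
    """Return 'temporary', 'accession', or 'unknown' for a GSA identifier."""
    identifier = identifier.strip()
    if TEMPORARY_ID.match(identifier):
        return "temporary"
    if ACCESSION.match(identifier):
        return "accession"
    return "unknown"


def is_citable(identifier: str) -> bool:
    """Only an assigned CRA accession number should be used in a publication."""
    return classify_gsa_id(identifier) == "accession"
```

For example, `is_citable("subCRA019091")` is false because that ID is temporary, whereas `is_citable("CRA012226")` is true.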

4.How to Edit, Delete or Add New Data

Before the GSA data are archived, you can click the Submission ID to enter the Overview page. On this page, you can 1) update the Release date and Title; 2) edit the Submitter information; 3) append data by clicking the ‘Add Data’ button (for more information, please see ‘Create a GSA Submission’); 4) edit or delete the Metadata information for each submitted Experiment or Run; and 5) upload or update data files by clicking the ‘Upload File’ button.

Notice: For more details about submission status and the available operations, please go to ‘Status and Operation’.

After the GSA data are archived (Status is Checked OK; confidential), you can click the Submission ID to enter the Overview page. On this page, you can 1) update the Release date and Title; 2) edit the Submitter information; 3) append data by clicking the ‘Add Data’ button (for more information, please see ‘Create a GSA Submission’); and 4) upload or update data files by clicking the ‘Upload File’ button. If you still need to change the Metadata information for a submitted Experiment or Run, please contact us at gsa@big.ac.cn.

Notice: For more details about submission status and the available operations, please go to ‘Status and Operation’.

6.How to Release Your Data

After the article is published, you can click the ‘Release Now’ button in the ‘Operation’ column of your GSA list.

Click ‘Yes’ in the confirmation box to trigger the release. Releasing a GSA also triggers the release of the related BioProject and BioSample(s), so you DO NOT need to release the BioProject and BioSample again in their respective systems.

It will take several hours to release a GSA dataset, depending on its data size. After release, all the data of the GSA dataset can be retrieved from the BIG Search portal within 14 hours.

7.Data File Uploading

Three methods are offered for data uploading: Aspera Command Line, FTP and Aspera Connect plugin. Please choose one to upload your data. If you need any help during data file uploading, please contact the GSA Working Team at gsa@big.ac.cn or QQ group: 548170081.

If the files you are going to upload exceed 30 TB in size, please contact us at gsa@big.ac.cn.

NOTICE:

1. Unique file names should be used for all files, and each file must be listed in the GSA metadata file you uploaded.

2. Files must be compressed using gzip or bzip2.

3. Uploaded files will be removed after they are archived.
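
These requirements, together with the MD5 check required at the File Upload step, can be verified locally before you start a transfer. The following is a rough sketch, not an official GSA tool. The function names are hypothetical, the `expected_md5` dictionary is a hypothetical in-memory form of the file names and MD5 values you entered in the metadata table, and compression is recognised only by file suffix, which is a simplifying assumption.

```python
import hashlib
from collections import Counter
from pathlib import Path

# Assumption: gzip and bzip2 files are recognised by these common suffixes.
COMPRESSED_SUFFIXES = (".gz", ".bz2")


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_upload_folder(folder: Path, expected_md5: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed.

    expected_md5 maps file name -> MD5 string as entered in the metadata table.
    """
    problems = []
    files = sorted(p for p in folder.rglob("*") if p.is_file())

    # File names must be unique across the whole upload.
    counts = Counter(p.name for p in files)
    problems += [f"duplicate file name: {name}"
                 for name, n in sorted(counts.items()) if n > 1]

    for path in files:
        # Every file must be listed in the metadata table.
        if path.name not in expected_md5:
            problems.append(f"not listed in metadata: {path.name}")
            continue
        # Files must be compressed with gzip or bzip2.
        if not path.name.endswith(COMPRESSED_SUFFIXES):
            problems.append(f"not gzip/bzip2 compressed: {path.name}")
        # The local MD5 must match the value in the metadata table.
        if md5_of_file(path) != expected_md5[path.name].lower():
            problems.append(f"MD5 mismatch: {path.name}")
    return problems
```

Calling `check_upload_folder(Path("my_submission"), expected_md5)` on your upload folder returns one message per problem, so an empty list suggests the folder is ready to upload.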

7.1 Aspera Command Line

You may use the following command to upload files via the Aspera Command Line:

[path/to/ascp/] -P33001 -i [path/to/key/file] -QT -l100m -k1 -d [path/to/folder/containing/files] aspsub@submit.big.ac.cn:uploads/[user dir]

Where:

[path/to/ascp/]:

Microsoft Windows: C:\Program Files\Aspera\Aspera Connect\bin\ascp.exe

or C:\users\[username]\AppData\Local\Programs\Aspera\Aspera Connect\bin\ascp.exe

Mac OS X: /Applications/Aspera/Connect.app/Contents/Resources/ascp (admin installation)

or /Users/[username]/Applications/Aspera/Connect.app/Contents/Resources/ascp (non-admin installation)

Linux: /opt/aspera/bin/ascp or /home/[username]/aspera/connect/bin/ascp

[path/to/key/file] must be an absolute path, e.g.: /home/keys/aspera.openssh

[path/to/folder/containing/files] is the local folder that contains all the files to upload.

[user dir] is your user directory. To find it, click the Submission ID to enter the Overview page, click the ‘Add Data’ button, and open the “04 Files” page, where the user directory information is shown.

Get the key file

Notice:

1) Please make a new subdirectory for each new submission. Your submission subfolder is a temporary holding area and will be removed once the whole submission is complete.

2) Do not upload complex directory structures or files that do not contain sequence data.

3) Updating Files: After the metadata information has been submitted, you cannot directly access the file upload page through the navigation bar. If you need to re-upload or append data, click on the Submission ID to enter the Overview page. Then, click the Update file button to proceed to the 04 Files file upload page. Here, you can choose the appropriate file upload method and re-upload the files.
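
If you upload regularly, the command template in this section can be assembled by a small script instead of being typed by hand. The sketch below only builds the argument list and does not run ascp. The function name is hypothetical, all paths in the usage part are placeholders, and it assumes that the user directory is appended directly to `uploads/` in the destination.

```python
import shlex
from pathlib import PurePosixPath, PureWindowsPath


def build_ascp_command(ascp_path: str, key_file: str,
                       local_folder: str, user_dir: str) -> list[str]:
    """Assemble the upload command from this section as an argument list."""
    # The tutorial requires the key file to be given as an absolute path.
    is_absolute = (PurePosixPath(key_file).is_absolute()
                   or PureWindowsPath(key_file).is_absolute())
    if not is_absolute:
        raise ValueError("key_file must be an absolute path")
    return [
        ascp_path,
        "-P33001",
        "-i", key_file,
        "-QT",
        "-l100m",
        "-k1",
        "-d",
        local_folder,
        # Assumption: the user directory follows 'uploads/' without a space.
        f"aspsub@submit.big.ac.cn:uploads/{user_dir}",
    ]


if __name__ == "__main__":
    # Placeholder values only; replace them with your own paths and directory.
    command = build_ascp_command(
        ascp_path="/opt/aspera/bin/ascp",
        key_file="/home/keys/aspera.openssh",
        local_folder="/data/my_submission",
        user_dir="[user dir]",
    )
    print(shlex.join(command))
```

Returning a list rather than a single string means the result can be passed to `subprocess.run` without shell quoting problems when a path contains spaces, as the default Windows installation path does.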

7.2 FTP Upload

● Upload data with an FTP client

You need FTP client software, such as FileZilla Client, to log in to the FTP server and upload data. This document uses FileZilla as an example.

1) Step 1: Download the client software from the website (https://filezilla-project.org/). The download page is shown in Figure 1. Click the ‘Download FileZilla Client’ button in the red box and follow the instructions to install the software.

Figure 1 FileZilla Client Software download



2) Step 2: Open the software; the interface is shown in Figure 2. Enter ‘submit.big.ac.cn’ as the host, and enter the email and password of your GSA database login account as the username and password. Then click ‘Quick connect’. The status bar will display a successful login message. If an error message appears, check the reason it gives.

3) Step 3: After successful login, choose the local data path where the data needs to be uploaded under ‘Local site’. In the ‘Remote site’, double-click on the GSA folder to enter the GSA directory.

4) Step 4: In ‘Local site’, select the data files or folders that need to be uploaded, right-click, and choose ‘Upload’, or directly drag them to ‘Remote site’, as shown in Figure 3.

5) Step 5: All files to be uploaded are listed in the ‘Queue’. After a successful upload, a file moves to ‘Successful transfers’. If the upload fails, the file moves to ‘Failed transfers’ and needs to be uploaded again; you can use ‘Resume’ to continue the upload.

Figure 2 FileZilla Client Interface



Figure 3 FileZilla Client Upload Interface



Figure 4 Data Transfer Status



● Upload data via FTP from the command line

Screenshot: ftp command session (the commands that need to be entered are underlined)

Screenshot: interactive session after a successful upload

● Possible problems

Question 1: When logging in via FTP, an AUTH SSL error message appears in the status bar, as shown in Figure 5.

Solution: Click ‘Site Manager’ under ‘File’ in the menu bar, as shown in Figure 6, change the ‘Encryption’ option to ‘Use only ordinary FTP’, and fill in the correct host address (submit.big.ac.cn), account and password. Finally, click ‘Connect’.

Figure 5 FileZilla Error Message (AUTH SSL)



Figure 6 Site Management Settings



Question 2: When logging in via FTP, an MLSD error appears in the status bar with the message ‘failed to read directory list’, as shown in Figure 7.

Solution: In FileZilla, go to Edit -> Settings and change the transfer mode to passive mode (as shown in Figure 8).

Figure 7 FileZilla Error Message (MLSD)



Figure 8 Transfer Mode Modification
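
The FTP upload described in this section can also be scripted. The following is a minimal sketch using Python's standard `ftplib` module, offered as an illustration rather than an official GSA client. It uses the host and the GSA remote folder named above, switches on passive mode as in the solution to Question 2, and uses plain FTP as in the solution to Question 1. The function names, the email, the password and the file path are hypothetical placeholders.

```python
from ftplib import FTP
from pathlib import Path


def upload_files(ftp, local_files, remote_dir="GSA"):
    """Upload files through an already connected ftplib.FTP-like object."""
    ftp.set_pasv(True)   # passive mode, as in the solution to Question 2
    ftp.cwd(remote_dir)  # the GSA folder shown under 'Remote site'
    uploaded = []
    for local_file in map(Path, local_files):
        with local_file.open("rb") as handle:
            # STOR stores the file under its own name in the current folder.
            ftp.storbinary(f"STOR {local_file.name}", handle)
        uploaded.append(local_file.name)
    return uploaded


def main():
    # Placeholders: use your GSA login email, your password and your files.
    with FTP("submit.big.ac.cn") as ftp:
        ftp.login(user="you@example.org", passwd="your-password")
        print(upload_files(ftp, ["/data/my_submission/sample1_R1.fastq.gz"]))
```

Call `main()` to run the upload. Keeping the connection outside `upload_files` makes the transfer logic easy to try against a stand-in object before touching the real server.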



7.3 Assisted Upload

GSA offers hard disk delivery and assisted uploading for users who need to submit large volumes of data, that is, more than 1 TB in a single upload. To use this service, contact the GSA Working Team at gsa@big.ac.cn, fill in the document ‘PRJCA [please write the number]-hard disk filling information document’, send the electronic version to the same mailbox, and send the printed copy to GSA together with the hard disk that contains the data.

8.Release of linked BioProject/BioSample/GSA

Release rules of linked BioProject, BioSample, and GSA are as follows:

1. The release of a BioProject record DOES NOT trigger the release of any other linked data.

2. The release of a BioSample record triggers ONLY the release of its BioProject.

3. The release of GSA nucleotide sequence data DOES trigger the release of the linked BioProject and BioSample records.

Notice: Please fill in the ‘release time’ of a BioProject, BioSample and GSA carefully. Once released, the corresponding data and information can be retrieved or downloaded by other users.
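
The three rules can be summarised as a small lookup, which may help when planning release dates for linked records. This is only an illustrative sketch of the rules as stated in this section; the names `RELEASE_TRIGGERS` and `released_after` are hypothetical.

```python
# Which linked record types are also released when a record of a given
# type is released, following the three rules listed in this section.
RELEASE_TRIGGERS = {
    "BioProject": set(),                    # rule 1: triggers nothing else
    "BioSample": {"BioProject"},            # rule 2: only its BioProject
    "GSA": {"BioProject", "BioSample"},     # rule 3: both linked records
}


def released_after(record_type: str) -> set[str]:
    """Return every record type that is public after releasing record_type."""
    if record_type not in RELEASE_TRIGGERS:
        raise ValueError(f"unknown record type: {record_type}")
    return {record_type} | RELEASE_TRIGGERS[record_type]
```

For instance, `released_after("GSA")` contains all three record types, which is why an early GSA release date also exposes the linked BioProject and BioSample records.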

9.Status and Operation

GSA Status and Operation

No. | Status | Description | Operation
1 | Unfinished at the General Info step | The Submitter step is finished; the submission is at the General Info step. | Edit[1]; Delete
2 | Unfinished at the Sample Type step | The General Info step is finished. If no GSA related Biological Sample(s) have been created, the submission is at the Sample Type step. | Edit[1]; Delete
3 | Unfinished at the Attributes step | The Sample Type step is finished; the submission is at the Attributes step. | Edit[1]; Delete
4 | Unfinished at the Metadata step | The Attributes step is finished; the submission is at the GSA Metadata step. | Edit[1]; Delete
5 | Unfinished at the File Upload step | The GSA Metadata step is finished; the submission is at the File Upload step. | Edit[1]; Delete
6 | Unfinished at the Overview step | The submission is at the Overview step. | Edit[1]; Delete
7 | Unchecked | All the information has been submitted and is waiting to be checked. | Edit[1]; Delete
8 | Checking | Data file(s) are being processed. | Edit[1]; Delete
9 | Checked failed | An error occurred while processing the data file(s). | Edit[1]; Delete; Reload data file via FTP or Aspera Command Line[2]
10 | Checked OK | Data file(s) were processed successfully and a GSA Accession number has been assigned. | Release Now; Share
11 | Deleted | Deleted |

[1]: You can click the GSA Submission ID to enter the Overview page to edit GSA related metadata. For more details, please see ‘How to Edit, Delete or Add New Data’.

[2]: For more details about data file upload, please see ‘Data File Uploading’.
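
If you track several submissions, the GSA status table above can be expressed as a lookup from status to available operations. The sketch below simply transcribes that table. The names `GSA_OPERATIONS`, `allowed_operations` and `can_release` are hypothetical and do not correspond to any GSA interface.

```python
EDIT_DELETE = ("Edit", "Delete")

# Status -> operations, transcribed from the GSA Status and Operation table.
GSA_OPERATIONS = {
    "Unfinished at the General Info step": EDIT_DELETE,
    "Unfinished at the Sample Type step": EDIT_DELETE,
    "Unfinished at the Attributes step": EDIT_DELETE,
    "Unfinished at the Metadata step": EDIT_DELETE,
    "Unfinished at the File Upload step": EDIT_DELETE,
    "Unfinished at the Overview step": EDIT_DELETE,
    "Unchecked": EDIT_DELETE,
    "Checking": EDIT_DELETE,
    "Checked failed": EDIT_DELETE
    + ("Reload data file via FTP or Aspera Command Line",),
    "Checked OK": ("Release Now", "Share"),
    "Deleted": (),
}


def allowed_operations(status: str) -> tuple[str, ...]:
    """Return the operations listed for a GSA submission status."""
    if status not in GSA_OPERATIONS:
        raise ValueError(f"unknown GSA status: {status}")
    return GSA_OPERATIONS[status]


def can_release(status: str) -> bool:
    """'Release Now' is listed only once the submission is 'Checked OK'."""
    return "Release Now" in allowed_operations(status)
```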

Experiment Status and Operation

No. | Status | Description | Operation
1 | Unchecked | Metadata submitted and waiting to be checked. | Edit[1]; Delete
2 | Checked OK | Metadata check passed. | Edit[1]; Delete; Reload data file via FTP or Aspera Command Line[2]
3 | Checked failed | Metadata check failed. | Edit[1]; Delete
4 | Deleted | Deleted |

[1]: You can click the GSA Submission ID to enter the Overview page to edit the Experiment metadata. For more details, please see ‘How to Edit, Delete or Add New Data’.

[2]: For more details about data file upload, please see ‘Data File Uploading’.

Run Status and Operation

No. | Status | Description | Operation
1 | Unchecked | Metadata submitted and waiting to be checked. | Edit[1]; Delete
2 | Checked OK | Metadata check passed. | Edit[1]; Delete
3 | Checked failed | Metadata check failed. | Edit[1]; Delete
4 | Uploaded Succeed | Data file(s) uploaded successfully and waiting for processing. |
5 | Processing | Data file(s) are being processed. |
6 | Processed succeed | Data file(s) processed successfully. |
7 | Processed error | An error occurred while processing the data file(s). | Edit[1]; Delete; Reload data file via FTP or Aspera Command Line[2]
8 | Deleted | Deleted |

[1]: You can click the GSA Submission ID to enter the Overview page to edit the Run metadata. For more details, please see ‘How to Edit, Delete or Add New Data’.

[2]: For more details about data file upload, please see ‘Data File Uploading’.