Tissue samples submitted to the Biodiversity Institute of Ontario (BIO) core analytical facility are analyzed using standard high-throughput molecular protocols. It is therefore critical that tissues destined for DNA barcoding analysis, and their associated specimen data, arrive at BIO in a compliant, lab-ready format. To ensure this, BIO supplies external collaborators with standard sampling kits, complete with detailed instructions for sampling and data submission. All prospective donors are encouraged to observe these sampling guidelines and to resolve any questions with their project/campaign coordinator.
BIO has developed several sample submission formats that are directly compatible with the high-throughput molecular protocols, in particular with the 96-well format used at all analytical stages. The sample storage medium most advocated by the Lepidoptera campaign is the 96-well microplate. This medium is cheap and compact, and it allows donors to submit the minimum amount of tissue required for one DNA extraction (ca. 2-3 mm in dimension or 5 mg), thereby avoiding the need to manage residual tissue. It also allows the samples to enter the lab DNA extraction pipeline directly, bypassing the intermediate subsampling stage.
To facilitate proper tracking of each sample throughout the analytical chain, it is critical that each specimen is assigned a unique individual identifier, or Sample ID (e.g., a collection catalogue number prefixed by the museum acronym). The Sample ID is recorded in the CCDB data record spreadsheet, which provides a map of the location of each sample inside a 96-well plate. This identifier should correspond exactly to the Sample ID submitted to BOLD together with specimen provenance information and images, thereby linking the sequence to the specimen record. See also the section on specimen numbering conventions.
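The plate map described above, in which each unique Sample ID is tied to one well of a 96-well plate, can be illustrated with a minimal Python sketch. It is not the CCDB data record spreadsheet itself: the function name `build_plate_map`, the row-by-row fill order (A1 to A12, then B1, and so on), and the example Sample IDs are all assumptions made for illustration.

```python
from string import ascii_uppercase

ROWS = ascii_uppercase[:8]   # rows A to H of a 96-well plate
COLS = range(1, 13)          # columns 1 to 12


def build_plate_map(sample_ids):
    """Assign Sample IDs to wells, returning {well: sample_id}.

    Assumption: wells are filled row by row (A1..A12, B1..B12, ...).
    The real fill order is whatever the sampling instructions specify.
    """
    if len(sample_ids) > len(ROWS) * len(COLS):
        raise ValueError("a 96-well plate holds at most 96 samples")
    if len(set(sample_ids)) != len(sample_ids):
        raise ValueError("every Sample ID must be unique")
    wells = [f"{row}{col}" for row in ROWS for col in COLS]
    return dict(zip(wells, sample_ids))


# Hypothetical Sample IDs: a made-up museum acronym plus a catalogue number.
ids = [f"XYZ-{n:04d}" for n in range(1, 97)]
plate = build_plate_map(ids)
print(plate["A1"], plate["B1"])  # XYZ-0001 XYZ-0013
```

Rejecting duplicate identifiers at this stage reflects the requirement that each specimen carry a unique Sample ID, so that a sequence can later be linked to exactly one specimen record.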
Once a 96-sample array and its associated specimen data arrive at BIO, the collection information is transferred onto BOLD. Once the BOLD data records are created, the tissue plates are forwarded into the analytical pipeline. Standard high-throughput protocols for DNA extraction, PCR, and sequencing ensure a fast turnaround time and high sequence yield and quality. All lab operations are performed by dedicated lab staff, while the project manager oversees the whole procedure. Project collaborators can also monitor the analytical progress in real time using the BOLD online interface. Depending on the success rate of a particular array of samples, positive hit picking of PCR products or negative hit picking of failures can be conducted using robotic liquid handling stations. The normal turnaround time for lab work is two to three weeks, but it may be extended depending on sample quality. More details on the BIO analytical facility and the analytical protocols used can be found on the CCDB website.
Data sharing and analyses via BOLD
In complex collaborative projects involving multiple donors and institutions, it is the responsibility of all project participants to ensure that the interests and rights of each person and institution are respected, and that their input is acknowledged in resulting publications in proportion to their contribution to the study and to the preparation of these publications.
All collaborators will be registered as project users on BOLD. This gives them direct access to the management console for their project, so that they can monitor progress and run analyses using the tools provided in BOLD before the results are made publicly available. They are also welcome to download the sequences and analyze them using alternative analytical algorithms and software. However, sequence submissions and changes to sequence data contained in BOLD can only be made by authorized BIO laboratory staff and database managers. Additionally, systematists can edit specimen information (e.g., update taxonomy) directly on the webpage. BOLD serves not only as a data repository, but also as a communication platform between the researchers collaborating on a given barcoding project and the analytical lab. For example, the taxonomic browser integrated in BOLD shows which species have been barcoded and which species are still needed. The integrated LIMS cross-reference allows real-time monitoring of progress and evaluation of sequencing success. Project users are encouraged to verify the accuracy and completeness of the data contained in their projects. As a general rule, all cases of conflict between DNA-inferred similarities and morphological assignment require additional investigation to confirm whether they reflect human error (in sampling, identification, or analysis) or true biological phenomena deserving further study.
A newly created BOLD project is closed from public view. External Collaborators directly involved in it are provided with secure access to view the specimen data and analytical results via the BOLD online interface. Requests for access to active project details, and reports of possible data errors and inconsistencies, should be submitted to the BOLD system administrator through the respective project managers. Sequence data contained in BOLD projects with restricted access may be used by the BOLD identification engine to provide DNA-based taxonomic identifications to public users submitting DNA barcode sequences. The reports generated by the BOLD identification engine include probability scores and tree-based identification with branch labels containing taxonomic names and broad geographic localization (to province level). Both External Collaborators and their Project Coordinators may use unpublished project data in other simultaneously prepared and submitted publications (e.g., specialized taxonomic revisions or new species descriptions). However, this has to be negotiated on a case-by-case basis with all relevant project participants. It is desirable that this intent be stated clearly at the initial stages of prospective projects.
Once the initial specimen data submission has been made to BOLD, provenance information and images become partially available to the public online through the BOLD Taxonomy Browser. This information is used to generate summary statistics and illustrative distribution maps and does not disclose the contents of individual research projects and specimen data records.
All sequence data contained in BOLD (including unpublished projects) are used by the BOLD identification engine to provide DNA-based taxonomic identifications to public users submitting DNA barcode sequences. Reports generated by the BOLD identification engine include probability scores and tree-based identification with branch labels containing detailed taxonomic names, broad geographic localization (to province level), and corresponding BOLD Process IDs. Information on individual specimens (museum catalogue numbers and place of voucher deposition) and their detailed geographic origin is not disclosed through the BOLD identification engine.
Sequence data that were uploaded to BOLD prior to July 1st, 2010 will follow the prior policy on data release, under which COI sequences are not publicly accessible unless the project coordinator submits a specific request to release them. Normally, the collaborator and the project coordinator agree on how and when to publish the data. The collaborator can also share data with other people he/she wants to include in the project. The sequence data and progress are kept confidential between the collaborators and the project coordinator for a particular project. Starting July 1st, 2010, a new sequence data release policy will be enacted, which is in line with typical genomics project requirements.
As the producer of a community resource (the DNA barcode library), the CCDB is expected to make genomic data publicly available within one week following their generation (see http://www.ncbi.nlm.nih.gov/). The information disclosed includes the DNA sequence, associated sequence trace files, BOLD Process ID (individual accession number automatically generated by BOLD), taxonomic position down to the ordinal level, and the country of origin. Information on associated voucher specimens, their detailed geographic origin, and detailed taxonomic position remains confidential at this pre-publication stage. The rapid release principle applies to all genomic information generated by the CCDB through support from its key funding agencies. In the genomics community, this data release model has worked well, and both publication and sequence annotation follow this preliminary release of raw sequence data.
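The split between what is disclosed at the rapid-release stage and what stays confidential can be sketched as a simple field filter. This is only an illustration of the rule as stated above, not BOLD's actual data model: the record keys, the function name `early_release_view`, and the example values are hypothetical.

```python
# Fields disclosed at the pre-publication (rapid release) stage, per the
# policy text: sequence, trace files, Process ID, taxonomy down to order,
# and country of origin. The key names themselves are assumptions.
EARLY_RELEASE_FIELDS = (
    "process_id",
    "sequence",
    "trace_files",
    "phylum",
    "class",
    "order",
    "country",
)


def early_release_view(record):
    """Return only the fields that are made public before publication."""
    return {key: record[key] for key in EARLY_RELEASE_FIELDS if key in record}


# Hypothetical specimen record; every value here is invented.
record = {
    "process_id": "EXAMP001-10",
    "sequence": "ACTTTATATTTTATTTTTGG",
    "trace_files": ["EXAMP001-10_F.ab1", "EXAMP001-10_R.ab1"],
    "phylum": "Arthropoda",
    "class": "Insecta",
    "order": "Lepidoptera",
    "family": "Noctuidae",            # finer taxonomy stays confidential
    "species": "Example species",     # stays confidential
    "country": "Canada",
    "province": "Ontario",            # detailed origin stays confidential
    "catalogue_number": "XYZ-0001",   # voucher details stay confidential
}

public = early_release_view(record)
print(sorted(public))
```

Using an explicit allow-list rather than a list of hidden fields means that any field added to a record later stays confidential by default.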
Full sequence and specimen data submitted to BOLD will be made publicly available through BOLD and GenBank upon project publication. The decision to publish a set of barcode data is made by the project manager, subject to approval by all project contributors. If the data contained in a closed project remain unchanged for a period of over one year, and no manuscript is submitted for publication by the participating external collaborator(s) for a period of over two years since the submission of DNA barcodes, the project is designated as “orphaned”. If funding for the analyses was provided through grants to CCDB/BIO, and no response is received from the External Collaborator on the intended publication schedule, BIO/CCDB retains the right to make the online contents of an “orphaned” project publicly available through the BOLD web site, subject to approval by the CCDB/BIO and iBOL administration.
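The two conditions that define an “orphaned” project can be written as a small check. This is a sketch of the rule as worded above, not a tool used by BIO/CCDB: the function name `is_orphaned`, its parameters, and the reading of “one year” as 365 days are assumptions.

```python
from datetime import date, timedelta

ONE_YEAR = timedelta(days=365)    # assumption: a year is counted as 365 days
TWO_YEARS = timedelta(days=730)


def is_orphaned(last_data_change, barcodes_submitted, manuscript_submitted, today):
    """Return True when both conditions of the 'orphaned' rule hold.

    1. The project data have been unchanged for over one year.
    2. No manuscript has been submitted for over two years since the
       DNA barcodes were submitted.
    """
    unchanged_too_long = (today - last_data_change) > ONE_YEAR
    no_manuscript_too_long = (
        not manuscript_submitted
        and (today - barcodes_submitted) > TWO_YEARS
    )
    return unchanged_too_long and no_manuscript_too_long


# Hypothetical project timeline.
print(is_orphaned(
    last_data_change=date(2011, 6, 1),
    barcodes_submitted=date(2010, 6, 1),
    manuscript_submitted=False,
    today=date(2013, 1, 1),
))  # True
```

Orphaned status alone does not release the data: the text above also requires that the analyses were funded through grants to CCDB/BIO, that the External Collaborator has not responded about a publication schedule, and that the CCDB/BIO and iBOL administration approve.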