Frequently Asked Questions

BLAST

What version of BLAST is implemented in the Hydra 2.0 BLAST tool?
What BLAST programs are available?
What BLAST databases are available?
What do the boxes in the 'Links' column of the BLAST output represent?

Genome Browser

What tracks are available for viewing in the Hydra 2.0 genome browser?
How can I navigate the Hydra 2.0 genome using JBrowse?
Is the Hydra 2.0 genome browser searchable?
Is there a preferred Web browser I should use to view the Genome Browser?
What are the label descriptions for the Protein BLAST tracks in JBrowse?

View a Gene Page

How do I search for a gene using the Hydra 2.0 View an Augustus Gene Model Page?
What type of information is available in the Hydra 2.0 View an Augustus Gene Model Page?

Fetch a Scaffold

How do I download a single genomic scaffold sequence?
Is there a way to download a partial scaffold sequence?
What are the other options for fetching a scaffold sequence?

Pfam Domains

What options are available for searching the Hydra 2.0 genome for Pfam-A domains?
How were Pfam domains identified in the Hydra 2.0 genome?
What is the structure of the definition line when sequences are downloaded from the Pfam-A domains page?

In Situ Images

How were the in situ images compiled? How can I submit my own in situ images?
What annotation is provided with each Hydra in situ image?
How do I search for an in situ image for my gene of interest?
How can I download a high-resolution copy of a Hydra in situ image?

Download Sequences

Where can I download the full Hydra 2.0 genome assembly?
What is the convention used for naming scaffolds?
How are the gene identifiers generated?
Where can I download the Hydra 2.0 functional annotation?

Miscellaneous

How should data derived from the Hydra 2.0 Genome Project Portal be cited?
How can I contact the Web site administrator regarding technical issues?
Where can I get additional information about the Hydra 2.0 Genome Project?
How frequently are the data and portal tools updated?

What version of BLAST is implemented in the Hydra 2.0 BLAST tool?

We use SequenceServer, which implements a BLAST+ server with an intuitive user interface for use over the web. SequenceServer (version 1.0.9) is free and open source, is provided by Queen Mary University of London, and is subject to all Terms & Conditions set forth by its developers. We are currently running version 2.2.31 of the BLAST+ executables from NCBI.

What BLAST programs are available?

BLASTN: Compares a nucleotide query sequence against a nucleotide sequence database.
BLASTP: Compares an amino acid query sequence against a protein sequence database.
BLASTX: Compares a nucleotide query sequence translated in all reading frames against a protein sequence database.
TBLASTN: Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
TBLASTX: Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
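
As a rough illustration, the program descriptions above can be reduced to a lookup from query type and database type to the matching program(s). This is only a sketch: the dictionary and the function name programs_for are hypothetical and are not part of the portal or of BLAST+.

```python
# Query type and database type for each program, as described in the list above.
# The names BLAST_PROGRAMS and programs_for are illustrative only.
BLAST_PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide"),
    "blastp": ("protein", "protein"),
    "blastx": ("nucleotide", "protein"),
    "tblastn": ("protein", "nucleotide"),
    "tblastx": ("nucleotide", "nucleotide"),
}


def programs_for(query_type, database_type):
    """Return the programs that accept this query/database combination."""
    return sorted(
        name
        for name, types in BLAST_PROGRAMS.items()
        if types == (query_type, database_type)
    )
```

For example, a protein query against the Hydra 2.0 Genome database (nucleotide) would call for TBLASTN.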

What BLAST databases are available?

Nucleotide Sequence Databases:

Augustus Gene Models: A BLAST database consisting of 36,059 Hydra 2.0 Augustus gene prediction models.
Hydra 2.0 Genome: A BLAST database containing all 5,525 Hydra 2.0 genomic scaffolds.
Juliano Trinity (JT): A BLAST database of assembled Hydra transcripts. Hydra RNA-seq reads from Juliano et al. 2014 (PNAS) were assembled using Trinity into 46,543 transcripts.
Juliano aepLRv2: A BLAST database containing 38,749 transcripts from the low redundancy transcriptome assembly for Hydra vulgaris AEP.
Petersen Trinity (PT): A BLAST database of assembled Hydra transcripts. Hydra RNA-seq reads from Petersen et al. 2015 (Molecular Biology and Evolution) from time points 0-12 hours were assembled using Trinity into a de novo transcriptome consisting of 36,338 transcripts.

Protein Sequence Databases:

Augustus Protein Models: A BLAST database of 36,059 translated proteins derived from the Hydra 2.0 Augustus gene prediction models.

What do the boxes in the 'Links' column of the BLAST output represent?

For each significant BLAST alignment, the 'Links' column provides clickable hyperlinks to the corresponding entries in the Hydra 2.0 Genome Browser (B) and the View a Gene Page (G).

What tracks are available for viewing in the Hydra 2.0 genome browser?

Aligned ATACseq tracks: Aligned ATACseq reads displayed as quantitative data in an x/y plot. ATACseq reads (replicates 1, 2, and 3) were generated for whole homeostatic Hydra. The Juliano_whole_peaks track is a consensus peak track consisting of all peaks that passed an IDR threshold of 0.1 for at least one pairwise comparison among the three biological replicates (Juliano_whole_rep1-3).
Aligned RNA-seq tracks: RNA-seq reads from Petersen et al. 2015 (Molecular Biology and Evolution) from time points 0-48h were aligned to the Hydra 2.0 genome using HISAT2. Aligned reads tracks display all aligned RNA-seq reads. BigWig XY tracks display the aligned RNA-seq reads as quantitative data in an x/y plot.
Juliano_Trinity: RNA-seq reads from Juliano et al. 2014 (PNAS) were assembled into Trinity transcripts. The Trinity assembled transcripts were aligned to the Hydra (2.0) genome using Splign.
Juliano_aepLRv2: Low redundancy transcriptome assembly for Hydra vulgaris AEP (38,749 transcripts).
Petersen_Trinity: RNA-seq reads from Petersen et al. 2015 (Molecular Biology and Evolution) from time points 0-12 h were used to assemble a de novo transcriptome, which consists of 36,338 high-quality transcripts. The Trinity assembled transcripts were aligned to the Hydra (2.0) genome using Splign.
AUGUSTUS: Hydra 2.0 gene models predicted using AUGUSTUS.
AUGUSTUS_unmasked: Hydra 2.0 gene models predicted using AUGUSTUS with the unmasked genome.
FGENESH: Hydra 2.0 gene models predicted using FGENESH.
PASA: RNA-seq reads from Juliano et al. 2014 (PNAS) were assembled into Trinity transcripts. Transcripts were used to create gene structures based on spliced alignments with PASA.
Hydra1.0_JGI: Hydra 1.0 gene models (from JGI) aligned to the Hydra 2.0 genome assembly using Splign.
Hydra1.0_NCBI: Hydra 1.0 gene models (from NCBI) aligned to the Hydra 2.0 genome assembly using Splign.
BLASTP_augustus_vs_NCBI_nr: BLASTP results from using Augustus protein models as queries against the NCBI nr protein database with an e-value cutoff of 1e-5.
BLASTP_augustus_vs_UniProt: BLASTP results from using Augustus protein models as queries against the UniProt protein database with an e-value cutoff of 1e-5.
PASA_coding_regions: Candidate coding regions identified from PASA gene structures by TransDecoder.
AUGUSTUS_PFAM: Hydra 2.0 protein domains derived from PFAM HMMscans with an e-value cutoff of 1e-6 using the AUGUSTUS datasets.
6FRAMES_PFAM: Hydra 2.0 protein domains derived from PFAM HMMscans with an e-value cutoff of 1e-6 using the six-frame translations of the Hydra 2.0 genome.
Reference Sequence: The Hydra 2.0 genomic sequence and corresponding six-frame translations depicted when fully zoomed-in.
MASK: Genomic regions that have been masked using RepeatMasker and RepeatModeler are highlighted in light blue.
SCF: Assembled genomic scaffolds (SCF) appear as solid black tracks with intermittent gaps shaded bright pink.

How can I navigate the Hydra 2.0 genome using JBrowse?

A user can zoom in or out and pan left or right on a region of interest by clicking on the appropriate icons centered on the blue toolbar. In addition, available tracks can be shown or hidden by clicking on the appropriate track label on the left sidebar of the browser window.

Is the Hydra 2.0 genome browser searchable?

The genome browser is currently searchable using Hydra 2.0 scaffold (e.g., Sc4wPfr_97) or gene (e.g., Sc4wPfr_97.g13.t1) identifiers.

What are the label descriptions for the Protein BLAST tracks in JBrowse?

The Protein BLAST tracks are labeled with the BLAST program used (e.g., BLASTP), a descriptor of the query sequence (e.g., Augustus) and the target database (e.g., UniProt). Each Augustus protein model within each Protein BLAST JBrowse track is labeled with the target protein accession number (e.g., P09327), the percent ID, the E-value and the protein description (e.g., Villin-1_Homo_sapiens).

Is there a preferred Web browser I should use to view the Genome Browser?

The Hydra 2.0 Genome Project Portal was developed and tested using Firefox and Google Chrome. Other Web browsers, including Internet Explorer and Safari, are expected to render the portal correctly, but this is not guaranteed. Clearing browser cookies and refreshing the JBrowse page will typically resolve any track feature rendering issues.

How do I search for a gene using the Hydra 2.0 View an Augustus Gene Model Page?

The Hydra 2.0 View an Augustus Gene Model Page is accessible from the home page left sidebar and is searchable using a Hydra 2.0 Augustus gene identifier (e.g., Sc4wPfr_97.g13.t1), either by selecting an identifier from the GeneID drop-down menu or by entering an identifier in the appropriate search box.

What type of information is available in the Hydra 2.0 View an Augustus Gene Model Page?

Each record in the View an Augustus Gene Model Page represents a single Hydra 2.0 Augustus gene and provides the following annotation: the scaffold where the gene is located, a link to view that gene in the Genome Browser, protein and nucleotide sequences, genomic coordinates of the coding exons, pre-computed BLAST hits from UniProt and nr displaying the top hits for each protein, PFAM domains, Blast2GO and Argot^2.5 functional annotation, and in situ images.

How do I download a single genomic scaffold sequence?

A user can enter a ScaffoldID (e.g., Sc4wPfr_307) in the "Fetch Scaffold" textbox to return a single FASTA-formatted Hydra scaffold sequence.

Is there a way to download a partial scaffold sequence?

A partial scaffold sequence can be retrieved by entering a ScaffoldID (e.g., Sc4wPfr_307) while also specifying the relative beginning and ending coordinates in the "Fetch Scaffold" textbox.

What are the other options for fetching a scaffold sequence?

A user can optionally retrieve either a reverse complement or the six-frame translation of a scaffold or partial scaffold by selecting the appropriate "Fetch Scaffold" search option.
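
For users who prefer to work with a downloaded scaffold offline, the three "Fetch Scaffold" options described above (partial sequence, reverse complement, six-frame translation) can be approximated with a short script. This is a minimal sketch and not the portal's own code. It assumes that coordinates are 1-based and inclusive, that the standard genetic code applies, and that codons containing ambiguous bases are reported as 'X'; all function names are hypothetical.

```python
from itertools import product

# Complement table for unambiguous bases plus N (upper and lower case).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

# Standard genetic code, listed in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def subsequence(seq, start, end):
    """Return seq[start..end], ASSUMING 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates out of range")
    return seq[start - 1:end]


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return translations for frames +1..+3 and -1..-3."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

The frame labels (+1 to +3, -1 to -3) are a common convention chosen for this sketch; the portal's own output may label frames differently.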

What options are available for searching the Hydra 2.0 genome for Pfam-A domains?

You can elect to search the Hydra 2.0 Augustus Protein Models, six-frame translations of scaffolds, or both ("Both"). You may either select a domain name or domain accession number from the drop-down menus, or enter a domain name or Pfam accession number in the search box.

How were Pfam domains identified in the Hydra 2.0 genome?

We used hmmscan from the HMMER suite to search the Hydra 2.0 Augustus Protein Models and the six-frame translations of the Hydra 2.0 genome for domains from the Pfam-A database (version 29). We filtered the hmmscan output using an e-value cutoff of 1e-6. For more information on Pfam, visit http://pfam.xfam.org/.
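
The filtering step described above amounts to keeping only hits at or below the e-value cutoff. The sketch below works on in-memory records rather than on real hmmscan output. The DomainHit record, the function name, and the choice of an inclusive cutoff are assumptions made for illustration, and it does not address which of the e-values reported by hmmscan was used.

```python
from typing import NamedTuple

EVALUE_CUTOFF = 1e-6  # cutoff stated in the answer above


class DomainHit(NamedTuple):
    """Hypothetical record for one domain hit (not an hmmscan format)."""
    protein_id: str
    domain: str
    evalue: float


def filter_hits(hits, cutoff=EVALUE_CUTOFF):
    """Keep hits with e-value <= cutoff (inclusive bound is an assumption)."""
    return [hit for hit in hits if hit.evalue <= cutoff]
```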

What is the structure of the definition line when sequences are downloaded from the Pfam-A domains page?

If the "Full-length protein sequence(s), in FASTA format" option is chosen, the definition line will contain the protein identifier only (e.g., >Sc4wPfr_153.g7865.t1). If the "Pfam-A domains of selected protein(s), in FASTA format" option is chosen, the definition line will contain the protein identifier, the name of the query domain, and the coordinate range of the domain in that protein model (e.g., >Sc4wPfr_153.g7865.t1|2Fe-2S_thioredx:178-606).
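
The two definition-line layouts described above can be split apart with a few string operations. This is a sketch only: the function name and the returned dictionary keys are hypothetical, and it assumes that protein identifiers never contain '|' and that the coordinate range always follows the last ':' on the line.

```python
def parse_definition_line(line):
    """Split a Pfam-A download definition line into its parts.

    Handles both layouts from the FAQ:
      >PROTEIN_ID
      >PROTEIN_ID|DOMAIN_NAME:START-END
    """
    text = line.lstrip(">").strip()
    protein_id, sep, domain_part = text.partition("|")
    if not sep:
        # Full-length protein download: identifier only.
        return {"protein_id": protein_id, "domain": None,
                "start": None, "end": None}
    # Domain names may contain hyphens, so split on the last ':' first.
    domain, _, coords = domain_part.rpartition(":")
    start, _, end = coords.partition("-")
    return {"protein_id": protein_id, "domain": domain,
            "start": int(start), "end": int(end)}
```

Applied to the example above, this would yield the protein identifier Sc4wPfr_153.g7865.t1, the domain name 2Fe-2S_thioredx, and the coordinates 178 and 606.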

How were the in situ images compiled? How can I submit my own in situ images?

The in situ images were generated by numerous hydroid biologists, with the images spanning many years of Hydra research. We encourage investigators who have in situ images that would benefit the hydroid biology community to contribute their images to this collection. For information on how to submit images, please contact Rob Steele at resteele@uci.edu.

What annotation is provided with each Hydra in situ image?

Each Hydra in situ image is annotated with a gene symbol (e.g., AXIN), a Hydra 2.0 Augustus gene identifier (e.g., Sc4wPfr_365.g14938.t1), gene expression localization information (e.g., ectoderm of head), its developmental stage (e.g., Adult polyp with bud), and the name of the image submitter.

How do I search for an in situ image for my gene of interest?

Hydra in situ images are accessible from the Hydra web site’s home page by clicking on the In Situ Images link in the sidebar. Once on the In Situ Images landing page, users may view all Hydra in situ images by clicking the “View All Images” button. Images are also searchable by selecting identifier(s) from a drop-down menu (e.g., gene symbol(s), gene identifier(s), stage(s), or submitter(s)) or by entering a search term in the appropriate search box. Search results are then displayed, showing the Hydra thumbnail image and its respective annotation.

How can I download a high-resolution copy of a Hydra in situ image?

To download a high-resolution Hydra in situ image as a JPEG file, click on a thumbnail image on either the search results page or at the bottom of a Gene page entry. Then click on the download arrow in the lower right of the in situ image and save the JPEG file to your computer.

Where can I download the full Hydra 2.0 genome assembly?

The full genome assembly, consisting of 5,525 scaffolds, is available for download from the sidebar "Download Sequences -> Genome" search option. Optionally, users can download a single scaffold by entering a scaffold identifier in the search box or by selecting one from the drop-down box.

What is the convention used for naming Hydra 2.0 scaffolds?

Scaffolds (e.g., Sc4wPfr_XXXX.X) are named as follows:

Sc4wPfr = the scaffold prefix.
XXXX = a one- to four-digit, non-padded number. There are 5,525 scaffolds.
.X = scaffold version (optional)
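
The naming convention above translates naturally into a regular expression. The pattern below is a sketch, not an official validator. It assumes that "non-padded" means no leading zeros, that scaffold numbers start at 1, and that the optional version is an integer; the function name is hypothetical.

```python
import re

# Prefix, 1-4 digit number without leading zeros, optional ".version".
SCAFFOLD_ID = re.compile(r"^Sc4wPfr_([1-9][0-9]{0,3})(?:\.([0-9]+))?$")


def parse_scaffold_id(identifier):
    """Return (number, version) or None if the identifier does not match.

    version is None when the optional suffix is absent.
    """
    match = SCAFFOLD_ID.match(identifier)
    if match is None:
        return None
    number = int(match.group(1))
    version = int(match.group(2)) if match.group(2) is not None else None
    return number, version
```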

How are the gene identifiers generated?

Augustus genes (e.g., Sc4wPfr_X.gX.tX) are named as follows:

Sc4wPfr_X = corresponds with the scaffold where the gene is located.
gX = a non-padded gene number that is unique in combination with the scaffold ID. Gene numbers usually follow the order of each gene's most 5' position on the scaffold, but this is not a requirement: a newly added gene gets the next highest unused integer regardless of its position.
tX = a non-padded number corresponding to the transcript or isoform. The first reported is '1', the second is '2', and so on.
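
A gene identifier built this way can be broken back into its scaffold, gene number, and transcript number. The sketch below assumes that the scaffold portion carries no version suffix (as in the examples on this page) and that gene and transcript numbers are positive integers without leading zeros; the function name is hypothetical.

```python
import re

# Scaffold ID, then ".g<gene number>", then ".t<transcript number>".
GENE_ID = re.compile(r"^(Sc4wPfr_[1-9][0-9]{0,3})\.g([1-9][0-9]*)\.t([1-9][0-9]*)$")


def parse_gene_id(identifier):
    """Return (scaffold_id, gene_number, transcript_number) or None."""
    match = GENE_ID.match(identifier)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), int(match.group(3))
```

Because the gene number is only unique in combination with the scaffold ID, the scaffold and gene number together are needed to identify a gene.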

Where can I download the Hydra 2.0 functional annotation?

Functional annotation, generated using both Blast2GO and Argot^2.5, is available for download from the sidebar “Download Sequences/Images -> Functional Annotation” search option.

How should data derived from the Hydra 2.0 Genome Project Portal be cited?

Please cite this Web site:

https://research.nhgri.nih.gov/hydra/

How can I contact the Web site administrator regarding technical issues?

Please send any Web site usability or technical correspondence to bioinformatics@nhgri.nih.gov.

Where can I get additional information about the Hydra 2.0 Genome Project?

For additional information, comments, or questions regarding the Hydra 2.0 Genome Project, please contact Dr. Rob Steele directly at resteele@uci.edu.

How frequently are the data and portal tools updated?

Changes to the Web site and underlying data are documented in the Release History.

NHGRI Division of Intramural Research

Hydra 2.0 Genome Project Portal
