Exploring Non‐Coding RNAs in RNAcentral

Non‐coding RNAs are essential for all life and carry out a wide range of functions. Information about these molecules is distributed across dozens of specialized resources. RNAcentral is a database of non‐coding RNA sequences that provides a unified access point to non‐coding RNA annotations from >40 member databases and helps provide insight into the function of these RNAs. This article describes different ways of accessing the data, including searching the website and retrieving the data programmatically over web APIs and a public database. We also demonstrate an example Galaxy workflow for using RNAcentral for RNA‐seq differential expression analysis. RNAcentral is available at https://rnacentral.org. © 2020 The Authors.


INTRODUCTION
RNAcentral (https://rnacentral.org) is a database of non-coding RNA (ncRNA) sequences that aggregates ncRNA data from >40 member resources known as Expert Databases (Bateman et al., 2011; The RNAcentral Consortium, 2019). RNAcentral is designed as a single-entry point for biologists and bioinformaticians interested in ncRNAs, where they can find a high-level overview of ncRNA content in different species or taxonomic groups, as well as functional information about individual ncRNAs. This includes RNA secondary structure, genome locations, Rfam annotations (see Current Protocols article: Kalvari et al., 2017), orthologs and paralogs, miRNA targets, RNA modifications, and more. In addition to the data from member resources, RNAcentral generates additional annotations, such as comprehensive genome mapping for >350 reference genomes (The RNAcentral Consortium, 2019) and template-based RNA secondary structure diagrams.
RNAcentral provides four key functionalities: (1) Viewing information about individual ncRNA sequences; (2) Text search that enables exploration of ncRNA sequences from different sources; (3) Sequence search for performing sequence similarity queries against a comprehensive set of ncRNA sequences; (4) FTP archive with downloadable files, including genome annotations in BED and GFF3 formats.
The following four basic protocols describe how to use the main RNAcentral features, focusing on advanced methods for data access. In addition, three support protocols discuss programmatic data retrieval using web APIs or a public Postgres database, and describe an example Galaxy workflow (Afgan et al., 2018) for analyzing RNA-seq data using RNAcentral.
As the RNAcentral database and the website are under active development, the most recent RNAcentral version may contain new types of data or other functionality not covered here. This article is based on RNAcentral release 14.

VIEWING RNAcentral SEQUENCE REPORTS
At the time of this writing, RNAcentral contains >16 million non-redundant sequences from a wide range of species. Each sequence has a dedicated report page that always includes the following information: ncRNA sequence, cross-reference(s) to the databases where this sequence is annotated, and its unique accession number (see Guidelines for Understanding Results for more information about RNAcentral identifiers).
Depending on the ncRNA type, organism, and source database, the report pages may include one or more additional sections: RNA secondary structure, an embedded genome browser, Rfam classification (see Current Protocols article: Kalvari et al., 2017), Gene Ontology (GO) terms (Huntley et al., 2014), orthologs and paralogs from Ensembl Compara (Pignatelli et al., 2016), microRNA targets from TarBase (Karagkouni et al., 2018) and LncBase (Paraskevopoulou et al., 2016), modifications from Modomics (Machnicka et al., 2013), literature references, sequence feature viewer, and more.
2. Click on the single search result to view the sequence report.
3. Explore the report page (Fig. 2), focusing on the RNA type, source databases, and genomic neighborhood. Most sections of the webpage are interactive. For example, you can click on the RNAcentral transcripts in the genome browser (Fig. 2C) to find what database these sequences come from, or you can visualize the GO term hierarchy by clicking on the tree icons (Fig. 2E).
4. Follow the links to the source databases to find additional information about this RNA. For example, in miRBase (Kozomara, Birgaoanu, & Griffiths-Jones, 2019) you can view deep sequencing data supporting this microRNA as well as >370 papers that mention this RNA, while in MalaCards (see Current Protocols article: Rappaport et al., 2014) you can find details about the association of this microRNA with breast cancer and hepatocellular carcinoma, as well as pancreatic and prostate cancers.
5. Identify other sequences in the same genomic locations. The embedded genome browser (Fig. 2C) shows the sequence in the context of the reference genome, including other ncRNAs as well as protein-coding genes and pseudogenes found in the genomic neighborhood.
It is important to view sequence reports for all ncRNAs in a genomic region of interest. For example, the genome browser shows another precursor microRNA sequence (URS0000EFBE70_9606) in the same region provided by MirGeneDB (Fromm et al., 2020; Fig. 2C). This discrepancy is caused by the differences in the annotation methods between the two databases. See Guidelines for Understanding Results for more information about interpreting such cases.
Sweeney et al.

of 25
Current Protocols in Bioinformatics

6. Click the "Taxonomy" tab to view a list of species where the RNA sequence also occurs. Note that this tab only shows the entries with the 100% identical sequence. For other related sequences, explore the "Related RNAs" section of the report page that shows ortholog and paralog sequences retrieved from Ensembl Compara.
7. Click the "Download" button to retrieve the ncRNA sequence in FASTA format or get the sequences and the annotations in JSON format.

USING RNAcentral TEXT SEARCH TO EXPLORE ncRNA SEQUENCES
The text search enables users to query RNAcentral by species, gene name, RNA type, or any other keyword. The search can be used for exploring the data from >40 databases on the RNAcentral website as shown in this protocol, or it can be used programmatically via an API (see Support Protocol 1).

Necessary Resources

Hardware
Any device with Internet access

Software
An up-to-date Web browser, such as Chrome, Safari, or Firefox

Browse all RNAcentral sequences
1. Start at the RNAcentral homepage (https://rnacentral.org) and click "Browse sequences" (Fig. 1).
The key feature of the search interface is the 'facets', which facilitate filtering of the results and show how many sequences of each type match the query. For example, the 'RNA types' facet in Figure 3 shows that the majority of RNAcentral sequences come from the ubiquitously found ribosomal RNA (rRNA) and transfer RNAs (tRNAs).

Browse RNA sequences in a species or taxonomic group of interest
2. To list all ncRNAs in a particular species, type the species name in the search bar and select the species in the Organism facet.
a. Alternatively, you can use the syntax:
taxonomy:"NCBI_TAXID"
where NCBI_TAXID is the NCBI taxonomy identifier (taxid) (Federhen, 2012). For example, taxonomy:"9606" will return all sequences from Homo sapiens, which has the assigned NCBI taxid of 9606.
To make sure that the results contain the exact query string, surround the query with double quotes. For example, if you are looking for a specific microRNA like hsa-mir-126, run the following search:
"hsa-mir-126"
A search without double quotes will also match hsa-mir-1261, hsa-mir-1262, and other sequences.
b. Use logic operators.
The search supports logic operators, such as AND, OR, and NOT. For example, one can identify microRNAs from miRBase that are not found in RefSeq or Ensembl (note the use of parentheses for grouping the search terms):
rna_type:"miRNA" AND expert_db:"miRBase" NOT (expert_db:"RefSeq" OR expert_db:"Ensembl")
c. Limit sequence length.
One can restrict the length of the sequences; for example, the following search returns tRNAs between 60 and 100 nucleotides long (it is also possible to use the sequence length slider shown in Fig. 3
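The query syntax described in this protocol (field-specific terms, double quotes for exact matches, logic operators, and parentheses for grouping) can also be assembled programmatically. The following is a minimal sketch; the helper names quote, field, and any_of are hypothetical and are not part of RNAcentral, and only the field names that appear in the examples above are used.

```python
def quote(value: str) -> str:
    """Wrap a value in double quotes so that the search matches it exactly."""
    return '"' + value.replace('"', "") + '"'


def field(name: str, value: str) -> str:
    """Build a field-specific term such as rna_type:"miRNA"."""
    return f"{name}:{quote(value)}"


def any_of(*terms: str) -> str:
    """Group alternative terms with OR inside parentheses."""
    return "(" + " OR ".join(terms) + ")"


# microRNAs from miRBase that are not found in RefSeq or Ensembl
query = " ".join([
    field("rna_type", "miRNA"),
    "AND", field("expert_db", "miRBase"),
    "NOT", any_of(field("expert_db", "RefSeq"), field("expert_db", "Ensembl")),
])
print(query)
```

The resulting string can be pasted into the search bar or passed to the text search API described in Support Protocol 1.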

USING RNAcentral SEQUENCE SEARCH
RNAcentral hosts a sequence-similarity search powered by nhmmer (Wheeler & Eddy, 2013) that enables users to compare any sequence against a collection of ncRNA sequences available in RNAcentral. The RNAcentral sequence search can be used to find similar sequences, check if the exact sequence has been observed before, or confirm that a sequence does not match any known ncRNAs.

Necessary Resources

Hardware
Any device with Internet access

Software
An up-to-date Web browser, such as Chrome, Safari, or Firefox

1. Find similar sequences. Go to the sequence search page (https://rnacentral.org/sequence-search) and enter the following microRNA sequence:

GGGAUGAGGUAGUAGGUUGUAUAGUUUUAGGGUCACACCCACCACUGGGAGAUAACUAUACAAUCUACUGUCUUUC
2. Click the "Show details" button to show information about the match such as e-value, sequence identity, and numbers of matching bases. By default, the results are sorted by e-value, but the "Sort by" drop-down menu allows alternative orderings.
3. Explore the results using the same facets as in the text search (Basic Protocol 2). For example, the query sequence matched the human let-7 precursor microRNA as the top hit; however, one can view hits in other species listed under the Organisms facet. Each result links out to the sequence report page (see Basic Protocol 1). The results can also be filtered by any keyword using the "Search within results" field (Fig. 5). See Basic Protocol 2 for more information on facets and different search strategies.
If a query sequence has an exact match in RNAcentral, it will be looked up as soon as the query is entered in the search box (Fig. 6). This is useful for checking if a sequence is found in RNAcentral.
More information about the RNAcentral sequence search can be found at https://rnacentral.org/help/sequence-search.
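Sequences copied from papers or PDF files often contain line breaks and spaces, which can interfere with finding an exact match. As a minimal sketch, the query can be cleaned up before it is pasted into the search box. The function name normalize_query and the accepted alphabet (A, C, G, U, T, N) are assumptions made for this illustration; they are not taken from the RNAcentral documentation.

```python
def normalize_query(raw: str) -> str:
    """Remove all whitespace, upper-case the sequence, and reject unexpected symbols."""
    sequence = "".join(raw.split()).upper()
    unexpected = set(sequence) - set("ACGUTN")  # assumed alphabet, see note above
    if unexpected:
        raise ValueError(f"Unexpected characters in sequence: {sorted(unexpected)}")
    return sequence


# The example microRNA sequence from step 1, wrapped over two lines
query = normalize_query("""
GGGAUGAGGUAGUAGGUUGUAUAGUUUUAGGGUCACACCCACCACUGGGA
GAUAACUAUACAAUCUACUGUCUUUC
""")
print(query)
```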

USING RNAcentral FTP ARCHIVE
The FTP archive provides users with an easy way to fetch large amounts of data from RNAcentral. The archive contains downloadable files, including sequences in FASTA format, identifier mapping files that can be used to convert between RNAcentral and external identifiers, Rfam annotations, GO annotations, and genome annotations in GFF3 and BED formats. In addition, the archive enables access to previous RNAcentral releases.

Figure 6 Example RNAcentral sequence search lookup using human 5S rRNA sequence (URS00000F9D45_9606) as a query. The sequence is found in 8 species (human and 7 others) which can be explored using the provided links.
In the following protocol, we demonstrate how to compare a set of genomic coordinates with a comprehensive collection of ncRNAs from RNAcentral by intersecting an RNAcentral BED file with an example GFF3 file (the same steps work with BED, VCF, or BAM files). This protocol can be used to find if a genomic region of interest overlaps with RNAcentral sequences as part of RNA-seq data analysis (see Support Protocol 3 for an alternative workflow using Galaxy).

Necessary Resources

Hardware
A computer with access to a UNIX terminal and the Internet

bedtools intersect -a rnacentral.bed -b example.gff3 > output.bed

3. Filter results using RNAcentral-specific metadata BED fields. In addition to the standard fields, the RNAcentral BED files contain two additional columns (RNA type and a list of source databases) that enable selecting a subset of sequences using a command-line tool like grep (Fig. 7). For example, it is possible to filter the genomic regions by RNA type, member database, and other criteria using grep commands, as in the following.

grep FlyBase output.bed > output-flybase.bed

The resulting BED files can be visualized as a custom track in the Ensembl genome browser (Cunningham et al., 2019), viewed locally using IGV (Robinson et al., 2011), or used as an input for downstream bioinformatic analyses.
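A plain grep matches the search word anywhere on the line, so a column-aware filter can be more precise when the RNA type and the source database must be checked separately. The following is a minimal sketch. It assumes that the last two tab-separated columns hold the RNA type and a comma-separated list of source databases, as described above; the function name filter_bed and the example records are made up for illustration.

```python
def filter_bed(lines, rna_type=None, database=None):
    """Yield BED lines whose RNA type and/or source database match.

    Assumption: the last two columns are the RNA type and a
    comma-separated list of source databases.
    """
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header lines
        columns = line.rstrip("\n").split("\t")
        row_type = columns[-2]
        row_databases = [name.strip() for name in columns[-1].split(",")]
        if rna_type is not None and row_type != rna_type:
            continue
        if database is not None and database not in row_databases:
            continue
        yield line


# Two made-up records in the assumed layout
example_lines = [
    "chr2L\t100\t200\tURS_EXAMPLE_A\t0\t+\t100\t200\t0\t1\t100\t0\t.\tsnoRNA\tENA,FlyBase\n",
    "chr2L\t300\t400\tURS_EXAMPLE_B\t0\t-\t300\t400\t0\t1\t100\t0\t.\ttRNA\tENA\n",
]
for record in filter_bed(example_lines, database="FlyBase"):
    print(record, end="")
```

To process a real file, pass an open file handle (for example, open("output.bed")) instead of the example list.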

USING WEB APIs FOR PROGRAMMATIC DATA ACCESS
In addition to accessing the RNAcentral data through the website, it is possible to use it programmatically by taking advantage of two web APIs:
1. The RNAcentral API can be used to retrieve information about individual ncRNA entries;
2. The text search API can be used to search and retrieve information about RNAcentral entries as described in Basic Protocol 2.
These APIs can be used independently or cooperatively, depending on the use case, as explained below.

Necessary Resources

Hardware
A computer with access to a UNIX terminal and the Internet

Software
To interact with the API, you will need to run custom programs in your preferred programming language. The following example is written in Python and requires the requests package.

Using text search API
The RNAcentral text search (see Basic Protocol 2) is powered by the EMBL-EBI Search engine and has a REST API that can be used from any programming language that supports retrieving data over the Internet, such as Python or JavaScript (Madeira et al., 2019). This protocol covers basic usage of the API; for more detailed information about programmatic access, please refer to the EMBL-EBI Search documentation (https://www.ebi.ac.uk/ebisearch) and the RNAcentral API help page (https://rnacentral.org/api).
Here we demonstrate how to programmatically perform a search for hsa-mir-126 sequences from miRBase (Kozomara et al., 2019) using the text search, and then retrieve the description and RNA type of the results. Figure 8 shows the output as of release 14.
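A search of this kind might be sketched as follows. To keep the example self-contained, it uses Python's standard urllib module in place of the requests package. The endpoint path, the query parameters (query, fields, format, size), and the layout of the JSON response (an "entries" list whose items carry "id" and "fields") reflect our understanding of the EMBL-EBI Search REST interface and should be checked against its documentation. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the EMBL-EBI Search REST API for the RNAcentral domain
EBI_SEARCH = "https://www.ebi.ac.uk/ebisearch/ws/rest/rnacentral"


def build_search_url(query, fields=("description", "rna_type"), size=15):
    """Encode the query and the requested fields into a search URL."""
    params = {"query": query, "fields": ",".join(fields),
              "format": "json", "size": size}
    return EBI_SEARCH + "?" + urllib.parse.urlencode(params)


def parse_entries(payload):
    """Return (id, description, rna_type) tuples from a decoded JSON response."""
    results = []
    for entry in payload.get("entries", []):
        fields = entry.get("fields", {})
        description = (fields.get("description") or [""])[0]
        rna_type = (fields.get("rna_type") or [""])[0]
        results.append((entry["id"], description, rna_type))
    return results


def search(query):
    """Run the search over the network and return the parsed entries."""
    with urllib.request.urlopen(build_search_url(query), timeout=30) as response:
        return parse_entries(json.load(response))


url = build_search_url('"hsa-mir-126" AND expert_db:"miRBase"')
print(url)
```

Calling search('"hsa-mir-126" AND expert_db:"miRBase"') requires Internet access; the results depend on the current RNAcentral release and may differ from Figure 8.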

Combining text search and RNAcentral APIs
The RNAcentral text search and the RNAcentral API can be used together to search and access RNAcentral data. For example, in the previous section we retrieved several fields directly from the text search (description and rna_type, see Fig. 8). However, not all of the metadata about the sequence is available in the text search. Most importantly, the nucleotide sequence is not part of the text search index, but it can be accessed via the RNAcentral API.
1b. Use a programming language to look up the sequence for the results from above. In Python, this can be done with the example script shown in Figure 9, which looks up RNAcentral identifiers using a text search query and then loads their sequences from the RNAcentral API.
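The sequence lookup could be sketched as follows, again using only the standard library. This is not the script from Figure 9. The URL pattern (https://rnacentral.org/api/v1/rna/ followed by the URS ID) and the "sequence" key in the JSON response are assumptions that should be verified against the RNAcentral API help page; the function names are hypothetical.

```python
import json
import urllib.request

# Assumed base URL for individual entries in the RNAcentral API
API_ROOT = "https://rnacentral.org/api/v1/rna"


def entry_url(identifier: str) -> str:
    """Build the entry URL from a URS ID, dropping any _taxid suffix."""
    urs_id = identifier.split("_")[0]
    return f"{API_ROOT}/{urs_id}"


def fetch_sequence(identifier: str) -> str:
    """Download the entry as JSON and return its nucleotide sequence."""
    request = urllib.request.Request(
        entry_url(identifier), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["sequence"]  # assumed key name


print(entry_url("URS00004BFD1E_9606"))
```

Identifiers returned by the text search can be passed to fetch_sequence one at a time; this call requires Internet access.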

USING PUBLIC POSTGRES DATABASE TO EXPORT LARGE DATASETS
A public copy of the RNAcentral Postgres database is made available in order to enable users to query RNAcentral in any programming language with database connectivity. This functionality can be used to automate data export or to export large datasets that cannot be downloaded from the RNAcentral website. The database is updated with each RNAcentral release and contains a copy of the data available through the RNAcentral website.
The database connection details can be found in Table 1. We recommend using a Postgres client like DBeaver or PgAdmin for exploring the schema and testing SQL queries, but for exporting large volumes of data, it is best to use a command-line client.

The file output.fasta will contain the desired subset of RNAcentral sequences in FASTA format.
More information about using the RNAcentral Postgres database can be found at https://rnacentral.org/help/public-database.
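As a rough illustration of exporting a subset of sequences, the sketch below formats query results as FASTA. The SQL text is only an example: the table and column names (rnc_rna_precomputed, rna, upi, seq_short, seq_long, taxid, rna_type) are assumptions that should be checked against the actual schema in a Postgres client. The function rows_to_fasta is hypothetical and accepts any iterable of (identifier, description, sequence) rows, such as the rows returned by a database cursor.

```python
# Example query; the table and column names are assumptions (see note above)
EXAMPLE_SQL = """
SELECT pre.id, pre.description, COALESCE(rna.seq_short, rna.seq_long)
FROM rnc_rna_precomputed pre
JOIN rna ON rna.upi = pre.upi
WHERE pre.taxid = 9606 AND pre.rna_type = 'tRNA'
"""


def rows_to_fasta(rows, width=60):
    """Format (identifier, description, sequence) rows as FASTA text."""
    records = []
    for identifier, description, sequence in rows:
        header = f">{identifier} {description}".rstrip()
        # Wrap the sequence into lines of at most `width` characters
        chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
        records.append("\n".join([header] + chunks))
    return "\n".join(records) + "\n" if records else ""


# A made-up row, used only to show the output layout
print(rows_to_fasta([("URS_EXAMPLE_9606", "example tRNA", "GCAUUGGUGGUUCAGUGG")], width=10), end="")
```

For large exports, the rows can be written to the output file in batches instead of being collected into one string.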

ANALYZE NON-CODING RNA IN RNA-seq DATASETS USING RNAcentral AND GALAXY
RNA-seq experiments can provide information about gene expression in the cells of interest. There is a wide range of RNA-seq technologies (Stark, Grzelak, & Hadfield, 2019) targeting different types of transcripts. For example, TGIRT-seq (Nottingham et al., 2016) uses the thermostable group II intron reverse transcriptase and can process highly structured, short RNAs, such as tRNAs and snoRNAs (Boivin et al., 2018, 2020). In the following protocol, we demonstrate an example RNA-seq workflow using Galaxy, a web platform that enables users to perform computational workflows in the cloud (Afgan et al., 2018). We will analyze a single-end RNA-seq dataset based on Drosophila melanogaster S2 cells sequenced under normal conditions and amino-acid starvation (project PRJNA601750, https://www.ebi.ac.uk/ena/data/view/PRJNA601750). We will use the RNAcentral ncRNA annotations to compare ncRNA expression between the two conditions.

Necessary Resources

Hardware
Any device with Internet access

Software
An up-to-date Web browser, such as Chrome, Safari, or Firefox
Optional: IGV (Robinson et al., 2011).

RNA-seq dataset (FASTQ format)
RNAcentral annotations (GFF3 format)
Reference genome (FASTA format)

1. Import RNA-seq data into Galaxy. An RNA-seq dataset can be uploaded to Galaxy from a local computer or imported directly from the biological databases, an FTP archive, or any public URL.
a. Begin at the Galaxy homepage (https://usegalaxy.org/) and log in to your Galaxy account. Click the "+" icon in the History panel to create a new history and keep track of input and output files (Fig. 11, right).
b. Find the section "Get data" in the Tools panel (Fig. 11, left). Click "Download and Extract Reads in FASTA/Q format from NCBI SRA" in order to import data directly from the NCBI SRA database (Amid et al., 2020). Enter the four SRR accessions from Table 2 as shown in Figure 12 (each accession needs to be imported separately).
2. Import ncRNA annotations from RNAcentral.
b. Go to the Galaxy dashboard and click the arrow icon in the upper right of the Tools panel, select "Paste/Fetch data," enter the GFF3 file address, and specify "gff3" for the file Type (Fig. 13).
3. Upload the reference genome.
a. Select the reference genome to use. Depending on the genome, you can download the FASTA format from Ensembl or the UCSC Genome browser (Kent et al., 2002).

Figure 12 Importing data into Galaxy from the NCBI SRA. Make sure that the output is "Uncompressed fastq."

Figure 13 Uploading RNAcentral genome annotations into Galaxy. Note that the Type is set to "gff3."

Figure 14 An overview of the RNAcentral Galaxy workflow. When viewed on the Galaxy website, the diagram is interactive and can be used to examine the data flow from one tool to the next.
a. At the top of the Galaxy page, locate the "Workflow" tab that contains all available workflows.
b. Click the "Import" button to upload a Galaxy workflow from a local file (see Supporting Information file RNAcentral-single-end-workflow.ga) or follow the URL below and click the "+" icon to add the workflow to your Galaxy account: https://usegalaxy.org/u/rnacentral/w/current-protocols-workflow.
c. Go to the "Workflow" tab again, click the newly imported workflow, and select "Edit" to view the steps (Fig. 14).
5. Prepare RNA-seq data for differential expression analysis. The data files must be grouped by experimental condition into the Galaxy "Dataset Lists" (for example, healthy and cancerous samples, or different stages of embryonic development). All replicates/samples for each condition must be located in one Dataset List. Make sure that your FASTQ files are unzipped.
a. Go to the Galaxy dashboard and click the checkbox icon in the History panel. Select all FASTQ replicates/samples that should be in one list (Fig. 15, left and center).
b. Click "For all selected…" and select "Build Dataset List" to create a folder with the selected samples (Fig. 15, right). In this example, SRR10904051 and SRR10904052 should be grouped and called "control" while SRR10904053 and SRR10904054 should be grouped and called "test" or similar.
If a sample is subdivided into several FASTQ files (which does not apply to this example), then before creating the Dataset List, the FASTQ files from the same sample should be combined using the "Collapse collection" function (available under "Collection operations" in the Tools section).

Detailed instructions on combining multiple samples in collections can be found in a Galaxy tutorial (https://galaxyproject.org/tutorials/collections/).

Run Galaxy workflow.
The main purpose of this workflow is the differential expression (DE) analysis of ncRNAs under different conditions. The workflow aligns the FASTQ files onto the reference genome to produce BAM files, analyses the alignments, counts reads using the RNAcentral annotations, normalizes count matrices, and performs the DE analysis using DESeq2.
a. Click "Workflows" at the top of the webpage, select the imported workflow and choose "Run."
b. Select workflow inputs (Fig. 16):
control: a Dataset List containing the control samples. In this example, it is the dataset list containing files SRR10904051 and SRR10904052 where cells were grown under normal conditions
test: a Dataset List containing samples subjected to amino-acid starvation (files SRR10904053 and SRR10904054)
reference genome: the Drosophila melanogaster reference genome
ncRNA annotation: RNAcentral ncRNA annotations or any other annotations in GFF3 format
c. Change individual tool settings. Depending on the sequencing technology, the strand-specificity options need to be specified for the HISAT2 and featureCounts tools. Since in this example the sample was prepared using the Illumina TruSeq Stranded Total RNA library kit, the "Reversed" option should be selected in test and control steps for HISAT2 and featureCounts (Fig. 17). When following this protocol for analyzing different datasets, make sure to specify the correct strandedness depending on your specific use case.
d. Click "Run workflow."

Figure 15 Creating a "control" Dataset List from SRR10904051 and SRR10904052. This operation should be repeated to create a "test" Dataset List for SRR10904053 and SRR10904054.

Figure 16 Selecting Galaxy workflow inputs. Here "control" and "test" refer to the Dataset Lists with normal and starvation samples.

Figure 17 Configuring HISAT2 and featureCounts tools in Galaxy for analyzing test and control data (all four parameters should be set to "Reversed" for this example).

Examine workflow outputs.
When all tasks in the History panel turn green, the results are ready and you should see the following outputs: A selected_annotation GFF3 file containing ncRNAs with statistically significant changes in expression levels between the control and test conditions. Click the eye icon (Fig. 18, left) to view the data (Fig. 18,

Figure 19 Local IGV browser showing an example differentially expressed lncRNA CR44218 (URS0000068A58_7227).
In addition, the History panel will contain quality control reports from FASTQC, alignments of the reads to the reference genome (BAM format), and Count read matrix for all samples.
The precomputed results are stored in a public Galaxy history and can be viewed at https://usegalaxy.org/u/rnacentral/h/rnacentral-cpb-protocol.
8. Visualize results locally using IGV.
To visualize the genome location of the differentially expressed ncRNAs, you can use the IGV software (Robinson et al., 2011).
a. To specify the species and the genome assembly of the output GFF3 file, find the selected_annotation file in the History panel and click the "Edit attributes" pencil icon. Next, choose the dm6 genome assembly in the "Database/Build" dropdown list.
b. Open the IGV application on your computer and click "display with IGV local" in the Galaxy History panel. You can navigate the genome to zoom in to the differentially expressed RNAs shown in the selected_annotation IGV track (Fig. 19).

RNAcentral identifiers
In RNAcentral, each distinct ncRNA sequence is assigned a Unique RNA Sequence identifier (URS ID), which is stable across releases. As the same sequence can be observed in multiple species, RNAcentral also supports species-specific identifiers (The RNAcentral Consortium, 2017), which consist of the URS ID joined with the NCBI taxid (Federhen, 2012) for the species where the sequence occurs.
For example, URS00004BFD1E_9606 refers to the human hsa-let-7f-1 microRNA, while URS00004BFD1E_9544 refers to the same sequence in rhesus macaque. Note that the URS ID is the same in both cases while 9606 and 9544 are the NCBI taxids for human and rhesus macaque, respectively.
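When processing many identifiers, it can be convenient to split a species-specific identifier into its URS ID and NCBI taxid. The following is a minimal sketch; the regular expression (the prefix URS followed by ten hexadecimal characters) is inferred from the example identifiers in this article and is not an official specification, and the function name is hypothetical.

```python
import re

# Pattern inferred from examples such as URS00004BFD1E_9606 (an assumption)
URS_PATTERN = re.compile(r"^(URS[0-9A-F]{10})(?:_(\d+))?$")


def parse_rnacentral_id(identifier: str):
    """Return (urs_id, taxid); taxid is None when no species suffix is present."""
    match = URS_PATTERN.match(identifier.strip().upper())
    if match is None:
        raise ValueError(f"Not a recognized RNAcentral identifier: {identifier!r}")
    urs_id, taxid = match.groups()
    return urs_id, int(taxid) if taxid else None


print(parse_rnacentral_id("URS00004BFD1E_9606"))
print(parse_rnacentral_id("URS00004BFD1E_9544"))
```

Both calls return the same URS ID with different taxids, which mirrors the human and rhesus macaque example above.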

Genome mapping
RNAcentral genome annotations are based on a comprehensive mapping procedure that aligns all sequences without genome coordinates to the corresponding reference genome (The RNAcentral Consortium, 2019). If a member database provides the genome locations to RNAcentral when submitting the data, these coordinates are used and no mapping is performed. The source of genomic coordinates is specified both in the genome browser (Fig. 2C) and the GFF3 files in the FTP archive.
RNAcentral maps the sequences onto the most recent reference genome assembly from Ensembl. This procedure finds coordinates for ncRNA sequences from a database without an explicit connection to a genome, such as the ENA (Amid et al., 2020). The RNAcentral genome mapping helped improve model organism database annotations. For example, ten D. melanogaster snoRNA genes from the ENA database were added to FlyBase after these sequences were mapped to the fly genome in RNAcentral (The RNAcentral Consortium, 2019).
Note that the current procedure reports all genome alignments, and the results may require additional filtering. For example, short sequences, such as piRNAs, can be similar to multiple genome regions, including those outside the piRNA clusters. Depending on the use case, such sequences can be excluded using a strategy described in Basic Protocol 4.
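One possible filter, shown here only as a sketch and not necessarily the strategy the authors have in mind, is to discard sequences that are reported at more than a chosen number of genome locations. The example assumes that the fourth BED column holds the RNAcentral identifier, as in standard BED files; the function name and the example records are made up.

```python
from collections import Counter


def record_name(line: str) -> str:
    """Return the name field (fourth column) of a BED line."""
    return line.rstrip("\n").split("\t")[3]


def drop_multimapped(lines, max_locations=1):
    """Keep only records whose identifier occurs at most max_locations times."""
    records = [line for line in lines if line.strip() and not line.startswith("#")]
    counts = Counter(record_name(line) for line in records)
    return [line for line in records if counts[record_name(line)] <= max_locations]


# Made-up records: URS_EXAMPLE_X maps to two places, URS_EXAMPLE_Y to one
example = [
    "chr1\t10\t40\tURS_EXAMPLE_X\n",
    "chr2\t50\t80\tURS_EXAMPLE_X\n",
    "chr1\t100\t170\tURS_EXAMPLE_Y\n",
]
print(drop_multimapped(example))
```

The threshold is a design choice: a value of 1 keeps only uniquely mapped sequences, while a larger value tolerates a few alternative locations.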

Alternative Galaxy workflows
Note that the results of the workflow described in Support Protocol 3 require additional interpretation and analysis. The readers are referred to the protocols dedicated to RNA-seq analysis for further information (see Current Protocols article: Ji & Sadreyev, 2018; also see Yalamanchili, Wan, & Liu, 2017). The workflow described above may require modifications if newer software versions become available in Galaxy, or depending on the specific RNA-seq technology. For example, paired-end RNA-seq datasets should be processed using a different workflow (see https://rnacentral.org/help/galaxy for an example).
If ncRNA annotations for the genome of interest are not available in RNAcentral, the genome can be annotated with Rfam covariance models and Infernal (Nawrocki & Eddy, 2013) using the steps outlined in the Rfam protocol (see Current Protocols article: Kalvari et al., 2018). In addition, RNAcentral sequences from a related taxonomic group can be mapped onto the genome using BLAST, blat, or other software (for example, all Diptera sequences from RNAcentral can be aligned to a newly sequenced fly genome). These workflows are discussed in more detail at https://rnacentral.org/help/galaxy.

Non-coding RNAs
Non-coding RNAs (ncRNAs) are transcribed from the DNA similarly to messenger RNAs, but are not translated into proteins. ncRNAs are found in all organisms and have a broad range of functions. For example, tRNA and rRNA are required for protein synthesis and are essential for all life, while the functions of many lncRNAs are still unclear. In humans, ncRNA expression has been tied to a variety of diseases such as ovarian cancer (Huang et al., 2002), hearing impairment (Finnilä & Majamaa, 2003), and dermatomyositis (Eisenberg et al., 2007). Due to their importance to cellular function, it is important to consider not only protein-coding genes, but also ncRNAs when analyzing RNA-seq datasets, as shown in Support Protocol 3.

Sequence naming and RNA type
RNAcentral provides descriptions and RNA type for all sequences. These annotations are essential to understanding the function of any RNA sequence, but there are some important factors to consider. These annotations are computed automatically from the descriptions and RNA types provided by the member databases. No description or RNA type is assigned manually in RNAcentral, although sequences from certain member databases, such as GENCODE or HGNC, may be manually curated. Additionally, member databases may disagree on an annotation. In such cases, RNAcentral strives to pick the annotations that are most consistent with the available data.

Transcript-level organization
RNAcentral is currently organized at the level of individual transcripts. This means that if several databases provide different sequences for the same RNA gene, all of these sequences will be available in RNAcentral under separate URS IDs. In order to visualize these related sequences, you can use the embedded genome browser on sequence report pages, as similar sequences will be mapped to the same genomic region (see Basic Protocol 1).
For example, Figure 20 shows the details of the overlapping RNAcentral transcripts corresponding to the human miR-181b-1 microRNA. There are two alternative versions of the precursor sequence, three alternative 5′ and two 3′ mature sequences. These entries come from different databases and have different types of annotations. At the time of writing, to get a complete picture of all the information available for this microRNA, it is recommended to view all seven sequence reports, as demonstrated in Basic Protocol 1. Work is underway to create gene-level entries that would aggregate and prioritize transcripts from the same gene.
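Until gene-level entries are available, related transcripts can be grouped by genomic overlap, for example using coordinates from the RNAcentral BED files. The sketch below illustrates the idea and is not the procedure used by RNAcentral. It clusters records that overlap on the same chromosome and strand, using half-open BED-style coordinates; the record layout (chromosome, strand, start, end, name) and the example coordinates are made up.

```python
def cluster_by_overlap(records):
    """Group names of records that overlap on the same chromosome and strand.

    Each record is a (chromosome, strand, start, end, name) tuple with
    half-open coordinates, so two records overlap when start < previous end.
    """
    clusters = []
    key, reach = None, -1
    for chromosome, strand, start, end, name in sorted(records):
        if (chromosome, strand) == key and start < reach:
            clusters[-1].append(name)  # extends the current cluster
            reach = max(reach, end)
        else:
            clusters.append([name])  # starts a new cluster
            key, reach = (chromosome, strand), end
    return clusters


# Made-up coordinates: a precursor with two mature products, plus an unrelated RNA
example = [
    ("chr1", "-", 100, 210, "precursor"),
    ("chr1", "-", 120, 142, "mature_5p"),
    ("chr1", "-", 180, 202, "mature_3p"),
    ("chr1", "-", 500, 560, "unrelated"),
]
print(cluster_by_overlap(example))
```

Each cluster lists the entries whose sequence reports could be reviewed together, as recommended above.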