Expression Report

The Expression report provides expression levels (coverage) for different regions of the reference genome. This information is critical for expression studies such as small RNA analysis and transcriptome studies. If you have a subscription to Pathway Studio, then for any sequence alignment project, you can import the report data for a single gene or for all genes into Pathway Studio and view the associated biological pathway information.

Before you can use the Expression report in conjunction with Pathway Studio, your Pathway Studio access option must be set for the NextGENe Viewer. See Process menu in NextGENe Viewer main menu.

The following procedure describes how to set up a new Expression report. Optionally, on the Expression Report Settings dialog box, you can click Load Settings to browse to and select a Settings file (.ini file) to generate the report based on the saved settings in the file.

1. On the NextGENe Viewer Reports menu, click Expression Report.

The Expression Report Settings dialog box opens with the General tab displayed.

2. Specify how to define the segments that are to be analyzed for the report.

Option	Description
Use segments as defined in reference files	• Contig – Report coverage levels for each contig. Note: This option is appropriate if you are using a reference that was recreated from a BED file for custom amplicons. • Gene – Report coverage levels for each gene region. • Continuous Exon – Report coverage levels for the entire mRNA region of a gene, one region per gene. • ROI – Enabled only if you have loaded a project with Regions of Interest (ROIs) defined in a GenBank reference file. Report coverage levels for the ROIs that are defined in the GenBank reference file. Note: For information about defining ROIs in a GenBank reference file, see Advanced GBK Editor tool. • Exon – Report coverage levels for each mRNA region (coding and non-coding exons). • Continuous CDS – Report coverage levels for the entire coding region of a gene, one region per gene. • Amplicon – Available only if an amplicon BED file was loaded during the Load Data step for the project. (See To set ROI regions from a BED or GBK file.) For overlapping amplicons, each read is counted only for its intended amplicon, which is the amplicon for which the read covers the highest percentage of the amplicon's length. • CDS – Report coverage levels for each coding region.
Set incremental segment length	Specify the segment length, relative to either the reference positions in the contig or the chromosome positions.
Input region of interest (*.bed)	A BED file is a tab-delimited text file. You can upload a BED file only if the reference sequence contains chromosome information, which means that the reference sequence must be either a preloaded reference that NextGENe supplies, or a GenBank reference file that contains chromosome information.
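The Amplicon rule for overlapping amplicons can be illustrated with a short sketch. This is not NextGENe's implementation: it assumes standard BED coordinates (0-based start, exclusive end), represents each read only by its aligned start and end positions, and breaks ties in favor of the amplicon listed first. The names `Amplicon`, `parse_bed_line`, `covered_fraction`, and `intended_amplicon` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Amplicon:
    chrom: str
    start: int  # BED chromStart: 0-based, inclusive
    end: int    # BED chromEnd: 0-based, exclusive
    name: str


def parse_bed_line(line: str) -> Amplicon:
    """Parse one tab-delimited BED line (chrom, start, end, optional name)."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # The name column is optional in BED; fall back to a coordinate label.
    name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
    return Amplicon(chrom, start, end, name)


def covered_fraction(read_start: int, read_end: int, amp: Amplicon) -> float:
    """Fraction of the amplicon's length that the read covers (half-open intervals)."""
    overlap = min(read_end, amp.end) - max(read_start, amp.start)
    return max(overlap, 0) / (amp.end - amp.start)


def intended_amplicon(chrom: str, read_start: int, read_end: int,
                      amplicons: Iterable[Amplicon]) -> Optional[Amplicon]:
    """Return the amplicon with the highest covered fraction, or None if no overlap.

    Ties go to the amplicon listed first (an assumption of this sketch).
    """
    best, best_frac = None, 0.0
    for amp in amplicons:
        if amp.chrom != chrom:
            continue
        frac = covered_fraction(read_start, read_end, amp)
        if frac > best_frac:
            best, best_frac = amp, frac
    return best
```

For example, with AMP1 at chr1:100-250 and AMP2 at chr1:200-400, a read spanning 190-260 covers 40% of AMP1 but only 30% of AMP2, so under this sketch it is counted for AMP1 only.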

3. Optionally, select one or both Limit options and, if appropriate, modify the default limit (200 bp) to report the coverage for only the first or last “x” bases of the selected segment type.

If a Limit option and CDS are both selected, then the coverage levels for the first or last “x” bases in each CDS region are reported.
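The Limit options can be pictured as trimming each segment to its first and/or last “x” bases before coverage is reported. The following minimal sketch assumes 1-based, inclusive coordinates and assumes that a segment shorter than the limit is reported whole; the help text does not state how short segments are handled. The function name `limit_regions` is hypothetical.

```python
from typing import List, Tuple


def limit_regions(start: int, end: int, limit: int = 200,
                  first: bool = True, last: bool = True) -> List[Tuple[int, int]]:
    """Return the sub-regions (1-based, inclusive) for the first and/or last
    `limit` bases of the segment [start, end]."""
    length = end - start + 1
    # Assumption: a segment shorter than the limit is used in full.
    x = min(limit, length)
    regions = []
    if first:
        regions.append((start, start + x - 1))
    if last:
        regions.append((end - x + 1, end))
    return regions
```

For a CDS spanning positions 1001-2500 with the default 200 bp limit, this yields 1001-1200 for the first bases and 2301-2500 for the last bases.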

4. Open the Display tab and select the columns that are to be included in the report, or clear the options for the columns that are not to be included.

Column	Description
Index	An ordered count of the segments that are used in the report.
Chr (Name, Number)	• Name – The name of the chromosome on which the segment is located. • Number – The number of the chromosome on which the segment is located.
Chr Position Start	The base number that indicates where the segment starts in the chromosome.
Chr Position End	The base number that indicates where the segment ends in the chromosome.
Chr Length	The total number of bases in the segment, from its start position to its end position on the chromosome.
Gene	The name of the gene that the segment represents (when the segment is a whole gene) or on which the segment is located.
Entrez Gene ID	The unique integer ID generated for the gene by Entrez Gene.
Exon	The exon number where the segment is found. This number includes non-coding exons.
CDS	The coding sequence number for the segment.
RNA Accession	The RNA accession for the gene from NCBI.
Protein Accession	The protein accession for the gene from NCBI.
Description	Available if the reference file is a .fasta file with multiple segments. Select this option to display the title line for each segment in the Description column.
Contig	The contig on which the segment is located. The contig is based on the genome assembly from NCBI.
Locus Tag	An alternate way to identify the gene.
Start	The starting location for the reference region.
End	The ending location for the reference region.
Reference Length	The total length of the reference region, which makes it easy to identify expressed regions by size (for example, when locating small RNA transcripts).
Min Coverage	The minimum number of reads that aligned at any single position within the reference region. Note: For projects that also used condensation, this column shows the minimum number of condensed reads.
Max Coverage	The maximum number of reads that aligned at any single base position within the reference region. Note: For projects that also used condensation, this column shows the maximum number of condensed reads.
Average Coverage	The average coverage for the reference region, calculated as: Total number of bases aligned to the region / Region length. Note: For projects that also used condensation, this calculation uses the total number of bases in the condensed reads.
Minimum Forward Read Coverage	The minimum number of forward reads that aligned at any single position within the reference region.
Minimum Reverse Read Coverage	The minimum number of reverse reads that aligned at any single position within the reference region.
Read Counts	The total number of reads aligned to the indicated reference region. Note: The middle base of a read must be aligned to the region to be counted. If only the end of the read is aligned to the region, then the read is not counted. Note: For projects that also used condensation, this is the total number of condensed reads.
Forward Read Counts	The number of forward reads aligned to the indicated reference region. Note: The middle base of a read must be aligned to the region to be counted. If only the end of the read is aligned to the region, then the read is not counted.
Fragment Counts	Applicable only if the project used paired read data. Counts each read pair as one fragment, instead of counting each read individually.
RPKM	Reads Per Kilobase of exon model per Million mapped reads. RPKM = 10^9 * R / (T * L), where: • R = Number of mapped reads in the region. • T = Total number of mapped reads. • L = Length of the region, in bases. Normalizes the expression levels based on the length of the reference region and the total number of aligned reads.
RPK	The number of reads that mapped to the indicated segment, divided by the total number of mapped reads and multiplied by 1000: RPK = 10^3 * R / T. Normalizes the expression levels based on the total number of aligned reads.
FPKM	Applicable only if the project used paired-end data. Fragments Per Kilobase of exon per Million mapped fragments. FPKM = 10^9 * F / (T * L), where: • F = Number of mapped fragments in the region, where: – A “fragment” corresponds to a pair of reads. – Single (unpaired) reads are not counted. – The position of a fragment is the span between the 5’ ends of its two reads. • T = Total number of mapped fragments. • L = Length of the region, in bases. Normalizes the expression levels for paired-end data based on the length of the reference region and the total number of aligned fragments.
Original Max Coverage	Applicable only if the project also used condensation.
Original Average Coverage	Applicable only if the project also used condensation.
Original Read Counts	Applicable only if the project also used condensation.
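The coverage and normalization columns above can be reproduced for a single region with the following sketch. It is an illustration, not NextGENe's code: it assumes 1-based, inclusive coordinates, treats each read as a contiguous (start, end) alignment (ignoring gaps, indels, strand, and condensation), and assumes the lower of the two middle bases for even-length reads. The function names are hypothetical; the RPKM, RPK, and FPKM formulas follow the table.

```python
from typing import Dict, List, Tuple


def rpkm(reads: int, total_mapped: int, length: int) -> float:
    """RPKM = 10^9 * R / (T * L), with L in bases."""
    return 1e9 * reads / (total_mapped * length)


def rpk(reads: int, total_mapped: int) -> float:
    """RPK = 10^3 * R / T, as defined in the table above."""
    return 1e3 * reads / total_mapped


def fpkm(fragments: int, total_fragments: int, length: int) -> float:
    """FPKM = 10^9 * F / (T * L), where T counts fragments (read pairs)."""
    return 1e9 * fragments / (total_fragments * length)


def region_metrics(reads: List[Tuple[int, int]], region_start: int,
                   region_end: int, total_mapped: int) -> Dict[str, float]:
    """Coverage and count columns for one region.

    `reads` holds (start, end) alignment positions; all coordinates are
    1-based and inclusive. Gaps, indels, and condensation are ignored.
    """
    length = region_end - region_start + 1
    depth = [0] * length
    read_count = 0
    for start, end in reads:
        # Per-base depth: add 1 at every region position the read covers.
        for pos in range(max(start, region_start), min(end, region_end) + 1):
            depth[pos - region_start] += 1
        # Read Counts: a read counts only if its middle base is in the region.
        middle = (start + end) // 2  # assumption: lower middle for even lengths
        if region_start <= middle <= region_end:
            read_count += 1
    return {
        "min_coverage": min(depth),
        "max_coverage": max(depth),
        "average_coverage": sum(depth) / length,  # aligned bases / region length
        "read_counts": read_count,
        "rpkm": rpkm(read_count, total_mapped, length),
        "rpk": rpk(read_count, total_mapped),
    }
```

For a region at positions 1-10 with reads at 1-4, 3-8, 6-12, 9-20, and 15-20, and 1,000,000 total mapped reads, the read at 9-20 adds depth at positions 9 and 10 but is not included in Read Counts because its middle base (14) lies outside the region. Three reads are counted, which gives an RPKM of 300.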

5. Open the Filters tab, and then do the following:

• Specify the filter settings for the report. Only genes whose FPKM or RPKM values fall within the specified range are displayed in the report. (FPKM is available only if the project used paired-end data.)

Setting filters is particularly useful for limiting the number of genes that are imported into Pathway Studio for visualization of the biological pathway information.

• Optionally, to exclude from the report any reads in the regions that are specified in the BED file, select Exclude reads in BED regions.

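The range filter amounts to keeping only the rows whose RPKM (or FPKM) value lies between the specified bounds. This minimal sketch assumes that both bounds are inclusive and that an unset bound means "no limit"; the help text does not specify either behavior. The function name `filter_by_range` and the row layout are hypothetical.

```python
from typing import List, Optional


def filter_by_range(rows: List[dict], key: str = "rpkm",
                    low: Optional[float] = None,
                    high: Optional[float] = None) -> List[dict]:
    """Keep rows whose value for `key` lies in [low, high]; None means unbounded."""
    kept = []
    for row in rows:
        value = row[key]
        if low is not None and value < low:
            continue  # below the minimum: excluded from the report
        if high is not None and value > high:
            continue  # above the maximum: excluded from the report
        kept.append(row)
    return kept
```

Narrowing the range in this way is what limits the gene list that is later uploaded to Pathway Studio.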

6. Open the Summary Report tab, and specify the name of the Expression report and which of its information is to be displayed in the Summary report.

Setting	Description
Report Name	The name that is displayed for the Expression report in the Summary report.
Display Expression Report Summary	Display the summary information for the Expression report in the Summary report.
Display Expression Report	Display the expression information in the Summary report.

If you change any information on the Summary Report tab, then you must save these settings in a Settings file (.ini file). These settings are applied to the Expression report only if you select this Settings file during the setup of the Summary report. See Summary Report.

7. Optionally, click Save Settings to save the settings for this report in a Settings file (.ini file). You can use a saved Settings file to specify the post-processing options for a project in:

• The Project Wizard. See To specify the post-processing options for a Sequence Alignment project.

• The NextGENe AutoRun tool. See The NextGENe AutoRun Tool.

• The Summary report. See Summary Report.

8. Click OK to generate the report.

The report is interactive:

Option	Description
Sort the report results	Double-click any column heading.
View a position or region in the Alignment viewer	Double-click a value in any column.
Note: Before you can use the Expression report in conjunction with either Pathway Studio option below, your Pathway Studio access option must be set for the NextGENe Viewer. See Process menu in NextGENe Viewer main menu.
• Import a selected gene into Pathway Studio for visualization of its biological pathway information	• Right-click the gene, and on the context menu that opens, select View in Pathway Studio.
• Import all the genes that are displayed in the report into Pathway Studio for visualization of the biological pathway information	• On the report main menu, click File > Upload gene list to Pathway Studio.
Copy any information in the report (a single cell or multiple cells, a row or rows, or a column or columns) to your clipboard	Select the cells, rows, or columns, right-click any selection, and on the context menu that opens, click Copy. You can then use the standard Paste commands to paste the copied information into a third-party application.
Save the report to a text (*.txt) file	Do one of the following: • On the report menu, click File > Save. • On the report toolbar, click the Save Report icon. A default name (<project_name>_Coverage_Curve_Report) and location (project output folder) are provided for the file, but you can change both of these values.
Modify the report settings and dynamically update the report display	1. On the report menu, click Settings > Settings. The Expression Report Settings dialog box opens. 2. Modify the report settings, and then click OK to close the dialog box.