Ensembl Sequence Statistics
Each species in Ensembl has several sequence-length statistics, which are
displayed on its home page and, where the sequence is assembled into
chromosomes, on its MapView pages. Because it may not be obvious how these
numbers are calculated, we offer the following definitions:
- Base Pairs per chromosome
- These values are pre-calculated to speed up page display and are stored
in the seq_region table of the core database. Each value is the assembled
end position of the last seq_region in the chromosome (taken from the AGP)
or, if the chromosome ends in a gap, the assembled end position of that
terminal gap.
For the haplotype chromosomes (c6_COX etc.), the seq_region length is set
to the full length of the chromosome, including the haplotype-specific
region, even though haplotype-specific sequence exists for only a small
part of it (e.g. c6_COX is 170,899,992 bp long).
- Base Pairs (whole assembly)
- The total number of base pairs for the entire assembly is the sum of the
lengths of all sequences in the dna table of the core database. This total
includes redundant regions, such as haplotypic sequences and the
pseudo-autosomal region (PAR) of the human Y chromosome, as well as gaps
in the case of Drosophila melanogaster.
See the assembly details of each species for more information.
- Golden Path
- The "golden path" length is the length of the reference assembly: the
sum of the lengths of all top-level sequences in the seq_region table,
omitting any redundant regions such as haplotypes and PARs.
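The three figures defined above can be compared side by side in a short Python sketch. This is not Ensembl's code and does not query a core database. The AGP parsing follows the public AGP column layout (object, object_beg, object_end, part_number, component_type, ...), while the `TopLevelRegion` record, its `is_haplotype` and `redundant_bp` fields, and all toy inputs are hypothetical stand-ins for what the seq_region and dna tables hold. Representing redundancy (such as a PAR) as a per-region base count is an assumption made only for illustration.

```python
from dataclasses import dataclass


def chromosome_lengths(agp_text: str) -> dict[str, int]:
    """Length of each assembled object (e.g. a chromosome) in AGP text.

    AGP rows tile each object end to end, so the largest object_end
    (column 3) is the end of the last row, whether that row is a sequence
    component (W) or a terminal gap (N/U).
    """
    lengths: dict[str, int] = {}
    for line in agp_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and AGP comment/header lines
        fields = line.split("\t")
        obj, obj_end = fields[0], int(fields[2])
        lengths[obj] = max(lengths.get(obj, 0), obj_end)
    return lengths


def whole_assembly_bp(dna_sequences: dict[str, str]) -> int:
    """Sum of every stored sequence, redundant ones included."""
    return sum(len(seq) for seq in dna_sequences.values())


@dataclass
class TopLevelRegion:
    # Hypothetical stand-in for a top-level seq_region plus redundancy info.
    name: str
    length: int
    is_haplotype: bool = False
    redundant_bp: int = 0  # assumption: bases (e.g. a PAR) duplicated elsewhere


def golden_path_bp(regions: list[TopLevelRegion]) -> int:
    """Sum of top-level lengths, leaving out haplotypes and redundant bases."""
    return sum(r.length - r.redundant_bp for r in regions if not r.is_haplotype)


if __name__ == "__main__":
    # Hypothetical toy AGP: chrA ends in a terminal gap, chrB ends in sequence.
    agp_rows = [
        ["chrA", "1", "100", "1", "W", "ctg1", "1", "100", "+"],
        ["chrA", "101", "150", "2", "N", "50", "scaffold", "yes", "paired-ends"],
        ["chrA", "151", "300", "3", "W", "ctg2", "1", "150", "+"],
        ["chrA", "301", "320", "4", "N", "20", "telomere", "no", "na"],
        ["chrB", "1", "80", "1", "W", "ctg3", "1", "80", "+"],
    ]
    agp = "\n".join("\t".join(row) for row in agp_rows)
    print(chromosome_lengths(agp))  # {'chrA': 320, 'chrB': 80}

    # Independent toy inputs for the other two statistics.
    dna = {"ctg1": "A" * 100, "ctg2": "C" * 150, "ctg3": "G" * 80, "hap_ctg": "T" * 30}
    print(whole_assembly_bp(dna))  # 360

    regions = [
        TopLevelRegion("chrA", 320),
        TopLevelRegion("chrB", 80),
        TopLevelRegion("chrB_hap", 80, is_haplotype=True),  # full-length haplotype
        TopLevelRegion("chrY", 60, redundant_bp=10),  # 10 bp of PAR-like overlap
    ]
    print(golden_path_bp(regions))  # 450
```

The haplotype region is excluded outright because, as described above, its seq_region length covers the whole chromosome; counting it would roughly double that chromosome's contribution. The three toy inputs are independent, so their totals are not meant to be compared with one another.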
For information on gene counts, see the MapView help page.