
Ensembl genome database project


Description	Ensembl
Contact
Research center	European Bioinformatics Institute
Primary citation	Yates, et al. (2020)
Access
Website	www.ensembl.org

The Ensembl genome database project is a scientific project at the European Bioinformatics Institute that provides a centralized resource for geneticists, molecular biologists and other researchers studying the genomes of humans, other vertebrates and model organisms. Ensembl is one of several well-known genome browsers for the retrieval of genomic information.

Similar databases and browsers are found at NCBI and the University of California, Santa Cruz (UCSC).

History

The human genome consists of three billion base pairs, which code for approximately 20,000–25,000 genes. However, the genome alone is of little use unless the locations and relationships of individual genes can be identified. One option is manual annotation, whereby a team of scientists tries to locate genes using experimental data from scientific journals and public databases. However, this is a slow, painstaking task. The alternative, known as automated annotation, is to use the power of computers to do the complex pattern-matching of protein to DNA. The Ensembl project was launched in 1999 in response to the imminent completion of the Human Genome Project, with the initial goals of automatically annotating the human genome, integrating this annotation with available biological data, and making all this knowledge publicly available.

In the Ensembl project, sequence data are fed into the gene annotation system (a collection of software "pipelines" written in Perl), which creates a set of predicted gene locations and saves them in a MySQL database for subsequent analysis and display. Ensembl makes these data freely accessible to the world research community. All the data and code produced by the Ensembl project are available to download, and there is also a publicly accessible database server allowing remote access. In addition, the Ensembl website provides computer-generated visual displays of much of the data.

Over time the project has expanded to include additional species (including key model organisms such as mouse, fruitfly and zebrafish) as well as a wider range of genomic data, including genetic variations and regulatory features. Since April 2009, a sister project, Ensembl Genomes, has extended the scope of Ensembl into invertebrate metazoa, plants, fungi, bacteria, and protists, focusing on providing taxonomic and evolutionary context to genes, whilst the original project continues to focus on vertebrates.

As of 2020, Ensembl supported over 50,000 genomes across the Ensembl and Ensembl Genomes databases and had added new features such as Rapid Release, a website designed to make genome annotation data available to users more quickly, and COVID-19, a website providing access to the SARS-CoV-2 reference genome.

Displaying genomic data

Gene SGCB aligned to the human genome

Central to the Ensembl concept is the ability to automatically generate graphical views of the alignment of genes and other genomic data against a reference genome. These are shown as data tracks, and individual tracks can be turned on and off, allowing the user to customise the display to suit their research interests. The interface also enables the user to zoom in to a region or move along the genome in either direction.

Other displays show data at varying levels of resolution, from whole karyotypes down to text-based representations of DNA and amino acid sequences, or present other types of display such as trees of similar genes (homologues) across a range of species. The graphics are complemented by tabular displays, and in many cases data can be exported directly from the page in a variety of standard file formats such as FASTA.
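As an illustration of working with an exported sequence file, the following is a minimal sketch of a FASTA reader. It assumes only the general FASTA convention (a header line beginning with ">" followed by one or more sequence lines); the function name parse_fasta and the sample records are invented for this example and are not part of Ensembl.

```python
def parse_fasta(text: str) -> dict:
    """Return {record_id: sequence} for FASTA-formatted text.

    The record id is the first whitespace-delimited token after '>'.
    """
    records = {}
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    sample = ">seq1 demo record\nACGT\nTT\n>seq2\nGG\n"
    print(parse_fasta(sample))
```

Sequence lines are joined without separators, so a sequence wrapped over several lines in the file is returned as one string.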

Externally produced data can also be added to the display by uploading a suitable file in one of the supported formats, such as BAM, BED, or PSL.
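To make the upload formats more concrete, here is a hedged sketch of producing one line of a BED file. It assumes the usual BED convention of tab-separated columns with a 0-based start and an exclusive end, and it assumes the input coordinates are 1-based and inclusive; the helper name to_bed_line and the feature values are hypothetical.

```python
def to_bed_line(chrom: str, start_1based: int, end_1based: int, name: str) -> str:
    """Format one feature as a 4-column BED line.

    Input coordinates are 1-based and inclusive; BED uses a 0-based
    start and an exclusive end, so only the start is shifted by one.
    """
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("coordinates must satisfy 1 <= start <= end")
    return f"{chrom}\t{start_1based - 1}\t{end_1based}\t{name}"


if __name__ == "__main__":
    # Invented example feature, not real annotation data.
    print(to_bed_line("chr4", 100, 200, "featureA"))
```

Whether a chromosome should be written as "4" or "chr4" depends on the naming used by the target browser, so that detail is left to the caller.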

Graphics are generated using a suite of custom Perl modules based on GD, the standard Perl graphics display library.

Alternative access methods

In addition to its website, Ensembl provides a REST API and a Perl API (Application Programming Interface) that models biological objects such as genes and proteins, allowing simple scripts to be written to retrieve data of interest. The Perl API is also used internally by the web interface to display the data. It is divided into sections such as the core API, the compara API (for comparative genomics data), the variation API (for accessing SNPs, SNVs, CNVs and other variants), and the functional genomics API (for accessing regulatory data). The Ensembl website provides extensive information on how to install and use the API.
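A short script against the REST API might look like the sketch below. The server address (rest.ensembl.org) and the lookup/symbol endpoint path are assumptions based on the public REST documentation rather than on this article, and the function names are invented for illustration. The example looks up the gene SGCB in human.

```python
import json
import urllib.parse
import urllib.request

# Assumption: public REST server address; it is not named in the article.
REST_SERVER = "https://rest.ensembl.org"


def build_lookup_url(species: str, symbol: str, server: str = REST_SERVER) -> str:
    """Build the URL for a gene lookup by species and gene symbol."""
    species_part = urllib.parse.quote(species, safe="")
    symbol_part = urllib.parse.quote(symbol, safe="")
    return f"{server}/lookup/symbol/{species_part}/{symbol_part}"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Send a GET request asking for JSON and decode the response body."""
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    url = build_lookup_url("homo_sapiens", "SGCB")
    print(url)
    # Uncomment to query the live service (requires network access):
    # gene = fetch_json(url)
    # print(gene.get("id"), gene.get("seq_region_name"), gene.get("start"), gene.get("end"))
```

The URL construction is kept separate from the network call so that it can be checked offline; the field names read from the response in the commented lines are likewise assumptions about the returned JSON.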

This software can be used to access the public MySQL database, avoiding the need to download enormous datasets. Users can also retrieve data from the MySQL server with direct SQL queries, but this requires extensive knowledge of the current database schema.
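The dependence on the schema can be seen in a small sketch of a direct query. The table and column names used here (gene, seq_region, stable_id, seq_region_start, seq_region_end) are assumptions about the core schema and may differ between releases. To keep the example runnable offline, it executes against an in-memory SQLite stand-in with invented rows instead of the public MySQL server; with a real MySQL driver, the default "%s" placeholder would typically be used.

```python
import sqlite3


def gene_location_sql(placeholder: str = "%s") -> str:
    """SQL joining a gene to the name of the sequence region it lies on."""
    return (
        "SELECT g.stable_id, sr.name, g.seq_region_start, g.seq_region_end "
        "FROM gene AS g "
        "JOIN seq_region AS sr ON sr.seq_region_id = g.seq_region_id "
        f"WHERE g.stable_id = {placeholder}"
    )


def fetch_gene_location(connection, stable_id: str, placeholder: str = "%s"):
    """Run the query on any DB-API 2.0 connection; returns one row or None."""
    cursor = connection.cursor()
    cursor.execute(gene_location_sql(placeholder), (stable_id,))
    return cursor.fetchone()


def demo_connection():
    """In-memory stand-in with two tiny tables and invented data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE seq_region (seq_region_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE gene (stable_id TEXT, seq_region_id INTEGER, "
        "seq_region_start INTEGER, seq_region_end INTEGER)"
    )
    conn.execute("INSERT INTO seq_region VALUES (1, 'chrDemo')")
    conn.execute("INSERT INTO gene VALUES ('GENE0001', 1, 100, 900)")
    return conn


if __name__ == "__main__":
    # SQLite uses '?' as its parameter placeholder.
    print(fetch_gene_location(demo_connection(), "GENE0001", placeholder="?"))
```

Even this simple lookup needs a join between two tables, which is the kind of schema knowledge the API otherwise hides from the user.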

Large datasets can be retrieved using the BioMart data-mining tool. It provides a web interface for downloading datasets using complex queries.
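BioMart queries can also be expressed as XML documents. The sketch below builds such a query and the URL that would carry it. The service address, the XML attribute values, and the dataset, filter and attribute names (hsapiens_gene_ensembl, chromosome_name, ensembl_gene_id, external_gene_name) are assumptions not taken from this article, and the function names are invented for illustration. No request is sent.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumption: address of the BioMart service; not named in the article.
MART_SERVICE = "https://www.ensembl.org/biomart/martservice"


def build_biomart_query(dataset: str, attributes: list, filters: dict = None) -> str:
    """Return a BioMart-style XML query as a string."""
    query = ET.Element(
        "Query",
        {
            "virtualSchemaName": "default",
            "formatter": "TSV",
            "header": "0",
            "uniqueRows": "1",
            "datasetConfigVersion": "0.6",
        },
    )
    dataset_node = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict which rows are returned.
    for name, value in (filters or {}).items():
        ET.SubElement(dataset_node, "Filter", {"name": name, "value": str(value)})
    # Attributes select which columns are returned.
    for name in attributes:
        ET.SubElement(dataset_node, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


def build_biomart_url(xml_query: str, service: str = MART_SERVICE) -> str:
    """Attach the XML query to the service address as a URL parameter."""
    return service + "?" + urllib.parse.urlencode({"query": xml_query})


if __name__ == "__main__":
    xml_query = build_biomart_query(
        "hsapiens_gene_ensembl",
        ["ensembl_gene_id", "external_gene_name"],
        {"chromosome_name": "4"},
    )
    print(build_biomart_url(xml_query))
```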

Lastly, there is an FTP server which can be used to download entire MySQL databases as well as selected data sets in other formats.

Current species

The annotated genomes include most fully sequenced vertebrates and selected model organisms. All of them are eukaryotes; there are no prokaryotes. As of 2022, 271 species are registered, including:

Species
Chordata	Mammalia	Euarchontoglires	Primates	Angola colobus, black-capped squirrel monkey, black snub-nosed monkey, bonobo, bushbaby, capuchin, chimpanzee, common marmoset, Coquerel's sifaka, crab-eating macaque, drill, human, macaque, mouse lemur, gelada, gibbon, golden snub-nosed monkey, gorilla, greater bamboo lemur, green monkey, Ma's night monkey, olive baboon, orangutan, pig-tailed macaque, sooty mangabey, tarsier, Ugandan red colobus
			Scandentia	tree shrew
			Glires (Rodents + Lagomorphs)	Algerian mouse, alpine marmot, american beaver, arctic ground squirrel, Brazilian guineapig, chinese hamster, damaraland mole rat, daurian ground squirrel, degu, eurasian red squirrel, golden hamster, ground squirrel, guineapig, kangaroo rat, lesser Egyptian jerboa, long-tailed chinchilla, mongolian gerbil, mouse, naked mole-rat, North American deermouse, rat, pika, prairie vole, rabbit, Ryukyu mouse, shrew mouse, steppe mouse, thirteen-lined ground squirrel, Upper Galilee mountains blind mole rat
		Laurasiatheria		Alpaca, american bison, american black bear, american mink, Arabian camel, asian black bear, beluga whale, blue whale, chacoan peccary, California sea lion, Canada lynx, cat, cow, dingo, dog, dolphin, domestic yak, donkey, goat, ferret, giant panda, greater horseshoe bat, hedgehog, horse, leopard, lesser hedgehog tenrec, lion, meerkat, megabat, microbat, narwhal, polar bear, pig, red fox, sheep, shrew, Siberian musk deer, sperm whale, Siberian tiger, vaquita, wild yak, yarkand deer
		Afrotheria		Elephant, hyrax, tenrec
		Xenarthra		Armadillo, sloth
		Marsupialia		Common wombat, koala, opossum, Tasmanian devil, wallaby
		Monotremes		Platypus
	Reptilia			Argentine black and white tegu, blue-ringed sea krait, central bearded dragon, chinese softshell turtle, common snapping turtle, common wall lizard, desert tortoise, eastern brown snake, saltwater crocodile, Goode's thornscrub tortoise, green anole, indian cobra, komodo dragon, mainland tiger snake, painted turtle, Pinta Island tortoise, three-toed box turtle, tuatara, West African mud turtle
	Birds			African ostrich, bengalese finch, blue-crowned manakin, blue tit, budgerigar, burrowing owl, chicken, chicken (Red junglefowl), chicken (maternal Broiler), chicken (paternal White leghorn layer), chilean tinamou, collared flycatcher, common canary, common kestrel, dark-eyed junco, duck, eastern buzzard, eastern spot-billed duck, emu, eurasian eagle-owl, eurasian sparrowhawk, golden eagle, golden pheasant, golden-collared manakin, gouldian finch, great tit, great spotted kiwi, helmeted guineafowl, indian peafowl, japanese quail, kakapo, little spotted kiwi, mallard, medium ground finch, muscovy duck, New Caledonian crow, northern spotted owl, okarito brown kiwi, oriental scops owl, pink-footed goose, ring-necked pheasant, ruff, rufous-capped babbler, silver-eye, small tree finch, spoon-billed sandpiper, superb fairywren, Swainson's thrush, swan goose, turkey, white-throated sparrow, yellow-billed amazon, zebu, zebra finch
	Lissamphibia			Leisan spiny toad, Xenopus tropicalis
	Teleosts			Amazon molly, asian arowana, atlantic cod, atlantic herring, atlantic salmon, ballan wrasse, barramundi perch, bicolor damselfish, blind barbel, blue tilapia, blunt-snouted clingfish, brown trout, Burton's mouthbrooder, channel bull blenny, channel catfish, chinese medaka, chinook salmon, climbing perch, clown anemonefish, coelacanth, coho salmon, common carp, denticle herring, eastern happy, electric eel, elephant shark, european bass, gilthead bream, golden-line barbel, goldfish, greater amberjack, guppy, hagfish, horned golden-line barbel, huchen, indian glassy fish, indian medaka, japanese medaka, javanese ricefish, jewelled blenny, large yellow croaker, live sharksucker, lumpfish, lyretail cichlid, Makobe island cichlid, mangrove rivulus, mexican tetra, Midas cichlid, Monterrey platyfish, mummichog, Nile tilapia, northern pike, ocean sunfish, orange clownfish, orbiculate cardinalfish, Paramormyrops kingsleyae, Periophthalmus magnuspinnatus, pike-perch, pinecone soldierfish, platyfish, rainbow trout, red-bellied piranha, reedfish, round goby, sailfin molly, sheepshead minnow, shortfin molly, Siamese fighting fish, spiny chromis, spotted gar, swamp eel, tetraodon, three-spined stickleback, tiger tail seahorse, tongue sole, turbot, turquoise killifish, western mosquitofish, yellowtail amberjack, Takifugu rubripes (fugu), zebrafish, zebra mbuna, zigzag eel
	Cyclostomata			Petromyzon marinus (sea lamprey)
	Tunicates			Ciona intestinalis, Ciona savignyi
Invertebrates	Insects			Drosophila melanogaster (fruitfly), Anopheles gambiae (mosquito), Aedes aegypti (mosquito)
Invertebrates	Worms			Caenorhabditis elegans
Yeast				Saccharomyces cerevisiae (baker's yeast)

Open source/mirrors

All data in the Ensembl project are open access and all software is open source, freely available to the scientific community under a CC BY 4.0 license. Currently, the Ensembl website is mirrored at four different locations worldwide to improve the service.


Official mirror sites
UK (Sanger Institute) ---- main website
US West (Amazon AWS) ---- Cloud-based mirror on West Coast of United States
US East (Amazon AWS) ---- Cloud-based mirror on East Coast of United States
Asia (Amazon AWS) ---- Cloud-based mirror in Singapore

External links


Official website
Vega
Pre-Ensembl
Ensembl genomes
UCSC Genome Browser
NCBI
Ensembl: Browsing chordate genomes on EBI Train OnLine
