
ENCODE


New functions for 'junk' DNA?

DNA is the molecule that encodes the genetic instructions a cell needs to produce the thousands of proteins it typically makes. In what is called coding DNA, the linear sequence of the bases A, T, C and G determines which protein a short segment of DNA, known as a gene, will encode. In many organisms, however, a cell contains far more DNA than is needed to code for all of its proteins. This non-coding DNA was often called "junk" DNA because it seemed unnecessary, but in retrospect we simply did not yet understand what these sequences do. We now know that non-coding DNA can have important functions other than encoding proteins. To identify the most highly conserved plant CNSs (conserved non-coding sequences), Burgess and Freeling compared the genome (one copy of all the DNA in an organism) of the model plant Arabidopsis, a member of the mustard family, with that of columbine, a distantly related plant of the buttercup family.

ENCODE Virtual Machine and Cloud Resource

The ENCODE consortium has published an integrated analysis of ENCODE genome-wide data. Every analysis presented in the paper depends on specific software processing: a set of source data files is transformed into output files for that analysis, and the final figures and statements in the paper are derived from those outputs.

As part of the supplementary material for this paper, we have established a virtual machine (VM) instance of the software, using the code bundles from ftp.ebi.ac.uk/pub/databases/ensembl/encode/supplementary/, in which each analysis program has been tested and run. Where possible, the VM enables complete reproduction of the analysis as it was performed to generate the figures, tables and other information. In some cases, however, the analysis involved highly parallelised processing within a specialised multiprocessor environment, which the VM may not fully reproduce. Instructions for accessing the VM are provided further down on this page.

Cloud Instance

Most of what you read was wrong: how press releases rewrote scientific history

This week, the ENCODE project released the results of its latest attempt to catalog all the activities associated with the human genome. Although we've had the sequence of bases that make up the genome for over a decade, there were still many questions about what a lot of those bases do inside a cell. ENCODE is a large consortium of labs dedicated to helping sort that out by identifying everything they can about the genome: which proteins stick to it and where, which pieces interact, which bases pick up chemical modifications, and so on.

What the studies can't generally do, however, is figure out the biological consequences of these activities, which will require additional work. Yet the third sentence of the lead ENCODE paper contains an eye-catching figure that ended up being widely reported: "These data enabled us to assign biochemical functions for 80 percent of the genome." This was more than a matter of semantics.

What we know about DNA, and when we knew it

Ewan's Blog; bioinformatician at large: ENCODE: My own thoughts

5 September 2012: Today sees the embargo lift on the second phase of the ENCODE project and the simultaneous publication of 30 coordinated, open-access papers in Nature, Genome Research and Genome Biology, as well as publications in Science, Cell, JBC and others. The Nature publication has a number of firsts: cross-publication topic threads, a dedicated iPad/eBook app and website, and a virtual machine. This ENCODE event represents five years of dedicated work from over 400 scientists, one of whom is myself, Ewan Birney.

I was the lead analysis coordinator for ENCODE for the past five years (and before that had effectively the same role in the pilot project), and for the past 11 months I have spent a lot of time working up to this moment. There were countless details to see to for the scientific publications and, later, for explaining it all in editorials, commentary, general press features and other exotic things.

Beyond the sequence

Genomics: ENCODE explained (Nature)

ENCODE (Nature Publishing Group): A landmark in the understanding of the human genome

Fighting about ENCODE and junk

On Wednesday, a handful of journals, including this one, released more than 30 papers describing results from the second phase of ENCODE: a consortium-driven project tasked with building the ‘ENCyclopedia Of DNA Elements’, a manual of sorts that defines and describes all the functional bits of the genome.

Many reactions to the slew of papers, their web and iPad app presentations and the news coverage that accompanied the release were favourable. But several critics have challenged some of the most prominently reported claims in the papers, the way their publication was handled and the indelicate use of the word ‘junk’ on some material promoting the research.

First up was a scientific critique that the authors had engaged in hyperbole. In the main ENCODE summary paper, published in Nature, the authors prominently claim that the ENCODE project has thus far assigned “biochemical functions for 80% of the genome”.

An integrated encyclopedia of DNA elements in the human genome (Nature)

Since 2007, ENCODE has developed methods and performed a large number of sequence-based studies to map functional elements across the human genome [3]. The elements mapped (and approaches used) include RNA transcribed regions (RNA-seq, CAGE, RNA-PET and manual annotation), protein-coding regions (mass spectrometry), transcription-factor-binding sites (ChIP-seq and DNase-seq), chromatin structure (DNase-seq, FAIRE-seq, histone ChIP-seq and MNase-seq), and DNA methylation sites (RRBS assay) (Box 1 lists methods and abbreviations; Supplementary Table 1, section P, details production statistics) [3].

To compare and integrate results across the different laboratories, data production efforts focused on two selected sets of cell lines, designated ‘tier 1’ and ‘tier 2’ (Box 1). To capture a broader spectrum of biological diversity, selected assays were also executed on a third tier comprising more than 100 cell types including primary cells.

Box 1: ENCODE abbreviations

Integration methodology

ENCODE: the rough guide to the human genome

Back in 2001, the Human Genome Project gave us a nigh-complete readout of our DNA. Somehow, those As, Gs, Cs and Ts contained the full instructions for making one of us, but they were hardly a simple blueprint or recipe book. The genome was there, but we had little idea how it was used, controlled or organised, much less how it led to a living, breathing human. That gap has just got a little smaller. A massive international project called ENCODE (the Encyclopedia Of DNA Elements) has moved us from “Here’s the genome” towards “Here’s what the genome does”. For years, we’ve known that only 1.5 percent of the genome actually contains instructions for making proteins, the molecular workhorses of our cells.

The non-coding remainder contains docking sites where proteins can stick and switch genes on or off. According to ENCODE’s analysis, 80 percent of the genome has a “biochemical function”. And what’s in the remaining 20 percent? Think of the human genome as a city. Where will it lead us?

The 3-D genome

Scientists discover double meaning in genetic code

Scientists have discovered a second code hiding within DNA. This second code contains information that changes how scientists read the instructions contained in DNA and interpret mutations to make sense of health and disease. A research team led by Dr. John Stamatoyannopoulos, University of Washington associate professor of genome sciences and of medicine, made the discovery. The work is part of the Encyclopedia of DNA Elements Project, also known as ENCODE.

Since the genetic code was deciphered in the 1960s, scientists have assumed that it was used exclusively to write information about proteins. “For over 40 years we have assumed that DNA changes affecting the genetic code solely impact how proteins are made,” said Stamatoyannopoulos. The genetic code uses a 64-letter alphabet of three-base units called codons.
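The size of that alphabet follows from simple counting: each codon is three bases drawn from four, giving 4^3 = 64 combinations. The short Python sketch below enumerates them and pairs each with the standard genetic code, showing that 64 codons specify only 20 amino acids plus stop signals, so most amino acids have several synonymous codons. It assumes the standard code (NCBI translation table 1) written as a 64-character string in the conventional T, C, A, G order; the names `STANDARD_TABLE` and `codon_table` are hypothetical illustrations, not part of any ENCODE resource.

```python
from collections import Counter
from itertools import product

# Bases in the conventional T, C, A, G order used by genetic-code tables.
BASES = "TCAG"

# Hypothetical constant: the standard genetic code (NCBI table 1), one
# amino-acid letter per codon in TTT, TTC, TTA, ... GGG order; '*' = stop.
STANDARD_TABLE = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

def codon_table(bases: str = BASES, table: str = STANDARD_TABLE) -> dict[str, str]:
    """Map every three-base codon to its amino acid ('*' for a stop codon)."""
    # product(..., repeat=3) varies the first base slowest, matching table order.
    codons = ["".join(triplet) for triplet in product(bases, repeat=3)]
    return dict(zip(codons, table))

code = codon_table()
amino_acids = {aa for aa in code.values() if aa != "*"}
synonyms = Counter(code.values())  # how many codons encode each letter

print(len(code))                     # 64 codons
print(len(amino_acids))              # 20 amino acids
print(synonyms["*"])                 # 3 stop codons
print(synonyms["L"], synonyms["M"])  # 6 1
```

Leucine, for example, is encoded by six different codons while methionine has only one; this redundancy is why a single amino-acid sequence can be written in DNA in many different ways.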