Databases

RNASeqMetaDB – a database and web server for navigating metadata of publicly available mouse RNA-Seq data sets. Gene targeting is a protocol for introducing a mutation to a specific gene in an organism. Because of the importance of in vivo assessment of gene function and modeling of human diseases, this technique has been widely adopted to generate a large number of mutant mouse models. Due to the recent breakthroughs in high-throughput sequencing technologies, RNA-Seq experiments have been performed on many of these mouse models, leading to hundreds of publicly available data sets. To facilitate the reuse of these data sets, researchers from Texas A&M University collected the associated metadata and organized them in a database called RNASeqMetaDB. The metadata was manually curated to ensure annotation consistency. Availability – Freely available on the web at: Contact: pengyu.bio@gmail.com. LifeMap Sciences Announces the Next Generation of GeneCards® – Version 4.0.

The New GeneCards® Powers LifeMap’s NGS Analysis and Interpretation Platform. Enables Rapid Integration with Partners in the NGS Storage, Analysis and Interpretation Markets and Will Allow for Partnering Opportunities. New User Interface, Functionality and Application Programming Interface (“API”) Provide Improved Offering for Academia, Biopharma and Diagnostic Companies. ALAMEDA, Calif. –(BUSINESS WIRE)–LifeMap Sciences, Inc., a subsidiary of BioTime, Inc., announced today the release of GeneCards® Version 4.0 with a new technology platform and redesigned user interface. The new release is available now at www.genecards.org. GeneCards is a database that provides comprehensive information on all known human genes, and is available via the cloud and onsite installation. “We are excited to introduce the new GeneCards 4 platform, a product in development over the last two years.” “It’s a pleasure to see this new major version of GeneCards come to light.

About LifeMap Sciences, Inc. MSProGene – integrative proteogenomics beyond six-frames and single nucleotide polymorphisms. Ongoing advances in high-throughput technologies have facilitated accurate proteomic measurements and provide a wealth of information on genomic and transcript level. In proteogenomics, this multi-omics data is combined to analyze unannotated organisms and to allow more accurate sample-specific predictions. Existing analysis methods still mainly depend on six-frame translations or reference protein databases that are extended by transcriptomic information or known single nucleotide polymorphisms (SNPs). However, six-frames introduce an artificial sixfold increase of the target database and SNP integration requires a suitable database summarizing results from previous experiments.
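The sixfold growth mentioned above follows directly from how a six-frame translation works: each nucleotide sequence is read in three offsets on the forward strand and three on the reverse complement. The sketch below is a minimal illustration using the standard genetic code. It is not part of MSProGene, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def six_frame_translation(dna):
    """Hypothetical helper: translate all six reading frames of a sequence."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
for name, peptide in frames.items():
    print(name, peptide)
```

Every input sequence yields six candidate peptide strings, at most one of which corresponds to the true reading frame, which is why a six-frame search database is roughly six times larger than one built from known transcripts.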

Researchers from the Robert Koch Institute overcome these limitations by introducing MSProGene, a new method for integrative proteogenomic analysis based on customized RNA-Seq driven transcript databases. Simplified example of a proteogenomic network. Availability – MSProGene is written in Java and Python. TranscriptCoder – Proteomic validation of transcript isoforms, including those assembled from RNA-Seq data. SATRAT – Staphylococcus aureus transcript regulatory network analysis tool. IIIDB – a database for isoform-isoform interactions and isoform network modules. Protein-protein interactions (PPIs) are key to understanding diverse cellular processes and disease mechanisms. However, current PPI databases only provide low-resolution knowledge of PPIs, in the sense that “proteins” of currently known PPIs generally refer to “genes.” It is known that alternative splicing often impacts PPI by either directly affecting protein interacting domains, or by indirectly impacting other domains, which, in turn, impacts the PPI binding.

Thus, proteins translated from different isoforms of the same gene can have different interaction partners. Due to the limitations of current experimental capacities, little data is available for PPIs at the resolution of isoforms, although such high-resolution data is crucial to map pathways and to understand protein functions. In fact, alternative splicing can often change the internal structure of a pathway by rearranging specific PPIs. Availability – The web interface allows users to search for IIIs or III network modules.

PD_NGSAtlas – a reference database combining next-generation sequencing epigenomic and transcriptomic data for psychiatric disorders. Psychiatric disorders such as schizophrenia (SZ) and bipolar disorder (BP) are projected to lead the global disease burden within the next decade. Several lines of evidence suggest that epigenetic- or genetic-mediated dysfunction is frequently present in these disorders. To date, the inheritance patterns have been complicated by the problem of integrating epigenomic and transcriptomic factors that have yet to be elucidated. Therefore, there is a need to build a comprehensive database for storing epigenomic and transcriptomic data relating to psychiatric disorders. Researchers at Harbin Medical University have developed the PD_NGSAtlas, which focuses on the efficient storage of epigenomic and transcriptomic data based on next-generation sequencing and on the quantitative analyses of epigenetic and transcriptional alterations involved in psychiatric disorders.

Availability – The database is available at. The Sorghum Transcriptome Database. SMITH – a LIMS for handling next-generation sequencing workflows. Wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan, in the context of a Genomic Data Model Project, developed SMITH, a web application with a MySQL server at the backend. The database schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows an unconstrained textual description to be associated with each sample and with all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses. SMITH runs automatically and limits direct human interaction mainly to administrative tasks.
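The attribute-value table described above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for the MySQL backend. The table names, column names and sample values are all hypothetical and are not taken from the actual SMITH schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical attribute-value layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sample (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        -- One row per (sample, attribute) pair; attributes are free text,
        -- so new kinds of metadata need no schema change.
        CREATE TABLE sample_attribute (
            sample_id INTEGER NOT NULL REFERENCES sample(id),
            attribute TEXT NOT NULL,
            value     TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO sample (id, name) VALUES (?, ?)",
        [(1, "S001"), (2, "S002"), (3, "S003")],
    )
    conn.executemany(
        "INSERT INTO sample_attribute VALUES (?, ?, ?)",
        [
            (1, "tissue", "liver"),
            (1, "treatment", "control"),
            (2, "tissue", "liver"),
            (2, "treatment", "drug A"),
            (3, "tissue", "brain"),
        ],
    )
    return conn


def find_samples(conn, attribute, value):
    """Return the names of samples annotated with the given attribute/value."""
    rows = conn.execute(
        "SELECT s.name FROM sample s "
        "JOIN sample_attribute a ON a.sample_id = s.id "
        "WHERE a.attribute = ? AND a.value = ? "
        "ORDER BY s.name",
        (attribute, value),
    )
    return [row[0] for row in rows]


conn = build_demo_db()
print(find_samples(conn, "tissue", "liver"))
```

The trade-off of this layout is flexibility against strictness: any description can be attached to a sample, but the database itself cannot enforce which attributes exist or what values they may take.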

SMITH data-delivery procedures were standardized, making it easier for biologists and analysts to navigate the data. Automation also helps save time. The workflows are available through an API provided by the workflow management system. ChiTaRS 2.1 – an improved database of the chimeric transcripts and RNA-seq data with novel sense-antisense chimeric RNA transcripts. Chimeric RNAs that comprise two or more different transcripts have been identified in many cancers and among the Expressed Sequence Tags (ESTs) isolated from different organisms; they might represent functional proteins and produce different disease phenotypes. The ChiTaRS 2.1 database of chimeric transcripts and RNA-Seq data is the second version of the ChiTaRS database and includes improvements in content and functionality. Chimeras from eight organisms have been collated including novel sense-antisense (SAS) chimeras resulting from the slippage of the sense and anti-sense intragenic regions.

The new database version collects more than 29 000 chimeric transcripts and indicates the expression and tissue specificity for 333 entries confirmed by RNA-seq reads mapping the chimeric junction sites. The user interface allows for rapid and easy analysis of evolutionary conservation of fusions, literature references and experimental data supporting fusions in different organisms. Newly aligned modENCODE RNA-Seq coverage data. Sue Celniker’s group has provided FlyBase with updated RNA-Seq coverage data for the modENCODE transcriptome datasets. The high-throughput sequencing reads from these experiments have been re-aligned to the new BDGP Release 6 reference genome assembly (NCBI accession GCA_000001215.4). This update includes new RNA-Seq coverage profiles for unpublished Sindbis virus treatments.

These re-aligned data are part of FlyBase update FB2014_06, and replace the “Release 5-to-Release 6” lift-over RNA-Seq data in place for the past two Release 6-based FlyBase updates (FB2014_04, FB2014_05). The great advantage of these re-aligned data is that RNA-Seq coverage data are now available for new regions of the genome assembly. This improvement is notable in regions of centric heterochromatin, which have undergone substantial revision for the new Release 6 genome assembly. The re-aligned modENCODE transcriptome data include the developmental profile of Graveley et al. Source – FlyBase. Tissue-specific transcriptome sequencing analysis expands the non-human primate reference transcriptome resource (NHPRTR). GeneFriends – a human RNA-seq-based gene and transcript co-expression database. Co-expression networks have proven effective at assigning putative functions to genes based on the functional annotation of their co-expressed partners, in candidate gene prioritization studies and in improving our understanding of regulatory networks.

The growing number of genome resequencing efforts and genome-wide association studies often identify loci containing novel genes, and there is a need to infer their functions and interaction partners. To facilitate this, researchers at the University of Liverpool have expanded GeneFriends, an online database that allows users to identify genes co-expressed with one or more user-defined genes. This expansion entails an RNA-seq-based co-expression map that includes genes and transcripts that are not present in the microarray-based co-expression maps, including over 10 000 non-coding RNAs. The results users obtain from GeneFriends include a co-expression network as well as a summary of the functional enrichment among the co-expressed genes.
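As a rough illustration of what a co-expression query involves, the sketch below ranks genes by the Pearson correlation of their expression profiles with a user-defined query gene. The excerpt does not state which co-expression measure GeneFriends uses, so Pearson correlation is an assumption made here for illustration. The gene names and expression values are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def coexpressed(expression, query, top=3):
    """Hypothetical helper: rank all other genes by correlation with `query`."""
    scores = [
        (gene, pearson(expression[query], profile))
        for gene, profile in expression.items()
        if gene != query
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top]


# Made-up expression values across four samples.
expression = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [4, 3, 2, 1],
    "geneD": [1, 1, 1, 1],
}
for gene, score in coexpressed(expression, "geneA"):
    print(gene, round(score, 3))
```

Genes at the top of such a ranking are the candidates whose known annotations would be used to suggest a function for the query gene.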

HSDB – Thoroughbred Horse Single Nucleotide Polymorphism and Expression Database. Brain RNA-Seq – An RNA-Sequencing Transcriptome and Splicing Database. MiRNEST 2.0 – a database of plant and animal microRNAs. APADB: a database for alternative polyadenylation and microRNA regulation events. Researchers at the University of Liverpool are building the world’s most comprehensive database.

Researchers at the University of Liverpool are building the world’s most comprehensive database describing human and animal pathogens, which can be used to prevent and tackle disease outbreaks around the globe. The Enhanced Infectious Diseases (EID2) database has been developed by the Liverpool University Climate and Infectious Diseases of Animals (LUCINDA) team and is funded by a BBSRC Strategic Tools and Resources Development Fund grant.

Effectively mapping the relationships between human and animal diseases and their hosts, disease-causing pathogens and the ways in which pathogens are transmitted can offer huge benefits when it comes to knowing what the disease risks are in a population or geographical area, and how best to manage and eliminate them. The EID2 team realised that there was a potential treasure trove of data already available in the scientific literature and in pre-existing databases, which was just waiting to be mined for useful insights – a ‘Big Data’ approach. ChiloDB – a genomic and transcriptome database for an important rice insect pest Chilo suppressalis. A novel multi-alignment pipeline for high-throughput sequencing data.

Sample sequencing of vascular plants demonstrates widespread conservation and divergence of microRNAs. SFGD: a comprehensive platform for mining functional information from soybean transcriptome data. The MOuse NOnCode Lung database. Ensembl 74 has been released! The Integrated Microbial Genomes (IMG) data warehouse now includes analysis pipeline for RNA-Seq. The Queryable RNA Seq Database. TIARA genome database – update 2013. NeXtProt - exploring the universe of human proteins. New features in the Variant Effect Predictor | Ensembl Blog. The Variant Effect Predictor (VEP) software can predict the consequence of genomic variants using the genomic annotations provided by Ensembl. In release 63 of Ensembl we have added new features to both the script and web versions of the VEP. Regulatory consequences have made their return; the VEP now reports if a variant falls within a regulatory region or a transcription factor binding motif, and furthermore if the variant falls in a high information locus within the motif.

The VEP now also has a dedicated area of the Ensembl website documentation. Script version: To improve performance for users in the USA, we have now deployed a mirror of the public database server; to use this, simply pass the flag “--host useastdb.ensembl.org” when running the script. We have also implemented a caching system in the VEP, such that it is possible to use almost all of the functionality of the script without the script querying the database at all.

Web version. New public MySQL server | Ensembl Blog. Alongside our website, Ensembl provides direct access to our databases through our public MySQL server ensembldb.ensembl.org, and as of today we are pleased to announce the availability of a second MySQL mirror hosted on the east coast of the US. The new server is running on Amazon Cloud with the hostname useastdb.ensembl.org. It can be accessed directly with the mysql client using port 5306 and username anonymous, e.g.:

    mysql -h useastdb.ensembl.org -u anonymous -P5306

It may also be accessed through our Perl API with the following registry incantation:

    use Bio::EnsEMBL::Registry;
    my $registry = 'Bio::EnsEMBL::Registry';
    $registry->load_registry_from_db(
        -host => 'useastdb.ensembl.org',
        -user => 'anonymous'
    );

useastDB will provide the current Ensembl release alongside the previous one on a rolling basis. We hope that our users enjoy the faster access to our data that this new MySQL mirror should provide.

FlyBase adds RNA-Seq Data Sets. Nucleotide Databases.