
Systems biology research


STATegra. Welcome | Graph Mining. Home page. Data visualization - How to interpret mean of Silhouette plot? Gerstein Lab Lecture Summary.

Systems biology researcher profiles

Pvclust: An R package for hierarchical clustering with p-values. Ryota Suzuki (a, b) and Hidetoshi Shimodaira (a). a) Department of Mathematical and Computing Sciences, Tokyo Institute of Technology; b) Ef-prime, Inc. What is pvclust? Pvclust is an R package for assessing the uncertainty in hierarchical cluster analysis. Pvclust provides two types of p-values: the AU (Approximately Unbiased) p-value and the BP (Bootstrap Probability) value. Pvclust performs hierarchical cluster analysis via the function hclust and automatically computes p-values for all clusters contained in the clustering of the original data. An example of analysis on the Boston data (in library MASS) is shown in the right figure: 14 attributes of houses are examined and clustered hierarchically.
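To make the BP (Bootstrap Probability) value more concrete, here is a minimal sketch of the idea in Python. It is not pvclust's implementation: pvclust is an R package built on hclust, and it also computes the AU p-value, which this sketch does not attempt. The function names, the average-linkage choice, and the toy data are assumptions made for illustration only. The sketch clusters the columns of a small matrix, resamples the rows with replacement, and reports how often each original cluster reappears.

```python
import random


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def hierarchical_clusters(vectors):
    """Average-linkage agglomerative clustering (illustrative choice).

    Returns every non-root cluster formed by a merge,
    each as a frozenset of item indices.
    """
    n_items = len(vectors)
    clusters = [frozenset([i]) for i in range(n_items)]
    formed = set()
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(distance(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        if len(merged) < n_items:  # skip the root, which always appears
            formed.add(merged)
    return formed


def bootstrap_probability(data, n_boot=200, seed=0):
    """Hypothetical helper: BP-style value for each cluster of the columns.

    data is a list of rows (observations); the columns are the items
    being clustered. Rows are resampled with replacement.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(data), len(data[0])

    def columns(rows):
        return [[row[c] for row in rows] for c in range(n_cols)]

    original = hierarchical_clusters(columns(data))
    counts = {cluster: 0 for cluster in original}
    for _ in range(n_boot):
        sample = [data[rng.randrange(n_rows)] for _ in range(n_rows)]
        replicate = hierarchical_clusters(columns(sample))
        for cluster in original:
            if cluster in replicate:
                counts[cluster] += 1
    return {cluster: count / n_boot for cluster, count in counts.items()}


# Toy data (made up): columns 0 and 1 are close, as are columns 2 and 3.
example = [[r, r + 0.1, r + 10, r + 10.1] for r in range(6)]

for cluster, bp in sorted(bootstrap_probability(example).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(cluster), bp)
```

A cluster that reappears in nearly every bootstrap replicate gets a value close to 1, while a cluster that depends on a few particular observations gets a lower value.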

Installation: pvclust can be easily installed from CRAN: install.packages("pvclust"). On Windows you can use Packages -> Install package(s) from CRAN... from the menu bar. Download: The latest version can be found at the CRAN web site. [FAQ] Q. > data(lung) OBRC: Online Bioinformatics Resources Collection | HSLS.

Online Learning | Informatics Training. In this course, Next Generation Sequencing (NGS), students learn about NGS technologies as well as computational and annotation tools for conducting practical genome-wide analysis and interpretation of NGS data. Furthermore, the promise of personalized medicine and the applications and implications of NGS in clinical settings will be discussed. Computational Statistics provides a practical introduction to analysis of biological and biomedical data. Basic statistical techniques will be covered, including descriptive statistics, elements of probability, hypothesis testing, nonparametric methods, correlation analysis, and linear regression.

Emphasis is on how to choose appropriate statistical tests and how to assess statistical significance. NGS - Module 1 | Informatics Training. Module 1: NGS - Technologies and Design OBJECTIVE: Gain basic knowledge of NGS technologies and platforms and develop an understanding of important aspects of NGS studies, including coverage and depth of sequencing, base calling and quality scores, and sources of error. Learn about NGS applications, including RNA-seq, ChIP-seq, and other functional sequencing assays. Assignment Write a research proposal for a study that utilizes NGS.

Assume that you have a budget of 20K for basic biological experiments or 100K for clinical-oriented projects. Some suggested page lengths for each section are noted, but these are guidelines only. Start with a description of the problem of interest, including the hypothesis that you would like to test. [~1 page] Find out what sequencing facilities are available to you and how much each type of sequencing will cost. http:/pcpgm.partners.org/research-services/dna-sequencing/pricing. [~.5-1 page.]

NIH LINCS Program. How can Taverna help me? | Taverna. If you need to perform multi-step or repetitive analysis that involves invoking several services, or if you find yourself copying and pasting results between different Web pages or services, and would like to automate this process, then Taverna could be suitable for you. Taverna allows you to define how your data flows between the services, without having to worry about how you are going to invoke these services. It will automate and pipeline processing of your data. Taverna can help you convert data from one format to another in cases when the services you are using are not 100% compatible and shield you from services’ (non-)interoperability horror.

Taverna allows for rapid incorporation of new services without coding. Taverna will provide you with trackable results of your experiments using the OPM (Open Provenance Model) standard. Read more about the characteristics and features of Taverna. Seven tips for bio-statistical analysis of gene expression data | Biogazelle. Many scientists have a love-hate relationship with statistics. Personally, I didn’t like statistics (at all) during my master's degree education [1]. Too theoretical, didn’t see the utility of it. Only when I generated my first data during my PhD research did I start realizing the necessity and power of bio-statistics.

Later, I almost fell in love with statistics after reading Intuitive Biostatistics by Harvey Motulsky. In September of this year, Nature Methods initiated a new column, 'Points of Significance', devoted to statistics. Obviously, this blog does not aim to serve as a crash course on statistics. 1. Gene expression levels are heavily skewed in linear scale: half of the data points (the lower expressed genes) are between 0 and 1 (with 1 meaning no change), and the other half (the higher expressed genes) between 1 and positive infinity. 2.

Paired information means that values in one group are related to the values in the other group. 3. 4. 5. 6. 7. BIOS 560R: High-throughput data analysis using R and Bioconductor. Class Information. Instructor: Hao Wu. Email: hao.wu at emory dot edu. TA: TBD. Class/Lab: Monday and Wednesday 3-4:50PM at GCR 115. Summary: This course covers the basics of high-throughput (microarray and second-generation sequencing) data analysis. This class puts more emphasis on applications than on statistical theory. Understand the biological motivations and technological procedures of high-throughput experiments, including different types of microarrays and second-generation sequencing. Grading: Four sets of homework, each worth 15%. Lab: Bring a laptop to the labs. Reading Materials: Here are some reading materials related to the class.

Final project: The final project could be (but is not limited to) exploratory analysis, statistical modeling, or analytical software development for any type of genomic data. Example: exploring the relationship of histone modifications and gene expression. Students need to submit a short report for the final project, as well as related programs. Class schedule. Bioinformatics and Computational Biology. It Takes 30.