Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University

Andrew Gelman is a professor of statistics and political science and director of the Applied Statistics Center at Columbia University. He has received the Outstanding Statistical Application award from the American Statistical Association, the award for best article published in the American Political Science Review, and the Council of Presidents of Statistical Societies award for outstanding contributions by a person under the age of 40. His books include Bayesian Data Analysis (with John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Don Rubin), Teaching Statistics: A Bag of Tricks (with Deb Nolan), Data Analysis Using Regression and Multilevel/Hierarchical Models (with Jennifer Hill), Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do (with David Park, Boris Shor, and Jeronimo Cortina), and A Quantitative Tour of the Social Sciences (co-edited with Jeronimo Cortina).
The importance of simulating the extremes

Simulation is commonly used by statisticians and data analysts to: (1) estimate variability and improve predictors, (2) evaluate the space of potential outcomes, and (3) evaluate the properties of new algorithms or procedures. Over the last couple of days, discussions of simulation have popped up in a couple of different places. First, the reviewers of a paper that my student is working on asked a question about the behavior of the method under different conditions. I mentioned in passing that I thought it was a good idea to simulate some cases where our method will definitely break down.
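Not from the post itself, but as a minimal sketch of point (1) and of deliberately probing an extreme case: the snippet below simulates the sampling distribution of the sample mean, first on well-behaved normal data and then on heavy-tailed Cauchy data, where the sample mean is known to break down. Function names and parameters are illustrative, not from the original discussion.

```python
import math
import random
import statistics

def simulate_sampling_dist(draw, n=50, reps=2000, seed=1):
    """Simulate the sampling distribution of the sample mean:
    draw `reps` datasets of size `n` and record each sample mean."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

def iqr(xs):
    """Interquartile range, a spread measure robust to extreme values."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return q3 - q1

# Well-behaved case: standard normal data, where the sample mean is stable.
normal_means = simulate_sampling_dist(lambda rng: rng.gauss(0.0, 1.0))

# Extreme case: Cauchy data (via inverse-CDF sampling), where the sample
# mean never settles down no matter how large n gets.
cauchy_means = simulate_sampling_dist(
    lambda rng: math.tan(math.pi * (rng.random() - 0.5)))
```

Comparing `iqr(cauchy_means)` to `iqr(normal_means)` shows the breakdown directly: the spread of the sample mean barely shrinks with n under Cauchy tails, which is exactly the kind of failure mode worth surfacing by simulation before a referee asks.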
logTool: Revealing the Hidden Patterns of Online Surfing Behavior

logTool [onformative.com] is a data visualization tool that displays your online activity, based on data from the powerful network packet sniffing tool Carnivore. By analyzing the different IP addresses and ports, the visualization can determine and represent what kind of application or service sends or receives the packets. Developed for the magazine Weave, logTool was used to digest the surfing behavior of several interaction designers, artists, and developers. Each full day was split into 288 timeslots of 5 minutes each, represented as a radial bar graph. Project details:

WEKA

The Weka workbench contains a collection of visualization tools and algorithms for data analysis and predictive modelling, together with graphical user interfaces for easy access to this functionality. The main strengths of Weka are that it is freely available under the GNU General Public License; that it is very portable, because it is fully implemented in the Java programming language and thus runs on almost any computing platform; that it contains a comprehensive collection of data preprocessing and modelling techniques; and that it is easy for a novice to use, thanks to its graphical user interfaces. Weka supports several standard data mining tasks: data preprocessing, clustering, classification, regression, visualization, and feature selection.
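The 288-slot daily binning that logTool maps onto its radial bar graph can be sketched in a few lines. This is an illustration of the arithmetic only (24 hours x 60 minutes / 5-minute slots = 288); the function names are my own, not logTool's.

```python
from datetime import datetime

SLOT_MINUTES = 5
SLOTS_PER_DAY = 24 * 60 // SLOT_MINUTES  # 288 slots, as in logTool

def slot_index(ts: datetime) -> int:
    """Map a timestamp to its 5-minute slot within the day (0..287)."""
    return (ts.hour * 60 + ts.minute) // SLOT_MINUTES

def slot_angle_degrees(index: int) -> float:
    """Angle of a slot on a radial (24-hour clock) bar graph."""
    return 360.0 * index / SLOTS_PER_DAY
```

With this mapping, packet counts aggregated per slot index can be drawn as bars around a circle, midnight at 0 degrees and noon at 180.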
“Statistical Science and Philosophy of Science: Where Should They Meet?”

Four score years ago (!) we held the conference “Statistical Science and Philosophy of Science: Where Do (Should) They Meet?” at the London School of Economics, at the Center for the Philosophy of Natural and Social Science (CPNSS), where I am a visiting professor. Many of the discussions on this blog grew out of contributions from the conference, and conversations initiated soon after.
Technical Methods Report: Guidelines for Multiple Testing in Impact Evaluations - Appendix B: Introduction to Multiple Testing

This appendix introduces the hypothesis testing framework for this report, the multiple testing problem, statistical methods to adjust for multiplicity, and some concerns that have been raised about these solutions. The goal is to provide an intuitive, nontechnical discussion of key issues related to this complex topic to help education researchers apply the guidelines presented in the report. A comprehensive review of the extensive literature in this area is beyond the scope of this introductory discussion. The focus is on continuous outcomes, but appropriate procedures are highlighted for other types of outcomes (such as binary outcomes). The appendix concludes with recommended methods.
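As a concrete illustration of two standard multiplicity adjustments the appendix's literature covers, the sketch below computes Bonferroni- and Holm-adjusted p-values in plain Python. These are textbook procedures, not code from the report; the function names are my own.

```python
def bonferroni(pvals):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm step-down adjustment: the i-th smallest p-value is multiplied
    by (m - i + 1), with a running maximum to keep adjusted values monotone."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted
```

For example, with raw p-values [0.01, 0.04, 0.03, 0.005] and four tests, Bonferroni quadruples every p-value, while Holm is uniformly less conservative: it multiplies the smallest by 4, the next by 3, and so on, enforcing monotonicity. Both control the familywise error rate; Holm rejects everything Bonferroni does, and possibly more.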
Weave Data Tutorial

Tutorial for Weave Magazine 03.10 about visualizing network data. Client: Weave Magazine / Page publisher. We all use Twitter, write emails, Skype, and blog all day long. There isn't a single day we don't visit websites like Google, YouTube, the site of our favorite newspaper, or a social network, and browse through the web.
Primer

WEKA is a comprehensive workbench for machine learning and data mining. Its main strengths lie in the classification area, where all current ML approaches -- and quite a few older ones -- have been implemented within a clean, object-oriented Java class hierarchy. Regression, association rule, and clustering algorithms have also been implemented.