
R - Data manipulation


Quickly read Excel worksheets into R (Windows only…sorry). Video: Data Mining with R. Excel data - JGR and XLConnect. Although Excel is the most widespread application for data manipulation and (perhaps) analysis, R's support for the xls and xlsx file formats has long left a lot to be desired. Fortunately, the XLConnect package was created to fill this void, and JGR 1.7-8 now integrates with XLConnect to load .xls and .xlsx documents into R. Not fancy, but very useful. To leave a comment for the author, please follow the link and comment on his blog: Fells Stats » R.
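Loading a worksheet with XLConnect takes only a couple of lines. A minimal sketch - the workbook and sheet names below are placeholders, not files from the original post:

```r
# XLConnect needs a Java runtime installed
# install.packages("XLConnect")
library(XLConnect)

# Load the workbook, then read a named worksheet into a data frame.
# "sales.xlsx" and "Q1" are invented names for illustration.
wb <- loadWorkbook("sales.xlsx")
df <- readWorksheet(wb, sheet = "Q1")

# Or use the one-call convenience wrapper:
df <- readWorksheetFromFile("sales.xlsx", sheet = "Q1")
```

The same package can also write data frames back out to .xls/.xlsx, which makes it handy for round-tripping with Excel-using colleagues.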

Software for Research, Part 3: [R], RStudio and ggplot2 for Statistics. [R] is an excellent open-source statistics language. It's cross-platform and free, and I think it will eventually displace proprietary stats packages thanks to its rapid development, speed and ease of use. So there's no time like the present to get used to it. This figure from the site r4stats.com would appear to support that view (number of posts in the main discussion groups per month). Like all new programs, it has a bit of a learning curve, especially if you're not used to the command line. But don't let that put you off: for any sort of statistics beyond the most basic, you're going to end up working with scripts anyway; it's simply the most efficient way to run analyses. Graphical menus, while useful to begin with, quickly become a hindrance.

Contents:
- Installing [R]
- Using [R] and ggplot
- More about RStudio
- Getting help - Stackoverflow
- Links
- Useful commands

Here is a full list of text editors that play nice with R on all platforms.
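To give a flavour of what a first ggplot2 session looks like, here is a minimal sketch using R's built-in mtcars dataset (this example is mine, not from the original post):

```r
# install.packages("ggplot2")  # one-off installation from CRAN
library(ggplot2)

# Scatterplot of fuel economy against weight, coloured by cylinder count
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")
```

Run from the RStudio console, the plot appears in the Plots pane; the aes() mapping is where most of the learning curve lives.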

Links. Migrating from SPSS/Excel to R, Part 3: Preparing your Data. In this post, I describe how to prepare your data for migrating between SPSS/Excel and R. This is the third post in a series, the first two of which can be found here and here. Don’t forget, this is primarily aimed at those working on datasets for psychology experiments, as that’s what I do. One of the golden rules of working with datasets in SPSS is that you need to have one row for each participant. I know there are some exceptions to this, but it’s an important general rule for SPSS. The main consequence of this is that, when you’re dealing with any form of within-subjects data, your dataset quickly becomes very wide indeed. Let’s look at an example below.

Here, we have 10 participants, involved in two experimental sessions. That's not too messy (note that I just pasted in 1200 for the values, as this is just an illustration). But we can't fit it all into a single screenshot, as the dataset has a large number of columns. When you think about it, wide datasets can be a real pain.
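Unlike SPSS, R is happiest with long-format data: one row per participant per session. The wide layout above can be converted with base R's reshape(); a toy sketch with invented column names, using the constant 1200 values from the illustration:

```r
# A toy wide dataset: one row per participant, one column per session score
wide <- data.frame(
  participant = 1:10,
  score.1 = 1200,   # session 1 (constant values, purely for illustration)
  score.2 = 1200    # session 2
)

# Convert to long format: one row per participant per session
long <- reshape(wide,
                direction = "long",
                varying   = list(c("score.1", "score.2")),
                v.names   = "score",
                times     = 1:2,
                timevar   = "session",
                idvar     = "participant")

nrow(long)  # 20 rows: 10 participants x 2 sessions
```

Once the data are long, within-subjects analyses (and ggplot2 plotting) become much more natural than wrestling with dozens of columns.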

Aspiring Data Journalists: R. Picking up on Paul Bradshaw's post A quick exercise for aspiring data journalists, which hints at how you can use Google Spreadsheets to grab – and explore – a mortality dataset highlighted by Ben Goldacre in DIY statistical analysis: experience the thrill of touching real data, I thought I'd describe a quick way of analysing the data using R. R is a very powerful statistical programming environment that should probably be part of your toolkit if you ever want to get round to doing some serious stats, and with a bit of judicious websearching and some cut-and-paste action you can have a go at reproducing the analysis yourself… R is an open-source, cross-platform environment that lets you do programming-like things with stats, as well as producing a wide range of statistical graphics (stats visualisations) as if by magic.
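Since the mortality dataset was published as an HTML table, one way to pull it straight into R is the XML package's readHTMLTable(); a sketch - the URL below is a placeholder, not the actual dataset location:

```r
# install.packages("XML")
library(XML)

# readHTMLTable() parses every <table> element on a page into a data frame.
# The URL is a placeholder for illustration only.
tables <- readHTMLTable("http://example.com/mortality-table.html",
                        stringsAsFactors = FALSE)

fp <- tables[[1]]   # take the first table on the page
head(fp)            # inspect the first few rows
```

Columns arrive as character strings, so numeric fields typically need an as.numeric() conversion before analysis.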

So, to get started. Paul describes a dataset posted as an HTML table by Ben Goldacre that is used to generate the dots on this graph. We can inspect the data we've imported as follows: fp.

Data is the new gold. We need more data journalism. How else will we find the nuggets of data and information worth reading? Life should become easier for data journalists, as the Guardian, one of the data journalism pioneers, points out in this article about the new open data initiative of the European Union (EU). The aims of the EU's open data strategy are bold: data is seen as the new gold of the digital age. The EU estimates that public data is already generating economic value of €32bn each year, with growth potential to €70bn if more data is made available. Here is the link to the press statement, which I highly recommend reading: EUROPA - Press Releases - Neelie Kroes, Vice-President of the European Commission responsible for the Digital Agenda, "Data is the new gold", Opening Remarks, Press Conference on Open Data Strategy, Brussels, 12th December 2011. I am particularly impressed that the EU even aims to harmonise the way data will be published by the various bodies.

How Might Data Journalists Show Their Working? Sweave. If part of the role of data journalism is to make transparent the justification behind claims that are, or aren't, backed up by data, there's good reason to suppose that journalists should be able to back up their own data-based claims with evidence about how they made use of the data. Posting links to raw data helps to a certain extent – at least third parties can then explore the data themselves and check the claims the press are making – but you could also argue that journalists should make their working notes available, showing how they manipulated the data. (The same is true of public reports, where summary statistics and charts are included alongside a link to the raw data, but with no transparency about how those summaries and charts were actually produced from it.) In Power Tools for Aspiring Data Journalists: R, I explored how we might use the R statistical programming language to replicate a chart that appeared in one of Ben Goldacre's Bad Science columns.
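Sweave is the mechanism that makes this "show your working" idea practical: R code chunks are embedded in a LaTeX document, so every number and chart in the published report is regenerated from the data each time the document is built. A minimal sketch of a .Rnw file (the filename and variables are invented for illustration):

```latex
\documentclass{article}
\begin{document}

Mean mortality rate:
<<echo=TRUE>>=
d <- read.csv("mortality.csv")   % placeholder filename
mean(d$rate)
@

<<fig=TRUE, echo=FALSE>>=
plot(d$expected, d$observed)
@

\end{document}
```

Running Sweave("report.Rnw") in R executes the chunks and emits a .tex file with the results and figures spliced in, ready for pdflatex - the code, the data, and the claims travel together.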

Data Referenced Journalism and the Media. Reading our local weekly press this evening (the Isle of Wight County Press), I noticed a page 5 headline declaring "Alarm over death rates at St Mary's", St Mary's being the local general hospital. A Department of Health report on hospital mortality rates came out earlier this week, and the Island's hospital, it seems, has not performed so well… I also put together a couple of posts describing how the funnel plot could be generated from a data set using the statistical programming language R. Given the interest there appears to be around data journalism at the moment (amongst the digerati at least), I thought there might be a reasonable chance of finding some data-inspired commentary around the hospital mortality figures.
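The funnel plot idea itself is simple: plot each hospital's standardised mortality ratio (observed/expected deaths) against its volume, with control limits that narrow as volume grows, so only points outside the funnel deserve alarm. A toy sketch in base R with simulated data - not the actual Department of Health figures:

```r
set.seed(42)

# Simulated data: expected deaths per hospital, and observed counts
expected <- round(runif(50, 20, 500))
observed <- rpois(50, expected)
smr <- observed / expected          # standardised mortality ratio

# Approximate 95% control limits around SMR = 1:
# sd of observed/expected is roughly 1/sqrt(expected) for Poisson counts
n  <- seq(20, 500, by = 5)
lo <- 1 - 1.96 / sqrt(n)
hi <- 1 + 1.96 / sqrt(n)

plot(expected, smr, xlab = "Expected deaths", ylab = "SMR",
     ylim = range(c(smr, lo, hi)))
abline(h = 1, lty = 2)              # the "as expected" line
lines(n, lo, col = "blue")          # lower funnel limit
lines(n, hi, col = "blue")          # upper funnel limit
```

Small hospitals naturally show more variation, which is exactly the point a bare league table of death rates misses.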

If we do a general, recency-filtered search for hospital death rates on either Google web search or Google news search, we see a wealth of stories from various local press outlets. Is the Tamworth Herald the trusted source here, or the Department of Health?!