Making R graphics legible in presentation slides | Civil Statistician I only visited a few JSM sessions today, as I’ve been focused on preparing for my own talk tomorrow morning. However, I went to several talks in a row which all had a common problem that made me cringe: graphics where the fonts (titles, axes, labels) are too small to read. You used R's default settings when putting this graph in your slides? Dear colleagues: if we’re going to the effort of analyzing our data carefully, and creating a lovely graph in R or otherwise to convey our results in a slideshow, let’s PLEASE save our graphs in a way that the text is legible on the slides! For those of us working in R, here are some very quick suggestions that would help me focus on the content of your graphics, not on how hard I’m squinting to read them. Instead of clicking “Save as” or “Copy to clipboard” to get your graph into your slides, use functions like png or pdf to save it to a file. If any of that’s unclear, here’s a quick example, using the trusty old iris data:

40 Techniques Used by Data Scientists These techniques cover most of what data scientists and related practitioners are using in their daily activities, whether they use solutions offered by a vendor, or whether they design proprietary tools. When you click on any of the 40 links below, you will find a selection of articles related to the entry in question. Most of these articles are hard to find with a Google search, so in some ways this gives you access to the hidden literature on data science, machine learning, and statistical science. Starred techniques (marked with a *) belong to what I call deep data science, a branch of data science that has little if any overlap with closely related fields such as machine learning, computer science, operations research, mathematics, or statistics. To learn more about deep data science, click here. Also, to discover in which contexts and applications the 40 techniques below are used, I invite you to read the following articles: The 40 data science techniques DSC Resources

SQLZOO Learn SQL using: SQL Server, Oracle, MySQL, DB2, and PostgreSQL. Reference: how to... How to read the data from a database.
2 CREATE and DROP How to create tables, indexes, views and other things. How to get rid of them.
3 INSERT and DELETE How to put records into a table, change them and how to take them out again.
4 DATE and TIME How to work with dates; adding, subtracting and formatting.
5 Functions How to use string functions, logical functions and mathematical functions.
6 Users How to create users, GRANT and DENY access, get at other people's tables.
7 Meta Data How to find out what tables and columns exist.
8 SQL Hacks Some SQL Hacks, taken from "SQL Hacks" published by O'Reilly.
9 Using SQL with PHP on Amazon EC2 servers Video tutorials showing how to run MySQL, PHP and Apache on Amazon's EC2 cloud servers.
10 An introduction to transactions Video tutorials showing how sessions can interfere with each other and how to stop it.
11 Using SQL with C# in Visual Studio

R scripts for analyzing survey data Another site pops up with open code for analyzing public survey data: It will be interesting to see whether this gets used by the general public--given the growing trend of data journalism and so forth--versus academics. It is a useful resource for both. To leave a comment for the author, please follow the link and comment on his blog: The Data Monkey. R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

Data.gov Impatient R Translations français: Translated by Kate Bondareva. Serbo-Croatian: Translated by Jovana Milutinovich from Geeks Education. Preface This is a tutorial (previously known as “Some hints for the R beginner”) for beginning to learn the R programming language. It is a tree of pages — move through the pages in whatever way best suits your style of learning. You are probably impatient to learn R — most people are. This page has several sections, which fall into four categories: General, Objects, Actions, Help.
General: Introduction; Blank screen syndrome; Misconceptions because of a previous language; Helpful computer environments; R vocabulary; Epilogue
Objects: Key objects; Reading data into R; Seeing objects; Saving objects; Magic functions, magic objects; Some file types; Packages
Actions: What happens at R startup; Key actions; Errors and such; Graphics; Vectorization; Make mistakes on purpose

Handy statistical lexicon These are all important methods and concepts related to statistics that are not as well known as they should be. I hope that by giving them names, we will make the ideas more accessible to people: Mister P: Multilevel regression and poststratification. The Secret Weapon: Fitting a statistical model repeatedly on several different datasets and then displaying all these estimates together. The Superplot: Line plot of estimates in an interaction, with circles showing group sizes and a line showing the regression of the aggregate averages. The Folk Theorem: When you have computational problems, often there’s a problem with your model. The Pinch-Hitter Syndrome: People whose job it is to do just one thing are not always so good at that one thing. Weakly Informative Priors: What you should be doing when you think you want to use noninformative priors. P-values and U-values: They’re different. Conservatism: In statistics, the desire to use methods that have been used before. P.S.

www.iki.fi/sol - Tutorials - GalaXQL Who said SQL tutorials have to be boring? Try out GalaXQL 3.0 beta! Runs on your browser. Note: heavy javascript and webgl. Quotes / Testimonials "Incidentally, we've trained several students to be web developers using only your tutorial for SQL instruction--great work!" -- Dr Christopher Pound, Rice University "I have been looking for a good way to show SQL to analysts who need to learn it and this by far the best tool I have ever come across." -- Julie LeMay, DELL "Noodling with GalaXQL is the most fun database tutorial I've ever seen." -- Joey deVilla at Tucows' "the farm" "Much more entertaining and freeform than ordinary attempts at tutorials, certainly exciting!" -- Thomas Van Der Pol "I've just completed your tutorial and was very impressed!" -- Stephed Bridges "[GalaXQL] rocks!" -- Reuben Grinberg in his blog GalaXQL is an interactive SQL tutorial. GalaXQL 1.0 Virtual teacher (win32) GalaXQL 1.0 Virtual teacher (mac os x) Follow the instructions by your virtual teacher. Somewhat altered galaxy

Get JSON from Excel using Python, xlrd | Anthony DeBarros Powering interactive news applications off flat files rather than a call to a database server is an option worth considering. Cutting a production database and data access layer out of the mix eliminates a whole slice of complexity and trims development time. Flat files aren’t right for every situation, but for small apps they’re often all you need. These days, most of the apps I help build at Gannett Digital consume JSON. Simpler apps — such as the table/modal displays we deployed in February for our Oscar Scorecard and Princeton Review Best Value Colleges — run off one or two JSON files. I wrote last year how to use Python to generate JSON files from a SQL database. The key ingredient is the Python library xlrd. (Another choice is openpyxl, which has similar features and works with newer .xlsx formatted Excel files.) Basic xlrd operations Let’s say we have an Excel workbook containing a small table repeated over three worksheets. From Excel to JSON Pretty cool stuff.
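The excerpt above stops before the post's own code, so here is a minimal sketch of the Excel-to-JSON idea rather than the author's script. It assumes the first row of the worksheet holds column names; the helper names (`read_sheet`, `rows_to_records`, `records_to_json`) and the sample rows are made up for illustration. Only `read_sheet` touches xlrd, using `open_workbook`, `sheet_by_index`, `nrows` and `row_values`; note that xlrd 2.x reads only the older .xls format.

```python
import json
from collections import OrderedDict


def rows_to_records(header, rows):
    """Pair each data row with the header, giving one ordered dict per row."""
    return [OrderedDict(zip(header, row)) for row in rows]


def read_sheet(path, sheet_index=0):
    """Read one worksheet with xlrd (assumption: row 0 is the header row)."""
    import xlrd  # third-party; imported lazily so the other helpers run without it

    workbook = xlrd.open_workbook(path)
    sheet = workbook.sheet_by_index(sheet_index)
    header = sheet.row_values(0)
    rows = [sheet.row_values(i) for i in range(1, sheet.nrows)]
    return header, rows


def records_to_json(records):
    """Serialize the list of records as an indented JSON array."""
    return json.dumps(records, indent=2)


# Hypothetical stand-in data so the sketch runs without a workbook on disk.
# xlrd returns numeric cells as floats, hence 1.0 rather than 1.
header = ["car_id", "make", "miles"]
rows = [[1.0, "Ford", 32000.0], [2.0, "Toyota", 15500.0]]
json_text = records_to_json(rows_to_records(header, rows))
print(json_text)
```

With a real workbook, `header, rows = read_sheet("cars.xls")` would replace the stand-in data (the file name is hypothetical), and the resulting string could be written to a flat `.json` file for the app to consume.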

IEEE DataPort - IEEE Big Data IEEE DataPort™ is now available for use! Go to ieee-dataport.org to be connected to this valuable one-stop shop data repository serving the growing technical community focused on Big Data! Contact Melissa Handa today at melissa.handa@ieee.org for a coupon code to become a subscriber free of charge! Share, Access and Analyze Big Data with IEEE DataPort™! IEEE realizes that data generation and data analytics are increasingly critical in many aspects of research and industry. What Capabilities Does IEEE DataPort™ Provide? Get involved! Go to ieee-dataport.org to load your first dataset today!

knitr: Elegant, flexible and fast dynamic report generation with R | knitr Overview The knitr package was designed to be a transparent engine for dynamic report generation with R, solve some long-standing problems in Sweave, and combine features in other add-on packages into one package (knitr ≈ Sweave + cacheSweave + pgfSweave + weaver + animation::saveLatex + R2HTML::RweaveHTML + highlight::HighlightWeaveLatex + 0.2 * brew + 0.1 * SweaveListingUtils + more). This package is developed on GitHub; for installation instructions and FAQ’s, see README. Motivation One of the difficulties with extending Sweave is we have to copy a large amount of code from the utils package (the file SweaveDrivers.R has more than 700 lines of R code), and this is what the two packages mentioned above have done. Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do. – Donald E. Features Acknowledgements Misc

Toward sustainable insights, or why polygamy is bad for you | the morning paper Binning et al., CIDR 2017 Buckle up! Today we’re going to be talking about statistics, p-values, and the multiple comparisons problem. For my own benefit, I’ll try and explain what follows as simply as possible – I find it incredibly easy to make mistakes otherwise! p-values If we observe some variable and see a particular value, we might wonder “what are the odds of that!” Given a hypothesis about the underlying distribution, we’d be able to give an answer: the probability of the observation given that hypothesis. Time to move on from dice rolls. Suppose the quantity we observe is now a measure of correlation between two measured phenomena. Rather than asking for the odds of seeing a value exactly equal to some value, we need to ask ‘what are the odds of seeing a value at least as extreme?’ Suppose we see a suspiciously large value. The p-value is the probability, given the hypothesis, of seeing a value at least that large (source: Wikipedia). Here’s the first thinking trap. An arbitrary but universally accepted p-value of 0.05 (there’s a 5% chance of this observation given the hypothesis) is deemed as the threshold for ‘statistical significance.’
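To make the p-value and multiple-comparisons ideas concrete, here is a small sketch that is not taken from the paper or the post. It assumes a fair six-sided die as the hypothesis, a one-sided test, and independent tests; the function names and the numbers (12 rolls, 5 sixes, 20 tests) are made up for illustration.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): a one-sided p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def familywise_error(alpha, m):
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


# Hypothesis: the die is fair, so each roll shows a six with probability 1/6.
# Observation (made up): 5 sixes in 12 rolls.
p_value = binomial_tail(5, 12, 1 / 6)
print(f"p-value for at least 5 sixes in 12 rolls: {p_value:.4f}")

# Running more tests at the 0.05 threshold makes a spurious 'discovery' more likely.
for m in (1, 5, 20):
    print(f"{m:2d} tests -> P(at least one false positive) = {familywise_error(0.05, m):.3f}")
```

The second function is the multiple comparisons problem in one line: each test on its own has a 5% false-positive rate, but the chance that at least one of many tests crosses the threshold by luck grows quickly with the number of tests.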

Python Programming Language – Official Website