
Programming R


Tips and useful commands for data analysis and data visualization

Multilevel analysis

How to perform a Logistic Regression in R | R-bloggers. Logistic regression is a method for fitting a regression curve, y = f(x), when y is a categorical variable. The typical use of this model is predicting y given a set of predictors x. The predictors can be continuous, categorical, or a mix of both. The categorical variable y can, in general, assume different values. In the simplest case, y is binary, meaning that it can assume either the value 1 or 0. Logistic regression implementation in R: R makes it very easy to fit a logistic regression model. The dataset: we'll be working on the Titanic dataset. The data cleaning process: when working with a real dataset we need to take into account that some data might be missing or corrupted, so we need to prepare the dataset for our analysis: training.data.raw <- read.csv('train.csv', header = TRUE, na.strings = c("")) A visual take on the missing values might be helpful: the Amelia package has a special plotting function, missmap(), that will plot your dataset and highlight missing values.
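A minimal sketch of fitting a logistic regression with glm(); it uses the built-in mtcars data as a stand-in, since the Titanic train.csv file from the post is not bundled with R, and the chosen predictors (wt, hp) are illustrative:

```r
# Logistic regression sketch: predict a binary outcome (am: manual vs
# automatic transmission) from two continuous predictors in mtcars.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial(link = "logit"))
summary(fit)

# Predicted probabilities for the training rows
p <- predict(fit, type = "response")
head(round(p, 3))
```

The same glm() call works with categorical predictors too: wrap them in factor() and R creates the dummy variables automatically.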

R. Framework And Applications Of ARIMA Time Series Models. Quick recap: hopefully, you will have gained useful insights into time series concepts by now. If not, don't worry! You can quickly glance through this series of time series articles: Step by Step Guide to Learn Time Series, Time Series in R, and ARMA Time Series Model. This is the fourth and final article of the series. A quick revision: up to this point we have covered various concepts of ARIMA modelling in bits and pieces. In this article we will take you through a comprehensive framework to build a time series model. Overview of the framework: this framework (shown below) specifies the step-by-step approach to 'How to do a Time Series Analysis'. As you will be aware, the first three steps have already been discussed in detail in previous articles. Step 1: Visualize the time series. It is essential to analyze the trends prior to building any kind of time series model.

Step 2: Stationarize the series. There are three commonly used techniques to make a time series stationary. In the ARIMA(p, d, q) notation, p is the auto-regressive (AR) order, d is the degree of differencing (I), and q is the moving-average (MA) order. Auto-Regression & Moving-Average Time Series - Simplified. ARMA models are commonly used for time series modeling. In an ARMA model, AR stands for auto-regression and MA stands for moving average. If these terms sound intimidating, worry not – we will simplify these concepts in the next few minutes! Pedagogy: in this article, we will develop a knack for these terms and understand the characteristics associated with these models. But before we start, remember that AR and MA models are not applicable to non-stationary series.

In case you get a non-stationary series, you first need to stationarize it (by taking differences or transforming) and then choose from the available time series models. We'll first begin by explaining each of these two models (AR & MA) individually. Auto-Regressive Time Series model: let's develop an understanding of AR models using the case below. The current GDP of a country, say x(t), is dependent on last year's GDP, i.e. x(t - 1). Hence, we can formally write the AR(1) equation for GDP as: x(t) = alpha * x(t - 1) + error(t), where alpha is the auto-regressive coefficient.
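The AR(1) idea above can be sketched in R; the coefficient value 0.7 and the use of arima.sim() are illustrative choices, not taken from the article:

```r
# Simulate an AR(1) process x(t) = 0.7 * x(t-1) + e(t), then recover the
# coefficient by fitting an AR(1) model with base R's arima().
set.seed(1)
x <- arima.sim(model = list(ar = 0.7), n = 500)

fit <- arima(x, order = c(1, 0, 0))
coef(fit)["ar1"]   # the estimate should land near the true value 0.7
```

Fitting the model to data simulated from known parameters is a quick sanity check that the estimation machinery behaves as expected.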

Data Scientist. rCharts.

Set Working Directory in R. If you want to read files from a specific location or write files to a specific location, you will need to set the working directory in R. The following example shows how to set the working directory in R to the folder "Data" within the folder "Documents and Settings" on the C drive:

# Set the working directory
setwd("C:/Documents and Settings/Data")

Remember that you must use the forward slash / or a double backslash \\ in R! The Windows format of a single backslash will not work. The official R manual page on setting the working directory has more detail. Thanks for reading!
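A minimal sketch of the setwd() pattern, using a temporary directory instead of a hard-coded Windows path so it runs anywhere:

```r
# Save the current working directory, switch to a temporary one for a
# read/write round trip, then restore the original directory.
old_wd <- getwd()
setwd(tempdir())   # forward slashes (or \\) also work for Windows paths

write.csv(head(mtcars), "demo.csv", row.names = FALSE)
d <- read.csv("demo.csv")

setwd(old_wd)      # always restore, so later code is not surprised
```

Saving and restoring the directory around the work is a good habit: scripts that leave the working directory changed are a common source of confusing path errors.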

Beautiful tables for linear model summaries #rstats. In this blog post I'd like to show some old and new features of the sjt.lm function from my sjPlot package. These features are currently only implemented in the development snapshot on GitHub; a package update is planned to be submitted to CRAN soon. There are two major new features I added to this function: comparing models with different predictors (e.g. stepwise regression) and automatic grouping of categorical predictors.

There are examples below that demonstrate these features. The sjt.lm function prints results and summaries of linear models as an HTML table. Please note: the following tables may look a bit cluttered – this is because I pasted the HTML code created by knitr directly into this blog post, so style sheets may interfere. All of the following tables can be reproduced with the sjPlot package and its sample data set. Linear model summaries as HTML table: before starting, sample data is loaded and sample models are fitted: sjt.lm(fit1, fit2) Custom labels.

3 big universities proclaim: Learn data science online! It's been a while since I covered MOOCs. It sounds vaguely like an epithet, but for those of you who have been hiding in a cave, MOOC means "massive open online course," which in normal-people talk is "taking courses online." It has also been a while since I expressed my derision for the term "data scientist," but in the last few news cycles these two topics have come together: three major universities now offer online certifications in data science.
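Returning to the sjt.lm example above: a minimal sketch of fitting two nested models that sjt.lm() could then compare side by side. The formulas and the use of mtcars are illustrative stand-ins, not the sjPlot sample data from the post:

```r
# Two nested linear models on built-in data. With the sjPlot package
# installed, sjt.lm(fit1, fit2) would render both summaries in a single
# HTML table; here we just fit the models and compare fit in-sample.
fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)  # stepwise-style extension

summary(fit1)$r.squared
summary(fit2)$r.squared   # the larger model fits better in-sample
```

Note that factor(cyl) is exactly the kind of categorical predictor the post's new "automatic grouping" feature is meant to display compactly.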

What’s interesting is the difference between them. Three ways to get your data science cert: Johns Hopkins University is offering a paid-for “specialization” certificate through Coursera. UC Berkeley is offering a master's degree in data science with a more traditional application process. The Berkeley program seems to be a lot more theoretical than the one from Hopkins, but it also hits R and covers MapReduce. Who cares about the cert? Have you taken one of these? Learn to crunch big data with R. A few years ago I was the CTO and co-founder of a startup in the medical practice management software space. One of the problems we were trying to solve was how medical office visit schedules can optimize everyone’s time. Too often, office visits are scheduled to optimize the physician’s time, and patients have to wait far too long in overcrowded waiting rooms in the company of people coughing contagious diseases out of their lungs.

One of my co-founders, a hospital medical director, had a multivariate linear model that could predict the required length of an office visit based on the reason for the visit, whether the patient needs a translator, the average historical visit lengths of both doctor and patient, and other possibly relevant factors. One of the subsystems I needed to build was a monthly regression task to update all of the coefficients in the model based on historical data. Essential R scripting: start by installing R and RStudio on your desktop. A simple assignment such as W <- 1 + sqrt(x) / 2 computes a value and stores it in W. 50 Things Everyone Should Know How To Do.
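A hypothetical sketch of such a monthly coefficient-update task; the variable names (visit_minutes, reason, needs_translator, hist_avg) and the simulated data are invented for illustration, not taken from the startup's actual model:

```r
# Simulate a small "historical visits" table, then refit the multivariate
# linear model to get updated coefficients, as the monthly task would.
set.seed(42)
n <- 200
hist_data <- data.frame(
  reason           = sample(c("checkup", "followup", "acute"), n, replace = TRUE),
  needs_translator = sample(c(0, 1), n, replace = TRUE),
  hist_avg         = rnorm(n, mean = 20, sd = 5)   # past avg visit length, minutes
)
# True data-generating process (known here only because we simulated it)
hist_data$visit_minutes <- 15 + 5 * hist_data$needs_translator +
  0.8 * hist_data$hist_avg + rnorm(n, sd = 3)

fit <- lm(visit_minutes ~ reason + needs_translator + hist_avg, data = hist_data)
coef(fit)   # the refreshed coefficients the scheduler would consume
```

In production, the data frame would come from the practice database and the refitted coefficients would be written back for the scheduling subsystem to use.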

How to get a drop-down list in Excel (Google search). Stata News | Export tables to Excel. A new feature in Stata 13, putexcel, allows you to easily export matrices, expressions, and stored results to an Excel file. Combining putexcel with a Stata command’s stored results allows you to create the table displayed in your Stata Results window in an Excel file.

Let me show you. A stored result is simply a scalar, macro, or matrix stored in memory after you run a Stata command. The two main types of stored results are e-class (for estimation commands) and r-class (for general commands). You can list a command’s stored results after it has been run by typing ereturn list (for estimation commands) or return list (for general commands).

. sysuse auto
(1978 Automobile Data)

. correlate foreign mpg
(obs=74)

Because correlate is not an estimation command, we use return list to see its stored results.

. return list

scalars:
              r(N) =  74
            r(rho) =  .3933974152205484

matrices:
              r(C) :  2 x 2

Now we can use putexcel to export these results to Excel. If you are working with matrices, the syntax is.

Welcome to the London Datastore | London DataStore.

Time Series ARIMA Models - Econometrics Academy. Excel Charting Samples for Microsoft .NET, ASP.NET, C#, VB.NET, XLS and Microsoft Visual Studio .NET. Richly formatted workbooks with fast and complete calculations are the heart and soul of a spreadsheet, but the ability to make good decisions is greatly enhanced by the ability to visualize data. Enhance your users' understanding of their data by taking advantage of SpreadsheetGear 2012's comprehensive Excel compatible charting support.

This sample dynamically creates a chart gallery which demonstrates some of the most commonly used Excel charting features from a single Excel 2007-2010 Open XML workbook. This sample shows how to create a new workbook, add some values, add a chart, and stream it to Microsoft Excel. This sample shows how to create a new workbook, copy data from a DataTable, add a stock chart, use various formatting to format the chart, and stream it to Microsoft Excel. This sample shows how to create a new workbook, add some values, add a chart, use multiple chart groups and multiple axes groups to create a stacked combination chart, and stream it to Microsoft Excel.

Epi and biostat

Programming Stata. This section is a gentle introduction to programming Stata. We discuss macros and loops, and show how to write your own (simple) programs. This is a large subject and all we can hope to do here is provide a few tips that hopefully will spark your interest in further study. However, the material covered will help you use Stata more effectively. Stata 9 introduced a new and extremely powerful matrix programming language called Mata.

To learn more about programming Stata you should read Chapter 18 in the User's Guide and then refer to the Programming volume and/or the online help as needed. 4.1 Macros. A macro is simply a name associated with some text. 4.1.1 Storing Text in Local Macros. Local macros have names of up to 31 characters and are known only in the current context (the console, a do file, or a program). You define a local macro using local name [=] text and you evaluate it using `name'. Example (control variables in a regression):

local controls age agesq education income

4.2 Looping.

ARIMA Modelling of Time Series. Description: Fit an ARIMA model to a univariate time series. Usage:

arima(x, order = c(0L, 0L, 0L),
      seasonal = list(order = c(0L, 0L, 0L), period = NA),
      xreg = NULL, include.mean = TRUE, transform.pars = TRUE,
      fixed = NULL, init = NULL, method = c("CSS-ML", "ML", "CSS"),
      n.cond, SSinit = c("Gardner1980", "Rossignol2011"),
      optim.method = "BFGS", optim.control = list(), kappa = 1e6)

Arguments. Details: Different definitions of ARMA models have different signs for the AR and/or MA coefficients.

The definition used here is X[t] = a[1]X[t-1] + … + a[p]X[t-p] + e[t] + b[1]e[t-1] + … + b[q]e[t-q], and so the MA coefficients differ in sign from those of S-PLUS. The variance matrix of the estimates is found from the Hessian of the log-likelihood, and so may only be a rough guide. Optimization is done by optim. Value: a list of class "Arima" with components. Fitting methods: the exact likelihood is computed via a state-space representation of the ARIMA process, and the innovations and their variance are found by a Kalman filter. Note. References.

Time Series ARIMA Models. 8.7 ARIMA modelling in R. How does auto.arima() work? The auto.arima() function in R uses a variation of the Hyndman–Khandakar algorithm, which combines unit root tests, minimization of the AICc, and MLE to obtain an ARIMA model.

The algorithm follows these steps. Hyndman–Khandakar algorithm for automatic ARIMA modelling: the number of differences d is determined using repeated KPSS tests. The values of p and q are then chosen by minimizing the AICc after differencing the data d times. Rather than considering every possible combination of p and q, the algorithm uses a stepwise search to traverse the model space. (a) The best model (with smallest AICc) is selected from the following four: ARIMA(2,d,2), ARIMA(0,d,0), ARIMA(1,d,0), ARIMA(0,d,1). Choosing your own model: if you want to choose the model yourself, use the Arima() function in R. R code: fit <- Arima(usconsumption[,1], order=c(0,0,3)) There is another function arima() in R which also fits an ARIMA model.
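As a concrete illustration of the base arima() function just mentioned: a minimal sketch fitting an ARIMA(1,0,0) model. The built-in lh series (luteinizing hormone measurements) is used as a stand-in, since the usconsumption data from the fpp package is not assumed to be available:

```r
# Fit an AR(1), i.e. ARIMA(1,0,0), to the built-in lh series with base R's
# arima(), then forecast five steps ahead with predict().
fit <- arima(lh, order = c(1, 0, 0))
coef(fit)                  # ar1 and intercept estimates
fit$aic                    # AIC, comparable across candidate orders

fc <- predict(fit, n.ahead = 5)
fc$pred                    # point forecasts
fc$se                      # forecast standard errors
```

Refitting with different order = c(p, d, q) values and comparing AIC (or AICc) by hand is the manual counterpart of the stepwise search auto.arima() automates.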

Modelling procedure: plot the data. Arima(). Data Visualization. R-related. 46 Hidden Tips and Tricks to Use Google Search Like a Boss. FAQ: Using a plugin to connect to a database. How do I connect to a database by using a Stata plugin? ODBC vs. plugin: the easiest way to import data from a database directly into Stata is to use the odbc command. However, there are occasions when the odbc command will not work or is not the best solution for importing data.

For example, the odbc command may not work on your operating system (e.g., Solaris), there may be no ODBC driver for the database in question, or ODBC may be too slow. If you encounter any of the above problems, you can use a Stata plugin to import and export your data directly to and from your database, provided the database has an application programming interface (API). Most database applications have an API, so the only real question is how to connect Stata to the database by using the API. This FAQ assumes that you have read and understood the FAQ on Stata plugins at the following URL: The example will use ANSI C as the plugin language and gcc as the compiler.

Create a test database from your terminal.

The Comprehensive R Archive Network. The R Project for Statistical Computing. Data Management. Quick-R: Home Page. Visual overview for creating graphs.