Eight Terminal Utilities Every OS X Command Line User Should Know · The OS X Terminal opens up a world of powerful UNIX utilities and scripts. If you’re migrating from Linux, you’ll find many familiar commands work the way you expect. But power users often aren’t aware that OS X comes with a number of its own text-based utilities not found on any other operating system. Learning about these Mac-only programs can make you more productive on the command line and help you bridge the gap between UNIX and your Mac. Update: Thanks to reader feedback, I’ve written about a few more commands in a follow-up post: (And eight hundred more).

1. open

open opens files, directories and applications.

Exciting, right?

$ open /Applications/Safari.app

…will launch Safari as if you had double-clicked its icon in the Finder. If you point open at a file instead, it will try to load the file with its associated GUI application. Running open screenshot.png on an image will open that image in Preview. Running open on a directory will take you straight to that directory in a Finder window.

Intro to The data.table Package · Data Frames

R provides a helpful data structure called the “data frame” that gives the user an intuitive way to organize, view, and access data. Many of the functions that you would use to read in external files (e.g. read.csv) or connect to databases (RMySQL) will return a data frame structure by default. While there are other important data structures, such as the vector, list and matrix, the data frame winds up being at the heart of many operations, not the least of which is aggregation.
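To make the data frame ideas above concrete, here is a minimal base-R sketch; the sales data and its column names are invented for illustration:

```r
# Build a small data frame: columns can differ in type,
# but every column has the same length.
sales <- data.frame(
  region = c("north", "south", "north", "south"),
  amount = c(100, 200, 300, 150)
)

# read.csv and database connectors return this same structure by default.
str(sales)

# Aggregation, the operation the data frame is "at the heart of":
# total amount per region.
totals <- aggregate(amount ~ region, data = sales, FUN = sum)
print(totals)
```

The formula interface `amount ~ region` reads as "amount, grouped by region", which keeps the aggregation call close to how you would describe it in words.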

Before we get into that, let me offer a very brief review of data frame concepts:

- A data frame is a set of rows and columns
- Each row is of the same length
- Every column is of the same length; values within a column share one data type, but data types can differ from column to column
- A data frame has characteristics of both a matrix and a list
- Bracket notation is the customary method of indexing into a data frame

Subsetting Data the Old School Way

Here are some examples of getting specific subsets of information from the built-in data frame mtcars.

Ŷhat | 10 R packages I wish I knew about earlier · I started using R about 3 years ago. It was slow going at first. R had trickier and less intuitive syntax than the languages I was used to, and it took a while to get accustomed to the nuances. It wasn’t immediately clear to me that the power of the language was bound up with the community and the diverse packages available. R can be more prickly and obscure than other languages like Python or Java.
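A minimal sketch of the "old school" bracket-notation subsetting of mtcars described above (mtcars ships with base R, so this runs as-is):

```r
# Rows and columns by position: first 3 rows, first 2 columns.
mtcars[1:3, 1:2]

# Columns by name.
mtcars[, c("mpg", "cyl")]

# Logical row filter: keep only the 6-cylinder cars.
six_cyl <- mtcars[mtcars$cyl == 6, ]

# Combined row filter and column selection:
# mpg and horsepower for cars above 30 mpg.
thrifty <- mtcars[mtcars$mpg > 30, c("mpg", "hp")]
print(thrifty)
```

The `[rows, columns]` pattern is the matrix-like side of the data frame; selecting columns by name is its list-like side.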

The good news is that there are tons of packages which provide simple and familiar interfaces on top of base R. This post is about ten packages I love and use every day, and ones I wish I knew about earlier.

sqldf

install.packages("sqldf")

One of the steepest parts of the R learning curve is the syntax.

randomForest

install.packages("randomForest")

This list wouldn’t be complete without including at least one machine learning package you can impress your friends with.

Introducing xda: R package for exploratory data analysis · This R package contains several tools to perform initial exploratory analysis on any input dataset. It includes custom functions for plotting the data, as well as for performing different kinds of analyses, such as univariate, bivariate and multivariate investigation, which is the first step of any predictive modeling pipeline.
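To illustrate the sqldf and randomForest packages mentioned above, a minimal sketch; it assumes both packages have been installed via the install.packages() calls shown:

```r
library(sqldf)         # query data frames with plain SQL
library(randomForest)  # classic ensemble classifier

# sqldf: familiar SQL syntax against an ordinary data frame,
# here averaging mpg per cylinder count in mtcars.
by_cyl <- sqldf("SELECT cyl, AVG(mpg) AS avg_mpg FROM mtcars GROUP BY cyl")
print(by_cyl)

# randomForest: fit a multiclass classifier on the built-in iris data.
set.seed(1)
rf <- randomForest(Species ~ ., data = iris, ntree = 100)
print(rf$confusion)
```

For anyone coming from a SQL background, sqldf removes much of the bracket-notation learning curve at the cost of some speed.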

This package can be used to get a good sense of any dataset before jumping into building predictive models. You can install the package from GitHub. The functions currently included in the package are mentioned below.

Installation

To install the xda package, the devtools package needs to be installed first. Then, use the following commands to install xda:

Usage

For all the examples below, the popular iris dataset has been used. The package is constantly under development, and more functionalities will be added soon.

Stock Price Prediction With Big Data and Machine Learning · Eugene Zhulenev
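The excerpt above does not reproduce the actual xda install commands; a typical devtools-based install from GitHub might look like the sketch below. The repository path is an assumption, not taken from the excerpt:

```r
# Assumed install route for a GitHub-hosted package like xda.
install.packages("devtools")                 # devtools must be installed first
devtools::install_github("ujjwalkarn/xda")   # repo path is an assumption

library(xda)
```

Check the package's own GitHub README for the authoritative install commands.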

Apache Spark and Spark MLLib for building a price movement prediction model from order log data. The code for this application can be found on GitHub.

Synopsis

This post is based on the paper Modeling High-Frequency Limit Order Book Dynamics with Support Vector Machines. Roughly speaking, I’m implementing the ideas introduced in this paper in Scala with Spark and Spark MLLib. The authors use sampling; I’m going to use the full order log from NYSE (sample data is available from the NYSE FTP), just because I can easily do it with Spark. Instead of using SVM, I’m going to use the Decision Tree algorithm for classification, because in Spark MLLib it supports multiclass classification out of the box.

If you want to get a deep understanding of the problem and the proposed solution, you need to read the paper. Predictive modelling is the process by which a model is created or chosen to try to best predict the probability of an outcome.

Model Architecture
Feature Extraction and Training Data Preparation
Order Log Data

GitHub - ezhulenev/orderbook-dynamics: Modeling high-frequency limit order book dynamics with support vector machines.

Getting started with PostgreSQL in R · When dealing with large datasets that potentially exceed the memory of your machine, it is nice to have another option, such as your own server with a SQL/PostgreSQL database on it, where you can query the data in smaller, digestible chunks.

For example, recently I was facing a financial dataset of 5 GB. Although 5 GB fits into my RAM, the data uses a lot of resources. One solution is to use an SQL-based database, where I can query data in smaller chunks, leaving resources for the computation. While MySQL is more widely used, PostgreSQL has the advantage of being open source and free for all usages. However, we still need to get a server. One possible way is to rent an Amazon server; however, as I don’t have a budget for my projects, and because I only need the data on my own machine, I wanted to set up a server on my Windows 8.1 machine. First, we need to install the necessary software. After installation, the server can be started from the command line:

cd C:/Program Files/PostgreSQL/9.4/bin
pg_ctl -D "C:\Program Files\PostgreSQL\9.4\data" start
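Once the server is running, the data can be queried from R in digestible chunks. A sketch assuming the DBI and RPostgreSQL packages; the connection details (database name, user, password) and the table name are placeholders:

```r
# Sketch: stream a large PostgreSQL table into R chunk by chunk,
# so memory use stays bounded regardless of table size.
fetch_in_chunks <- function(sql, process, chunk_size = 10000) {
  # Placeholder credentials; assumes the local server started above.
  con <- DBI::dbConnect(RPostgreSQL::PostgreSQL(),
                        dbname = "finance", host = "localhost",
                        port = 5432, user = "postgres",
                        password = "secret")
  res <- DBI::dbSendQuery(con, sql)
  repeat {
    chunk <- DBI::fetch(res, n = chunk_size)  # one digestible piece
    if (nrow(chunk) == 0) break
    process(chunk)                            # e.g. accumulate summaries
  }
  DBI::dbClearResult(res)
  DBI::dbDisconnect(con)
}

# Usage (needs a running server and an existing table):
# fetch_in_chunks("SELECT * FROM trades", process = summary)
```

The key idea is dbSendQuery plus repeated fetch calls instead of one dbGetQuery, so only chunk_size rows are ever held in memory at once.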