Benco

Plotsuper

Eight Terminal Utilities Every OS X Command Line User Should Know · mitchchn.me

The OS X Terminal opens up a world of powerful UNIX utilities and scripts.

If you’re migrating from Linux, you’ll find that many familiar commands work the way you expect. But power users often aren’t aware that OS X comes with a number of its own text-based utilities not found on any other operating system. Learning about these Mac-only programs can make you more productive on the command line and help you bridge the gap between UNIX and your Mac.

Update: Thanks to reader feedback, I’ve written about a few more commands in a follow-up post: (And eight hundred more).

1. open

Intro to The data.table Package

Data Frames

R provides a helpful data structure called the “data frame” that gives the user an intuitive way to organize, view, and access data. Many of the functions that you would use to read in external files (e.g., read.csv) or connect to databases (RMySQL) will return a data frame by default. While there are other important data structures, such as the vector, list, and matrix, the data frame winds up at the heart of many operations, not the least of which is aggregation.
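The sketch below makes "aggregation over a data frame" concrete. It is a minimal, language-agnostic example in plain Python. It is not R, and it is not data.table's API. It stores a column-oriented table as a dict of equal-length lists. A hypothetical `group_sum` helper then sums one column within each group of another column, which is roughly the operation data.table writes as `DT[, sum(x), by = g]`. The column names and values are made-up illustration data.

```python
from collections import defaultdict

# A toy column-oriented "data frame": each key is a column name and each
# value is a list with one entry per row. Hypothetical illustration data.
frame = {
    "sector": ["tech", "energy", "tech", "energy", "finance"],
    "volume": [120, 80, 200, 50, 90],
}


def group_sum(df, by, value):
    """Sum the `value` column within each distinct key of the `by` column.

    Hypothetical helper for illustration. Returns a new column-oriented
    frame with one row per group, keys in first-seen order.
    """
    if len(df[by]) != len(df[value]):
        raise ValueError("columns must have the same length")
    totals = defaultdict(int)
    order = []  # first-seen order keeps the output deterministic
    for key, v in zip(df[by], df[value]):
        if key not in totals:  # membership test does not create the key
            order.append(key)
        totals[key] += v
    return {by: order, value: [totals[k] for k in order]}


result = group_sum(frame, by="sector", value="volume")
print(result)
```

A real data frame library adds typed columns, missing-value handling, and much faster grouping. The dict-of-lists layout here only shows why column-oriented storage turns "group by, then summarize" into a single pass over two columns.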

10 R packages I wish I knew about earlier

I started using R about 3 years ago.

It was slow going at first. R had trickier, less intuitive syntax than the languages I was used to, and it took a while to get accustomed to the nuances. It wasn't immediately clear to me that the power of the language was bound up with the community and the diverse packages available. R can be more prickly and obscure than other languages like Python or Java.

Introducing xda: R package for exploratory data analysis

This R package contains several tools to perform initial exploratory analysis on any input dataset.

It includes custom functions for plotting the data as well as for performing different kinds of analyses, such as univariate, bivariate, and multivariate investigation, which is the first step of any predictive modeling pipeline. The package can be used to get a good sense of any dataset before jumping into building predictive models.

Stock Price Prediction With Big Data and Machine Learning - Eugene Zhulenev

Apache Spark and Spark MLlib for building a price movement prediction model from order log data.

The code for this application can be found on GitHub.

Synopsis

This post is based on the paper “Modeling high-frequency limit order book dynamics with support vector machines.” Roughly speaking, I’m implementing the ideas introduced in this paper in Scala with Spark and Spark MLlib.

GitHub - ezhulenev/orderbook-dynamics: Modeling high-frequency limit order book dynamics with support vector machines

Getting started with PostgreSQL in R

When dealing with large datasets that potentially exceed the memory of your machine, it is nice to have another option, such as your own server with an SQL/PostgreSQL database on it, where you can query the data in smaller, digestible chunks.

For example, I was recently working with a 5 GB financial dataset. Although 5 GB fit into my RAM, the data used a lot of resources. One solution is to use an SQL-based database, where I can query the data in smaller chunks, leaving resources for the computation. While MySQL is more widely used, PostgreSQL has the advantage of being open source and free for all uses.
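The chunked-query idea can be sketched without a PostgreSQL server. The example below is a minimal Python sketch. It swaps in the standard-library `sqlite3` module with an in-memory database as a stand-in, so it is not the R/PostgreSQL setup the post describes. The `trades` table and the `iter_chunks` helper are hypothetical. `cursor.fetchmany(size)` pulls a bounded number of rows at a time. As a result, only one chunk is held in Python while a running aggregate is updated.

```python
import sqlite3

# Stand-in for a PostgreSQL server: an in-memory SQLite database.
# The "trades" table and its contents are hypothetical example data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO trades (price) VALUES (?)",
    [(float(i % 100),) for i in range(10_000)],
)
conn.commit()


def iter_chunks(connection, query, chunk_size=1_000):
    """Yield lists of at most `chunk_size` rows for `query`.

    Hypothetical helper: only one chunk is materialized in Python at a time.
    """
    cursor = connection.execute(query)
    try:
        while True:
            rows = cursor.fetchmany(chunk_size)
            if not rows:
                break
            yield rows
    finally:
        cursor.close()


# Compute a mean price incrementally, one chunk at a time.
total, count, n_chunks = 0.0, 0, 0
for chunk in iter_chunks(conn, "SELECT price FROM trades ORDER BY id", chunk_size=2_500):
    total += sum(price for (price,) in chunk)
    count += len(chunk)
    n_chunks += 1

mean_price = total / count
print(n_chunks, count, mean_price)  # prints: 4 10000 49.5
```

Against a real PostgreSQL server, note that many client drivers buffer the whole result set by default. Bounding memory there usually means a server-side cursor (for example, a named cursor in psycopg2) or, in R, DBI's `dbSendQuery()` followed by repeated `dbFetch(res, n = ...)` calls.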