Spark Tutorial (Part I): Setting Up Spark and IPython Notebook within 10 Minutes

Introduction: The objective of this post is to share a step-by-step procedure for setting up a local data science environment consisting of IPython Notebook (Anaconda Analytics), with the ability to scale up by parallelizing/distributing tasks through Apache Spark on the local machine or on a remote cluster. Anaconda Analytics is one of the most popular Python IDEs in the Python data scientist community, featuring the interactivity of the web-based IPython Notebook (gallery), ease of setup, and a comprehensive collection of built-in Python modules. On the other hand, Apache Spark is described as a lightning-fast cluster computing engine and a complementary piece to Apache Hadoop. For Python users, there are a number of advantages to using the web-based IPython Notebook for data science projects rather than the console-based ipython/pyspark.

Install Apache Spark and Anaconda (IPython Notebook) on the local machine
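As a minimal sketch of the wiring involved, the environment variables below point a PySpark launch at a local Spark install and switch it to notebook mode. The paths are assumptions (adjust SPARK_HOME to wherever Spark was unpacked), and IPYTHON_OPTS is the Spark 1.x-era mechanism:

```shell
# Hypothetical install location -- change to match your layout
export SPARK_HOME=/usr/local/spark
export PATH="$SPARK_HOME/bin:$PATH"

# Ask pyspark to start the web-based IPython Notebook instead of
# the console REPL (Spark 1.x mechanism):
export IPYTHON_OPTS="notebook"

# Then launch with: $SPARK_HOME/bin/pyspark
# The notebook opens in the browser with `sc` (SparkContext) pre-created.
```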
Getting any Docker image running in your own OpenShift cluster

This post was written by Chris Milsted, Senior Solution Architect at Red Hat. If you have been using stand-alone development environments, leveraging containers built from pre-existing Docker-format images (either internal images or images from an external registry such as the Red Hat registry or Docker Hub), but have outgrown your single machine and want to leverage the power of OpenShift using Kubernetes, this blog post is for you! My assumption is that readers are familiar with how to install OpenShift and how to set up projects with quotas.

Linux-flavored Windows

I never used to be a fan of the command line. Coming from a graphical background – and being so used to the GUI on Windows – a text-only interface felt extremely foreign. That said, it's also very consistent, fast, and powerful. For the past couple of months, I've been moving between Windows and Mac on an almost daily basis. I have a Windows machine for use at home, and a Mac that I use when I'm in the office, on the road, on the bus, in the other room, waiting for dinner … you get the idea.
Install, Setup, and Test Spark and Cassandra on Mac OS X

This Gist assumes you have already followed the instructions to install Cassandra, created a keyspace and table, and added some data.

Install Apache Spark: brew install apache-spark
Get the Spark Cassandra Connector

Vert.x Docker Images - Vert.x

It is also possible to deploy a Vert.x application packaged as a fat jar into a Docker container. For this you don't need the images provided by Vert.x; you can directly use a base Java image. Let's have a look.

Configuring IPython Notebook Support for PySpark · John Ramey

01 Feb 2015. Apache Spark is a great way to perform large-scale data processing. Lately, I have begun working with PySpark, a way of interfacing with Spark through Python. After a discussion with a coworker, we were curious whether PySpark could run from within an IPython Notebook. It turns out that this is fairly straightforward by setting up an IPython profile.
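The IPython-profile approach boils down to a startup file that puts Spark's Python bindings on the interpreter's path before the notebook kernel starts. Below is a sketch of such a file; the default SPARK_HOME path and the py4j zip name are assumptions, so check them against your own installation:

```python
# Sketch of an IPython profile startup file (e.g. 00-pyspark-setup.py)
# that makes PySpark importable inside the notebook.
import os
import sys

# Fall back to a Homebrew-style location if SPARK_HOME is unset (assumption)
spark_home = os.environ.get("SPARK_HOME", "/usr/local/opt/apache-spark/libexec")

# Put Spark's Python bindings and the bundled py4j on sys.path.
# The py4j version in the zip name varies by Spark release -- check
# $SPARK_HOME/python/lib for yours.
sys.path.insert(0, os.path.join(spark_home, "python"))
sys.path.insert(0, os.path.join(spark_home, "python", "lib",
                                "py4j-0.8.2.1-src.zip"))

# In a real profile you would finish by executing Spark's shell bootstrap,
# which creates the `sc` SparkContext in the notebook:
# exec(open(os.path.join(spark_home, "python", "pyspark", "shell.py")).read())
```

Placed in the profile's `startup` directory, this runs automatically every time a notebook kernel for that profile boots.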
Install the latest OpenShift V3 on CentOS 7.x

Prerequisites: CentOS 7.x minimal install (tested on 7.2)
Updated 2016/04/23: docker 1.11.x & OPENSHIFT_VERSION=v1.2.0-rc1
Mode: Single-node setup, all manual.

Step 1: Install docker and tweak INSECURE_REGISTRY for smoother operation of the integrated Docker registry.

cat > /etc/yum.repos.d/docker.repo << '__EOF__'
[docker]
name=Docker Repository
baseurl=
__EOF__

yum -y install docker-engine wget git

### Tweak for systemd way of setting INSECURE_REGISTRY ###
### Ref:

TIOBE Software: TIOBE Index

TIOBE Index for January 2016. January Headline: Java is TIOBE's Programming Language of 2015! Java has won the TIOBE Index programming language award of the year.
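A sketch of the systemd-style INSECURE_REGISTRY tweak referenced above: instead of editing the unit file, a drop-in overrides the daemon's ExecStart. The exact daemon invocation is version-dependent (docker 1.11 uses `docker daemon`; 1.12+ uses `dockerd`), and 172.30.0.0/16 is OpenShift's default service network, so both are assumptions to check against your cluster:

```shell
# Create a systemd drop-in that adds --insecure-registry for the
# integrated registry's subnet (172.30.0.0/16 is an assumption --
# match your cluster's service network).
mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/override.conf << '__EOF__'
[Service]
ExecStart=
ExecStart=/usr/bin/docker daemon --insecure-registry 172.30.0.0/16
__EOF__

# Reload unit definitions and restart the daemon to pick up the change
systemctl daemon-reload
systemctl restart docker
```

The empty `ExecStart=` line is required: it clears the original command before the override defines the new one.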