Apache Spark™ - Lightning-Fast Cluster Computing

Announcing Spark Packages | Databricks Blog Today, we are happy to announce Spark Packages ( a community package index to track the growing number of open source packages and libraries that work with Apache Spark. Spark Packages makes it easy for users to find, discuss, rate, and install packages for any version of Spark, and makes it easy for developers to contribute packages. Spark Packages will feature integrations with various data sources, management tools, higher level domain-specific libraries, machine learning algorithms, code samples, and other Spark content. Please give Spark Packages a try and let us know if you have any questions when working with the site!

Welcome to Apache Flume — Apache Flume GitHub - mikeaddison93/sbt-spark-package: Sbt plugin for Spark packages Sharding & IDs at Instagram GitHub - mikeaddison93/spark-package-cmd-tool: A command line tool for Spark packages Redis SparkR by amplab-extras SparkR is an R package that provides a light-weight frontend to use Apache Spark from R. SparkR exposes the Spark API through the RDD class and allows users to interactively run jobs from the R shell on a cluster. NOTE: As of April 2015, SparkR has been officially merged into Apache Spark and is shipping in an upcoming release (1.4) due early summer 2015. You can contribute and follow SparkR developments on the Apache Spark mailing lists and issue tracker. NOTE: The API from the upcoming Spark release (1.4) will not have the same API as described here. Initial support for Spark in R be focussed on high level operations instead of low level ETL. Features SparkR exposes the RDD API of Spark as distributed lists in R. sc <- sparkR.init("local") lines <- textFile(sc, " wordsPerLine <- lapply(lines, function(line) { length(unlist(strsplit(line, " "))) }) In addition to lapply, SparkR also allows closures to be applied on every partition using lapplyWithPartition. . . . .

An Introduction to the ELK stack By combining the massively popular Elasticsearch, Logstash and Kibana, Elasticsearch Inc has created an end-to-end stack that delivers actionable insights in real time from almost any type of structured and unstructured data source. Built and supported by the engineers behind each of these open source products, the Elasticsearch ELK stack makes searching and analyzing data easier than ever before. Thousands of organizations worldwide use these products for an endless variety of business critical functions. And we'd like to show you how the ELK stack will make your life better, too. During this webinar, you will be treated to: An overview of the key features of Elasticsearch, Logstash & Kibana A deeper dive on how their powers combine to deliver an end to end solution for analytics, logging, search & visualization A demo of the ELK stack in action We'll be talking speeds and feeds, but this webinar is not just for the developers & ops folks at your organization.

Premiers pas avec ElasticSearch (Partie 1) ElasticSearch est un moteur de recherche open source qui fait beaucoup parler de lui. Et pour cause, il possède un atout majeur : il suffit de quelques minutes à peine pour disposer d’un moteur de recherche clusterisé, automatiquement sauvegardé et répliqué, interrogeable via une API REST et proposant toutes les fonctionnalités d’un moteur de recherche dernière génération. Malgré une prise en main rapide et une documentation officielle très complète, l’utilisation d’ElasticSearch peut devenir rapidement complexe pour qui n’a jamais utilisé de moteur de recherche. C’est pourquoi nous avons choisi de démarrer une nouvelle série d’articles, dans laquelle nous allons essayer de présenter les notions de base d’ElasticSearch et les fonctionnalités les plus utilisées de ce fantastique outil. Dans ce premier article, nous verrons comment installer et configurer ElasticSearch. ElasticSearch : cool, bonsaï, cool ! ElasticSearch est un projet open source développé en Java sous licence Apache2. Avec