Big Data, Machine Learning and HPC

Apache Spark Introduction. Industries are using Hadoop extensively to analyze their data sets.

The reason is that the Hadoop framework is based on a simple programming model (MapReduce), and it enables a computing solution that is scalable, flexible, fault-tolerant and cost-effective. Here, the main concern is maintaining speed when processing large datasets, both in the waiting time between queries and in the waiting time to run a program. Spark was introduced by the Apache Software Foundation to speed up Hadoop's computational processing.
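To make the "simple programming model" concrete, here is a minimal single-process sketch of the MapReduce idea, using word counting as the example task. It is plain Python and does not use Hadoop or Spark; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. A real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Illustrative input only; not taken from the article.
    print(word_count(["spark speeds up hadoop", "hadoop uses mapreduce"]))
```

The point of the model is that the programmer writes only the map and reduce functions, while the framework takes care of distributing the data, grouping the keys and recovering from failures.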

Tutorial on Apache Mahout, a Hadoop subproject for building recommendation tools.

Distributed TensorFlow Has Arrived. KDnuggets has taken seriously its role to keep up with the newest releases of major deep learning projects, and in the recent past we have seen such landmark releases from major technology giants as well as universities and research labs.

While Microsoft, Yahoo!, AMPLab, and others have all contributed outstanding projects in their own right, the landscape was most impacted in November 2015, with the release of what is now the most popular open source machine learning library on GitHub by a wide margin, Google's TensorFlow.

Some Background

In the early days after its release, I wrote of my initial dissatisfaction with the project, based primarily on the lack of distributed training capabilities (especially given that such capabilities were directly alluded to in the accompanying whitepaper's title).

Alces Flight.

Deep Learning Episode 3: Supercomputer vs Pong. I’ve always enjoyed playing games, but the buzz from writing programs that play games has repeatedly claimed months of my conscious thought at a time.

I’m not sure that writing programs that write programs that play games is the perfect solution, but I do know that I can never resist it. Back in 2015 DeepMind published their landmark Nature article on training deep neural networks to learn to play Atari games using raw pixels and a score function as inputs. This was pretty impressive at the time, and they spent several days of training on multiple GPUs to achieve it. This spring my attention was caught by another interesting result: using a new approach they were able to beat Atari Pong in just 2 hours of training time on a single node by parallelizing the work into multiple concurrent agents.

The 10 Algorithms Machine Learning Engineers Need to Know. By James Le, New Story Charity.

There is no doubt that the sub-field of machine learning / artificial intelligence has gained popularity in the past couple of years. As Big Data is the hottest trend in the tech industry at the moment, machine learning is an incredibly powerful way to make predictions or calculated suggestions based on large amounts of data. Some of the most common examples of machine learning are Netflix’s algorithms, which suggest movies based on movies you have watched in the past, and Amazon’s algorithms, which recommend books based on books you have bought before. So if you want to learn more about machine learning, how do you start? For me, my first introduction was an Artificial Intelligence class I took while studying abroad in Copenhagen.

Cray's Urika-GX aims at big data analytics. Business analytics are a core feature of most business systems today and to get the most from them companies are allotting them more and more compute power.

Cray's new Urika-GX, the latest addition to its top platform line, provides an open, enterprise framework aimed specifically at the analytics market. The new machines are already being used by customers across the life sciences, healthcare, and cybersecurity industries, the company said. For example, the Broad Institute of MIT and Harvard, a research institute, is using the Cray Urika-GX system to analyze high-throughput genome sequencing data. According to Dominik Ulmer, Cray's VP of business operations EMEA, this is not exactly new ground for Cray. The company has been in the analytics business for the last four years.