High Performance Big Data Analytics Infrastructure

Handling five billion sessions a day – in real time

Since we first released Answers seven months ago, we’ve been thrilled by tremendous adoption from the mobile community.

We now see about five billion sessions per day, and growing. Hundreds of millions of devices send millions of events every second to the Answers endpoint. In the time it took you to read this far, the Answers back-end will have received and processed about 10,000,000 analytics events. The challenge for us is to use this information to provide app developers with reliable, real-time and actionable insights into their mobile apps. At a high level, we base our architectural decisions on the principles of decoupled components, asynchronous communication and graceful service degradation in response to catastrophic failures.

Databricks - The next generation of Big Data.
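The Answers principles mentioned above (decoupled components, asynchronous communication, graceful degradation) can be illustrated with a minimal sketch. This is not the Answers implementation: the `EventPipeline` class, its in-memory queue, and the drop-when-full policy are all assumptions chosen for illustration.

```python
import queue
import threading


class EventPipeline:
    """Hypothetical sketch: intake and processing are decoupled by a bounded queue."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self._thread = None
        self.processed = []
        self.dropped = 0

    def submit(self, event):
        """Accept an event without blocking; shed load when the buffer is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            # Degrade gracefully: drop the event rather than stall the sender.
            self.dropped += 1
            return False

    def _worker(self):
        # Runs asynchronously from the senders and drains the queue in order.
        while True:
            event = self._queue.get()
            if event is None:  # sentinel: stop processing
                break
            self.processed.append(event)

    def start(self):
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()

    def stop(self):
        # Blocking put, so the sentinel is delivered even if the queue is full.
        self._queue.put(None)
        self._thread.join()
```

The sender never waits on the consumer: when the buffer is full, `submit` returns `False` and the event is counted as dropped, which keeps intake responsive at the cost of losing some data under overload.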

Big Data Business Insights

Looking for data-driven business insights?

With the integrated Teradata Aster Discovery Platform, organizations attain unmatched competitive advantage by making it faster and easier for a wider group of users to generate powerful, high impact business insights from big data.

Cloud Services - HDInsight (Hadoop)

Scale elastically on demand. HDInsight is a Hadoop distribution powered by the cloud.

This means HDInsight was architected to handle any amount of data, scaling from terabytes to petabytes on demand. You can spin up any number of nodes at any time. We charge only for the compute and storage you actually use.

"It's part of our audit requirements that we keep data for seven years, and some information has to be retained for as long as 30 years." – Don Wood, Beth Israel Deaconess Medical Center

Big Data Analytics - Platfora.

Real-time data analysis with Kubernetes, Redis, and BigQuery on Google Cloud Platform.

What is BigQuery? - Google BigQuery

Querying massive datasets can be time consuming and expensive without the right hardware and infrastructure.

Google BigQuery solves this problem by enabling super-fast SQL queries against append-only tables using the processing power of Google's infrastructure. Simply move your data into BigQuery and let us handle the hard work. You can control access to both the project and your data based on your business needs, such as giving others the ability to view or query your data.
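BigQuery also exposes a REST API, so one way to picture "running a SQL query" is as an HTTP request. The sketch below only builds such a request and does not send it. The helper name `build_query_request`, the project ID and the access token are placeholders, and the endpoint and body fields reflect the `jobs.query` method as commonly documented, so check them against the current API reference before relying on them.

```python
import json
import urllib.request


def build_query_request(project_id, sql, access_token, max_results=100):
    """Build (but do not send) a POST request for a synchronous SQL query."""
    url = (
        "https://bigquery.googleapis.com/bigquery/v2/"
        f"projects/{project_id}/queries"
    )
    body = {
        "query": sql,
        "useLegacySql": False,      # ask for standard SQL
        "maxResults": max_results,  # cap the rows returned in the first page
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",  # placeholder OAuth token
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would issue the call; in practice a client library handles authentication, paging and retries for you.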

You can access BigQuery by using a web UI or a command-line tool, or by making calls to the BigQuery REST API using client libraries for languages such as Java, .NET or Python. There are also a variety of third-party tools that you can use to interact with BigQuery, for example to visualize or load data. Get started now by creating an app, running a web query or using the command-line tool, or read on for more information about BigQuery fundamentals and how you can work with the product.

Pivotal Cloud Foundry

What is the Buildpack Architecture in Pivotal Cloud Foundry?

Pivotal CF uses a flexible approach called buildpacks to dynamically assemble and configure a complete runtime environment for executing a particular type of application. Since buildpacks are extensible to most modern runtimes and frameworks, applications written in nearly any language can be deployed to Pivotal Cloud Foundry. Developers benefit from an “it just works” experience as the platform applies the appropriate buildpack to detect, download and configure the language, framework, container and libraries for the application.
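The "detect" step described above can be pictured as matching an application's files against what each buildpack knows how to run. The following is a simplified sketch only: the marker files in `BUILDPACK_MARKERS` and the function `detect_buildpack` are hypothetical, and real buildpacks apply their own, more involved detection rules.

```python
from typing import Iterable, Optional

# Hypothetical marker files, listed in the order the buildpacks are tried.
# These are illustrative assumptions, not the platform's actual rules.
BUILDPACK_MARKERS = [
    ("java", "pom.xml"),
    ("ruby", "Gemfile"),
    ("node", "package.json"),
    ("php", "composer.json"),
    ("python", "requirements.txt"),
    ("go", "go.mod"),
]


def detect_buildpack(app_files: Iterable[str]) -> Optional[str]:
    """Return the first buildpack whose marker file is present, else None."""
    files = set(app_files)
    for name, marker in BUILDPACK_MARKERS:
        if marker in files:
            return name
    return None
```

For example, `detect_buildpack(["app.py", "requirements.txt"])` selects the `python` entry, while an application with no recognized marker file yields `None`, which is where a platform would report that no buildpack matched.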

The buildpacks that Pivotal Cloud Foundry provides for Java, Ruby, Node, PHP, Python and golang are part of a broad buildpack provider ecosystem that ensures constant updates and maintenance for virtually any language.

Containerization

Combining the power of virtualization with efficient container scheduling, Pivotal Cloud Foundry delivers a higher server density than traditional environments.

AWS Lambda

The code you run on AWS Lambda is called a “Lambda function.”

After you create your Lambda function it is always ready to run as soon as it is triggered, similar to a formula in a spreadsheet. Each function includes your code as well as some associated configuration information, including the function name and resource requirements. Lambda functions are “stateless,” with no affinity to the underlying infrastructure, so that Lambda can rapidly launch as many copies of the function as needed to scale to the rate of incoming events.
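As a rough illustration of a stateless function, here is a minimal Python handler sketch. It assumes the function is triggered by an Amazon S3 notification and reads the bucket name and object key from the event's `Records` list. The returned summary dictionary is an arbitrary choice for this example, not something Lambda requires.

```python
def handler(event, context):
    """Entry point invoked once per event; nothing is kept between calls."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Collect (bucket, key) for each object mentioned in the notification.
        objects.append((s3["bucket"]["name"], s3["object"]["key"]))
    # Everything the function needs arrives in `event`, so any copy of the
    # function can handle any event.
    return {"processed": len(objects), "objects": objects}
```

Because the handler depends only on its arguments, the service is free to run many copies in parallel, which is the scaling property the paragraph above describes.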

After you upload your code to AWS Lambda, you can associate your function with specific AWS resources (e.g. a particular Amazon S3 bucket, Amazon DynamoDB table, or Amazon Kinesis stream). Then, when the resource changes, Lambda will execute your function and manage the compute resources as needed in order to keep up with incoming requests.

Sqrrl Enterprise - Linked Data Analysis for Hadoop

Our flagship product is Sqrrl Enterprise, a unified solution for integrating data to enable secure, real-time search, discovery, and analytics, powered by Apache Accumulo.

Sqrrl Enterprise enables organizations to ingest, secure, connect, and analyze massive amounts of structured, semi-structured, and unstructured data:

Ingest: Streaming or bulk data ingest from any source.
Secure: Encryption and labeling of data with fine-grained access controls.
Connect: Automatically organize data and extract information about the entities and relationships you care about.
Analyze: Web-based dashboarding and visual, contextual navigation of the data and relationships in the system.

Giraph - Welcome To Apache Giraph!

Hama - a general BSP framework on top of Hadoop.