

Related:  High Performance Big Data Analytics Infrastructure

The year in big data and data science
Big data and data science have both been with us for a while. According to McKinsey & Company’s May 2011 report on big data, back in 2009 “nearly all sectors in the U.S. economy had at least an average of 200 terabytes of stored data … per company with more than 1,000 employees.” And on the data-science front, Amazon’s John Rauser used his presentation at Strata New York to trace the profession of data scientist all the way back to 18th-century German astronomer Tobias Mayer. Of course, novelty and growth are separate things, and in 2011 a number of new technologies and companies were developed to address big data’s issues of storage, transfer, and analysis. With that as a backdrop, below I take a look at three evolving data trends that played an important role over the last year.
The ubiquity of Hadoop: It was a big year for investment in Apache Hadoop-based companies.
More data, more privacy and security concerns
Open data’s inflection point

Graph Analytics
The Graph Analytics Toolkit aims to provide high-performance, distributed tools for graph mining, for use in community detection, social network discovery, etc. See the documentation for more details. The toolkit currently implements the following tools:
Triangle Counting: Two triangle counting programs. Undirected Triangle Counting counts the total number of triangles in a graph, or the number of triangles each vertex is in. Directed Triangle Counting counts, for each vertex, the number of triangles of each type it is in.
PageRank: A classical graph algorithm that assigns each vertex a numerical importance value based on random-walk properties.
KCore Decomposition: Identifies a hierarchical ordering of the vertices in the graph, allowing discovery of the central components of the network.
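
The undirected triangle count and the k-core ordering described above can be illustrated on a small in-memory graph. This is a minimal single-machine sketch in plain Python, not the toolkit's distributed implementation; the function names `build_adjacency`, `triangle_counts` and `core_numbers` are made up for this example and are not part of the toolkit's API.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops and duplicate edges."""
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue
        adj[u].add(v)
        adj[v].add(u)
    return adj


def triangle_counts(edges):
    """Return (total triangles, {vertex: number of triangles containing it})."""
    adj = build_adjacency(edges)
    per_vertex = {v: 0 for v in adj}
    for v, neighbours in adj.items():
        # Each pair of neighbours that is itself connected closes a triangle at v.
        for a, b in combinations(neighbours, 2):
            if b in adj[a]:
                per_vertex[v] += 1
    # Every triangle is counted once at each of its three corners.
    total = sum(per_vertex.values()) // 3
    return total, per_vertex


def core_numbers(edges):
    """Return {vertex: core number} by repeatedly peeling a minimum-degree vertex."""
    adj = build_adjacency(edges)
    degree = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Simple O(n^2) peeling; fine for a sketch, too slow for large graphs.
        v = min(remaining, key=degree.get)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for n in adj[v]:
            if n in remaining:
                degree[n] -= 1
    return core
```

For the edge list `[("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]`, `triangle_counts` finds one triangle (a, b, c), and `core_numbers` places a, b and c in the 2-core while d, which hangs off c by a single edge, stays in the 1-core.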

What is BigQuery? - Google BigQuery
Querying massive datasets can be time-consuming and expensive without the right hardware and infrastructure. Google BigQuery solves this problem by enabling super-fast SQL queries against append-only tables using the processing power of Google's infrastructure. Simply move your data into BigQuery and let us handle the hard work. You can control access to both the project and your data based on your business needs, such as giving others the ability to view or query your data. You can access BigQuery by using a web UI or a command-line tool, or by making calls to the BigQuery REST API using client libraries for languages such as Java, .NET or Python. Get started now with creating an app, running a web query or using the command-line tool, or read on for more information about BigQuery fundamentals and how you can work with the product.
BigQuery fundamentals: There are four main concepts you should understand when using BigQuery: projects, tables, datasets and jobs. Tables contain your data in BigQuery.

Speech, Language & Multimedia | Raytheon BBN Technologies
For nearly four decades, Raytheon BBN Technologies has been a leader in speech and language technologies. Since the early 1970s, Raytheon BBN Technologies has been performing pioneering research in automatic speech recognition. Over the years, Raytheon BBN Technologies has had many firsts, including the first demonstration, in the early 1990s, of real-time, large-vocabulary, speaker-independent continuous speech recognition on commercial, off-the-shelf hardware. Raytheon BBN's Byblos, our primary speech recognition system, is an automatically trainable system that utilizes probabilistic hidden Markov models, and it continues to represent the state of the art in large-vocabulary, speaker-independent speech recognition. The Byblos engine forms the core of our application suite that includes Audio Indexer and Audio Monitoring System. Our natural language processing technologies can locate, identify, and organize information from a variety of sources and in multiple languages.

What is big data?
Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it. The hot IT buzzword of 2012, big data has become viable as cost-effective approaches have emerged to tame the volume, velocity and variability of massive data. The value of big data to an organization falls into two categories: analytical use, and enabling new products. The past decade’s successful web startups are prime examples of big data used as an enabler of new products and services. The emergence of big data into the enterprise brings with it a necessary counterpart: agility.
What does big data look like? As a catch-all term, “big data” can be pretty nebulous, in the same way that the term “cloud” covers diverse technologies.
Volume: This volume presents the most immediate challenge to conventional IT structures.
Variety

Product Overview - Big Data Analytics - Datameer
Integrate, prepare, analyze and visualize any data. Datameer simplifies the big data analytics environment into a single application on top of the powerful Hadoop platform. The only end-to-end big data analytics application for Hadoop designed to make big data simple for everyone, Datameer combines self-service data integration, analytics and visualization functionality that provides the fastest time to insights.
Data integration: Liberate your data. Data is the raw material of insight, and the more data you have, the deeper and broader the possible insights. That means not just traditional transaction data but all types of data, so that you can get a complete view of your customers, better understand business processes and improve business performance. Smart Execution™ combines DAG-based data processing technology with data profiling and system information to optimally schedule and execute analytics tasks across various computation frameworks.
Security for your data, in Datameer and Hadoop
File Types

Platform as a Service | Pivotal Cloud Foundry | Pivotal
What is the Buildpack Architecture in Pivotal Cloud Foundry? Pivotal CF uses a flexible approach called buildpacks to dynamically assemble and configure a complete runtime environment for executing a particular type of application. Since buildpacks are extensible to most modern runtimes and frameworks, applications written in nearly any language can be deployed to Pivotal Cloud Foundry. Developers benefit from an “it just works” experience as the platform applies the appropriate buildpack to detect, download and configure the language, framework, container and libraries for the application. The buildpacks that Pivotal Cloud Foundry provides for Java, Ruby, Node, PHP, Python and golang are part of a broad buildpack provider ecosystem that ensures constant updates and maintenance for virtually any language.
Containerization: Combining the power of virtualization with efficient container scheduling, Pivotal Cloud Foundry delivers a higher server density than traditional environments.
Monitoring
Logging
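
The "detect" step mentioned in the buildpack description can be pictured as the platform checking an application's files against each buildpack in turn and picking the first match. The following is a deliberately simplified sketch of that idea, not Pivotal Cloud Foundry's actual mechanism: the `MARKERS` table and the `detect_buildpack` function are hypothetical, and real buildpacks run their own detection logic rather than a single file-name lookup.

```python
# Hypothetical marker files for illustration only; real buildpacks
# each ship their own detection logic and may inspect much more.
MARKERS = [
    ("ruby", "Gemfile"),
    ("node", "package.json"),
    ("python", "requirements.txt"),
    ("php", "composer.json"),
]


def detect_buildpack(app_files):
    """Return the first buildpack whose marker file is present, else None."""
    names = set(app_files)
    for buildpack, marker in MARKERS:
        if marker in names:
            return buildpack
    return None
```

Under these assumptions, an application containing `requirements.txt` would be matched to the `python` entry, and an application with none of the marker files would get no match, which a platform would report as a failed detection.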

Text Analysis and Text Mining Software: Lexalytics

Les promesses du Big Data (The promises of big data) | ParisTech Review
The deluge of digital data, discussed in our pages by George Day and David Reibstein, does not affect marketing alone. Every production organization is affected, and beyond that, the competitiveness at stake concerns national economies. Those able to use this data will have a head start in gauging opinion and detecting cultural shifts, but also in understanding what is happening inside their own organization, by improving processes and better informing decision-making. They still have to give themselves the means to do so: that is the whole difficulty of “big data”, which is both a promise and a challenge.
The information age: The question first arose in the academic world, when a team led by Peter Lyman and Hal R. [...] Lyman and Varian also noted the already dizzying growth of online exchanges, with the famous Web 2.0 in which everyone is a potential publisher. What should be done with this data?