
Big Data


Big Data visualization

A few stats, rumors and stories on Hadoop’s rapid growth — Data | GigaOM. The Hadoop Distributed File System. The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. We describe the architecture of HDFS and report on experience using HDFS to manage 40 petabytes of enterprise data at Yahoo!

8.1. Introduction. Hadoop provides a distributed filesystem and a framework for the analysis and transformation of very large data sets using the MapReduce [DG04] paradigm. An important characteristic of Hadoop is the partitioning of data and computation across many (thousands of) hosts, and the execution of application computations in parallel close to their data.
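The partition-then-combine idea described above can be sketched in a few lines. This is a minimal single-process illustration in plain Python, not Hadoop's actual Java API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for this example, and each inner list merely stands in for the data held on one host.

```python
from collections import defaultdict
from itertools import chain


def map_phase(partition):
    """Emit (word, 1) pairs for one partition of the input (one 'host')."""
    return [(word.lower(), 1) for line in partition for word in line.split()]


def shuffle(mapped_pairs):
    """Group all emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(partitions):
    # Each partition is mapped independently; on a real cluster these calls
    # would run in parallel on the hosts that store the data.
    mapped = chain.from_iterable(map_phase(p) for p in partitions)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical input: two partitions standing in for two hosts.
    partitions = [
        ["big data big clusters"],
        ["data close to computation", "big data"],
    ]
    print(word_count(partitions))
```

Because the map step touches only its own partition, adding hosts adds capacity without changing the program, which is the property the excerpt attributes to Hadoop.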

HDFS stores filesystem metadata and application data separately.

Real-time Big Data

What is FlexPod? Infrastructure made simple and easy with a "private cloud" / NetApp company blog. A month ago, Cisco, NetApp, and Microsoft announced the development of joint cloud platforms under the Hyper-V Cloud FastTrack initiative. The initiative aims to bring together hardware vendors and Microsoft's cloud platform so that such solutions can be built as quickly and easily as possible. Let us now take a closer look at the components of this solution as implemented by NetApp and Cisco.

The solution is built on a flexible Unified Architecture. In our particular case the layout is as follows: 1) Cisco UCS servers: Cisco is a recent entrant to the server market, but it is actively promoting its solutions in this category. We have now listed all the main hardware components of the solution, so let's see what actually turns this pile of metal into a cloud platform: 1) Microsoft Hyper-V: the hypervisor, the core component for virtualization, included in the Microsoft Windows Server 2008 R2 SP1 operating system.

Big Data To Drive $232 Billion In IT Spending Through 2016. Big data will drive $232 billion in spending through 2016. It will directly or indirectly drive $96 billion of worldwide IT spending in 2012, and is forecast to drive $120 billion of IT spending in 2013. Gartner Research published the results today. They draw several conclusions from their research: Big data is not a distinct market. That’s part of the story but the dynamics of memory, storage, and CPU capability provide context for what is happening in the market: Memory doubled. High-speed and high-capacity networking technology pricing has decreased considerably. Storage technology is moving from spinning disk to solid-state disk and flash. Enhanced CPU performance. Storage management tops the list of sub-markets influenced by big data spending: Big data technologies abound but customers need to consider how technologies will adapt over time.

Does big data really need custom hardware? — Data | GigaOM. The Hadoop Wars: Cloudera And Hortonworks’ Death Match For Mindshare. ORIGINALLY PUBLISHED JUNE 2011. Analyst's Note: Since this note was originally published in June 2011, there have been significant developments in the Hadoop market. In particular, the table in this research note is now outdated. Since then, Hortonworks has shifted Rob Bearden to CEO and Eric Baldeschwieler to CTO, and added Herb Cunitz as President and Greg Pavlik as Vice President of Engineering.

Hortonworks also has added over 50 paying customers as of March 2013. Cloudera, meanwhile, has since added to its funding, with total funding of $141 million as of March 2013. Wikibon provides a detailed assessment of the market as of June 2012 in Hadoop: From Innovative Up-Start to Enterprise-Grade Big Data Platform and will likewise soon publish another update on the Hadoop market for Spring/early Summer 2013. Originating Author: Jeff Kelly, with David Vellante and John Furrier. Red Hat has a $10B market cap, however, and competitors don’t want to let Cloudera run away with the Hadoop prize. Marginally Interesting: One does not simply scale into real-time. Monday, October 10, 2011. Real-time seems to be the next big thing in big data.

MapReduce has shown how to perform big analyses on huge data sets in parallel, and the next challenge seems to be to find a similar kind of approach to real-time. When you look around the web, there are two major approaches out there which try to build something that can scale to deal with Twitter-firehose-scale amounts of data. One starts with a MapReduce framework like Hadoop and somehow finagles real-time, or at least streaming, capabilities onto it.

The other approach starts with some event-driven “streaming” computing architecture and makes it scale on a cluster. These are interesting and very cool projects; however, from our own experience with retweet analysis at TWIMPACT, I get the feeling that both approaches fall short of providing a definitive answer. In short: One does not simply scale into real-time. Real-Time Stream Analysis. So what is real-time stream analysis? Databases Approach. Stream Processing. The Secrets of Building Realtime Big Data Systems.

Companies

KB Ramesh - TB2957 - Real-time, big data analytics. What is Hadoop not good for. Top 5 Reasons Not to Use Hadoop for Analytics | Quantivo. As a former diehard fan of Hadoop, I LOVED the fact that you can work on up to Petabytes of data. I loved the ability to scale to thousands of nodes to process a large computation job. I loved the ability to store and load data in a very flexible format. In many ways, I loved Hadoop, until I tried to deploy it for analytics.

That’s when I became disillusioned with Hadoop (it just "ain't all that"). At Quantivo, we’ve explored many ways to deploy Hadoop to answer analytical queries (trust me – I made every attempt to include it in my day job). Let me share with you my top reasons why Hadoop should not be used for Analytics. 1 - Hadoop is a framework, not a solution – For many reasons, people have an expectation that Hadoop answers Big Data analytics questions right out of the box. 2 - Hive and Pig are good, but do not overcome architectural limitations – Both Hive and Pig are very well thought-out tools that enable the lay engineer to quickly become productive with Hadoop. Hadoop, Yahoo, 'Big Data' Brighten BI Future - Data Storage. An increasing number of jumbo-size enterprise data sets (and all the technology needed to create, store, network, analyze, archive and retrieve them) are considered "big data." This massive amount of information is pushing the limits on storage, servers and security, creating an immense problem for IT departments that must be addressed.

So what's the tipping point? When does average-size data become big data? eWEEK's crack at this definition, with help from research firm Gartner, goes like this: "Big data refers to the volume, variety and velocity of structured and unstructured data pouring through networks into processors and storage devices, along with the conversion of such data into business advice for enterprises."

These elements can be broken down into three distinct categories: volume, variety and velocity. Big Data: Tools, Processes and Procedures. Why the days are numbered for Hadoop as we know it — Cloud Computing News. As Big Data Takes Off, the Hadoop Wars Begin — Cloud Computing News. The Vendor Landscape of BI and Analytics. “In God we trust, all others bring data.” The “Raw Data -> Aggregated Data -> Intelligence -> Insights -> Decisions” is a differentiating causal chain in business today.

To service this “data->decision” chain a very large industry is emerging. The Business Intelligence, Performance Management and Data Analytics is a large confusing software category with multiple sub-categories — mega-vendors (full stack, niche vendors, data discovery, visualization, data appliances, Open Source, Cloud – SaaS, Data Integration, Data Quality, Mobile BI, Services and Custom Analytics). But the interest in BI and analytics is surging. Arnab Gupta, CEO of Opera states why analytics are taking center stage, “We live in a world where computers, not people, are in the driver’s seat. In banking, virtually 100% of the credit decisions are made by machines.

Here is a list of vendors who participate in this marketspace: Big Data Startup and Existing Companies to Watch. Big Data: New Theory and Practice - No. 10, 2011 | Open Systems (Открытые системы). For more than three years now, much has been said and written about Big Data in combination with the word "problem," which adds to the mystique of the topic. Over that time the "problem" has become the focus of attention for the overwhelming majority of major vendors, numerous startups are being founded in the hope of discovering its solution, and all the leading industry analysts are trumpeting how important the ability to work with large volumes of data now is for staying competitive.

This kind of mass enthusiasm, not particularly well argued, provokes dissent, and one can find plenty of skeptical remarks on the same subject; sometimes Big Data is even labeled a red herring (a false trail, a diversion). So what is Big Data? Background. The fact that the overwhelming majority of mentions of Big Data are connected with business in one way or another can be misleading. Big Data and business. Why did Big Data turn out to be a problem? Scaling and tiered storage. Big Data analytics. Big Data Market Size And Vendor Revenues. By Jeff Kelly with David Vellante and David Floyer. This is the 2011 report, originally published on February 15, 2012. See Big Data Vendor Revenue and Market Forecast 2012-2017 for the 2012 update. The Big Data market is on the verge of a rapid growth spurt that will see it top the $50 billion mark worldwide within the next five years.

As of early 2012, the Big Data market stands at just over $5 billion based on related software, hardware, and services revenue. Increased interest in and awareness of the power of Big Data and related analytic capabilities to gain competitive advantage and to improve operational efficiencies, coupled with developments in the technologies and services that make Big Data a practical reality, will result in a super-charged CAGR of 58% between now and 2016.
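As a rough consistency check on these figures, the compound growth can be worked out directly. The sketch below assumes a base of exactly $5 billion (rounded down from "just over $5 billion") and five full years of compounding at 58%; both are assumptions made for this example, and the helper names `project` and `cagr` are hypothetical.

```python
def project(initial, rate, years):
    """Compound `initial` at annual growth `rate` for `years` years."""
    return initial * (1 + rate) ** years


def cagr(start, end, years):
    """Compound annual growth rate implied by growing from `start` to `end`."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    base = 5.0   # $ billions; assumed, rounded from "just over $5 billion"
    rate = 0.58  # the 58% CAGR quoted in the report
    for year in range(6):
        print(f"year +{year}: ${project(base, rate, year):.1f}B")
    # Recover the growth rate from the two endpoints as a sanity check.
    print(f"implied CAGR: {cagr(base, project(base, rate, 5), 5):.2%}")
```

Under these assumptions the five-year figure comes out at roughly $49 billion, so a starting point slightly above $5 billion is enough to reach the $50 billion mark the report cites.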

As explained in our Big Data Manifesto, Big Data is the new definitive source of competitive advantage across all industries. Below is Wikibon’s five-year forecast for the Big Data market as a whole: Big Data Manifesto | Hadoop, Business Analytics and Beyond. A Big Data Manifesto from the Wikibon Community Providing effective business analytics tools and technologies to the enterprise is a top priority of CIOs and for good reason. Effective business analytics – from basic reporting to advanced data mining and predictive analytics — allows data analysts and business users alike to extract insights from corporate data that, when translated into action, deliver higher levels of efficiency and profitability to the enterprise. Underlying every business analytics practice is data. Traditionally, this meant structured data created and stored by enterprises themselves, such as customer data housed in CRM applications, operational data stored in ERP systems or financial data tallied in accounting databases.

Traditional data management and business analytics tools and technologies are straining under the added weight of Big Data and new approaches are emerging to help enterprises gain actionable insights from Big Data. The Changing Nature of Big Data. Ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety. Big Data Sees Venture Capitalists Invest Over $1 Billion. Q2 2012 sees financing and deals up 150% and 304%, respectively.

Is big data another buzzword or the next big thing? Big data, whether you believe it to be this year’s VC buzzword (remember hyperlocal?) or truly the next big thing in IT, has clearly captured the hearts (and wallets) of VCs, with big data financing eclipsing the $1 billion mark since Q2 2011. In fact, $1.15 billion has been invested across 90 transactions over that period.