
BigQuery

Decomposing Twitter (Database Perspective)

Twitter is one of the latest and hottest Web 2.0 trends, used by millions of people around the world every day. How does it manage the enormous amount of data its users produce? This article reviews the technical aspects of the Twitter service, how it handles such a tremendous volume of tweets and other data, and what we can learn from the findings.

Twitter: As we all know, Twitter, which launched its service in 2006, is expanding at an amazing pace. According to the official About Twitter page, as of September 2010 some 175 million users use the service, generating about 95 million tweets a day. The scale of the service is impressive, but even more impressive is its growth rate.

Understanding Twitter: For those of us who are satisfied with sitting in the passenger seat, here is a brief outline of the services Twitter provides. Twitter is a microblogging service. Here are the two core services of Twitter: ... Real-Time Data in Twitter: Tweets ... However, there is a problem with this method.

Apache Drill

Speed is Key: Leveraging an efficient columnar storage format, an optimistic execution engine and a cache-conscious memory layout, Apache Drill is blazing fast. Coordination, query planning, optimization, scheduling, and execution are all distributed across the nodes in a system to maximize parallelization.

Liberate Nested Data: Perform interactive analysis on all of your data, including nested and schema-less data.

Flexibility: Strongly defined tiers and APIs allow straightforward integration with a wide array of technologies.

Disclaimer: Apache Drill is an effort undergoing incubation at The Apache Software Foundation, sponsored by the Apache Incubator PMC.
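To illustrate why a columnar storage format helps analytical queries, here is a minimal, generic sketch in Python. It is not Drill's actual storage format; the records and field names are hypothetical. It stores the same data row by row and column by column, then sums one field: the row layout visits every record, while the column layout scans only the one packed array it needs.

```python
from array import array

# Hypothetical sample records, stored row by row: (name, age, total).
rows = [
    ("alice", 34, 120.5),
    ("bob", 27, 80.0),
    ("carol", 41, 200.25),
]


def to_columns(records):
    """Transpose row-oriented records into one contiguous sequence per column."""
    names, ages, totals = zip(*records)
    return {
        "name": list(names),
        "age": array("q", ages),      # packed signed 64-bit integers
        "total": array("d", totals),  # packed C doubles
    }


columns = to_columns(rows)

# Row layout: every record is visited to read a single field.
row_sum = sum(record[2] for record in rows)

# Column layout: only the "total" array is scanned; "name" and "age" are never touched.
col_sum = sum(columns["total"])

print(row_sum, col_sum)  # 400.75 400.75
```

The same idea scales up: when a query touches a few columns out of many, reading only those columns cuts I/O, and values of one type packed contiguously are friendlier to CPU caches.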

How to Fix Location-Based People Discovery

Philip Cortes is co-founder of the people discovery startup Meeteor. Follow him on Twitter @philipcortes.

No clear winner came out of South by Southwest's battle of people discovery apps. Highlight seems to have received the best press, and according to Robert Scoble, about 5% of SXSW attendees used the service. Despite this buzz, the consensus was that all of these services fell short of expectations. Why did these apps fail?

1) Lack of Single-Player Mode.
2) Not Capturing Intent. The social overlap between users can act as the lubricant that facilitates a meeting, but on its own it won't compel two strangers to meet.
3) Transparent Privacy Settings.
4) Pick a Niche.
5) Mimic Offline Behavior.

An app or web service doesn't have to solve all five of these perfectly to win the race.

Drill

Drill Overview: Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Drill is an open-source system modeled on Google's Dremel, which Google offers as a hosted service called Google BigQuery. One explicitly stated design goal is for Drill to scale to 10,000 or more servers and to process petabytes of data and trillions of records in seconds. Currently, Drill is incubating at Apache.

High Level Concept: There is a strong need in the market for low-latency interactive analysis of large-scale datasets, including nested data (e.g., JSON, Avro, Protocol Buffers). In recent years, open-source systems have emerged to address the need for scalable batch processing (Apache Hadoop) and stream processing (Storm, Apache S4). It is worth noting that, as Google explained in the original paper, Dremel complements MapReduce-based computing. The Apache Drill team uses Chronon for testing.
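A minimal sketch of one way to make nested, schema-less JSON queryable as columns: flatten each record into dotted column paths, then project a column across records. This is a generic path-flattening technique chosen for illustration, not Dremel's or Drill's columnar striping (which uses repetition and definition levels to preserve nesting). The records are hypothetical.

```python
import json


def flatten(value, path=""):
    """Flatten nested dicts/lists into {column_path: leaf_value}.

    Dict keys are joined with ".", list positions become "[i]".
    Empty dicts and lists produce no columns.
    """
    if isinstance(value, dict):
        columns = {}
        for key, child in value.items():
            columns.update(flatten(child, f"{path}.{key}" if path else key))
        return columns
    if isinstance(value, list):
        columns = {}
        for i, child in enumerate(value):
            columns.update(flatten(child, f"{path}[{i}]"))
        return columns
    return {path: value}


# Hypothetical schema-less records: the second has fields the first lacks.
records = [
    json.loads('{"user": {"id": 7, "name": "ann"}, "tags": ["db", "sql"]}'),
    json.loads('{"user": {"id": 9}, "geo": {"lat": 40.0, "lon": -105.3}}'),
]

flat = [flatten(r) for r in records]

# Project one nested column across all records; missing paths become None.
print([row.get("user.name") for row in flat])  # ['ann', None]
print(flat[0])  # {'user.id': 7, 'user.name': 'ann', 'tags[0]': 'db', 'tags[1]': 'sql'}
```

Because no schema is declared up front, a record that lacks a field simply has no value for that path, which is the behavior a schema-less query engine needs to tolerate.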

State Street's Chief Scientist on How to Tame Big Data Using Semantics

Semantic databases are the next frontier in managing big data, says State Street's David Saul. Financial institutions are accumulating data at a rapid pace. Between massive amounts of internal information and an ever-growing pool of unstructured data, banks' data management and storage capabilities are being stretched thin. But relief may come in the form of semantic databases, which could be the next evolution in how banks manage big data, says David Saul, Chief Scientist for Boston-based State Street Corp. The semantic data model associates a meaning with each piece of data to allow for better evaluation and analysis, Saul notes, adding that given their ability to analyze relationships, semantic databases are particularly well suited to the financial services industry. "Our most important asset is the data we own and the data we act as a custodian for," he says.
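One common way a semantic data model attaches meaning to each piece of data is to store facts as subject–predicate–object triples, so that every value is labeled by the relationship it expresses. The sketch below is a minimal in-memory illustration of that idea, not State Street's system; the identifiers (`acct:1001`, `heldBy`, and so on) are hypothetical. A production system would typically use an RDF store queried with SPARQL.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical facts: each value carries its meaning via an explicit predicate.
facts = {
    Triple("acct:1001", "heldBy", "client:acme"),
    Triple("acct:1001", "currency", "USD"),
    Triple("acct:1002", "heldBy", "client:acme"),
    Triple("acct:1002", "currency", "EUR"),
    Triple("client:acme", "domiciledIn", "country:US"),
}


def match(store, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in store
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]


# Relationship query: which accounts does client:acme hold, and in what currency?
accounts = sorted(t.subject for t in match(facts, p="heldBy", o="client:acme"))
currencies = {a: match(facts, s=a, p="currency")[0].obj for a in accounts}
print(currencies)  # {'acct:1001': 'USD', 'acct:1002': 'EUR'}
```

Because relationships are first-class data rather than implied by table layout, a question like "which accounts belong to this client" becomes a pattern match that can be chained across entities.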

AWS | Amazon Redshift – Cloud Data Warehouse Solution

It's never been easier to get file data into Amazon Redshift using AWS Lambda. You simply push files into a variety of locations on Amazon S3 and have them automatically loaded into your Amazon Redshift clusters. Read more in A Zero-Administration Amazon Redshift Database Loader (April 2015). Amazon Redshift delivers fast query performance by using columnar storage to improve I/O efficiency and by parallelizing queries across multiple nodes. Amazon Redshift has custom JDBC and ODBC drivers that you can download from the Connect Client tab of our Console, allowing you to use a wide range of familiar SQL clients. Amazon Redshift's data warehouse architecture lets you automate most of the common administrative tasks associated with provisioning, configuring, and monitoring a cloud data warehouse. Security is built in. Amazon Redshift uses a variety of innovations to obtain very high query performance on datasets ranging in size from a hundred gigabytes to a petabyte or more.
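The S3-to-Redshift flow described above can be sketched as a Lambda handler that reacts to an S3 event notification and builds a Redshift `COPY` statement for each new object. This is a hypothetical minimal sketch, not the Zero-Administration Loader referenced above. It assumes CSV files, and the table name and IAM role ARN are placeholders. It reads the bucket and key from the standard S3 event notification structure (`Records[].s3.bucket.name`, `Records[].s3.object.key`) and only returns the SQL rather than executing it.

```python
from urllib.parse import unquote_plus

# Hypothetical settings: replace with your own table and IAM role ARN.
# TARGET_TABLE is trusted configuration and is interpolated into SQL as-is.
TARGET_TABLE = "public.events"
IAM_ROLE_ARN = "arn:aws:iam::123456789012:role/RedshiftCopyRole"


def sql_literal(text):
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def copy_statements(event):
    """Build one Redshift COPY statement per object in an S3 event notification."""
    statements = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        uri = f"s3://{bucket}/{key}"
        statements.append(
            f"COPY {TARGET_TABLE} FROM {sql_literal(uri)} "
            f"IAM_ROLE {sql_literal(IAM_ROLE_ARN)} FORMAT AS CSV;"
        )
    return statements


def lambda_handler(event, context):
    # A real loader would run these over a Redshift connection; this sketch only returns them.
    return copy_statements(event)


sample_event = {"Records": [{"s3": {"bucket": {"name": "my-bucket"},
                                    "object": {"key": "incoming/day+1.csv"}}}]}
print(lambda_handler(sample_event, None)[0])
```

Keeping SQL generation separate from execution makes the event parsing easy to test locally; the `+` in the sample key decodes to a space, as S3 URL-encodes keys in notifications.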

5 QUESTIONS with Gnip Inc. CEO Jud Valeski

Boulder-based Gnip Inc. specializes in gathering data from public social networks such as Twitter, Facebook and YouTube and providing real-time content to a variety of firms in industries such as social media monitoring, business intelligence, government and finance. CEO Jud Valeski spoke with the Camera recently about the company's current state, competition and direction. The following has been edited for clarity and space.

1. What are some of the biggest challenges in collecting this data? All of these different providers, and all of these different consumers of the content, use their own protocols and formats to communicate outward and inward. There's also a volume-of-content challenge, which is actually a bigger one.
2. There's this social cocktail, and there are obviously a variety of different sources and a variety of different providers. ...
3. It's both a blessing and a curse. ...
4. We're 35 people now.
5. There's also a "what's old is new again" (shift).

10 Ways to Discover Social Media Content

When brainstorming about what kinds of content to create and share in social media, you need look no further for inspiration than the social media channels themselves. Colleagues who communicate with customers every day can also provide excellent insight. Let's take a closer look at how to uncover the important issues your community is ready and willing to discuss with you.

1. Ask Your Customers Directly: Your customers are your best source of intelligence.
2. Your sales team spends its time talking to customers and prospects.
3. Customer service and technical support reps know the weak points in your products and marketing materials.
4. While asking your customers direct questions is one way to get information from them, following them on Twitter and other social media platforms is a way to find out what's really on their minds.
5. LinkedIn Groups can be a great source of content ideas.
6. ...
7. ...
8. What kinds of questions are people asking online in your industry?
9. ...
10. ...

Study: Social media is 'brain candy' - Vote for the best company in Austin's business competition

Staff, Silicon Valley Business Journal

Researchers at Harvard say that social media is brain candy for those who use it. The study comes as the deadline nears for nominations to participate in the Social Madness competition, which will honor companies doing outstanding work in social networking. A new study from Harvard says the reason Facebook, Twitter and other social media are so popular and addictive is that they pleasantly stimulate the same part of the brain that responds when people eat food, get money or have sex. The researchers found that people simply like to talk about themselves, and social media outlets provide a very effective way to do that. The researchers asked test subjects inside an MRI scanner questions about their own opinions and about other people's opinions. They found the brain was strongly engaged when the subjects talked about themselves, and less engaged when they talked about someone else.

fredmcclimans.com: HP "All in on Big Data." More Acquisitions Ahead?

HP has watched Big Data develop over several years and believes that it will cause major disruption in the storage and systems market, says Manoj Goyal, Senior Director of Data Management Solutions in HP's Enterprise Group. As a result, "HP is all in on Big Data," he said in an interview in The SiliconAngle Cube at HP Discover 2012. HP is working to develop a complete ecosystem around Hadoop that includes hardware, middleware, software, and services to "make Big Data more accessible to customers." And, he said, "While HP does not make its M&A plans public, we are always looking for partners and ways to fill in gaps." HP, he said, is solving the "Big Data consumption problem." Meanwhile, Goyal's group is working with Autonomy and Vertica internally to create a software layer to support Big Data analysis. "Clearly Vertica is the best place to take the Big Data once it is structured," he said. HP also realized early that Big Data goes hand-in-hand with cloud.

Understanding Big Data « opencollaborarchy

The more we know, the less we understand. Nowhere is this more true than on the social network, where volume, velocity, volatility and variability are increasing daily. Those four V's are part of a definition of big data, which includes both structured and unstructured data. In general, there are no data models, no data definitions, no rules and no housekeeping discipline for unstructured data in social media. We do, however, have some rudimentary tools at our disposal, but like early man, our technical bows and arrows are a poor match for the stampeding herd of beasts that is the social network stream. Knowing how to perform the three C's is therefore one of the keys to success. Taking these observations a little further, I believe the following five components are necessary in order to navigate, participate and collaborate in the world of social information: ...
