
Cloudera » Apache Hadoop for the Enterprise


Attributor raises $3.2M to crack down on plagiarism Attributor, a site that helps publishers track down unauthorized copies of their content, has raised another $3.2 million in funding. The San Mateo, Calif., company says it offers a sophisticated way to detect when text or photos have been copied, taking a “fingerprint” of a paragraph or image’s essential features, then scanning 35 billion Web pages to see where the content has been duplicated. It then helps publishers contact the offending sites and make them link back to the original article, share their advertising revenue, or simply take the content down altogether. Attributor also announced some big customers today, namely the Magazine Publishers of America and the United Kingdom’s Periodical Publishers Association. The company already serves large news organizations including Reuters and The Associated Press. The round brings Attributor’s total funding to $25.2 million.
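Attributor's actual fingerprinting method is proprietary; a common technique in this space hashes overlapping word shingles and keeps only the smallest hashes, so near-duplicate passages produce overlapping sketches. A rough illustration (the shingle size, sketch size, and function names here are invented, not Attributor's algorithm):

```python
import hashlib

def fingerprint(text, shingle_size=5, keep=4):
    # Break the text into overlapping word shingles, hash each one,
    # and keep the smallest hashes as a compact "fingerprint" sketch.
    words = text.lower().split()
    shingles = [" ".join(words[i:i + shingle_size])
                for i in range(max(1, len(words) - shingle_size + 1))]
    hashes = sorted(int(hashlib.sha1(s.encode()).hexdigest(), 16) for s in shingles)
    return set(hashes[:keep])

def similarity(fp_a, fp_b):
    # Jaccard overlap of the two sketches approximates content overlap.
    return len(fp_a & fp_b) / len(fp_a | fp_b)

original = fingerprint("the quick brown fox jumps over the lazy dog")
copy = fingerprint("the quick brown fox jumps over the lazy dog")
unrelated = fingerprint("completely different sentence about nothing at all")
```

Because the sketch keeps only a few minimum hashes, comparing two documents costs a small set intersection rather than a full text comparison, which is what makes scanning billions of pages tractable.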

About Kaggle and Crowdsourcing Data Modeling Kaggle is the world's largest community of data scientists. Its members compete with each other to solve complex data science problems, and the top competitors are invited to work on the most interesting and sensitive business problems from some of the world’s biggest companies through Masters competitions. Kaggle provides cutting-edge data science results to companies of all sizes. We have a proven track record of solving real-world problems across a diverse array of industries, including life sciences, financial services, energy, information technology, and retail.

Sandbox Sandbox is a personal, portable Hadoop environment that comes with a dozen interactive Hadoop tutorials. Sandbox includes many of the most exciting developments from the latest HDP distribution, packaged in a virtual environment that you can get up and running in 15 minutes! Learn Hadoop: Sandbox comes with a dozen hands-on tutorials that guide you through the basics of Hadoop, built on the experience gained from training thousands of people in Hortonworks University Training classes. Build a Proof of Concept: The Sandbox includes the Hortonworks Data Platform in an easy-to-use form. You can add your own datasets and connect it to your existing tools and applications.

Digimarc is a digital watermarking technology provider enabling embedding of information into many forms of content, including printed material, audio, video, imagery, and certain objects. Digimarc technology provides solutions for media identification and management, counterfeit and piracy deterrence, and digital commerce.[3][4] History: Digimarc was founded by Geoff Rhoads, an astrophysicist with a background in deep space imaging. Initial inspiration for the company came while photographing images of the planet Jupiter. He felt that his digital images were vulnerable on the internet, even with copyright protection.[3] In 1996, after initial venture funding,[5] Digimarc released its first product: a digital watermarking plug-in bundled with Adobe Photoshop, Corel, and Micrografx.[6] After a second round of venture funding and increased investments in research and technology, Digimarc signed a multi-year contract with a consortium of central banks.

Hadoop – The Power of the Elephant — eBay Tech Blog In a previous post, Junling discussed data mining and our need to process petabytes of data to gain insights from information. We use several tools and systems to help us with this task; the one I’ll discuss here is Apache Hadoop. Created in 2006 by Doug Cutting, who named it after his son’s stuffed yellow elephant, and based on Google’s 2004 MapReduce paper, Hadoop is an open source framework for fault-tolerant, scalable, distributed computing on commodity hardware. MapReduce is a flexible programming model for processing large data sets: Map takes key/value pairs as input and generates intermediate output of another type of key/value pairs, while Reduce takes each key produced in the Map step, along with the list of values associated with that key, to produce the final output of key/value pairs.

Map(key1, value1) -> list(key2, value2)
Reduce(key2, list(value2)) -> list(key3, value3)
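The two signatures above can be sketched in plain Python with the classic word-count example; the in-memory grouping step stands in for Hadoop's shuffle phase (a toy single-machine illustration of the model, not how Hadoop itself executes jobs):

```python
from collections import defaultdict

def map_phase(records):
    # Map: (key1, value1) -> list of (key2, value2)
    # Here key1 is a line offset, value1 a line of text;
    # the output pairs are (word, 1).
    for _, line in records:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    # Shuffle: group all values emitted under the same key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Reduce: (key2, list(value2)) -> list of (key3, value3)
    return {key: sum(values) for key, values in grouped.items()}

records = [(0, "to be or not to be")]
counts = reduce_phase(map_phase(records))
# counts["to"] == 2, counts["be"] == 2, counts["or"] == 1
```

In real Hadoop, the map and reduce functions run in parallel across many machines and the framework handles the shuffle, sorting, and fault tolerance between the two phases.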

The DataSift Platform Social data is noisy. Whether you’re trying to analyze social trends within an industry or mentions of your products or brands, you need a platform that can filter out the noise and allow you to focus on the data that’s most relevant to you. This is especially important when you are paying for the social data you receive. At the heart of the DataSift platform is a high-performance filtering engine with which you can find the exact content and conversations that are relevant to your business. Go beyond keywords and filter on more than 300 unique fields, including author, location, language, and demographics.
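DataSift's real filters are written in its own filtering language, but the idea of matching on structured metadata fields rather than raw keywords can be illustrated with a small predicate builder (the field names and sample posts below are invented for illustration):

```python
# Sample posts with structured metadata fields, not just text.
posts = [
    {"author": "alice", "language": "en", "location": "US",
     "text": "Loving the new phone"},
    {"author": "bob", "language": "fr", "location": "FR",
     "text": "Bonjour tout le monde"},
]

def make_filter(**criteria):
    # Build a predicate that matches a post only when every
    # requested metadata field has the wanted value.
    def predicate(post):
        return all(post.get(field) == wanted
                   for field, wanted in criteria.items())
    return predicate

# Keep only English-language posts from US authors.
english_us = list(filter(make_filter(language="en", location="US"), posts))
```

Filtering on fields like language and location up front means you only pay for, store, and analyze the slice of the stream that is actually relevant.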

16 Top Big Data Analytics Platforms Teradata delivers a unified big data architecture.
Analytical DBMS: Teradata, Teradata Aster.
In-memory DBMS: Although not an in-memory DBMS, Teradata Intelligent Memory monitors queries and automatically moves the most-requested data to the fastest storage tiers available, with options including RAM, flash, SSD, and various speeds of conventional spinning discs.
Stream-analysis option: None.
Hadoop distribution: Resells and supports the Hortonworks Data Platform.
Hardware/software systems: Teradata and Teradata Aster are integrated software/hardware systems. Hadoop is supported with two Teradata appliance offerings as well as standardized Dell configurations.
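The promote-hot-data idea behind query-driven tiering can be illustrated with a toy two-tier store that moves the most-requested keys into a small fast tier on access (the capacity, eviction policy, and class names here are invented and are not Teradata Intelligent Memory's actual mechanics):

```python
from collections import Counter

class TieredStore:
    """Toy two-tier store: frequently read keys get promoted
    from a slow tier (e.g. disk) into a small fast tier (e.g. RAM)."""

    def __init__(self, fast_capacity=2):
        self.fast = {}              # fast tier, limited capacity
        self.slow = {}              # slow tier, holds everything
        self.hits = Counter()       # per-key access counts
        self.fast_capacity = fast_capacity

    def put(self, key, value):
        self.slow[key] = value

    def get(self, key):
        self.hits[key] += 1
        if key in self.fast:
            return self.fast[key]
        value = self.slow[key]
        self._maybe_promote(key, value)
        return value

    def _maybe_promote(self, key, value):
        # Promote only keys that are currently among the hottest.
        hottest = {k for k, _ in self.hits.most_common(self.fast_capacity)}
        if key in hottest:
            if len(self.fast) >= self.fast_capacity:
                # Evict the least-requested resident of the fast tier.
                coldest = min(self.fast, key=lambda k: self.hits[k])
                del self.fast[coldest]
            self.fast[key] = value
```

Tracking access counts and promoting on read keeps the fast tier focused on the working set without any manual placement decisions.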

New study suggests e-book piracy is on the rise Last January a company called Attributor conducted its first e-book piracy study. And back in May, I mentioned that study in a piece called "Is iPad supercharging e-book piracy?" Well, Attributor has conducted a second study more recently and come up with some interesting data, summarizing several key findings in the report.

Big Data Jobs at Jive Software Jive is on a singular mission to transform the way we work. We’re the first company to bring the social innovation of the consumer web to the enterprise. And in doing so, we’re making work great. We’re breaking down the barriers separating employees, customers, and partners, making it possible for the first time to engage socially and genuinely around what matters most to them. That’s a Big Data problem. We regularly process multi-terabyte datasets that encompass the public web, enterprise systems of record, social graph and interaction data, and more.

The most established distribution by far, with the largest number of referenced deployments. Powerful tooling for deployment, management, and monitoring is available. Impala is developed and contributed by Cloudera to offer real-time processing of big data. by sergeykucherov Jul 15
