Cloudera » Apache Hadoop for the Enterprise

http://www.cloudera.com/content/cloudera/en/home.html

Attributor raises $3.2M to crack down on plagiarism Attributor, a site that helps publishers track down unauthorized copies of their content, has raised another $3.2 million in funding. The San Mateo, Calif. company says it offers a sophisticated way to detect when text or photos have been copied, taking a “fingerprint” of a paragraph or image’s essential features, then scanning 35 billion Web pages to see where the content has been duplicated. Then it helps publishers contact the offending sites and make them link back to the original article, share their advertising revenue, or just take the content down altogether. Attributor also announced some big customers today, namely the Magazine Publishers of America and the United Kingdom’s Periodical Publishers Association. The company already serves large news organizations including Reuters and The Associated Press. The round brings Attributor’s total funding to $25.2 million.

About Kaggle and Crowdsourcing Data Modeling Kaggle is the world's largest community of data scientists. They compete with each other to solve complex data science problems, and the top competitors are invited to work on the most interesting and sensitive business problems from some of the world’s biggest companies through Masters competitions. Kaggle provides cutting-edge data science results to companies of all sizes. We have a proven track record of solving real-world problems across a diverse array of industries including life sciences, financial services, energy, information technology, and retail.

Digimarc is a digital watermarking technology provider whose technology embeds information into many forms of content, including printed material, audio, video, imagery, and certain objects. Digimarc technology provides solutions for media identification and management, counterfeit and piracy deterrence, and digital commerce.[3][4]

History: Digimarc was founded by Geoff Rhoads, an astrophysicist with a background in deep space imaging. Initial inspiration for the company came while he was photographing the planet Jupiter. He felt that his digital images were vulnerable on the internet, even with copyright protection.[3] In 1996, after initial venture funding,[5] Digimarc released its first product: a digital watermarking plug-in bundled with Adobe Photoshop, Corel, and Micrografx.[6] After a second round of venture funding and increased investments in research and technology, Digimarc signed a multi-year contract with a consortium of central banks.

Hadoop – The Power of the Elephant — eBay Tech Blog In a previous post, Junling discussed data mining and our need to process petabytes of data to gain insights from information. We use several tools and systems to help us with this task; the one I’ll discuss here is Apache Hadoop. Created in 2006 by Doug Cutting, who named it after his son’s stuffed yellow elephant, and based on Google’s 2004 MapReduce paper, Hadoop is an open source framework for fault-tolerant, scalable, distributed computing on commodity hardware. MapReduce is a flexible programming model for processing large data sets: Map takes key/value pairs as input and generates an intermediate output of another type of key/value pairs, while Reduce takes the keys produced in the Map step along with a list of values associated with the same key to produce the final output of key/value pairs.

Map (key1, value1) -> list (key2, value2)
Reduce (key2, list (value2)) -> list (key3, value3)
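The Map and Reduce signatures above can be illustrated with a small in-memory word count. This is only a sketch of the programming model in plain Python, not Hadoop's API and not code from the eBay post: the names map_words, reduce_counts, and run_mapreduce are hypothetical, and the grouping step stands in for the shuffle that a real cluster would perform across machines.

```python
from collections import defaultdict


def map_words(key, value):
    """Map (key1, value1) -> list (key2, value2): emit (word, 1) per word."""
    for word in value.lower().split():
        yield word, 1


def reduce_counts(key, values):
    """Reduce (key2, list (value2)) -> list (key3, value3): sum the counts."""
    yield key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Hypothetical single-process driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for key1, value1 in records:
        for key2, value2 in mapper(key1, value1):
            groups[key2].append(value2)  # stand-in for the shuffle phase
    output = []
    for key2 in sorted(groups):  # sorted only to make the output deterministic
        output.extend(reducer(key2, groups[key2]))
    return output


# Input records are (document id, text) pairs; both are made-up sample data.
documents = [("doc1", "the elephant"), ("doc2", "the yellow elephant")]
word_counts = run_mapreduce(documents, map_words, reduce_counts)
print(word_counts)
```

Because the mapper and reducer only see key/value pairs, the same two functions could in principle be run on many machines at once, which is the property the framework relies on for scaling.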

The DataSift Platform Social data is noisy. Whether you’re trying to analyze social trends within an industry, or mentions of your products or brands, you need a platform that can filter out the noise and allow you to focus on the data that’s most relevant to you. This is especially important when you are paying for the social data you receive. At the heart of the DataSift platform is a high-performance filtering engine with which you can find the exact content and conversations that are relevant to your business. Go beyond keywords and filter on more than 300 unique fields including author, location, language, and demographics.

New study suggests e-book piracy is on the rise Last January a company called Attributor conducted its first e-book piracy study. And back in May, I mentioned that study in a piece called "Is iPad supercharging e-book piracy?" Well, Attributor has conducted a second study more recently and come up with some interesting data. The company says its key findings are:

Big Data Jobs at Jive Software Jive is on a singular mission to transform the way we work. We’re the first company to bring the social innovation of the consumer web to the enterprise. And in doing so, we’re making work great. We’re breaking down the barriers separating employees, customers, and partners, making it possible for the first time to engage socially and genuinely around what matters most to them. That’s a Big Data problem. We regularly process multi-terabyte datasets that encompass the public web, enterprise systems of record, social graph and interaction data, and more.

Center'd (Startups) at Duck Duck Go

Buysight Dear Customers, Friends, and Colleagues, Over the past four years, Buysight has developed an innovative suite of targeted display advertising products, building on our partnerships with online retailers, and rooted in our understanding of shopper purchase intent. Our retargeting and customer acquisition solutions have helped hundreds of clients increase their sales and realize a stellar return on their advertising dollar. Today we take the next step toward providing those clients with even more value and opportunities. Buysight has joined forces with Advertising.com as part of the AOL family. Together, we represent the strongest targeted display advertising platform on the Internet.

By far the most established distribution, with the largest number of referenced deployments. Powerful tooling for deployment, management, and monitoring is available. Impala is developed and contributed by Cloudera to offer real-time processing of big data. by sergeykucherov Jul 15
