
Products


Real-time Discovery Engine - YourVersion: Discover Your Version of the Web™. Mavuno: Hadoop-Based Text Mining Toolkit. Data Extraction, Web Screen Scraping Tool, Mozenda Scraper. Parallel Data Warehousing (PDW) Explained | James Serra's Blog.

Microsoft SQL Server Parallel Data Warehouse (PDW), formerly known by its code name “Project Madison”, is an edition of Microsoft’s SQL Server 2008 R2 that was released in December 2010. PDW is Microsoft’s reworking of the massively parallel processing (MPP) product from DatAllegro Inc., which Microsoft acquired in July 2008. It only works with certain hardware, two appliances so far: the HP Enterprise Data Warehouse Appliance and the Dell Parallel Data Warehouse Appliance, with IBM and Bull to follow in the near future. This edition of SQL Server can’t be bought as an independent piece of software; it has to be bought along with the hardware. So what is MPP? In an MPP system, data and query work are spread across many independent nodes that each process their own slice in parallel. MPP is also available from other companies such as EMC Greenplum, Teradata, Oracle Exadata, HP Vertica, and IBM Netezza, but those use proprietary systems, whereas PDW can be used with commodity hardware, providing a much lower cost per terabyte. HP calls PDW by a different name: Enterprise Data Warehouse (EDW).
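
To make the MPP idea concrete, here is a minimal sketch in Python. It is not how PDW itself is implemented. It only imitates the general pattern of hash-distributing rows across independent nodes, aggregating locally on each node, and merging the partial results on a coordinator. The function names (`distribute`, `local_sum`, `mpp_sum`) and the sample `sales` rows are hypothetical, and threads stand in for what would be separate machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import zlib


def distribute(rows, num_nodes):
    """Hash-distribute rows on their key so each node owns a disjoint slice."""
    nodes = [[] for _ in range(num_nodes)]
    for key, amount in rows:
        # crc32 gives a stable hash, unlike Python's randomized str hash
        node_id = zlib.crc32(key.encode("utf-8")) % num_nodes
        nodes[node_id].append((key, amount))
    return nodes


def local_sum(node_rows):
    """Work done independently on one node: aggregate only its own rows."""
    totals = Counter()
    for key, amount in node_rows:
        totals[key] += amount
    return totals


def mpp_sum(rows, num_nodes=4):
    """Coordinator role: fan the query out, then merge the partial results."""
    nodes = distribute(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_sum, nodes))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # adds counts key by key
    return dict(merged)


# Hypothetical sample data: (region, amount)
sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
print(mpp_sum(sales))
```

Because rows with the same key always land on the same node, each node can finish its part of the aggregate without talking to the others, which is the property that lets this style of system scale by adding nodes.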

Greenplum is driving the future of Big Data analytics. Welcome to Apache™ Hadoop™! Welcome to Hadoop™ MapReduce! DataFu for Pig and Hadoop. RainStor Runs Its Database Natively on Hadoop | ServicesANGLE. Hadoop Quickstart: Use Whirr to automate standup of your distributed cluster on Rackspace. We have previously provided a Quickstart guide to standing up Rackspace cloud servers (and have one for Amazon servers as well).

These are very low-cost ways of building reliable, production-ready capabilities for enterprise use (commercial and government). And Bryan Halfpap has provided a Quickstart guide which shows you how to build a Hadoop cluster (leveraging Cloudera’s CDH3). Using Bryan’s guide you can have a Hadoop cluster up and running in under 20 minutes. With this post we would like to provide you with some additional tips that flow from these other posts. What is Whirr? Whirr provides a cloud-neutral way to run services. And the great news is you can use Whirr as a command-line tool for deploying clusters. If you follow the tips below you can use Whirr to quickly stand up distributed clusters. SSH into your Rackspace server from a terminal window: sudo ssh root@50.56.237.236. After logging in, it is always a good idea to make sure you have the latest packages: sudo yum upgrade.
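
The excerpt stops before the Whirr steps themselves, so the following is only a hedged sketch of what a Whirr cluster definition typically looks like, written as a small Python helper that renders a properties file. The property keys (`whirr.cluster-name`, `whirr.instance-templates`, `whirr.provider`, `whirr.identity`, `whirr.credential`) follow the Apache Whirr configuration conventions. The helper name `whirr_properties`, the cluster name, the environment variable names, and the provider id `cloudservers-us` are assumptions that should be checked against the Whirr release in use.

```python
def whirr_properties(cluster_name, provider, identity, credential, workers=1):
    """Render the text of a Whirr properties file for a small Hadoop cluster."""
    # One master node plus `workers` worker nodes, using Whirr's Hadoop role names
    templates = (
        "1 hadoop-namenode+hadoop-jobtracker,"
        f"{workers} hadoop-datanode+hadoop-tasktracker"
    )
    settings = {
        "whirr.cluster-name": cluster_name,
        "whirr.instance-templates": templates,
        "whirr.provider": provider,
        "whirr.identity": identity,
        "whirr.credential": credential,
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


# Assumed values: the provider id and env variable names are placeholders
text = whirr_properties(
    cluster_name="myhadoopcluster",
    provider="cloudservers-us",
    identity="${env:RACKSPACE_USERNAME}",
    credential="${env:RACKSPACE_API_KEY}",
    workers=2,
)
print(text)
# Save the output as hadoop.properties, then launch with:
#   whirr launch-cluster --config hadoop.properties
```

Keeping the credentials as `${env:...}` references rather than literal values means the file can be shared without exposing the account key.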

S4: Distributed Stream Computing Platform. BOOM -- Berkeley Orders of Magnitude -- Declarative Languages And Systems.