Apache Hive™

HBase – Apache HBase Home

Welcome to Apache Pig!

Impala
Cloudera Impala is the industry’s leading massively parallel processing (MPP) SQL query engine that runs natively in Apache Hadoop. The Apache-licensed, open source Impala project combines modern, scalable parallel database technology with the power of Hadoop, enabling users to directly query data stored in HDFS and Apache HBase without requiring data movement or transformation. Impala is designed from the ground up as part of the Hadoop ecosystem and shares the same flexible file and data formats, metadata, security and resource management frameworks used by MapReduce, Apache Hive, Apache Pig and other components of the Hadoop stack.

Now You Have a Choice
Before Impala, if your relational database was at capacity, you may have had no choice but to expand that system to maintain your expectations of performance. Now you have a choice. Impala delivers: Performance equivalent to leading MPP databases, and 10-100x faster than Apache Hive/Stinger.

Key Features of Impala

Welcome to Apache™ Hadoop®!

Oracle Database File System (DBFS) in Oracle Database 11g Release 2
Oracle has quite a long history with database file systems. The Oracle Internet File System (iFS) was released in the Oracle 8i days. This product was later renamed to Oracle Content Management SDK. DBFS creates a file system interface on top of database tables that store files as SecureFile LOBs. In this article I'll show the steps necessary to mount the DBFS on a Linux server.

Creating a File System
Create a tablespace to hold the file system.

    CONN / AS SYSDBA

    CREATE TABLESPACE dbfs_ts
      DATAFILE '/u01/app/oracle/oradata/DB11G/dbfs01.dbf'
      SIZE 1M AUTOEXTEND ON NEXT 1M;

Create a user, grant DBFS_ROLE to the user and make sure it has a quota on the tablespace.

    CONN / AS SYSDBA

    CREATE USER dbfs_user IDENTIFIED BY dbfs_user
      DEFAULT TABLESPACE dbfs_ts QUOTA UNLIMITED ON dbfs_ts;

    GRANT CREATE SESSION, RESOURCE, CREATE VIEW, DBFS_ROLE TO dbfs_user;

Create the file system in the tablespace by running the "dbfs_create_filesystem.sql" script as the test user.

FUSE Installation

    # ldconfig

Tez

Groovy - Home

Pangool - Hadoop API made easy

Apache ZooKeeper - Home

22 free tools for data visualization and analysis
You may not think you've got much in common with an investigative journalist or an academic medical researcher. But if you're trying to extract useful information from an ever-increasing inflow of data, you'll likely find visualization useful -- whether it's to show patterns or trends with graphics instead of mountains of text, or to try to explain complex issues to a nontechnical audience.

There are many tools around to help turn data into graphics, but they can carry hefty price tags. The cost can make sense for professionals whose primary job is to find meaning in mountains of information, but you might not be able to justify such an expense if you or your users only need a graphics application from time to time, or if your budget for new tools is somewhat limited. If one of the higher-priced options is out of your reach, there are a surprising number of highly robust tools for data visualization and analysis that are available at no charge.

Data cleaning
DataWrangler

Apache Tez – Welcome to Apache Tez

A data warehouse system for Hadoop that offers a SQL-like query language to facilitate easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems.
by sergeykucherov Jul 15
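
To make "data summarization" with a SQL-like language concrete, here is a minimal sketch of the kind of aggregate query the Hive description refers to. It does not talk to Hive: Python's built-in sqlite3 module stands in as the query engine so the example runs anywhere, and the page_views table, its columns, and the sample rows are hypothetical. The query uses only plain GROUP BY / ORDER BY constructs, which HiveQL should also accept, though that is an assumption to verify against your Hive version.

```python
import sqlite3


def summarize_page_views(rows):
    """Return (country, total_views) pairs, largest total first.

    sqlite3 is only a stand-in engine here; the table and column
    names are hypothetical and not taken from any real Hive schema.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE page_views (country TEXT, views INTEGER)")
        conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
        # A typical summarization query: one output row per group.
        query = """
            SELECT country, SUM(views) AS total_views
            FROM page_views
            GROUP BY country
            ORDER BY total_views DESC, country
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


sample_rows = [("US", 120), ("DE", 45), ("US", 30), ("FR", 45)]
for country, total in summarize_page_views(sample_rows):
    print(country, total)
```

In a real deployment the same statement would be submitted through a Hive client against tables stored in HDFS rather than an in-memory database.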