Hadoop

Spark Packages. Which Linux distribution you find the most suitable for hadoop?

Apache Hadoop

HD Insight. Hortonworks. Installation - Linux. Install in Windows. MapR. New Tools for New Times – Primer on Big Data, Hadoop and “In-memory” Data Clouds Business Analytics 3. Understanding the Elements of Big Data More than a Hadoop Distribu... Ofir's random thoughts of data technologies. Tez - Build, Install, Configure and Run Apache Hadoop 2.2.0 in Microsoft Windows OS - SrcCodes.

Natalino Busa: Hadoop 2.0 beyond mapreduce: A distributed generic application OS for data crunching

Central to the original concept of Hadoop is the MapReduce paradigm and architecture. MapReduce is based on two entities: one JobTracker and a series of TaskTrackers (mostly one per data worker). The paradigm is powerful, but it is only one way to accomplish distributed computing. It is batch oriented and targets crunching large files while making the most of data locality. However good MapReduce is for a specific class of distributed computing tasks, it is not a general pattern that applies well to all applications. Rather than forcing the MapReduce paradigm on every application running on the cluster, Hadoop 2.0 and YARN separate the Hadoop application (MapReduce) from the more general problem of resource monitoring and management (YARN).
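As a rough illustration of the map-reduce paradigm described above, here is a minimal single-process sketch in Python. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are made up for this example, and the word-count task is only an assumed stand-in for a real job. On a cluster, the map and reduce steps would run as distributed tasks close to the data.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce one after another over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data big cluster", "data locality"]))
```

The sketch keeps the three phases separate to mirror the batch-oriented structure of the paradigm: each phase only needs the output of the previous one.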

In YARN, when you submit a job you push it to a scheduler, which allocates resources according to a resource-constraint scheduling algorithm of your choice. When the client wants to execute a new job. Conclusion.

Blogs

We’ve made a nice fix to the Templeton job submission service that runs on the HDInsight clusters for remote job submission. We’ve talked with a number of customers who want to be able to access the logs for their jobs remotely as well. This typically requires direct access to the cluster. We’ve updated Templeton to support dropping the job logs directly into ASV as part of the status directory. The way to do this is to pass “enablelogs” as a query string parameter set to true. Here’s what the request looks like: Upon job completion, the logs will be moved into the status directory, under a logs folder with the following structure: Here’s a screenshot from Storage Studio that shows the folder structure: If you look in the syslog file here, you’ll see a bunch of goodness about your job execution. We will be updating the SDK to support this parameter in the next release, but for now, you can submit jobs directly to the cluster and add this parameter to get job logs.

Samples topic title TBD - Windows Azure. 18 essential Hadoop tools for crunching big data. Hadoop for .NET Developers: Setting Up a Desktop Development Environment - Data Otaku.
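The “enablelogs” query string parameter from the Templeton post above could be added to a job submission URL along these lines. This is only a sketch: the helper name `with_job_logs`, the host name, the URL path and the `user.name` value in the usage example are assumptions for illustration. Only the parameter name and its value of true come from the post.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_job_logs(submit_url):
    """Return submit_url with the query parameter enablelogs=true added.

    Existing query parameters are kept. Any earlier enablelogs value is
    replaced so the parameter appears exactly once.
    """
    parts = urlsplit(submit_url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "enablelogs"
    ]
    params.append(("enablelogs", "true"))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # Hypothetical cluster address and user name, for illustration only.
    url = "https://cluster.example.invalid/templeton/v1/mapreduce/jar?user.name=alice"
    print(with_job_logs(url))
```

The function only builds the URL and sends nothing, so it can be combined with whatever HTTP client is used to submit the job.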

NOTE This post is one in a series on Hadoop for .NET Developers.

Hadoop for .NET Developers: Setting Up a Desktop Development Environment - Data Otaku

If you are a .NET developer, you will want to set up a desktop development environment with the following components: Having these components installed on your desktop will allow you to develop against Hadoop locally as well as against a remote cluster (whether on-premises or in the cloud). You might be able to get away with not installing Hadoop locally, but most of the .NET-oriented documentation I’ve found assumes this is your setup. I will assume you are comfortable installing Visual Studio on your own, and the NuGet site provides simple enough installation options. I do recommend installing Visual Studio and NuGet first and then making sure your system is up to date with patches before proceeding with the Hadoop installation. The Hadoop installation is very straightforward.

s Tech Journal: Top 500 MSDN Links from Stack Overflow Posts

In my previous post I explained how to run C# Map/Reduce jobs in Hadoop on Azure to find the top namespaces in Stack Overflow posts. After that, I ran another Map/Reduce job on the Stack Overflow data dump, and here is the list of the top 500 MSDN URLs we all referred to in our Stack Overflow posts. This was done on only part of the post data from the Stack Overflow data dump. I thought I would share it because it looked very interesting.

S Tech Journal: Analyzing some ‘Big’ Data Using C#, Azure And Apache Hadoop – A Stack Overflow .NET Namespace Popularity Finder. Time to do something meaningful with C#, Azure and Apache Hadoop. In this post, we’ll explore how to create a Mapper and Reducer in C# to analyze the popularity of namespaces in Stack Overflow posts.

Before we begin, let us briefly explore Hadoop and MapReduce concepts.

S Tech Journal: BIG DATA for .NET Devs: HDInsight, Writing Hadoop Map Reduce Jobs In C# And Querying Results Back Using LINQ. Sector RoadMap: SQL-on-Hadoop platforms in 2013. Hadapt. Cloud Computing and Hadoop.

24HOP/SQLRally - Fitting Microsoft Hadoop Into Your Enterprise BI Strategy - Cindy Gross - SQL Server and Big Data Troubleshooting + Tips

Small Bites of Big Data. Cindy Gross, SQLCAT PM. The world of #bigdata and in particular #Hadoop is going mainstream. At 24HOP 2012 I talked about how a SQL Server professional fits into this big data world. Hadoop generally falls into the NoSQL realm.

PoweredBy.