
Streaming CEP


Esper - Complex Event Processing

Real-time Processing (Spark, Puma, HOP)

Spark Streaming

Spark Streaming is an interesting extension to Spark that adds support for continuous stream processing.

Spark Streaming is in active development at UC Berkeley's AMPLab alongside the rest of the Spark project. The group recently gave a presentation at AmpCamp 2012, and the video gives a pretty good overview. If you'd like to follow along with the video with your own copy of the slides, you can obtain them here. The full paper on Spark Streaming can also be obtained from this link for more detailed information.

Spark Streaming Motivation

As mentioned on the overview page, most Internet-scale systems have real-time data requirements alongside their growing batch-processing needs.
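The core idea behind Spark Streaming is to discretize a continuous stream into a series of small batches and run an ordinary batch computation on each one. The following is a toy Python sketch of that micro-batch idea, not Spark's actual API; the function names are made up for illustration.

```python
# Toy illustration (not Spark's API) of the micro-batch idea behind
# Spark Streaming: chop a continuous stream of events into small batches,
# then apply an ordinary batch computation to each batch.

def micro_batches(events, batch_size):
    """Discretize a stream of events into fixed-size batches."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch

def count_per_batch(events, batch_size):
    """Run a simple batch computation (a count) over each micro-batch."""
    return [len(b) for b in micro_batches(events, batch_size)]

counts = count_per_batch(range(10), batch_size=4)
print(counts)  # [4, 4, 2]
```

Because each micro-batch is processed by a deterministic batch computation, a lost batch can simply be recomputed, which is the intuition behind the fault-tolerance claims discussed below.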

Spark Streaming seeks to support these queries while maintaining fault tolerance similar to batch systems, recovering from both outright failures and stragglers.

What is Spark?

In order to understand Spark Streaming, it is important to have a good understanding of Spark itself.

MillWheel: Fault-Tolerant Stream Processing at Internet Scale

What Is StreamInsight? A Primer for Non-Programmers - Microsoft StreamInsight

Are you trying to figure out whether StreamInsight might be something you could use, but you're having trouble sifting through all the programming jargon that's used to describe it?

StreamInsight is, ultimately, a set of programming tools, and at some point it takes a programmer to implement a StreamInsight solution. But it really should be possible to get a handle on what StreamInsight is all about even if you're not a programmer yourself. A new article published in the TechNet Wiki may be able to help: StreamInsight for Non-Programmers. It gives an overview of the technology, but it leaves out the C# references and relates StreamInsight to more familiar SQL databases and queries.

Check it out. When you're done there and are ready to dig a little deeper, take a look at Get Started with StreamInsight 2.1. And, as always, you can post questions or comments here or on the TechNet Wiki. Regards,

StreamInsight for Non-Programmers - TechNet Articles - United States (English)

Queries are the heart of StreamInsight. You can define one or more queries that pick through the data of a moving stream, looking for interesting values or patterns. For example, you might define a query over a stock ticker stream that watches for large and rapid fluctuations in a particular stock price.
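To make the stock-ticker example concrete, here is a toy Python sketch (not StreamInsight) of that kind of standing query: scan a stream of prices and flag any large, rapid change. The function name and threshold are illustrative assumptions.

```python
# Toy sketch (not StreamInsight) of a standing query over a stock ticker:
# flag consecutive prices whose absolute change exceeds a threshold.

def flag_fluctuations(ticks, threshold):
    """Yield (previous, current) price pairs with |change| > threshold."""
    prev = None
    for price in ticks:
        if prev is not None and abs(price - prev) > threshold:
            yield (prev, price)
        prev = price

ticks = [100.0, 100.5, 107.2, 106.9, 99.1]
alerts = list(flag_fluctuations(ticks, threshold=5.0))
print(alerts)  # [(100.5, 107.2), (106.9, 99.1)]
```

The point of a CEP engine is that such a query runs continuously over arriving events rather than over a finished dataset, which is exactly the contrast the next paragraph draws with SQL.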

The query language used in StreamInsight is a variation of LINQ (Language-Integrated Query). Similar to the way T-SQL allows you to query SQL databases, StreamInsight LINQ allows you to query streaming data. For example, in SQL Server you might have a database table named "Tollbooth" that contains information on cars that passed through a tollbooth over the past week:

    SELECT * FROM Tollbooth WHERE LicenseState = 'WA'

Now suppose that instead of a static database of historical tollbooth readings you had live data coming from the tollbooth in real time. The equivalent StreamInsight LINQ query filters events as they arrive (note the C# equality operator ==):

    var myQuery = from car in TollStream
                  where car.LicenseState == "WA"
                  select car;

The excerpt then shows a fragment of a windowed query that counts cars per one-minute window; the original snippet is truncated here, but a plausible reconstruction using StreamInsight's tumbling-window operator is:

    var winQuery = from win in TollStream.TumblingWindow(TimeSpan.FromMinutes(1))
                   select win.Count();
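The semantics of that windowed query can be illustrated in plain Python: assign each timestamped event to a fixed, non-overlapping one-minute window and count the events in each. This is a toy sketch, not StreamInsight; the function name and the use of seconds-since-origin timestamps are assumptions.

```python
# Toy Python illustration (not StreamInsight) of a tumbling-window count:
# group timestamped events into non-overlapping fixed-size windows and
# count the events in each window.
from collections import Counter

def tumbling_window_counts(timestamps, window_seconds=60):
    """Count events per tumbling window; timestamps are seconds since origin."""
    counts = Counter(int(ts // window_seconds) for ts in timestamps)
    # Return the counts ordered by window index.
    return [counts[w] for w in sorted(counts)]

# Cars passing the tollbooth at these times (seconds):
arrivals = [3, 15, 59, 61, 62, 130]
print(tumbling_window_counts(arrivals))  # [3, 2, 1]
```

A tumbling window advances by its own length, so every event lands in exactly one window; a hopping or sliding window, by contrast, lets windows overlap.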

mwinkle.blog HD Insight

We've made a nice fix to the Templeton job submission service that runs on the HDInsight clusters for remote job submission.

We've talked with a number of customers who want to be able to access the logs for their jobs remotely as well. This typically requires direct access to the cluster. We've updated Templeton to support dropping the job logs directly into ASV as part of the status directory. The way to do this is to pass "enablelogs" as a query-string parameter set to true. Upon job completion, the logs will be moved into the status directory, under a logs folder. If you look in the syslog file there, you'll see a bunch of goodness about your job execution.
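As a rough sketch of what such a request might look like, the snippet below builds a Templeton (WebHCat) job-submission URL carrying the "enablelogs" parameter described above. The host name, status directory, and endpoint path are illustrative assumptions, not taken from the original post.

```python
# Hedged sketch: constructing a Templeton (WebHCat) job-submission URL that
# asks for job logs to be copied into the status directory on completion.
# The host, statusdir value, and endpoint path are assumptions for
# illustration; "enablelogs" is the parameter described in the text.
from urllib.parse import urlencode

def build_submit_url(cluster_host, statusdir, enable_logs=True):
    """Build a job-submission URL with the enablelogs parameter set."""
    params = {
        "statusdir": statusdir,                  # where job status is written
        "enablelogs": str(enable_logs).lower(),  # copy logs on completion
    }
    # /templeton/v1/mapreduce/jar is the classic Templeton MapReduce
    # endpoint; treat the exact path as an assumption here.
    return f"https://{cluster_host}/templeton/v1/mapreduce/jar?{urlencode(params)}"

url = build_submit_url("mycluster.azurehdinsight.net", "wasb:///example/status")
print(url)
```

In practice the submission is an authenticated POST with the job's jar, class, and arguments in the body; only the query-string handling of "enablelogs" is the point of this sketch.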

We will be updating the SDK to support this parameter in the next release, but for now, you can submit jobs directly to the cluster and add this parameter to get job logs.