
Binary logarithm. [Figure: plot of log2 n.] In mathematics, the binary logarithm (log2 n) is the logarithm to the base 2. It is the inverse of the function n ↦ 2^n: the binary logarithm of n is the power to which the number 2 must be raised to obtain the value n. This makes the binary logarithm useful for anything involving powers of 2, i.e. doubling. For example, the binary logarithm of 1 is 0, the binary logarithm of 2 is 1, the binary logarithm of 4 is 2, the binary logarithm of 8 is 3, the binary logarithm of 16 is 4, and the binary logarithm of 32 is 5. Applications: Information theory. The binary logarithm is often used in computer science and information theory because it is closely connected to the binary numeral system. In information theory, the definitions of self-information and information entropy involve the binary logarithm; this is needed because the unit of information, the bit, refers to the information resulting from an occurrence of one of two equally probable alternatives.
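The properties above are easy to check directly; a short sketch using Python's standard library (the `self_information` helper is our own illustrative name, not from the article):

```python
import math

# The binary logarithm is the inverse of n -> 2**n:
# log2(2**k) == k, so each doubling of n adds exactly 1.
for n in [1, 2, 4, 8, 16, 32]:
    print(n, math.log2(n))   # 1->0.0, 2->1.0, 4->2.0, 8->3.0, 16->4.0, 32->5.0

def self_information(p: float) -> float:
    """Self-information of an event with probability p, in bits.
    A fair coin flip (p = 0.5) carries exactly 1 bit."""
    return -math.log2(p)

print(self_information(0.5))   # 1.0
```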

ID3 Decision Tree Algorithm - Part 1. Introduction. Iterative Dichotomiser 3, or ID3, is an algorithm used to generate a decision tree; details about the algorithm itself are covered elsewhere. ID3 has many uses, especially in the machine learning field. In this article, we will look at the attribute selection procedure used in the ID3 algorithm. The attribute selection section covers basic information about the data set, discusses entropy and information gain, and works through a few examples showing how to calculate entropy and information gain from example data. Attribute selection. Attribute selection is the fundamental step in constructing a decision tree. Fig 1: Data set to calculate entropy and information gain using the ID3 algorithm. Attributes: In the above table, Day, Outlook, Temperature, Humidity, Wind, and Play ball are the attributes; among these, Play ball is the Class (C), or classifier. Collection (S): All the records in the table together form the Collection (S).
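The entropy and information-gain calculations described above can be sketched as follows. The `Wind`/`Play ball` values below are the classic 14-day "play tennis" data usually used with ID3, which matches the attributes named in Fig 1; this is an assumption, since the table itself did not survive extraction.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(columns, attribute, labels):
    """Entropy(S) minus the weighted entropy of the subsets
    produced by splitting S on `attribute`."""
    total = len(labels)
    gain = entropy(labels)
    by_value = {}
    for value, label in zip(columns[attribute], labels):
        by_value.setdefault(value, []).append(label)
    for subset in by_value.values():
        gain -= (len(subset) / total) * entropy(subset)
    return gain

# Wind attribute and Play ball class for the 14 example days.
wind = ['Weak', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong',
        'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Strong']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']

print(round(entropy(play), 3))                                   # 0.94
print(round(information_gain({'Wind': wind}, 'Wind', play), 3))  # 0.048
```

ID3 computes this gain for every candidate attribute and splits on the one with the highest value.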

Fig 2. Tutorials on Machine Learning (Tom Dietterich). Weka (machine learning). Weka offers: free availability under the GNU General Public License; portability, since it is fully implemented in the Java programming language and thus runs on almost any modern computing platform; a comprehensive collection of data preprocessing and modeling techniques; and ease of use due to its graphical user interfaces. Weka supports several standard data mining tasks, more specifically: data preprocessing, clustering, classification, regression, visualization, and feature selection.

All of Weka's techniques are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes (normally, numeric or nominal attributes, but some other attribute types are also supported). Weka provides access to SQL databases using Java Database Connectivity and can process the result returned by a database query.
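The "single flat relation with a fixed number of attributes" that Weka expects is usually expressed in its native ARFF format. As a hedged illustration, the fragment below is adapted from the small weather sample dataset that ships with Weka (nominal attributes are declared as value sets, numeric ones with the `numeric` keyword):

```
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
```

Each line after `@data` is one data point described by exactly the declared attributes, which is what "flat file or relation" means here.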

The Explorer interface features several panels providing access to the main components of the workbench. Talent Shortage Could Make Big Data a Bit Smaller (Technorati Cloud Computing). How to turn big data into engaging infographics with a single app. Digesting big data can be migraine-inducing.

Knoema wants to make it as easy as scanning a graph. “With data, we want to do what YouTube did for video,” Vladimir Bougay, founder and CTO of startup Knoema, told VentureBeat in a recent interview. The process is DIY, but the end result is a slick-looking, shareable page that lets you select and display the patterns and trends you find in the data you or anyone else has uploaded. That’s the other fun part about Knoema: the data itself. “I think it’s one of those businesses that would have increasing returns as people use it,” said Will Price, chief executive at Flite, after watching the company show off its site at the DEMO conference in Santa Clara, Calif. Above all, said Bougay, the process is fast. Amazon DynamoDB: Big Data's Big Cloud Moment (Software - Information Management).

Amazon's DynamoDB promises a database service fit even for Internet-scale companies with huge data sets. Whether big data players will give up servers comes down to economics and flexibility. Amazon Web Services launched the DynamoDB NoSQL database service on Wednesday, marking the company's boldest appeal yet for companies to trust a cloud-based service as a platform for running a business. The question now is whether Amazon really can deliver big-data scale at a reasonable cost through a cloud-based operation. The DynamoDB service is akin to giving Amazon's three-year-old SimpleDB service a massive boost in capacity and performance. That's because the new service is based on Dynamo, the NoSQL database Amazon developed internally and has used to run parts of its massive consumer commerce Web site since 2007.

That could change quickly. Three quick notes about derived data. April 24, 2012. I had one of “those” trips last week: 20 meetings, a number of them multi-hour; a broken laptop; flights that arrived around 10:30 Sunday night and left at 7:00 Saturday morning. So please pardon me if things are a bit disjointed. I’ve argued for a while that: all human-generated data should be retained; the more important kinds of machine-generated data should be retained as well; and raw data isn’t enough, so it’s really important to store derived data too. Here are a few notes on the derived data trend. He doesn’t generally use the term, but a big proponent these days of the derived data story is Hortonworks founder/CTO Eric Baldeschwieler, aka Eric 14.

The KXEN guys don’t use the term “derived data” much either, but they tend to see the idea as central to predictive modeling even so. #3 is the most automated part, and #1 is what KXEN thinks its technology makes unnecessary. PreMBA Finance. Stock A would have a return of 7 percent from the "Good" outcome, and only a 3 percent return from a "Bad" outcome. Using the formula from the previous section on expected return, you can easily see that the expected return for stock A is 5 percent (half, or .5, of 7 percent, plus half of 3 percent, is 5 percent).

Stock B would have a much better return of 15 percent in a "Good" outcome, but lose 5 percent in a "Bad" outcome. The expected return is half of 15 percent, plus half of -5 percent, or 5 percent, the same as that for stock A. Both stocks have the same expected return. But even in the worst case, stock A still earns the investor 3 percent. You can make assumptions regarding the risk tolerance of the individuals in the challenge problems.
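The two-stock comparison above can be checked in a few lines. The expected-return arithmetic follows the text directly; the variance shown is the standard probability-weighted measure of dispersion (the article's own "two methods" are not reproduced here), and it makes the risk difference between the two equal-expected-return stocks explicit:

```python
# Equal probability (0.5) of the "Good" and "Bad" outcomes.
probs = [0.5, 0.5]
stock_a = [7.0, 3.0]     # percent return in Good / Bad outcome
stock_b = [15.0, -5.0]

def expected_return(probs, returns):
    """Probability-weighted average return."""
    return sum(p * r for p, r in zip(probs, returns))

def variance(probs, returns):
    """Probability-weighted squared deviation from the expected return."""
    mu = expected_return(probs, returns)
    return sum(p * (r - mu) ** 2 for p, r in zip(probs, returns))

print(expected_return(probs, stock_a))  # 5.0
print(expected_return(probs, stock_b))  # 5.0 -- same expected return
print(variance(probs, stock_a))         # 4.0
print(variance(probs, stock_b))         # 100.0 -- far riskier
```

A risk-averse investor comparing the two would prefer stock A: same expected return, much lower variance.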

Variance. Most people are risk averse, in that they wish to minimize the amount of risk they must endure to earn a given level of expected return. This course will review two methods of calculating the variance of an expected return. Big data startup Sociocast goes beyond analytics with new SaaS tools (exclusive). Big data and analysis startup Sociocast has launched two new Software-as-a-Service tools aimed at helping advertising and media companies better understand their customers, the company revealed today. Sociocast’s software analyzes “big data” sets and delivers predictive real-time audience data to help companies make more informed marketing decisions. Sociocast CEO Albert Azout told VentureBeat that it can’t disclose specific customers yet, but said it works with many large data aggregators, ad networks, and agency trading desks. The company’s significantly updated software tools are version 2.2 of Sociocast Connect and Sociocast Signal, both of which advance the company’s mission of creating “actionable” intelligence.

“Unlike other companies, we’re not just creating intelligence and analytics,” Azout told us. “We’re also working with decision makers that are actually buying media.” Sociocast Connect, the company’s flagship product, now offers several new features. Sumo Logic and UIs for text-oriented data. February 6, 2012. I talked with the Sumo Logic folks for an hour Thursday. Highlights included: Sumo Logic does SaaS (Software as a Service) log management. Sumo Logic is text-indexing/Lucene-based; thus, it is reasonable to think of Sumo Logic as “Splunk-like.” (However, Sumo Logic seems to have a stricter security/troubleshooting orientation than Splunk, which is trying to branch out.) Sumo Logic has hacked Lucene for faster indexing, and says 10-30 second latencies are typical. Sumo Logic’s main differentiation is automated classification of events.
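Sumo Logic's actual classification technique is not described in the post, so as an illustration only, here is a minimal sketch of one common approach to automated event classification: reducing log lines to templates by masking the variable tokens, so that lines differing only in IPs, ports, or counters fall into the same event class. All names and patterns below are invented for the example.

```python
import re
from collections import Counter

def template(line: str) -> str:
    """Reduce a log line to a template by masking variable parts
    (decimal numbers, dotted numbers such as IPs, hex ids)."""
    line = re.sub(r'\b0x[0-9a-fA-F]+\b', '<HEX>', line)
    line = re.sub(r'\b\d+(\.\d+)*\b', '<NUM>', line)
    return line

logs = [
    "connection from 10.0.0.1 port 5050 closed",
    "connection from 10.0.0.2 port 6060 closed",
    "disk usage at 91 percent",
]
counts = Counter(template(l) for l in logs)
for tpl, n in counts.items():
    print(n, tpl)
# 2 connection from <NUM> port <NUM> closed
# 1 disk usage at <NUM> percent
```

Real systems learn the templates from the data rather than from hand-written regexes, but the payoff is the same: the UI can show a handful of event classes instead of millions of raw lines.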

There’s some kind of streaming engine in the mix, to update counters and drive alerts. Sumo Logic has around 30 “customers,” free (mainly) or paying (around 5) as the case may be. A truly typical Sumo Logic customer has single to low double digits of gigabytes of log data per day. What interests me about Sumo Logic is that automated-classification story. The payoff is that machine learning directly informs the Sumo Logic user interface. Comments on the 2012 Forrester Wave: Enterprise Hadoop Solutions. Business intelligence industry trends. February 21, 2012. This is one of a series of posts on business intelligence and related analytic technology subjects, keying off the 2011/2012 version of the Gartner Magic Quadrant for Business Intelligence Platforms.

Besides company-specific comments, the 2011/2012 Gartner Magic Quadrant for Business Intelligence (BI) Platforms offered observations on overall BI trends in a “Market Overview” section. I have mixed feelings about Gartner’s list. In particular: not inconsistently with my comments on departmental analytics, Gartner sees actual BI business users as favoring ease of getting the job done, while IT departments are more concerned about full feature sets, integration, corporate standards, and license costs. However, Gartner says as a separate point that all kinds of users want relief from some of the complexity of BI, and really of analytics in general.

Here’s the forest that I suspect Gartner is missing for the trees: Let me be even more specific.

IT architectures

CEP. BPM. BI. Data warehouse. Data Warehouse Overview. In computing, a data warehouse (DW, DWH), or enterprise data warehouse (EDW), is a database used for reporting and data analysis. Integrating data from one or more disparate sources creates a central repository of data, the data warehouse. Data warehouses store current and historical data and are used to create trending reports for senior management, such as annual and quarterly comparisons. The data stored in the warehouse is uploaded from the operational systems (such as marketing and sales, shown in the figure to the right).

The data may pass through an operational data store for additional operations before it is used in the DW for reporting. A data warehouse constructed from integrated data source systems does not require ETL, staging databases, or operational data store databases. A data mart is a small data warehouse focused on a specific area of interest.
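To make the flow from operational systems into the warehouse concrete, here is a minimal ETL sketch using SQLite in-memory databases. The table names, columns, and figures are invented for illustration; a real pipeline would add the staging/operational-data-store steps mentioned above.

```python
import sqlite3

# "Operational" system: detailed sales transactions.
op = sqlite3.connect(":memory:")
op.execute("CREATE TABLE sales (day TEXT, quarter TEXT, amount REAL)")
op.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2012-01-05", "Q1", 100.0),
    ("2012-02-10", "Q1", 250.0),
    ("2012-04-01", "Q2", 300.0),
])

# Warehouse: aggregated history for trend reporting.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE sales_by_quarter (quarter TEXT, total REAL)")

# Extract + transform: aggregate the operational detail rows.
rows = op.execute(
    "SELECT quarter, SUM(amount) FROM sales GROUP BY quarter").fetchall()

# Load the summary into the warehouse table.
dw.executemany("INSERT INTO sales_by_quarter VALUES (?, ?)", rows)

print(dw.execute(
    "SELECT * FROM sales_by_quarter ORDER BY quarter").fetchall())
# [('Q1', 350.0), ('Q2', 300.0)]
```

The warehouse table supports the quarterly-comparison reports described above without touching the operational system again.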

This definition of the data warehouse focuses on data storage. History. Is the Relational Database Doomed? Recently, a lot of new non-relational databases have cropped up both inside and outside the cloud. One key message this sends is: "if you want vast, on-demand scalability, you need a non-relational database." If that is true, is this a sign that the once-mighty relational database finally has a chink in its armor? Is it a sign that relational databases have had their day and will decline over time? In this post, we'll look at the current trend of moving away from relational databases in certain situations and what this means for the future of the relational database.

Relational databases have been around for over 30 years. First, Some Background. A relational database is essentially a group of tables (entities). Relational databases are managed through Relational Database Management Systems (RDBMSs). The reasons for the dominance of relational databases are not trivial. However, to offer all of this, relational databases have to be incredibly complex internally. The New Breed.
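The "group of tables" idea, and some of the power an RDBMS provides, can be shown in a few lines using SQLite via Python's standard library. The schema and data are invented for the example: two tables related by a key, queried with a join and an aggregate.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
           "customer_id INTEGER REFERENCES customers(id), total REAL)")
db.execute("INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace')")
db.execute("INSERT INTO orders VALUES "
           "(10, 1, 10.0), (11, 1, 5.0), (12, 2, 20.0)")

# Relate the two tables through the customer_id key.
result = db.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(result)  # [('Ada', 15.0), ('Grace', 20.0)]
```

Declarative queries over related tables, with consistency enforced by the engine, are exactly the features the non-relational "new breed" trades away in exchange for horizontal scalability.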