background preloader

Business Intelligence

Facebook Twitter

12 Algorithms Every Data Scientist Should Know. Algorithms have become part of our daily lives and they can be found in almost any aspect of business. Gartner calls this the algorithmic business and it is changing the way we (should) run and manage our organizations. There are all kinds of algorithms and for each aspect of your business, there are different algorithms, which nowadays you can even buy at an algorithm marketplace. Algoritmia provides developers with over 800 algorithms in the fields of audio and visual processing, machine learning and computer vision, saving developers precious time and money. However, the algorithms available on the Algoritmia marketplace might not be suitable for your particular need. After all, for different circumstances you require different algorithms and the same algorithm in a different environment can produce different results. Therefore, sometimes buying an off-the-shelve algorithm and then tweaking it might not be the best option.

What is a Mathematical Model? An answer for non-Geeks Recently I saw a question on that was not answered, so I took the opportunity to provide a response.

What is a Mathematical Model?

After I thought about people I have worked with/for over the years, very brilliant people in their own field, who really did not know the answer to this question. What are some example models? First, not all models are mathematical. 5 Reasons why Power BI is taking over Tableau as the best BI Tool - Analytics. In the current data frenzy era, many companies are delivering their own data visualisation tools.

5 Reasons why Power BI is taking over Tableau as the best BI Tool - Analytics

Even Google is stepping into this domain with a product calledData Studio 360, included as part of its ‘Google 360’ suite. Tableau has been leading the market for quite some time already as the company was the first to introduce a data visualisation tool and its product is still considered the best. The ease at which Tableau’s tool can be used is hardly comparable to any other product in the Data Viz world. Nonetheless, a noteworthy competitor to Tableau has emerged recently. The BI tool from Microsoft called Power Bi is quickly catching up with Tableau, and it appears that Microsoft’s offering is close to becoming the number one BI tool in the market. These are the 5 reasons why Power BI is aggressively competing with Tableau. What really makes PBI powerful is the sharing options it provides.

The Data Modelling options that are served after loading the dataset have improved significantly. Kafka Ecosystem at LinkedIn. Kafka is self-service for the most part: users define their event schema and start producing to the topic.

Kafka Ecosystem at LinkedIn

The Kafka broker automatically creates the topic with default configurations and partition counts. Finally, any consumer can consume the topic, making Kafka completely open-access. As Kafka’s usage grows and new use-cases emerge, a number of limitations become apparent in the above approach. First, some topics require custom configurations that require special requests to Kafka SREs. Second, it is hard for most users to discover and examine metadata (e.g., byte-rate, audit completeness, schema history, etc.) about the topics that they own. Nuage is the self-service portal for online data-infrastructure resources at LinkedIn, and we have recently worked with the Nuage team to add support for Kafka within Nuage.

Parallel Everything Architecture. Like many competitors, XtremeData began with an open-source database software package.

Parallel Everything Architecture

But unlike others, we then re-engineered the core query execution code with a truly parallel, vectorized SQL engine developed from first principles. The reasons for this are simple. Legacy database software, including all open-source packages, were developed decades ago and are not optimized for the key computing resources of today: many-core CPUs, large amounts of memory and high-speed networks. Unlike “federated” systems, where multiple complete instances of a database run in parallel, XtremeData offers a single instance of a database that within itself contains a truly parallel SQL execution engine.

The core software layer manages all peer-to-peer communication and data exchange between nodes. At XtremeData we have benchmarked our engine against federation-based competitors and also against “NoSQL” solutions like Hive, and measured performance gains of 10x. Parallel Everything Architecture. Tableau vs MicroStrategy - Business Intelligence Comparison.


Hadoop vs Data Warehouse: Apples & Oranges? Many Hadoop experts believe an integrated data warehouse (IDW) is simply a huge pile of data.

Hadoop vs Data Warehouse: Apples & Oranges?

However, data volume has nothing to do with what makes a data warehouse. An IDW is a design pattern, an architecture for an analytics environment. First defined by Barry Devlin in 1988, the architecture quickly was called into question as implementers built huge databases with simple designs as well as small databases with complex designs. In 1992, Bill Inmon published “Building the Data Warehouse,” which described two competing implementations: data warehouses and data marts.

Gartner echoed Inmon’s position in 2005 in its research “Of Data Warehouses, Operational Data Stores, Data Marts and Data Outhouses.” “Subject oriented” means the IDW is a digital reflection of the business. In contrast, a data mart deploys a small fraction of one or two subject areas (i.e., a few tables).

Building the Enterprise Data Lake: A look at architecture.