
Data Science


Pictures of Big Data. What are the Big Guys Using? Summary: The largest companies utilizing the most data science resources are moving rapidly toward more integrated advanced analytic platforms.

What are the Big Guys Using?

The features they are demanding are evolving to promote speed, simplicity, quality, and manageability. This has some interesting implications for open source R and Python, which are widely taught in schools but significantly less necessary with these more sophisticated platforms. We continue to be dazzled, and perhaps rightly so, by the advances in deep learning and question answering machines like Watson. And while these are fun to read about, and some of the apps that incorporate them can be both handy and addictive, they cause us to take our eye off the bigger ball.

What is hardcore data science, in practice? Data science has become widely accepted across a broad range of industries in the past few years.

What is hardcore data science—in practice?

Originally more of a research topic, data science has early roots in scientists' efforts to understand human intelligence and create artificial intelligence; it has since proven that it can add real business value. As an example, we can look at the company I work for: Zalando, one of Europe’s biggest fashion retailers, where data science is heavily used to provide data-driven recommendations, among other things.

What is the difference between Data Science, Data Analysis, Big Data, Data Analytics, Data Mining and Machine Learning? Data Science deals with structured and unstructured data.

What is the difference between Data Science, Data Analysis, Big Data, Data Analytics, Data Mining and Machine Learning?

In principle, everything that relates to data cleansing, preparation and analysis lies within the scope of Data Science. There are different terms associated with Data Science. Let's look at how these terms differ.

Who’s Who In The Booming World Of Data Science » The nature of work and business in today’s super-connected world means that every second of every day, the world produces an astonishing amount of data.

Who’s Who In The Booming World Of Data Science »

Consider some of these statistics: every minute, Facebook users share nearly 2.5 million pieces of content, YouTube users upload over 72 hours of content, Apple users download nearly 50,000 apps, and 200 million emails are sent. That’s every single minute. IBM estimated that we produce over 2.5 quintillion bytes of data every day, and that growth shows no signs of slowing down. More than ever, we need people and systems to make sense of all that data, and it’s no surprise that data science has become one of the hottest spaces in the tech industry.

R vs Python: head to head data analysis. There have been dozens of articles written comparing Python and R from a subjective standpoint.

R vs Python: head to head data analysis

We’ll add our own views at some point, but this article aims to look at the languages more objectively. We’ll analyze a dataset side by side in Python and R, and show what code is needed in both languages to achieve the same result. This will let us understand the strengths and weaknesses of each language without the conjecture. At Dataquest, we teach both languages, and think both have a place in a data science toolkit.

We’ll be analyzing a dataset of NBA players and their performance in the 2013-2014 season.

Read in a csv file:

    nba <- read.csv("nba_2013.csv")

Choosing R or Python for data analysis? An infographic. I think you'll agree with me if I say: It's HARD to know whether to use Python or R for data analysis.
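For comparison with the R read.csv call quoted in this excerpt, here is a minimal Python sketch of the same "read in a csv file" step. It uses only the standard library csv module; pandas.read_csv is the more usual choice for data analysis, but it is left out here to keep the example dependency-free. The helper name read_csv_rows, the sample rows, and the column names are assumptions made for illustration, since the contents of nba_2013.csv are not shown in this excerpt.

```python
import csv
import io


def read_csv_rows(handle):
    """Return the rows of a CSV file as a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle))


# Hypothetical stand-in data: the real nba_2013.csv is not reproduced here,
# and these column names are assumptions for illustration only.
SAMPLE = "player,pts\nPlayer A,100\nPlayer B,250\n"

nba = read_csv_rows(io.StringIO(SAMPLE))

# With the real file on disk, the equivalent call would be:
# with open("nba_2013.csv", newline="") as f:
#     nba = read_csv_rows(f)

print(len(nba), nba[0]["player"])
```

Note that csv.DictReader returns every value as a string, whereas R's read.csv infers column types, so numeric columns would still need to be converted before analysis.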

Choosing R or Python for data analysis? An infographic

And this is especially true if you're a newbie data analyst looking for the right language to start with. It turns out that there are many good resources that can help you to figure out the strengths and weaknesses of both languages. They often go into great detail, and provide a tailored answer to questions such as "What should I use for Machine Learning?" or "I need a fast solution, should I go for Python or R?".

RegexOne - Learn Regular Expressions - Lesson 1: An Introduction, and the ABCs.

Microsoft ODBC Driver for SQL Server on Linux. The ODBC driver for SQL Server allows native applications (C/C++) running on Linux to connect to SQL Server 2008, SQL Server 2008 R2, SQL Server 2012, and Microsoft Azure SQL Database.

Microsoft ODBC Driver for SQL Server on Linux

With Microsoft ODBC Driver 13 (Preview) for SQL Server, SQL Server 2014 and SQL Server 2016 (Preview) are now also supported. The following are answers to questions about the ODBC Driver for SQL Server on Linux. For more information about the driver, see the Microsoft ODBC driver team blog. How do existing ODBC applications on Linux work with the driver? You should be able to compile and run the ODBC applications that you have been compiling and running on Linux using other drivers. Which features of SQL Server 2012 does this version of the driver support?

A Beginner's Guide to Scaling to 11 Million+ Users on Amazon's AWS. How do you scale a system from one user to more than 11 million users?

A Beginner's Guide to Scaling to 11 Million+ Users on Amazon's AWS

Joel Williams, Amazon Web Services Solutions Architect, gives an excellent talk on just that subject: AWS re:Invent 2015 Scaling Up to Your First 10 Million Users. If you are an advanced AWS user this talk is not for you, but it’s a great way to get started if you are new to AWS, new to the cloud, or if you haven’t kept up with the constant stream of new features Amazon keeps pumping out. As you might expect, since this is a talk by Amazon, Amazon services are always front and center as the solution to any problem.

Blog - Writings about data science, from the makers of Dataquest.io.

Learn Python The Hard Way (programming guide for beginners)

Making R the Enterprise Standard for Cross-Platform Analytics, Both On-Premises and in the Cloud - Machine Learning.

Advanced and Predictive Analytics: where does the great interest come from?

Advanced and Predictive Analytics: where does the great interest come from?

Put simply, advanced data analysis covers everything that goes beyond the plain presentation of data in reports or dashboards, or simple analysis options such as sorting/grouping or applying basic arithmetic (e.g. aggregation in an OLAP model). The most common advanced-analysis methods from statistics and machine learning serve to recognize patterns in data, which makes it possible above all to take multidimensional influencing factors into account when forming segments, identifying dependencies, or predicting values or class memberships (predictive analytics).

The demand for the use of, or

Cheat Sheet: Data Visualization with R. Data Science Central Tutorials: This is the place to post or read articles about implementations (linear or logistic regression, clustering, visualization, Hadoop, decision trees, collaborative filtering, etc.) using software (RapidMiner, Tableau, SAS, Pivotal, Teradata, SPSS, etc.)

Cheat Sheet: Data Visualization with R

RStudio – Cheatsheets. The cheat sheets below make it easy to learn about and use some of our favorite packages. From time to time, we will add new cheat sheets to the gallery. Data Visualization Cheat Sheet: The ggplot2 package lets you make beautiful and customizable plots of your data. It implements the grammar of graphics, an easy-to-use system for building plots.

Visualization of data science patterns.

How to Become a Data Scientist. These days you can get a degree in data science and show a diploma that certifies your credentials. But these degrees are relatively new, so, with all due respect, if you only recently got your degree you are still a beginner. Those of us who use this title today most likely came from combination backgrounds of business, hard science, computer science, operations research, and statistics.

What you call yourself is one thing, but what your employer or client is looking for can be quite a different kettle of fish.

HP extends R programming language for big data use. Hewlett-Packard has devised a way to run programs written in the R statistical programming language against data sets that span more than one server, potentially paving the way for large-scale, real-time predictive analytics.

“Historically, big data has been focused on the past,” said Jeff Veis, HP vice president of marketing for the company’s big data business group. The new software will allow organizations to “anticipate breaking trends” by using very large data sets, he said. While various commercial packages offer ways to run R on computer clusters, HP’s new Distributed R is the first to offer this capability in an open source package, Veis said. With millions of users worldwide, the open-source R is one of the most widely used programming languages specifically designed for statistical computing and predictive analytics, alongside SAS, MATLAB, Mathematica and a number of Python libraries.

Data Science Use Cases.

Ataccama - DQ Analyzer. Easy, powerful data profiling and analysis. A critical task for today’s businesses of every size is identifying data issues before they become business issues. DQ Analyzer (DQA) combines advanced data profiling and analysis capabilities with a point-and-click interface that is simple enough for business managers to use without extensive training.

DQA may be simple, but it is not simplistic. It is a powerful tool that includes the robust, high-performance engine from Data Quality Center and offers an entire complement of the Ataccama expression language. What’s more, DQA offers IT professionals the opportunity to completely customize the process, from setting up business rules to parsing complex, unstructured fields. Best of all, DQA is part of our complete family of products.

SAP Design Studio Archives - Visual BI Solutions.

CSS Tips & Tricks: Sliding Panel Transition in SAP Design Studio. Introduction: A sliding panel transition can be a really good solution for someone looking to save on dashboard real estate.

Sliding it in and out of view with butter-smooth transitions on demand presents a very compelling case when it comes to dashboard aesthetics...

Bookmarks in SAP BusinessObjects Design Studio 1.4: Tried and Tested. Bookmarks in SAP BusinessObjects Design Studio 1.4 have undergone some enhancements, and here are some of my findings after some extensive testing of the bookmark feature on local mode and on the BusinessObjects Platform. Basic Bookmarking: Changes made to the...

SAP Design Studio 1.4 – What’s New in APIs. SAP Design Studio 1.4 has been very surprising in many interesting ways ever since its release.

Fundamental methods of Data Science: Classification, Regression And Similarity Matching.

Part 1: Integrating R with Web Applications : Business Intelligence, Analytics & Excel. *** UPDATE: On 1/23/2015, surprise! Microsoft announced the acquisition of Revolution Analytics. This is great news, and as a result I will be adding that solution option into this mix along with Azure ML R Web Service. *** In this multi-part series, I will share my journey reviewing and developing web applications with the analytics mega-star language R. Why would anyone want to integrate R into a web application or a dashboard?

Khan Academy - Probability and Statistics.

Coursera - Data Science Specialization.

Data Mining Map.

R project.

Data Science.

Kaggle. For any industry, we use the power of the world's largest community of data scientists to solve your data problem in a competitive framework, to improve the modeling power further and further from one participant to the next. The winners provide the very top models and code in exchange for prize money.

In fact, a stellar list of companies, governments, and researchers have posted their datasets on Kaggle to beat their pre-existing benchmarks.

DataCamp - Learn Data Analysis Online.

The 6 Skills Required to be a Good Data Scientist - ANZ Blog.

10 things statistics taught us about big data analysis.