
Data Science


Data Science Central. Home Page Fabio González. Analytics Discussions.

We analyzed Glassdoor.com data to summarize 10 interview questions for Data Scientist positions.

10 Real Data Scientist Interview Questions

How many can you answer?

1) Apple: “How do you take millions of users with 100's of transactions each, amongst 10k's of products and group the users together in a meaningful segments?”
2) Facebook: You're about to get on a plane to Seattle.
3) Netflix: “How do you know if one algorithm is better than other?”
4) Goldman Sachs: One box has 12 black and 12 red cards; a second box has 24 black and 24 red. If you want to draw 2 cards at random from one of the 2 boxes, which box has the higher probability of getting the same color?
5) American Express: We have like million card members and along with their transactions.
6) Quora: Given two lists of sorted integers, develop an algorithm to sort these numbers into a single list efficiently.
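One common way to approach the Quora question is a two-pointer merge, the same step used inside merge sort. The sketch below assumes both inputs are already sorted in ascending order; the function name `merge_sorted` is made up for illustration and is not taken from the original question.

```python
def merge_sorted(a, b):
    """Merge two ascending lists into one ascending list.

    Runs in O(len(a) + len(b)) time because each element is visited once.
    """
    merged = []
    i = j = 0
    # Repeatedly take the smaller of the two current heads.
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:  # <= keeps equal elements from `a` first (stable)
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    # At most one of these tails is non-empty; it is already sorted.
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged
```

Because each input is already sorted, this avoids the O(n log n) cost of concatenating the lists and sorting the result from scratch.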

Recurrent Neural Networks (RNNs) are popular models that have shown great promise in many NLP tasks.

Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs – WildML

But despite their recent popularity, I’ve only found a limited number of resources that thoroughly explain how RNNs work, and how to implement them. That’s what this tutorial is about. It’s a multi-part series in which I’m planning to cover the following: As part of the tutorial we will implement a recurrent neural network based language model. The applications of language models are two-fold: First, it allows us to score arbitrary sentences based on how likely they are to occur in the real world.
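To make "scoring a sentence" concrete, here is a minimal sketch that assigns a log-probability to a sentence. It deliberately swaps in a simple bigram count model with add-one smoothing in place of the RNN the tutorial builds; the toy corpus, the `<s>`/`</s>` boundary tokens, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count contexts and bigrams over whitespace-tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams, len(vocab)


def sentence_log_prob(sentence, model):
    """Sum of log P(word | previous word), with add-one smoothing."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_p = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_p += math.log(p)
    return log_p


# Hypothetical toy corpus, not from the tutorial.
toy_model = train_bigram(["the cat sat", "the dog sat", "the cat ran"])
```

With this toy model, `sentence_log_prob("the cat sat", toy_model)` is higher than `sentence_log_prob("sat cat the", toy_model)`, which is the kind of ranking a language model is meant to provide. An RNN language model plays the same role but conditions each word on the whole preceding sequence rather than only the previous word.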

I’m assuming that you are somewhat familiar with basic Neural Networks.

The human visual system is one of the wonders of the world.

Neural networks and deep learning

Consider a sequence of handwritten digits that most people effortlessly recognize as 504192. That ease is deceptive. In each hemisphere of our brain, humans have a primary visual cortex, also known as V1, containing 140 million neurons, with tens of billions of connections between them. And yet human vision involves not just V1, but an entire series of visual cortices - V2, V3, V4, and V5 - doing progressively more complex image processing. The difficulty of visual pattern recognition becomes apparent if you attempt to write a computer program to recognize digits like those above. Neural networks approach the problem in a different way. The idea is to take a large number of handwritten digits, known as training examples, and then develop a system which can learn from those training examples.

Stitch Fix Technology – Multithreaded.

Big Cloud Recruitment.

Did you know that using the XGBoost algorithm is one of the popular winning recipes in data science competitions?

How to use XGBoost algorithm in R in easy steps

So, what makes it more powerful than a traditional Random Forest or Neural Network? In broad terms, it’s the efficiency, accuracy and feasibility of this algorithm. (I’ve discussed this part in detail below). In the last few years, predictive modeling has become much faster and more accurate. Technically, “XGBoost” is short for Extreme Gradient Boosting. In this article, I’ve explained a simple approach to using xgboost in R.
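For readers new to the "gradient boosting" part of the name, the sketch below shows the basic idea in plain Python rather than with the xgboost package: each round fits a one-split tree (a stump) to the current residuals and adds a scaled copy of it to the ensemble. This is a simplified illustration only; XGBoost itself adds regularization and many engineering optimizations not shown here. All function names, the single-feature setup, and the default `n_rounds` and `learning_rate` values are assumptions.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on a 1-D feature that minimizes squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_rounds=20, learning_rate=0.3):
    """Boosting for squared error: each stump is fit to the residuals."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict_boosted(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((y - predict_boosted(model, x)) ** 2
               for x, y in zip(xs, ys)) / len(xs)
```

On a small training set, comparing `mean_squared_error` for a model with `n_rounds=0` against one with `n_rounds=20` shows the training error shrinking as stumps are added.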

16 Free Data Science Books.

Data Scientist is one of the hottest jobs of this decade.

The Ultimate Plan to Become a Data Scientist in 2016

The demand for data scientists is much higher than the number of available candidates (Source). So, there is a lot of incentive for people to look to data science as a career option, and that is not going to change in the near future. However, if you do one search on Google, you will see your dream vanishing. There are too many resources, pieces of advice and paths suggested by various people, which makes it impossible for a beginner to make the right decisions. If you are facing a similar problem, let’s accomplish this in 2016.

Model evaluation metrics are used to assess goodness of fit between model and data, to compare different models in the context of model selection, and to estimate how accurate predictions (associated with a specific model and data set) are expected to be.

11 Important Model Evaluation Techniques Everyone Should Know

Confidence Interval. Confidence intervals are used to assess how reliable a statistical estimate is. Wide confidence intervals mean either that your model is poor (and it is worth investigating other models) or, if the intervals don't improve when you change the model (that is, when you test a different theoretical statistical distribution for your observations), that your data is very noisy. Modern confidence intervals are model-free and data-driven. A more general framework to assess and reduce sources of variance is called analysis of variance.

Confusion Matrix. Gain and Lift Chart. Kolmogorov-Smirnov Chart. Chi Square. ROC curve. Gini Coefficient. Root Mean Square Error. L^1 version of RMSE.
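Three of the techniques named above are simple enough to sketch directly. The code below computes a root mean square error, an L^1 counterpart (read here as the mean absolute error, which is an assumption about what the article means), and the four cells of a binary confusion matrix. Function names and the `positive` label parameter are illustrative, not from the article.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: penalizes large errors more heavily."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error: an L^1 analogue, less sensitive to outliers."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts
```

For example, with actual values `[1, 2, 3]` and predictions `[1, 2, 5]`, the single error of 2 gives an RMSE of about 1.15 but an MAE of about 0.67, which illustrates how the squared version weights one large miss more heavily.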