
ŷhat | Decision Making Under Uncertainty: An Introduction to Robust Optimization (Part 1) Measuring the Impact of Uncertainty Data analytics is a process that incorporates data into building models that help us make decisions. These decisions should add value to our business. Robust optimization (RO) is a tool that helps us improve our decisions in uncertain scenarios by allowing us to add uncertainty that is present in a problem directly to a model. A Simple Production Model We begin with a simple example that can be found in any introductory book on optimization. You make only two products: chairs and tables. Looking at the resources required table below, you must decide how to assign these resources to maximize your output. Resources Required (on average) The production assignment is not flexible. The optimization problem can be formulated as follows: This is a very simple optimization problem known as a deterministic linear program (LP). Output: Objective value: 4.0; Number of chairs: 2.0; Number of tables: 2.0. But what if our estimates for resource usage are wrong? Nominal case
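The resource table and the formulation did not survive in the excerpt above, so the following is only a minimal sketch of a two-product deterministic LP and of what happens when the usage estimates are off. The coefficients (two resources with 6 units each, unit value per product) are hypothetical, chosen so that the nominal optimum matches the quoted output; they are not the article's data. The solver is a plain vertex enumeration written for illustration, not the modeling library the article used.

```python
from itertools import combinations

def solve_lp_2d(c, constraints, tol=1e-9):
    """Maximize c[0]*x + c[1]*y subject to a*x + b*y <= r for each (a, b, r).

    A bounded, feasible LP attains its optimum at a vertex, so we intersect
    every pair of constraint lines and keep the best feasible intersection.
    """
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:
            continue  # parallel lines, no unique intersection
        x = (r1 * b2 - r2 * b1) / det  # Cramer's rule
        y = (a1 * r2 - a2 * r1) / det
        if is_feasible(x, y, constraints, tol):
            value = c[0] * x + c[1] * y
            if best is None or value > best[0]:
                best = (value, x, y)
    return best

def is_feasible(x, y, constraints, tol=1e-9):
    """Check a candidate plan against every constraint."""
    return all(a * x + b * y <= r + tol for a, b, r in constraints)

# Hypothetical nominal data: x = chairs, y = tables.
NON_NEGATIVE = [(-1, 0, 0), (0, -1, 0)]          # x >= 0, y >= 0
NOMINAL = [(2, 1, 6), (1, 2, 6)] + NON_NEGATIVE  # assumed resource usage

# Same plan, but every usage estimate turns out 10% too low (assumption).
PERTURBED = [(2.2, 1.1, 6), (1.1, 2.2, 6)] + NON_NEGATIVE

nominal = solve_lp_2d((1, 1), NOMINAL)
print("Objective, chairs, tables:", nominal)
print("Nominal plan still feasible under perturbed usage?",
      is_feasible(nominal[1], nominal[2], PERTURBED))
```

Under these assumed numbers the nominal plan uses every unit of both resources, so any upward error in the usage estimates makes it infeasible. That fragility is the motivation for putting the uncertainty into the model itself.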
What Is Data Analysis and How Can You Start Learning It Today? Did you know that data science and analysis positions are often the hardest ones for a company to fill? Thanks to exploding demand for data professionals, there are a ton of open roles and not enough candidates to fill them. Translation? Now, just to clear up a common misconception right off the bat: you don’t need to be a math/computer science/coding whiz to land a job in data analysis. But how do you know if data analysis is something that might interest you? In this sponsored post in partnership with Udemy, we’ll tell you everything you need to know about getting started with data analysis. Let's jump right in! Disclosure: This post is sponsored by Udemy and I’m also an affiliate for them. What Is Data Analysis? First things first: what IS data analysis? In short, data analysis involves sorting through massive amounts of unstructured information and deriving key insights from it. A quick note here: data analysis and data science are not the same. Why You Should Learn Data Analysis Skills
Dataquest Blog - Writings about data science, from the makers of Dataquest.io It’s an exciting time for data science. The field is new, but growing quickly. There’s huge demand for data scientists – average compensation in SF is well north of 100 thousand dollars a year. The first step to learning data science is usually asking “how do I learn data science?”. I can’t fully explain how immensely unmotivating it is to be given a huge list of resources without any context. Some people learn best with a list of books, but I learn best by building and trying things. That’s why I don’t think your first goal should be to learn linear algebra or statistics. An example of the visualizations you can make with data science (via The Economist) 1. Nobody ever talks about motivation in learning. You need something that will motivate you to keep learning, even when it’s midnight, formulas are starting to look blurry, and you’re wondering if this will be the night that neural networks finally make sense. I was obsessed with improving the performance of my programs. 2. 3. 4. 5.
SQLBolt - Learn SQL - Introduction to SQL Tracking down the Villains: Outlier Detection at Netflix It’s 2 a.m. and half of our reliability team is online searching for the root cause of why Netflix streaming isn’t working. None of our systems are obviously broken, but something is amiss and we’re not seeing it. After an hour of searching we realize there is one rogue server in our farm causing the problem. We missed it amongst the thousands of other servers because we were looking for a clearly visible problem, not an insidious deviant. In Netflix’s Marvel’s Daredevil, Matt Murdock uses his heightened senses to detect when a person’s actions are abnormal. The Netflix service currently runs on tens of thousands of servers; typically less than one percent of those become unhealthy. A slow or unhealthy server is worse than a down server because its effects can be small enough to stay within the tolerances of our monitoring system and be overlooked by an on-call engineer scanning through graphs, but still have a customer impact and drive calls to customer service. How DBSCAN Works
SQLCourse - Interactive Online SQL Training for Beginners This Is What Controversies Look Like in the Twittersphere Many a controversy has raged on social media platforms such as Twitter. Some last for weeks or months, others blow themselves out in an afternoon. And yet most go unnoticed by most people. That could happen thanks to the work of Kiran Garimella and pals at Aalto University in Finland. Various researchers have studied controversies on Twitter but these have all focused on preidentified arguments, whereas Garimella and co want to spot them in the first place. And they think this structure can be spotted by studying various properties of the conversation, such as the network of connections between those involved in a topic; the structure of endorsements, that is, who agrees with whom; and the sentiment of the discussion, whether positive or negative. They test this idea by first studying ten conversations associated with hashtags that are known to be controversial and ten that are known to be benign. These networks allow further study.
Live SQL - Tutorial: Introduction to SQL Tables are the basic unit of data storage in an Oracle Database. Data is stored in rows and columns. You define a table with a table name, such as employees, and a set of columns. You give each column a column name, such as employee_id, last_name, and job_id; a datatype, such as VARCHAR2, DATE, or NUMBER; and a width. You can specify rules for each column of a table. For example:
create table DEPARTMENTS (
  deptno number,
  name varchar2(50) not null,
  location varchar2(50),
  constraint pk_departments primary key (deptno)
);
Tables can declaratively specify relationships between tables, typically referred to as referential integrity.
create table EMPLOYEES (
  empno number,
  name varchar2(50) not null,
  job varchar2(50),
  manager number,
  hiredate date,
  salary number(7,2),
  commission number(7,2),
  deptno number,
  constraint pk_employees primary key (empno),
  constraint fk_employees_deptno foreign key (deptno) references DEPARTMENTS (deptno)
);
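To see what the foreign key between EMPLOYEES and DEPARTMENTS enforces, here is a small sketch that can be run without an Oracle instance. It uses Python's built-in sqlite3 module with an in-memory SQLite database, so the column types are SQLite's rather than Oracle's, and the sample rows are made up for illustration. SQLite only checks foreign keys when the `foreign_keys` pragma is switched on, which is why that statement comes first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off by default.
conn.execute("PRAGMA foreign_keys = ON")

# SQLite-flavoured version of the two tables (types adapted, columns trimmed).
conn.executescript("""
create table departments (
  deptno   integer,
  name     text not null,
  location text,
  constraint pk_departments primary key (deptno)
);
create table employees (
  empno  integer,
  name   text not null,
  deptno integer,
  constraint pk_employees primary key (empno),
  constraint fk_employees_deptno foreign key (deptno)
    references departments (deptno)
);
""")

# Illustrative rows only.
conn.execute("insert into departments values (10, 'Accounting', 'New York')")
conn.execute("insert into employees values (1, 'King', 10)")  # parent row exists

try:
    # Department 99 does not exist, so this row would break referential integrity.
    conn.execute("insert into employees values (2, 'Nobody', 99)")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("Insert rejected:", exc)

employee_count = conn.execute("select count(*) from employees").fetchone()[0]
print("Employees stored:", employee_count)
```

The second insert fails because the database, not the application, is responsible for keeping every `deptno` in EMPLOYEES pointing at a real department.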
Deeplearning4j - Open-source, distributed deep learning for the JVM Contents Definition & Structure Invented by Geoff Hinton, Restricted Boltzmann machines are useful for dimensionality reduction, classification, regression, collaborative filtering, feature learning and topic modeling. (For more concrete examples of how neural networks like RBMs can be employed, please see our page on use cases). Given their relative simplicity, restricted Boltzmann machines are the first neural network we’ll tackle. In the paragraphs below, we describe in diagrams and plain language how they work. RBMs are shallow, two-layer neural nets that constitute the building blocks of deep-belief networks. Each circle in the graph above represents a neuron-like unit called a node, and nodes are simply where calculations take place. That is, there is no intra-layer communication – this is the restriction in a restricted Boltzmann machine. Each visible node takes a low-level feature from an item in the dataset to be learned. activation f((weight w * input x) + bias b ) = output a
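The node formula above, activation f((weight w * input x) + bias b) = output a, can be sketched in a few lines. The logistic sigmoid is assumed for f here, and the weights, inputs and bias are made-up numbers; the excerpt itself does not fix any of them.

```python
import math

def sigmoid(z):
    """Logistic function, assumed here as the activation f."""
    return 1.0 / (1.0 + math.exp(-z))

def node_output(weights, inputs, bias):
    """Compute a = f(sum_i(w_i * x_i) + b) for a single node."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Illustrative values: one hidden node fed by three visible nodes.
visible = [1.0, 0.0, 1.0]
weights = [0.4, -0.2, 0.1]
bias = -0.3

a = node_output(weights, visible, bias)
print("hidden node output:", a)
```

With several inputs, each one is multiplied by its own weight, the products are summed, the bias is added, and the total is passed through f, so the output always lands strictly between 0 and 1 for the sigmoid.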
Dataquest Blog - Writings about data science, from the makers of Dataquest.io R vs Python: Opinions vs Facts There are dozens of articles out there that compare R vs. Python from a subjective, opinion-based perspective. Both Python and R are great options for data analysis, or any work in the data science field. But if your goal is to figure out which language is right for you, reading the opinion of someone else may not be helpful. In this article, we're going to do something different. Keep in mind, you don't need to actually understand all of this code to make a judgment here! The good news? Why You Should Trust Us Since we'll be presenting code side-by-side in this article, you don't really need to "trust" anything; you can simply look at the code and make your own judgments. For the record, though, we don't take a side in the R vs Python debate! R vs Python: Importing a CSV Let's jump right into the real-world comparison, starting with how R and Python handle importing CSVs! library(readr) nba <- read_csv("nba_2013.csv") Python Finding the number of rows dim(nba) Dep.
Location Relevance at Airbnb by Maxim Charkov, Riley Newman & Jan Overgoor Here at Airbnb, as you can probably imagine, we’re big fans of travel. We love thinking about the diversity of experiences our host community offers, and we spend a fair amount of time trying to make sense of the tens of thousands of cities where people are booking trips every night. If Apple has the iPad and iPhone, we have New York and Paris. And Kavajë, Außervillgraten, and Bli Bli. SF heatmap of listings returned without location relevance model This was a decent first step, and our community worked with it resiliently. So we set out to build a location relevance signal into our search model that would endeavor to return the best listings possible, confined to the location a searcher wants to stay. SF heatmap with distance demotion To deal with this, we tried shifting from an exponential to a sigmoid demotion curve. So we decided to let our community solve the problem for us. SF heatmap with location relevance signal
How Airbnb uses machine learning to detect host preferences At Airbnb we seek to match people who are looking for accommodation (guests) with those looking to rent out their place (hosts). Guests reach out to hosts whose listings they wish to stay in; however, a match succeeds only if the host also wants to accommodate the guest. I first heard about Airbnb in 2012 from a friend. About two years later, I joined Airbnb as a Data Scientist. What started as a small research project resulted in the development of a machine learning model that learns our hosts’ preferences for accommodation requests based on their past behavior. What affects hosts’ acceptance decisions? I kicked off my research into hosts’ acceptances by checking if other hosts maximized their occupancy like my friend. A host looking to have a high occupancy will try to avoid such gaps. But do all hosts try to maximize occupancy and prefer stays with short gaps? Indeed, when I looked at listings from big and small markets separately, I found that they behaved quite differently.
Learning a Personalized Homepage by Chris Alvino and Justin Basilico As we’ve described in our previous blog posts, at Netflix we use personalization extensively and treat every situation as an opportunity to present the right content to each of our over 57 million members. The main way a member interacts with our recommendations is via the homepage, which they see when they log into Netflix on any supported device. The primary function of the homepage is to help each member easily find something to watch that they will enjoy. A problem we face is that our catalog contains many more videos than can be displayed on a single page and each member comes with their own unique set of interests. This type of problem is not unique to Netflix; it is also faced by others such as news sites, search engines, and online stores. Currently, the Netflix homepage on most devices is structured with videos (movies and TV shows) organized into thematically coherent rows presented in a two-dimensional layout. Why Rows Anyway? Page-level metrics
Simulated annealing Probabilistic optimization technique and metaheuristic The problems solved by SA are currently formulated by an objective function of many variables, subject to several mathematical constraints. In practice, the constraint can be penalized as part of the objective function. Similar techniques have been independently introduced on several occasions, including Pincus (1970),[2] Khachaturyan et al. (1979,[3] 1981[4]), Kirkpatrick, Gelatt and Vecchi (1983), and Cerny (1985).[5] In 1983, this approach was used by Kirkpatrick, Gelatt Jr., Vecchi,[6] for a solution of the traveling salesman problem. This notion of slow cooling implemented in the simulated annealing algorithm is interpreted as a slow decrease in the probability of accepting worse solutions as the solution space is explored. The state s of some physical systems, and the function E(s) to be minimized, is analogous to the internal energy of the system in that state. The basic iteration The neighbors of a state and is greater than
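The description of the basic iteration is cut off above, so the following is only a minimal sketch of the idea stated earlier: worse solutions are accepted with a probability that shrinks as the temperature is lowered. It uses the Metropolis-style acceptance rule exp(-(E' - E) / T) associated with Kirkpatrick et al. The geometric cooling schedule, the step count, the toy energy function and the neighbor move are all assumptions made for illustration.

```python
import math
import random

def acceptance_probability(e_current, e_candidate, temperature):
    """Always accept improvements; accept worse moves with prob exp(-dE / T)."""
    if e_candidate < e_current:
        return 1.0
    return math.exp(-(e_candidate - e_current) / temperature)

def simulated_annealing(energy, neighbor, s0, t0=1.0, cooling=0.995,
                        steps=5000, seed=0):
    """Minimize energy(s), starting from s0, with geometric cooling (assumed)."""
    rng = random.Random(seed)
    s, e = s0, energy(s0)
    best_s, best_e = s, e
    t = t0
    for _ in range(steps):
        candidate = neighbor(s, rng)
        e_candidate = energy(candidate)
        if rng.random() < acceptance_probability(e, e_candidate, t):
            s, e = candidate, e_candidate
            if e < best_e:
                best_s, best_e = s, e
        t *= cooling  # slow cooling: worse moves become steadily less likely
    return best_s, best_e

# Toy problem (hypothetical): minimize E(s) = (s - 3)^2 starting far away.
def toy_energy(s):
    return (s - 3.0) ** 2

def toy_neighbor(s, rng):
    return s + rng.uniform(-0.5, 0.5)

best_state, best_energy = simulated_annealing(toy_energy, toy_neighbor, s0=10.0)
print("best state:", best_state, "energy:", best_energy)
```

Early on, when T is large, almost any neighbor is accepted and the search roams freely. As T decays the acceptance probability for uphill moves approaches zero and the procedure behaves like a greedy descent.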