
Data Scientist Profession


Can Kaggle make data science a spectator sport? (Data | GigaOM)

The Data Scientist Will Be Replaced By Tools

Why The Search For The Mystical Data Scientist Should Not Be A Feat Of Magic

The data scientist is a mystical spirit: a wizard whose skills are forged in the deep unknown of a developer’s lair. Their secrets are worth the gold of a million empires. They possess the keys to eternity. They have pet dragons. Not! It’s time to take away the staff and stop thinking of data scientists as wizard lords out of Middle-earth lore. The reality is quite different, as business intelligence provider SiSense found in a study it recently commissioned on the state of the data professional market.

With commissioned surveys, it’s important to consider the source: SiSense obviously has an interest in selling business intelligence solutions to companies. It’s true that data scientists are highly sought after, but there is also no exact definition of a data scientist. The SiSense survey found that salaries for data professionals start at $55,000 and that data scientists earn about $90,000 per year on average.

Charlotte Prepares Students To Meet Demand For Data Scientists

What Everyone Needs to Learn from the Data Journalism Handbook

It's hard to pay attention to the business of journalism without hearing about data journalism, or data-driven journalism. Yet despite all the discussion of the topic, there's precious little documentation to guide practicing and future journalists toward proficiency in it.

The Data Journalism Handbook aims to fix that, albeit at a high level. The effort started at a workshop at MozFest 2011 in London last November. Since then, the handbook has become the work of "an international, collaborative effort involving dozens of data journalism's leading advocates and best practitioners," including people from ProPublica, The Washington Post, the BBC, The New York Times and many others. The result, so far, is an online book that's just now in beta.

Inside the Handbook

The handbook offers a glimpse into the practice of data journalism, with some guidance on how to get started.

Most importantly, the book makes a resounding case for data-driven journalism.

IBM VP Anjul Bhambhri on the Era of the Data Scientist

Just a few short years ago, the problem of database size scaling to colossal capacities that exceeded the scope of entire network storage units seemed insurmountable. Today, it's practically under control, with a wealth of open source technology emerging not from database engineers but from Internet architects. Hadoop has transformed the very nature of transformation, becoming one of the most readily adopted technologies in the history of the data center. But is it mature? And will businesses have access to people with the skill sets necessary to master this new aspect of information management? Having spent five years as a senior engineer at Sybase, six years as a development director at Informix, and over three years managing DB2 development for IBM, Anjul Bhambhri is arguably one of the most skilled plain data architects in the business.

In September 2010, IBM promoted her to the new post of Vice President for Big Data and Streams.

Do you need a data scientist?

Some of the world’s biggest tech companies, from Google to Facebook, are data-driven, but few startup founders have any idea what a data scientist does, never mind whether they should hire one.

Here is VentureBeat’s guide to data science for startups.

What does a data scientist do?

DJ Patil led LinkedIn’s data science team and is now the Data Scientist in Residence at Greylock Partners. For startups, the most relevant applications of data science are probably decision science and product and marketing analytics. Product analytics covers anything from how users react to new features to developing standalone data products. Using data to showcase or market a product is the domain of marketing analytics.

Who are the data scientists?

What is data infrastructure?

EMC Greenplum's Steven Hillion on "What Is a Data Scientist?"

Amazon's John Rauser on "What Is a Data Scientist?"

Does Science Need More Compelling Stories to Foster Public Trust?

The touching stories that advocacy groups are so good at telling (the 49-year-old mother whose breast cancer was detected by an early mammogram before it had spread; the 60-year-old neighbor who had a prostate tumor removed thanks to a routine PSA test) should inspire scientists to use anecdotes of their own, argue two doctors from the University of Pennsylvania.

In the scientific realm, anecdotal evidence (the individual patient, the single result) tends to be shunned in favor of large, dense data sets and impersonal statistical analyses. Although that foundation must remain the core of solid research, examples and narratives should be invoked to round out the explanation of what the hard science says, Zachary Meisel and Jason Karlawish, both of the Perelman School of Medicine at Penn, contended in an essay published online Tuesday in JAMA, the Journal of the American Medical Association. And the dry old scientific data supports this notion.

Growing Your Own Data Scientists | CITO Research

CIOs and CTOs must learn to address a challenge: the divide between the people who know about the vast new sources of data emanating from machines and other devices (“big data”) and the questions in the enterprise whose answers can be monetized.

One group of people knows about the technology for analyzing data (they’re usually in IT). The other group understands the pressing questions whose answers would be worth money to the organization (they’re usually on the business side). The data scientist is a hybrid role that can bridge this divide. While the definition of the role is compelling, it’s a lot easier to define the role than to hire someone to fill it, and even when you do, communication problems may persist. See these articles on Forbes.com for definitions of a data scientist from leading experts in the field: This problem statement addresses the challenge of “growing your own” data scientist.

Context and Background

LinkedIn's Daniel Tunkelang On "What Is a Data Scientist?"

How to be a data journalist

Data journalism is huge. I don't mean 'huge' as in fashionable (although it has become that in recent months) but 'huge' as in 'incomprehensibly enormous'.

It represents the convergence of a number of fields that are significant in their own right, from investigative research and statistics to design and programming. The idea of combining those skills to tell important stories is powerful, but also intimidating. Who can do all that? The reality is that almost no one is doing all of it, but the puzzle has enough different parts that people can easily get involved in one and go from there.

1. 'Finding data' can involve anything from having expert knowledge and contacts to being able to use computer assisted reporting skills or, for some, specific technical skills such as MySQL or Python to gather the data for you.
2.
3.
4. Tools such as ManyEyes for visualisation, and Yahoo!

How to begin?

So where does a budding data journalist start? Play around. And you know what?

Big Data Technology Evaluation Checklist

Anyone who’s been following the rapid-fire technology developments in the world that is becoming known as “big data” sees a new capability, product, or company appear literally every week. The ambition of all of these players, established and newcomer alike, is tremendous, because the potential value to business is enormous. Each new arrival aims to address the pain that enterprises are experiencing from unrelenting growth in the velocity, volume, and variety of the data their operations generate.

What’s being lost in some of this frothy marketing activity, however, is that it’s still early days for big data technologies. Vexing problems are slowing both the growth and the practical implementation of these technologies. To succeed at scale, they should provide several fundamental capabilities, including stream processing, parallelization, indexing, data evaluation environments, and visualization. Some general questions to begin the evaluation process: