Big Data

The Internet, peer-reviewed

It could be one of the most important innovations on the Internet since the browser. Imagine an open-source, crowd-sourced, community-moderated, distributed platform for sentence-level annotation of the Web. In other words, a way to cut through the babble and restore some sanity and trust. That’s the idea behind the project. It will work as an overlay on top of any stable content, including news, blogs, scientific articles, books, terms of service, ballot initiatives, legislation and regulations, software code and more, without requiring the participation of the underlying site. It’s based on a new draft standard for annotating digital documents currently being developed by the Open Annotation Collaboration, a consortium that includes the Internet Archive, NISO (National Information Standards Organization), O’Reilly Books, Amazon, Barnes and Noble, and a number of academic institutions. Yes, it’s been tried before, and it didn’t catch on.
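A concrete way to picture sentence-level annotation: each note is a small structured record that anchors a comment to an exact quote in the target page, so it can be overlaid without the site's cooperation. The sketch below is plain Python with invented field values, loosely in the spirit of what the Open Annotation draft later became (the W3C Web Annotation model); it is an illustration, not the standard's exact vocabulary.

```python
# Minimal sketch of a sentence-level annotation record, loosely modeled
# on the W3C Web Annotation data model. All values here are hypothetical.
import json

def make_annotation(page_url, exact_quote, comment, author):
    """Build an annotation that anchors a comment to an exact quote,
    so it can be overlaid on the page without the site's participation."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "creator": author,
        "body": {"type": "TextualBody", "value": comment},
        "target": {
            "source": page_url,
            "selector": {
                # A text-quote selector survives small page edits better
                # than a raw byte offset would.
                "type": "TextQuoteSelector",
                "exact": exact_quote,
            },
        },
    }

anno = make_annotation(
    "https://example.com/article",   # hypothetical URL
    "the study proves X",            # the sentence being annotated
    "The cited study shows correlation, not causation.",
    "reviewer42",
)
print(json.dumps(anno, indent=2))
```

Because the record carries its own pointer into the text, a community of annotators can moderate and layer these notes independently of the publisher.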

I just donated to their Kickstarter fund.

False beliefs persist, even after instant online corrections

It seems like a great idea: provide instant corrections to web surfers when they run across obviously false information on the Internet.

But a new study suggests that this type of tool may not be a panacea for dispelling inaccurate beliefs, particularly among people who already want to believe the falsehood. “Real-time corrections do have some positive effect, but it is mostly with people who were predisposed to reject the false claim anyway,” said R. Kelly Garrett, lead author of the study and assistant professor of communication at Ohio State University. “The problem with trying to correct false information is that some people want to believe it, and simply telling them it is false won’t convince them.” For example, the rumor that President Obama was not born in the United States was widely believed during the past election season, even though it was thoroughly debunked.

But will it work? The final group in the study was presented only with the inaccurate message, with no correction.

Factual’s Gil Elbaz Wants to Gather the Data Universe

Factual sells data to corporations and independent software developers on a sliding scale, based on how much the information is used. Small data feeds for things like prototypes are free; contracts with its biggest customers run into the millions. Sometimes Factual trades data with other companies, building its resources. Some current uses are adding information like restaurant locations to cellphone maps, or planning sales campaigns. But more broadly, Factual is aimed at the heart of a great business of our age: using cloud-based data and algorithms to find patterns in nature and society, for scientists to observe and businesses to exploit. “Data has always been seen as just a side effect in computing, something you look up while you are doing work,” Mr. Elbaz said. A restaurant chain, for example, might use Factual to figure out whether a new location is near the competition, and how the locals have talked about the place on Yelp, the social ratings site.
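The restaurant-chain example boils down to a distance query over a place database. Below is a minimal sketch in plain Python with made-up coordinates and place names; Factual's actual API is not shown, only the generic idea of flagging competitors within a radius of a candidate site.

```python
# Sketch: flag competitors within a radius of a candidate site, using the
# haversine great-circle distance. All coordinates below are invented.
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def competitors_nearby(site, places, radius_km=1.0):
    """Return the places within radius_km of the candidate site."""
    return [p for p in places
            if haversine_km(site[0], site[1], p["lat"], p["lon"]) <= radius_km]

candidate = (34.0522, -118.2437)  # hypothetical downtown site
places = [
    {"name": "Rival Burger", "lat": 34.0500, "lon": -118.2450},
    {"name": "Far Fries",    "lat": 34.1500, "lon": -118.4000},
]
print([p["name"] for p in competitors_nearby(candidate, places)])
```

A real deployment would push this filter into the data store (a geospatial index) rather than scanning every place in Python, but the query being answered is the same.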

Urban Legends Reference Pages

5D optical memory in glass could record the last evidence of civilization

Using nanostructured glass, scientists at the University of Southampton have, for the first time, experimentally demonstrated the recording and retrieval of five-dimensional digital data by femtosecond laser writing.

The storage allows unprecedented parameters, including 360 TB per disc of data capacity, thermal stability up to 1,000°C, and a practically unlimited lifetime. Dubbed the 'Superman' memory crystal, because the glass memory has been compared to the "memory crystals" used in the Superman films, the data is recorded via self-assembled nanostructures created in fused quartz, which can store vast quantities of data for over a million years. The information is encoded in five dimensions: the size and orientation of these nanostructures, in addition to their three-dimensional position. A 300 kb digital copy of a text file was successfully recorded in 5D using an ultrafast laser producing extremely short and intense pulses of light.

Scientists ‘freeze’ light for an entire minute

As we have learned from Fantastic Voyage, our brain's thoughts are nothing more than little electrical signals traveling up and down our neurons.

So, if we could freeze that light, we could freeze our thoughts, then store them in a crystal for backup. I'm sure this will be very useful technology. However, I'm equally sure that, just as now, people won't be bothered to back up their brains: 'My brain crashed and I didn't back up what I was thinking and working on!'

Million-Year Data Storage Disk Unveiled

Back in 1956, IBM introduced the world’s first commercial computer capable of storing data on a magnetic disk drive.

The IBM 305 RAMAC used fifty 24-inch discs to store up to 5 MB, an impressive feat in those days. Today, however, it’s not difficult to find hard drives that can store 1 TB of data on a single 3.5-inch disk. But despite this huge increase in storage density and a similarly impressive improvement in power efficiency, one thing hasn’t changed. The lifetime over which data can be stored on magnetic discs is still about a decade. That raises an interesting problem: how do you preserve data for far longer, even for millions of years? Today, we get an answer thanks to the work of Jeroen de Vries at the University of Twente in the Netherlands and a few pals.

These guys start with some theory about ageing, based on the idea that stored data must sit in an energy minimum that is separated from other minima by an energy barrier. Over time, thermal fluctuations can kick the system over that barrier, corrupting the data; the probability of such a jump is governed by the Arrhenius law.

How Quantum Computers and Machine Learning Will Revolutionize Big Data - Wired Science

When subatomic particles smash together at the Large Hadron Collider in Switzerland, they create showers of new particles whose signatures are recorded by four detectors. The LHC captures 5 trillion bits of data every second, more information than all of the world’s libraries combined.

After the judicious application of filtering algorithms, more than 99 percent of those data are discarded, but the four experiments still produce a whopping 25 petabytes (25×10^15 bytes) of data per year that must be stored and analyzed. That is a scale far beyond the computing resources of any single facility, so the LHC scientists rely on a vast computing grid of 160 data centers around the world, a distributed network capable of transferring as much as 10 gigabytes per second at peak performance.
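A quick back-of-the-envelope check of the figures quoted above (5 trillion bits captured per second, 25 PB per year actually kept) shows just how aggressive the filtering has to be. The arithmetic below is only illustrative; in practice the trigger systems discard most collision data before it is ever written out.

```python
# Back-of-the-envelope check of the LHC data figures quoted above.
BITS_PER_SECOND = 5e12            # 5 trillion bits captured per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600
PETABYTE = 1e15                   # bytes

raw_bytes_per_year = BITS_PER_SECOND / 8 * SECONDS_PER_YEAR
raw_pb_per_year = raw_bytes_per_year / PETABYTE

kept_pb_per_year = 25.0           # what the four experiments actually store
discarded_pct = 100 * (1 - kept_pb_per_year / raw_pb_per_year)

print(f"raw capture: {raw_pb_per_year:,.0f} PB/year")
print(f"kept:        {kept_pb_per_year} PB/year")
print(f"discarded:   {discarded_pct:.4f}%")  # comfortably above 99 percent
```

The raw stream works out to nearly 20,000 PB per year, so storing 25 PB means throwing away well over 99 percent of it, consistent with the figure quoted above.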

The LHC’s approach to its big data problem reflects just how dramatically the nature of computing has changed over the last decade.

Memory and Movement

Curation

Data Visualization / Infographics

The Mathematical Shape of Big Science Data

Simon DeDeo, a research fellow in applied mathematics and complex systems at the Santa Fe Institute, had a problem. He was collaborating on a new project analyzing 300 years’ worth of data from the archives of London’s Old Bailey, the central criminal court of England and Wales. Granted, there was clean data in the usual straightforward Excel spreadsheet format, including such variables as indictment, verdict, and sentence for each case. But there were also full court transcripts, containing some 10 million words recorded during just under 200,000 trials. Today’s big data is noisy, unstructured, and dynamic. “How the hell do you analyze that data?” “In physics, you typically have one kind of data and you know the system really well,” said DeDeo.
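The gap DeDeo describes, between tidy spreadsheet variables and 10 million words of free text, starts with steps as simple as turning transcripts into counts, the crudest structured representation one can analyze. A toy sketch in plain Python, using invented snippets rather than real Old Bailey text:

```python
# Toy sketch: reduce unstructured trial transcripts to word counts.
# The transcript snippets below are invented, not real Old Bailey text.
import re
from collections import Counter

transcripts = [
    "The prisoner was indicted for stealing a silver watch.",
    "The jury found the prisoner guilty of stealing bread.",
]

def word_counts(text):
    """Lowercase, split on runs of letters, and count the words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

corpus_counts = Counter()
for t in transcripts:
    corpus_counts += word_counts(t)   # Counters add element-wise

print(corpus_counts.most_common(3))
```

Real corpus work layers much more on top (stop-word removal, topic models, or the topological methods mentioned below), but every approach begins by imposing some structure like this on the raw text.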

DeDeo is not the only researcher grappling with these challenges. Gunnar Carlsson, a mathematician at Stanford University, uses topological data analysis to find structure in complex, unstructured data sets.

Ayasdi

Data Scientist: The Sexiest Job of the 21st Century

Download a free chapter from Thomas H. Davenport's book Keeping Up with the Quants. When Jonathan Goldman arrived for work in June 2006 at LinkedIn, the business networking site, the place still felt like a start-up. The company had just under 8 million accounts, and the number was growing quickly as existing members invited their friends and colleagues to join. But users weren’t seeking out connections with the people who were already on the site at the rate executives had expected. Goldman, a PhD in physics from Stanford, was intrigued by the linking he did see going on and by the richness of the user profiles. Luckily, Reid Hoffman, LinkedIn’s cofounder and CEO at the time (now its executive chairman), had faith in the power of analytics because of his experiences at PayPal, and he had granted Goldman a high degree of autonomy.

The Question to Ask Before Hiring a Data Scientist - Michael Li

By Michael Li | 10:00 AM August 6, 2014. When hiring data scientists, there’s nothing more frustrating than making the wrong hire.

Data scientists are in notoriously high demand, hard to attract, and command large salaries, compounding the cost of a mistake. At The Data Incubator, we’ve talked to dozens of employers looking to hire data scientists from our training program, from large corporates like Pfizer and JPMorgan Chase to smaller tech startups like Foursquare and Upstart. Employers that didn’t have good hiring experiences in the past often failed to ask a key question: is your data scientist producing analytics for machines or for humans? This distinction is important across organizations, industries, and job titles (our fellows are being placed in jobs with titles that range from Quant to Data Scientist to Analyst to Statistician). While this isn’t the only distinction among data scientists, it’s one of the biggest when it comes to hiring.

For Start-Ups, Sorting the Data Cloud Is the Next Big Thing

“My smartphone produces a huge amount of data, my car produces ridiculous amounts of really valuable data, my house is throwing off data, everything is making data,” said Erik Swan, 47, co-founder of Splunk, a San Francisco-based start-up whose software indexes vast quantities of machine-generated data into searchable links. Companies search those links, as one searches Google, to analyze customer behavior in real time. Splunk is among a crop of enterprise software start-ups that analyze big data and are establishing themselves in territory long controlled by giant business-technology vendors like Oracle and I.B.M. Founded in 2004, before the term “big data” had worked its way into the vocabulary of Silicon Valley, Splunk now has some 3,200 customers in more than 75 countries, including more than half of the Fortune 100. Macy’s uses Splunk’s software to observe its Web traffic in order to avoid costly downtime, particularly during peak holiday shopping.

How Big Data Gets Real

The business of Big Data, which involves collecting large amounts of data and then searching it for patterns and new revelations, is the result of cheap storage, abundant sensors and new software.

It has become a multibillion-dollar industry in less than a decade. Growing that fast, it is easy to miss how much remains to be done before the industry has proven standards. Until then, many customers are probably wasting much of their money. There is essential work to be done training a core of people in very hard problems, like advanced statistics and software that ensures data quality and operational efficiency. Broad-based literacy in the uses of data should probably follow, along with new kinds of management, better tools for reading the information, and privacy safeguards for corporate and personal information. That so many of these tasks are under way is a good indicator that, even with the hype, Big Data is a big deal.

Why the world’s governments are interested in creating hubs for open data

Amid the tech giants and eager startups that have camped out in East London’s trendy Shoreditch neighborhood, the Open Data Institute is the rare nonprofit on the block that talks about feel-good sorts of things like the “triple bottom line” and “social and environmental value.” In fact, I first met ODI’s CEO Gavin Starks because he used to run AMEE, a startup that builds software for environmental data, and he was one of our first speakers at GigaOM’s early green conferences.

But ODI, which officially launched last October with funding from the U.K. government, is a private company, and philanthropy isn’t its dominant aim. ODI helps companies, entrepreneurs and governments find value in the explosion of open data, and it seems to be starting to gain commercial success.

The Limits of Big Data: A Review of Social Physics by Alex Pentland

In 1969, Playboy published a long, freewheeling interview with Marshall McLuhan in which the media theorist and sixties icon sketched a portrait of the future that was at once seductive and repellent. Noting the ability of digital computers to analyze data and communicate messages, he predicted that the machines eventually would be deployed to fine-tune society’s workings.

“The computer can be used to direct a network of global thermostats to pattern life in ways that will optimize human awareness,” he said. “Already, it’s technologically feasible to employ the computer to program societies in beneficial ways.” He acknowledged that such centralized control raised the specter of “brainwashing, or far worse,” but he stressed that “the programming of societies could actually be conducted quite constructively and humanistically.” The interview appeared when computers were used mainly for arcane scientific and industrial number-crunching.

Big Data, Trying to Build Better Workers

Tears in rain: how Snapchat showed me the glory of data death

"I've seen things you people wouldn't believe. Attack ships on fire off the shoulder of Orion. I watched c-beams glitter in the dark near the Tannhäuser Gate. All those moments will be lost in time, like tears in rain. Time to die." Anyone who's seen Ridley Scott's sci-fi masterpiece Blade Runner probably knows this famous speech from its climax: the final words of Roy Batty, the ruthless but ultimately tragic leader of a band of androids rampaging across a dystopian future Los Angeles. Much has been said of that soliloquy, hauntingly and unexpectedly improvised by actor Rutger Hauer at the tail end of a long shoot.

Oddly, it’s the same dilemma that’s led me to become enamored with self-destructing media apps like Snapchat. Snapchat is where photos are born to die. There’s nothing to "look back" on: the moment stays in the moment, then vanishes forever. That's an increasingly important distinction.

IBM's Watson wants to fix America's doctor shortage

Words by the Millions, Sorted by Software

Down in the Data Dumps: Researchers Inventory a World of Information

How Companies Learn Your Secrets

What are you revealing online? Much more than you think

What data is being collected on you? Some shocking info

Everything We Know About What Data Brokers Know About You

How Facebook Uses Your Data to Target Ads, Even Offline