Wikipedia - Web Archiving, Aspects of Curation Web archiving is the process of collecting portions of the World Wide Web to ensure the information is preserved in an archive for future researchers, historians, and the public. Web archivists typically employ web crawlers for automated capture due to the massive size and amount of information on the Web. The largest web archiving organization based on a bulk crawling approach is the Internet Archive which strives to maintain an archive of the entire Web. The International Web Archiving Workshop (IWAW), begun in 2001, has provided a platform to share experiences and exchange ideas.
Big Data won't solve your company's problems The reams of data available to companies are only as useful as the people working with them. By Ethan Rouen, contributor FORTUNE -- "Oh, people can come up with statistics to prove anything. Fourteen percent of people know that." – Homer Simpson
How San Francisco Used City Data to Save $1 Million on Street Cleaning The brilliant Code for America project, which connects cash-strapped city governments with cutting-edge web developers to achieve more impact with less money, has updated its blog with a story about how San Francisco used city data to save more than $1 million on street cleaning. Ed Reiskin, Director of San Francisco’s Public Works department, noticed that some street cleaning trucks were returning with little or no trash on certain days or routes. This compelled Ed to ask for tonnage logs (how much trucks weigh going out vs. how much they weigh coming in) to determine how to optimize city cleaning. After about a month of study, Ed’s team concluded that they could find significant savings by changing certain routes and reducing others.
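The tonnage-log comparison described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (weight coming in minus weight going out, averaged per route), not the city's actual analysis. The route names, weights, and the 0.5-ton threshold are all invented for the example.

```python
from collections import defaultdict

# Hypothetical log rows: (route, weight_out_tons, weight_in_tons).
# All values are invented for illustration; they are not San Francisco data.
LOGS = [
    ("route-A", 10.0, 12.5),
    ("route-A", 10.0, 12.0),
    ("route-B", 10.0, 10.1),
    ("route-B", 10.0, 10.0),
]


def average_load(logs):
    """Return the mean collected tonnage (weight in minus weight out) per route."""
    totals = defaultdict(lambda: [0.0, 0])  # route -> [sum of loads, trip count]
    for route, weight_out, weight_in in logs:
        totals[route][0] += weight_in - weight_out
        totals[route][1] += 1
    return {route: load_sum / trips for route, (load_sum, trips) in totals.items()}


def low_yield_routes(logs, threshold_tons=0.5):
    """List routes whose average load falls below an assumed threshold."""
    averages = average_load(logs)
    return sorted(route for route, avg in averages.items() if avg < threshold_tons)


if __name__ == "__main__":
    print(average_load(LOGS))
    print(low_yield_routes(LOGS))
```

Routes that show up in `low_yield_routes` would be the candidates for the kind of reduction or re-routing the excerpt mentions. A real study would also need to account for day of week and season.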
Kindling: An Introduction to Spark with Cassandra (Part 1) This is an introduction to the new (relatively) distributed compute platform Apache Spark. The focus will be on how to get up and running with Spark and Cassandra; with a small example of what can be done with Spark. I chose to make this the focus for one reason: when I was trying to learn Spark two months ago I had difficulty finding articles on how to setup Spark to use Cassandra.
What Does Big Data Mean to Infrastructure Professionals? Big data means the amount of data you’re working with today will look trivial within five years. Huge amounts of data will be kept longer and have way more value than today’s archived data. Business people will covet a new breed of alpha geeks. You will need new skills around data science, new types of programming, more math and statistics skills and data hackers… lots of data hackers. You are going to have to develop new techniques to access, secure, move, analyze, process, visualize and enhance data in near real time. You will be minimizing data movement wherever possible by moving function to the data instead of data to function. You will be leveraging or inventing specialized capabilities to do certain types of processing (e.g. early recognition of images or content types) so you can do some processing close to the head. The cloud will become the compute and storage platform for big data, which will be populated by mobile devices and social networks.
Racial Discrimination in Ohio: Neighborhood Denied Water Service A federal jury has found failure to provide water service to residents in a rural Ohio town violated state and federal civil rights laws. The African-American neighborhood of Coal Run was denied water service for decades. (Photo: Jacob Holdt)
Live map of London Underground trains Live London Underground map By Matthew Somerville.
Writing about D3.js - a glob of nerdishness On Monday I signed a contract to write a book about D3.js for Manning Publications. If all goes according to schedule, it should be out early next year, with draft chapters available in electronic form for subscribers even sooner. Wow!
Research Publication: Sawzall Interpreting the Data: Parallel Analysis with Sawzall Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan Abstract Very large data sets often have a flat but regular structure and span multiple disks and machines.
MapQuest Launches International Bike Routing API on Open Data Online mapping and directions innovator MapQuest has been building new web services on top of data from the publicly-editable OpenStreetMap project since the company announced a new open platform initiative in August. Now MapQuest has a new addition to its family of open data-based services, bike routes: If you’re asking yourself, “what does MapQuest mean when they claim a more bike friendly route?” Well, we will route you on paths that are not vehicle accessible and also try to not let you do anything illegal, like riding on an interstate : ) On a more serious note, the following list provides some specific rules that are applied to bike routes:
- Avoids roads where bicycle access in OpenStreetMap is set to false
- Avoids all limited access highways
- Favors bike specific paths (road segments that have bicycle access only, no auto or pedestrian)
- Favors walkways with no auto access
- Applies various weights to roads based on the maxspeed tag (e.g. favors routes where maxspeed <= 30 mph)
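One way to picture rules like these is as a cost function over road segments that a shortest-path search would then minimize. The Python sketch below is an assumption-laden illustration, not MapQuest's implementation. The multipliers are invented, `maxspeed_mph` is a hypothetical pre-parsed field, and the mapping of "limited access highway" to `highway=motorway`, "bike specific path" to `highway=cycleway`, and "walkway" to `highway=footway` is our guess at the corresponding OpenStreetMap tags.

```python
import math


def bike_cost(length_m, tags):
    """Return a routing cost for one road segment; math.inf means 'never use'.

    `tags` is a dict of OpenStreetMap-style tags. All multipliers below are
    invented for illustration and are not taken from MapQuest.
    """
    # Rule: avoid roads where bicycle access is set to false.
    if tags.get("bicycle") in ("no", "false"):
        return math.inf
    # Rule: avoid all limited access highways (assumed here: highway=motorway).
    if tags.get("highway") == "motorway":
        return math.inf

    factor = 1.0
    if tags.get("highway") == "cycleway":
        # Rule: favor bike specific paths.
        factor *= 0.5
    elif tags.get("highway") == "footway" and tags.get("motor_vehicle") == "no":
        # Rule: favor walkways with no auto access.
        factor *= 0.75

    # Rule: weight roads by maxspeed (hypothetical field, already in mph).
    maxspeed = tags.get("maxspeed_mph")
    if maxspeed is not None:
        factor *= 0.75 if maxspeed <= 30 else 1.5

    return length_m * factor


if __name__ == "__main__":
    print(bike_cost(100, {"highway": "cycleway"}))
    print(bike_cost(100, {"highway": "residential", "maxspeed_mph": 45}))
    print(bike_cost(100, {"highway": "motorway"}))
```

Returning an infinite cost for forbidden segments and a scaled length for everything else keeps "avoid" and "favor" in a single number, so the same function could feed any standard shortest-path algorithm.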