
Where can I find large datasets open to the public?

Happy to answer this, but be aware: my writing abilities are quite limited. I am an essayist, an article writer of short, pithy vignettes. That’s it. I cannot write novels (like Quorans Graeme Shimmin, Cristina Hartmann, Aman Anand, Clifford Meyer) or extremely persuasive pieces (Jon Mixon, Gary Teal, Marcus Geduld), nor can I distill massively complex issues to a single truth (Erica Friedman, Robert Frost, Alon Amit, Oliver Emberton). To name a few. But, for my writing style, this is what has helped me:
1. I write for hours, every day.
2. Consulting is the art of condensing massive amounts of information into a visual medium.
3. Original sentence: I have a tendency to make sentences overly complicated by adding more and more words until the meaning of the sentence is obfuscated under the weight of so many superfluous words. Post-edit: I’m verbose. I edit.
4. Some mild plagiarism is common in my more humorous writings. I read a lot and watch good TV.
5.
6. This was very difficult.
7.


Publicly Available Big Data Sets
- Public Data sets on Amazon AWS: Amazon provides the following data sets: ENSEMBL Annotated Genome data, US Census data, UniGene, Freebase dump. Data transfer is 'free' within the Amazon ecosystem (within the same zone). AWS data sets
- InfoChimps: InfoChimps has a data marketplace with a wide variety of data sets. InfoChimps market place
- Comprehensive Knowledge Archive Network: open source data portal platform with data sets available
- Stanford network data collection
- Open Flights: Crowd-sourced flight data
- Flight arrival data

“A World That Counts” - The Data Revolution Report is Out
This week the Independent Expert Advisory Group on the Data Revolution for Sustainable Development released its report “A World That Counts: Mobilising the Data Revolution for Sustainable Development.” Congratulations to the authors for crafting such a useful document so quickly, and thank you to everyone who shared their thoughts during the consultation period. The short report is well worth reading - it highlights the opportunities and risks that new data and technologies present for the development community, and ends with calls to action in four areas: 1. Develop a global consensus on principles and standards

Machine Learning Repository: Amazon Commerce reviews set Data Set
Source: Dataset creator and donator: ZhiLiu, e-mail: liuzhi8673 '@', institution: National Engineering Research Center for E-Learning, Hubei Wuhan, China
Data Set Information: The dataset is derived from customers’ reviews on the Amazon Commerce website for authorship identification.

3+ Alternatives to Apache Hadoop
Next week the SiliconAngle team is heading to the HadoopWorld event in New York City. We’ll be broadcasting theCube live and covering all the latest developments in the Apache Hadoop ecosystem. But it’s important to remember that Hadoop isn’t the only game in town. As we ramp up our coverage of Hadoop in advance of the event, here are some other big data projects to keep in mind. Update: I just wrote about another alternative: Spark.
HPCC Systems

Easy Java Simulations Wiki
About Easy Java/Javascript Simulations: Easy Java/Javascript Simulations, also known as EjsS (and, formerly, EJS or Ejs), is a free authoring tool written in Java that helps non-programmers create interactive simulations in Java or Javascript, mainly for teaching or learning purposes. EjsS has been created by Francisco Esquembre and is part of the Open Source Physics project.

Finding Data on the Internet
A Community Site for R – Sponsored by Revolution Analytics

Data Revolution Report - Data Revolution Group
The Secretary-General’s Independent Expert Advisory Group on a Data Revolution for Sustainable Development (IEAG) met the Secretary-General today to hand over their culminating report A World That Counts: Mobilising The Data Revolution for Sustainable Development. The IEAG consists of over 20 international experts convened by the Secretary-General Ban Ki-moon to propose ways to improve data for achieving and monitoring sustainable development. The report highlights two big global challenges for the current state of data:
- The challenge of invisibility (gaps in what we know from data, and when we find out)
- The challenge of inequality (gaps between those with and without information, and what they need to know to make their own decisions)

Machine Learning - Course website
Chris Thornton. This course teaches the theory and practice of machine learning using a mixture of demos, lectures and labs. Instructions for lab sessions

SNA & ONA Projects, Cases & Research by Orgnet, LLC
We have participated in 500+ diverse consulting projects applying social network analysis [SNA] and organizational network analysis [ONA]. We have worked with large, medium, and small businesses, governments, universities, not-for-profits and their funders, and many consulting firms. Organizations, Projects, & Teams Human Capital + Social Capital = [PDF] Managing the 21st Century Organization [PDF] Networks of Adaptive/Agile Organizations [PDF] Human Relationships & Organizational Performance [PDF] Best Practice: Organizational Network Mapping [PDF] A More Accurate Way to Measure Diversity [PDF] Discovering Communities of Practice

Films For Action Just imagine what could become possible if an entire city had seen just one of the documentaries above. Just imagine what would be possible if everyone in the country was aware of how unhealthy the mainstream media was for our future and started turning to independent sources in droves. Creating a better world really does start with an informed citizenry, and there's lots of subject matter to cover. From all the documentaries above, it's evident that our society needs a new story to belong to. The old story of empire and dominion over the earth has to be looked at in the full light of day - all of our ambient cultural stories and values that we take for granted and which remain invisible must become visible. But most of all, we need to see the promise of the alternatives - we need to be able to imagine new exciting ways that people could live, better than anything that the old paradigm could ever dream of providing.

IT Operations Analytics In the fields of information technology and systems management, IT Operations Analytics (ITOA) is an approach or method applied to application software designed to retrieve, analyze and report data for IT operations. ITOA has been described as applying big data analytics to large datasets where IT operations can extract unique business insights.[1][2] In its Hype Cycle Report, Gartner rated the business impact of ITOA as being ‘high’, meaning that its use will see businesses enjoy significantly increased revenue or cost saving opportunities.[3] By 2017, Gartner predicts that 15% of enterprises will use IT operations analytics technologies to deliver intelligence for both business execution and IT operations.[2]

Alex Pentland
Alex Paul "Sandy" Pentland (born 1952) is an American computer scientist, the Toshiba Professor at MIT, and serial entrepreneur. He is one of the most cited authors in computer science.[1]

Biography
Pentland received his B.A. from the University of Michigan and obtained his Ph.D. from MIT in 1981. He started as a lecturer at Stanford University in both computer science and psychology, and joined the MIT faculty in 1986, where he became Academic Head of the Media Laboratory and received the Toshiba Chair in Media Arts and Sciences.

UCI Machine Learning Repository: Dermatology Data Set
Source: Original Owners: 1. Nilsel Ilter, M.D., Ph.D., Gazi University, School of Medicine 06510 Ankara, Turkey Phone: +90 (312) 214 1080 2. H.

A Formula for Data Gravity « Data Gravity
Background
Before creating I first blogged about Data Gravity on my personal blog in December of 2010 and several times since then. I have watched the concept of Data Gravity grow beyond anything that I ever expected. I have also watched as a startup company decided to name itself DataGravity. As I began to speak about Data Gravity to others and answer questions, I realized that maybe it was something more than simply a novel concept describing an effect.