
Datamining


MyLifeBits. Total Recall is coming out this September. This book is the culmination of our thoughts regarding MyLifeBits and the larger CARPE research agenda. Stay up to date at the Total Recall blog. There are two parts to MyLifeBits: an experiment in lifetime storage, and a software research effort. The experiment: Gordon Bell has captured a lifetime's worth of articles, books, cards, CDs, letters, memos, papers, photos, pictures, presentations, home movies, videotaped lectures, and voice recordings and stored them digitally.

He is now paperless, and is beginning to capture phone calls, IM transcripts, television, and radio. The software research: Jim Gemmell and Roger Lueder have developed the MyLifeBits software, which leverages SQL Server to support hyperlinks, annotations, reports, saved queries, pivoting, clustering, and fast search. Papers: Gordon Bell and Jim Gemmell, A Digital Life, Scientific American, March 2007.

The different attitudes of computer scientists and economists. I have my own interpretation of this topic, mainly from the data mining point of view. Economists are interested in suggesting policies (i.e., suggesting to people "what to do"). Therefore, it is important to build models that assign causality. Computer scientists are rarely interested in the issue of causality.

Computer scientists control the system (the computer), and algorithms can be directed to perform one way or another. In contrast, economists cannot really control the system that they study. They do not even know how the system behaves. When a computer scientist proposes an algorithm, the main focus is to examine the performance of the algorithm under different settings of incoming data. One area that gets closer to economics in this respect is the area of data mining and machine learning. Let me give you an example: Suppose that you are trying to predict price per square foot for houses. Predictive modeling can survive (or even thrive) by exploiting such strange correlations.

Machine learning classifier gallery. Machine learning (ML) research with classifiers usually emphasizes quantitative evaluation, i.e. measuring accuracy, AUC or some other performance metric. But it's also useful to visualize what classifier algorithms do with different datasets. This is the index page of a "machine learning classifier gallery" which shows the results of numerous experiments on ML algorithms when applied to two-dimensional patterns. Each row shows a different pattern (or pattern set), described verbally then illustrated on a 2-D grid. These patterns were randomly generated (for the most part) on a 2-D grid of points, in [0:4] x [0:4] with a resolution of 0.05, yielding 6561 total points.
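The grid described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the gallery's own code: the helper names `make_grid` and `label_disk` are hypothetical, and the disk pattern is an arbitrary stand-in for the gallery's patterns. It shows why a 0.05 resolution on [0:4] x [0:4] gives 81 values per axis and 81 * 81 = 6561 points.

```python
STEP = 0.05  # grid resolution stated in the text
N = 81       # 4 / 0.05 + 1 values per axis, endpoints included


def make_grid(n=N, step=STEP):
    """Return all (x, y) points of the n-by-n grid.

    Coordinates are computed from integer indices rather than by
    repeatedly adding the step, which avoids accumulating float error.
    """
    return [(i * step, j * step) for i in range(n) for j in range(n)]


def label_disk(point, center=(2.0, 2.0), radius=1.0):
    """Hypothetical pattern: class 1 inside a disk, class 0 outside."""
    x, y = point
    dx, dy = x - center[0], y - center[1]
    return 1 if dx * dx + dy * dy <= radius * radius else 0


points = make_grid()
labels = [label_disk(p) for p in points]
print(len(points))  # 6561
```

Any rule that maps a point to a class could replace `label_disk`; the labeled points would then serve as input to whichever classifier is being visualized.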

The points were then labeled based on where they fell in the pattern. On the right are algorithm classes (instance-space, rule + tree, etc.). Credits: All figures were created with gnuplot. Construction of these web pages is entirely automated. -Tom Fawcett (tom.fawcett@gmail.com)

Data Mining 101: Finding Subversives with Amazon Wishlists. Vast deposits of personal information sit in databases across the internet. Terms used in phone conversations have become the grounds for federal investigation. Reputable organizations like the Catholic Worker, Greenpeace, and the Vegan Community Project have come under scrutiny by FBI "counterterrorism" agents. "Data mining" of all that information and communication is at the heart of the furor over the recent disclosure of government snooping. "U.S. "Some officials described the program as a large data mining operation, the Times said, and described it as much larger than the White House has acknowledged."

Combining a data mining operation with the Patriot Act's power to access information makes it all too easy for the federal government to violate the Constitution's prohibition against unreasonable search. It used to be you had to get a warrant to monitor a person or a group of people. Amazon wishlists let anyone bookmark books for later purchase. Field-name=edgar page=1 Keywords #!