datamining

R

Ted Dunning is currently CTO of Deepdyve. He has previously served as Chief Scientist for Veoh Networks, ID Analytics and Musicmatch, where he researched methods for pattern discovery and analyzed symbolic sequences in language, genetic sequences, web-browsing behavior, musical preferences, purchasing behavior and financial transactions. His particular interest is very low-cost algorithms for mining very large data streams, particularly those that involve text-like, time-embedded symbolic information. http://fora.tv/2009/10/14/ACM_Data_Mining_SIG_Ted_Dunning

ACM Data Mining SIG: Ted Dunning

Large-Scale Machine Learning: Parallelism and Massive Datasets

http://www.select.cs.cmu.edu/meetings/biglearn09/ Physical and economic limitations have forced computer architecture towards parallelism and away from exponential frequency scaling. Meanwhile, increased access to ubiquitous sensing and the web has resulted in an explosion in the size of machine learning tasks. In order to benefit from current and future trends in processor technology we must discover, understand, and exploit the available parallelism in machine learning.
In spite of over 40 years of shared memory parallel programming, there has been a surprising amount of confusion surrounding the basic meaning of shared variables. For example, an assignment to one structure field might interfere with a concurrent assignment to a field "too close" to it under poorly defined, and potentially hard to avoid, circumstances. This often left the basic ground rules for parallel programming fuzzy, and contributed to its perceived difficulty. C++11 finally integrates threads into the language, in part to addr

San Francisco Bay Area ACM, Archive » 2009 ACM Silicon Valley D

http://www.sfbayacm.org/?p=894
http://www.liwc.net/ What is LIWC? Linguistic Inquiry and Word Count (LIWC) is a text analysis software program designed by James W. Pennebaker, Roger J. Booth, and Martha E. Francis.

LIWC - Linguistic Inquiry and Word Count

Regressive Imagery Dictionary

Extract the contents of the zip file into the WordStat Dictionary folder (by default: c:\Program files\Provalis Research\Dictionaries). Most versions of the dictionary come in two files: the main .CAT file contains the various categorizations of words, while the .EXC dictionary handles exceptions by excluding specific word forms. Set the exclusion dictionary option in WordStat to the RID.EXC file and the inclusion dictionary option to the RID.CAT file. http://www.kovcomp.co.uk/wordstat/RID.html
http://www.americanscientist.org/issues/id.3822,y.0,no.,content.true,page.1,css.print/issue.aspx

The Britney Spears Problem » American Scientist

Back in 1999, the operators of the Lycos Internet portal began publishing a weekly list of the 50 most popular queries submitted to their Web search engine. Britney Spears (initially tagged a "teen songstress," later a "pop tart") was No. 2 on that first weekly tabulation. She has never fallen off the list since then: 440 consecutive appearances when I last checked. Other perennials include Pamela Anderson and Paris Hilton. What explains the enduring popularity of these celebrities, so famous for being famous? That's a fascinating question, and the answer would doubtless tell us something deep about modern culture.
Nice logo, eh? It was created by Liz Manicatide, a very nice artist friend, as a commissioned work; she's a hired-gun artist. You can hire her for artistic, web, and user-interface work as well. http://crm114.sourceforge.net/

The CRM114 Discriminator - The Controllable Regex Mutilator

There has been considerable interest in random projections, an approximate algorithm for estimating distances between pairs of points in a high-dimensional vector space.

Very sparse random projections

http://dl.acm.org/citation.cfm?id=1150402.1150436&coll=&dl=ACM&CFID=15151515&CFTOKEN=6184618
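
To make the random-projection idea above concrete, here is a minimal sketch in plain Python. It assumes the commonly described sparse scheme in which each matrix entry is sqrt(s) times +1, 0, or -1 with probabilities 1/(2s), 1 - 1/s, and 1/(2s), and it assumes s = sqrt(d) as the "very sparse" setting. The function names, the dimensions, and the Gaussian test data are illustrative choices, not taken from the bookmarked paper.

```python
import math
import random


def sparse_projection_matrix(d, k, s, rng):
    """Return a k x d matrix (list of rows) of sparse random entries.

    Each entry is sqrt(s) * {+1, 0, -1} with probabilities
    1/(2s), 1 - 1/s, 1/(2s), so every entry has mean 0 and variance 1.
    """
    scale = math.sqrt(s)
    matrix = []
    for _ in range(k):
        row = []
        for _ in range(d):
            u = rng.random()
            if u < 1.0 / (2 * s):
                row.append(scale)
            elif u < 1.0 / s:
                row.append(-scale)
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix


def project(matrix, x):
    """Map x from d dimensions to k dimensions, scaled by 1/sqrt(k)."""
    k = len(matrix)
    norm = math.sqrt(k)
    return [
        sum(r * v for r, v in zip(row, x) if r != 0.0) / norm
        for row in matrix
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    d, k = 1024, 256            # illustrative sizes (assumption)
    s = math.sqrt(d)            # assumed "very sparse" setting
    R = sparse_projection_matrix(d, k, s, rng)

    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    y = [rng.gauss(0.0, 1.0) for _ in range(d)]

    true_d2 = sq_dist(x, y)
    est_d2 = sq_dist(project(R, x), project(R, y))
    print("true squared distance:     ", true_d2)
    print("estimated squared distance:", est_d2)
```

Because each entry has unit variance, the projected squared distance is an unbiased estimate of the original one, and on average only a 1/s fraction of the matrix entries are nonzero, which is where the savings in storage and computation come from.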
http://www.autonlab.org/tutorials/ Advertisement: In 2006 I joined Google. We are growing a Google Pittsburgh office on CMU's campus. We are hiring creative computer scientists who love programming, and Machine Learning is one of the focus areas of the office. We're also currently accepting resumes for Fall 2008 internships. If you might be interested, feel welcome to send me email: awm@google.com. The following links point to a set of tutorials on many aspects of statistical data mining, including the foundations of probability, the foundations of statistical data analysis, and most of the classic machine learning and data mining algorithms.

Statistical Data Mining Tutorials

Rajeev Motwani

Research Interests: Databases, data mining, information retrieval, and web searching. Privacy and security, particularly in the context of databases and information retrieval. Optimization and scheduling problems, particularly for applications in computer systems, compilers, and databases. http://theory.stanford.edu/~rajeev/
