
Data mining


35 Free Online Books on Machine Learning. 27 Free Data Mining Books - DataOnFocus. As you know, here at DataOnFocus we love to share information, especially about data science and related subjects.


Dcube:dcube [Enforce Project]. Discrimination Discovery. Civil rights laws worldwide prohibit discrimination on the basis of race, color, religion, nationality, sex, marital status, age and pregnancy in a number of settings, including: credit and insurance; sale, rental, and financing of housing; personnel selection and wages; access to public accommodations, education, nursing homes, adoptions, and health care.


With the advent of automatic decision support systems, such as credit scoring systems, the ease of data collection poses several challenges for data analysts in the fight against discrimination. Discrimination discovery in databases consists of finding the discriminatory situations and practices hidden in the historical decision records under analysis. The process of data analysis must therefore be supported by tools that implement legally grounded measures and reasoning.
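As a rough illustration of what a measure over historical decision records can look like, the sketch below computes a risk difference and a risk ratio for a protected group. These two measures are chosen here only as familiar examples; whether the Enforce Project tools use them is an assumption, and the function name, field names, and records are all made up for the illustration.

```python
def risk_measures(records, group_key, protected_value, denied_key):
    """Compare denial rates between a protected group and everyone else.

    All names here are hypothetical; this is not the dcube API.
    """
    protected = [r for r in records if r[group_key] == protected_value]
    others = [r for r in records if r[group_key] != protected_value]
    if not protected or not others:
        raise ValueError("both groups need at least one record")

    # Share of negative decisions (denials) in each group.
    p_protected = sum(r[denied_key] for r in protected) / len(protected)
    p_others = sum(r[denied_key] for r in others) / len(others)

    ratio = p_protected / p_others if p_others else float("inf")
    return {
        "risk_difference": p_protected - p_others,
        "risk_ratio": ratio,
    }


# Made-up decision records: denied is 1 for a refusal, 0 for an approval.
records = [
    {"group": "A", "denied": 1},
    {"group": "A", "denied": 1},
    {"group": "A", "denied": 0},
    {"group": "A", "denied": 0},
    {"group": "B", "denied": 1},
    {"group": "B", "denied": 0},
    {"group": "B", "denied": 0},
    {"group": "B", "denied": 0},
]

measures = risk_measures(records, "group", "A", "denied")
print(measures)
```

A large difference or ratio only flags a pattern worth examining; it does not establish discrimination in a legal sense.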

University of Waikato. This course follows on from Data Mining with Weka and provides a deeper account of data mining tools and techniques.


Again the emphasis is on principles and practical data mining using Weka, rather than mathematical theory or advanced details of particular algorithms. Students will work with multimillion-instance datasets, classify text, experiment with clustering, association rules, neural networks, and much more. The course is currently closed. Students should have completed Data Mining with Weka, or have equivalent knowledge of the subject. The course features: online access to chapters from Data Mining (3rd Edition); a detailed syllabus; CC-BY videos & slides (see the materials site); online assessment leading to a Statement of Completion (example); English & Chinese captions on YouTube and Youku. Sandbox for Hadoop. Stopwords. OnePageR – Togaware. A Survival Guide to Data Science with R. These draft chapters weave together a collection of tools for the data scientist, tools that are all part of the R Statistical Software Suite.


Each chapter is a collection of one (or more) pages that cover particular aspects of the topic. The chapters can be worked through as a hands-on guide to a specific task and then used as a reference guide. Each page aims to be a bite-sized chunk for hands-on learning, building on what has gone before. Many chapters also have a lecture pack and a laboratory session where a number of tasks can be completed. The material begins with an overview of how an organisation should go about setting up their Analytics capability and then introduces the Data Scientist to R. The material here is in various stages of completeness and is always under development! Enjoy! The data used across the chapters is available for download as 24 Data Science, R, Python, Excel, and Machine Learning Cheat Sheets. R Package for Data Mining - R and Data Mining.

To build an R package for data mining.


The package will provide various functionalities for data mining, with contributions from many R users. If you have developed or will implement any data mining algorithms in R, please participate in the RDataMining project on R-Forge to make your work available to R users worldwide.


Analytics, Data Mining, and Data Science. OCR. Tools for Exploring Text: Natural Language Processing. Natural language processing (NLP), also known as computational linguistics, is a set of models and techniques for analyzing text computationally.


In the context of the digital humanities, it can help take a question that a literary scholar or historian might ask of a body of text, and help turn it into a quantitative hypothesis. In a previous post, I talked about how visualization can be used to get a sense of text; this is the next in the series. Throughout this post, we’ll try to answer a hypothetical question a scholar in the humanities, perhaps a literary scholar or historian, might be interested in: “How is the character Mary talked about in this novel or historical text?” It’s fairly open-ended: what does “talked about” mean? The goal of NLP is to model the workings of natural language as we speak, read, and write it, so all the tools here are motivated by some kind of language model.

N-Grams. These are strings of consecutive words within a sentence. A gentle introduction to historical data analysis. It's surprisingly easy to use tools to explore texts and greatly improve research efficiency and open new research doors.
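To make the N-gram definition above concrete, here is a minimal Python sketch that lists the runs of consecutive words in one sentence. The tokenizer is a deliberately simple regular expression and the sentence is invented; a real project would more likely rely on an NLP library's tokenizer.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Invented example sentence, not taken from any particular text.
sentence = "Mary walked to the old mill"
tokens = tokenize(sentence)

bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

for gram in bigrams:
    print(" ".join(gram))
```

Counting how often each N-gram that contains "mary" occurs across a whole text is one simple way to begin asking how a character is talked about.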


The following techniques are incredibly useful for a small to intermediate amount of text. These techniques do not scale up to handle huge amounts of data, but then again most historians don't work with huge amounts of data. One example is using Voyant to explore a single text or set of texts. Let's say I want to explore the use of poison in the 19th century. First, we need digitized source material that might tell us something.
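For readers who would rather script this kind of exploration than use a hosted tool, the sketch below builds a small keyword-in-context listing for a term such as "poison". It is not how Voyant works internally; the function name, the window width, and the sample sentences are assumptions made up for the illustration.

```python
import re


def keyword_in_context(text, keyword, width=3):
    """Return (left context, match, right context) for each hit.

    A hypothetical helper: 'width' is the number of words kept
    on each side of the keyword.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# Invented sample text standing in for a digitized source.
sample = (
    "The chemist sold poison to the farmer. "
    "Nobody asked why the farmer wanted poison that winter."
)

hits = keyword_in_context(sample, "poison")
for left, match, right in hits:
    print(f"{left:>25} [{match}] {right}")
```

Like the techniques described above, this suits a small to intermediate amount of text, since it reads the whole source into memory at once.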

Look for words that might be informative. Welcome // About Text Analysis - Introduction to Text Analysis. Text Mining - Open Source Software. Where to start with text mining. This post is an outline of discussion topics I’m proposing for a workshop at NASSR2012 (a conference of Romanticists).


I’m putting it on the blog since some of the links might be useful for a broader audience. In the morning I’ll give a few examples of concrete literary results produced by text mining. I’ll start the afternoon workshop by opening two questions for discussion: first, what are the obstacles confronting a literary scholar who might want to experiment with quantitative methods? Second, how do those methods actually work, and what are their limits? I’ll also invite participants to play around with a collection of 818 works between 1780 and 1859, using an R program I’ve provided for the occasion.

Not because bigger is better, or because “distant reading” is the new hotness. But if you want to interpret a single passage, you fortunately already have a wrinkled protein sponge that will do a better job than any computer. What you see in a page image. Conference Proceedings.