Data mining

35 Free Online Books on Machine Learning. 27 Free Data Mining Books - DataOnFocus. As you know, here at DataOnFocus we love to share information, especially about data science and related subjects.

27 Free Data Mining Books - DataOnFocus

And what is one of the best ways to learn about a specific topic? Reading a book about it, and then practicing with the knowledge you have just acquired. And what is better than increasing your knowledge by studying a high-quality book about a subject you like? Reading it for free! So we did some work and created an epic list of absolutely free books on data-related subjects, from which you can learn a lot and become an expert. The resources, provided as PDFs, are well-known books about data mining, machine learning, predictive analytics and big data. An Introduction to Statistical Learning: with Applications in R. Overview of statistical learning based on large datasets of information. Hope you enjoy our fantastic list of free data mining and machine learning resources. If you have any comments, feel free to contact us! Dcube:dcube [Enforce Project] Discrimination Discovery. Civil rights laws worldwide prohibit discrimination on the basis of race, color, religion, nationality, sex, marital status, age and pregnancy in a number of settings, including: credit and insurance; sale, rental, and financing of housing; personnel selection and wages; access to public accommodations, education, nursing homes, adoptions, and health care.

dcube:dcube [Enforce Project]

With the advent of automatic decision support systems, such as credit scoring systems, the ease of data collection poses several challenges to data analysts in the fight against discrimination. Discrimination discovery in databases consists of the actual discovery of discriminatory situations and practices hidden in the historical decision records under analysis. The process of data analysis must then be supported by tools that implement legally grounded measures and reasoning. University of Waikato. This course follows on from Data Mining with Weka and provides a deeper account of data mining tools and techniques.

University of Waikato

Again the emphasis is on principles and practical data mining using Weka, rather than mathematical theory or advanced details of particular algorithms. Students will work with multimillion-instance datasets, classify text, experiment with clustering, association rules, neural networks, and much more. The course is currently closed. Students should have completed Data Mining with Weka, or have equivalent knowledge of the subject.

The course features: online access to chapters from Data Mining (3rd Edition); a detailed syllabus; CC-BY videos & slides (see the materials site); online assessment leading to a Statement of Completion (example); English & Chinese captions on YouTube and Youku. Sandbox for Hadoop. Stopwords. OnePageR – Togaware. 24 Data Science, R, Python, Excel, and Machine Learning Cheat Sheets. R Package for Data Mining - RDataMining.com: R and Data Mining. To build an R package for data mining.

R Package for Data Mining - RDataMining.com: R and Data Mining

The package will provide various functionalities for data mining, with contributions from many R users. If you have developed or will implement any data mining algorithms in R, please participate in the RDataMining project on R-Forge to make your work available to R users worldwide. Background.

TextAnalysis

Analytics, Data Mining, and Data Science. OCR. Tools for Exploring Text: Natural Language Processing. Natural language processing (NLP), also known as computational linguistics, is a set of models and techniques for analyzing text computationally.

Tools for Exploring Text: Natural Language Processing

In the context of the digital humanities, it can help take a question that a literary scholar or historian might ask of a body of text, and help turn it into a quantitative hypothesis. In a previous post, I talked about how visualization can be used to get a sense of text; this is the next in the series. Throughout this post, we’ll try to answer a hypothetical question a scholar in the humanities, perhaps a literary scholar or historian, might be interested in: “How is the character Mary talked about in this novel or historical text?”

It’s fairly open ended – what does “talked about” mean? The goal of NLP is to model the workings of natural language as we speak, read, and write it, so all the tools here are motivated by some kind of language model. N-Grams: These are strings of consecutive words within a sentence. A gentle introduction to historical data analysis. It's surprisingly easy to use tools to explore texts, greatly improve research efficiency, and open new research doors.
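
To make the N-gram definition above concrete (strings of consecutive words within a sentence), here is a minimal Python sketch. The function name `word_ngrams` and its very simple tokenizer are assumptions made for illustration; they do not come from any of the tools or posts linked here, and a real NLP toolkit would tokenize more carefully.

```python
import re


def word_ngrams(sentence, n):
    """Return the n-grams (tuples of n consecutive words) found in one sentence."""
    # Hypothetical, deliberately naive tokenizer: lowercase the text and
    # keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", sentence.lower())
    # Slide a window of width n across the word list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Example: the bigrams (n = 2) of a made-up sentence mentioning Mary.
for gram in word_ngrams("Mary walked to the old house.", 2):
    print(" ".join(gram))
```

Counting which words most often appear in the N-grams that contain "mary" is one rough way to start on the question of how a character is talked about.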

A gentle introduction to historical data analysis

The following techniques are incredibly useful for a small to intermediate amount of text. These techniques do not scale up to handle huge amounts of data, but then again most historians don't work with huge amounts of data. One example is using Voyant to explore a single text or set of texts. Welcome // About Text Analysis - Introduction to Text Analysis. Text Mining - Open Source Software. Where to start with text mining. This post is an outline of discussion topics I’m proposing for a workshop at NASSR2012 (a conference of Romanticists).

Where to start with text mining.

I’m putting it on the blog since some of the links might be useful for a broader audience. In the morning I’ll give a few examples of concrete literary results produced by text mining. I’ll start the afternoon workshop by opening two questions for discussion: first, what are the obstacles confronting a literary scholar who might want to experiment with quantitative methods?

Second, how do those methods actually work, and what are their limits? I’ll also invite participants to play around with a collection of 818 works between 1780 and 1859, using an R program I’ve provided for the occasion. I. 1. Not because bigger is better, or because “distant reading” is the new hotness. But if you want to interpret a single passage, you fortunately already have a wrinkled protein sponge that will do a better job than any computer. 2. What you see in a page image. Conference Proceedings.