background preloader

Statistics and Data Analysis

Facebook Twitter

GoogleVis Tutorial. Nanocubes: Fast Visualization of Large Spatiotemporal Datasets. Weka 3 - Data Mining with Open Source Machine Learning Software in Java. Weka is a collection of machine learning algorithms for data mining tasks. It contains tools for data preparation, classification, regression, clustering, association rules mining, and visualization.

Found only on the islands of New Zealand, the Weka is a flightless bird with an inquisitive nature. The name is pronounced like this, and the bird sounds like this. Weka is open source software issued under the GNU General Public License. We have put together several free online courses that teach machine learning and data mining using Weka. The videos for the courses are available on Youtube. Weka supports deep learning! Kaggle: Go from Big Data to Big Analytics.

Edwin Chen's Blog. Jaccard index. (If A and B are both empty, we define J(A,B)=1.) Clearly, The MinHash min-wise independent permutations locality sensitive hashing scheme may be used to efficiently compute an accurate estimate of the Jaccard similarity coefficient of pairs of sets, where each set is represented by a constant-sized signature derived from the minimum values of a hash function. The Jaccard distance, which measures dissimilarity between sample sets, is complementary to the Jaccard coefficient and is obtained by subtracting the Jaccard coefficient from 1, or, equivalently, by dividing the difference of the sizes of the union and the intersection of two sets by the size of the union: An alternate interpretation of the Jaccard distance is as the ratio of the size of the symmetric difference to the union.

This distance is a proper metric on sets of sets.[1][2] There is also a version of the Jaccard distance for measures, including probability measures. Is a measure on a measurable space , and the Jaccard distance by.

R Language