Machine Learning Repository

Related:  Big data

Datasets for Data Mining and Data Science See also Data repositories. AssetMacro, historical data of Macroeconomic Indicators and Market Data.

CVonline: Image Databases Index by Topic. Another helpful site is the YACVID page. Action Databases, Biological/Medical, Face Databases.

Large Network Dataset Collection Social networks, Networks with ground-truth communities, Communication networks, Citation networks, Collaboration networks, Web graphs.

Datasets per Topic - TC-11 Description: This collection contains table structure ground truth data (rows, columns, cells, etc.) for document images containing tables in the UNLV and UW3 datasets. The ground truth is provided in XML format, which stores row and column boundaries, bounding boxes of cells, and additional attributes such as row-spanning and column-spanning cells. Each XML ground truth file has the same basename as the corresponding image in the respective dataset. These XML files can then be used to generate color-encoded ground truth images in PNG format, which can be used directly by the pixel-accurate benchmarking framework described in [1].
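As a rough illustration of how cell bounding boxes could be turned into a color-encoded label image, here is a minimal Python sketch. The XML schema in it (a `Table` root with `Cell` elements carrying `x0`, `y0`, `x1`, `y1`, `startRow`, `endRow`, `startCol`, `endCol` attributes) is an assumption made up for this example and is not the actual TC-11 file format. The function names and the color scheme are likewise hypothetical, and the result is an in-memory pixel grid rather than a PNG file.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: this schema is invented for illustration only. The real
# UNLV/UW3 ground truth files may use different element and attribute names.
SAMPLE_XML = """
<Table>
  <Cell x0="0" y0="0" x1="3" y1="1" startRow="0" endRow="0" startCol="0" endCol="1"/>
  <Cell x0="0" y0="2" x1="1" y1="3" startRow="1" endRow="1" startCol="0" endCol="0"/>
  <Cell x0="2" y0="2" x1="3" y1="3" startRow="1" endRow="1" startCol="1" endCol="1"/>
</Table>
"""


def parse_cells(xml_text):
    """Read every Cell element into a dict of integer attributes."""
    root = ET.fromstring(xml_text)
    return [{k: int(v) for k, v in el.attrib.items()} for el in root.iter("Cell")]


def cell_color(index):
    """Map a cell index to a distinct RGB triple; (0, 0, 0) is background."""
    n = index + 1
    return (n & 0xFF, (n >> 8) & 0xFF, (n >> 16) & 0xFF)


def render(cells, width, height):
    """Paint each cell's bounding box (inclusive coordinates) in its own color."""
    image = [[(0, 0, 0)] * width for _ in range(height)]
    for i, cell in enumerate(cells):
        color = cell_color(i)
        for y in range(cell["y0"], cell["y1"] + 1):
            for x in range(cell["x0"], cell["x1"] + 1):
                image[y][x] = color
    return image


def column_span(cell):
    """Number of table columns the cell covers."""
    return cell["endCol"] - cell["startCol"] + 1


cells = parse_cells(SAMPLE_XML)
image = render(cells, width=4, height=4)
```

In this toy input the first cell spans two columns, so `column_span(cells[0])` is 2. Writing the grid out as a PNG would need an imaging library, which is left out of this sketch.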

50 Resources for Getting the Most Out of Google Analytics Google Analytics is a very useful free tool for tracking site statistics. For most users, however, it never becomes more than just a pretty interface with interesting graphs. The resources below will help anyone, from the beginner to those who have been using Google Analytics for some time, learn how to get the most out of this great tool. For Beginners

Where Can I Find Large Datasets Open To The Public?

Common Google Universal Analytics Mistakes that kill your Analysis & Conversions I have audited hundreds of web analytics accounts and profiles. And each account/view had at least one or two issues which seriously stood in my way of getting optimum results from my analysis. I have put all of these issues into five broad categories: Directional Issues, Data Collection Issues, Data Integration Issues, Data Interpretation Issues, Data Reporting Issues.

Using the New Cohort Analysis in Google Analytics The cohort was the basic tactical unit of Roman Legions following the reforms of Gaius Marius in 107 BC. Initially a Roman legion consisted of ten cohorts, each consisting of 480 men. Today we use the term cohort to distinguish between groups of consumers to help us make them spend more money on things they probably don’t need. Progress? I guess I’d rather live in a world where we try and get people to spend more money on shoes, than die violently by taking a spear to my chest while fighting Carthaginians; but it’s close. And now Google Analytics has a fancy new Cohort Analysis Report that lets us analyze the death rates from the Second Punic War… Er… no… it helps us analyze the consumer/shoe thing.

Advanced Content Analysis in Google Analytics The author's posts are entirely his or her own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz. We analyze the performance of our content every day. Sometimes it's subconscious, like when we check the number of tweets we get from a new blog post. Other times, we make more conscious efforts, like reviewing performance metrics in Google Analytics. This feedback—both formal and anecdotal—informs what we do next.

Learn Big Data Analytics using Top Youtube Tutorial Videos & TED Talks Introduction There has been a lot of investment in Big Data by various companies in the last few years. This rise in the use of big data analytics has resulted in high demand for skilled big data professionals.

18 New Must Read Books for Data Scientists on R and Python Introduction “It’s called reading. It’s how people install new software into their brain” Personally, I haven’t learnt as much from videos & online tutorials as I’ve learnt from books.

Data Science Cheat Sheets – Python / R / MySQL & SQL / Spark / Hadoop & Hive / Machine Learning / Django – AITS – Data Mining Club Get up to speed and keep Data Science & Data Mining concepts and commands handy with these cheat sheets covering R, Python, Django, MySQL, SQL, Hadoop, Apache Spark and Machine Learning algorithms. There are thousands of packages and hundreds of functions out there in the data science world! An aspiring data enthusiast need not know them all.