Lecture notes - Perception, Sensing & Instrumentation lab. CognitiveJ – Image Analysis for Java. CognitiveJ is an open source Java library that makes it easy to detect, interpret and identify faces or features contained within raw images.


Powered by Project Oxford, the library can suggest a person's age, gender and emotional state. Based on machine learning, the library can also attempt to interpret and describe what is contained within an image. It is being released for public preview under the Apache 2 licence and, at the time of writing, the features include:

Faces: Facial Detection with Age and Gender.
Vision: Image Describe (describe the visual content of an image and return a real-world caption for what the image contains); Image Analysis (extract key details from an image, including whether the image is of an adult/racy nature); OCR (detect and extract a text stream from an image); Thumbnail (create thumbnail images based on key points of interest from an image).
Overlay.
Other Features: supports local and remote images; validation of parameters; image grids.

Getting Started.

PredictionIO Open Source Machine Learning Server. The Stanford NLP (Natural Language Processing) Group. A Suite of Core NLP Tools. Stanford CoreNLP provides a set of natural language analysis tools which can take raw text input and give the base forms of words, their parts of speech, whether they are names of companies, people, etc., normalize dates, times, and numeric quantities, and mark up the structure of sentences in terms of phrases and word dependencies, indicate which noun phrases refer to the same entities, indicate sentiment, etc.


Stanford CoreNLP is an integrated framework. OpenCV. Boilerpipe - Boilerplate Removal and Fulltext Extraction from HTML pages. This project is moving to The following information is outdated and only provided for reference.


The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page. The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings. Watson wannabes: 4 open source projects for machine intelligence. Over the last year, as part of the new enterprise services that IBM has been pushing in its reinvention, Watson has become less of a "Jeopardy"-winning gimmick and more of a tool.


It also remains IBM's proprietary creation. Deeplearning4j. Introduction: Deeplearning4j is an open source project[6] primarily developed by a machine learning group in San Francisco led by Adam Gibson.[7][8] Deeplearning4j is the only open-source project listed on Google's Word2vec page for its Java implementation.[9] Deeplearning4j has been used in a number of commercial and academic applications.


The code is hosted on GitHub[10] and a support forum is maintained on Google Groups.[11] The framework is composable, meaning shallow neural nets such as restricted Boltzmann machines, convolutional nets, autoencoders and recurrent nets can be added to one another to create deep nets of varying types.

Torch (machine learning). The following exemplifies using torch via its REPL interpreter:

    > a = torch.randn(3,4)
    > =a
    -0.2381 -0.3401 -1.7844 -0.2615
     0.1411  1.6249  0.1708  0.8299
    -1.0434  2.2291  1.0525  0.8465
    [torch.DoubleTensor of dimension 3x4]
    > a[1][2]
    -0.34010116549482
    > a:narrow(1,1,2)
    -0.2381 -0.3401 -1.7844 -0.2615
     0.1411  1.6249  0.1708  0.8299
    [torch.DoubleTensor of dimension 2x4]
    > a:index(1, torch.LongTensor{1,2})
    -0.2381 -0.3401 -1.7844 -0.2615
     0.1411  1.6249  0.1708  0.8299
    [torch.DoubleTensor of dimension 2x4]
    > a:min()
    -1.7844365427828

Objects created with the torch factory can also be serialized, as long as they do not contain references to objects that cannot be serialized, such as Lua coroutines, and Lua userdata.
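For readers without a Torch install, the slicing calls in the transcript can be mimicked on plain nested lists. The following is only a rough Python sketch: `narrow`, `index_rows` and `matrix_min` are hypothetical helpers written for this illustration, not Torch's API, and they use 0-based indices where Lua uses 1-based ones. The matrix holds the rounded values printed in the transcript.

```python
# The 3x4 matrix from the transcript, using the rounded values it prints.
a = [
    [-0.2381, -0.3401, -1.7844, -0.2615],
    [0.1411, 1.6249, 0.1708, 0.8299],
    [-1.0434, 2.2291, 1.0525, 0.8465],
]


def narrow(matrix, dim, start, length):
    """Return `length` consecutive rows (dim=0) or columns (dim=1).

    Hypothetical stand-in for a:narrow(dim, start, length); indices are 0-based.
    """
    if dim == 0:
        return [row[:] for row in matrix[start:start + length]]
    if dim == 1:
        return [row[start:start + length] for row in matrix]
    raise ValueError("dim must be 0 or 1 for a 2-D matrix")


def index_rows(matrix, rows):
    """Pick rows by an explicit list of 0-based row numbers."""
    return [matrix[r][:] for r in rows]


def matrix_min(matrix):
    """Smallest element of the whole matrix."""
    return min(min(row) for row in matrix)


first_two_rows = narrow(a, 0, 0, 2)      # Lua: a:narrow(1,1,2)
same_rows = index_rows(a, [0, 1])        # Lua: a:index(1, torch.LongTensor{1,2})
smallest = matrix_min(a)                 # Lua: a:min()
```

Unlike this copy-based sketch, a real tensor library may return views that share storage with the original; that difference is not modelled here.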

Torch (machine learning) - Fast Scalable Machine Learning. 9 Free Books for Learning Data Mining & Data Analysis. Data mining and data analysis are two terms that very often give the impression of being hard to understand, complex, and requiring the highest level of education to grasp.


I can only disagree: as with anything in this wonderful life of ours, we only need to spend a certain amount of time learning something and practicing it before we realize that it’s not really all that hard. It’s difficult to see what is behind a closed door, and unless we go up to that door and open it, we’re never going to know. This applies to most things in life, but I can definitely feel the ‘fear’ that people have of such complex studies as data science itself. By learning from these books, you will quickly uncover the ‘secrets’ of data mining and data analysis, and hopefully be able to better judge what they do and how they can help you in your projects, both now and in the future. LIONbook: Machine Learning + Intelligent Optimization – completed, free personal download. This book combines two usually separate topics, machine learning and intelligent optimization, and does so with enough technical detail to satisfy professionals, but also with concrete examples, vivid images, and fun.


Buy a low-cost paperback or ebook (Kindle), or download a free PDF. By Gregory Piatetsky, Mar 11, 2014. LIONbook is a new, just-completed book written by the developers of LionSolver software, Roberto Battiti and Mauro Brunato. Mining of Massive Datasets. Big data is transforming the world.


Here you will learn data mining and machine learning techniques to process large datasets and extract valuable knowledge from them. The book. A Programmer's Guide to Data Mining. 9 Free Books for Learning Data Mining and Data Analysis. Whether you are learning data science for the first time or refreshing your memory or catching up on latest trends, these free books will help you excel through self-study.


By Alex Ivanovs, CodeCondo, Apr 29, 2014. Data mining and data analysis are two terms that very often give the impression of being hard to understand, complex, and requiring the highest level of education to grasp. I can only disagree: as with anything in this wonderful life of ours, we only need to spend a certain amount of time learning something and practicing it before we realize that it’s not really all that hard. No doubt there are very smart people in this world, working for large corporations such as Google, Apple, Microsoft and plenty more (including security agencies), but if we continue to look up to them, we will always think it’s hard, because we have never given ourselves the chance to look at real examples and facts. 7 common mistakes when doing Machine Learning. In statistical modeling, there are various algorithms to build a classifier, and each algorithm makes a different set of assumptions about the data.

For Big Data, it pays off to analyze the data upfront and then design the modeling pipeline accordingly. By Cheng-Tao Chu (@chengtao_chu). Statistical modeling is a lot like engineering. In engineering, there are various ways to build a key-value storage, and each design makes a different set of assumptions about the usage pattern. Surreal number. The surreal number tree visualization. Overview: The surreal numbers are constructed in stages, along with an ordering ≤ such that for any two surreal numbers a and b either a ≤ b or b ≤ a. (Both may hold, in which case a and b are equivalent and denote the same number.) CSE599 Machine Learning for Big Data Winter 2013. Big Data Solutions: Intelligent Agents Find Meaning of Text. What if your computer could find ideas in documents? Building on the idea of fingerprinting documents, ai-one helped develop ai-BrainDocs – a tool to mine large sets of documents to find ideas using intelligent agents.

This solves a big problem for knowledge workers: how to find ideas in documents that are missed by traditional keyword search tools (such as Google, Lucene, Solr, FAST, etc.). Customers Struggle with Unstructured Text. Almost every organization struggles to find value in “big data”, especially ideas buried within unstructured text. Often a very limited set of vocabulary can be used to express very different ideas.

Lawyers are not the only ones that need to find ideas inside documents. Apache Spark. Apache Spark is an open-source[1] data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS).[2] However, Spark is not tied to the two-stage MapReduce paradigm, and promises performance up to 100 times faster than Hadoop MapReduce for certain applications.[3] Spark provides primitives for in-memory cluster computing that allow user programs to load data into a cluster's memory and query it repeatedly, making it well suited to machine learning algorithms.[4] Spark became an Apache Top-Level Project in February 2014,[5] having been an Apache Incubator project since June 2013.[6] It has received code contributions from large companies that use Spark, including Yahoo!

Features: Java, Scala, and Python APIs. Proven scalability to 100 nodes in the research lab[14] and 80 nodes in production at Yahoo!. Videos for Spaun simulations. Apache UIMA - Apache UIMA. Knowledge from Information by Matthias Broecheler. Another Word For It. Computational Creativity. Alexandre Bouchard-Côté. General Email: bouchard AT stat.ubc.ca. Assistant Professor in the Department of Statistics at UBC. Path: McGill -> UCB -> UBC. AKA: Alex, Bouchard, or 卜利森. See also: how to typeset my last name. Office: ESB, Room 3124. Resumé (last updated: Nov. '13). Research Interests: My main field of research is in statistical machine learning. ELKI. Description: The university project is developed for use in teaching and research.

The source code is written with extensibility, readability and reusability in mind, but it is not extensively optimized for performance. jFuzzyLogic: Open Source Fuzzy Logic (Java). R. Kurzweil Accelerating Intelligence. Applying complex system entropy cluster algorithm to mining principle of herbal combinations in traditional Chinese medicine. 5 of the Best Free and Open Source Data Mining Software. The process of extracting patterns from data is called data mining. :julianbrowne => @home.

Factor graph. In probability theory and its applications, a factor graph is a particular type of graphical model, with applications in Bayesian inference, that enables efficient computation of marginal distributions through the sum-product algorithm. Sebastian Thrun's Homepage.
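To make the factor graph entry above concrete, here is a minimal Python sketch of sum-product on a toy chain. Everything in it is an assumption made for illustration: the three binary variables A, B, C, the factor tables and their numbers are invented, and the code handles only this one chain rather than general graphs. It computes the marginal of B from two incoming messages and provides a brute-force sum over all assignments for comparison.

```python
from itertools import product

STATES = (0, 1)

# Hypothetical unnormalised factors on the chain A - B - C.
f_a = {0: 0.6, 1: 0.4}                                        # f_a(A)
f_ab = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}   # f_ab(A, B)
f_bc = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}   # f_bc(B, C)


def normalise(weights):
    """Scale non-negative weights so they sum to one."""
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


def marginal_b_sum_product():
    """Marginal of B as the normalised product of its incoming messages."""
    # Message from f_ab to B: sum out A, weighted by the message f_a sends to A.
    from_left = {b: sum(f_a[a] * f_ab[(a, b)] for a in STATES) for b in STATES}
    # Message from f_bc to B: sum out C (C is a leaf, so its own message is all ones).
    from_right = {b: sum(f_bc[(b, c)] for c in STATES) for b in STATES}
    return normalise({b: from_left[b] * from_right[b] for b in STATES})


def marginal_b_brute_force():
    """Same marginal, computed by summing the full joint over A and C."""
    weights = {b: 0.0 for b in STATES}
    for a, b, c in product(STATES, repeat=3):
        weights[b] += f_a[a] * f_ab[(a, b)] * f_bc[(b, c)]
    return normalise(weights)
```

The message version touches each factor table once, while the brute-force version enumerates every joint assignment; on a chain this small the cost is the same, but the gap grows quickly with the number of variables.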


AI Challenge. Data Mining: Finding Similar Items and Users. Because we want to give kick-ass product recommendations. I'm showing you how to find related items based on a really simple formula. If you pay attention, this technique is used all over the web (like on Amazon) to personalize the user experience and increase conversion rates. To get one question out of the way: there are already many available libraries that do this, but as you'll see there are multiple ways of skinning the cat and you won't be able to pick the right one without understanding the process, at least intuitively. Defining the Problem: To find items similar to a certain item, you first have to define what it means for two items to be similar, and this depends on the problem you're trying to solve. In each case you need a way to classify the items you're comparing, whether by tags, or items purchased, or movies reviewed.
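Once items are described by tags, one common way to score how alike two of them are is cosine similarity between their tag vectors. The excerpt does not say which formula the article uses, so treat that choice as an assumption; the post names and tags below are invented sample data, and `cosine_similarity` and `most_similar` are hypothetical helper names.

```python
import math

# Hypothetical sample data: blog posts described by the tags attached to them.
posts = {
    "intro-to-clustering": {"data-mining", "clustering", "python"},
    "k-means-in-practice": {"clustering", "python", "tutorial"},
    "holiday-photos": {"travel", "photos"},
}


def cosine_similarity(tags_a, tags_b):
    """Cosine of the angle between two binary tag vectors.

    For 0/1 vectors the dot product is the number of shared tags and each
    vector's length is the square root of its tag count.
    """
    if not tags_a or not tags_b:
        return 0.0
    shared = len(tags_a & tags_b)
    return shared / math.sqrt(len(tags_a) * len(tags_b))


def most_similar(target, items):
    """Rank every other item by similarity to `target`, best first."""
    scores = [
        (cosine_similarity(items[target], tags), name)
        for name, tags in items.items()
        if name != target
    ]
    return sorted(scores, reverse=True)


ranking = most_similar("intro-to-clustering", posts)
```

Swapping in purchases or reviewed movies only changes what goes into the sets; the scoring function stays the same.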

Redefining the Problem in Terms of Geometry: We'll be using my blog as a sample. Bucket - XKCD Wiki. Intelligent Autonomous Systems - Home. Lift Conference's videos. Semantic web. Creative Machines Inc. Front. 5 of the Best Free and Open Source Data Mining Software. The process of extracting patterns from data is called data mining.

It is recognized as an essential tool of modern business, because it can turn data into business intelligence and so provide an information advantage. It is currently widely used in profiling practices such as surveillance, marketing, scientific discovery, and fraud detection. Four types of task are normally involved in data mining:

Classification: the task of generalizing a known structure so that it can be applied to new data.
Clustering: the task of finding groups and structures in the data that are in some way similar, without using known structures in the data.
Association rule learning: searches for relationships between variables.
Regression: aims to find a function that models the data with the least error.

LingPipe Home.
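The regression task listed above can be illustrated with the simplest case, fitting a straight line by ordinary least squares. This is a sketch under stated assumptions: "least error" is taken to mean least squared error, the model is a single-variable line, and `fit_line` and the sample points are invented for the example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean; zero means all x values are identical.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    # Co-variation of x and y around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points lying exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

With noisy data the same function returns the line that minimises the sum of squared vertical distances to the points.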

Open Source Text Analytics by Seth Grimes. Open source is a great choice for many text analytics users, especially folks who have programming skills, who need custom capabilities or who are trying to get a feel for possibilities before committing themselves. Claudio martella.


Books. Vizz. Machine learning - Kernel PCA vs. k-means - Statistical Analysis - Stack Exchange. Mondrian (software). GGobi data visualization system. Features. Interesting Data Tools. Ggplot. Zdenek Kalal. Gremlin, graphml and yed - Gremlin-users. NLP at Google. Sigur Ros - Untitled 1 (Rx Remix) / The Hype Machine. WandoraWiki. Formal ontology. YAGO2 - D5: Databases and Information Systems (Max-Planck-Institut für Informatik). AlchemyAPI - Transforming Text Into Knowledge. About - KiWi - Open Source development platform for building Semantic Social Media Applications. Representing scientific discourse, or: why triples are not enough. Academic reference management software for researchers.

d3.js. Jsviz - Project Hosting on Google Code. SIMILE: Practical Metadata for the Semantic Web - This is an idea from Google Docs. Why SpicyNodes? Raphaël - JavaScript Library. JqPlot Charts and Graphs for jQuery. Cytoscape Web. JSDot - Graphs for the Browser (12). Protovis. Dracula Graph Library. Jquery - Graph visualization code in javascript. Visualization Tools for Huge Graphs. Hoppala – Create your own Augmented Reality application. Procedural computer graphics blog. Speech and Language Processing (2nd Ed.): Updates. Errata - Semantic Web for the Working Ontologist. Welcome to Elefant - Elefant. Sessions: Science Fair: Strata 2011 - O'Reilly Conferences, February 01 - 03, 2011. Explain the world with maps. - UUorld. A visual exploration on mapping complex networks. VideoLectures - exchange ideas & share knowledge. Video Lectures.