Factor graph

In probability theory and its applications, a factor graph is a particular type of graphical model, with applications in Bayesian inference, that enables efficient computation of marginal distributions through the sum-product algorithm. One of the important success stories of factor graphs and the sum-product algorithm is the decoding of capacity-approaching error-correcting codes, such as LDPC and turbo codes.

A factor graph is an example of a hypergraph, in that an arc (i.e., a factor node) can connect more than one (normal) node. When there are no free variables, the factor graph of a function f is equivalent to the constraint graph of f, which is an instance of a constraint satisfaction problem.

Definition

A factor graph is a bipartite graph representing the factorization of a function. Given a factorization of a function $g(X_1, \ldots, X_n)$,

$$g(X_1, \ldots, X_n) = \prod_{j=1}^{m} f_j(S_j),$$

where $S_j \subseteq \{X_1, \ldots, X_n\}$, the corresponding factor graph $G = (X, F, E)$ consists of variable vertices $X = \{X_1, \ldots, X_n\}$, factor vertices $F = \{f_1, \ldots, f_m\}$, and edges $E$. The edges depend on the factorization as follows: there is an undirected edge between factor vertex $f_j$ and variable vertex $X_k$ when $X_k \in S_j$. Factor graphs can be combined with message-passing algorithms to efficiently compute certain characteristics of the function $g$, such as the marginal distributions.

Examples

An example factor graph is defined as
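To make the definition concrete, here is a minimal Python sketch under assumed inputs: the factorization g(X1, X2, X3) = f1(X1) f2(X1, X2) f3(X2, X3), the binary domains and the factor tables are all made up for illustration. It builds the edge set E exactly as the definition describes (one edge per variable in each factor's scope) and computes a marginal by brute-force summation over every assignment. The sum-product algorithm computes the same marginals far more efficiently by passing messages along those edges; this sketch does not implement it.

```python
from itertools import product

# Hypothetical factorization g(X1, X2, X3) = f1(X1) * f2(X1, X2) * f3(X2, X3)
# over binary variables; the factor tables are illustrative numbers only.
variables = ["X1", "X2", "X3"]
factors = {
    "f1": (("X1",), lambda x1: [0.6, 0.4][x1]),
    "f2": (("X1", "X2"), lambda x1, x2: 0.9 if x1 == x2 else 0.1),
    "f3": (("X2", "X3"), lambda x2, x3: 0.7 if x2 == x3 else 0.3),
}


def factor_graph_edges(factors):
    """Edge (f_j, X_k) for every variable X_k in the scope S_j of factor f_j."""
    return {(name, var) for name, (scope, _) in factors.items() for var in scope}


def g(assignment):
    """Evaluate the product of all factors at a full assignment {var: value}."""
    result = 1.0
    for scope, fn in factors.values():
        result *= fn(*(assignment[v] for v in scope))
    return result


def marginal(var, domain=(0, 1)):
    """Brute-force marginal of `var`: sum g over all other variables, then normalize."""
    totals = {value: 0.0 for value in domain}
    for values in product(domain, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        totals[assignment[var]] += g(assignment)
    z = sum(totals.values())
    return {value: t / z for value, t in totals.items()}
```

For example, `marginal("X1")` sums out X2 and X3. Brute force costs time exponential in the number of variables, which is exactly the cost sum-product avoids on tree-structured graphs.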
Data Mining: Finding Similar Items and Users

Because we want to give kick-ass product recommendations.

I'm showing you how to find related items based on a really simple formula. If you pay attention, you'll see this technique used all over the web (like on Amazon) to personalize the user experience and increase conversion rates.

To get one question out of the way: there are already many available libraries that do this, but as you'll see there are multiple ways of skinning the cat, and you won't be able to pick the right one without understanding the process, at least intuitively.

Defining the Problem

To find items similar to a certain item, you first have to define what it means for 2 items to be similar, and this depends on the problem you're trying to solve. In each case you need a way to classify the items you're comparing, whether by tags, items purchased, or movies reviewed.

Redefining the Problem in Terms of Geometry

We'll be using my blog as a sample, with these tags:

["API", "Algorithms", "Amazon", "Android", "Books", "Browser"]

That's 6 tags.
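The geometric framing suggests treating each item as a vector with one coordinate per tag. The sketch below is an assumption, not necessarily the formula the post goes on to use: it encodes each post as a 0/1 vector over the 6 tags above and ranks the other posts by cosine similarity, one common choice for this kind of comparison. The post names and their tag sets are hypothetical.

```python
import math

TAGS = ["API", "Algorithms", "Amazon", "Android", "Books", "Browser"]

# Hypothetical posts and their tags (illustrative only).
posts = {
    "post-a": {"API", "Amazon"},
    "post-b": {"API", "Amazon", "Books"},
    "post-c": {"Android", "Browser"},
}


def to_vector(tags):
    """One coordinate per known tag: 1 if the post has the tag, else 0."""
    return [1 if tag in tags else 0 for tag in TAGS]


def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(name):
    """Other posts ranked from most to least similar to `name`."""
    target = to_vector(posts[name])
    scores = [
        (other, cosine(target, to_vector(tags)))
        for other, tags in posts.items()
        if other != name
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Here `most_similar("post-a")` puts "post-b" first (two shared tags) and "post-c" last (no shared tags). Cosine similarity ignores vector length, so a post with many tags is not automatically "closer" to everything.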
Bucket - XKCD Wiki

Bucket has an outer shell of metal; within the metal is a protective layer of high-density plastic, in which may or may not reside pure HOH. There can only be speculation about what else the Bucket contains.

Do not make our Bucket stupid or mean. Any stupiding of the Bucket will get you warned, kicked, and then banned.

Installing

Download the source files from or, using git, mirror the repository from here:

$ wget
$ wget
$ wget
$ wget

Set up a database (MySQL recommended); for example, on Debian or Ubuntu:

$ sudo apt-get install mysql-server

Create the tables described in bucket.sql.

$ .
Intelligent Autonomous Systems - Home

5 of the Best Free and Open Source Data Mining Software

The process of extracting patterns from data is called data mining. Modern businesses recognize it as an essential tool because it can convert data into business intelligence, giving them an informational edge. At present, it is widely used in profiling practices such as surveillance, marketing, scientific discovery, and fraud detection.

There are four kinds of tasks normally involved in data mining:

* Classification - the task of generalizing known structure to apply to new data
* Clustering - the task of finding groups and structures in the data that are in some way similar, without using known structures in the data
* Association rule learning - looks for relationships between variables
* Regression - aims to find a function that models the data with the least error

For those of you who are looking for some data mining tools, here are five of the best open-source data mining software packages that you can get for free:

* Orange
* RapidMiner
* Weka
* JHepWork
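As a concrete instance of the regression task, here is a minimal sketch of ordinary least squares for a straight line y = slope * x + intercept. It assumes that "least error" means the smallest sum of squared residuals, which is the usual convention but is not stated in the source; the function names and sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_error(xs, ys, slope, intercept):
    """The error that least squares minimizes: sum of squared residuals."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
```

For example, `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` returns a slope of 2 and an intercept of 1, with zero squared error because the points lie exactly on that line.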
5 of the Best Free and Open Source Data Mining Software | El rincón de JMACOE

The process of extracting patterns from data is called data mining. It is recognized as an essential tool of modern business, since it can convert data into business intelligence, thereby providing an informational advantage. Currently, it is widely used in profiling practices such as surveillance, marketing, scientific discovery, and fraud detection.

There are four kinds of tasks normally involved in data mining:

* Classification - the task of generalizing a familiar structure so it can be used on new data
* Clustering - the task of finding groups and structures in the data that are in one way or another the same, without needing to use the structures observed in the data
* Association rule learning - looks for relationships between variables
* Regression - its goal is to find a function that models the data with the least error

* Orange
* RapidMiner
* JHepWork
LingPipe Home

What is LingPipe?

LingPipe is a tool kit for processing text using computational linguistics. LingPipe is used to do tasks like:

* Find the names of people, organizations or locations in news
* Automatically classify Twitter search results into categories
* Suggest correct spellings of queries

To get a better idea of the range of possible LingPipe uses, visit our tutorials and sandbox.

Architecture

LingPipe's architecture is designed to be efficient, scalable, reusable, and robust.

Latest Release: LingPipe 4.1.2 Intermediate Release

The latest release of LingPipe is LingPipe 4.1.2, which patches some bugs and documentation.

Migration from LingPipe 3 to LingPipe 4

LingPipe 4.1.2 is not backward compatible with LingPipe 3.9.3. Programs that compile in LingPipe 3.9.3 without deprecation warnings should compile and run in LingPipe 4.1.2.
Open Source Text Analytics by Seth Grimes

Open source is a great choice for many text analytics users, especially folks who have programming skills, who need custom capabilities, or who are trying to get a feel for the possibilities before committing themselves. Excellent options are available for all these users. Tools such as Gate, NLTK, R and RapidMiner share the low cost, power, flexibility and community that have driven adoption of open-source software by individual users and enterprises alike. RapidMiner even combines text processing with business intelligence (BI) and visualization functions.

This article will look at open source text analytics, focusing on those four tools. Be warned, however, that just as in other IT domains, open source text technologies are not for everyone. Lastly, hosted “as a service” options are very popular among new corporate users, but there are no significant open-source-based SaaS text analytics offerings available.

Not Just for Programmers

Gate is an ace at information extraction (IE).

Conclusion
claudio martella

In the past, I've written about Google Pregel. At the time, as was quite obvious, there was no implementation of anything like Pregel out there, let alone an open-source one. Now things have changed, so I'd like to give a quick list of the projects out there that might help you get started with this technology, as I often see people asking what the differences between them are. I have direct experience only with the Java implementations, so I can talk about them a bit more extensively.

As you may remember from my last post, Pregel is a framework for large-scale graph processing that builds on top of the BSP computational model. It allows the developer to write a vertex-centric algorithm for graph processing (meaning you write a function that receives messages from vertices and sends messages to other vertices) and forget about things such as distribution and fault tolerance.

The first project to mention is Apache Hama. More in this direction, you can try GoldenOrb.
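To illustrate the vertex-centric model, here is a single-process Python sketch of maximum-value propagation, a toy example often used to introduce Pregel, simulated under BSP rules: in each superstep, every vertex that received messages runs the same compute step, and messages sent in superstep S are delivered only in superstep S+1. It is a hypothetical toy, not the API of Pregel, Apache Hama or GoldenOrb, and it leaves out exactly what those frameworks handle for you: distributing vertices across workers and recovering from failures. Vertices with no incoming messages stay inactive, which stands in for Pregel's "vote to halt".

```python
from collections import defaultdict


def pregel_max(graph, initial_values, max_supersteps=50):
    """Propagate the maximum vertex value through `graph` in BSP supersteps.

    graph: dict mapping each vertex to the list of its out-neighbours
    initial_values: dict mapping each vertex to its starting value
    """
    values = dict(initial_values)

    # Superstep 0: every vertex is active and sends its own value to its neighbours.
    inbox = defaultdict(list)
    for vertex, neighbours in graph.items():
        for neighbour in neighbours:
            inbox[neighbour].append(values[vertex])

    superstep = 1
    # Stop when no messages are in flight (every vertex has halted) or at a safety cap.
    while inbox and superstep < max_supersteps:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            # compute(): only vertices that received messages are reactivated.
            best = max(messages)
            if best > values[vertex]:
                values[vertex] = best
                for neighbour in graph.get(vertex, []):
                    outbox[neighbour].append(best)
        # Barrier: messages sent in this superstep are delivered in the next one.
        inbox = outbox
        superstep += 1
    return values
```

For example, on the undirected chain a-b-c-d (written as `{"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}`) with starting values 3, 6, 2, 1, every vertex ends up holding 6. The developer only writes the per-vertex step; the loop and the message delivery are what a Pregel-like framework provides.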
machine learning - Kernel PCA vs. k-means - Statistical Analysis - Stack Exchange

Mondrian (software)

Mondrian is a general-purpose statistical data-visualization system. It features outstanding visualization techniques for data of almost any kind, and its particular strength compared to other tools lies in working with categorical data, geographical data and large data sets.

All plots in Mondrian are fully linked and offer various interactions and queries: any case selected in one plot is highlighted in all other plots. Currently implemented plots comprise Mosaic Plot, Scatterplots and SPLOM, Maps, Barcharts, Histograms, Missing Value Plot, Parallel Coordinates/Boxplots and Boxplots y by x.

Mondrian works with data in standard tab-delimited or comma-separated ASCII files and can load data from R workspaces. Mondrian links to R and offers statistical procedures like interactive density estimation, scatterplot smoothers, multidimensional scaling (MDS) and principal component analysis (PCA).

Theus, M. (2002).