
Algorithms - Apache Mahout

The Apache Mahout™ project's goal is to build a scalable machine learning library. The latest release, version 0.9, has user and item based recommenders; matrix factorization based recommenders; K-Means and Fuzzy K-Means clustering; Latent Dirichlet Allocation; Singular Value Decomposition; a logistic regression classifier; a (Complementary) Naive Bayes classifier; a random forest classifier; high performance Java collections; and a vibrant community. By scalable we mean scalable to large data sets: our core algorithms for clustering, classification and collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm.
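
The map/reduce paradigm mentioned above can be illustrated with one iteration of K-Means written as a map step and a reduce step. This is a minimal single-process sketch in plain Python, not Mahout's implementation and not Hadoop code; the function names (nearest, map_phase, reduce_phase, kmeans_step) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Map: emit a (centroid_index, point) pair for every input point."""
    for point in points:
        yield nearest(point, centroids), point


def reduce_phase(pairs, centroids):
    """Reduce: group points by centroid index and average each group."""
    groups = defaultdict(list)
    for key, point in pairs:
        groups[key].append(point)
    # Assumption: a centroid with no assigned points is left unchanged.
    new_centroids = list(centroids)
    for key, members in groups.items():
        new_centroids[key] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return new_centroids


def kmeans_step(points, centroids):
    """Run one K-Means iteration as a map phase followed by a reduce phase."""
    return reduce_phase(map_phase(points, centroids), centroids)


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(kmeans_step(points, [(0.0, 0.0), (10.0, 10.0)]))
```

In a distributed setting the map step would run in parallel over partitions of the data and the grouping by key would be handled by the framework; here both happen in one process to keep the idea visible.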


Running Hadoop on Windows « Hayes Davis What is Hadoop? Hadoop is an open source Apache project written in Java and designed to provide users with two things: a distributed file system (HDFS) and a method for distributed computation. It’s based on Google’s published Google File System and MapReduce concepts, which discuss how to build a framework capable of executing intensive computations across tons of computers. Something that might, you know, be helpful in building a giant search index.

machine learning in Python - scikit-learn v0.11 documentation "We use scikit-learn to support leading-edge basic research [...]" "I think it's the most well-designed ML package I've seen so far." "scikit-learn's ease-of-use, performance and overall variety of algorithms implemented has proved invaluable [...]."

Community – Supported Platforms Cassandra is completely free to download, use and share. DataStax Community is a free packaged distribution of Apache Cassandra™ made available by DataStax. There’s no faster, easier way to get started with the latest development release of Apache Cassandra than to download, install, and use DataStax Community.

To create a super-intelligent machine, start with an equation Intelligence is a very difficult concept and, until recently, no one has succeeded in giving it a satisfactory formal definition. Most researchers have given up grappling with the notion of intelligence in full generality, and instead focus on related but more limited concepts, but I argue that mathematically defining intelligence is not only possible, but crucial to understanding and developing super-intelligent machines. From this, my research group has even successfully developed software that can learn to play Pac-Man from scratch. Let me explain, but first, we need to define "intelligence".

Knowledge Navigator Introduction Ontology4 provides three different levels of query languages. Predicate Language has a simpler syntax and is closer to first-order predicates. PQL expressions are translated to OQL queries. OQL has a set of functions, which can be directly translated to SQL queries for the conceptual model of Ontology4. SQL queries are executed against the Ontology4 knowledge bases.

Google's Mind-Blowing Big-Data Tool Grows Open Source Twin Silicon Valley startup MapR has launched an open source project called Drill, which seeks to mimic a shockingly effective data-analysis tool built by Google. Mike Olson and John Schroeder shared a stage at a recent meeting of Silicon Valley’s celebrated Churchill Club, and they didn’t exactly see eye to eye. Olson is the CEO of a Valley startup called Cloudera, and Schroeder is the boss at MapR, a conspicuous Cloudera rival. Both outfits deal in Hadoop, a sweeping open source software platform based on data center technologies that underpinned the rise of Google’s web-dominating search engine, but in building their particular businesses, the two startups approached Hadoop from two very different directions. Whereas Cloudera worked closely with the open source Hadoop project to enhance the software code that’s freely available to the world at large, MapR decided to rebuild the platform from the ground up, and when that was done, it sold the new code as proprietary software.

About Background Open source tools have recently reached a level of maturity which makes them suitable for building large-scale real-world systems. At the same time, the field of machine learning has developed a large body of powerful learning algorithms for a wide range of applications. Inspired by similar efforts in bioinformatics (BOSC) or statistics (useR), our aim is to build a forum for open source software in machine learning.

OAuth Updated on Mon, 2013-03-11 12:22 Send secure authorized requests to the Twitter API Twitter uses OAuth to provide authorized access to its API.

Features Autopoietic Computing Proposed by: on 12/30/2013 Reality augmented autopoietic social structures A self replicating machine is a machine which can make a copy of itself.

Personalized Exploratory Search in the Semantic Web Michal Tvarožek Faculty of Informatics and Information Technologies Slovak University of Technology in Bratislava Ilkovičova 3, 842 16 Bratislava, Slovakia Abstract. Effective access to information on the Web, which has become vital to many users and to the whole society, is being hampered by information overload, unavailability of information, navigation issues and user diversity. We aim to facilitate the slow adoption of the Semantic Web by devising an enhanced faceted semantic browser with support for multi-paradigm exploration, personalized recommendation and adaptive view generation. We employ facet and restriction selection, ordering and annotation to address information overload and user guidance, and adaptive view generation with incremental graph visualization to enable end-user grade exploration of semantic web content.

How MySpace Tested Their Live Site with 1 Million Concurrent Users This is a guest post by Dan Bartow, VP of SOASTA, talking about how they pelted MySpace with 1 million concurrent users using 800 EC2 instances. I thought this was an interesting story because: that's a lot of users, it takes big cajones to test your live site like that, and not everything worked out quite as expected. I'd like to thank Dan for taking the time to write and share this article.
