Text analysis

Orange

Twitter. Yahoo pipes and open source clones. EMML. Semantic technologies. Google+ Statistics on SocialStatistics.com.

Weka 3 - Data Mining with Open Source Machine Learning Software in Java. Weka is a collection of machine learning algorithms for data mining tasks. It contains tools for data preparation, classification, regression, clustering, association rules mining, and visualization. Found only on the islands of New Zealand, the Weka is a flightless bird with an inquisitive nature. Weka is open source software issued under the GNU General Public License.

We have put together several free online courses that teach machine learning and data mining using Weka. Weka supports deep learning!

About us

Viralheat is a social management platform that empowers enterprise businesses to use the social web to listen and learn about customers in order to build meaningful, deep, and relevant business connections across multiple social networks.

Developed to provide a comprehensive and unified set of social marketing and management capabilities, Viralheat is the only software platform an enterprise business needs to monitor, create, publish, or analyze its social activities. Founded and developed by two Silicon Valley engineers with experience in the security and network industry, Viralheat offers the most powerful technology available in the social space, while providing the compliance and scalability to drive successful business results across marketing, sales, support, or any other functional organization. Launched in 2011, Viralheat is headquartered in Silicon Valley and serves thousands of businesses worldwide.

IN-SPIRE

The text-mining and semantic annotation architecture.

Jeff's Search Engine Caffè: Java Open Source NLP and Text Mining tools. See my related post on Open-Source Search Engine Libraries. Here are some of the open source NLP and machine learning tools for text mining, information extraction, text classification, clustering, approximate string matching, language parsing and tagging, and more.

I've tried to roughly group the tools. However, the categories are quite loose and many of the tools fit into multiple categories.

Machine learning and data mining

Weka - a collection of machine learning algorithms for data mining. It is one of the most popular text classification frameworks.

Apache Lucene Mahout - An incubator project to create highly scalable distributed implementations of common machine learning algorithms on top of the Hadoop map-reduce framework.

NLP Tools

LingPipe - (not technically "open source", see below) Alias-i's LingPipe is a suite of Java tools for linguistic processing of text, including entity extraction, part-of-speech (POS) tagging, clustering, classification, etc.

Demos

List of Demos. Each demo page contains instructions and examples of running on the web, as a command, and in a GUI.

Echo Demo: Simply echoes input to output; transcodes character sets and normalizes HTML.
Sentence Demo: Extract sentences from text.
Part of Speech Demo: Assign parts of speech to words.
Named Entity Demo: Extract entity mentions from text.

Quick Start Instructions

Ways to Run the Demos: LingPipe's demos are available on the web, as shell commands, and through a graphical user interface (GUI).

Content Types: Plain, HTML, XML. The demos all support XML, HTML, and plain text input.

About Us. Index. OpenNLP - Documentation. Mashpoint (alpha)