• The underlying properties of data
• The ways to represent the current status of the data
• The criteria to select relevant data and attributes
• The algorithms to analyze the selected data and attributes
• The ways to report the conclusions of the performed data analysis

The author, Philipp K. Janert, takes a designer's approach rather than an implementer's. That means you will gain useful suggestions and tips for planning a data analysis, rather than instructions for building a complete or partial information infrastructure with open-source tools such as Python, R, PostgreSQL, and Weka. For some developers, then, the lack of full programming constructs may be disappointing.
Apache Solr Search

Previously, I have used the core Search module and the Google Coop Custom Search Engine (CSE) on various sites. Google CSE never really met my needs because it would return results where a term was mentioned anywhere on a page (such as in the sidebar) but not in the specific post. The core search works pretty well in my opinion, but it's light on features: a basic workhorse that gets a basic job done.
This module integrates Drupal with the Apache Solr search platform. Solr search can be used as a replacement for core content search and boasts both extra features and better performance. If you're looking for Apache Solr integration, this is possibly the best option available.

Features
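To give a rough sense of what sits behind a Solr-backed search, here is a minimal sketch that builds a Solr `/select` query URL of the kind such a module issues over HTTP. The host, port, core name (`drupal`), and field names are illustrative assumptions, not details taken from the module itself:

```python
from urllib.parse import urlencode

def solr_select_url(base, core, query, rows=10, fields=None):
    """Build a Solr /select request URL for a keyword search.

    `base` and `core` are assumptions; adjust them to the actual
    Solr install and index the site uses.
    """
    params = {"q": query, "rows": rows, "wt": "json"}
    if fields:
        # fl limits which stored fields Solr returns per hit
        params["fl"] = ",".join(fields)
    return f"{base}/solr/{core}/select?{urlencode(params)}"

url = solr_select_url(
    "http://localhost:8983", "drupal", "content:apache solr",
    rows=5, fields=["id", "title", "score"],
)
print(url)
```

Sending a GET request to a URL like this returns ranked JSON results, which is where the extra features (field-scoped queries, faceting parameters, and so on) come from compared with core search.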
Search technology matters a great deal to anyone with a website. As a result, I've spent serious time looking at it, and several things have come from that time. The important one: we'll soon be adding "hosted site search" capabilities to the Acquia Network for our subscribers.
Just putting the question out to the blogosphere (love that word!): is there any interest in a hosted Lucene or Solr search service? It may be a non-starter, given that Google and Atomz have already wrapped up the "hosted web search" market segment, but perhaps not. Most people need to search their web data, true, but Google, Atomz, and the like index content only after it's published. Would there be much benefit in being able to index and search data before it's published to the web (perhaps with extra metadata that isn't easily publishable)? Or in searching data that is used for other, non-web-publishing activities?
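The pre-publication idea above can be sketched concretely: a draft document is sent to Solr's JSON update endpoint with extra internal metadata, and a filter query keeps it out of (or restricts a search to) unpublished content. The field names (`published`, `internal_tags`) and IDs here are hypothetical, not part of any real schema:

```python
import json

def solr_update_payload(docs):
    """Serialize a list of documents as the JSON body for a
    Solr /update request (POSTed with Content-Type: application/json)."""
    return json.dumps(docs)

# A hypothetical draft, indexed before it is ever published.
draft = {
    "id": "draft-42",
    "title": "Hosted search for unpublished content",
    "body": "Indexed and searchable before it reaches the public site.",
    "published": False,                        # drafts are indexed too
    "internal_tags": ["review-queue", "editor-only"],  # metadata never published
}
payload = solr_update_payload([draft])

# A matching Solr filter query that restricts a search to drafts only;
# inverting it (published:true) would hide them from public searches.
draft_filter = "published:false"
```

The design point is that the index, not the public site, becomes the system of record for search, so metadata that is awkward to publish can still drive retrieval.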
Cobalt: $20/mo, 2 managed indexes, 250,000 documents
Chromium: $50/mo, 5 managed indexes, 500,000 documents
Platinum: $100/mo