Search Engine

Cogniva Research Blog » How to Set up Solr and ManifoldCF on an Ubuntu Based Computer. This blog post is intended to provide some guidance on how to set up a computer to run Apache Solr and Apache ManifoldCF. Solr is a search server built on top of Lucene. It provides a web UI and a variety of features such as document text extraction (via Apache Tika). ManifoldCF is a utility for scheduling jobs and providing repository connectors. We have used it to import documents from both a Windows (CIFS) file share and MS SharePoint 2010 into Solr.

This guide was written while installing and configuring Solr and ManifoldCF on a VirtualBox virtual machine running Linux Mint 15 (Mate) x64. The development of this guide was a joint effort of Chris Salter and myself. Figures: Figure 1: Solr test; Figure 2: Solr WebUI; Figure 3: ManifoldCF WebUI; Figure 4: Solr Output Connection; Figure 5: connectors.xml. Steps covered: Open a browser; Connect ManifoldCF to Solr; Add Windows File Share Support; Create new List Authority Connection to Windows File Share.

Intro To Search API (Part 1) - How To Create Search Pages | Web Wash. The core Search module in Drupal 7 is great for simple search pages; however, the configuration options are fairly limited. If you want to change the look and feel of the search results, you could use the Display Suite Search sub-module that ships with Display Suite.

You can go one step further and create a custom search page using just Views. All you need to do is create a page display and expose the Search: Search Terms filter, and you're done. But the filter still relies on the index data that the Search module creates. If you want flexibility and control over your search pages, then you should take a serious look at the Search API module. The biggest benefit of using the module is that it supports a number of search backends, such as Apache Solr, Xapian and MongoDB, or you can store the index directly in a database. In this tutorial, we'll create a custom search page using Views and Search API for the results. Getting Started: If you use Drush, run the following command:

Crawl and index files and directories | Open Semantic Search. Crawl and index directories and files from your filesystem.

If you use Linux, that means you can crawl whatever is mountable on Linux (a hard disk or partitions formatted with FAT, ext3 or ext4, a file server connected via NTFS, shares like SMB, or even sshfs or SFTP on servers) into Apache Solr. It integrates automatic text recognition (OCR) for images and photos (e.g. files like PNG, JPG, GIF ...) or inside PDFs (e.g. scanned documents) using Tesseract-OCR.

Usage

Index a file
Using the web admin interface: Open the page Files. Enter the filename in the form. Press the button "crawl".
Using the command line: solr-index-file filename
Using the REST-API:

Index directories
Using the web admin interface: Open the page Files. Enter the directory name in the form. Press the button "crawl".
Using the command line: solr-index-dir directoryname
Using the REST-API:

Config
Config file for indexing files: /etc/solr/solr-connector-files
OCR (text recognition in graphical formats): Enable OCR, OCR language.
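The command-line usage above could be scripted roughly as follows. This is only a sketch: the two tool names `solr-index-file` and `solr-index-dir` come from the excerpt, while the helper names `build_index_command` and `index_path` are hypothetical and the tools are assumed to be on the PATH.

```python
import os
import subprocess


def build_index_command(path):
    """Return the argument list for indexing `path`.

    Directories go to solr-index-dir, everything else to solr-index-file
    (both tool names are taken from the Open Semantic Search excerpt).
    """
    tool = "solr-index-dir" if os.path.isdir(path) else "solr-index-file"
    return [tool, path]


def index_path(path):
    """Run the indexing tool; raises CalledProcessError if it exits non-zero.

    Assumes the Open Semantic Search command-line tools are installed.
    """
    return subprocess.run(build_index_command(path), check=True)
```

Keeping the command construction separate from the call that runs it makes it possible to check which tool would be invoked without having Open Semantic Search installed.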

Connecting Drupal to Solr Server. Happily, Solr also plays nicely with Drupal, so my colleagues want to connect their Drupal application to the installed Solr server (see my last blog entry). Drupal provides two modules for Solr server integration. The modules link an external Solr server with the Drupal application, passing data into Solr to index and then enabling Drupal to serve up the search results. As of this writing, each of the two modules has an advantage and a disadvantage. The first is Search API Solr Search, with the advantage that you can use the Solr index directly in Views. The disadvantage is that you cannot index files remotely on the Solr server. Whichever module we choose, however, the installation of the configuration files is the same. Next I have to copy the module configuration files from /solr-conf/solr-4.x/* to /opt/apache-solr-4.3.1/my-app/solr/collection1/conf/.
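The copy step could be sketched in Python as below. The function name `copy_solr_conf` is hypothetical, and the excerpt does not give the module's installation directory, so the source path has to be supplied by the caller. Like a plain `cp` of `solr-conf/solr-4.x/*`, the sketch copies only regular files and skips subdirectories.

```python
import shutil
from pathlib import Path


def copy_solr_conf(src_dir, conf_dir):
    """Copy every regular file from src_dir into conf_dir.

    Returns the sorted list of copied file names. Existing files with the
    same name in conf_dir are overwritten.
    """
    src = Path(src_dir)
    dst = Path(conf_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file():
            shutil.copy2(item, dst / item.name)  # copy2 also keeps timestamps
            copied.append(item.name)
    return copied
```

A call would look like `copy_solr_conf(module_dir / "solr-conf" / "solr-4.x", "/opt/apache-solr-4.3.1/my-app/solr/collection1/conf")`, where `module_dir` is a placeholder for wherever the Drupal module is installed. Solr generally needs a restart or a core reload before it picks up new configuration files.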

Apache Solr Attachments. The Apache Solr attachments module lets you extract documents using Solr server.

Uploading Data with Solr Cell using Apache Tika - Apache Solr Reference Guide. Solr uses code from the Apache Tika project to provide a framework for incorporating many different file-format parsers such as Apache PDFBox and Apache POI into Solr itself. Working with this framework, Solr's ExtractingRequestHandler can use Tika to support uploading binary files, including files in popular formats such as Word and PDF, for data extraction and indexing. When this framework was under development, it was called the Solr Content Extraction Library or CEL; from that abbreviation came this framework's name: Solr Cell. If you want to supply your own ContentHandler for Solr to use, you can extend the ExtractingRequestHandler and override the createFactory() method. This factory is responsible for constructing the SolrContentHandler that interacts with Tika, and allows literals to override Tika-parsed values.
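As a rough illustration of uploading a binary file to the ExtractingRequestHandler, the sketch below posts a file to the handler's conventional `/update/extract` path using only the Python standard library. The base URL, the core name, and the helper names `build_extract_url` and `upload_file` are assumptions for illustration; `literal.id`, `commit` and `literalsOverride` are Solr Cell request parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(solr_base, doc_id, literals_override=True, commit=True):
    """Build the Solr Cell request URL for one document.

    solr_base is the core URL, e.g. http://localhost:8983/solr/collection1
    (an assumed example, not taken from the guide).
    """
    params = {"literal.id": doc_id, "commit": "true" if commit else "false"}
    if not literals_override:
        # Ask Solr to append Tika-parsed values to the literal values
        params["literalsOverride"] = "false"
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


def upload_file(solr_base, path, doc_id, content_type="application/octet-stream"):
    """POST the raw file bytes to Solr and return the HTTP status code."""
    with open(path, "rb") as fh:
        body = fh.read()
    req = Request(
        build_extract_url(solr_base, doc_id),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urlopen(req) as resp:
        return resp.status
```

For example, `upload_file("http://localhost:8983/solr/collection1", "report.pdf", "doc1", "application/pdf")` would send the PDF for extraction, assuming a Solr instance with the extraction handler enabled is listening at that address.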

Set the parameter literalsOverride, which normally defaults to true, to false to append Tika-parsed values to literal values. Topics covered in this section: Key Concepts.

Solr Reference Guide - Apache Solr Reference Guide.

Install Solr on Tomcat. Beginning with Solr 5.0, Solr is no longer distributed as a "war" (Web Application Archive) suitable for deployment in any Servlet Container. Solr is now distributed as a standalone Java server application, including start and stop scripts for use on Unix and MS-Windows platforms, as well as an installation script for setting up a "production" installation of Solr on *nix platforms managed via /etc/init.d.

Solr has been tested on Tomcat 5.5, 6, and 7. In Tomcat 7 there was a bug with resolving URLs ending in "/". See the instructions in the generic Solr installation page for general info before consulting this page. Simple Example Install: Solr 4.3 requires a completely different deployment. Though this page needs to be completely rewritten for the latest Solr version, here are the main differences with Solr 4.3 (at least for running a single instance).

Installing Tomcat 6. Apache Tomcat is a web application server for Java servlets.

Building Solr.

Carrot