Full text search software

From OpenSourceGov

Software for quickly searching a large number of documents (up to millions or hundreds of millions). Also called 'enterprise search'. It may also be used to provide search facilities on websites.

Such software is also referred to as a search engine, which can lead to confusion with web search services such as Google (the situation isn't made easier by the fact that Google themselves produce a full text search appliance).

The operations carried out by this software divide into two steps: indexing, where text is extracted from the individual source documents and built into an index, and searching, where queries are answered from that index. Indexing often takes some time, because the work is done up front so that searching can be fast.
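
The split between indexing and searching can be illustrated with a minimal sketch in Python. It builds a simple inverted index (a map from each term to the documents that contain it), which is one common way of structuring such an index, not necessarily how any particular product does it. The function names and the sample documents are invented for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexing step: map each term to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Search step: return the ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Look up each term's set of documents, then keep only the common ids.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample documents, keyed by an arbitrary id.
documents = {
    1: "Minutes of the council planning meeting",
    2: "Planning application for a new library",
    3: "Library opening hours",
}

index = build_index(documents)                     # slow part, done once
print(sorted(search(index, "planning")))           # fast lookups afterwards
print(sorted(search(index, "library planning")))
```

Building the index touches every word of every document, whereas a search only looks up the query terms, which is why the cost is concentrated in the indexing step. Real search software adds much more on top of this, such as relevance ranking, stemming and extracting text from file formats like PDF.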

Some examples

Lucene is a search library hosted by the Apache Software Foundation. Solr is a search server based on Lucene. Both are written in Java, although Lucene has also been ported to other languages.

Xapian is a search library written in C++ with bindings to languages including Python, PHP, Perl, Java, C# and Ruby. Flax is an enterprise search platform based on Xapian. mySociety use Xapian extensively for government-related projects, for example in TheyWorkForYou.

Resources

List of free and low-cost search engines

SearchTools.com open source engines (a little out of date)