NUCULAR fielded text searchable indexing: Documentation

nucular project page with download links

Nucular is a system for creating full text indices for fielded data. It can be accessed via a Python API or via a suite of command line interfaces.

Nucular archives fielded documents and retrieves them by field value, field prefix, field word prefix, full-text word prefix, word proximity, or any combination of these. Nucular can also determine field values related to a query, often called query facets.
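The match modes listed above can be illustrated with a toy in-memory sketch. This is plain Python and does not use the nucular package; the function names, the sample documents, and the simplistic lowercase tokenizer are all assumptions made for illustration, not Nucular's actual API or matching rules.

```python
import re

def words(text):
    """Split text into lowercase words (an assumed, simplistic tokenizer)."""
    return re.findall(r"\w+", text.lower())

def field_equals(doc, field, value):
    """Field value match: the whole field equals the value."""
    return doc.get(field) == value

def field_prefix(doc, field, prefix):
    """Field prefix match: the field value starts with the prefix."""
    return doc.get(field, "").startswith(prefix)

def field_word_prefix(doc, field, prefix):
    """Field word prefix match: some word in the field starts with the prefix."""
    return any(w.startswith(prefix.lower()) for w in words(doc.get(field, "")))

def any_word_prefix(doc, prefix):
    """Full text word prefix match: some word in any field starts with the prefix."""
    return any(field_word_prefix(doc, field, prefix) for field in doc)

# Hypothetical fielded documents keyed by identifier.
docs = {
    "m1": {"author": "Ann Lee", "subject": "Dog training tips"},
    "m2": {"author": "Bob Ray", "subject": "Cat grooming"},
}

def select(predicate):
    """Return the sorted identifiers of documents satisfying the predicate."""
    return sorted(key for key, doc in docs.items() if predicate(doc))
```

Combining modes is then a matter of combining predicates, for example select(lambda d: field_prefix(d, "subject", "Dog") and any_word_prefix(d, "le")).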

Features

Nucular is very lightweight. Updates and accesses do not require any server process or other system support such as shared memory locking.
Nucular supports concurrency. Arbitrary concurrent updates and accesses by multiple processes or threads are supported, with no possible locking issues.
Nucular supports document threading in the manner of USENET replies. Built-in semantics allow "follow-ups" to a message to match patterns that match the "original" message.
Nucular indexes and retrieves data quickly.

NEWS

0.5 Adds boolean query syntax. For example, to find entries that mention dogs but not cats, run L = session.dictionaries("dogs ~cats"). This provides a terse interface for advanced queries.
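The meaning of a query such as "dogs ~cats" can be sketched with a toy evaluator. This is not Nucular's implementation: it assumes only the two forms visible in the example (a plain term that must appear and a ~term that must not), and it assumes case-insensitive whole-word matching. The function name and sample data are hypothetical.

```python
import re

def matches(query, doc):
    """True if doc contains every plain term and none of the ~negated terms.

    Toy semantics only: terms are whitespace separated, matching is
    case-insensitive and on whole words across all fields of the document.
    """
    doc_words = set()
    for value in doc.values():
        doc_words.update(re.findall(r"\w+", value.lower()))
    for term in query.lower().split():
        if term.startswith("~"):
            if term[1:] in doc_words:
                return False  # an excluded word is present
        elif term not in doc_words:
            return False  # a required word is missing
    return True

# Hypothetical documents.
pets = {
    "a": {"body": "Dogs chase cats"},
    "b": {"body": "Dogs like long walks"},
    "c": {"body": "Cats sleep all day"},
}

hits = sorted(key for key, doc in pets.items() if matches("dogs ~cats", doc))
```

Here hits contains only "b", the one document that mentions dogs without mentioning cats.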

0.4 Data format is incompatible with previous releases. Do not attempt to open an archive created by a previous release using the 0.4 release -- it won't work. Regenerate the archive instead.

Table space wrapper added. The 0.4 release includes a simple wrapper that makes Nucular easier to use for developing data organizations similar to SQL databases (see documentation below). This release also includes a number of bugfixes.

Nucular now supports WIN32. Current releases of Nucular abstract the file system in order to emulate file system features missing on NT file systems; the absence of these features prevented older versions from running correctly on Windows NT based systems.

Proximity search added: Current versions of Nucular allow queries to search for a sequence of words near each other, separated by no more than a specified number of other words.
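As a rough illustration of what a proximity match means, the following toy function checks whether a sequence of words occurs in order with a bounded gap. It is a sketch under stated assumptions, not Nucular's algorithm: the limit is assumed to apply between each consecutive pair of words, matching is assumed to be case-insensitive on whole words, and the name proximate is hypothetical.

```python
import re

def proximate(text, sequence, max_gap):
    """True if the words of sequence occur in text in order, with at most
    max_gap other words between each consecutive pair (assumed semantics)."""
    tokens = re.findall(r"\w+", text.lower())
    sequence = [word.lower() for word in sequence]
    if not sequence:
        return True

    def search(previous, remaining):
        # previous is the token index of the last matched word.
        if not remaining:
            return True
        # The next word may sit at most max_gap tokens past the previous match.
        stop = min(previous + max_gap + 2, len(tokens))
        return any(
            tokens[i] == remaining[0] and search(i, remaining[1:])
            for i in range(previous + 1, stop)
        )

    return any(
        tokens[i] == sequence[0] and search(i, sequence[1:])
        for i in range(len(tokens))
    )

text = "the quick brown fox jumps"
near = proximate(text, ["quick", "fox"], 1)      # one word ("brown") in between
too_far = proximate(text, ["quick", "fox"], 0)   # requires adjacency
```

With these inputs near is true and too_far is false, because exactly one other word separates "quick" from "fox".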

Faceted suggestions: Nucular queries now support faceted suggestions for values for fields which are related to a query.
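The idea behind faceted suggestions can be sketched as counting the values a field takes among the documents that match a query. The helper below is a toy illustration in plain Python; the function name, the field names, and the data are hypothetical and say nothing about how Nucular computes or ranks its suggestions.

```python
from collections import Counter

def facets(matching_docs, field):
    """Count the values of field across documents that already match a query."""
    return Counter(doc[field] for doc in matching_docs if field in doc)

# Hypothetical result set for some query about dogs.
matching = [
    {"subject": "Dog training", "category": "pets"},
    {"subject": "Dog shows", "category": "events"},
    {"subject": "Dog food", "category": "pets"},
    {"subject": "Dog parks"},  # no category field at all
]

suggestions = facets(matching, "category")
```

Each key of suggestions is a candidate value for narrowing the query, and each count indicates how many matching documents would remain after narrowing.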

Faster index builds: Current releases of Nucular have completely revamped internal data structures which build indices much faster (and query a bit faster also). For example, some large builds run more than 8 times faster than previously.

Documentation

The following HTML documents attempt to explain why and how to use the nucular archiving system.

The overview introduces the major concepts of the nucular system.
The script examples document provides an introduction to interacting with a nucular archive using command line scripts.
The API examples document provides an introduction to interacting with a nucular archive using the Python Application Programming Interface (API).
The script summary documents the usages for the command line scripts provided for interacting with a nucular archive.
The API Summary summarizes the methods available for interacting with a nucular archive from within a Python program.
The Tablespace wrapper API document describes the nTableSpace wrapper, which allows easy development of SQL-like relational database structures on top of Nucular archives in Python code.
The demos and examples document discusses the various test and example programs, scripts, and CGI web interfaces provided with the package.

Change control

The canonical change control archive for nucular is at http://aaron.oirt.rutgers.edu/cgi-bin/nucularRepo.cgi. This archive may include experimental features not available in the current SourceForge release.

Dependencies

The parts of nucular that parse XML input require some version of the ElementTree XML parsing module. ElementTree is a standard component of recent versions of Python. Nucular requires no other components that are not standard parts of Python. If your Python installation lacks ElementTree, please see the getElementTree.sh shell script for an example of how to get it.
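A quick way to check whether ElementTree is available is to try the standard library location first and fall back to the separately distributed elementtree package. This is a common import pattern, shown here as a sketch; it is not necessarily how nucular itself performs the import, and the XML snippet is made up for the check rather than taken from Nucular's input format.

```python
try:
    # Standard library location (Python 2.5 and later).
    import xml.etree.ElementTree as ElementTree
except ImportError:
    # Older installations: the separately distributed elementtree package.
    import elementtree.ElementTree as ElementTree

# Parse a small made-up document to confirm the module works.
root = ElementTree.fromstring('<entry id="m1"><subject>Dog training</subject></entry>')
subject = root.find("subject").text
```

If neither import succeeds, the interpreter raises ImportError, which indicates that ElementTree needs to be installed before the XML features can be used.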

Installation

To install the package in the standard library locations, unpack the archive, change directory to the top-level directory of the package, and run

python setup.py install

If that does not work (for example, because you cannot write to the required directories), put the "nucular" directory somewhere on the PYTHONPATH in some other manner.
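One such manner is to extend the module search path from inside a program before importing the package. The sketch below uses a hypothetical location; substitute the directory that contains the "nucular" directory on your system.

```python
import os
import sys

# Hypothetical location: the directory that CONTAINS the "nucular" directory.
package_parent = os.path.expanduser("~/src/nucular-release")

# Put it at the front of the search path unless it is already listed.
if package_parent not in sys.path:
    sys.path.insert(0, package_parent)

# After this point "import nucular" should resolve, assuming the
# directory above really holds the unpacked package.
```

Setting the PYTHONPATH environment variable to the same directory before starting Python has an equivalent effect without changing any code.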

Caveats

Updates must be combined into optimized index structures on a periodic basis using "aggregation" operations. Allowing too many updates to accumulate without aggregation will degrade performance.

License

Nucular may be used and copied under the BSD-style open source license.

Aaron Watters (Oct 2007)
