
NUCULAR fielded text searchable indexing: Documentation

Nucular is a system for creating full text indices for fielded data. It can be accessed via a Python API or via a suite of command line interfaces.

Nucular archives fielded documents and retrieves them by field value, field prefix, field word prefix, full text word prefix, word proximity, or combinations of these. Nucular also includes features for determining values related to a query, often called query facets.
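To make the difference between these lookup kinds concrete, here is a minimal sketch of their semantics over a few in-memory documents. It is a linear scan for illustration only; the function names and sample data are hypothetical and are not Nucular's API or its index structures.

```python
# Hypothetical sample data: each document maps field name -> text.
DOCS = {
    "d1": {"title": "Hot dogs", "body": "Dogs and cats in the summer"},
    "d2": {"title": "Cat care", "body": "Cats need water"},
    "d3": {"title": "Dog days", "body": "Walking dogs in August"},
}


def words(text):
    """Lowercased whitespace tokens (a real tokenizer would also strip punctuation)."""
    return text.lower().split()


def field_value(docs, field, value):
    """Documents whose field equals value exactly."""
    return {d for d, f in docs.items() if f.get(field) == value}


def field_prefix(docs, field, prefix):
    """Documents whose whole field value starts with prefix (case-sensitive here)."""
    return {d for d, f in docs.items() if f.get(field, "").startswith(prefix)}


def field_word_prefix(docs, field, prefix):
    """Documents where some word of the given field starts with prefix."""
    p = prefix.lower()
    return {d for d, f in docs.items()
            if any(w.startswith(p) for w in words(f.get(field, "")))}


def text_word_prefix(docs, prefix):
    """Documents where some word in any field starts with prefix."""
    p = prefix.lower()
    return {d for d, f in docs.items()
            if any(w.startswith(p) for v in f.values() for w in words(v))}
```

Combinations are then set operations: field_word_prefix(DOCS, "title", "dog") & text_word_prefix(DOCS, "cat") yields {"d1"}.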

Features

NEWS

0.5 Adds boolean query syntax. Now, if you want documents that mention dogs but not cats, you can run L = session.dictionaries("dogs ~cats"). This provides a terse interface for advanced queries.
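The example above suggests that a plain term is required and a term prefixed with ~ is excluded. The sketch below evaluates that reading over plain-text documents; it assumes space-separated terms are combined with AND and matches whole words case-insensitively. The evaluate function and sample data are hypothetical, and Nucular's actual query syntax may support more than this.

```python
DOCS = {
    1: "dogs bark at night",
    2: "dogs and cats fight",
    3: "cats sleep all day",
}


def evaluate(query, docs):
    """Return ids of docs containing every plain term and none of the ~terms.

    Assumption: terms separated by spaces are ANDed; '~term' means NOT term.
    """
    required, excluded = [], []
    for term in query.split():
        if term.startswith("~"):
            excluded.append(term[1:].lower())
        else:
            required.append(term.lower())
    result = set()
    for doc_id, text in docs.items():
        present = set(text.lower().split())
        if all(t in present for t in required) and not any(t in present for t in excluded):
            result.add(doc_id)
    return result
```

With these documents, evaluate("dogs ~cats", DOCS) returns {1}: document 2 is dropped because it mentions cats, and document 3 because it lacks dogs.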

0.4 The data format is incompatible with previous releases. Do not attempt to open an archive created by a previous release with the 0.4 release; it won't work. Regenerate the archive instead.

Table space wrapper added. The 0.4 release includes a simple wrapper that makes Nucular easier to use for developing data organizations similar to SQL databases (see documentation below). This release also includes a number of bugfixes.

Nucular now supports WIN32. Current releases of Nucular abstract the file system in order to emulate file system features missing on NT file systems; the absence of those features prevented older versions from running correctly on Windows NT based systems.

Proximity search added: Current versions of Nucular allow queries to search for a sequence of words near each other, separated by no more than a specified number of other words.
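The proximity condition can be illustrated with a small matcher over token positions. This sketch assumes the words must appear in the given order and that each word may be separated from the previous one by at most max_gap other words; near is a hypothetical helper, not Nucular's implementation. It tracks every position where a partial match can end, because greedily taking the earliest occurrence of each word can miss valid matches.

```python
def near(text, phrase, max_gap):
    """True if the words of phrase occur in order in text, each separated
    from the previous matched word by at most max_gap other words."""
    toks = text.lower().split()
    terms = phrase.lower().split()
    if not terms:
        return True
    # Positions where a valid partial match of terms[:k] can end.
    reachable = [i for i, t in enumerate(toks) if t == terms[0]]
    for term in terms[1:]:
        reachable = [
            j for j, t in enumerate(toks)
            if t == term and any(0 <= j - i - 1 <= max_gap for i in reachable)
        ]
        if not reachable:
            return False
    return bool(reachable)
```

For example, "quick fox" is near in "the quick brown fox jumps" with max_gap=1 (one word, "brown", intervenes) but not with max_gap=0.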

Faceted suggestions: Nucular queries now support faceted suggestions, that is, suggested values for fields that are related to a query.
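One common way to compute facets is to count the values of a field among the documents that match a query, so the most frequent values can be offered as refinements. The sketch below shows that idea; the matching and facet_counts helpers and the sample data are hypothetical, and this is not necessarily how Nucular computes its suggestions.

```python
from collections import Counter

DOCS = {
    1: {"author": "Smith", "text": "dogs bark"},
    2: {"author": "Jones", "text": "dogs run"},
    3: {"author": "Smith", "text": "dogs sleep"},
    4: {"author": "Brown", "text": "cats purr"},
}


def matching(docs, word):
    """Ids of documents whose text contains word (the 'query')."""
    return {d for d, f in docs.items() if word in f["text"].lower().split()}


def facet_counts(docs, ids, field):
    """Count the values of field among the matching documents."""
    return Counter(docs[d][field] for d in ids if field in docs[d])
```

Here a query for dogs matches documents 1 to 3, and facet_counts(DOCS, matching(DOCS, "dogs"), "author") suggests Smith (2 documents) and Jones (1); Brown does not appear because that document did not match.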

Faster index builds: Current releases of Nucular have completely revamped internal data structures, which build indices much faster (and query somewhat faster as well). For example, some large builds run more than 8 times faster than before.

Documentation

The following HTML documents attempt to explain why and how to use the nucular archiving system.

Change control

The canonical change control archive for nucular is at http://aaron.oirt.rutgers.edu/cgi-bin/nucularRepo.cgi. This archive may include experimental features not yet available in the current SourceForge release.

Dependencies

The parts of nucular that parse XML input require some version of the ElementTree XML parsing module. ElementTree is a standard component of recent versions of Python, and nucular needs nothing else beyond the standard Python distribution. If your Python installation lacks ElementTree, see the getElementTree.sh shell script for an example of how to get it.
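A common way for code to pick up ElementTree from whichever location is available is an import fallback: the standard library module xml.etree.ElementTree (Python 2.5 and later) first, then the older standalone elementtree package. The XML layout and the fields_from_xml helper below are hypothetical illustrations of reading a fielded document, not Nucular's actual input format.

```python
try:
    # Standard library location (Python 2.5+ and all Python 3 releases).
    import xml.etree.ElementTree as ET
except ImportError:
    # Older installations: the separately installed elementtree package.
    import elementtree.ElementTree as ET

# Hypothetical fielded document in XML form.
SAMPLE = """<document id="d1">
  <field name="title">Hot dogs</field>
  <field name="body">Dogs and cats</field>
</document>"""


def fields_from_xml(text):
    """Return (document id, {field name: field text}) from the sample layout."""
    root = ET.fromstring(text)
    return root.get("id"), {f.get("name"): f.text for f in root.findall("field")}
```

fields_from_xml(SAMPLE) returns ("d1", {"title": "Hot dogs", "body": "Dogs and cats"}).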

Installation

To install the package in the standard library locations, unpack the archive, change to the top-level directory of the package, and run
python setup.py install
If that does not work (for example, because you cannot write to the required directories), put the "nucular" directory somewhere on the PYTHONPATH in some other manner.
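Besides setting the PYTHONPATH environment variable, a single program can extend its own module search path at run time. This sketch, for a current Python 3 interpreter, assumes you have unpacked the package so that a directory of your choosing contains the nucular/ package directory; add_nucular_parent is a hypothetical helper and the path in the usage line is a placeholder.

```python
import importlib
import importlib.util
import sys


def add_nucular_parent(directory):
    """Put directory (the folder that *contains* the nucular/ package) on
    sys.path for this process and report whether nucular is now importable."""
    if directory not in sys.path:
        sys.path.insert(0, directory)
    importlib.invalidate_caches()  # rescan path entries on the next lookup
    return importlib.util.find_spec("nucular") is not None
```

Usage: call add_nucular_parent("/path/to/unpacked/lib") before import nucular; a False result means the nucular directory is not directly inside that folder.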

Caveats

Updates must periodically be combined into optimized index structures using "aggregation" operations. If many updates accumulate without being aggregated, performance will degrade.
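Why unaggregated updates slow things down can be seen in a toy model where each batch of updates is stored as its own small segment (a common design in incrementally updated indexes): every lookup must consult the base plus each pending segment, and aggregation merges them back into one structure. This is an illustrative assumption; the SegmentedIndex class is hypothetical, handles additions only (no deletions or replacements), and Nucular's internal layout may differ.

```python
class SegmentedIndex:
    """Toy inverted index: a merged base plus a list of unaggregated update segments."""

    def __init__(self):
        self.base = {}       # word -> set of doc ids (the "optimized" structure)
        self.segments = []   # one small word -> ids dict per update batch

    def update(self, docs):
        """Record a batch of {doc_id: text} as a new segment (cheap to write)."""
        seg = {}
        for doc_id, text in docs.items():
            for w in text.lower().split():
                seg.setdefault(w, set()).add(doc_id)
        self.segments.append(seg)

    def lookup(self, word):
        """Each query consults the base and every pending segment,
        so its cost grows with the number of unaggregated batches."""
        ids = set(self.base.get(word, ()))
        for seg in self.segments:
            ids |= seg.get(word, set())
        return ids

    def aggregate(self):
        """Merge all pending segments into the base so lookups touch one structure."""
        for seg in self.segments:
            for w, ids in seg.items():
                self.base.setdefault(w, set()).update(ids)
        self.segments = []
```

After two update calls a lookup scans two segments; after aggregate() it reads only the base and returns the same result.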

License

Nucular may be used and copied under a BSD-style open source license.
Aaron Watters (Oct 2007)


