NUCULAR fielded text searchable indexing: Documentation
Nucular is a system for creating full text indices for fielded data.
It can be accessed via a Python API or via a suite of command line
interfaces.
Nucular archives fielded documents and retrieves them based on field
value, field prefix, field word prefix, full text word prefix,
word proximity, or combinations of these. Nucular also includes features
for determining values related to a query, often called query facets.
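The retrieval modes listed above can be illustrated with a small in-memory sketch. This is plain Python, not the Nucular API: the sample records, field names, and function names are all made up for illustration, and a real index would not scan every document.

```python
# Toy records; the identifiers, field names, and values are hypothetical.
DOCS = {
    "doc1": {"title": "Dog training basics", "author": "Smith"},
    "doc2": {"title": "Cat behavior", "author": "Smythe"},
    "doc3": {"title": "Training cats and dogs", "author": "Jones"},
}


def words(text):
    """Split a field value into lowercase words."""
    return text.lower().split()


def match_field_value(docs, field, value):
    """Field value: the whole field equals the given value."""
    return sorted(i for i, d in docs.items() if d.get(field) == value)


def match_field_prefix(docs, field, prefix):
    """Field prefix: the field value starts with the given prefix."""
    return sorted(i for i, d in docs.items()
                  if d.get(field, "").startswith(prefix))


def match_field_word_prefix(docs, field, prefix):
    """Field word prefix: some word in the field starts with the prefix."""
    p = prefix.lower()
    return sorted(i for i, d in docs.items()
                  if any(w.startswith(p) for w in words(d.get(field, ""))))


def match_any_word_prefix(docs, prefix):
    """Full text word prefix: some word in any field starts with the prefix."""
    p = prefix.lower()
    return sorted(i for i, d in docs.items()
                  if any(w.startswith(p) for v in d.values() for w in words(v)))
```

Combinations of these criteria then amount to intersecting the identifier lists that the individual matches return.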
Features
- Nucular is very lightweight. Updates and accesses do not require
any server process or other system support such as shared memory
locking.
- Nucular supports concurrency. Arbitrary concurrent updates and accesses
by multiple processes or threads are
supported, with no possible locking issues.
- Nucular supports document threading in the manner
of USENET replies. Built-in semantics allow "follow ups" to
messages to match patterns that match the "original" messages.
- Nucular indexes and retrieves data quickly.
NEWS
0.5 Adds boolean query syntax. Now if you want dogs and not
cats you can run L = session.dictionaries("dogs ~cats").
This provides an advanced terse query interface.
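To make the meaning of a query such as "dogs ~cats" concrete, here is a minimal sketch of that kind of matching in plain Python. It does not use or reproduce Nucular's query parser. It assumes that terms are separated by whitespace, that a leading "~" negates a term, and that terms match whole words (the function name and sample records are hypothetical).

```python
def matches(query, document):
    """Return True if every plain term occurs and no '~' term occurs."""
    doc_words = set()
    for value in document.values():
        doc_words.update(value.lower().split())
    for term in query.lower().split():
        if term.startswith("~"):
            # Negated term: reject the document if the word is present.
            if term[1:] in doc_words:
                return False
        elif term not in doc_words:
            # Plain term: reject the document if the word is absent.
            return False
    return True


# Hypothetical sample records.
docs = [
    {"title": "walking dogs"},
    {"title": "cats chasing dogs"},
    {"title": "feeding cats"},
]
hits = [d for d in docs if matches("dogs ~cats", d)]
```

Here only the first record is kept: the second mentions cats, and the third never mentions dogs.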
0.4 Data format is incompatible with previous releases.
Do not attempt to open an archive created by a previous release
using the 0.4 release -- it won't work. Regenerate the archive
instead.
Table space wrapper added. The 0.4 release includes a
simple wrapper that makes Nucular easier to use for developing
data organizations similar to SQL databases (see documentation below).
This release also includes a number of bugfixes.
Nucular now supports WIN32. Current releases of Nucular
abstract the file system in order to emulate file system features
missing on NT file systems, whose absence prevented older versions
from running correctly on Windows NT based systems.
Proximity search added: Current versions of Nucular allow
queries to search for a sequence of words near each other, separated
by no more than a specified number of other words.
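The proximity criterion can be sketched as follows. This is a plain Python illustration, not Nucular's implementation. It assumes the words must appear in the given order, with at most max_gap other words between each consecutive pair (the function and parameter names are hypothetical).

```python
def words_near(text, sequence, max_gap):
    """True if the words of `sequence` occur in order in `text`, each
    separated from the previous one by at most `max_gap` other words."""
    tokens = text.lower().split()
    wanted = [w.lower() for w in sequence]
    if not wanted:
        return True

    def follows(position, index):
        # `position` is where wanted[index - 1] was matched.
        if index == len(wanted):
            return True
        window_end = min(len(tokens), position + max_gap + 2)
        for nxt in range(position + 1, window_end):
            if tokens[nxt] == wanted[index] and follows(nxt, index + 1):
                return True
        return False

    return any(tok == wanted[0] and follows(i, 1)
               for i, tok in enumerate(tokens))
```

For the text "the quick brown fox jumps", the sequence quick, fox matches with a gap of 1 (one word, "brown", lies between them) but not with a gap of 0.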
Faceted suggestions: Nucular queries now support faceted suggestions
of field values that are related to a query.
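As an illustration of the idea behind facets, the sketch below counts the values of one field across the documents that matched a query, so the most common values can be offered as suggestions. It is plain Python under stated assumptions, not the Nucular facet API: the records, field names, and function name are made up.

```python
from collections import Counter


def facet_counts(matching_docs, field):
    """Count how often each value of `field` occurs among the matches."""
    return Counter(d[field] for d in matching_docs if field in d)


# Hypothetical documents that matched a query for the word "dog".
matching = [
    {"title": "dog leash", "category": "pets", "brand": "Acme"},
    {"title": "dog bowl", "category": "pets", "brand": "Bolt"},
    {"title": "hot dog buns", "category": "food", "brand": "Acme"},
]

# Values of "category" ordered from most to least frequent.
suggestions = facet_counts(matching, "category").most_common()
```

A user could then narrow the query by picking one of the suggested values, for example category "pets".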
Faster index builds: Current releases of Nucular
have completely revamped internal data structures which build
indices much faster (and query a bit faster also). For example
some large builds run more than 8 times faster than previously.
Documentation
The following HTML documents attempt to explain why and how
to use the nucular archiving system.
- The overview provides a general overview of the
major concepts of the nucular system.
- The script examples provides an introduction
to interacting with a nucular archive using command line scripts.
- The API examples provides an introduction to
interacting with a nucular archive using the Python Applications Programmer Interface
(API).
- The script summary documents the usages
for the command line scripts provided for interacting with a nucular archive.
- The API Summary summarizes the methods available
for interacting with a nucular archive from within a Python program.
- The Tablespace wrapper API document describes the
nTableSpace
wrapper functionality, which allows easy development in Python of
SQL-like relational database structures on top of Nucular archives.
- The demos and examples document discusses the various test
and example programs, scripts, and CGI web interfaces provided with the package.
Change control
The canonical change control archive for nucular is at
http://aaron.oirt.rutgers.edu/cgi-bin/nucularRepo.cgi.
This archive may include experimental features not
available in the current SourceForge release.
Dependencies
The parts of nucular that parse XML input require the presence
of some version of the
ElementTree XML parsing module.
ElementTree is a
standard component of recent versions of Python (2.5 and later).
Nucular requires no other components outside the standard Python
distribution.
If your Python installation lacks ElementTree, please see the
getElementTree.sh shell script for an example of how to get it.
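A common way to check whether ElementTree is available is to try the standard library location first and fall back to the separately installed elementtree package. The sketch below shows that idiom. It is not necessarily how Nucular itself performs the import, and the XML snippet is a made-up example rather than the Nucular input format.

```python
try:
    # Bundled with Python 2.5 and later.
    import xml.etree.ElementTree as ElementTree
except ImportError:
    # Older Pythons: the separately installed "elementtree" package.
    from elementtree import ElementTree

# Parse a small, hypothetical XML fragment to confirm the parser works.
root = ElementTree.fromstring("<entry id='1'><fld n='title'>dogs</fld></entry>")
title = root.find("fld").text
```

If both imports fail, ElementTree needs to be installed before the XML parsing parts of the package can be used.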
Installation
To install the package in the standard library locations
unpack the archive and change directory to the top level directory
for the package and run
python setup.py install
If that does not work (for example, because you lack permission to write
to those directories), put the "nucular" directory somewhere on the
PYTHONPATH in some other manner.
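One such manner is to extend the module search path from inside a program before importing the package. The sketch below assumes the unpacked release lives in a hypothetical directory under the user's home directory. The helper name is made up, and the path must be the directory that contains the "nucular" folder, not the folder itself.

```python
import os
import sys


def add_package_parent(parent_dir):
    """Make packages located inside parent_dir importable in this process."""
    parent_dir = os.path.abspath(parent_dir)
    if parent_dir not in sys.path:
        sys.path.insert(0, parent_dir)
    return parent_dir


# Hypothetical location of the unpacked release.
added = add_package_parent(
    os.path.join(os.path.expanduser("~"), "src", "nucular-release"))
# A subsequent "import nucular" would be resolved from that directory,
# provided the "nucular" folder really is there.
```

Setting the PYTHONPATH environment variable to the same directory has the same effect for every Python program started from that environment.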
Caveats
Updates must be combined into optimized index structures on
a periodic basis using "aggregation" operations. Allowing too many
updates to accumulate without aggregation will degrade performance.
License
Nucular may be used and copied under the BSD-style open source
license.
Aaron Watters (Oct 2007)