The Nucular.Nucular session object provides the top-level interface for an archive. It stores and retrieves entry.Entry objects and creates Nucular.NucularQuery objects. It also provides methods for storing updates, evaluating queries using boolean query strings, and aggregating the archive.
An entry.Entry object is a collection of field names with values. It provides the primary unit of storage for a Nucular archive (from a user's perspective).
A Nucular.NucularQuery object represents a specification for a set of entry.Entry objects to extract from the archive. When successfully evaluated it generates a Nucular.NucularResult object.
A Nucular.NucularResult object encapsulates the set of entry.Entry objects which match the specification provided by a Nucular.NucularQuery object. It provides methods for extracting the entry.Entry objects or their identities, as well as statistical information about the result set.
An entry.Entry may contain a specialValues.SpecialValue object, which can refer to the contents of an external URL or image, or point to an internal index entry identity.
A Nucular archive resides in a file system directory, for example /usr/arw/data/Mondial/.
All access to such an archive is mediated by a Nucular.Nucular object. Create one using the following constructor:
    from nucular import Nucular
    session = Nucular.Nucular(directory)
If the session will make no changes to the archive, it will be slightly more efficient when opened read-only:
    from nucular import Nucular
    session = Nucular.Nucular(directory, readOnly=True)
Many sessions in many processes may update and query an archive at the same time without interfering with each other.
Calling session.create() will attempt to create the session data structures on disk. The create call will fail without making changes if the archive directory exists and is not empty. The calling program must make sure the archive directory is empty before calling create.
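Because create fails on a non-empty directory, a caller may want to check the directory before calling it. The following is a minimal sketch using only the Python standard library. The helper name archive_directory_is_ready is hypothetical and not part of Nucular, and treating a missing directory as acceptable is an assumption based on the failure condition stated above.

```python
import os
import tempfile


def archive_directory_is_ready(directory):
    """Return True if `directory` is missing or is an empty directory.

    Hypothetical helper, not part of Nucular: it only inspects the
    file system and never touches an archive.
    """
    if not os.path.exists(directory):
        return True
    return os.path.isdir(directory) and not os.listdir(directory)


# Demonstration on a freshly created (and therefore empty) directory.
with tempfile.TemporaryDirectory() as empty_dir:
    ready = archive_directory_is_ready(empty_dir)
    print(ready)
```

A caller could run this check immediately before session.create() and report a clear error instead of relying on the create failure.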
    session.store(lazy=True)
This will prevent other sessions from seeing the updates until the session has been aggregated. The advantage is that the updates have no performance impact on the other sessions; the disadvantage is that the other sessions will not see the updates until then.
To make the updates visible to all subsequent sessions immediately use
    session.store(lazy=False)
If too many sessions store visible updates before an aggregation, access time for the archive may get slower and the session objects will consume more memory.
    session.indexDictionary(identityString, dictionary)
Here the identityString should be a string uniquely identifying the new entry in the archive, and the dictionary should map string attribute names to archivable values (values suitable for marshalling using the marshal module, like strings, ints, longs, or tuples or lists of marshallables). The insert will not be permanent until the session is stored.
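Since archivable values are described as those the marshal module can serialize, a dictionary can be checked before it is indexed. This is a small sketch using the standard marshal module; the helper names are hypothetical, and the check reflects the interpreter running it, which may accept types that Nucular itself does not.

```python
import marshal


def is_marshallable(value):
    """Return True if the standard marshal module can serialize `value`."""
    try:
        marshal.dumps(value)
    except (ValueError, TypeError):
        # marshal rejects unsupported object types with an exception.
        return False
    return True


def unarchivable_fields(dictionary):
    """List the attribute names whose values marshal cannot serialize."""
    return sorted(name for name, value in dictionary.items()
                  if not is_marshallable(value))


candidate = {
    "name": "Mondial",
    "population": 1234,
    "tags": ["country", ("nested", 1)],
    "handle": object(),  # arbitrary instances cannot be marshalled
}
print(unarchivable_fields(candidate))
```

Running such a check before indexDictionary gives a per-field error report rather than a failure somewhere inside the store step.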
    from nucular import entry
    newEntry = entry.Entry(...)
    ...
    session.index(newEntry)
The insert will not be permanent until the session is stored.
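Because inserts only become permanent when the session is stored, a bulk load can index many dictionaries and store once at the end. The sketch below illustrates that pattern using only the indexDictionary and store calls described above. The function index_records and the RecordingSession class are hypothetical: RecordingSession is a stand-in that records calls so the example runs without an archive, not Nucular code.

```python
def index_records(session, records, lazy=True):
    """Index each (identityString, dictionary) pair, then store once.

    `session` is expected to offer indexDictionary and store as
    described above; nothing is permanent until store is called.
    """
    count = 0
    for identity, dictionary in records:
        session.indexDictionary(identity, dictionary)
        count += 1
    session.store(lazy=lazy)
    return count


class RecordingSession:
    """Hypothetical stand-in for a Nucular session (not Nucular code)."""

    def __init__(self):
        self.calls = []

    def indexDictionary(self, identityString, dictionary):
        self.calls.append(("indexDictionary", identityString))

    def store(self, lazy=True):
        self.calls.append(("store", lazy))


demo = RecordingSession()
indexed = index_records(demo, [("1", {"name": "a"}), ("2", {"name": "b"})])
print(indexed, demo.calls[-1])
```

With a real session, passing lazy=False instead would make the batch visible to other sessions immediately, at the cost described above.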
    resultObject = session.result(queryString)
    listOfDictionaries = session.dictionaries(queryString)
The syntax and usage of boolean query strings are described in greater detail in the Boolean query document. In addition to using boolean query strings, queries may also be constructed using query objects as described below.
specialValues.SpecialValue objects support these sorts of "pointer" values as an advanced usage feature. The special values may be used as values in dictionaries and entries inserted into a Nucular index. Use the following session methods to create special values.
session.ExpandedURL(URLstring) returns a "pointer" to the contents of a URL which will be indexed for searching in the indices and expanded into the full text upon retrieval. The full text itself will not be stored in the index directly.
session.UnExpandedURL(URLstring) returns a "pointer" to the contents of a URL which will be indexed for searching in the index but not expanded into full text upon retrieval. The full text will not be stored in the index directly.
session.UnIndexedURL(URLstring) returns a "pointer" to a URL which is not indexed for content and not stored in the indices.
session.ImageURL(URLstring) returns a "pointer" to an image object which is not indexed for content in the indices.
session.InternalLink(identityString) returns a "pointer" to an entry in the archive by its identity.
See the nBrowse.py HTTP browser code for examples of how these values may be used.
The freeTextOnly method suppresses indexing for word searching within attributes (fielded word searching). This reduction of functionality cuts the build time for some archives approximately in half. Suppress fielded word search functionality by calling:
    session.freeTextOnly()
To delete the entry with identity identityString from an archive use
    session.remove(identityString)
The delete will not be permanent until the session is stored.
To retrieve the entry with identity identityString use
    theEntry = session.describe(identityString)
    query = session.Query()
Before evaluation the query must be specified using the query API described below.
To create a query from XML text in the format used by the nucularQuery script, use
    query = session.QueryFromXMLText(XMLText)
Such a query may be evaluated immediately with no further initialization. Please see the nucularQuery script documentation for discussion of the XML format required.
    session.aggregateRecent(verbose=False, fast=True)
This operation should be performed "once in a while" for archives which are frequently updated.
If the fast parameter is True, most of the aggregation process will be done in memory with reduced disk accesses. This will be faster unless there are a great many updates to aggregate (many thousands or more). If fast is set to False, the aggregation operation will use disk storage and should work (possibly more slowly) for any number of updates.
As a side-effect, this operation will cause all deferred updates to become visible.
This operation may be performed with concurrent queries, updates, and deletes by other sessions in other processes.
    session.moveTransientToBase(verbose=False)
This operation should be performed whenever the aggregateRecent operation starts to get too slow (which should be relatively infrequent). For large archives this operation may take significant time to complete.
This operation may be performed with concurrent queries, updates, and deletes by other sessions in other processes.
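The two maintenance operations can be combined into one routine: aggregate regularly, and fall back to the heavier operation only when aggregation has become slow. The sketch below assumes both operations are invoked on the session object with the keyword arguments shown above. The function name, the 60 second threshold, and the StubSession class are hypothetical; StubSession only records calls so the example runs without an archive.

```python
import time


def aggregate_with_fallback(session, slow_seconds=60.0, clock=time.monotonic):
    """Run aggregateRecent and, if it was slow, moveTransientToBase.

    Returns True when the heavier moveTransientToBase step was run.
    `slow_seconds` is an arbitrary illustrative threshold.
    """
    started = clock()
    session.aggregateRecent(verbose=False, fast=True)
    elapsed = clock() - started
    if elapsed > slow_seconds:
        session.moveTransientToBase(verbose=False)
        return True
    return False


class StubSession:
    """Hypothetical stand-in that records maintenance calls."""

    def __init__(self):
        self.calls = []

    def aggregateRecent(self, verbose=False, fast=True):
        self.calls.append("aggregateRecent")

    def moveTransientToBase(self, verbose=False):
        self.calls.append("moveTransientToBase")


stub = StubSession()
moved = aggregate_with_fallback(stub, slow_seconds=60.0)
print(moved, stub.calls)
```

The clock parameter exists only so the timing decision can be exercised without waiting; a scheduled job would normally leave it at its default.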
session.cleanUp()
entry.Entry objects encapsulate sets of attribute/value pairs stored and retrieved by a Nucular archive. Many applications can avoid interacting directly with entry objects by using the interfaces which manipulate dictionaries instead. The following is a brief summary of the primary client accesses to entry objects.
    from nucular import entry

    # create a new entry object with identity string "1234"
    myEntry = entry.Entry("1234")

    # add an attribute/value association
    myEntry[attribute] = value

    # get the identity for an entry
    myIdentity = myEntry.identity()

    # associate several values to an attribute
    myEntry.setValues(attribute, listOfValues)

    # get the value for an attribute
    theValue = myEntry[attribute]

    # get the sequence of values for an attribute
    listOfValues = myEntry.getValues(attribute)

    # get a dictionary mapping attributes to single values
    attributeToValue = myEntry.asDictionary()

    # get a dictionary mapping attributes to lists of values
    attributeToValueLists = myEntry.attrDict()

    # get an XML string representing the entry
    XMLString = myEntry.toXML()
    # Get a query from a session.
    myQuery = session.Query()

    # Specify that attribute1 must exactly match value1.
    myQuery.matchAttribute(attribute1, value1)

    # Specify that attribute2 must start with prefix2.
    myQuery.prefixAttribute(attribute2, prefix2)

    # Specify that attribute3 must contain wordPrefix3 as a prefix to some word.
    # Matching is case insensitive.
    # This operation will only work if the index was built
    # with support for fielded word searches (freeTextOnly disabled).
    myQuery.attributeWord(attribute3, wordPrefix3)

    # Specify that wordPrefix4 must occur as a prefix to some word in
    # some attribute. (Matching is case insensitive.)
    myQuery.anyWord(wordPrefix4)

    # Specify that attribute5 must be between AAALow and ZZZHigh alphabetically.
    myQuery.attributeRange(attribute5, AAALow, ZZZHigh)

    # Specify that proximateWords must occur in the order given
    # somewhere in each entry separated by no more than nearLimit
    # intervening words.
    myQuery.proximateWords(proximateWords, nearLimit)

    # Find all entries visible in the archive which match
    # all the specified conditions as a list of dictionaries.
    dictionaryList = myQuery.resultDictionaries()

    # Find all entries visible in the archive which match
    # all the specified conditions as a result object with a status indicator
    # [advanced]. May return result=None on failure.
    (result, status) = myQuery.evaluate()
The evaluate method is useful when the result set may be large and the client application is only interested in fully extracting a small part of the set.
Below are some motivating examples for the descriptive methods:
    # I am only interested in phone number (732)454-7633.
    myQuery.matchAttribute("phoneNumber", "(732)454-7633")

    # I only want URLs that begin with http.
    myQuery.prefixAttribute("URL", "http:")

    # I only want entries where the topic contains "bigfoot".
    myQuery.attributeWord("topic", "bigfoot")

    # I only want entries that mention "watters" somewhere.
    myQuery.anyWord("watters")

    # I only want entries where the date is between 1961-12-29 and 1999-01-17.
    myQuery.attributeRange("date", "1961-12-29", "1999-01-17")

    # I only want entries containing "Charles Dickens" in that order
    # with no more than 3 intervening words in any field.
    myQuery.proximateWords("charles dickens", 3)

    # Alternate signature for proximateWords:
    myQuery.proximateWords(["charles", "dickens"], 3)
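Since attributeRange compares values alphabetically, a date range such as the one above only behaves chronologically when dates are stored in a format whose string order matches date order, such as zero-padded ISO dates. The sketch below shows this with plain Python string comparison; the helper names are hypothetical, and the comparison function only illustrates the ordering (whether Nucular treats the bounds as inclusive is not stated here).

```python
import datetime


def range_bounds(start, end):
    """Return (low, high) ISO strings usable as alphabetical range bounds."""
    if start > end:
        raise ValueError("start must not be after end")
    return start.isoformat(), end.isoformat()


def in_alphabetical_range(value, low, high):
    """Plain string comparison, used here only to illustrate the ordering."""
    return low <= value <= high


low, high = range_bounds(datetime.date(1961, 12, 29),
                         datetime.date(1999, 1, 17))
print(low, high)
print(in_alphabetical_range("1975-06-01", low, high))

# Unpadded dates break the ordering: as strings, "1975-6-1" sorts
# after "1975-12-01" even though June precedes December.
print("1975-6-1" < "1975-12-01")
```

The resulting strings could then be passed as the low and high arguments of attributeRange, provided the archive stores its dates in the same format.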
The evaluate method of a query object generates a Nucular.NucularResult object. This may be useful when a query evaluates to 10000 entries but the client application only wants to extract the first 20 of them. In this case using the result object to mediate the extraction may speed up access to the 20 entries of interest considerably (since the other 9980 entries need not be materialized). The following is a brief summary of the possible interactions.
    # get a result object and status string from a query
    (result, status) = myQuery.evaluate()

    # get the list of identity strings for the entries in the result set
    identityList = result.identities()

    # get the entry corresponding to a particular identity in the result set
    theEntity = result.describe(theIdentity)
For example, the following Python command line interaction queries the Gutenberg example archive for entries mentioning "bysshe" and extracts only the first entry completely (of 13 entries found).
    >>> from nucular import Nucular
    >>> N = Nucular.Nucular("../testdata/gutenberg")
    >>> Q = N.Query()
    >>> Q.anyWord("bysshe")
    >>> (R, status) = Q.evaluate()
    >>> status
    'complete'
    >>> ids = R.identities()
    >>> len(ids)
    13
    >>> firstId = ids[0]
    >>> theEntity = R.describe(firstId)
    >>> theEntity
    Entry({'Subtitle': [u' An Essay'], 'Title': [u'Shelley'], 'i': [u'1336'], 'Author': [u'Francis Thompson'], 'Comments': [u'[Subtitle: An Essay] (PG Note: about Percy Bysshe Shelley)'], 'link': [<fld n="?" special="UnIndexedURL">http://www.gutenberg.org/etext/1336</fld> <!-- UnIndexedURL -->]})
    >>>
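The access pattern shown above (evaluate, list the identities, describe only the ones needed) can be wrapped in a small helper. This is a sketch built only on the evaluate, identities, and describe calls documented here. The function first_entries and the StubQuery and StubResult classes are hypothetical; the stubs stand in for Nucular objects so the example runs without an archive.

```python
def first_entries(query, limit):
    """Return up to `limit` entries, describing only those identities."""
    (result, status) = query.evaluate()
    if result is None:
        # evaluate may return result=None on failure; surface the status.
        raise RuntimeError("query evaluation failed: %r" % (status,))
    wanted = result.identities()[:limit]
    return [result.describe(identity) for identity in wanted]


class StubResult:
    """Hypothetical stand-in for a result object; counts describe calls."""

    def __init__(self, entries):
        self._entries = entries
        self.described = []

    def identities(self):
        return list(self._entries)

    def describe(self, identity):
        self.described.append(identity)
        return self._entries[identity]


class StubQuery:
    """Hypothetical stand-in for a query object."""

    def __init__(self, result, status="complete"):
        self._result = result
        self._status = status

    def evaluate(self):
        return (self._result, self._status)


entries = {str(i): {"i": str(i)} for i in range(13)}
stub_result = StubResult(entries)
first_two = first_entries(StubQuery(stub_result), 2)
print(len(first_two), stub_result.described)
```

Only the requested identities are passed to describe, which is the point of going through the result object rather than materializing every match.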
The suggestions method of the query object generates suggested completions for full text and for fielded values derived from entries that match the query. It returns a tuple containing a list and a dictionary:
    (L, D) = Q.suggestions()
The list contains suggested completion strings for full text searches, and the dictionary contains suggested completions broken down by attribute name (as a mapping from field name to list of suggestions). For example, the following Python command line interaction requests suggestions for entries matching "bysshe" from the Gutenberg example index:
    >>> from nucular import Nucular
    >>> N = Nucular.Nucular("../testdata/gutenberg")
    >>> Q = N.Query()
    >>> Q.anyWord("bysshe")
    >>> (L, D) = Q.suggestions()
    >>> L
    [u'bysshe shelley']
    >>> D.keys()
    ['Subtitle', 'Language', 'Author', 'i', 'Contains', 'Comments', 'Translator', 'Tr.', 'link', 'Editor', 'Title', 'Commentator']
    >>> for x in D["Author"]:
    ...     print x
    ...
    wtctlxxx
    thomas
    xxx
    pbs
    xenophon
    ptbllxxx
    hutchinson
    thompson
    dmntwxxx
    shelley
    bysshe
    sotheran
    >>>
Here the suggested free text completion for "bysshe" is "bysshe shelley", and the following lines display words occurring in the Author fields from related entries in the archive. The suggestions interface is intended to be useful for web interfaces which support "drop down completion/suggestions".
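For a drop-down completion widget, the (list, dictionary) pair returned by suggestions would typically be filtered by what the user has typed so far. The sketch below shows one way to do that in plain Python. The function completion_choices is hypothetical, and the sample data is a hand-copied subset of the output shown above rather than a live query result.

```python
def completion_choices(suggestions, typed, field=None, limit=10):
    """Filter a (list, dictionary) suggestions pair by a typed prefix.

    With field=None the free text list is used; otherwise the list
    stored under `field` in the dictionary (empty if the field is absent).
    Matching is case insensitive and at most `limit` items are returned.
    """
    free_text, by_field = suggestions
    candidates = free_text if field is None else by_field.get(field, [])
    prefix = typed.lower()
    matches = [c for c in candidates if c.lower().startswith(prefix)]
    return matches[:limit]


# Sample data copied from part of the interaction above.
sample = (
    ["bysshe shelley"],
    {"Author": ["thomas", "xenophon", "thompson", "shelley", "bysshe"]},
)

print(completion_choices(sample, "bys"))
print(completion_choices(sample, "th", field="Author"))
```

A web handler could call Q.suggestions() once per request and pass the result through a filter like this before rendering the drop-down.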