In the examples below, the working directory is the scripts directory of the distribution, and the PYTHONPATH environment variable is set so that the import "from nucular.nucular import Nucular" works.
% python nucularSite.py --reset ../testdata/ScriptExample
This creates an empty archive at ../testdata/ScriptExample. Since --reset is specified, any existing data in the directory will be deleted.
% python nucularLoad.py --xml ../data/docExamples0.xml ../testdata/ScriptExample
This loads some data from ../data/docExamples0.xml into the new archive. This is the content of ../data/docExamples0.xml:
<entries>
    <entry id="123FROG">
        <fld n="descr">little green slimy things</fld>
        <fld n="food">tastes delicious, like chicken</fld>
        <fld n="name">frog</fld>
    </entry>
    <entry id="456BUNNY">
        <fld n="descr">cute and cuddly</fld>
        <fld n="food">just delicious with garlic</fld>
        <fld n="name">bunny rabbit</fld>
    </entry>
    <entry id="789KITTEN">
        <fld n="descr">cute and cuddly</fld>
        <fld n="name">kitten</fld>
        <fld n="note">not edible</fld>
    </entry>
    <entry id="Joe Blow">
        <fld n="c">great at a grill</fld>
        <fld n="g">male</fld>
        <fld n="p">333-2222</fld>
    </entry>
    <entry id="Joe Smithers">
        <fld n="c">can't cook</fld>
        <fld n="g">male</fld>
        <fld n="p">111-3333</fld>
    </entry>
    <entry id="Lola Waller">
        <fld n="g">female</fld>
        <fld n="n">thinks snails are delicious</fld>
        <fld n="p">333-2222</fld>
    </entry>
    <entry id="Sally Smithers">
        <fld n="c">uses too much salt</fld>
        <fld n="g">female</fld>
        <fld n="p">111-3333</fld>
    </entry>
    <entry id="Sandy Waller">
        <fld n="c">delicious pizza</fld>
        <fld n="g">female</fld>
        <fld n="p">333-2222</fld>
    </entry>
</entries>
Since we didn't specify --visible in the load command, the data is not visible when we pose the following query:
% python nucularQuery.py --contains delicious ../testdata/ScriptExample
The output of the query command gives
<!-- archive= ../testdata/ScriptExample
<query threaded="False">
    <contains p="delicious"/>
</query>
-->
<!-- result status= complete -->
<entries>
</entries>
<!-- 0 entries in result set -->
The above output includes some verbose XML comments because the query command didn't specify --silent, but there are no entries shown inside the top level entries tag.
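The load file format shown earlier is simple enough to generate from Python data. The following is a minimal sketch using only the standard library. The helper name entries_to_xml and the sample dictionary are illustrative assumptions and are not part of the nucular distribution; the element and attribute names are inferred from the listing of ../data/docExamples0.xml rather than from a published schema.

```python
import xml.etree.ElementTree as ET


def entries_to_xml(entries):
    """Build an <entries> document from {entry_id: {field_name: value}}.

    Hypothetical helper: the names entries, entry/id and fld/n are copied
    from the docExamples0.xml listing, nothing more.
    """
    root = ET.Element("entries")
    for entry_id, fields in entries.items():
        entry = ET.SubElement(root, "entry", id=entry_id)
        # Sorted to mirror the alphabetical field order seen in the listing;
        # whether the loader requires any particular order is not known here.
        for name, value in sorted(fields.items()):
            fld = ET.SubElement(entry, "fld", n=name)
            fld.text = value
    return ET.tostring(root, encoding="unicode")


sample = {
    "123FROG": {
        "name": "frog",
        "descr": "little green slimy things",
        "food": "tastes delicious, like chicken",
    },
}
print(entries_to_xml(sample))
```

The returned text could be written to a file and passed to nucularLoad.py with --xml, assuming the loader accepts output without the indentation used in the listing.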
% python nucularAggregate.py --silent ../testdata/ScriptExample/
After aggregation, which makes the loaded data visible, we pose the query again:
% python nucularQuery.py --contains delicious ../testdata/ScriptExample
And the query evaluation generates the output
<!-- archive= ../testdata/ScriptExample
<query threaded="False">
    <contains p="delicious"/>
</query>
-->
<!-- result status= complete -->
<entries>
    <entry id="123FROG">
        <fld n="descr">little green slimy things</fld>
        <fld n="food">tastes delicious, like chicken</fld>
        <fld n="name">frog</fld>
    </entry>
    <entry id="456BUNNY">
        <fld n="descr">cute and cuddly</fld>
        <fld n="food">just delicious with garlic</fld>
        <fld n="name">bunny rabbit</fld>
    </entry>
    <entry id="Lola Waller">
        <fld n="g">female</fld>
        <fld n="n">thinks snails are delicious</fld>
        <fld n="p">333-2222</fld>
    </entry>
    <entry id="Sandy Waller">
        <fld n="c">delicious pizza</fld>
        <fld n="g">female</fld>
        <fld n="p">333-2222</fld>
    </entry>
</entries>
<!-- 4 entries in result set -->
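Because the query output is XML, it can be read back with a standard parser. The sketch below parses an abbreviated copy of the output above into a dictionary. The function name parse_entries is a hypothetical helper, and the code assumes the output is well-formed XML of the shape shown in this document.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the query output shown above (two of the four entries).
RESULT = """<!-- result status= complete -->
<entries>
    <entry id="123FROG">
        <fld n="food">tastes delicious, like chicken</fld>
        <fld n="name">frog</fld>
    </entry>
    <entry id="Sandy Waller">
        <fld n="c">delicious pizza</fld>
    </entry>
</entries>"""


def parse_entries(xml_text):
    """Return {entry_id: {field_name: text}} for each <entry> element."""
    root = ET.fromstring(xml_text)  # comments outside the root are skipped
    return {
        entry.get("id"): {fld.get("n"): fld.text for fld in entry.findall("fld")}
        for entry in root.findall("entry")
    }


results = parse_entries(RESULT)
for entry_id, fields in results.items():
    print(entry_id, fields)
```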
% python nucularQuery.py --contains delicious --contains CUDDLY ../testdata/ScriptExample
The query generates the following XML as output
<!-- archive= ../testdata/ScriptExample
<query threaded="False">
    <contains p="cuddly"/>
    <contains p="delicious"/>
</query>
-->
<!-- result status= complete -->
<entries>
    <entry id="456BUNNY">
        <fld n="descr">cute and cuddly</fld>
        <fld n="food">just delicious with garlic</fld>
        <fld n="name">bunny rabbit</fld>
    </entry>
</entries>
<!-- 1 entries in result set -->
The file ../doc/deliciousCuddlyQuery.xml with the content
<query>
    <contains p="cuddly"/>
    <contains p="delicious"/>
</query>
represents the same query as the one above (looking for "cuddly" and "delicious"), and the command using this XML specification
% python nucularQuery.py --xml ../doc/deliciousCuddlyQuery.xml ../testdata/ScriptExample
generates the same output.
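A query specification of this shape can also be produced programmatically. The following sketch builds the same query element with the standard library. The function name contains_query is hypothetical, and only the query and contains elements that appear in this document are used.

```python
import xml.etree.ElementTree as ET


def contains_query(words):
    """Build a <query> document with one <contains p="..."/> child per word."""
    query = ET.Element("query")
    for word in words:
        ET.SubElement(query, "contains", p=word)
    return ET.tostring(query, encoding="unicode")


spec = contains_query(["cuddly", "delicious"])
print(spec)
```

The resulting text could be saved to a file and passed to nucularQuery.py with --xml, assuming the script does not depend on the whitespace layout of the hand-written file.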
% python nucularDump.py --prefix ../testdata/Dump ../testdata/ScriptExample/
In this case, because the archive is small, the only file created by the command is ../testdata/Dump0.xml. For larger archives the script might create additional files: ../testdata/Dump1.xml, ../testdata/Dump2.xml, ../testdata/Dump3.xml, and so forth.
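When a dump is split across several files, it can help to collect them in numeric order before processing. The sketch below does this with the standard library. It assumes the naming pattern described above (the prefix followed by a number and .xml); the function name dump_files is hypothetical.

```python
import glob
import re


def dump_files(prefix):
    """Return files named <prefix><number>.xml, ordered by that number."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)\.xml$")
    numbered = []
    for path in glob.glob(glob.escape(prefix) + "*.xml"):
        match = pattern.match(path)
        if match:  # skip files that merely share the prefix
            numbered.append((int(match.group(1)), path))
    # Sort on the integer so that Dump10.xml comes after Dump2.xml.
    return [path for _, path in sorted(numbered)]


for path in dump_files("../testdata/Dump"):
    print(path)
```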
% python nucularLoad.py --silent --visible --xml ../data/gutenberg1.xml ../testdata/ScriptExample
In this case, since we specified --visible, the data becomes visible immediately to subsequent queries.
% python nucularQuery.py --contains smith ../testdata/ScriptExample
We see entries from both the initial data set and the additional data in the output:
<!-- archive= ../testdata/ScriptExample
<query threaded="False">
    <contains p="smith"/>
</query>
-->
<!-- result status= complete -->
<entries>
    <entry id="10166">
        <fld n="Author">Thomas F. A. Smith</fld>
        <fld n="Comments">[Subtitle: The War as Germans see it]</fld>
        <fld n="Subtitle"> The War as Germans see it</fld>
        <fld n="Title">What Germany Thinks</fld>
    </entry>
    <entry id="Joe Smithers">
        <fld n="c">can't cook</fld>
        <fld n="g">male</fld>
        <fld n="p">111-3333</fld>
    </entry>
    <entry id="Sally Smithers">
        <fld n="c">uses too much salt</fld>
        <fld n="g">female</fld>
        <fld n="p">111-3333</fld>
    </entry>
</entries>
<!-- 3 entries in result set -->
% python nucularLoad.py --silent --visible --xml ../data/gutenberg1.xml --delete ../testdata/ScriptExample
This deletes from the archive all the identities of the entries in the file ../data/gutenberg1.xml and makes the deletions visible immediately. If we run the "smith" query again:
% python nucularQuery.py --contains smith ../testdata/ScriptExample > ../doc/smith1.xml
The entry from the Gutenberg data set is gone:
<!-- archive= ../testdata/ScriptExample
<query threaded="False">
    <contains p="smith"/>
</query>
-->
<!-- result status= complete -->
<entries>
    <entry id="Joe Smithers">
        <fld n="c">can't cook</fld>
        <fld n="g">male</fld>
        <fld n="p">111-3333</fld>
    </entry>
    <entry id="Sally Smithers">
        <fld n="c">uses too much salt</fld>
        <fld n="g">female</fld>
        <fld n="p">111-3333</fld>
    </entry>
</entries>
<!-- 2 entries in result set -->
% python nucularAggregate.py --silent --full ../testdata/ScriptExample/
% python nucularQuery.py --match g=female --prefix p:333 ../testdata/ScriptExample
This query generates the output:
<!-- archive= ../testdata/ScriptExample
<query threaded="False">
    <match n="g" v="female"/>
    <prefix n="p" p="333"/>
</query>
-->
<!-- result status= complete -->
<entries>
    <entry id="Lola Waller">
        <fld n="g">female</fld>
        <fld n="n">thinks snails are delicious</fld>
        <fld n="p">333-2222</fld>
    </entry>
    <entry id="Sandy Waller">
        <fld n="c">delicious pizza</fld>
        <fld n="g">female</fld>
        <fld n="p">333-2222</fld>
    </entry>
</entries>
<!-- 2 entries in result set -->
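Read against the sample data, the two conditions appear to select entries whose g field equals "female" and whose p field starts with "333". The plain-Python filter below restates that reading on a few of the sample entries. It illustrates the conditions as they behave in this example and says nothing about how nucular evaluates queries internally; the function name select is hypothetical.

```python
# A few entries copied from ../data/docExamples0.xml as shown earlier.
SAMPLE = {
    "Joe Blow": {"c": "great at a grill", "g": "male", "p": "333-2222"},
    "Lola Waller": {"g": "female", "n": "thinks snails are delicious", "p": "333-2222"},
    "Sally Smithers": {"c": "uses too much salt", "g": "female", "p": "111-3333"},
    "Sandy Waller": {"c": "delicious pizza", "g": "female", "p": "333-2222"},
}


def select(entries, match, prefix):
    """Return ids whose fields equal every match value and start with every prefix.

    match and prefix are dictionaries mapping a field name to a value.
    Assumption: "match" is read as exact equality, "prefix" as startswith.
    """
    selected = []
    for entry_id, fields in entries.items():
        matches = all(fields.get(n) == v for n, v in match.items())
        prefixed = all(fields.get(n, "").startswith(p) for n, p in prefix.items())
        if matches and prefixed:
            selected.append(entry_id)
    return selected


print(select(SAMPLE, match={"g": "female"}, prefix={"p": "333"}))
```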
The scripts directory also provides a utility for building a searchable index from the text files within a directory tree. nScrape.py traverses a directory structure, identifies text files, reads the text files found, and adds the file information to a nucular index for later searching.
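The traversal step can be pictured with a short standard-library sketch. This is not the code of nScrape.py: it assumes text files are recognized by a MIME type guessed from the file name, which may differ from what the script actually does, and the function name find_text_files is hypothetical.

```python
import mimetypes
import os


def find_text_files(top):
    """Yield (path, mime_type) for files under top whose guessed type is text/*."""
    for dirpath, _dirnames, filenames in os.walk(top):
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            # guess_type looks only at the name, it never opens the file.
            mime_type, _encoding = mimetypes.guess_type(path)
            if mime_type and mime_type.startswith("text/"):
                yield path, mime_type


for path, mime_type in find_text_files("../test"):
    print(path, mime_type)
```

Each path found this way would then be read and its contents added to the index, which is the part nScrape.py handles.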
For an example run of nScrape.py we first create a new archive to house the scraped text indices:
% python nucularSite.py --reset ../testdata/ScrapeExample
Then we scrape the contents of the ../test directory into the archive:
% python nScrape.py --add --directory ../test ../testdata/ScrapeExample
After the scrape we may query the archive, for example to identify files containing words with the prefix "garban":
% python nucularQuery.py --contains garban ../testdata/ScrapeExample
This query produces the XML output:
<entries>
    <entry id="7">
        <fld n="A_path">../test/scrapeTargetFile.txt</fld>
        <fld n="B_type">text/plain</fld>
        <fld n="C"> This is the only file in the distribution which mentions garbanzo beans. </fld>
    </entry>
</entries>
<!-- 1 entries in result set -->