return to index |
nucular project page with download links |
nTableSpace
wrapper makes it easy to store data organized into
tables of rows identified by primary keys in the manner of SQL databases.
This document briefly describes
the nTableSpace
module and its usage.
This discussion will be most easily understood by readers with some previous background in SQL relational database technology and the Python programming language.
commit
frequently guarantees transaction serializability
whereas the tablespace.store
operation does not guarantee
serializability with respect to concurrent accesses to the same archive.
Tablespace Operation | Similar SQL construct |
---|---|
Connect to a table space | |
# Connect to a new archive and new table space ts = nTableSpace.newTableSpace("/tmp/myarchive", "db1") # OR connect to an existing archive and # possibly new table space ts = nTableSpace.getTableSpace(archive, "db1") |
CREATE DATABASE DB1; |
Define a table | |
# "L" is a unique internal abbreviation for the table ts.defTable("Sells", "L", "SNO", "PNO") |
CREATE TABLE SELLS ( ... PRIMARY KEY (SNO, PNO) ); |
Insert a row into a table | |
ts.insert("Sells", sno=1, pno=1, misc="always late") |
INSERT INTO SELLS(SNO, PNO, MISC) VALUES (1, 1, 'always late'); |
Retrieve rows matching a simple query | |
dictionaryList = ts.match("Supplier", sname='bongo') |
SELECT * FROM SUPPLIER WHERE SNAME='bongo'; |
Delete rows matching a simple query. | |
ts.deleteByValues("Supplier", sname='bongo') |
DELETE FROM SUPPLIER WHERE SNAME='bongo'; |
Delete rows with matching primary key | |
# optimized delete: sno is the primary key ts.delete("Supplier", sno=31) |
DELETE FROM SUPPLIER WHERE SNO=31; |
Emulating table joins | |
# joins must be emulated using python code... def partsForSupplier(sname): for sdict in ts.match("Supplier", sname=sname): snum = sdict["sno"] for spdict in ts.match("Sells", sno=snum): pnum = spdict["pno"] for pdict in ts.match("Part", pno=pnum): yield pdict["pname"] pblake = list(partsForSupplier("blake")) |
SELECT P.PNAME FROM (SUPPLIER AS S INNER JOIN SELLS AS L ON S.SNO=L.SNO) INNER JOIN PARTS AS P ON P.PNO=L.PNO WHERE S.SNAME='blake'; |
Constructing complex queries | |
# this comparison is very loosely analogous boiledSelect = ts.selectByValues("Supplier") boiledSelect.getQuery().anyWord("boil") boiledSelect() print "boiled suppliers" print bb |
SELECT * FROM SUPPLIER WHERE SNAME LIKE "%boil%" OR CITY LIKE "%boil%" OR ...and so forth for all string columns... |
Update rows matching a complex query | |
ts.selectByValues("Supplier", sname="blake").updateSet(city="chico") |
UPDATE SUPPLIER SET CITY='chico' WHERE SNAME='blake'; |
Delete rows using a complex query | |
ts.selectByValues("Supplier", sname="blake").delete() |
DELETE FROM SUPPLIER WHERE SNAME='blake'; |
Drop a table | |
ts.dropTable("Sells") |
DROP TABLE SELLS; |
Erase a table space | |
ts.dropAll() |
DROP DATABASE DB1; |
Store modifications | |
ts.store() |
COMMIT; |
Optimize the archive | |
# aggregate and optimize updates. ts.cleanUp(full=True) |
-- no standard SQL analogue. |
The remainder of this document discusses these operations in greater detail.
ts = nTableSpace.newTableSpace(archive, prefix)where archive is a directory path for the archive and prefix is a unique prefix to use for this table space within the archive. Bear in mind the following cautions
/tmp/supplier_parts
and creates a table space with prefix SP
in the
archive using the nTableSpace.newTableSpace
function.
from nucular import nTableSpace archive = "/tmp/supplier_parts" prefix = "SP" ts = nTableSpace.newTableSpace(archive, prefix) ts.defTable("Supplier", "S", "SNO") ts.insert("Supplier", sno=1, sname="smith", city="london", misc="a very unpleasant person") print ts.dumpString() ts.store()
ts.defTable("Supplier", "S", "SNO")
operation defines a table
named Supplier
to be managed by the table space ts
with
primary key attributes SNO
. The calling sequence for defTable
is:
tablespace.defTable(tableName, tableAbbrev, *keyAttributes):where
ts.insert("Supplier", sno=1, sname="smith", city="london", misc="a very unpleasant person")Adds a row to the Supplier table, defining the primary key and a number of other descriptive attributes. Each insert must define the primary key but there are no restrictions on the number or names for other attributes in a table row.
print ts.dumpString()
produces the dump output
TableSpace dump ts prefix='SP' table: 'Supplier' prefix='S' keylist=['SNO'] 'SPS1': 'city' : 'london' 'i' : 'SPS1' 'misc' : 'a very unpleasant person' 'sname' : 'smith' 'sno' : 1 'tablename' : 'Supplier'which lists the table space content -- showing one table with one row.
ts.store()If a program modifies a table space and fails to call the store method the modifications will be discarded.
getTableSpace
function. The operation
ts = nTableSpace.getTableSpace(archive, prefix)
will use an existing table space in the archive which uses the prefix,
or it will create one if there is no such table space in the archive.
The following interaction reconnects to the table space created above and adds two more tables as well as many more rows to the table space.
archive = "/tmp/supplier_parts" prefix = "SP" ts = nTableSpace.getTableSpace(archive, prefix) ts.insert("Supplier", sno=2, sname="jones", city="paris", misc="likes bunnies boiled broiled or in a stew") ts.insert("Supplier", sno=3, sname="adams", city="vienna", misc="owns and sells bunnies as a hobby") ts.insert("Supplier", sno=31, sname="adams", city="vienna", misc="this is a typo") ts.insert("Supplier", sno=4, sname="blake", city="rome", misc="has no discernible personality") ts.insert("Supplier", sno=5, sname="bongo", city="mahwah", misc="erroneous") ts.defTable("Sells", "L", "SNO", "PNO") ts.insert("Sells", sno=1, pno=1, misc="always late") ts.insert("Sells", sno=1, pno=2) ts.insert("Sells", sno=2, pno=4) ts.insert("Sells", sno=3, pno=1) ts.insert("Sells", sno=3, pno=3) ts.insert("Sells", sno=3, pno=2) ts.insert("Sells", sno=4, pno=3) ts.insert("Sells", sno=4, pno=4) ts.defTable("Part", "P", "PNO") ts.insert("Part", PNO=1, PNAME="Screw", PRICE=10) ts.insert("Part", PNO=2, PNAME="Nut", PRICE=8) ts.insert("Part", PNO=3, PNAME="Bolt", PRICE=15) ts.insert("Part", PNO=4, PNAME="Cam", PRICE=25)
tablespace.match
method. The following example extracts
all entries from the Supplier table where the sname is "bongo".
test = ts.match("Supplier", sname='bongo') print "bongo match", testThe result generated by the method is a list of dictionaries representing the matching rows. The print statement displays this list as follows:
bongo match [{'city': 'mahwah', 'sname': 'bongo', 'i': 'SPS5', 'sno': 5, 'misc': 'erroneous', 'tablename': 'Supplier'}]The calling sequence for
match
allows the specification of values
for any number of attributes as follows
ts.match(tableName, **argsDict)Here the
argsDict
maps attribute names to
the desired values.
Note that the attributes and values must match on exact equality.
See the selectByValues
method for a more
general query mechanism which permits matching by ranges of values,
prefixes of strings, or contained word prefixes.
deleteByValues
removes rows with attributes
that match specified values from a table.
ts.deleteByValues("Supplier", sname='bongo') test = ts.match("Supplier", sname='bongo') print "bongo match after delete", testIn this case the print statement reports
bongo match after delete []The calling sequence for
deleteByValues
is similar to the match
calling sequence
ts.deleteByValues(tableName, **argsDict)Note that the important special case
ts.deleteByValues("Supplier")
deletes all rows from the supplier table.
The deleteByValues
method only supports attribute equality
matches.
The selectByValues
query mechanism described below
provides a more general method for deleting rows based on other predicates.
Deleting a row by primary key
In the case where the primary key for the table is known
use the tablespace.delete
method to delete
rows faster (by a constant factor) than using selectByValue
ts.delete("Supplier", sno=31)
Traversing multiple tables (without joins)
The nTableSpace
module is a thin wrapper over the
underlying Nucular archive infrastructure, and as such provides
no methods for automatically compiling complex join operations,
developing query plans, and so forth. However by combining
the single table query operations offered by the tablespace
abstraction programs can emulate join operations, for example,
as follows.
def partsForSupplier(sname):
for sdict in ts.match("Supplier", sname=sname):
snum = sdict["sno"]
for spdict in ts.match("Sells", sno=snum):
pnum = spdict["pno"]
for pdict in ts.match("Part", pno=pnum):
yield pdict["pname"]
pblake = list(partsForSupplier("blake"))
print "parts for blake", pblake
More complex single table queries with full text searching
As mentioned above the selectByValues
method
allows the creation of more complex queries than the methods described
above. In particular the method call
sel = ts.selectByValues(Tablename, **attributeValues)
Yields an unevaluated selection object sel
which can be
refined by adding additional constraints.
For example, the boiledSelect
query below
selects all entries in the Supplier
table
which contain the word prefix boil
in any attribute.
boiledSelect = ts.selectByValues("Supplier")
q = boiledSelect.getQuery()
q.anyWord("boil")
bb = boiledSelect()
print "boiled suppliers", bb
The method call boiledSelect.getQuery()
extracts
the Nucular query object associated with the table space selection and
the method call q.anyWord("boil")
restricts the query
to match only entries with the word prefix boil
in free text
(i.e., in any value in the entry). In addition to anyWord
, Nucular
queries support other restrictions as described in the
Nucular API Summary Document. A selection
can be restricted by any of those methods in a similar manner.
The call boiledSelect()
evaluates the selection and the resulting
list of dictionaries value prints as:
boiled suppliers
[{'city': 'paris', 'sname': 'jones', 'i': 'SPS2', 'sno': 2,
'misc': 'likes bunnies boiled broiled or in a stew', 'tablename': 'Supplier'}]
Updating or deleting rows matching a complex query
A program can also use selection objects created by the
ts.selectByValues
method to update or delete
matching rows. For example the following interaction sets the
city
to
chico for all rows matching
the boiledSelect
selection (all rows in the
Supplier
table containing the word prefix boil
anywhere).
boiledSelect.updateSet(city="chico") print "updated boiled suppliers" print boiledSelect()The printed selection evaluation reflects the change as follows:
updated boiled suppliers [{'city': 'chico', 'sname': 'jones', 'i': 'SPS2', 'sno': 2, 'misc': 'likes bunnies boiled broiled or in a stew', 'tablename': 'Supplier'}]The following
delete
removes all rows matching the boiledSelect
selection.
boiledSelect.delete()
ts.dropTable("Sells") ts.dropAll()
store
operation is called.
ts.store()Any unstored operations will be discarded.
ts.cleanUp(full=True)A non-full
cleanUp
will only aggregate saved changes to the
transient level. A full cleanUp
will aggregate the saved
changes to the final index.