| return to index |
nucular project page with download links |
nTableSpace wrapper makes it easy to store data organized into
tables of rows identified by primary keys in the manner of SQL databases.
This document briefly describes
the nTableSpace module and its usage.
This discussion will be most easily understood by readers with some previous background in SQL relational database technology and the Python programming language.
commit frequently guarantees transaction serializability
whereas the tablespace.store operation does not guarantee
serializability with respect to concurrent accesses to the same archive.
| Tablespace Operation | Similar SQL construct |
|---|---|
| Connect to a table space | |
# Connect to a new archive and new table space
ts = nTableSpace.newTableSpace("/tmp/myarchive", "db1")
# OR connect to an existing archive and
# possibly new table space
ts = nTableSpace.getTableSpace(archive, "db1")
|
CREATE DATABASE DB1; |
| Define a table | |
# "L" is a unique internal abbreviation for the table
ts.defTable("Sells", "L", "SNO", "PNO")
|
CREATE TABLE SELLS ( ... PRIMARY KEY (SNO, PNO) ); |
| Insert a row into a table | |
ts.insert("Sells", sno=1, pno=1, misc="always late")
|
INSERT INTO SELLS(SNO, PNO, MISC) VALUES (1, 1, 'always late'); |
| Retrieve rows matching a simple query | |
dictionaryList = ts.match("Supplier", sname='bongo')
|
SELECT * FROM SUPPLIER WHERE SNAME='bongo'; |
| Delete rows matching a simple query. | |
ts.deleteByValues("Supplier", sname='bongo')
|
DELETE FROM SUPPLIER WHERE SNAME='bongo'; |
| Delete rows with matching primary key | |
# optimized delete: sno is the primary key
ts.delete("Supplier", sno=31)
|
DELETE FROM SUPPLIER WHERE SNO=31; |
| Emulating table joins | |
# joins must be emulated using python code...
def partsForSupplier(sname):
for sdict in ts.match("Supplier", sname=sname):
snum = sdict["sno"]
for spdict in ts.match("Sells", sno=snum):
pnum = spdict["pno"]
for pdict in ts.match("Part", pno=pnum):
yield pdict["pname"]
pblake = list(partsForSupplier("blake"))
|
SELECT P.PNAME
FROM (SUPPLIER AS S INNER JOIN SELLS AS L
ON S.SNO=L.SNO) INNER JOIN PARTS AS P
ON P.PNO=L.PNO
WHERE S.SNAME='blake';
|
| Constructing complex queries | |
# this comparison is very loosely analogous
boiledSelect = ts.selectByValues("Supplier")
boiledSelect.getQuery().anyWord("boil")
boiledSelect()
print "boiled suppliers"
print bb
|
SELECT * FROM SUPPLIER WHERE SNAME LIKE "%boil%" OR CITY LIKE "%boil%" OR ...and so forth for all string columns... |
| Update rows matching a complex query | |
ts.selectByValues("Supplier", sname="blake").updateSet(city="chico")
|
UPDATE SUPPLIER SET CITY='chico' WHERE SNAME='blake'; |
| Delete rows using a complex query | |
ts.selectByValues("Supplier", sname="blake").delete()
|
DELETE FROM SUPPLIER WHERE SNAME='blake'; |
| Drop a table | |
ts.dropTable("Sells")
|
DROP TABLE SELLS; |
| Erase a table space | |
ts.dropAll() |
DROP DATABASE DB1; |
| Store modifications | |
ts.store() |
COMMIT; |
| Optimize the archive | |
# aggregate and optimize updates. ts.cleanUp(full=True) |
-- no standard SQL analogue. |
The remainder of this document discusses these operations in greater detail.
ts = nTableSpace.newTableSpace(archive, prefix)
where archive is a directory path for the archive and prefix is a unique prefix to use
for this table space within the archive. Bear in mind the following cautions
/tmp/supplier_parts and creates a table space with prefix SP in the
archive using the nTableSpace.newTableSpace function.
from nucular import nTableSpace
archive = "/tmp/supplier_parts"
prefix = "SP"
ts = nTableSpace.newTableSpace(archive, prefix)
ts.defTable("Supplier", "S", "SNO")
ts.insert("Supplier", sno=1, sname="smith", city="london", misc="a very unpleasant person")
print ts.dumpString()
ts.store()
ts.defTable("Supplier", "S", "SNO") operation defines a table
named Supplier to be managed by the table space ts with
primary key attributes SNO. The calling sequence for defTable
is:
tablespace.defTable(tableName, tableAbbrev, *keyAttributes):
where
ts.insert("Supplier", sno=1, sname="smith", city="london",
misc="a very unpleasant person")
Adds a row to the Supplier table, defining the primary key and a number of other
descriptive attributes. Each insert must define the primary key but there are no
restrictions on the number or names for other attributes in a table row.
print ts.dumpString() produces the dump output
TableSpace dump ts prefix='SP'
table: 'Supplier'
prefix='S'
keylist=['SNO']
'SPS1':
'city' : 'london'
'i' : 'SPS1'
'misc' : 'a very unpleasant person'
'sname' : 'smith'
'sno' : 1
'tablename' : 'Supplier'
which lists the table space content -- showing one table with one row.
ts.store()
If a program modifies a table space and fails to call the store method
the modifications will be discarded.
getTableSpace function. The operation
ts = nTableSpace.getTableSpace(archive, prefix)
will use an existing table space in the archive which uses the prefix,
or it will create one if there is no such table space in the archive.
The following interaction reconnects to the table space created above and adds two more tables as well as many more rows to the table space.
archive = "/tmp/supplier_parts"
prefix = "SP"
ts = nTableSpace.getTableSpace(archive, prefix)
ts.insert("Supplier", sno=2, sname="jones", city="paris",
misc="likes bunnies boiled broiled or in a stew")
ts.insert("Supplier", sno=3, sname="adams", city="vienna",
misc="owns and sells bunnies as a hobby")
ts.insert("Supplier", sno=31, sname="adams", city="vienna",
misc="this is a typo")
ts.insert("Supplier", sno=4, sname="blake", city="rome",
misc="has no discernible personality")
ts.insert("Supplier", sno=5, sname="bongo", city="mahwah",
misc="erroneous")
ts.defTable("Sells", "L", "SNO", "PNO")
ts.insert("Sells", sno=1, pno=1, misc="always late")
ts.insert("Sells", sno=1, pno=2)
ts.insert("Sells", sno=2, pno=4)
ts.insert("Sells", sno=3, pno=1)
ts.insert("Sells", sno=3, pno=3)
ts.insert("Sells", sno=3, pno=2)
ts.insert("Sells", sno=4, pno=3)
ts.insert("Sells", sno=4, pno=4)
ts.defTable("Part", "P", "PNO")
ts.insert("Part", PNO=1, PNAME="Screw", PRICE=10)
ts.insert("Part", PNO=2, PNAME="Nut", PRICE=8)
ts.insert("Part", PNO=3, PNAME="Bolt", PRICE=15)
ts.insert("Part", PNO=4, PNAME="Cam", PRICE=25)
tablespace.match method. The following example extracts
all entries from the Supplier table where the sname is "bongo".
test = ts.match("Supplier", sname='bongo')
print "bongo match", test
The result generated by the method is a list of
dictionaries representing the matching rows. The print
statement displays this list as follows:
bongo match [{'city': 'mahwah', 'sname': 'bongo', 'i': 'SPS5',
'sno': 5, 'misc': 'erroneous', 'tablename': 'Supplier'}]
The calling sequence for match allows the specification of values
for any number of attributes as follows
ts.match(tableName, **argsDict)Here the
argsDict maps attribute names to
the desired values.
Note that the attributes and values must match on exact equality.
See the selectByValues method for a more
general query mechanism which permits matching by ranges of values,
prefixes of strings, or contained word prefixes.
deleteByValues removes rows with attributes
that match specified values from a table.
ts.deleteByValues("Supplier", sname='bongo')
test = ts.match("Supplier", sname='bongo')
print "bongo match after delete", test
In this case the print statement reports
bongo match after delete []
The calling sequence for deleteByValues is similar to the match
calling sequence
ts.deleteByValues(tableName, **argsDict)
Note that the important special case ts.deleteByValues("Supplier")
deletes all rows from the supplier table.
The deleteByValues method only supports attribute equality
matches.
The selectByValues query mechanism described below
provides a more general method for deleting rows based on other predicates.
Deleting a row by primary key
In the case where the primary key for the table is known
use the tablespace.delete method to delete
rows faster (by a constant factor) than using selectByValue
ts.delete("Supplier", sno=31)
Traversing multiple tables (without joins)
The nTableSpace module is a thin wrapper over the
underlying Nucular archive infrastructure, and as such provides
no methods for automatically compiling complex join operations,
developing query plans, and so forth. However by combining
the single table query operations offered by the tablespace
abstraction programs can emulate join operations, for example,
as follows.
def partsForSupplier(sname):
for sdict in ts.match("Supplier", sname=sname):
snum = sdict["sno"]
for spdict in ts.match("Sells", sno=snum):
pnum = spdict["pno"]
for pdict in ts.match("Part", pno=pnum):
yield pdict["pname"]
pblake = list(partsForSupplier("blake"))
print "parts for blake", pblake
More complex single table queries with full text searching
As mentioned above the selectByValues method
allows the creation of more complex queries than the methods described
above. In particular the method call
sel = ts.selectByValues(Tablename, **attributeValues)
Yields an unevaluated selection object sel which can be
refined by adding additional constraints.
For example, the boiledSelect query below
selects all entries in the Supplier table
which contain the word prefix boil in any attribute.
boiledSelect = ts.selectByValues("Supplier")
q = boiledSelect.getQuery()
q.anyWord("boil")
bb = boiledSelect()
print "boiled suppliers", bb
The method call boiledSelect.getQuery() extracts
the Nucular query object associated with the table space selection and
the method call q.anyWord("boil") restricts the query
to match only entries with the word prefix boil in free text
(i.e., in any value in the entry). In addition to anyWord, Nucular
queries support other restrictions as described in the
Nucular API Summary Document. A selection
can be restricted by any of those methods in a similar manner.
The call boiledSelect() evaluates the selection and the resulting
list of dictionaries value prints as:
boiled suppliers
[{'city': 'paris', 'sname': 'jones', 'i': 'SPS2', 'sno': 2,
'misc': 'likes bunnies boiled broiled or in a stew', 'tablename': 'Supplier'}]
Updating or deleting rows matching a complex query
A program can also use selection objects created by the
ts.selectByValues method to update or delete
matching rows. For example the following interaction sets the
city to chico for all rows matching
the boiledSelect selection (all rows in the
Supplier table containing the word prefix boil
anywhere).
boiledSelect.updateSet(city="chico")
print "updated boiled suppliers"
print boiledSelect()
The printed selection evaluation reflects the change as follows:
updated boiled suppliers
[{'city': 'chico', 'sname': 'jones', 'i': 'SPS2', 'sno': 2,
'misc': 'likes bunnies boiled broiled or in a stew', 'tablename': 'Supplier'}]
The following delete removes all rows matching the boiledSelect
selection.
boiledSelect.delete()
ts.dropTable("Sells")
ts.dropAll()
store operation is called.
ts.store()
Any unstored operations will be discarded.
ts.cleanUp(full=True)
A non-full cleanUp will only aggregate saved changes to the
transient level. A full cleanUp will aggregate the saved
changes to the final index.