The nTableSpace wrapper API

nucular project page with download links

The nTableSpace wrapper API

It is often convenient to store, retrieve, and modify data which closely models an SQL relational database table set using a Nucular archive. The nTableSpace wrapper makes it easy to store data organized into tables of rows identified by primary keys in the manner of SQL databases. This document briefly describes the nTableSpace module and its usage.

This discussion will be most easily understood by readers with some previous background in SQL relational database technology and the Python programming language.

Quick reference comparison chart

The following table compares methods in the tablespace API with similar SQL language constructs. The similarity is generally not exact -- for example SQL commit frequently guarantees transaction serializability whereas the tablespace.store operation does not guarantee serializability with respect to concurrent accesses to the same archive.

Tablespace Operation	Similar SQL construct
Connect to a table space
# Connect to a new archive and new table space ts = nTableSpace.newTableSpace("/tmp/myarchive", "db1") # OR connect to an existing archive and # possibly new table space ts = nTableSpace.getTableSpace(archive, "db1")	CREATE DATABASE DB1;
Define a table
# "L" is a unique internal abbreviation for the table ts.defTable("Sells", "L", "SNO", "PNO")	CREATE TABLE SELLS ( ... PRIMARY KEY (SNO, PNO) );
Insert a row into a table
ts.insert("Sells", sno=1, pno=1, misc="always late")	INSERT INTO SELLS(SNO, PNO, MISC) VALUES (1, 1, 'always late');
Retrieve rows matching a simple query
dictionaryList = ts.match("Supplier", sname='bongo')	SELECT * FROM SUPPLIER WHERE SNAME='bongo';
Delete rows matching a simple query.
ts.deleteByValues("Supplier", sname='bongo')	DELETE FROM SUPPLIER WHERE SNAME='bongo';
Delete rows with matching primary key
# optimized delete: sno is the primary key ts.delete("Supplier", sno=31)	DELETE FROM SUPPLIER WHERE SNO=31;
Emulating table joins
# joins must be emulated using python code... def partsForSupplier(sname): for sdict in ts.match("Supplier", sname=sname): snum = sdict["sno"] for spdict in ts.match("Sells", sno=snum): pnum = spdict["pno"] for pdict in ts.match("Part", pno=pnum): yield pdict["pname"] pblake = list(partsForSupplier("blake"))	SELECT P.PNAME FROM (SUPPLIER AS S INNER JOIN SELLS AS L ON S.SNO=L.SNO) INNER JOIN PARTS AS P ON P.PNO=L.PNO WHERE S.SNAME='blake';
Constructing complex queries
# this comparison is very loosely analogous boiledSelect = ts.selectByValues("Supplier") boiledSelect.getQuery().anyWord("boil") boiledSelect() print "boiled suppliers" print bb	SELECT * FROM SUPPLIER WHERE SNAME LIKE "%boil%" OR CITY LIKE "%boil%" OR ...and so forth for all string columns...
Update rows matching a complex query
ts.selectByValues("Supplier", sname="blake").updateSet(city="chico")	UPDATE SUPPLIER SET CITY='chico' WHERE SNAME='blake';
Delete rows using a complex query
ts.selectByValues("Supplier", sname="blake").delete()	DELETE FROM SUPPLIER WHERE SNAME='blake';
Drop a table
ts.dropTable("Sells")	DROP TABLE SELLS;
Erase a table space
ts.dropAll()	DROP DATABASE DB1;
Store modifications
ts.store()	COMMIT;
Optimize the archive
# aggregate and optimize updates. ts.cleanUp(full=True)	-- no standard SQL analogue.

Note that there is no method for creating indexes on tables because the underlying Nucular archive infrastructure automatically indexes everything by default.

The remainder of this document discusses these operations in greater detail.

Creating a brand new archive and populating a table space

To make a new archive and a new tablespace at once use

    ts = nTableSpace.newTableSpace(archive, prefix)

where archive is a directory path for the archive and prefix is a unique prefix to use for this table space within the archive. Bear in mind the following cautions

Any existing archive files will be deleted. Do not use this method if the archive path directory exists and you want to keep anything it contains.
The prefix must identify the tablespace uniquely within the archive. The prefix is used for creating unique Nucular identifiers for objects in the table space. No other identifiers in the table space should use the prefix. If there are other table spaces in the archive no tablespace prefix should be a prefix of another table space prefix, or problems will ensue. This error condition is not checked at present... but it probably should be checked somehow... xxxx

The interaction below initializes a new archive at the directory path /tmp/supplier_parts and creates a table space with prefix SP in the archive using the nTableSpace.newTableSpace function.

    from nucular import nTableSpace
    archive = "/tmp/supplier_parts"
    prefix = "SP"
    ts = nTableSpace.newTableSpace(archive, prefix)
    ts.defTable("Supplier", "S", "SNO")
    ts.insert("Supplier", sno=1, sname="smith", city="london", misc="a very unpleasant person")
    print ts.dumpString()
    ts.store()

Defining a table

The ts.defTable("Supplier", "S", "SNO") operation defines a table named Supplier to be managed by the table space ts with primary key attributes SNO. The calling sequence for defTable is:

    tablespace.defTable(tableName, tableAbbrev, *keyAttributes):

where

The table name must uniquely identify the table in the tablespace.
The tableAbbrev must uniquely identify the table in the tablespace and must not be a prefix of any other table in the tablespace.
All rows for the table must define the key attributes.
The key attributes must uniquely identify each row in the table. (In the terminology of database theory they form a primary key for the table.)

Inserting a row

Once the table is created the insert operation

    ts.insert("Supplier", sno=1, sname="smith", city="london", 
                          misc="a very unpleasant person")

Adds a row to the Supplier table, defining the primary key and a number of other descriptive attributes. Each insert must define the primary key but there are no restrictions on the number or names for other attributes in a table row.

Dumping the tablespace to a string for testing

The print ts.dumpString() produces the dump output

TableSpace dump ts prefix='SP'
    table: 'Supplier'
        prefix='S'
        keylist=['SNO']
        'SPS1':
            'city' : 'london'
            'i' : 'SPS1'
            'misc' : 'a very unpleasant person'
            'sname' : 'smith'
            'sno' : 1
            'tablename' : 'Supplier'

which lists the table space content -- showing one table with one row.

Always store any modifications!

Any modifications to a table space must be stored or the modifications will not be made permanent.

    ts.store()

If a program modifies a table space and fails to call the store method the modifications will be discarded.

Attaching to a tablespace in an existing archive

If you want to use an existing archive to store your new table space or if you want to access an existing table space in an existing archive use the getTableSpace function. The operation


    ts = nTableSpace.getTableSpace(archive, prefix)

will use an existing table space in the archive which uses the prefix, or it will create one if there is no such table space in the archive.

The following interaction reconnects to the table space created above and adds two more tables as well as many more rows to the table space.

    archive = "/tmp/supplier_parts"
    prefix = "SP"
    ts = nTableSpace.getTableSpace(archive, prefix)
    ts.insert("Supplier", sno=2, sname="jones", city="paris", 
                          misc="likes bunnies boiled broiled or in a stew")
    ts.insert("Supplier", sno=3, sname="adams", city="vienna", 
                          misc="owns and sells bunnies as a hobby")
    ts.insert("Supplier", sno=31, sname="adams", city="vienna", 
                          misc="this is a typo")
    ts.insert("Supplier", sno=4, sname="blake", city="rome", 
                          misc="has no discernible personality")
    ts.insert("Supplier", sno=5, sname="bongo", city="mahwah", 
                          misc="erroneous")

    ts.defTable("Sells", "L", "SNO", "PNO")
    ts.insert("Sells", sno=1, pno=1, misc="always late")
    ts.insert("Sells", sno=1, pno=2)
    ts.insert("Sells", sno=2, pno=4)
    ts.insert("Sells", sno=3, pno=1)
    ts.insert("Sells", sno=3, pno=3)
    ts.insert("Sells", sno=3, pno=2)
    ts.insert("Sells", sno=4, pno=3)
    ts.insert("Sells", sno=4, pno=4)

    ts.defTable("Part", "P", "PNO")
    ts.insert("Part", PNO=1, PNAME="Screw", PRICE=10)
    ts.insert("Part", PNO=2, PNAME="Nut", PRICE=8)
    ts.insert("Part", PNO=3, PNAME="Bolt", PRICE=15)
    ts.insert("Part", PNO=4, PNAME="Cam", PRICE=25)

Extracting rows matching a pattern

To extract rows from a table based on values for attributes from a table space, use the tablespace.match method. The following example extracts all entries from the Supplier table where the sname is "bongo".

    test = ts.match("Supplier", sname='bongo')
    print "bongo match", test

The result generated by the method is a list of dictionaries representing the matching rows. The print statement displays this list as follows:

    bongo match [{'city': 'mahwah', 'sname': 'bongo', 'i': 'SPS5', 
        'sno': 5, 'misc': 'erroneous', 'tablename': 'Supplier'}]

The calling sequence for match allows the specification of values for any number of attributes as follows

   ts.match(tableName, **argsDict)

Here the argsDict maps attribute names to the desired values. Note that the attributes and values must match on exact equality. See the selectByValues method for a more general query mechanism which permits matching by ranges of values, prefixes of strings, or contained word prefixes.

Deleting rows matching a pattern

The deleteByValues removes rows with attributes that match specified values from a table.

    ts.deleteByValues("Supplier", sname='bongo')
    test = ts.match("Supplier", sname='bongo')
    print "bongo match after delete", test

In this case the print statement reports

    bongo match after delete []

The calling sequence for deleteByValues is similar to the match calling sequence

    ts.deleteByValues(tableName, **argsDict)

Note that the important special case

ts.deleteByValues("Supplier")
deletes all rows from the supplier table.

The deleteByValues method only supports attribute equality
matches.
The selectByValues query mechanism described below
provides a more general method for deleting rows based on other predicates.

Deleting a row by primary key

In the case where the primary key for the table is known
use the tablespace.delete method to delete
rows faster (by a constant factor) than using selectByValue
    ts.delete("Supplier", sno=31)


Traversing multiple tables (without joins)

The nTableSpace module is a thin wrapper over the
underlying Nucular archive infrastructure, and as such provides
no methods for automatically compiling complex join operations,
developing query plans, and so forth.  However by combining
the single table query operations offered by the tablespace
abstraction programs can emulate join operations, for example,
as follows.
    def partsForSupplier(sname):
        for sdict in ts.match("Supplier", sname=sname):
            snum = sdict["sno"]
            for spdict in ts.match("Sells", sno=snum):
                pnum = spdict["pno"]
                for pdict in ts.match("Part", pno=pnum):
                    yield pdict["pname"]

    pblake = list(partsForSupplier("blake"))
    print "parts for blake", pblake
More complex single table queries with full text searching

As mentioned above the selectByValues method
allows the creation of more complex queries than the methods described
above.  In particular the method call
    sel = ts.selectByValues(Tablename, **attributeValues)

Yields an unevaluated selection object sel which can be
refined by adding additional constraints.
For example, the boiledSelect query below
selects all entries in the Supplier table
which contain the word prefix boil in any attribute.
    boiledSelect = ts.selectByValues("Supplier")
    q = boiledSelect.getQuery()
    q.anyWord("boil")

    bb = boiledSelect()
    print "boiled suppliers", bb

The method call boiledSelect.getQuery() extracts
the Nucular query object associated with the table space selection and
the method call q.anyWord("boil") restricts the query
to match only entries with the word prefix boil in free text
(i.e., in any value in the entry).  In addition to anyWord, Nucular
queries support other restrictions as described in the 
Nucular API Summary Document.  A selection
can be restricted by any of those methods in a similar manner.

The call boiledSelect() evaluates the selection and the resulting
list of dictionaries value prints as:
    boiled suppliers
    [{'city': 'paris', 'sname': 'jones', 'i': 'SPS2', 'sno': 2, 
      'misc': 'likes bunnies boiled broiled or in a stew', 'tablename': 'Supplier'}]


Updating or deleting rows matching a complex query

A program can also use selection objects created by the 
ts.selectByValues method to update or delete
matching rows.  For example the following interaction sets the
city to

chico for all rows matching the boiledSelect selection (all rows in the Supplier table containing the word prefix boil anywhere).

    boiledSelect.updateSet(city="chico")
    print "updated boiled suppliers"
    print boiledSelect()

The printed selection evaluation reflects the change as follows:

updated boiled suppliers
[{'city': 'chico', 'sname': 'jones', 'i': 'SPS2', 'sno': 2, 
  'misc': 'likes bunnies boiled broiled or in a stew', 'tablename': 'Supplier'}]

The following delete removes all rows matching the boiledSelect selection.

    boiledSelect.delete()

Dropping tables and whole table spaces

The table space abstraction also provides methods for dropping a table definition (together with any rows in the table) or a whole table space

    ts.dropTable("Sells")
    ts.dropAll()

Again: always store changes

Just for emphasis, please remember that changes to a table space will only be made permanent in the containing archive when the store operation is called.

    ts.store()

Any unstored operations will be discarded.

Archive cleanup

The table space interface also provides a simplified interface which permits the optimization of the underlying archive using the aggregation operations explained in the overview document.

     ts.cleanUp(full=True)

A non-full cleanUp will only aggregate saved changes to the transient level. A full cleanUp will aggregate the saved changes to the final index.

End of The nTableSpace wrapper API return to index