Next: 7.2 QTcl Commands Up: 7 Writing Qddb Applications Previous: 7 Writing Qddb Applications

7.1 Data abstractions

The basic Qddb data abstractions are schemas, keylists, tuples, rows, views, and instances. Each of these datatypes refers to Qddb structures introduced in Chapter 4.

In this section, we will discuss these abstractions without descending into Tcl commands. The next section discusses the commands that manipulate these abstractions.

Schemas

The schema is an abstraction of the Schema file in the Qddb relation directory. It represents the schema as a tree. (See Figure 7.1).

Figure 7.1: Tree Form of the Qddb Schema ``Attr1* Attr2 (A B)* Attr3''

The tree form implies two operations:

Find the deepest common ancestor of two given nodes in the tree.
Find all leaves for whom a given node is an ancestor.

Section 7.1.3 details the importance of the schema abstraction in relation to tuples.

A keylist describes a set of tuples, often those that satisfy some query. Each node in the keylist refers to a tuple in the database (perhaps in the Database file, perhaps in a file in the directories Additions or Changes). Within the tuple, a keylist node can refer to a particular attribute. Keylists have several binary operations:

tabular705

The Qddb query mechanism supports searches on individual attributes and on entire tuples based on: (1) a word, (2) a number, (3) a date, (4) a regular expression, and (5) a range of words, numbers, or dates. Each query returns a keylist; logical combinations of queries are accomplished by binary operations on the resulting keylists. For example, if you want to find all the tuples that contain the word Joe and Machey, you would:
tabular712

Each keylist node is a quadruple {start, length, type, number} and a leaf identifier that points into the data files. The pair {type, number} specifies whether the tuple resides in the stable or instable part of the database. Not all nodes in a keylist necessarily point to a valid tuple; if a tuple has been changed or deleted, then a query might still match the stable tuple. Invalid tuples are empty when you try to read them.

7.1.3 Tuples

The tuple is the basis for data manipulation. You can read, write, add, delete, and change tuples. You can also translate tuples from internal tree form into several other formats: external, tclexternal, and readable. A tuple can be locked to prevent others from writing it, but even a locked tuple can always be read. The act of writing a tuple is synchronization-atomic; Qddb will wait until any writes are complete before reading. When a tuple is locked for writing, it is refreshed from disk immediately after the lock is acquired so that you are guaranteed the latest version of the tuple.

Like schemas, tuples are most conveniently visualized in the form of a tree. Using the schema shown in Figure 7.1, we might have the following tuple (shown in readable form):

    Attr1 = "the"
    Attr1 = "moon"
    Attr2 (
        A = "was"
        B = "a"
    )
    Attr2 (
        A = "ghostly"
        B = "galleon"
    )
    Attr3 = "tossed"

The tree form of this tuple is shown in Figure 7.2. There are three types of node in this tree: attribute, instance, and data nodes. An attribute node describes an attribute in the schema. Instance nodes describe the instance numbers of their children. These two node types alternate on paths from the root to leaves. The data nodes are the leaves.

Figure 7.2: Tree Form of a Qddb Tuple

The tree form of a tuple is closely related to the tree form of a schema. In its simplest form, the tuple tree is an exact duplicate of the schema tree, with instance nodes inserted below each attribute node and leaves inserted at the end of each branch. Multiple instances of an expandable attribute are represented by separate branches.

The operations on tuples include adding and removing instances of attributes. The tuple shown in Figure 7.2 has two instances of Attr2. You might wish to add another instance of Attr2 and fill in just the B subattribute. When you request a new instance of Attr2, you will get a new branch of Attr2 that includes Attr2.A and Attr2.B. You can then fill in Attr2.B and leave Attr2.A empty. When the tuple is stored, the empty attribute will be omitted, and no storage will be consumed.

7.1.4 Rows

In a sense, a Qddb tuple is the join of a set of related rows from several ordinary relational tables. The relational tables are implicit in Qddb's tree-structured tuples. In order to reduce a Qddb tuple to tabular form, you must extract the rows of interest.

A row is a view of a tuple that restricts each multiple-instance attribute to one of the instances. Rows are built from tuples. A complete row includes each leaf of the schema tree. A partial row omits some leaves. You would omit leaves if the data they contain is not important to display at the moment.

For example, the following is one complete row of the tuple shown in 7.1:

    Attr1 = "the"
    Attr2 ( 
        A = "ghostly"
        B = "galleon"
    )
    Attr3 = "tossed"

The following is a partial row of the same tuple:

    Attr1 = "the"
    Attr3 = "tossed"

Typically, you want to prune identical partial rows (you cannot have identical complete rows.)

All complete rows can be extracted from a tuple tree with the following algorithm, written in an ad-hoc, C-like notation (see Figure 7.2):

    rowList procedure ProcessAttributes (AttributeNode) {
        rowList r = NULL;
        foreach Inst = instance of attributeNode {
            r += ProcessInstances(Inst) /* Union */
        }
        return r
    }
    rowList procedure ProcessInstances (InstanceNode) {
        rowList r = NULL
        if leaf(InstanceNode)
            return InstanceNode.Value
        else {
            foreach Attr = attribute of InstanceNode
                r *= ProcessAttributes(Attr) /* Cartesian product */
            return r
        }
    }
    rowList r = ProcessInstances(RootNode)

The above algorithm works as follows. To generate all rows in a tuple, you call ProcessInstances with the root node of the tuple. ProcessInstances iteratively forms the cartesian product of the rows produced at that instance by ProcessAttributes. ProcessAttributes iteratively forms the union of all rows produced at that attribute by calling ProcessInstances.

7.1.5 An example

Consider the following animal-records schema:

    Kind (
        Desc*
        Which
        Breed
    )*
    Whose

Here is a tuple in that relation:

    Kind (
        Desc = "My dog's name is 'Bonnie'"
        Desc = "My dog's name is 'Boomer'"
        Which = "Dog"
        Breed = "Shekie"
    )
    Kind (
        Desc = "My cat's name is 'Vinnie'"
        Desc = "My other cat's name is 'Bud'"
        Which = "Cat"
        Breed = "Domestic short hair"
    )
    Whose = "Eric"

The following are the complete rows of this tuple:

tabular762
Partial rows can be extracted by excluding particular columns from the result, possibly deleting duplicate rows.

7.1.6 Views

A view is a mechanism for navigating among the rows of a tuple. At any instant, a view corresponds to one complete row in the tuple. The view can be shifted to other rows of the same tuple. When you define a view, you specify the Tcl global variables that should be bound to each leaf attribute. When the view corresponds to a particular row of the tuple, those variables have the values of the associated attributes. You can read those variables to see what is in the tuple, and you can set those variables to modify what is in the tuple. If you aren't interested in following some leaf attributes, you can omit variables for those attributes. If you provide no variables at all, then you can't use the view to help you extract values (although you can still get at the values by instance commands, described later).

A newly-defined view initially corresponds to the first row, that is, the row that has the first instance of each attribute in the tuple. After definition, a view may be shifted to any row. The usual way is to switch to a different instance of an expandable attribute.

For example, using the Schema and tuple defined in Section 7.1.5, we might have the following view:

tabular772

This view describes a row containing the first instance of Kind and the first instance of Kind.Desc within Kind. By shifting the the Kind.Desc attribute to the second instance, the view becomes:

tabular781

You can also shift the instance of the entire Kind attribute, in which case the view would contain the first instance of Kind.Desc within the second instance of Kind:

tabular790