A schema tree describes the structure of tuples in the database. It indicates which attributes are structured and/or expandable. It also contains type information for leaf attributes. As shown above, schema trees represent many linked relational schemas. Schema trees have the following properties.
Subtrees allow us to restrict our attention.
For example, suppose we want to add
a new residence to a client's record. To perform this task in
a database using relational tables, we would
(1) find the client in the Client table, read Client Id,
(2) add a new row to the Residence
table with Client Id and a new unique Residence Id, and
(3) add one or more phone numbers to the Phones table, each with the
appropriate Residence Id.
With schema trees, however,
we only need to find the client and add a new section of the tuple tree
corresponding to the Residence
attribute. Multiple phone numbers can
be
added to the new instance by creating a new section of the tuple tree
corresponding to the Residence.Phones
attribute under the same instance of Residence.
We have demonstrated that a recursively-structured schema tree can describe a set of relational tables where rows in a subordinate table are related to a single entity. In database terminology, these are one-many relations.
Tables that are related in more complex ways, such as many-many relationships,
can still be described by a set of schema trees. Two schema trees
are linked much in the same way that conventional relational tables
are linked. For example, consider the following two schema trees linked
by the SocialSecurityNumber
field:
# Person schema tree
Name (First Last)
SocialSecurityNumber
Address (
Street City State ZipCode
)*
# Company schema tree
CompanyName
CompanyAddress (
Street City State ZipCode
)*
Employees (
SocialSecurityNumber
Position
SalaryHistory (
Date
Salary
)*
)*
These particular relations should not be represented by a single
schema tree because it is a many-many relationship. A person
can be an employee of multiple companies. If you change the
Address
field of the Company relation, you want the Address
field to change whenever accessing employees of that company.
We have seen how schema trees can be used to describe the relationship between relational tables and to create new, logically related, entries in those tables. We now describe how Qddb uses the schema tree to fully describe a database by establishing types, verbose field names, search options, word descriptions and value formatting.
Qddb associates a type with each leaf attribute, either a numeric type (real, integer, or date), or string. Numeric values may be associated with a format for display (not input) purposes. The string type consists of words interspersed with separators. Usually, separators are taken to be any non-alphanumeric characters, but the schema may specify other separators. Strings are of arbitrary length.
Each attribute in the schema may be associated with a verbose (many-word) name for display purposes. By default, database searches cover values in all leaf attributes (in relational database parlance, all attributes are indexed), but the designer can exclude individual attributes from the indices to reduce their space requirements.
The schema itself is maintained in a free-format Ascii file that can be created by any text editor. Each attribute or subattribute in the schema is of the form:
AttributeName ?<options>? ?(<subattributes>)? ?*?
where we use the ??
notation to indicate optional syntax.
The <options>
may be any of:
Option | Purpose |
verbosename ``string'' | Verbose name of attribute |
type string | Attribute has type string [default] |
type date | Attribute has type date |
type integer | Attribute has type integer |
type real | Attribute has type real |
alias AttributeName | A unique alias for the attribute |
separators ``string'' | Words are separated by one |
of the characters in ``string'' | |
format ``string'' | Format the attribute based on ``string'' |
exclude | Exclude the attribute from indexing |
defaultvalue ``string'' | A default value for each instance |
To represent a simple database of potential employees, we might keep their name, addresses, phone numbers, rank, and date of application. A schema for this relation could be:
Name ( First Middle Last )
Address (
Street exclude type string
City
State
Zip verbosename "Zip Code"
)*
Phones ( Desc verbosename "Description" Number )*
Rank
type integer
format "%2d"
verbosename "Prospect ranking"
Date type date format "%D %T"
verbosename "Date Applied"
Real and integer formats are specified in the standard format for printf() in the ANSI C standard, and dates are specified in the standard form accepted by the POSIX function strftime(). All formats pertain to output conventions; input conventions are those accepted by atoi() (integers), atof() (reals), and a wide variety of forms (dates).
It is important to be able to uniquely distinguish attributes in
a schema tree. For this reason, a schema must uniquely name attributes at the
top level of each subtree.
Attributes across levels may have
identical names. For example, consider this schema:
A ( B C )* B C
The leaf attributes are A.B
, A.C
, B
, and C
. When
we refer to B
, we are
referring to the leaf attribute B
, not A.B
.
A depth-first traversal of the schema tree generates the set of
leaf attributes.
This set, displayed with one column for each leaf attribute, is called a
complete row of the schema.
If we
wish to elide certain uninteresting parts of the complete row, we may ignore
the columns comprising those parts, resulting in a partial row. For
example, if we are only interested in the
leaf attributes B
and C
, then we exclude attribute A
and all its leaves from our traversal.