
6.1 The format

The Schema file begins with optional storage format options:

HashSize = <Number>: Sets the number of buckets in the HashTable file. If you leave this number out, Qddb will use a default of 20,000.
HashType = <Number>: Sets the hash function to <Number>, where <Number> may currently be 0 or 1. 0 is the default hash function; 1 is a newer function designed for data with repetitive prefixes/suffixes. If your data contains numbers, WWW URLs, or source code, you may find that HashType 1 performs much better than the default. You can experiment with your own HashType by editing the function Qddb_HashValue in Lib/LibQddb/Hash.c.
Use Cached Hashing: Instructs qstall and qindex to build a HashTable file suitable for caching hash values instead of reading the entire file at startup.
CacheSize = <Number>: Sets the maximum number of hash values to cache. Ignored unless ``Use Cached Hashing'' is specified.
Use Reduced Attribute Identifiers: Instructs qstall to build a RedAttrIndex structure file holding the full attribute identifiers. This option reduces the size of the Database and Index files, because the lengthy attribute identifiers and instance numbers are represented by a shorter base-36 number. You can add and delete this option from the Schema at any time, but you must immediately restabilize with qstall. qstall will notice that you are converting from reduced attribute identifiers to full attribute identifiers (or vice-versa) and make the appropriate conversion.
DateFormat = "Format String": Sets the default format string for outputting dates (this relation only). Any format suitable for strftime(3) is allowed.

Qddb ignores the case of keywords such as HashSize.
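As an illustration of the options above, here is a minimal sketch in Python of how the leading lines of a Schema could be read. It is not Qddb's own parser (Qddb is written in C); the function name parse_format_options and the dictionary keys are hypothetical. The sketch assumes each option sits on its own line, and it does not handle a ``#'' inside a quoted DateFormat string.

```python
import re

DEFAULT_HASH_SIZE = 20000  # default given in the text when HashSize is left out

def parse_format_options(lines):
    """Collect storage format options from the leading lines of a Schema.

    Hypothetical helper for illustration only; not part of Qddb.
    """
    opts = {
        "hashsize": DEFAULT_HASH_SIZE,
        "hashtype": 0,                      # 0 is the default hash function
        "cached_hashing": False,
        "cachesize": None,
        "reduced_attribute_identifiers": False,
        "dateformat": None,
    }
    for raw in lines:
        line = raw.split("#", 1)[0].strip()       # drop comments
        if not line:
            continue
        lowered = " ".join(line.lower().split())  # keywords ignore case
        if lowered == "use cached hashing":
            opts["cached_hashing"] = True
            continue
        if lowered == "use reduced attribute identifiers":
            opts["reduced_attribute_identifiers"] = True
            continue
        m = re.fullmatch(r"(\w+)\s*=\s*(.+)", line)
        known = ("hashsize", "hashtype", "cachesize", "dateformat")
        if m is None or m.group(1).lower() not in known:
            break                                  # attribute descriptions begin
        key, value = m.group(1).lower(), m.group(2).strip()
        opts[key] = value.strip('"') if key == "dateformat" else int(value)
    if not opts["cached_hashing"]:
        opts["cachesize"] = None   # CacheSize is ignored without cached hashing
    return opts
```

Because the keywords are lowercased before comparison, ``HASHSIZE = 1000'' and ``HashSize = 1000'' are treated alike.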

After the format options, Schema contains the attribute descriptions. Each attribute takes the form AttributeName ?<options>? ?(<subattributes>)? ?*?. (We are following the Tcl convention that parts surrounded in ?? are optional, and that parts with capital letters need to be specified further by a variable or a specific literal.)

Example:

Suppose that you want to build a database for your CD-ROM collection. You want to keep series together, along with their individual costs and date of arrival. The following Schema might fulfill your wish:

    # My CD-ROM collection
    Use cached hashing
    Use reduced attribute identifiers
    HASHSIZE = 1000
    CACHESIZE = 100
    HASHTYPE = 1
    # *** attributes begin here ***
    Series verbosename "CD-ROMs" (
        BarCode verbosename "Barcode on CD cover" separators ""
        Title 
        Num verbosename "# CDs" type integer
        Mfg verbosename "Manufacturer" (
            Name
            Address (Street City State Zip)*
            Phones (Desc Area Number)*
        )
        DateReleased verbosename "Release date" type date
        DatePurchased verbosename "Acquisition date" type date
        Cost type real format "%.2f"
    )*
    Comments verbosename "Series comments"

Attribute names must begin with an alphabetic character (``a'' through ``z,'' either upper or lower case) and may continue with any alphanumeric characters. The outer-level attribute names must be unique, and within each structured attribute, the attribute names must be unique. However, you may use the same name inside several different structured attributes, although you may find it confusing.

If you specify *, then this attribute is expandable. If you include subattributes, then this attribute is structured. The subattributes obey the same syntax as the attribute itself. (You may start each subattribute on a new line if you like, but the Schema file does not follow any fixed format.) You may terminate attribute definitions with semicolons if that helps you read the schema, but they are optional. Comments may be placed anywhere on a line after a pound sign (``#'').

The options may be any of:

    verbosename
    type
    separators
    format
    alias
    exclude
    defaultvalue

The words verbosename, type, separators, format, alias, exclude, and defaultvalue are keywords and are invalid as attribute names.
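The naming rules can be sketched as a small check in Python. This is only an illustration, not Qddb code: the helper names are hypothetical, and two points are assumptions rather than statements from this manual. The first is that the reserved words are matched without regard to case; the second is that sibling names are compared exactly.

```python
import re

# Reserved words listed in the text; they are invalid as attribute names.
KEYWORDS = {"verbosename", "type", "separators", "format",
            "alias", "exclude", "defaultvalue"}

# An alphabetic first character followed by any alphanumeric characters.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_valid_attribute_name(name):
    """Return True if name follows the naming rules (hypothetical helper)."""
    if NAME_RE.fullmatch(name) is None:
        return False
    # Assumption: keywords are rejected in any mix of upper and lower case.
    return name.lower() not in KEYWORDS

def duplicate_names(siblings):
    """Return the names that occur more than once at one nesting level."""
    seen, dups = set(), []
    for name in siblings:
        if name in seen and name not in dups:
            dups.append(name)
        seen.add(name)
    return dups
```

The duplicate check applies to one nesting level at a time, so a name such as Name may still appear inside several different structured attributes.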

Types restrict the values accepted by nxqddb (but not by qadd and qedit). If you don't specify the type of a leaf attribute, it accepts arbitrary strings. Integer types may be negative. Real types are stored as double precision floating point. Dates are stored as seconds after an arbitrary time called ``the epoch''.

Separators indicate how the values are to be parsed into words for the purpose of word search. If you don't specify separators, Qddb uses all non-alphanumeric characters as delimiters. (If the attribute has type real, then -, +, and . are not delimiters; if the attribute has type integer or date, then + and - are not delimiters.) If you want strings like "foo:bar/whatever this is" to be considered three words, foo, bar, and whatever this is, then you should set the separators to ":/". A newline (``\n'') is always a delimiter.
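The effect of the separators option can be sketched in Python. This is not Qddb's indexing code. The function split_words is hypothetical, it treats ``alphanumeric'' as ASCII letters and digits (an assumption), and it leaves out the type-specific exceptions for real, integer, and date attributes.

```python
import re

def split_words(value, separators=None):
    """Split a value into words for word search (illustrative sketch).

    separators=None models an attribute with no separators option:
    every non-alphanumeric character is a delimiter.
    """
    if separators is None:
        pattern = r"[^A-Za-z0-9]+"
    else:
        # A newline is always a delimiter, whatever the separators are.
        pattern = "[" + re.escape(separators + "\n") + "]+"
    return [word for word in re.split(pattern, value) if word]

print(split_words("foo:bar/whatever this is", ":/"))
print(split_words("foo:bar/whatever this is"))
```

With separators set to ":/" the string yields three words; without the option, the spaces also split it, giving five. An empty separators string, as used for BarCode in the example Schema, leaves only the newline as a delimiter.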

Aliases allow you to give a simple name to a deeply nested leaf attribute. This is handy if you write programs in qtcl, the Qddb extension to Tcl, described in Chapter 7.

The exclude option instructs qstall to exclude the attribute from indexing. You should probably use this option if you never need to search on a particular attribute's value. Intelligent use of this option can significantly decrease the size of the Index file.

Formats indicate how you want nxqddb to display integer, real, and date values. They do not influence how you enter data, and they are ignored for attributes of any other type.

For integers, you may specify any of the standard printf(3) ``%d'' formats. Most commonly, you will want to specify the padding, for example ``%5d'' (pad with spaces to a width of 5) or ``%05d'' (pad with zeros to a width of 5).

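For reals, you may specify any of the standard printf(3) ``%f'' formats. Most commonly, you will want to specify the precision, for example ``%.2f'' (two digits after the decimal point), as used for the Cost attribute in the example Schema.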
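To see what such formats produce, the following Python lines apply a few of them. Python's % operator follows the printf(3) conventions for ``%d'' and ``%f'', so the results match what a C program would print. The helper name display is hypothetical and is not part of nxqddb.

```python
def display(value, fmt):
    """Render a numeric value with a printf-style format string."""
    return fmt % value

print(repr(display(42, "%5d")))        # padded with spaces to width 5
print(repr(display(42, "%05d")))       # padded with zeros to width 5
print(repr(display(12.5, "%.2f")))     # two digits after the decimal point
print(repr(display(3.14159, "%8.3f"))) # width 8, three digits of precision
```

The format only changes how a value is displayed; the value itself is unchanged.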

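For dates, you may specify almost any combination of the strftime(3) conversion specifiers (also see the man pages for strftime(3), asctime(3), and mktime(3)). The specifiers that appear in this section are:

    %d   day of the month (01-31)
    %m   month as a number (01-12)
    %b   abbreviated month name
    %y   year without the century (00-99)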

You may enter dates in almost any format, except that ``nn/nn/nn'' is interpreted as ``%m/%d/%y''. You can override this interpretation when you install Qddb so that such input will be interpreted as ``%d/%m/%y''. Some combinations of date format, such as ``%d-%b-%y'', are not currently recognized by Qddb. If you provide a date format that Qddb cannot interpret, you will see an error when entering dates. The most common formats are supported.
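The difference between the two interpretations of ``nn/nn/nn'' is easy to see in a short Python sketch. It uses the standard datetime module rather than Qddb's own date parser, and the function name parse_slash_date and its day_first flag are hypothetical stand-ins for the install-time choice.

```python
from datetime import datetime

def parse_slash_date(text, day_first=False):
    """Interpret nn/nn/nn as %m/%d/%y (default) or %d/%m/%y."""
    fmt = "%d/%m/%y" if day_first else "%m/%d/%y"
    return datetime.strptime(text, fmt).date()

print(parse_slash_date("03/04/95"))                  # month first
print(parse_slash_date("03/04/95", day_first=True))  # day first
```

The same input is read as March 4 under the default and as April 3 under the day-first interpretation, so it is worth knowing which one your installation uses.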

You may also enter dates relative to some date, as in ``next tuesday''.

You may also use the day-of-week keywords Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday. Month names are also fine. For example, ``tuesday'' and ``next tuesday'' are both valid input to a date attribute.

The input ``next tuesday'' does not mean this coming (or current) Tuesday, but the one after. The keyword ``tuesday'' means the coming (or current) Tuesday.
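The rule for day-of-week keywords can be written out as a small Python sketch. It is an illustration of the rule as stated here, not Qddb's date parser. The function resolve_weekday is hypothetical, and it handles only a day name with an optional leading ``next''.

```python
from datetime import date, timedelta

# Order matches date.weekday(), where Monday is 0.
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def resolve_weekday(text, today):
    """Resolve 'tuesday' or 'next tuesday' relative to today."""
    words = text.lower().split()
    if len(words) == 2 and words[0] == "next":
        skip_one = True
    elif len(words) == 1:
        skip_one = False
    else:
        raise ValueError("unsupported input: %r" % text)
    target = WEEKDAYS.index(words[-1])    # ValueError for an unknown day name
    # The bare keyword means the coming (or current) day of that name.
    days_ahead = (target - today.weekday()) % 7
    result = today + timedelta(days=days_ahead)
    if skip_one:
        result += timedelta(days=7)       # "next" means the one after that
    return result

monday = date(2024, 1, 1)                 # 1 January 2024 was a Monday
print(resolve_weekday("tuesday", monday))
print(resolve_weekday("next tuesday", monday))
```

Entered on Monday, 1 January 2024, ``tuesday'' resolves to 2 January and ``next tuesday'' to 9 January. Entered on Tuesday, 2 January, the results are the same, because ``tuesday'' then means the current day.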

Figure 6.1 shows a schema using all the options. It represents a family that might have many members, addresses and phone numbers. The attributes n, a, and p are expandable so you can have multiple names, addresses and phone numbers grouped together in one record.

Figure 6.1: An example of a schema showing all options


Herrin Software Development, Inc.