Each qddb relation is implemented as a Unix directory containing several files and subdirectories. The name of the relation is the same as the name of the directory that contains these files. All files are in Ascii. Some files are structure files, which assist in performing searches. Structure files generally point into other files by means of locators, which are (offset, length) integer pairs. The files are as follows.
Typically, programs that query the relation initialize themselves by bringing HashTable into memory. Thereafter, a query on a single key requires (1) computing the key's hash value, (2) examining the hash table to find the locator into Index for the associated bucket, (3) reading a contiguous section of Index to obtain locators into Stable for the tuples containing that key, and (4) reading a contiguous section of Stable for each tuple to be presented. The list of locators built in step (3) is stored in an internal form called a TupleList (see Section 4), which can be manipulated before step (4), for example, to remove unwanted entries. Thus, the solution set X of a simple query on stable data can be obtained in |X|+2 read system calls, including the one needed to bring the hash table into memory.
Each query must also search instable data. The additional cost is one extra open/read pair plus one read of the Changes or Additions directories for each changed or added tuple. Thus, the number of excess open/read system calls is directly proportional to the number of added and updated tuples in the relation, that is, the number of tuples in the instable part.
In addition to the extra system calls, instable relations also require significant computation, because a string-search routine like fgrep must be used to determine the presence of keys.
We have considered an alternative implementation: to log all updates in a single Log file, with locators to that file stored in a separate structure file called LogLocators. Each update would require invalidating the appropriate entry in LogLocators and appending the update to Log. This representation has the advantage that there are fewer open calls needed to search updated tuples. (Open calls are potentially expensive.) The log file approach is therefore more efficient than our current method for querying instable parts of a highly modified relation.
The log file representation has the disadvantage that it requires more computation to perform an update. Also, simultaneous updates to different tuples require record locking in potentially many portions of the two files. Our current approach has the advantage of less complicated storage management and the expense of only a single pair of open/write calls per update. Opening a single file for exclusive use is sufficient to prevent race conditions and allows all tuples to be updated at once.
We accept the performance penalty of our method for instable relations because we assume that relations are changed infrequently; however, we believe that log files would be a useful addition to qddb.