Time Is Not A Time-Series

Introduction

Representing time-varying data in a database has long been a problem. While it has always been possible to build structures tagged with time fields, the database tools for maintaining and analyzing time-varying data have left much to be desired.

In the days of relational databases, there were few choices except to use the database system as a simple data repository. Most, if not all, of the analysis and query logic associated with time-varying data needed to be implemented in application programs. The deficiency of the relational systems was their inability to create new data types that could hold time-varying collections of data. If the operation couldn't be done using select, project, join, and cartesian product, it couldn't be done in the relational system.

With the advent of object-oriented technology in general, and object-oriented database systems in particular, the obvious problem of the relational systems appears to be solved. Because object-oriented database systems can implement new data types that are directly usable by the database system, it is now possible to define new types of collections -- such as time-series -- that model time-varying collections of data. Presumably, these new data types can include the operations needed to maintain, select, and aggregate their contents based on the needs of the application builder.

Based on this new-found capability, there is now a perception that the only remaining task is to provide a starter set of classes. Our ten years of experience in designing and using Vision™ -- our commercial, temporal, object-oriented database system -- tells us that the problem is larger than that.

Capturing Relationships -- How Time Affects A Data Model

Providing a TimeSeries class -- a type of indexed collection that understands time as its index -- is a reasonable starting point for modeling time-varying data in an object-oriented system. Of its many properties and operations, which together form an extensible set, the salient features of this new class are as follows:

  • It is ordered by its index (e.g., last Friday comes before this Tuesday)
  • It supports order-aware interval queries (e.g., find the most recent observation recorded in the series on or before November 23, 1992)
  • It is possible to use its instances to hold elements of any data type -- subject, of course, to the requirements of its type declaration and the type checking requirements of the database (e.g., it should be possible to construct a time-series of Company objects, not just a time-series of numbers).
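The three features above can be sketched as a small generic Python class. This is a minimal illustration, not Vision's TimeSeries: the names `TimeSeries`, `put`, `as_of`, and `between` are hypothetical, and a pair of parallel sorted lists stands in for whatever index a real database would use.

```python
import bisect
from datetime import date
from typing import Generic, List, Optional, Tuple, TypeVar

T = TypeVar("T")


class TimeSeries(Generic[T]):
    """Hypothetical sketch of a collection indexed and ordered by date."""

    def __init__(self) -> None:
        self._dates: List[date] = []  # always kept in ascending order
        self._values: List[T] = []    # _values[i] was observed on _dates[i]

    def put(self, when: date, value: T) -> None:
        """Record an observation; dates need not follow any regular frequency."""
        i = bisect.bisect_left(self._dates, when)
        if i < len(self._dates) and self._dates[i] == when:
            self._values[i] = value  # a second observation for a date replaces the first
        else:
            self._dates.insert(i, when)
            self._values.insert(i, value)

    def as_of(self, when: date) -> Optional[T]:
        """Return the most recent observation on or before `when`, or None."""
        i = bisect.bisect_right(self._dates, when)
        return self._values[i - 1] if i > 0 else None

    def between(self, start: date, end: date) -> List[Tuple[date, T]]:
        """Return the (date, value) pairs with start <= date <= end, in order."""
        lo = bisect.bisect_left(self._dates, start)
        hi = bisect.bisect_right(self._dates, end)
        return list(zip(self._dates[lo:hi], self._values[lo:hi]))


# Elements need not be numbers: here, a series of industry names.
industry_series: TimeSeries[str] = TimeSeries()
industry_series.put(date(1992, 6, 1), "Services")
industry_series.put(date(1990, 1, 1), "Computers")  # out-of-order insert is fine
industry_series.as_of(date(1992, 11, 23))           # → "Services"
```

Keeping the dates sorted on insert is what makes the "on or before" query a binary search rather than a scan; a database would choose its own storage, but the ordering requirement is the same.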

Although many time-series hold observations recorded at a fixed frequency such as annual, quarterly, or monthly, it is overly restrictive to require a regular pattern of observations -- stock split adjustment factors and intra-day trades on a stock exchange, for example, will never occur at regular intervals.

With a TimeSeries class in hand, properties of new and existing classes can be declared in terms of its instances. Applications of TimeSeries in a number of real financial systems developed with Vision include:

  • A pricingSeries property at Security that returns a TimeSeries of Price objects
  • A holdingsSeries property at Account that returns a TimeSeries of HoldingsList objects
  • An industrySeries property at Company that returns a TimeSeries of Industry objects
  • An estimateSeries property at Security that returns a TimeSeries of TimeSeries of EarningsEstimate objects
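A hypothetical Python rendering of two of these declarations follows. It repeats a minimal `TimeSeries` so that it runs on its own; `Price`, `EarningsEstimate`, `Security`, and their fields are placeholders. Reading the nested estimate series as "outer index = date the estimates were made, inner index = fiscal period being estimated" is an assumption; the text does not say what the two indexes mean.

```python
import bisect
from dataclasses import dataclass, field
from datetime import date
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class TimeSeries(Generic[T]):
    """Minimal hypothetical time-series: irregular, date-ordered observations."""

    def __init__(self) -> None:
        self._dates: List[date] = []
        self._values: List[T] = []

    def put(self, when: date, value: T) -> None:
        i = bisect.bisect_left(self._dates, when)
        if i < len(self._dates) and self._dates[i] == when:
            self._values[i] = value
        else:
            self._dates.insert(i, when)
            self._values.insert(i, value)

    def as_of(self, when: date) -> Optional[T]:
        i = bisect.bisect_right(self._dates, when)
        return self._values[i - 1] if i > 0 else None


@dataclass
class Price:
    close: float


@dataclass
class EarningsEstimate:
    eps: float


@dataclass
class Security:
    ticker: str
    # Temporal one-to-many relationship: one Security, many dated Prices.
    pricing_series: TimeSeries[Price] = field(default_factory=TimeSeries)
    # Assumed meaning: outer index = date the estimates were made,
    # inner index = fiscal period being estimated.
    estimate_series: TimeSeries[TimeSeries[EarningsEstimate]] = field(
        default_factory=TimeSeries
    )
```

With this shape, `security.estimate_series.as_of(d1).as_of(d2)` asks two temporal questions at once: which set of estimates was current on `d1`, and what that set said about period `d2`.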

Intuitively, there are no surprises here. Just as other collection types are used to model structural one-to-many relationships, time-series are used to model temporal one-to-many relationships. There is, however, one significant difference between the two: most databases contain more temporal one-to-many relationships than structural ones.

With the exception of a few truly immutable properties, such as a person's blood type or birth date, any property has the potential to be time-varying. In practice, which properties actually need to be made time-varying is a function of the information the database needs to capture and how that information will be used. That is a fact of life and a source of complexity that must be managed.

Encapsulating Complexity -- How Time Affects A Program

An application can use time-varying data in a database in one of two general ways. It can directly examine a time-series to display its content, query its elements, compute a statistic, or perform some other operation that can be answered locally by the time-series. Adding an observation or generating a price chart are examples.

The other way an application can use a time-series is as part of a navigation from one object to another. In that role, the time-series is simply expected to supply whichever one of its elements is appropriate as of a particular point in time:

      ibm industrySeries asOf: aPointInTime

The navigational use of time-series, although apparently simpler, has far greater impact than any other use. Other uses are localized to a specific part of an application. Navigation, on the other hand, is fundamental to the operation of an object-oriented database.

Every time-varying property participates in a time-dependent navigation whenever it is accessed on the way from one part of the database to another. If the data model specifies that the industry classification of a Company varies over time, it is incumbent on the application to supply a point in time whenever it asks for a company's industry.
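To make that burden concrete, here is a hypothetical Python sketch in which time-varying properties are plain `{date: value}` dictionaries and every navigation must be handed an explicit `when`. The `Account`, `Company`, and `Industry` classes are simplified stand-ins (holdings are a bare list of companies rather than a HoldingsList), and `industries_held` is an invented example of a two-hop navigation.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, TypeVar

V = TypeVar("V")


def as_of(series: Dict[date, V], when: date) -> Optional[V]:
    """Most recent value recorded on or before `when` (linear scan for brevity)."""
    eligible = [d for d in series if d <= when]
    return series[max(eligible)] if eligible else None


@dataclass
class Industry:
    name: str


@dataclass
class Company:
    name: str
    industry_series: Dict[date, Industry] = field(default_factory=dict)


@dataclass
class Account:
    # Which companies the account holds also varies over time.
    holdings_series: Dict[date, List[Company]] = field(default_factory=dict)


def industries_held(account: Account, when: date) -> List[str]:
    """Each time-varying hop needs the same explicit `when` argument."""
    holdings = as_of(account.holdings_series, when) or []  # first hop
    names = []
    for company in holdings:
        industry = as_of(company.industry_series, when)     # second hop, same `when`
        if industry is not None:
            names.append(industry.name)
    return names
```

Every caller of `industries_held`, and every function it calls, has to carry `when` along, even though none of them care about time for its own sake.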

Although it can be done, it is unreasonable to require that every application supply an explicit extra parameter every time it traverses a time-varying property. Access to time-varying properties needs to be encapsulated in a way that simplifies the use of those properties from an application. In the case of this example, it is reasonable to define an industry method at Company that gets a value for aPointInTime from somewhere else:

      ibm industry

This encapsulation is useful for all the obvious programming reasons. It happens to be useful from a data modeling point of view as well. The designer of the database probably intended to assert that every Company has an Industry -- it just happens that the industry property of Company varies over time.

The set of time-varying properties is also subject to change as the database evolves. If industry changes from a time-invariant property to a time-varying property, the ability to write a method to encapsulate the navigation is not just nice to have -- it is an absolute necessity. If existing programs are to continue to work, the new type signature for the class Company has to be substitutable for the old signature. For example, industry has to continue to return an Industry object; it can't simply disappear or suddenly return a different kind of object like a time-series.

All of this may seem to belabor the obvious. Of course these methods can and should be written -- any competent programmer can see that all that is needed is some global location to which a method can refer to answer the "when?" question. The ability to create new properties and methods is, after all, part of every object-oriented programming environment. The problem, however, is not quite that simple.

Querying The Database -- The Problem Goes Global

There is a major problem here, and it comes from the use of global variables to pass the values of parameters. The objects used to record and supply the temporal context used to navigate the database are not properly part of the database -- they exist only to support its use. They also should not be globals. The use of globals to cache state and pass parameters has rightfully fallen into disfavor -- especially in environments like a database system that need to be multi-user and multi-threaded.
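The multi-user hazard can be shown without real threads. In this hypothetical Python sketch, two generators stand in for two users' sessions sharing one process-wide `current_date`; the `yield` marks a point where the scheduler switches sessions, and the explicit interleaving makes the outcome deterministic. The data and names are invented for illustration.

```python
from datetime import date
from typing import Dict, Iterator, List

# A single process-wide temporal context shared by every session.
current_date: date = date(1992, 12, 31)

# Hypothetical data: one company's industry over time.
industry_series: Dict[date, str] = {
    date(1990, 1, 1): "Computers",
    date(1992, 6, 1): "Services",
}


def industry() -> str:
    """Encapsulated accessor that reads the shared global."""
    eligible = [d for d in industry_series if d <= current_date]
    return industry_series[max(eligible)]


def session(as_of: date, results: List[str]) -> Iterator[None]:
    """One user's unit of work; `yield` is where another session may run."""
    global current_date
    current_date = as_of          # set the temporal context...
    yield                         # ...another session is scheduled here...
    results.append(industry())    # ...so this read may see that session's date


def run_interleaved() -> Dict[str, List[str]]:
    a_results: List[str] = []
    b_results: List[str] = []
    a = session(date(1991, 1, 1), a_results)
    b = session(date(1992, 12, 31), b_results)
    next(a)                       # A sets current_date to 1991-01-01
    next(b)                       # B overwrites it with 1992-12-31
    for gen in (a, b):
        next(gen, None)           # resume both; each reads the shared global
    return {"A": a_results, "B": b_results}
```

Session A asked for 1991 and should have seen "Computers", but `run_interleaved()` reports "Services" for both sessions because B's setting clobbered A's.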

To illustrate some of the difficulties, consider answering the question:

    "In which of the last five years did the earnings per share of more than half the companies in the database exceed the previous year's industry average earnings per share?"

This is a query based on the overall state of the database, not the content of some specific time-series. To answer this question using just the capabilities of the programming language without architectural support from the database, an application programmer must manually iterate through the years of interest, running the query for each year in turn. More subtly, because the query embeds a previous-year sub-query, the temporal context must change in the middle of the query.
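A hypothetical Python sketch of that work-around shows how procedural it becomes. The three companies, their industries, and their earnings per share are made-up toy values; `current_year` is a global temporal context that the encapsulated `eps` accessor reads, and the query function has to loop over years by hand, shift the global for the previous-year sub-query, shift it back, and restore the caller's value at the end.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical toy data: company -> (industry, {year: earnings per share}).
COMPANIES: Dict[str, Tuple[str, Dict[int, float]]] = {
    "A": ("Tech",   {1987: 1, 1988: 3, 1989: 2, 1990: 2, 1991: 4, 1992: 1}),
    "B": ("Tech",   {1987: 1, 1988: 1, 1989: 2, 1990: 2, 1991: 2, 1992: 5}),
    "C": ("Retail", {1987: 2, 1988: 3, 1989: 1, 1990: 3, 1991: 3, 1992: 4}),
}

current_year: int = 1992  # global temporal context read by the accessors below


def eps(company: str) -> float:
    """Encapsulated accessor: EPS as of the global current_year."""
    return COMPANIES[company][1][current_year]


def industry_average_eps(industry: str) -> float:
    """Average EPS of an industry's companies as of the global current_year."""
    return mean(eps(c) for c, (ind, _) in COMPANIES.items() if ind == industry)


def years_majority_beat_prior_industry_average(years: List[int]) -> List[int]:
    global current_year
    saved = current_year
    answer = []
    try:
        for year in years:                      # manual iteration over time
            current_year = year
            beat = 0
            for company, (industry, _) in COMPANIES.items():
                this_year = eps(company)
                current_year = year - 1         # shift context for the sub-query...
                prior_avg = industry_average_eps(industry)
                current_year = year             # ...and remember to shift it back
                if this_year > prior_avg:
                    beat += 1
            if beat > len(COMPANIES) / 2:       # "more than half the companies"
                answer.append(year)
    finally:
        current_year = saved                    # restore the caller's context
    return answer
```

For the five years 1988 through 1992, this toy data yields `[1988, 1992]`. Nothing in the result depends on the bookkeeping, yet most of the function is bookkeeping, and it is still unsafe if another session touches `current_year` at the same time.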

The solution to this problem is not simply a matter of application programmer cleverness. This approach has pushed a substantial amount of responsibility back onto the application programmer. Even if all of this complexity can be encapsulated to look benign, it is, in fact, a cumbersome, non-reentrant, highly procedural way of doing things. In effect, this work-around has removed the problem from the database system -- where it can be optimized by designers who know how to do such things -- and hidden it in the bowels of an application's logic.

Solving The Problem -- Dimensionality Is Architectural

That time is special should not come as a surprise. The world is multi-dimensional and time is one of those dimensions. Different points in time select different states of the world. A database dealing with time-varying data needs to do the same.

Technically, time is a parameter. Unlike the properties of a class, which parameterize instances of that class, time parameterizes the entire database. The programming tools suitable for parameterizing a class are not sufficient to parameterize the entire database. When an application supplies a value -- or set of values -- for time, it is selecting a collection of states from the database, not a set of instances from a class in the database.
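Python has no database-wide time parameter, but its standard `contextvars` module gives a rough analogy of what such architectural support offers an application: one scoped parameter that every time-varying accessor consults, that nests, and that unwinds itself. This is a hypothetical sketch, not how Vision implements it; `as_of`, `value_now`, and `Company` are invented names.

```python
import contextvars
from contextlib import contextmanager
from datetime import date
from typing import Dict, Iterator, Optional

# One hypothetical model-wide parameter read by every time-varying accessor.
# A value set in one thread or asyncio task does not leak into another.
_as_of = contextvars.ContextVar("as_of")


@contextmanager
def as_of(when: date) -> Iterator[None]:
    """Evaluate the enclosed block as of `when`; restore the outer date on exit."""
    token = _as_of.set(when)
    try:
        yield
    finally:
        _as_of.reset(token)


def value_now(series: Dict[date, str]) -> Optional[str]:
    """Most recent value on or before the date of the innermost as_of scope."""
    when = _as_of.get()  # raises LookupError outside any as_of scope
    eligible = [d for d in series if d <= when]
    return series[max(eligible)] if eligible else None


class Company:
    def __init__(self, name: str, industry_series: Dict[date, str]) -> None:
        self.name = name
        self.industry_series = industry_series

    @property
    def industry(self) -> Optional[str]:
        # No date argument: the temporal context comes from the enclosing scope.
        return value_now(self.industry_series)
```

A previous-year sub-query becomes a nested `with as_of(...)` block, and the outer date comes back automatically when the block exits, with no hand-written save and restore.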

Although the precise interaction of time with the data model is managed in the small at the level of the data model's structure, time itself exists in the large -- globally and outside the context of any particular part of the database. Supporting time in a database -- or any other system that models the world -- requires architectural support to parameterize the model as a whole, not just the component parts of the model.

Time is not the only property that parameterizes whole models. Any property that depends on the perspective of the user places exactly the same demands on the system. For example, in a financial system that deals with multi-currency data, Currency works the same way. In the small, the data model specifies the rules for accessing and using monetary values consistently; in the large, the perspective of the user determines how those rules get used. Other properties such as Scenario and Version play similar roles in other applications.
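The same pattern extends to a second model-wide parameter. In this hypothetical Python sketch, a monetary value is recorded in its native currency, and the user's perspective (a date and a currency, each held in a `contextvars.ContextVar`) decides how it is reported. The exchange rates are invented round numbers, and `perspective`, `Money`, and `USD_PER_UNIT` are illustrative names only.

```python
import contextvars
from contextlib import contextmanager
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterator, Optional

_as_of = contextvars.ContextVar("as_of")
_currency = contextvars.ContextVar("currency", default="USD")

# Hypothetical rates, not real data: US dollars per one unit of each currency.
USD_PER_UNIT: Dict[str, Dict[date, float]] = {
    "USD": {date(1990, 1, 1): 1.0},
    "GBP": {date(1990, 1, 1): 2.0, date(1992, 10, 1): 1.5},
}


def _latest(series: Dict[date, float], when: date) -> float:
    """Most recent rate on or before `when`."""
    return series[max(d for d in series if d <= when)]


@contextmanager
def perspective(when: Optional[date] = None,
                currency: Optional[str] = None) -> Iterator[None]:
    """Set either or both model-wide parameters for the enclosed block."""
    tokens = []
    if when is not None:
        tokens.append((_as_of, _as_of.set(when)))
    if currency is not None:
        tokens.append((_currency, _currency.set(currency)))
    try:
        yield
    finally:
        for var, token in reversed(tokens):
            var.reset(token)


@dataclass(frozen=True)
class Money:
    amount: float
    currency: str  # currency the value was recorded in

    def value(self) -> float:
        """The amount in the user's currency, at rates as of the user's date."""
        when = _as_of.get()
        usd = self.amount * _latest(USD_PER_UNIT[self.currency], when)
        return usd / _latest(USD_PER_UNIT[_currency.get()], when)
```

The rule for converting money lives in the small, inside `Money.value`; which date and which currency apply is decided in the large, by whatever `perspective` encloses the code that asks.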