External Database Access and Investment Databases

Overview

The Vision database provides access to a variety of database sources that are updated by an external vendor on a regular basis. These databases are linked to specific entities such as the IBM security object or the Nestles company object during an update process known as reconciliation. The reconciliation process addresses any adjustments that are local to the specific external database so that access to multiple databases can appear as homogeneous as possible. Cusip changes, split adjustments, and fiscal year management are all addressed in a uniform manner so that arbitrary decisions defined by the individual database vendors become irrelevant.

Each external database source supplies one or more tables of information usually related to a specific entity type such as Company or Security. Each table of information can contain multiple records for a specific entity, representing data as of different points in time. Access to this data is provided via a set of messages defined to link a specific entity instance with its corresponding data in the external source.

A specific data source may send incremental updates for each new period of time or may resend many periods of data with each update. The former approach is referred to as an Incremental or Append type of update; the latter is known as a Full Refresh or Replacement type of update. The individual records in one of these tables are often referred to as snapshots since they represent a set of data values for a specific entity as of a specific point in time.


The General Model

Each table of information supplied by an external data source is mapped to its own Vision class, which is a subclass of the DataSet class. An additional class is created for each data source to manage the links between core entity instances such as Company objects and the specific tables associated with the data source. Therefore, each data source minimally involves two classes.

Unlike the core entity classes, you almost never access the external database classes directly. Instead, you access the specific records you need via messages that link the core entity instances to the specific records with which they are associated. These links are established as part of the reconciliation process. Note that the external database classes are accessed directly during reconciliation.

Assume the XYZ database supplies one table of information that relates to company instances. This data source provides an incremental update once a month. The table supplies twenty pieces of information (e.g., item1, item2, ... item20) as well as a localId code and a date. Each record will therefore contain values for each of these data items.

To support this database, two new classes are defined. The class snapshot would refer to the table of information that is updated monthly by the vendor. The class LocalEntity would refer to the class that manages the links between Company instances and the instances of XYZ snapshot. Classes such as snapshot are often referred to as data sets and instances of these classes are referred to as data records.

Each company in the XYZ database will have an instance in the XYZ LocalEntity class. These instances are created and maintained during the reconciliation process. Initially, there will be a one-to-one correspondence between the instances of LocalEntity and the instances of snapshot if we assume that the initial update provides a single month's information. During subsequent updates, new instances will be created in the LocalEntity class as new companies are added to the XYZ database. One or more snapshot records will be associated with each LocalEntity instance, each representing a specific point in time.

To establish the connection between specific Company instances and their corresponding instances in the XYZ LocalEntity class, the fixed property XYZData is defined at the class Company. The value of this property is the actual instance in the XYZ LocalEntity class that represents the data for the specific company. The expression:

  Named Company IBM XYZData
will therefore return the instance in the XYZ LocalEntity class that represents IBM's data. This is illustrated pictorially below:
  Company                                 XYZ LocalEntity
  ================================        ==============================
  | ticker | name | ... | XYZData|        |  id       | ...            |
  |________|______|_____|________|        |____________________________|
  |  GM    |      |     |        |        |   id1     |                |
  |  IBM   |      |     |  ======|===|    |   id2     |                |
  |  XON   |      |     |        |   |    |   id3     |                |
  |  . . .                       |   |====|=> idIBM   |                |
  |______________________________|        | . . .                      |
                                          |____________________________|
The XYZData property establishes a permanent relationship between the two instances. As a result, the user is sheltered from any id changes in either the Company or the XYZ LocalEntity classes.

Because the value of this property is the actual instance in the LocalEntity class, it responds to the messages defined for that class. In this example, the expression:

  Named Company IBM XYZData id
returns the string value idIBM. Since the message showMessages lists all the messages defined for the recipient object's class, the expression:
  Named Company IBM XYZData showMessages
or
  Named Company IBM XYZData showMessagesX
displays all the messages defined for the XYZ LocalEntity class.

It is likely that the XYZ database will not supply information for every Company instance. To keep access to the database uniform, any company not having actual XYZ data will have the default instance of the XYZ LocalEntity class assigned to its XYZData property. Every company will therefore return an instance of the XYZ LocalEntity class in response to the message XYZData. The messages isDefault and isntDefault are defined for all objects to determine if the value is the default instance of the class. For example, assume that the XYZ database does not supply information for GM. The expression:

  Named Company GM XYZData isDefault
returns the value TRUE, the expression:
  Named Company IBM XYZData isDefault
returns the value FALSE, and the expression:
  Company masterList select: [ XYZData isntDefault ]
returns the list of Company instances that have real XYZ data.

It is also possible that XYZ will provide data for companies that do not have a corresponding Company instance. In this case, the information will be tracked in the XYZ LocalEntity class and not linked to a specific Company instance. If the company is created as an instance in the future, the link can be established at that time and any XYZ data history collected will be automatically accessible.

To establish the connection between specific LocalEntity instances and the corresponding instances in the snapshot class, the time series property history is defined at the class LocalEntity. The values of this property are the actual records in the snapshot class that correspond to the data for the instance as of a specific point in time. The expression:

  Named Company IBM XYZData history
returns the instance in the snapshot class that represents IBM's most recent data in the XYZ database and the expression:
  Named Company IBM XYZData :history asOf: 9211 .
or
  9211 evaluate:
    [ Named Company IBM XYZData history ]
returns the instance in the snapshot class that represents IBM's November 1992 data in the XYZ database. This is illustrated pictorially below:
  LocalEntity                           snapshot
  ==============================       ================================
  | id      | ... | history    |       | localId   | date | item1 | ...
  |---------|-----|------------|       |-----------|------|-------|-----
  | id1     |     | 9211-------|-------|--> id1    | 9211 |
  |         |     | 9212-------|-----  |    id2    | 9211 |
  |         |     | 9301-------|--  |  |    id3    | 9211 |
  | . . .   |     |            | |  |--|--> id1    | 9212 |
  |               |            | |     |    id2    | 9212 |
  |               |            | |     |    id3    | 9212 |
  |               |            | |-----|--> id1    | 9301 |
  |               |            |       |           |      |
  |_______________|____________|       |___________|______|_____________
The history property establishes a permanent relationship between the LocalEntity instance and the set of snapshot records associated with that instance over time.

Because the values in the history property are actual instances in the snapshot class, they respond to the messages defined for that class. The expression:

  Named Company IBM XYZData history item1
returns the value of item1. Since the message displayMessages lists all the messages defined for the recipient object's class, the expression:
  Named Company IBM XYZData history displayMessages
or
  Named Company IBM XYZData history displayMessagesX
displays all the messages defined for the snapshot class. To determine the date of the record retrieved, use the date message:
  Named Company IBM XYZData history date
Continuing with the example introduced above, the expressions:
  Named Company IBM XYZData :history asOf: 9211 . item1
or
  9211 evaluate:
    [ Named Company IBM XYZData history item1 ]
can be used to access the value of XYZ item as of November 1992.

Because the expression XYZData history will be used to access most of the useful data in this database, a cover method is normally defined to simplify this access. For example, the method xyz could be defined at the class Company to execute the expression XYZData history. The previous expression would then be rewritten as:

  Named Company IBM :xyz asOf: 9211 . item1
or
  9211 evaluate:
    [ Named Company IBM xyz item1 ]
The default instance for the snapshot class is stored in the history time series as of earliestPossibleDate (i.e., 1/1/1). The default entry is stored in the time series so that you do not get Selector Not Found errors if you access data from a date prior to the first real date. If this value were not stored and the first real date were November 1992, the expression:
  9206 evaluate:
    [
      Named Company IBM XYZData history
       do: [ date print ;
            item1 print ;
            item2 print ;
            item3 printNL ;
          ] ;
    ] ;
would display the errors:
  >>> Selector 'item1' Not Found <<<
  >>> Selector 'item2' Not Found <<<
  >>> Selector 'item3' Not Found <<< 
because the history message would return the value NA. All LocalEntity instances will minimally have this single default record in the time series. Therefore, if you ask for a company that does not have any data or a company that does not have data as far back as you have requested, you will not get the Selector Not Found errors.

The expression:

  Company masterList select: [ XYZData history isntDefault ] .
selects the companies that have real snapshot records.

Since the message xyz translates into XYZData history, the expression:

  Company masterList select: [ xyz isntDefault ] .
is identical to
  Company masterList select: [ XYZData history isntDefault ] .
To select those companies that have some XYZ data but do not have any data on or before December 1992, use:
  92 evaluate:
     [ Company masterList
         select: [ XYZData isntDefault && XYZData history isDefault ]
     ]
or
  Company masterList
      select: [ XYZData isntDefault && :xyz asOf: 92 . isDefault ]

Database Availability and Access

The class DataSource has been defined to track information about the external databases maintained in Vision and the data sets that they provide. A summary report listing the current data sources with their latest and earliest data dates can be displayed using the expression DataSource displayUpdateStatus.

For example, when executed on December 21, 1993, the following output may have been generated:

  Vision Database Update Status
         As Of:  12/21/1993


  Source                 Latest        Earliest

  EE                  12/ 2/1993     10/14/1993
  IbesIntl            12/16/1993      9/30/1993
  MSCI                10/31/1993      7/31/1993
  WS                  11/30/1993      9/30/1993
The message profile produces a detailed report showing data source characteristics such as its update frequency, the entity class for which data is provided, and its data sets and their characteristics. For example:
  Named DataSource IbesIntl profile
produces a report of the form:
  Data Source: IbesIntl
  Object Space:  25   Environment DB IbesIntl
  Entity Type:        Company
  Access Via:         Company IbesData
  Frequency:          7 Days
  Last Processed:     12/7/1993
  Last Update As Of:  12/2/1993
  Previous Update:    11/24/1993
  Available For:      9/30/1993 - 12/2/1993

  Data Source Includes the Following Data Sets:
  Data Set:           annEst
  Update Mode:        Append
  Access Via:         Company IbesData annEstHistory
  Duration:           12 MonthEnds
  First Available:    11/30/1993

  Data Set:           qtrEst
  Update Mode:        Append
  Access Via:         Company IbesData qtrEstHistory
  Duration:            3 MonthEnds
  First Available:    11/30/1993

  Data Set:           snapshot
  Update Mode:        Append
  Access Via:         Company IbesData history
  Duration:            7 Days
  First Available:    11/30/1993

  Data Set:           static
  Update Mode:        Replace
  Duration:          NA
  First Available:    11/30/1993
This profile displays general information about the data source at the top. The remainder of the report then profiles each data set associated with the data source. Any data set profile that includes the line Access Via can be accessed via the local entity instance.

Object Space indicates the environment in which the specialized messages designed to reconcile this data source to the core entity data are stored. Entity Type refers to the default instance of the class associated with the data source. Frequency indicates the expected frequency at which the data source will update. The Last Processed date refers to the actual date that the last update was installed and available in Vision. The Last Updated As Of and Previous Update dates refer to the as of dates associated with the most recent two updates of the data. The Available For range indicates the first and last periods of data availablility.

The Access Via expression in the top portion of the profile is the access path that can be used to navigate from a core entity to a local entity for this data source. In most cases, the message name is the data source name with the suffix Data. In the case of the data source IbesIntl the access name from Company is IbesData not IbesIntlData. This allows access to US and international Ibes data to look homogeneous from the company class. For example, to access the Ibes local entity associated with the company Nestles', you would use the expression:

  Named Company \461669 IbesData
A partial list of messages defined at the class DataSource is shown below:

MessageDescription
dataSetListlist of data sets for source
earliestUpdateDateasof date of first update
entityTypereturns underlying Entity class
lastProcessDateactual date last update was run
localEntityreturns LocalEntity class
mostRecentUpdateDateasof date of latest update
objectSpacereturns object space environment
previousUpdateDateasof date of prior update
profilereport method
updateFrequencyexpected update frequency

The message earliestUpdateDate refers to the initial update that was processed in the Vision database. Data sets which provide only a single record per update will not have any history prior to this date. Data sets which supply several periods of data with each update, on the other hand, may have history prior to this date. The mostRecentUpdateDate is the as of date for the latest update. The lastProcessDate represents the actual date that the last update was installed and available in Vision. For example, the expression:

  Named DataSource IbesIntl
     do: [ earliestUpdateDate print: 15 ;
           mostRecentUpdateDate print: 15 ;
           lastProcessDate printNL ;
         ]
might display:
  9/30/1993     12/ 2/1993     12/7/1993
which indicates that the database was last updated on Dec 7 with data as of Dec 2. The first data available was as of Sept 30.

The message updateFrequency returns the expected frequency at which the data source updates. For example, the expression:

  Named DataSource IbesIntl updateFrequency
returns the value 7 days indicating that the data should be updated once a week. Note that the frequency for the data source indicates the expected update cycle for refreshing or adding to each data set. In general, updates will occur using this schedule, but an occasional update may be skipped. The previous update date can be used to determine the current update schedule.

The message entityType returns the default instance of the class associated with this DataSource. For example, the expression:

  Named DataSource MSCI entityType
returns the default Security and the expression:
  Named DataSource WS entityType
returns the default Company. The message localEntity returns the default instance of the local entity class associated with this data source. The expression:
  Named DataSource WS localEntity instanceList
returns the list of all local entities that have been installed with the WorldScope database.

The profile report for a data source includes a profile of each data set associated with the data source. For example, the DataSource profile displayed above for IbesIntl included the section:

  Data Set:           snapshot
  Update Mode:        Append
  Access Via:         Company IbesData history
  Duration:            7 Days
  First Available:    11/30/1993
If the Update Mode is Append this data set is used to append new points to a time series defined at the local entity for the data source. If the Update Mode is Replace this data set is used for local reconciliation purposes and is not accessed via the local entity. The Access Via expression is the access path that can be used to navigate from the local entity to a specific record in this data set. The Duration is used to limit access to old data. First Available indicates the first date that this data source was updated.

Recall that time series properties are defined at each data source's local entity class to link a data set's instances to the local entity. Any data set profile that includes the line Access Via: can be accessed via the local entity instance. For example, the expression:

  Named Company \461669 IbesData history
is used to access ibes snapshot records. To access all snapshots over time, use the expression:
  Named Company \461669 IbesData :history
  do: [ #---
      ] ;
Although history is typically the name of the time series that links snapshot records, many data sources provide several data sets, each linked by its own time series property. For example, in the IbesIntl profile there are three data sets that include the Access Via line:
  Data Set:           annEst
  Access Via:         Company IbesData annEstHistory
  Data Set:           qtrEst
  Access Via:         Company IbesData qtrEstHistory
  Data Set:           snapshot
  Access Via:         Company IbesData history
This indicates that access to the IbesIntl data sets from the local entity is via the time series messages annEstHistory, qtrEstHistory, and history.

The message dataSetList at DataSource returns the list of DataSet classes that are provided by a given data source. The message profile defined at DataSet produces a report showing characteristics of the data set.

The following is a partial list of messages defined for the DataSet class:

MessageDescription
accessNamelocal entity property that links data records
classNamename of data set class
dataSourcereturns DataSource instance
durationused to limit access to old data
firstAvailableDatedate record first available in Vision
localIdid of local entity to which record is linked
profilereport method
updateModeindicates type of update for data records
updateModeIsAppendmethod (returns boolean)
updateModeIsReplacemethod (returns boolean)

The expression:

  Named DataSource MSCI dataSetList
   select: [ updateModeIsAppend ] .
       do: [ className print: 15; accessName printNL ];
lists the name of each data set in the MSCI database and the property that links the records in that data set to MSCI LocalEntity instances.


Database Coverage

LocalEntity instances for each data source are linked to Entity instances such as Company or Security. The message globalEntry defined at the LocalEntity classes returns the actual Entity instance. Cover methods such as company or security are also defined to access the Entity instance. The expression:
  Named Security \461669 MSCIData globalEntry
returns:
  Named Security \461669
and the expression:
  Named Security 461669 MSCIData globalEntry MSCIData
returns:
  Named Security \461669 MSCIData
The following are frequently used messages defined for the LocalEntity class in each data source:

MessageDescription
dataSourcereturns DataSource instance
displayFullInfomethod
displayInfomethod
globalEntryreturns Entity instance
hasBeenUpdatedreturns boolean
hasntBeenUpdatedreturns boolean
idvendor identifier
isReconciledGloballyreturns boolean
isntReconciledGloballyreturns boolean
lastUpdateDatereturns date

Although each data source is updated according to its scheduled frequency, not all entities that were previously updated will have data in the current update. The messages hasBeenUpdated and hasntBeenUpdated defined at the LocalEntity classes indicate the update status of a given instance as of its data source's latest update.

For example, to find the securities which have actual Ibes data but have not been updated by the latest update, use the following:

  Security masterList
     select: [ IbesData isntDefault &&
               IbesData hasntBeenUpdated
             ] .
Substituting MSCIData in the above expression will return the securities which do not have current MSCI data.

To display the last update date for Ibes data in securities which have not been updated by the latest Ibes update, use:

  Security masterList
    select: [ IbesData isntDefault &&
              IbesData hasntBeenUpdated
            ] .
    sortUp: [ id ] .
        do: [ IbesData lastUpdateDate print: 12;
              displayInfo;
            ];
Each data source may also provide data for entities that are not currently in the Company or Security universe. This data is still maintained at the LocalEntity class for each data source, but the local entity is not linked to a global entity (e.g., Company or Security instance). The messages isReconciledGlobally and isntReconciledGlobally have been defined to flag this state.

For example, to find the local entities in the IbesIntl database that are not globally reconciled, use:

  Named DataSource IbesIntl localEntity masterList
    select: [ isntReconciledGlobally ] .
or
  Security IbesData masterList
    select: [ isntReconciledGlobally ] .
The method displayGlobalExceptionReport defined for DataSource instances lists the local entities in the data source that are not globally reconciled. To produce this report for IbesIntl, use the expression:
  Named DataSource IbesIntl displayGlobalExceptionReport
The local IbesIntl entities that are globally reconciled can be accessed by the expression:
  Named DataSource IbesIntl localEntity masterList
    select: [ isReconciledGlobally ] .
Since globally reconciled local entities are linked to actual Company instances, the following expression returns the Company instances which have actual Ibes data:
  Named DataSource IbesIntl localEntity masterList
    select: [ isReconciledGlobally ] .
    send: [ company ] .
which is equivalent to:
  Company masterList select: [ IbesData isntDefault ] .
Besides tracking general update information at the data source level, such as frequency and update dates, individual update information is also maintained for each data set. The following lists additional messages defined for the instances of DataSet classes:

MessageDescription
daterecord data date
processDateactual date record was processed
refreshCountnumber of times record was refreshed
updateDatesource update date which provided record

These properties, as well as those described earlier in this section, are often used in various status reports and are grouped into useful formats by the following methods defined at DataSet classes:

displayCharacteristics
- displays className, updateMode and duration in a single line

displayFullInfo
- displays localId, refreshCount, firstAvailableDate, processDate, updateDate and some data values in a single line

profile
- displays the data set name, updateMode, duration, LocalEntity property it updates, and the date the data set was first made available for use in a multi-line format.

Related Topics