External Database Access and Investment Databases
Overview
The Vision database provides access to a variety of database sources that are updated by an external vendor on a regular basis. These databases are linked to specific entities such as the IBM security object or the Nestles company object during an update process known as reconciliation. The reconciliation process addresses any adjustments that are local to the specific external database so that access to multiple databases can appear as homogeneous as possible. Cusip changes, split adjustments, and fiscal year management are all addressed in a uniform manner so that arbitrary decisions defined by the individual database vendors become irrelevant.Each external database source supplies one or more tables of information usually related to a specific entity type such as Company or Security. Each table of information can contain multiple records for a specific entity, representing data as of different points in time. Access to this data is provided via a set of messages defined to link a specific entity instance with its corresponding data in the external source.
A specific data source may send incremental updates for each new period of time or may resend many periods of data with each update. The former approach is referred to as an Incremental or Append type of update; the latter is known as a Full Refresh or Replacement type of update. The individual records in one of these tables are often referred to as snapshots since they represent a set of data values for a specific entity as of a specific point in time.
The General Model
Each table of information supplied by an external data source is mapped to its own Vision class, which is a subclass of the DataSet class. An additional class is created for each data source to manage the links between core entity instances such as Company objects and the specific tables associated with the data source. Therefore, each data source minimally involves two classes.Unlike the core entity classes, you almost never access the external database classes directly. Instead, you access the specific records you need via messages that link the core entity instances to the specific records with which they are associated. These links are established as part of the reconciliation process. Note that the external database classes are accessed directly during reconciliation.
Assume the XYZ database supplies one table of information that relates to company instances. This data source provides an incremental update once a month. The table supplies twenty pieces of information (e.g., item1, item2, ... item20) as well as a localId code and a date. Each record will therefore contain values for each of these data items.
To support this database, two new classes are defined. The class snapshot would refer to the table of information that is updated monthly by the vendor. The class LocalEntity would refer to the class that manages the links between Company instances and the instances of XYZ snapshot. Classes such as snapshot are often referred to as data sets and instances of these classes are referred to as data records.
Each company in the XYZ database will have an instance in the XYZ LocalEntity class. These instances are created and maintained during the reconciliation process. Initially, there will be a one-to-one correspondence between the instances of LocalEntity and the instances of snapshot if we assume that the initial update provides a single month's information. During subsequent updates, new instances will be created in the LocalEntity class as new companies are added to the XYZ database. One or more snapshot records will be associated with each LocalEntity instance, each representing a specific point in time.
To establish the connection between specific Company instances and their corresponding instances in the XYZ LocalEntity class, the fixed property XYZData is defined at the class Company. The value of this property is the actual instance in the XYZ LocalEntity class that represents the data for the specific company. The expression:
Named Company IBM XYZDatawill therefore return the instance in the XYZ LocalEntity class that represents IBM's data. This is illustrated pictorially below:
Company XYZ LocalEntity ================================ ============================== | ticker | name | ... | XYZData| | id | ... | |________|______|_____|________| |____________________________| | GM | | | | | id1 | | | IBM | | | ======|===| | id2 | | | XON | | | | | | id3 | | | . . . | |====|=> idIBM | | |______________________________| | . . . | |____________________________|The XYZData property establishes a permanent relationship between the two instances. As a result, the user is sheltered from any id changes in either the Company or the XYZ LocalEntity classes.
Because the value of this property is the actual instance in the LocalEntity class, it responds to the messages defined for that class. In this example, the expression:
Named Company IBM XYZData idreturns the string value idIBM. Since the message showMessages lists all the messages defined for the recipient object's class, the expression:
Named Company IBM XYZData showMessagesor
Named Company IBM XYZData showMessagesXdisplays all the messages defined for the XYZ LocalEntity class.
It is likely that the XYZ database will not supply information for every Company instance. To keep access to the database uniform, any company not having actual XYZ data will have the default instance of the XYZ LocalEntity class assigned to its XYZData property. Every company will therefore return an instance of the XYZ LocalEntity class in response to the message XYZData. The messages isDefault and isntDefault are defined for all objects to determine if the value is the default instance of the class. For example, assume that the XYZ database does not supply information for GM. The expression:
Named Company GM XYZData isDefaultreturns the value TRUE, the expression:
Named Company IBM XYZData isDefaultreturns the value FALSE, and the expression:
Company masterList select: [ XYZData isntDefault ]returns the list of Company instances that have real XYZ data.
It is also possible that XYZ will provide data for companies that do not have a corresponding Company instance. In this case, the information will be tracked in the XYZ LocalEntity class and not linked to a specific Company instance. If the company is created as an instance in the future, the link can be established at that time and any XYZ data history collected will be automatically accessible.
To establish the connection between specific LocalEntity instances and the corresponding instances in the snapshot class, the time series property history is defined at the class LocalEntity. The values of this property are the actual records in the snapshot class that correspond to the data for the instance as of a specific point in time. The expression:
Named Company IBM XYZData historyreturns the instance in the snapshot class that represents IBM's most recent data in the XYZ database and the expression:
Named Company IBM XYZData :history asOf: 9211 .or
9211 evaluate: [ Named Company IBM XYZData history ]returns the instance in the snapshot class that represents IBM's November 1992 data in the XYZ database. This is illustrated pictorially below:
LocalEntity snapshot ============================== ================================ | id | ... | history | | localId | date | item1 | ... |---------|-----|------------| |-----------|------|-------|----- | id1 | | 9211-------|-------|--> id1 | 9211 | | | | 9212-------|----- | id2 | 9211 | | | | 9301-------|-- | | id3 | 9211 | | . . . | | | | |--|--> id1 | 9212 | | | | | | id2 | 9212 | | | | | | id3 | 9212 | | | | |-----|--> id1 | 9301 | | | | | | | |_______________|____________| |___________|______|_____________The history property establishes a permanent relationship between the LocalEntity instance and the set of snapshot records associated with that instance over time.
Because the values in the history property are actual instances in the snapshot class, they respond to the messages defined for that class. The expression:
Named Company IBM XYZData history item1returns the value of item1. Since the message displayMessages lists all the messages defined for the recipient object's class, the expression:
Named Company IBM XYZData history displayMessagesor
Named Company IBM XYZData history displayMessagesXdisplays all the messages defined for the snapshot class. To determine the date of the record retrieved, use the date message:
Named Company IBM XYZData history dateContinuing with the example introduced above, the expressions:
Named Company IBM XYZData :history asOf: 9211 . item1or
9211 evaluate: [ Named Company IBM XYZData history item1 ]can be used to access the value of XYZ item as of November 1992.
Because the expression XYZData history will be used to access most of the useful data in this database, a cover method is normally defined to simplify this access. For example, the method xyz could be defined at the class Company to execute the expression XYZData history. The previous expression would then be rewritten as:
Named Company IBM :xyz asOf: 9211 . item1or
9211 evaluate: [ Named Company IBM xyz item1 ]The default instance for the snapshot class is stored in the history time series as of earliestPossibleDate (i.e., 1/1/1). The default entry is stored in the time series so that you do not get Selector Not Found errors if you access data from a date prior to the first real date. If this value were not stored and the first real date were November 1992, the expression:
9206 evaluate: [ Named Company IBM XYZData history do: [ date print ; item1 print ; item2 print ; item3 printNL ; ] ; ] ;would display the errors:
>>> Selector 'item1' Not Found <<< >>> Selector 'item2' Not Found <<< >>> Selector 'item3' Not Found <<<because the history message would return the value NA. All LocalEntity instances will minimally have this single default record in the time series. Therefore, if you ask for a company that does not have any data or a company that does not have data as far back as you have requested, you will not get the Selector Not Found errors.
The expression:
Company masterList select: [ XYZData history isntDefault ] .selects the companies that have real snapshot records.
Since the message xyz translates into XYZData history, the expression:
Company masterList select: [ xyz isntDefault ] .is identical to
Company masterList select: [ XYZData history isntDefault ] .To select those companies that have some XYZ data but do not have any data on or before December 1992, use:
92 evaluate: [ Company masterList select: [ XYZData isntDefault && XYZData history isDefault ] ]or
Company masterList select: [ XYZData isntDefault && :xyz asOf: 92 . isDefault ]
Database Availability and Access
The class DataSource has been defined to track information about the external databases maintained in Vision and the data sets that they provide. A summary report listing the current data sources with their latest and earliest data dates can be displayed using the expression DataSource displayUpdateStatus.For example, when executed on December 21, 1993, the following output may have been generated:
Vision Database Update Status As Of: 12/21/1993 Source Latest Earliest EE 12/ 2/1993 10/14/1993 IbesIntl 12/16/1993 9/30/1993 MSCI 10/31/1993 7/31/1993 WS 11/30/1993 9/30/1993The message profile produces a detailed report showing data source characteristics such as its update frequency, the entity class for which data is provided, and its data sets and their characteristics. For example:
Named DataSource IbesIntl profileproduces a report of the form:
Data Source: IbesIntl Object Space: 25 Environment DB IbesIntl Entity Type: Company Access Via: Company IbesData Frequency: 7 Days Last Processed: 12/7/1993 Last Update As Of: 12/2/1993 Previous Update: 11/24/1993 Available For: 9/30/1993 - 12/2/1993 Data Source Includes the Following Data Sets: Data Set: annEst Update Mode: Append Access Via: Company IbesData annEstHistory Duration: 12 MonthEnds First Available: 11/30/1993 Data Set: qtrEst Update Mode: Append Access Via: Company IbesData qtrEstHistory Duration: 3 MonthEnds First Available: 11/30/1993 Data Set: snapshot Update Mode: Append Access Via: Company IbesData history Duration: 7 Days First Available: 11/30/1993 Data Set: static Update Mode: Replace Duration: NA First Available: 11/30/1993This profile displays general information about the data source at the top. The remainder of the report then profiles each data set associated with the data source. Any data set profile that includes the line Access Via can be accessed via the local entity instance.
Object Space indicates the environment in which the specialized messages designed to reconcile this data source to the core entity data are stored. Entity Type refers to the default instance of the class associated with the data source. Frequency indicates the expected frequency at which the data source will update. The Last Processed date refers to the actual date that the last update was installed and available in Vision. The Last Updated As Of and Previous Update dates refer to the as of dates associated with the most recent two updates of the data. The Available For range indicates the first and last periods of data availablility.
The Access Via expression in the top portion of the profile is the access path that can be used to navigate from a core entity to a local entity for this data source. In most cases, the message name is the data source name with the suffix Data. In the case of the data source IbesIntl the access name from Company is IbesData not IbesIntlData. This allows access to US and international Ibes data to look homogeneous from the company class. For example, to access the Ibes local entity associated with the company Nestles', you would use the expression:
Named Company \461669 IbesDataA partial list of messages defined at the class DataSource is shown below:
Message | Description |
dataSetList | list of data sets for source |
earliestUpdateDate | asof date of first update |
entityType | returns underlying Entity class |
lastProcessDate | actual date last update was run |
localEntity | returns LocalEntity class |
mostRecentUpdateDate | asof date of latest update |
objectSpace | returns object space environment |
previousUpdateDate | asof date of prior update |
profile | report method |
updateFrequency | expected update frequency |
The message earliestUpdateDate refers to the initial update that was processed in the Vision database. Data sets which provide only a single record per update will not have any history prior to this date. Data sets which supply several periods of data with each update, on the other hand, may have history prior to this date. The mostRecentUpdateDate is the as of date for the latest update. The lastProcessDate represents the actual date that the last update was installed and available in Vision. For example, the expression:
Named DataSource IbesIntl do: [ earliestUpdateDate print: 15 ; mostRecentUpdateDate print: 15 ; lastProcessDate printNL ; ]might display:
9/30/1993 12/ 2/1993 12/7/1993which indicates that the database was last updated on Dec 7 with data as of Dec 2. The first data available was as of Sept 30.
The message updateFrequency returns the expected frequency at which the data source updates. For example, the expression:
Named DataSource IbesIntl updateFrequencyreturns the value 7 days indicating that the data should be updated once a week. Note that the frequency for the data source indicates the expected update cycle for refreshing or adding to each data set. In general, updates will occur using this schedule, but an occasional update may be skipped. The previous update date can be used to determine the current update schedule.
The message entityType returns the default instance of the class associated with this DataSource. For example, the expression:
Named DataSource MSCI entityTypereturns the default Security and the expression:
Named DataSource WS entityTypereturns the default Company. The message localEntity returns the default instance of the local entity class associated with this data source. The expression:
Named DataSource WS localEntity instanceListreturns the list of all local entities that have been installed with the WorldScope database.
The profile report for a data source includes a profile of each data set associated with the data source. For example, the DataSource profile displayed above for IbesIntl included the section:
Data Set: snapshot Update Mode: Append Access Via: Company IbesData history Duration: 7 Days First Available: 11/30/1993If the Update Mode is Append this data set is used to append new points to a time series defined at the local entity for the data source. If the Update Mode is Replace this data set is used for local reconciliation purposes and is not accessed via the local entity. The Access Via expression is the access path that can be used to navigate from the local entity to a specific record in this data set. The Duration is used to limit access to old data. First Available indicates the first date that this data source was updated.
Recall that time series properties are defined at each data source's local entity class to link a data set's instances to the local entity. Any data set profile that includes the line Access Via: can be accessed via the local entity instance. For example, the expression:
Named Company \461669 IbesData historyis used to access ibes snapshot records. To access all snapshots over time, use the expression:
Named Company \461669 IbesData :history do: [ #--- ] ;Although history is typically the name of the time series that links snapshot records, many data sources provide several data sets, each linked by its own time series property. For example, in the IbesIntl profile there are three data sets that include the Access Via line:
Data Set: annEst Access Via: Company IbesData annEstHistory Data Set: qtrEst Access Via: Company IbesData qtrEstHistory Data Set: snapshot Access Via: Company IbesData historyThis indicates that access to the IbesIntl data sets from the local entity is via the time series messages annEstHistory, qtrEstHistory, and history.
The message dataSetList at DataSource returns the list of DataSet classes that are provided by a given data source. The message profile defined at DataSet produces a report showing characteristics of the data set.
The following is a partial list of messages defined for the DataSet class:
Message | Description |
accessName | local entity property that links data records |
className | name of data set class |
dataSource | returns DataSource instance |
duration | used to limit access to old data |
firstAvailableDate | date record first available in Vision |
localId | id of local entity to which record is linked |
profile | report method |
updateMode | indicates type of update for data records |
updateModeIsAppend | method (returns boolean) |
updateModeIsReplace | method (returns boolean) |
The expression:
Named DataSource MSCI dataSetList select: [ updateModeIsAppend ] . do: [ className print: 15; accessName printNL ];lists the name of each data set in the MSCI database and the property that links the records in that data set to MSCI LocalEntity instances.
Database Coverage
LocalEntity instances for each data source are linked to Entity instances such as Company or Security. The message globalEntry defined at the LocalEntity classes returns the actual Entity instance. Cover methods such as company or security are also defined to access the Entity instance. The expression:Named Security \461669 MSCIData globalEntryreturns:
Named Security \461669and the expression:
Named Security 461669 MSCIData globalEntry MSCIDatareturns:
Named Security \461669 MSCIDataThe following are frequently used messages defined for the LocalEntity class in each data source:
Message | Description |
dataSource | returns DataSource instance |
displayFullInfo | method |
displayInfo | method |
globalEntry | returns Entity instance |
hasBeenUpdated | returns boolean |
hasntBeenUpdated | returns boolean |
id | vendor identifier |
isReconciledGlobally | returns boolean |
isntReconciledGlobally | returns boolean |
lastUpdateDate | returns date |
Although each data source is updated according to its scheduled frequency, not all entities that were previously updated will have data in the current update. The messages hasBeenUpdated and hasntBeenUpdated defined at the LocalEntity classes indicate the update status of a given instance as of its data source's latest update.
For example, to find the securities which have actual Ibes data but have not been updated by the latest update, use the following:
Security masterList select: [ IbesData isntDefault && IbesData hasntBeenUpdated ] .Substituting MSCIData in the above expression will return the securities which do not have current MSCI data.
To display the last update date for Ibes data in securities which have not been updated by the latest Ibes update, use:
Security masterList select: [ IbesData isntDefault && IbesData hasntBeenUpdated ] . sortUp: [ id ] . do: [ IbesData lastUpdateDate print: 12; displayInfo; ];Each data source may also provide data for entities that are not currently in the Company or Security universe. This data is still maintained at the LocalEntity class for each data source, but the local entity is not linked to a global entity (e.g., Company or Security instance). The messages isReconciledGlobally and isntReconciledGlobally have been defined to flag this state.
For example, to find the local entities in the IbesIntl database that are not globally reconciled, use:
Named DataSource IbesIntl localEntity masterList select: [ isntReconciledGlobally ] .or
Security IbesData masterList select: [ isntReconciledGlobally ] .The method displayGlobalExceptionReport defined for DataSource instances lists the local entities in the data source that are not globally reconciled. To produce this report for IbesIntl, use the expression:
Named DataSource IbesIntl displayGlobalExceptionReportThe local IbesIntl entities that are globally reconciled can be accessed by the expression:
Named DataSource IbesIntl localEntity masterList select: [ isReconciledGlobally ] .Since globally reconciled local entities are linked to actual Company instances, the following expression returns the Company instances which have actual Ibes data:
Named DataSource IbesIntl localEntity masterList select: [ isReconciledGlobally ] . send: [ company ] .which is equivalent to:
Company masterList select: [ IbesData isntDefault ] .Besides tracking general update information at the data source level, such as frequency and update dates, individual update information is also maintained for each data set. The following lists additional messages defined for the instances of DataSet classes:
Message | Description |
date | record data date |
processDate | actual date record was processed |
refreshCount | number of times record was refreshed |
updateDate | source update date which provided record |
These properties, as well as those described earlier in this section, are often used in various status reports and are grouped into useful formats by the following methods defined at DataSet classes:
- displayCharacteristics
- - displays className, updateMode and duration in a single line
- displayFullInfo
- - displays localId, refreshCount, firstAvailableDate, processDate, updateDate and some data values in a single line
- profile
- - displays the data set name, updateMode, duration, LocalEntity property it updates, and the date the data set was first made available for use in a multi-line format.