Vision Class: DataFeed

DataFeed Review

The DataFeed class is an abstract class that is used to organize the classes that translate data from sources external to Vision into Vision objects. An external feed corresponds to a flat, tabular structure. Several subclasses of DataFeed have been defined to encapsulate different ways to map data from external formats into Vision objects. You can get a full list of the feeds defined in your environment using:

     DataFeed showInheritance ;

A separate subclass is defined for each format of external data you wish to load into your Vision database. For example, subclasses of the MasterFeed class are defined to create and update instances of a specific Entity, such as Currency or Security. Subclasses of the EntityExtenderFeed class are defined to update a specific relationship between an Entity subclass and associated data, such as pricing data for a security.

At its simplest, a feed is a tab-delimited or vertical-bar-delimited string containing one or more columns of information for one or more rows. For example, to create and update Currency instances, you could use:

     CurrencyMaster updateFromString: 
     "entityId | name                 | shortName
      USD      | United States Dollar | US Dollar
      CAD      | Canadian Dollar      | CA Dollar
      GBP      | Great British Pound  | GB Pound
     " ;

This example loads data using the CurrencyMaster feed. This feed will create Currency instances for any entityId not already defined and will update the name and shortName properties for the three instances included.

The message updateFromString: can be sent to any DataFeed subclass to update data from the string supplied as a parameter. Alternatively, the message loadFromFile: can be sent to any DataFeed subclass to read the data from the file name supplied as a parameter.
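To make the shape of a feed string concrete, the following sketch splits one into field names and rows. It is written in Python purely for illustration and is not Vision code; the function name parse_feed is hypothetical, and nothing here reflects how Vision parses feeds internally. It assumes the first non-blank line holds the field names and that the delimiter is a vertical bar if one appears in that line, otherwise a tab.

```python
def parse_feed(text):
    """Split a delimited feed string into a header list and a list of row dicts."""
    # Keep only lines that contain something other than whitespace.
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return [], []
    # Assumption: use a vertical bar if the header has one, otherwise tabs.
    delimiter = "|" if "|" in lines[0] else "\t"
    header = [field.strip() for field in lines[0].split(delimiter)]
    rows = []
    for line in lines[1:]:
        values = [value.strip() for value in line.split(delimiter)]
        rows.append(dict(zip(header, values)))
    return header, rows


# The same three currencies used in the CurrencyMaster example.
feed = """entityId | name                 | shortName
 USD      | United States Dollar | US Dollar
 CAD      | Canadian Dollar      | CA Dollar
 GBP      | Great British Pound  | GB Pound
"""
header, rows = parse_feed(feed)
for row in rows:
    print(row["entityId"], "->", row["name"])
```

Each row becomes a mapping from field name to value, which is roughly the information a feed instance carries for one record.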

The document Vision Class: DataFeed provides a detailed description of the DataFeed class. A number of specialized interfaces have been defined that package feeds for batch processing. This document provides additional advanced techniques for working with the feeds through the use of examples. If you need additional assistance implementing some of these techniques in your environment, contact your Insyte consultant.


Bridges and EstimateBridges

The subclasses of the Bridge class are used to manage the protocol for connecting an entity to one or more DataRecord instances. A bridge is useful when there are complex relationships that need to be managed and you do not want to overburden the entity class with all the required information. An instance of a Bridge subclass is normally referenced via the entity instance with which it is associated. The BatchFeedManager class supplies an interface for creating bridge subclasses.

Bridges and Vendor Data

External vendors often provide data about a specific entity. This data can be stored directly with the entity by establishing one or more properties that return the vendor information. When several data items and/or data records are associated with a single vendor, a naming convention is usually imposed to keep track of the different sources. This is a situation where a Bridge can be used to help organize the vendor-specific DataRecords.

For example, in investment applications, fundamental data about companies can be supplied by a number of vendors. Each vendor supplies overlapping data items for multiple frequencies. In particular, Vendor X supplies sales and earnings values annually and quarterly, and Vendor Y supplies sales and earnings values annually. The Bridge and DataRecord classes can be set up as follows:

     #--  Create the Bridges to Company
     Interface BatchFeedManager
          createBridgeClass: "VendorX" from: "Bridge" 
          linkedTo: "Company" via: "vendorX" asTS: "FALSE" ;
     Interface BatchFeedManager
          createBridgeClass: "VendorY" from: "Bridge" 
          linkedTo: "Company" via: "vendorY" asTS: "FALSE" ;

     #--  Create the DataRecord classes and attach to Bridge
     Interface BatchFeedManager
          createDataRecordClass: "VendorXAnnual" from: "DataRecord"
          linkedTo: "VendorX" via: "annual" asTS: "TRUE" ;
     Interface BatchFeedManager
          createDataRecordClass: "VendorXQuarter" from: "DataRecord"
          linkedTo: "VendorX" via: "quarter" asTS: "TRUE" ;
     Interface BatchFeedManager
          createDataRecordClass: "VendorYAnnual" from: "DataRecord"
          linkedTo: "VendorY" via: "annual" asTS: "TRUE" ;

     #--  Define properties for DataRecord classes
     PropertySetup updateFromString: 
     "classId        | property | dataType | 
      VendorXAnnual  | sales    | Double   | 
      VendorXAnnual  | eps      | Double   | 
      VendorXQuarter | sales    | Double   |   
      VendorXQuarter | eps      | Double   | 
      VendorYAnnual  | sales    | Double   | 
      VendorYAnnual  | eps      | Double   | 
      " ;

     #--  Load some data
     VendorXAnnualFeed updateFromString:
     "id | date | sales | eps
     ibm | 9512 | 1000  | 2.34
     ibm | 9612 | 2000  | 1.24
     ibm | 9712 | 3000  | 3.29
     " ;

    #--  Basic Access - latest ibm annual data from vendor X
    Named Company IBM vendorX annual
       do: [ date print: 15 ; sales print ; eps printNL ] ;

    #--  Basic Access - ibm annual data from vendor X over time
    Named Company IBM vendorX :annual
       do: [ ^date print: 15 ; sales print ; eps printNL ] ;

Bridges and Private Data

Another reason to use Bridge subclasses is to create structures that can be updated by private users. For example, suppose the Equity Research group wants to update analyst rating information for companies without requiring the intervention of the global database administrator. A separate object space can be created for this group. An ERCompany bridge class and one or more data record classes can be created in this space. Data associated with this bridge class can then be updated privately.

Assume that an object space named EquityResearch has been created for storing this private data. The Bridge and DataRecord classes can be set up as follows:

     #--  Create the Bridge to Company in the EquityResearch object space
     Interface BatchFeedManager
          setObjectSpaceTo: Environment DB EquityResearch ;
     Interface BatchFeedManager
          createBridgeClass: "ERCompany" from: "LocalEntity" 
          linkedTo: "Company" via: "erCompany" asTS: "Method" ;

     #--  Create the AnalystRating class and attach to Bridge
     Interface BatchFeedManager
          setObjectSpaceTo: Environment DB EquityResearch ;
     Interface BatchFeedManager
       createDataRecordClass: "ERAnalystRating" from: "DataRecord"
       linkedTo: "ERCompany" via: "analystRating" asTS: "TRUE" ;

     #--  Define properties for AnalystRating classes
     PropertySetup updateFromString: 
     "classId           | property | dataType
      ERAnalystRating   | rating | Integer
      ERAnalystRating   | comment | String
     " ;

     #--  Load some data  (performed by private user in private object space)
     ERAnalystRatingFeed updateFromString:
     "id | date | rating | comment
     ibm | 9901 | 3      | average
     ibm | 9902 | 2      | changed to good
     ibm | 9903 | 1      | changed to excellent
     " ;

    #--  Basic Access - latest ibm rating data
    Named Company IBM erCompany analystRating
       do: [ date print: 15 ; rating print; comment printNL ] ;

    #--  Basic Access - ibm rating data over time
    Named Company IBM erCompany :analystRating
       do: [ ^date print: 15 ; rating print ; comment printNL ] ;

Bridges, Private Data, and Memberships

Private bridges can be used in conjunction with MembershipFeeds as well. Recall that a MembershipFeed is used to update and cross reference one-to-many and many-to-many relationships between two Entity instances over time. For example, one or more companies can be members of the same industry. Over time, a company can move from one industry to another. In this case, instances of the entity class Company represent members and instances of the entity class Industry represent groups.

Since the Company and Industry classes are shared, a private user cannot update the properties that track this relationship if they are defined directly at the entity classes. To circumvent this situation, the cross-referencing properties can be defined at a bridge class that can be updated by the private users.

Assume that an object space named MyData has been created for storing this private data and the bridge classes. The membership relationship can be set up as follows:

     #--  Create bridge classes
     Interface BatchFeedManager
        setObjectSpaceTo: Environment DB MyData .
          createBridgeClass: "MyCompany" from: "LocalEntity" 
          linkedTo: "Company" via: "myCompany" asTS: "Method" ;
     
     Interface BatchFeedManager
        setObjectSpaceTo: Environment DB MyData .
          createBridgeClass: "MyIndustry" from: "LocalEntity" 
          linkedTo: "Industry" via: "myIndustry" asTS: "Method" ;
     
     #---  Explicitly define property and xref property at Bridges
     MyCompany define: 'industry' withDefault: Industry ;
     MyIndustry define: 'companyList' withDefault: IndexedList new ;
     
     #--  Make sure companyList default is stored in correct object space
     MyIndustry companyList 
     establishResidenceInSpaceOf: Environment DB MyData ;

     #--  Enable Navigation
     MessageSetup updateFromString:
     "classId | message | tvFlag | returnType | containerType | description
     MyCompany | industry | N | Industry | |
     MyIndustry | companyList | N | Company | IndexedList |
     " ;

     #---  Setup MembershipFeed
     MembershipFeedSetup updateFromString: "
     feedId              | groupId  | groupPath   | memberId | memberPath
     MyCompanyToIndustry | Industry | companyList | Company  | industry
     " ;
     
     #---  Add a "redirection" via the bridge paths
     MyCompanyToIndustry setGroupBridgePathTo: "myIndustry" ;
     MyCompanyToIndustry setMemberBridgePathTo: "myCompany" ;
     
     
     #---  Load some Sample Data
     MyCompanyToIndustry updateFromString: "
     memberId | groupId 
     ibm | 530
     hwp | 530
     msft | 540
     orcl | 540
     gm | 210
     f | 210
     c | 210
     " ;
     
     #---  Sample Access: get ibm's industry
     Named Company IBM myCompany industry displayInfo ;
     
     #---  Sample Access: get industry 530's companyList
     Named Industry \530 myIndustry companyList
        do: [ displayInfo ] ;
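
The two-way relationship loaded above (each company points to its industry, and each industry lists its companies) can be pictured with the following sketch. It is Python used only for illustration, not Vision code; the class MembershipSketch and its attribute names are hypothetical, and the time dimension that a MembershipFeed tracks is deliberately ignored.

```python
from collections import defaultdict


class MembershipSketch:
    """Keeps member -> group and group -> members in step with each other."""

    def __init__(self):
        self.group_of = {}                   # member id -> current group id
        self.members_of = defaultdict(set)   # group id -> current member ids

    def assign(self, member, group):
        # Remove the member from its previous group, if it had one.
        previous = self.group_of.get(member)
        if previous is not None:
            self.members_of[previous].discard(member)
        # Record the new relationship in both directions.
        self.group_of[member] = group
        self.members_of[group].add(member)


xref = MembershipSketch()
sample = [("ibm", "530"), ("hwp", "530"), ("msft", "540"), ("orcl", "540"),
          ("gm", "210"), ("f", "210"), ("c", "210")]
for member, group in sample:
    xref.assign(member, group)

print(xref.group_of["ibm"])
print(sorted(xref.members_of["530"]))
```

Moving a company to a different industry updates both directions in one step, which is the bookkeeping the membership feed performs on the bridge properties.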

EstimateBridges

External vendors provide data that can be estimated for multiple time periods. Subclasses of the EstimateBridge class can be used to manage the relationship between an entity and the estimate data. For example, suppose you want to create an estimate bridge for tracking consensus estimates provided by I/B/E/S. The EstimateBridge class and associated feed can be set up as follows:

     Interface BatchFeedManager
          createBridgeClass: "Ibes" from: "Bridge"
          linkedTo: "Company" via: "ibes" asTS: "FALSE" ;

     #---  Annual Estimates
     Interface BatchFeedManager
       createEstimateBridgeClass: "IbesEpsABridge" from: "EstimateBridge"
       linkedTo: "Ibes" via: "epsEstA" 
       withEstimateRecordClass: "ConsensusEstimateRecord" 
       andFreq: 12 monthEnds ;

     #---  Quarterly Estimates
     Interface BatchFeedManager
       createEstimateBridgeClass: "IbesEpsQBridge" from: "EstimateBridge"
       linkedTo: "Ibes" via: "epsEstQ" 
       withEstimateRecordClass: "ConsensusEstimateRecord" 
       andFreq: 3 monthEnds ;

     #---  Sample Data
     IbesEpsABridgeFeed updateFromString:
    "id  | date   | periodEndDate | currency | mean | median
     ibm | 990615 | 98            | USD      | 6.15 | 6.23
     ibm | 990616 | 98            | USD      | 6.16 | 6.24
     gm  | 990616 | 98            | USD      | 61.6 | 62.3
     ibm | 990616 | 99            | USD      | 8.99 | 8.32
    " ; 

    #---  Sample Access:  all annual values for ibm for 98
    Named Company IBM ibes :epsEstA asOf: 98 .
       :observation do: [ ^date print: 15 ; mean printNL ] ;

    #--  Show last observation for each epsEstA for ibm
    Named Company IBM ibes :epsEstA nonDefaults
    do: [ ^date print: 15 ; 
          lastObservation do: [ date print: 15; mean printNL ] ;
        ] ;

     #--  Show last and prior observation for each epsEstA for ibm
     Named Company IBM ibes :epsEstA nonDefaults
     do: [ ^date print: 15 ; 
           lastObservation, priorObservation
            do: [ " " print ; date print: 15; mean print ] ;
            newLine print ;
         ] ;

     #--  Show actual and last observation for each epsEstA for ibm
     Named Company IBM ibes :epsEstA nonDefaults
     do: [ ^date print: 15 ; 
           actualRecord, lastObservation
             do: [ " " print ; date print: 15; mean print ] ;
           newLine print ;
        ] ;

     #--  Get last 5 estimates made for ibm in 98
     Named Company IBM ibes :epsEstA asOf: 98 .
         :observation last: 5 .
    do: [ date print: 15 ; mean print ; median printNL ] ;

Private EstimateBridges

EstimateBridges can be created in separate object spaces to enable private updating as follows:

     Interface BatchFeedManager 
          setObjectSpaceTo: Environment DB EquityResearch ;
     Interface BatchFeedManager 
          createDataRecordClass: "ERAnalystEstimate" 
          from: "AnalystEstimateRecord"
          linkedTo: "" via: NA asTS: NA ;

     Interface BatchFeedManager 
          setObjectSpaceTo: Environment DB EquityResearch ;
     Interface BatchFeedManager
         createEstimateBridgeClass: "EpsABridge" from: "EstimateBridge" 
         linkedTo: "ERCompany" via: "epsEstA" 
         withEstimateRecordClass: "ERAnalystEstimate" andFreq: 12 monthEnds ;

     EpsABridgeFeed updateFromString:
     "id | date | periodEndDate | currency | estimate | actualFlag
      ibm | 990615 | 98 | USD | 6.15    |  N
      ibm | 990616 | 98 | USD | 6.16    |  N
      gm  | 990616 | 98 | USD | 61.6    |  N
      ibm | 990616 | 00 | USD | 99.616  | N
      " ; 

     Named Company IBM erCompany :epsEstA asOf: 98 . 
         :observation displayAll ;


     Named Company IBM erCompany :epsEstA
     do: [ ^date print: 15 ; isDefault print: 10 ; :observation count printNL ;
           :observation displayAll ;
           "=" fill: 50 . printNL ;
         ] ;


Customizing DataFeeds

The updateFromString: message defined at DataFeed is called directly, or indirectly via the loadFromFile: message or the BatchFeedManager interface. This message clears the last set of instances created for the feed, creates a new instance in the feed class for each record in the feed, and runs the reconcile message, which calls the following methods:

Method                  Function
initializeProcessing    run any special initializations for each feed instance,
                        including the creation of any missing Entity, Bridge,
                        and/or DataRecord instances appropriate for the feed;
                        assign the feed property underlyingRecord to return
                        the structure to be updated by the feed instance
runUpdate               use each feed instance to update the appropriate
                        properties in the underlyingRecord
displayExceptions       display any errors or other status information
runWrapup               execute any special post-processing steps
runLocalWrapup          execute any supplemental wrapup steps for built-in
                        feeds
runUpdateStats          update summary information about the feed such as
                        timestamp of processing (lastUpdateTime) and number
                        of instances processed (lastUpdateCount)

The displayExceptions message executes the following methods:

Method                    Function
displayExceptionSummary   display summary counts
displayBadOnes            display bad records supplied in feed
displayNewOnes            display new core class instances created by feed
displayOtherExceptions    display other information about the instances
                          processed in the current feed update

Subclasses of DataFeed redefine any or all of these messages as needed. The runLocalWrapup and displayLocalExceptions methods are the ones most frequently modified by users who wish to enhance existing feeds. Users creating their own feed subclasses can redefine any or all of these methods.
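
The calling sequence described above follows a familiar pattern: a fixed driver invokes a series of overridable steps. The sketch below models that pattern in Python for illustration only; it is not Vision code. The class names FeedSketch and CustomFeedSketch are hypothetical, each step merely records its own name, and the order of the steps inside displayExceptions is assumed to match the order in which they are listed.

```python
class FeedSketch:
    """Hypothetical stand-in for a feed class; each step just logs its name."""

    def __init__(self):
        self.log = []

    def _note(self, name):
        self.log.append(name)

    def initializeProcessing(self):
        self._note("initializeProcessing")

    def runUpdate(self):
        self._note("runUpdate")

    def displayExceptionSummary(self):
        self._note("displayExceptionSummary")

    def displayBadOnes(self):
        self._note("displayBadOnes")

    def displayNewOnes(self):
        self._note("displayNewOnes")

    def displayOtherExceptions(self):
        self._note("displayOtherExceptions")

    def displayExceptions(self):
        # Assumption: the reports run in the order they are documented.
        for step in (self.displayExceptionSummary, self.displayBadOnes,
                     self.displayNewOnes, self.displayOtherExceptions):
            step()

    def runWrapup(self):
        self._note("runWrapup")

    def runLocalWrapup(self):
        self._note("runLocalWrapup")

    def runUpdateStats(self):
        self._note("runUpdateStats")

    def reconcile(self):
        # The driver is fixed; subclasses customize the individual steps.
        for step in (self.initializeProcessing, self.runUpdate,
                     self.displayExceptions, self.runWrapup,
                     self.runLocalWrapup, self.runUpdateStats):
            step()


class CustomFeedSketch(FeedSketch):
    def runLocalWrapup(self):
        # Keep the inherited behavior, then add feed-specific work.
        super().runLocalWrapup()
        self._note("custom wrapup")


feed = CustomFeedSketch()
feed.reconcile()
print("\n".join(feed.log))
```

Overriding a single step, as CustomFeedSketch does, leaves the rest of the sequence untouched, which is why enhancing an existing feed usually means redefining only one or two of these methods.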


Enable & Disable Messages

A number of messages can be used to temporarily or permanently change the default behavior of a DataFeed. These messages must be sent prior to the load, for example:


    CurrencyMaster enableInternalIds loadFromFile: "currency.dat" ;

Depending on the message, the behavior can apply to the next load only or remain set until it is explicitly unset. Following is a list of messages that can be used in this manner and a description of their behavior and defaults:

enableCompanyChanges, disableCompanyChanges
enableCusipChanges, disableCusipChanges
enableSedolChanges, disableSedolChanges
     Description:   Allow/prohibit posting company, cusip and sedol changes;
                    changes made or skipped displayed in exception report.
     Data Feed:     SecurityMaster
     Default:       enableCompanyChanges, enableCusipChanges,
                    enableSedolChanges
     Flag Setting:  Resets to enable after each update.

enableDisplayNewOnes, disableDisplayNewOnes
     Description:   Enables/disables the new ones report from displaying in
                    the exception report.
     Data Feed:     All data feeds
     Default:       For all data feeds other than EntityExtender,
                    enableDisplayNewOnes is the default;
                    disableDisplayNewOnes is the default for
                    EntityExtenderFeeds.
     Flag Setting:  Remains set until explicitly unset.

enableEntityCreation, disableEntityCreation
     Description:   Disable message will prevent any new instances from
                    being created but will update any existing instances.
     Data Feed:     MasterFeed
     Default:       enableEntityCreation
     Flag Setting:  Resets to enable after each update.

enableInternalIds, disableInternalIds
     Description:   Enable message can be used to automatically generate a
                    permanent, unique id for any instances in the
                    underlying entity class.
     Data Feed:     MasterFeed
     Default:       disableInternalIds
     Flag Setting:  Remains set until explicitly unset.

enablePurge, disablePurge
     Description:   Enable message purges all data from following feed.
     Data Feed:     MasterFeed (entities are flagged as inactive) and
                    EntityExtenderFeed (DataRecords and points in time
                    series properties are purged)
     Default:       disablePurge
     Flag Setting:  Resets to disable after each update.

enableSplitInversion, disableSplitInversion
     Description:   Use enable message when feed file supplies inverted
                    splits rates.
     Data Feed:     SplitsFeed
     Default:       disableSplitInversion
     Flag Setting:  Remains set until explicitly unset.

enableOnlyUpdateOnChange, disableOnlyUpdateOnChange
     Description:   Allows updates to time series properties only if the
                    value has changed. This works with EntityExtenderFeeds
                    that are used to update fields that are time series
                    properties, not feeds that update time series of
                    records. This message also assumes there is one update
                    per entity per day. Note: use inconsistent with this
                    assumption can cause undesirable results.
     Data Feed:     EntityExtenderFeeds
     Default:       disableOnlyUpdateOnChange
     Flag Setting:  Resets to disable after each update.
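
The difference between a flag that resets after each update and one that remains set can be sketched as follows. This is Python for illustration only, not Vision code; the class FlagSketch and the short flag names are hypothetical, and only four of the flags described above are modeled, using the defaults and reset behavior listed for them.

```python
# Default state of each flag (True = enabled), as documented above.
DEFAULTS = {
    "EntityCreation": True,
    "Purge": False,
    "InternalIds": False,
    "SplitInversion": False,
}

# True means the flag returns to its default after every update.
RESETS_AFTER_UPDATE = {
    "EntityCreation": True,
    "Purge": True,
    "InternalIds": False,
    "SplitInversion": False,
}


class FlagSketch:
    def __init__(self):
        self.flags = dict(DEFAULTS)

    def enable(self, name):
        self.flags[name] = True
        return self

    def disable(self, name):
        self.flags[name] = False
        return self

    def update(self):
        """Pretend to run one load and return the flags it ran with."""
        used = dict(self.flags)
        # One-shot flags go back to their defaults; the others persist.
        for name, resets in RESETS_AFTER_UPDATE.items():
            if resets:
                self.flags[name] = DEFAULTS[name]
        return used


feed = FlagSketch()
feed.enable("Purge").enable("InternalIds")
first = feed.update()
second = feed.update()
print(first["Purge"], second["Purge"])
print(first["InternalIds"], second["InternalIds"])
```

In this model the purge applies to the first load only, while internal id generation stays on for every later load until it is explicitly disabled.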


Advanced Examples Using DataFeeds

  1. Using the same DataRecord subclass with more than one entity or bridge:

    Example:

      tracking analyst data (e.g., rating, score, comment) over time for companies and industries

    Constraints:

      since the name of the property that tracks the entity->datarecord relationship is stored with the datarecord, you must use the same property name for each entity/bridge that relates to the datarecord

    Approach:

    Use the data record class creation interface to create an AnalystRating class linked to both Company and Industry via the same property name (e.g., rating). This will generate two distinct feeds: the first will be named AnalystRatingFeed, and the second will be named AnalystRating followed by the second class name and Feed (here, AnalystRatingIndustryFeed):

         #-- first one: create new AnalystRating DataRecord class,
         #--    defines rating t/s property at Company, and creates
         #--    AnalystRatingFeed to update Company-AnalystRating data
         Interface BatchFeedManager
            createDataRecordClass: "AnalystRating" from: "DataRecord"
            linkedTo: "Company" via: "rating" asTS: "TRUE" ;
    
         #-- second one: does not create AnalystRating DataRecord class,
         #--    defines rating t/s property at Industry, and creates
         #--    AnalystRatingIndustryFeed to update Industry-AnalystRating data
         Interface BatchFeedManager
            createDataRecordClass: "AnalystRating" from: "DataRecord"
            linkedTo: "Industry" via: "rating" asTS: "TRUE" ;
    

  2. Create an EntityExtenderFeed subclass that can be used as a superclass of several entity extender feeds:

    Example:

      add special processing rule for analyst rating that applies to the company and industry analyst rating feeds but no other feeds

    Approach:

        #----------
        #  Create the new class with a generic feed class (AnalystRatingFeed).
        #  Defer the linkage to specific subclasses as needed.
        #----------
        Interface BatchFeedManager
          createDataRecordClass: "AnalystRating" from: "DataRecord"
          linkedTo: "Entity" via: NA asTS: NA ;
    
        #----------
        #  Predefine the separate feed classes as subclasses of AnalystRatingFeed.
        #  This provides the opportunity for the specific feeds to inherit from a
        #  single feed class that can define analyst rating specific protocol
        #----------
    
        ClassSetup updateFromString:
        "classId | parentId | description
        AnalystRatingCompanyFeed | AnalystRatingFeed | Updates company ratings
        AnalystRatingIndustryFeed | AnalystRatingFeed | Updates industry ratings
        " ;
    
        #----------
        #  Now define the actual relationships between specific  subclasses
        #    and AnalystRating; this will link the feed named 
        #       AnalystRating_____Feed with this combination of settings
        #----------
    
        Interface BatchFeedManager
          createDataRecordClass: "AnalystRating" from: "DataRecord"
          linkedTo: "Company" via: "rating" asTS: "TRUE" ;
        Interface BatchFeedManager
          createDataRecordClass: "AnalystRating" from: "DataRecord"
          linkedTo: "Industry" via: "rating" asTS: "TRUE" ;
    
        #-- The AnalystRatingFeed class is a subclass of EEFeed and a
        #-- superclass of AnalystRatingCompanyFeed and AnalystRatingIndustryFeed
        #-- so special processing protocol can be defined at AnalystRatingFeed
    
        AnalystRatingFeed showInheritance ;
    
  3. Create an EntityExtenderFeed to update properties at an Entity or Bridge without requiring a specific DataRecord:

        EntityExtenderFeedSetup updateFromString: "feedId | baseClassId
        ERMiscFeed | ERCompany
        " ;
        
        ####################
        #  Define a couple of properties at the ERCompany bridge
        ####################
        PropertySetup updateFromString: "classId | property | tsFlag | dataType 
        ERCompany | ts1 | Y | Double 
        ERCompany | fp1 | N | Integer
        " ;
        
        ####################
        #  This feed should update the property ts1 right at the bridge
        ####################
        ERMiscFeed updateFromString: "entityId | date | ts1 
        ibm | 9705 | 10
        ibm | 9706 | 11
        " ;
        
        ####################
        #  and show it
        ####################
        Named Company IBM erCompany :ts1 displayAll ;
    


Optimizing Large Feeds

Large data files require more system resources to process than small ones. The loadFromFile: message will read and process your entire data file in a single pass. For performance reasons, it may be useful or necessary to divide the processing into multiple passes. The message bulkLoadFromFile:withConfig:andBatchSize:fromBatch:to: is available to do this. The parameters are:

Parameter            Definition
bulkLoadFromFile:    file name
withConfig:          configuration file name
andBatchSize:        number of characters to process at a time
fromBatch:           first batch number
to:                  last batch number

The configuration file is read in once and preserved for each batch processed. If the configuration file contains a fieldOrderList, it applies to the entire file; otherwise, the line specified in the configuration file as the headerLineNumber is used to set the fieldOrderList. All lines up to and including the headerLineNumber are skipped during the bulk load. If no line is specified, the first non-blank, non-comment line of the feed file is assumed to contain the field names. After the header line is identified, several configuration file options are no longer meaningful and are disabled in subsequent iterations, such as skipTop, skipBottom, maxRecords, and asOfdateLineNumber. If there are no configuration file options, NA can be supplied at withConfig:.
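
The order in which the field names are determined can be summarized with the sketch below. It is Python for illustration only, not Vision code, and it rests on several assumptions that the text does not state: the configuration is modeled as a plain dictionary, headerLineNumber is treated as 1-based, comment lines are taken to start with "#", and fields are separated by a vertical bar. The function name resolve_field_names is hypothetical.

```python
def resolve_field_names(config, lines, delimiter="|"):
    """Pick the field names for a bulk load, following the documented order."""

    def split(line):
        return [name.strip() for name in line.split(delimiter)]

    # 1. An explicit fieldOrderList wins and applies to the whole file.
    if config.get("fieldOrderList"):
        return list(config["fieldOrderList"])

    # 2. Otherwise use the line named by headerLineNumber (assumed 1-based).
    number = config.get("headerLineNumber")
    if number is not None:
        return split(lines[number - 1])

    # 3. Otherwise fall back to the first non-blank, non-comment line.
    #    Assumption: a comment line starts with "#".
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return split(stripped)
    return []


lines = [
    "# price file",
    "",
    "id | date | price",
    "ibm | 9901 | 100",
]
print(resolve_field_names({}, lines))
print(resolve_field_names({"headerLineNumber": 3}, lines))
print(resolve_field_names({"fieldOrderList": ["id", "price"]}, lines))
```

Passing an empty dictionary here plays the role of supplying NA at withConfig:, so the field names come from the feed file itself.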

The batch size controls how much of the file is read in each pass. For example, if your file is 50 MB and you want to process 5 MB at a time, you would set this value to 5000000. The load is then broken into pieces of approximately 5 MB each, and the file is processed in 10 iterations. Note that each pass adjusts the number of characters read so that full records are always included. Note also that this technique should only be used on files that can support arbitrary cutoff points for batch sizes.

If the fromBatch: and to: parameters are set to NA, the entire file will be processed. If you want to only process a subset of the file, you can indicate the first and/or last batch to include. Batches are numbered from 0. The total number of batches will be a function of the total size of the file and the batch size you set.
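
The batching arithmetic can be illustrated with a short sketch. It is Python for illustration only, not Vision code; the function names batch_ranges and select_batches are hypothetical. It assumes that records are newline-terminated and that a batch is extended forward to the end of the record it cuts into, which is one plausible way to keep full records together and not necessarily what Vision does. None stands in for NA.

```python
import math


def batch_ranges(data, batch_size):
    """Return (start, end) offsets for each batch, ending on record boundaries."""
    ranges = []
    start = 0
    while start < len(data):
        end = min(start + batch_size, len(data))
        if end < len(data):
            # Assumption: extend the cut to the end of the current record.
            newline = data.find(b"\n", end - 1)
            end = len(data) if newline == -1 else newline + 1
        ranges.append((start, end))
        start = end
    return ranges


def select_batches(ranges, from_batch=None, to_batch=None):
    """Pick batches by zero-based number; None (like NA) means no limit."""
    first = 0 if from_batch is None else from_batch
    last = len(ranges) - 1 if to_batch is None else to_batch
    return ranges[first:last + 1]


# A 50,000,000-character file read 5,000,000 characters at a time.
print(math.ceil(50_000_000 / 5_000_000))

# Ten small records of six characters each, batched about 15 at a time.
data = b"".join(b"rec%02d\n" % i for i in range(10))
ranges = batch_ranges(data, 15)
for start, end in select_batches(ranges, 0, 0):
    print(data[start:end].decode())
```

Because each batch is stretched to a record boundary, the number of batches can differ slightly from the simple file size divided by batch size, which is why the batch size is described as approximate.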

For example, to load an entire price file in subsets of approximately 10 MB, use:

     PriceFeed bulkLoadFromFile: "price.dat"
        withConfig: "price.cfg" 
        andBatchSize: 10000000 fromBatch: NA to: NA ;

To load only the first 10 MB subset, use:

     PriceFeed bulkLoadFromFile: "price.dat"
        withConfig: "price.cfg" 
        andBatchSize: 10000000 fromBatch: 0 to: 0 ;