Vision Class: DataFeed
DataFeed Review
The DataFeed class is an abstract class that is used to organize the classes that translate data from sources external to Vision into Vision objects. An external feed corresponds to a flat, tabular structure. Several subclasses of DataFeed have been defined to encapsulate different ways to map data from external formats into Vision objects. You can get a full list of the feeds defined in your environment using:
DataFeed showInheritance ;
A separate subclass is defined for each format of external data you wish to load into your Vision database. For example, subclasses of the MasterFeed class are defined to create and update instances of a specific Entity, such as Currency or Security. Subclasses of the EntityExtenderFeed class are defined to update a specific relationship between an Entity subclass and associated data, such as pricing data for a security.
At its simplest, a feed is a tab- or vertical-bar-delimited string containing one or more columns of information for one or more rows. For example, to create and update Currency instances, you could use:
CurrencyMaster updateFromString:
"entityId | name | shortName
USD | United States Dollar | US Dollar
CAD | Canadian Dollar | CA Dollar
GBP | Great British Pound | GB Pound
" ;
This example loads data using the CurrencyMaster feed. This feed will create Currency instances for any entityId not already defined and will update the name and shortName properties for the three instances included.
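The create-or-update behavior of the CurrencyMaster example can be modeled in a few lines of Python. This is a conceptual sketch, not Vision code: the helpers parse_feed and update_master are hypothetical. The sketch assumes that the first non-blank line holds the field names and that fields are separated by "|" or a tab.

```python
from typing import Dict, List


def parse_feed(text: str) -> List[Dict[str, str]]:
    """Split a '|'- or tab-delimited feed into one dict per data row."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []

    def split(line: str) -> List[str]:
        # Accept either delimiter and trim the spaces around each field.
        delim = "|" if "|" in line else "\t"
        return [field.strip() for field in line.split(delim)]

    header = split(lines[0])
    return [dict(zip(header, split(line))) for line in lines[1:]]


def update_master(registry: Dict[str, Dict[str, str]], text: str) -> List[str]:
    """Create entities missing from registry, update the rest; return new ids."""
    created = []
    for row in parse_feed(text):
        entity_id = row.pop("entityId")
        if entity_id not in registry:
            registry[entity_id] = {}
            created.append(entity_id)
        registry[entity_id].update(row)
    return created


# Hypothetical starting state: USD already exists with an older name.
currencies = {"USD": {"name": "US Dollar", "shortName": "US Dollar"}}
feed = """entityId | name | shortName
USD | United States Dollar | US Dollar
CAD | Canadian Dollar | CA Dollar
GBP | Great British Pound | GB Pound
"""
new_ids = update_master(currencies, feed)
print(new_ids)
```

Only CAD and GBP are reported as new. USD is updated in place, which corresponds to the "create for any entityId not already defined" rule described above.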
The message updateFromString: can be sent to any DataFeed subclass to update data from the string supplied as a parameter. Alternatively, the message loadFromFile: can be sent to any DataFeed subclass to read the data from the file whose name is supplied as a parameter.
The document Vision Class: DataFeed provides a detailed description of the DataFeed class. A number of specialized interfaces have been defined that package feeds for batch processing. This document provides additional advanced techniques for working with the feeds through the use of examples. If you need additional assistance implementing some of these techniques in your environment, contact your Insyte consultant.
Bridges and EstimateBridges
The subclasses of the Bridge class are used to manage the protocol for connecting an entity to one or more DataRecord instances. A bridge is useful when there are complex relationships that need to be managed and you do not want to overburden the entity class with all the required information. An instance of a Bridge subclass is normally referenced via the entity instance with which it is associated. The BatchFeedManager class supplies an interface for creating bridge subclasses.
Bridges and Vendor Data
External vendors often provide data about a specific entity. This data can be stored directly with the entity by establishing one or more properties that return the vendor information. When several data items and/or data records are associated with a single vendor, a naming convention is usually imposed to keep track of the different sources. In this situation, a Bridge can help organize the vendor-specific DataRecords.
For example, in investment applications fundamental data about companies can be supplied by a number of vendors. Each vendor supplies overlapping data items for multiple frequencies. In particular, Vendor X supplies sales and earnings values annually and quarterly, and Vendor Y supplies sales and earnings values annually. The Bridge and DataRecord classes can be set up as follows:
#-- Create the Bridges to Company
Interface BatchFeedManager
   createBridgeClass: "VendorX" from: "Bridge" linkedTo: "Company" via: "vendorX" asTS: "FALSE" ;
Interface BatchFeedManager
   createBridgeClass: "VendorY" from: "Bridge" linkedTo: "Company" via: "vendorY" asTS: "FALSE" ;

#-- Create the DataRecord classes and attach to Bridge
Interface BatchFeedManager
   createDataRecordClass: "VendorXAnnual" from: "DataRecord" linkedTo: "VendorX" via: "annual" asTS: "TRUE" ;
Interface BatchFeedManager
   createDataRecordClass: "VendorXQuarter" from: "DataRecord" linkedTo: "VendorX" via: "quarter" asTS: "TRUE" ;
Interface BatchFeedManager
   createDataRecordClass: "VendorYAnnual" from: "DataRecord" linkedTo: "VendorY" via: "annual" asTS: "TRUE" ;

#-- Define properties for DataRecord classes
PropertySetup updateFromString:
"classId | property | dataType |
VendorXAnnual | sales | Double |
VendorXAnnual | eps | Double |
VendorXQuarter | sales | Double |
VendorXQuarter | eps | Double |
VendorYAnnual | sales | Double |
VendorYAnnual | eps | Double |
" ;

#-- Load some data
VendorXAnnualFeed updateFromString:
"id | date | sales | eps
ibm | 9512 | 1000 | 2.34
ibm | 9612 | 2000 | 1.24
ibm | 9712 | 3000 | 3.29
" ;

#-- Basic Access - latest ibm annual data from vendor X
Named Company IBM vendorX annual
do: [ date print: 15 ; sales print ; eps printNL ] ;

#-- Basic Access - ibm annual data from vendor X over time
Named Company IBM vendorX :annual
do: [ ^date print: 15 ; sales print ; eps printNL ] ;
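The two access patterns above differ in one respect. Sending annual returns the latest record, while :annual walks the whole time series. The following Python model illustrates that difference. It is a conceptual sketch only: the TimeSeries, VendorXBridge, and Company classes are hypothetical stand-ins, and each time series point is assumed to be keyed by an integer date such as 9712.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TimeSeries:
    """Dated values kept in date order (stand-in for a Vision t/s property)."""
    points: List[Tuple[int, dict]] = field(default_factory=list)

    def put(self, date: int, value: dict) -> None:
        dates = [d for d, _ in self.points]
        i = bisect_right(dates, date)
        if i and dates[i - 1] == date:
            self.points[i - 1] = (date, value)  # same date: overwrite
        else:
            self.points.insert(i, (date, value))

    def latest(self) -> Optional[dict]:
        """Like 'vendorX annual': the most recent record, if any."""
        return self.points[-1][1] if self.points else None

    def history(self) -> List[Tuple[int, dict]]:
        """Like 'vendorX :annual': every dated record, oldest first."""
        return list(self.points)


@dataclass
class VendorXBridge:
    annual: TimeSeries = field(default_factory=TimeSeries)
    quarter: TimeSeries = field(default_factory=TimeSeries)


@dataclass
class Company:
    name: str
    vendorX: VendorXBridge = field(default_factory=VendorXBridge)


ibm = Company("IBM")
for date, sales, eps in [(9512, 1000, 2.34), (9612, 2000, 1.24), (9712, 3000, 3.29)]:
    ibm.vendorX.annual.put(date, {"sales": sales, "eps": eps})

print(ibm.vendorX.annual.latest())
for date, record in ibm.vendorX.annual.history():
    print(date, record["sales"], record["eps"])
```

The company object holds only a reference to its bridge. All vendor-specific series live on the bridge, which is the organizational benefit described above.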
Bridges and Private Data
Another reason to use Bridge subclasses is to create structures that can be updated by private users. For example, suppose the Equity Research group wants to update analyst rating information for companies without requiring the intervention of the global database administrator. A separate object space can be created for this group. An ERCompany bridge class and one or more data record classes can be created in this space. Data associated with this bridge class can then be updated privately.
Assume that an object space named EquityResearch has been created for storing this private data. The Bridge and DataRecord classes can be set up as follows:
#-- Create the Bridge to Company in the EquityResearch object space
Interface BatchFeedManager setObjectSpaceTo: Environment DB EquityResearch ;
Interface BatchFeedManager
   createBridgeClass: "ERCompany" from: "LocalEntity" linkedTo: "Company" via: "erCompany" asTS: "Method" ;

#-- Create the ERAnalystRating class and attach to Bridge
Interface BatchFeedManager setObjectSpaceTo: Environment DB EquityResearch ;
Interface BatchFeedManager
   createDataRecordClass: "ERAnalystRating" from: "DataRecord" linkedTo: "ERCompany" via: "analystRating" asTS: "TRUE" ;

#-- Define properties for the ERAnalystRating class
PropertySetup updateFromString:
"classId | property | dataType
ERAnalystRating | rating | Integer
ERAnalystRating | comment | String
" ;

#-- Load some data (performed by private user in private object space)
ERAnalystRatingFeed updateFromString:
"id | date | rating | comment
ibm | 9901 | 3 | average
ibm | 9902 | 2 | changed to good
ibm | 9903 | 1 | changed to excellent
" ;

#-- Basic Access - latest ibm rating data
Named Company IBM erCompany analystRating
do: [ date print: 15 ; rating print ; comment printNL ] ;

#-- Basic Access - ibm rating data over time
Named Company IBM erCompany :analystRating
do: [ ^date print: 15 ; rating print ; comment printNL ] ;
Bridges, Private Data, and Memberships
Private bridges can be used in conjunction with MembershipFeeds as well. Recall that a MembershipFeed is used to update and cross-reference one-to-many and many-to-many relationships between two Entity instances over time. For example, one or more companies can be members of the same industry. Over time, a company can move from one industry to another. In this case, instances of the entity class Company represent members and instances of the entity class Industry represent groups.
Since the Company and Industry classes are shared, a private user cannot update the properties that track this relationship if they are defined directly at the entity classes. To circumvent this situation, the cross-referencing properties can be defined at a bridge class that can be updated by the private users.
Assume that an object space named MyData has been created for storing this private data and the bridge classes. The membership relationship can be set up as follows:
#-- Create bridge classes
Interface BatchFeedManager setObjectSpaceTo: Environment DB MyData .
   createBridgeClass: "MyCompany" from: "LocalEntity" linkedTo: "Company" via: "myCompany" asTS: "Method" ;
Interface BatchFeedManager setObjectSpaceTo: Environment DB MyData .
   createBridgeClass: "MyIndustry" from: "LocalEntity" linkedTo: "Industry" via: "myIndustry" asTS: "Method" ;

#--- Explicitly define property and xref property at Bridges
MyCompany define: 'industry' withDefault: Industry ;
MyIndustry define: 'companyList' withDefault: IndexedList new ;

#-- Make sure companyList default is stored in correct object space
MyIndustry companyList establishResidenceInSpaceOf: Environment DB MyData ;

#-- Enable Navigation
MessageSetup updateFromString:
"classId | message | tvFlag | returnType | containerType | description
MyCompany | industry | N | Industry | |
MyIndustry | companyList | N | Company | IndexedList |
" ;

#--- Setup MembershipFeed
MembershipFeedSetup updateFromString: "
feedId | groupId | groupPath | memberId | memberPath
MyCompanyToIndustry | Industry | companyList | Company | industry
" ;

#--- Add a "redirection" via the bridge paths
MyCompanyToIndustry setGroupBridgePathTo: "myIndustry" ;
MyCompanyToIndustry setMemberBridgePathTo: "myCompany" ;

#--- Load some Sample Data
MyCompanyToIndustry updateFromString: "
memberId | groupId
ibm | 530
hwp | 530
msft | 540
orcl | 540
gm | 210
f | 210
c | 210
" ;

#--- Sample Access: get ibm's industry
Named Company IBM myCompany industry displayInfo ;

#--- Sample Access: get industry 530's companyList
Named Industry \530 myIndustry companyList do: [ displayInfo ] ;
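The cross-referencing that a MembershipFeed performs can be pictured as keeping two maps in step: member to group (the industry property at MyCompany) and group to members (the companyList property at MyIndustry). The following Python sketch is a conceptual model, not the MembershipFeed implementation. The MembershipModel class is hypothetical, and it assumes that a member belongs to one group at a time, so reassigning a member removes it from its previous group's list.

```python
from typing import Dict, List


class MembershipModel:
    """Keeps member->group and group->members consistent (conceptual sketch)."""

    def __init__(self) -> None:
        self.group_of: Dict[str, str] = {}        # like MyCompany industry
        self.members: Dict[str, List[str]] = {}   # like MyIndustry companyList

    def assign(self, member: str, group: str) -> None:
        previous = self.group_of.get(member)
        if previous == group:
            return
        if previous is not None:
            # Assumption: moving a member drops it from the old group's list.
            self.members[previous].remove(member)
        self.group_of[member] = group
        self.members.setdefault(group, []).append(member)

    def update_from_string(self, text: str) -> None:
        lines = [line for line in text.splitlines() if line.strip()]
        header = [name.strip() for name in lines[0].split("|")]
        for line in lines[1:]:
            row = dict(zip(header, (value.strip() for value in line.split("|"))))
            self.assign(row["memberId"], row["groupId"])


model = MembershipModel()
model.update_from_string("""
memberId | groupId
ibm | 530
hwp | 530
msft | 540
orcl | 540
gm | 210
f | 210
c | 210
""")
print(model.group_of["ibm"], model.members["210"])
model.assign("ibm", "540")  # a later feed moves ibm to industry 540
print(model.members["530"], model.members["540"])
```

Because both directions are updated by the same call, a lookup from either side (company to industry, or industry to companies) stays consistent after each load.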
EstimateBridges
External vendors often supply estimates of a data item for multiple time periods. Subclasses of the EstimateBridge class can be used to manage the relationship between an entity and the estimate data. For example, suppose you want to create an estimate bridge for tracking consensus estimates provided by I/B/E/S. The EstimateBridge class and associated feed can be set up as follows:
Interface BatchFeedManager
   createBridgeClass: "Ibes" from: "Bridge" linkedTo: "Company" via: "ibes" asTS: "FALSE" ;

#--- Annual Estimates
Interface BatchFeedManager
   createEstimateBridgeClass: "IbesEpsABridge" from: "EstimateBridge"
   linkedTo: "Ibes" via: "epsEstA"
   withEstimateRecordClass: "ConsensusEstimateRecord" andFreq: 12 monthEnds ;

#--- Quarterly Estimates
Interface BatchFeedManager
   createEstimateBridgeClass: "IbesEpsQBridge" from: "EstimateBridge"
   linkedTo: "Ibes" via: "epsEstQ"
   withEstimateRecordClass: "ConsensusEstimateRecord" andFreq: 3 monthEnds ;

#--- Sample Data
IbesEpsABridgeFeed updateFromString:
"id | date | periodEndDate | currency | mean | median
ibm | 990615 | 98 | USD | 6.15 | 6.23
ibm | 990616 | 98 | USD | 6.16 | 6.24
gm | 990616 | 98 | USD | 61.6 | 62.3
ibm | 990616 | 99 | USD | 8.99 | 8.32
" ;

#--- Sample Access: all annual values for ibm for 98
Named Company IBM ibes :epsEstA asOf: 98 . :observation
do: [ ^date print: 15 ; mean printNL ] ;

#-- Show last observation for each epsEstA for ibm
Named Company IBM ibes :epsEstA nonDefaults
do: [ ^date print: 15 ;
      lastObservation do: [ date print: 15 ; mean printNL ] ;
    ] ;

#-- Show last and prior observation for each epsEstA for ibm
Named Company IBM ibes :epsEstA nonDefaults
do: [ ^date print: 15 ;
      lastObservation, priorObservation
      do: [ " " print ; date print: 15 ; mean print ] ;
      newLine print ;
    ] ;

#-- Show actual and last observation for each epsEstA for ibm
Named Company IBM ibes :epsEstA nonDefaults
do: [ ^date print: 15 ;
      actualRecord, lastObservation
      do: [ " " print ; date print: 15 ; mean print ] ;
      newLine print ;
    ] ;

#-- Get last 5 estimates made for ibm in 98
Named Company IBM ibes :epsEstA asOf: 98 . :observation last: 5 .
do: [ date print: 15 ; mean print ; median printNL ] ;
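An estimate bridge effectively indexes observations by two dates: the fiscal period being estimated (periodEndDate) and the date each estimate was made (date). The following Python sketch models that structure and the lastObservation/priorObservation idea. It is a hypothetical illustration, not the EstimateBridge implementation. The helpers load_estimates, last_observation, and prior_observation are invented for this example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Observation = Tuple[int, float, float]  # (estimate date, mean, median)


def load_estimates(text: str) -> Dict[Tuple[str, int], List[Observation]]:
    """Group feed rows by (id, periodEndDate), ordered by estimate date."""
    store: Dict[Tuple[str, int], List[Observation]] = defaultdict(list)
    lines = [line for line in text.splitlines() if line.strip()]
    header = [name.strip() for name in lines[0].split("|")]
    for line in lines[1:]:
        row = dict(zip(header, (value.strip() for value in line.split("|"))))
        key = (row["id"], int(row["periodEndDate"]))
        store[key].append((int(row["date"]), float(row["mean"]), float(row["median"])))
    for observations in store.values():
        observations.sort()
    return store


def last_observation(store, entity: str, period: int) -> Optional[Observation]:
    observations = store.get((entity, period), [])
    return observations[-1] if observations else None


def prior_observation(store, entity: str, period: int) -> Optional[Observation]:
    observations = store.get((entity, period), [])
    return observations[-2] if len(observations) >= 2 else None


feed = """id | date | periodEndDate | currency | mean | median
ibm | 990615 | 98 | USD | 6.15 | 6.23
ibm | 990616 | 98 | USD | 6.16 | 6.24
gm | 990616 | 98 | USD | 61.6 | 62.3
ibm | 990616 | 99 | USD | 8.99 | 8.32
"""
store = load_estimates(feed)
print(last_observation(store, "ibm", 98))   # (990616, 6.16, 6.24)
print(prior_observation(store, "ibm", 98))  # (990615, 6.15, 6.23)
```

Period 99 for ibm has a single estimate, so it has a last observation but no prior one.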
Private EstimateBridges
EstimateBridges can be created in separate object spaces to enable private updating. The following example assumes the EquityResearch object space and the ERCompany bridge described earlier:
Interface BatchFeedManager setObjectSpaceTo: Environment DB EquityResearch ;
Interface BatchFeedManager
   createDataRecordClass: "ERAnalystEstimate" from: "AnalystEstimateRecord"
   linkedTo: "" via: NA asTS: NA ;

Interface BatchFeedManager setObjectSpaceTo: Environment DB EquityResearch ;
Interface BatchFeedManager
   createEstimateBridgeClass: "EpsABridge" from: "EstimateBridge"
   linkedTo: "ERCompany" via: "epsEstA"
   withEstimateRecordClass: "ERAnalystEstimate" andFreq: 12 monthEnds ;

EpsABridgeFeed updateFromString:
"id | date | periodEndDate | currency | estimate | actualFlag
ibm | 990615 | 98 | USD | 6.15 | N
ibm | 990616 | 98 | USD | 6.16 | N
gm | 990616 | 98 | USD | 61.6 | N
ibm | 990616 | 00 | USD | 99.616 | N
" ;

Named Company IBM erCompany :epsEstA asOf: 98 . :observation displayAll ;

Named Company IBM erCompany :epsEstA
do: [ ^date print: 15 ;
      isDefault print: 10 ;
      :observation count printNL ;
      :observation displayAll ;
      "=" fill: 50 . printNL ;
    ] ;
Customizing DataFeeds
The updateFromString: message defined at DataFeed is called either directly or indirectly, via the loadFromFile: message or the BatchFeedManager interface. This message clears the last set of instances created for the feed, creates a new instance of the feed class for each record in the feed, and sends the reconcile message, which calls the following methods:
Method | Function |
---|---|
initializeProcessing | run any special initializations for each feed instance, including the creation of any missing Entity, Bridge, and/or DataRecord instances appropriate for the feed; assign the feed property underlyingRecord to return the structure to be updated by the feed instance |
runUpdate | use each feed instance to update the appropriate properties in the underlyingRecord |
displayExceptions | display any errors or other status information |
runWrapup | execute any special post-processing steps |
runLocalWrapup | execute any supplemental wrapup steps for built-in feeds |
runUpdateStats | update summary information about the feed such as timestamp of processing (lastUpdateTime) and number of instances processed (lastUpdateCount). |
The displayExceptions message executes the following methods:
Method | Function |
---|---|
displayExceptionSummary | display summary counts |
displayBadOnes | display bad records supplied in feed |
displayNewOnes | display new core class instances created by feed |
displayOtherExceptions | display other information about the instances processed in the current feed update |
Subclasses of DataFeed redefine any or all of these messages as needed. The runLocalWrapup and displayLocalExceptions methods are the ones most frequently modified by users who wish to enhance existing feeds. Any or all of these methods can be redefined by users creating their own feed subclasses.
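The reconcile sequence is a template-method design. The base class fixes the order of the steps, and a subclass overrides only the hooks it needs, such as runLocalWrapup. The following Python sketch models that pattern. It is not Vision code, and it assumes that the steps run in the order the tables list them. The class names DataFeedModel and RatingFeedModel are hypothetical, and the snake_case methods correspond to the Vision method names recorded in the log.

```python
from typing import List


class DataFeedModel:
    """Python stand-in for a DataFeed's reconcile sequence (not Vision code)."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def reconcile(self) -> List[str]:
        # Assumption: the steps run in the order the table lists them.
        self.initialize_processing()
        self.run_update()
        self.display_exceptions()
        self.run_wrapup()
        self.run_local_wrapup()
        self.run_update_stats()
        return self.log

    def initialize_processing(self) -> None:
        self.log.append("initializeProcessing")

    def run_update(self) -> None:
        self.log.append("runUpdate")

    def display_exceptions(self) -> None:
        # displayExceptions delegates to the four reporting steps.
        self.log.append("displayExceptionSummary")
        self.log.append("displayBadOnes")
        self.log.append("displayNewOnes")
        self.log.append("displayOtherExceptions")

    def run_wrapup(self) -> None:
        self.log.append("runWrapup")

    def run_local_wrapup(self) -> None:
        self.log.append("runLocalWrapup")

    def run_update_stats(self) -> None:
        self.log.append("runUpdateStats")


class RatingFeedModel(DataFeedModel):
    """A user feed that extends only the local wrapup hook."""

    def run_local_wrapup(self) -> None:
        super().run_local_wrapup()
        self.log.append("hypothetical extra wrapup step")


print(RatingFeedModel().reconcile())
```

The subclass inherits every other step unchanged. This is why redefining a single hook such as runLocalWrapup is usually enough to enhance an existing feed.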
Enable & Disable Messages
A number of messages can be used to temporarily or permanently change the default behavior of a DataFeed. These messages must be sent before the load, for example:
CurrencyMaster enableInternalIds loadFromFile: "currency.dat" ;
Depending on the message, the behavior applies either to the next load only or remains set until it is explicitly unset. The following table lists the messages that can be used in this manner, with their behavior and defaults:
Messages | Description | Data Feed | Default | Flag Setting |
---|---|---|---|---|
enableCompanyChanges, disableCompanyChanges, enableCusipChanges, disableCusipChanges, enableSedolChanges, disableSedolChanges | Allow/prohibit posting company, CUSIP, and SEDOL changes; changes made or skipped are displayed in the exception report. | SecurityMaster | enableCompanyChanges, enableCusipChanges, enableSedolChanges | Resets to enable after each update. |
enableDisplayNewOnes, disableDisplayNewOnes | Enables/disables display of the new ones report in the exception report. | All data feeds | enableDisplayNewOnes for all data feeds other than EntityExtenderFeeds; disableDisplayNewOnes for EntityExtenderFeeds | Remains set until explicitly unset. |
enableEntityCreation, disableEntityCreation | The disable message prevents any new instances from being created but still updates existing instances. | MasterFeed | enableEntityCreation | Resets to enable after each update. |
enableInternalIds, disableInternalIds | The enable message automatically generates a permanent, unique id for any instances in the underlying entity class. | MasterFeed | disableInternalIds | Remains set until explicitly unset. |
enablePurge, disablePurge | The enable message purges all data supplied in the following feed. | MasterFeed (entities are flagged as inactive) and EntityExtenderFeed (DataRecords and points in time series properties are purged) | disablePurge | Resets to disable after each update. |
enableSplitInversion, disableSplitInversion | Use the enable message when the feed file supplies inverted split rates. | SplitsFeed | disableSplitInversion | Remains set until explicitly unset. |
enableOnlyUpdateOnChange, disableOnlyUpdateOnChange | Allows updates to time series properties only if the value has changed. This works with EntityExtenderFeeds that update fields that are time series properties, not with feeds that update time series of records. It also assumes there is one update per entity per day; use that is inconsistent with this assumption can cause undesirable results. | EntityExtenderFeeds | disableOnlyUpdateOnChange | Resets to disable after each update. |
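The table mixes two kinds of flags. Some reset to their default after every update (one-shot), and others stay set until explicitly changed (persistent). The following Python sketch models both kinds, using the MasterFeed rows as defaults. It is a hypothetical illustration, not Vision code. The MasterFeedFlags class, its flag names, and finish_update are invented for this example. Each setter returns the object itself, which mirrors how a message such as enableInternalIds is followed directly by loadFromFile:.

```python
from typing import Dict, Tuple


class MasterFeedFlags:
    """Hypothetical model of a feed's enable/disable flags (not Vision code)."""

    # flag name -> (default value, resets to default after each update?)
    # Defaults follow the MasterFeed rows of the table above.
    SPEC: Dict[str, Tuple[bool, bool]] = {
        "EntityCreation": (True, True),
        "InternalIds": (False, False),
        "Purge": (False, True),
        "DisplayNewOnes": (True, False),
    }

    def __init__(self) -> None:
        self.values: Dict[str, bool] = {
            name: default for name, (default, _) in self.SPEC.items()
        }

    def enable(self, name: str) -> "MasterFeedFlags":
        self.values[name] = True
        return self  # returning self allows chained calls

    def disable(self, name: str) -> "MasterFeedFlags":
        self.values[name] = False
        return self

    def finish_update(self) -> None:
        """Called after each load: one-shot flags revert, persistent ones stay."""
        for name, (default, resets) in self.SPEC.items():
            if resets:
                self.values[name] = default


flags = MasterFeedFlags()
flags.enable("InternalIds").disable("EntityCreation").enable("Purge")
flags.finish_update()
print(flags.values)
```

After one load, InternalIds is still enabled, while EntityCreation and Purge have returned to their defaults.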
Advanced Examples Using DataFeeds
- Using the same DataRecord subclass with more than one entity or bridge:
Example:
- tracking analyst data (e.g., rating, score, comment) over time for
companies and industries
Constraints:
- since the name of the property that tracks the entity->DataRecord
relationship is stored with the DataRecord, you must use the same property
name for each entity/bridge that relates to the DataRecord
Approach:
Use the data record class creation interface to create a single AnalystRating class linked to both Company and Industry via the same property name (e.g., rating). This generates two distinct feeds: the first is named AnalystRatingFeed and the second is named AnalystRating<SecondClassName>Feed (here, AnalystRatingIndustryFeed):
#-- first one: create new AnalystRating DataRecord class,
#-- defines rating t/s property at Company, and creates
#-- AnalystRatingFeed to update Company-AnalystRating data
Interface BatchFeedManager
   createDataRecordClass: "AnalystRating" from: "DataRecord" linkedTo: "Company" via: "rating" asTS: "TRUE" ;

#-- second one: does not create AnalystRating DataRecord class,
#-- defines rating t/s property at Industry, and creates
#-- AnalystRatingIndustryFeed to update Industry-AnalystRating data
Interface BatchFeedManager
   createDataRecordClass: "AnalystRating" from: "DataRecord" linkedTo: "Industry" via: "rating" asTS: "TRUE" ;
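The feed names in this example follow a pattern: the first link receives the generic name, and later links append the entity class name. The following Python function captures that pattern. It is a hypothetical rule inferred from the examples in this document, not a documented BatchFeedManager algorithm.

```python
from typing import Set


def next_feed_name(record_class: str, entity_class: str, existing: Set[str]) -> str:
    """Hypothetical naming rule inferred from the examples in this document.

    The first link gets <Record>Feed; once that name is taken,
    later links get <Record><Entity>Feed.
    """
    generic = f"{record_class}Feed"
    if generic not in existing:
        return generic
    return f"{record_class}{entity_class}Feed"


feeds: Set[str] = set()
for entity in ("Company", "Industry"):
    feeds.add(next_feed_name("AnalystRating", entity, feeds))
print(sorted(feeds))
```

The same rule is consistent with the example that follows. There, the generic link to Entity claims AnalystRatingFeed first, so the Company link maps to AnalystRatingCompanyFeed.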
- Create EntityExtenderFeed subclass that can be used as a superclass of
several entity extender feeds:
Example:
- add a special processing rule for analyst ratings that applies to the
company and industry analyst rating feeds but to no other feeds
Approach:
#----------
# Create the new class with a generic feed class (AnalystRatingFeed).
# Defer the linkage to specific subclasses as needed.
#----------
Interface BatchFeedManager
   createDataRecordClass: "AnalystRating" from: "DataRecord" linkedTo: "Entity" via: NA asTS: NA ;

#----------
# Predefine the separate feed classes as subclasses of AnalystRatingFeed.
# This provides the opportunity for the specific feeds to inherit from a
# single feed class that can define analyst rating specific protocol
#----------
ClassSetup updateFromString:
"classId | parentId | description
AnalystRatingCompanyFeed | AnalystRatingFeed | Updates company ratings
AnalystRatingIndustryFeed | AnalystRatingFeed | Updates industry ratings
" ;

#----------
# Now define the actual relationships between specific subclasses
# and AnalystRating; this will link the feed named
# AnalystRating_____Feed with this combination of settings
#----------
Interface BatchFeedManager
   createDataRecordClass: "AnalystRating" from: "DataRecord" linkedTo: "Company" via: "rating" asTS: "TRUE" ;
Interface BatchFeedManager
   createDataRecordClass: "AnalystRating" from: "DataRecord" linkedTo: "Industry" via: "rating" asTS: "TRUE" ;

#-- The AnalystRatingFeed class is a subclass of EEFeed and a
#-- superclass of AnalystRatingCompanyFeed and AnalystRatingIndustryFeed
#-- so special processing protocol can be defined at AnalystRatingFeed
AnalystRatingFeed showInheritance ;
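The value of the intermediate AnalystRatingFeed class is that a rule defined once there reaches both rating feeds but no other entity extender feed. The following Python class hierarchy illustrates that effect. It is a conceptual sketch, not Vision code. The class names ending in Model and the rating range check are hypothetical, chosen only to demonstrate inheritance.

```python
class EntityExtenderFeedModel:
    """Stand-in for EntityExtenderFeed: accepts every row by default."""

    def accept(self, row: dict) -> bool:
        return True


class AnalystRatingFeedModel(EntityExtenderFeedModel):
    """Shared home for rating-specific rules."""

    def accept(self, row: dict) -> bool:
        # Hypothetical rule: ratings must fall between 1 and 5.
        return super().accept(row) and 1 <= int(row["rating"]) <= 5


class AnalystRatingCompanyFeedModel(AnalystRatingFeedModel):
    pass


class AnalystRatingIndustryFeedModel(AnalystRatingFeedModel):
    pass


class OtherExtenderFeedModel(EntityExtenderFeedModel):
    """A hypothetical unrelated feed that should not pick up the rating rule."""


row = {"id": "ibm", "date": "9901", "rating": "7"}
for feed in (AnalystRatingCompanyFeedModel(), AnalystRatingIndustryFeedModel(),
             OtherExtenderFeedModel()):
    print(type(feed).__name__, feed.accept(row))
print([cls.__name__ for cls in AnalystRatingIndustryFeedModel.__mro__])
```

The printed class chain plays the same role as showInheritance: it shows where a shared rule can be defined so that exactly the intended feeds inherit it.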
- Create an EntityExtenderFeed to update properties at an Entity or Bridge
without requiring a specific DataRecord:
EntityExtenderFeedSetup updateFromString:
"feedId | baseClassId
ERMiscFeed | ERCompany
" ;

####################
# Define a couple of properties at the ERCompany bridge
####################
PropertySetup updateFromString:
"classId | property | tsFlag | dataType
ERCompany | ts1 | Y | Double
ERCompany | fp1 | N | Integer
" ;

####################
# This feed should update the property ts1 right at the bridge
####################
ERMiscFeed updateFromString:
"entityId | date | ts1
ibm | 9705 | 10
ibm | 9706 | 11
" ;

####################
# Display the result
####################
Named Company IBM erCompany :ts1 displayAll ;
Optimizing Large Feeds
Large data files require more system resources to process than small ones. The loadFromFile: message reads and processes your entire data file in a single pass. For performance reasons, it may be useful or necessary to divide the processing into multiple passes. The message bulkLoadFromFile:withConfig:andBatchSize:fromBatch:to: is available for this purpose. The parameters are:
Parameter | Definition |
---|---|
bulkLoadFromFile: | file name |
withConfig: | configuration file name |
andBatchSize: | number of characters to process at a time |
fromBatch: | first batch number |
to: | last batch number |
The configuration file is read once and preserved for each batch processed. If the configuration file contains a fieldOrderList, it applies to the entire file; otherwise, the line specified in the configuration file as the headerLineNumber is used to set the fieldOrderList. All lines up to and including the headerLineNumber are skipped during the bulk load. If no header line is specified, the first non-blank, non-comment line of the feed file is assumed to contain the field names. After the header line is identified, several configuration file options that are no longer meaningful, such as skipTop, skipBottom, maxRecords, and asOfdateLineNumber, are disabled in subsequent iterations. If there are no configuration file options, NA can be supplied for withConfig:.
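The header rules above can be expressed as a small decision procedure. The following Python sketch is a hypothetical model, not the bulk loader itself. The resolve_header and split_fields helpers are invented for this example. The sketch also makes three assumptions that the text does not settle: headerLineNumber counts from 1, comment lines begin with "#", and a fieldOrderList supplied without a headerLineNumber causes no lines to be skipped.

```python
from typing import List, Optional, Tuple


def split_fields(line: str) -> List[str]:
    """Split one feed line on '|' (or a tab) and trim each field."""
    delim = "|" if "|" in line else "\t"
    return [field.strip() for field in line.split(delim)]


def resolve_header(
    lines: List[str],
    field_order: Optional[List[str]] = None,
    header_line_number: Optional[int] = None,
) -> Tuple[List[str], int]:
    """Return (field names, index of the first line to load)."""
    if header_line_number is not None:
        # Assumption: headerLineNumber counts from 1. Every line up to and
        # including it is skipped, and a fieldOrderList still takes priority.
        names = field_order or split_fields(lines[header_line_number - 1])
        return names, header_line_number
    if field_order is not None:
        # Assumption: with only a fieldOrderList, no lines are skipped.
        return field_order, 0
    for index, line in enumerate(lines):
        text = line.strip()
        # Assumption: comment lines begin with '#'.
        if text and not text.startswith("#"):
            return split_fields(line), index + 1
    return [], len(lines)


feed_lines = ["# price feed", "", "id | date | price", "ibm | 9901 | 10.5"]
print(resolve_header(feed_lines))
```

In the bulk loader, this resolution happens once. Every later batch reuses the same field names, which is why header-related options are disabled after the first iteration.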
The batch size controls how much of the file is read in each pass. For example, if your file is 50 MB and you want to process 5 MB at a time, you would set this value to 5000000. The load is then broken into pieces of approximately 5 MB, and the file is processed in 10 iterations. Note that each pass adjusts the number of characters read so that full records are always included. Note also that this technique should only be used on files that can support arbitrary cutoff points for batch sizes.
If the fromBatch: and to: parameters are set to NA, the entire file will be processed. If you want to only process a subset of the file, you can indicate the first and/or last batch to include. Batches are numbered from 0. The total number of batches will be a function of the total size of the file and the batch size you set.
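The batching arithmetic and the "full records only" adjustment can be sketched in Python. The following is a hypothetical model of the behavior described above, not the bulkLoadFromFile: implementation. It assumes one record per line, so each cut point is pushed forward to the next newline, and it uses None in place of NA for fromBatch: and to:.

```python
import math
from typing import List, Optional


def split_batches(data: str, batch_size: int) -> List[str]:
    """Cut data into pieces of about batch_size characters.

    Each cut is pushed forward to the next newline so that every piece holds
    whole records (assumption: one record per line).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches: List[str] = []
    start = 0
    while start < len(data):
        end = min(start + batch_size, len(data))
        if end < len(data):
            newline = data.find("\n", end - 1)
            end = len(data) if newline == -1 else newline + 1
        batches.append(data[start:end])
        start = end
    return batches


def select_batches(batches: List[str], from_batch: Optional[int] = None,
                   to: Optional[int] = None) -> List[str]:
    """Mimic fromBatch:/to: (None plays the role of NA); batches count from 0."""
    first = 0 if from_batch is None else from_batch
    last = len(batches) - 1 if to is None else to
    return batches[first:last + 1]


# Ten 8-character records ("rec00|0\n" ... "rec09|9\n"), 80 characters in all.
data = "".join(f"rec{i:02d}|{i}\n" for i in range(10))
batches = split_batches(data, 20)
print(len(batches), math.ceil(len(data) / 20))
```

Here the 80 characters split into 4 batches of whole records. This matches the rough estimate of ceil(80 / 20). Because cut points move forward to record boundaries, the actual count can differ from that estimate for other inputs.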
For example, to load an entire price file in subsets of approximately 10 MB, use:
PriceFeed bulkLoadFromFile: "price.dat" withConfig: "price.cfg" andBatchSize: 10000000 fromBatch: NA to: NA ;
To load only the first subset of approximately 10 MB (batch 0), use:
PriceFeed bulkLoadFromFile: "price.dat" withConfig: "price.cfg" andBatchSize: 10000000 fromBatch: 0 to: 0 ;