Case Study 5: Advanced Classification Techniques
Reminder! To run these examples, you should first start a new session and then load the sample database using:

    "/localvision/samples/general/sample.load" asFileContents evaluate ;

and load testList using:

    !testList <- Company masterList rankDown: [ sales ] . select: [ rank <= 20 ] ;

Any other files referenced can be read from the /localvision/samples/general/ directory.

Note: The sample.load file runs by default on a Unix environment. If you are using a Windows NT platform, this location may be prefixed by a drive and optional path (e.g. d:/visiondb/localvision/samples/general/sample.load). Check with your Vision Administrator for further details.
Overview
The goal of the examples in this section is to illustrate some techniques that support complex classification requirements. In many of the other examples, you have seen basic techniques for grouping companies into industries and/or sectors. In this example, two additional requirements are considered:
- Support for a hierarchical grouping scheme with varying numbers of parents.
- Support for many-to-many relationships.
Sample Region Data
The examples in this section use the class Region. A region can reference a large area such as "Northeast" or can refer to a more specific area such as "New York City". Some regions can aggregate into larger regions. For example, "New York City" is part of the region "Tri-State Area" which is part of the "Northeast" region.
The Classification class is used to manage entity classes whose instances are primarily used for grouping purposes and which may aggregate into other instances of the same class. Classes such as Industry and Region are normally created as subclasses of Classification. The following Vision code creates the Region class and some sample instances:

    #-- Create Region subclass
    Classification createSubclass: "Region" ;

    #-- Create Region instances and link to parent region
    Region createInstance: "Northeast" .
       setNameTo: "North East Region" ;
    Region createInstance: "TriState" .
       setNameTo: "NY, NJ, Conn Tri-State Region" .
       setParentTo: Named Region Northeast ;
    Region createInstance: "NYC" .
       setNameTo: "New York City" .
       setParentTo: Named Region TriState ;
    Region createInstance: "ConnS" .
       setNameTo: "Southern Connecticut" .
       setParentTo: Named Region TriState ;
    Region createInstance: "NJN" .
       setNameTo: "Northern New Jersey" .
       setParentTo: Named Region TriState ;
    Region createInstance: "NewEngland" .
       setNameTo: "New England" .
       setParentTo: Named Region Northeast ;
    Region createInstance: "ConnN" .
       setNameTo: "Northern Connecticut" .
       setParentTo: Named Region NewEngland ;
    Region createInstance: "Mass" .
       setNameTo: "Massachusetts" .
       setParentTo: Named Region NewEngland ;
    Region createInstance: "NYS" .
       setNameTo: "New York State ex NYC" .
       setParentTo: Named Region Northeast ;
    Region createInstance: "MidAtlantic" .
       setNameTo: "Middle Atlantic" .
       setParentTo: Named Region Northeast ;
    Region createInstance: "Other" .
       setNameTo: "Rest of the World" ;

The first step is to create the new class, Region, using the createSubclass: message. Because Region is created as a subclass of Classification (which is itself a subclass of Entity), the Named Region dictionary is automatically created for storing references to individual region instances. The property parent is defined at Classification and stores a reference to another instance of the same class, the one that represents the parent. The message setParentTo: allows you to set up these relationships.
You can reference a specific region using:

    Named Region NYC name printNL ;

You can reference a specific region's parent region using:

    Named Region NYC parent name printNL ;

You can reference a specific region's parent's parent using:

    Named Region NYC parent parent name printNL ;

The message displayHierarchy displays the region's full parent hierarchy:

    Named Region NYC displayHierarchy ;

When new instances are created, the parent property is initialized to return the new instance. The isParent message returns TRUE if the recipient is its own parent instance.
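As a quick check of this behavior, the following sketch sends isParent to a top-level region and to a nested one. It uses only messages introduced above and assumes the sample regions were created exactly as shown. From the description, the first expression should display TRUE and the second FALSE.

```vision
#-- Northeast was never given a parent, so it is its own parent
Named Region Northeast isParent printNL ;

#-- NYC was assigned TriState as its parent
Named Region NYC isParent printNL ;
```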
Using the Hierarchy
The following Vision code can be used to assign a primary region to each company in the sample database:

    #-- Create property to store the region
    Company defineFixedProperty: 'primaryRegion'
       withDefault: Named Region Other ;

    #-- Assign Region values
    Named Company send: [ AET, CI, TIC ] .
       do: [ :primaryRegion <- ^global Named Region ConnN ] ;
    Named Company send: [ T, XON ] .
       do: [ :primaryRegion <- ^global Named Region NJN ] ;
    Named Company send: [ CMB, CCI ] .
       do: [ :primaryRegion <- ^global Named Region NYC ] ;
    Named Company send: [ DEC ] .
       do: [ :primaryRegion <- ^global Named Region Mass ] ;
    Named Company send: [ EK ] .
       do: [ :primaryRegion <- ^global Named Region NYS ] ;
    Named Company send: [ ARC, BLS, DOW, DD, MO ] .
       do: [ :primaryRegion <- ^global Named Region MidAtlantic ] ;
    Named Company send: [ C, F, GM ] .
       do: [ :primaryRegion <- ^global Named Region Northeast ] ;
    Named Company send: [ GTE, GE ] .
       do: [ :primaryRegion <- ^global Named Region NewEngland ] ;
    Named Company send: [ IBM, NYN, PEP ] .
       do: [ :primaryRegion <- ^global Named Region TriState ] ;

The first step is to define a new property at the Company class which will be used to store the value of a company's primary region. Since the withDefault: parameter is supplied for this new property, current company instances will have this property initialized with the region Other and new company instances created with the createInstance: message will have their primaryRegion property initialized to use this Region.
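One way to see the combined effect of the default and the explicit assignments is to count the companies in each primary region. The sketch below is only an illustration built from messages used elsewhere in this case study (groupedBy:, groupList, count); any company not named in the assignments above would be expected to appear under Other.

```vision
#-- Count the companies assigned to each primary region
Company masterList groupedBy: [ primaryRegion ] .
   do: [ code print: 15 ;
         groupList count printNL ;
       ] ;
```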
Specific companies are then assigned to regions. Note that some companies are assigned to very specific regions such as NJN (e.g., XON in northern New Jersey) while other companies are assigned to fairly broad regions such as Northeast. The displayHierarchy message will display the number of levels appropriate for the region. For example:

    Named Company XON primaryRegion displayHierarchy

includes three levels (northern New Jersey, tri-state area, northeast) while:

    Named Company DOW primaryRegion displayHierarchy

includes two levels (mid-atlantic, northeast) and:

    Named Company GM primaryRegion displayHierarchy

includes one level (northeast). To display the region for each company:

    Company masterList
    do: [ code print: 10 ;
          primaryRegion code printNL
        ] ;

To display the region for each company, grouping the companies by parent region, use:

    Company masterList groupedBy: [ primaryRegion parent ] .
    do: [ "Parent: " print ; name printNL ;
          groupList
          do: [ code print: 10 ;
                name print: 30 ;
                primaryRegion printNL ;
              ];
          newLine print ;
        ] ;

It is often useful to be able to return the most aggregate parent in a classification hierarchy. You can define a method to return the "major" region using:

    Region defineMethod: [ | major |
       isParent
          ifTrue: [ asSelf ]
          ifFalse: [ parent major ]
    ] ;

This method checks to see if an instance is its own parent. If it is, you have reached the top of the hierarchy and the value of the instance is returned; otherwise, the major message is sent to the current instance's parent object. For example:

    Company masterList groupedBy: [ primaryRegion major ] .
    do: [ "Major: " print ; code print: 10 ; name printNL ;
          groupList
          do: [ code print: 10 ;
                name print: 30 ;
                primaryRegion printNL ;
              ];
          newLine print ;
        ] ;

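To follow the recursion for a single company, the major message can also be sent directly. This is a small sketch that assumes the major method and the region assignments shown above are in place; with that sample data, both expressions should resolve to the Northeast region.

```vision
#-- XON is assigned to NJN, whose parent chain is TriState, then Northeast
Named Company XON primaryRegion major name printNL ;

#-- GM is assigned directly to Northeast, which is its own parent
Named Company GM primaryRegion major name printNL ;
```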
Warning!! This method recursively calls the major message until the parent object is the same as the current recipient. This implies that an instance in the region hierarchy must have its value of parent defined to return itself. Otherwise your method will not terminate.
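Before relying on major, it may help to list the regions that act as stopping points for the recursion. The sketch below assumes that Region responds to masterList in the same way Company does, which is not confirmed in this section. It only shows the regions that are their own parent and does not detect a cycle among regions that point at each other.

```vision
#-- List the regions that are their own parent
Region masterList select: [ isParent ] .
   do: [ code print: 15 ;
         name printNL ;
       ] ;
```

With the sample data above, Northeast and Other would be the expected entries.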
Multiple Regions
The primaryRegion property allows you to associate a specific region instance with a company instance. The following Vision code can be used to assign multiple regions to each company in the sample database:

    #-- Create property to store regions
    Company defineFixedProperty: 'regions' ;

    #-- Assign an empty indexed list as the value for this property
    #-- for each company; this indexed list enables easy access
    #-- to the set of regions or specific regions associated with the
    #-- company
    Company instanceList
       do: [ :regions <- ^global IndexedList clusterNew ] ;

The first step is to define a new property at the Company class which will be used to store an indexed list of regions in which the company does business. This property is initialized to be a new instance of the IndexedList class. The clusterNew version of the new message is used to keep all the new indexed list objects in the same physical storage cluster. More information about clustering and other initialization issues is available.
Initially, this regions message returns an empty list for all companies. For example:

    Named Company IBM regions count printNL ;

displays the value 0. To add the value of the primaryRegion as an entry in the regions list, use:

    #-- Add the primaryRegion as an initial region for all
    Company instanceList
       do: [ regions at: primaryRegion put: primaryRegion ] ;

The at:put: message is used to add the object supplied as the second parameter to the recipient indexed list and store a direct index to this object using the first parameter. Each company will now have one region in its regions list. To access all the regions for a company use:

    Named Company IBM regions
       do: [ displayInfo ] ;

To test if a company is part of a specific region use:

    Named Company IBM regions at: Named Region TriState

If the region is in IBM's region list, the region object will be returned; otherwise, the value NA is returned. To find all the companies that do business in the Mass region use:

    Company masterList
       select: [ regions at: ^global Named Region Mass . isntNA ] .
       do: [ displayInfo ] ;

To include additional regions for some companies use:

    #-- Add other regions for some companies
    Named Company IBM regions
       at: Named Region Mass put: Named Region Mass ;
    Named Company IBM regions
       at: Named Region NJN put: Named Region NJN ;
    Named Company CI regions
       at: Named Region NJN put: Named Region NJN ;

Now the expression:

    Named Company IBM regions count printNL ;

displays the value 3 and the expression:

    Named Company IBM regions
       do: [ displayInfo ] ;

displays the regions TriState, NJN, and Mass. Rerun the query to find the companies that do business in the Mass region:

    Company masterList
       select: [ regions at: ^global Named Region Mass . isntNA ] .
       do: [ displayInfo ] ;

This list now includes IBM.
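The same membership test can be applied to a single company by sending isntNA to the result of at:. This is a sketch that assumes the additions above have been run; given the behavior described for at:, the first expression should display FALSE and the second TRUE.

```vision
#-- NYC was never added to IBM's regions list, so the lookup returns NA
Named Company IBM regions at: Named Region NYC . isntNA printNL ;

#-- Mass was added above, so the lookup returns the region object
Named Company IBM regions at: Named Region Mass . isntNA printNL ;
```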
You can define a property at the Region class that tracks all companies that do business in that region:

    #-- Define the property
    Region defineFixedProperty: 'companyList' ;

    #-- Update the lists
    Company masterList mgroupedBy: [ regions ] .
       do: [ :companyList <- groupList ] ;

The mgroupedBy: message provides an efficient way to generate groups from multi-valued properties.

To display the number of companies in the Mass region use:

    Named Region Mass companyList count

To display the companies in this region, use:

    Named Region Mass companyList
       do: [ displayInfo ] ;

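Because companyList is an ordinary property that holds the groups computed when the update was run, it is not expected to change on its own when a company's regions list changes later. The sketch below illustrates one way to refresh it by rerunning the same update. The added GE entry is a hypothetical change used only for illustration, and a region that no longer appears in any company's regions list would keep its earlier value under this approach.

```vision
#-- Hypothetical change: record that GE also does business in Mass
Named Company GE regions
   at: Named Region Mass put: Named Region Mass ;

#-- companyList still holds the earlier groups, so rebuild it
Company masterList mgroupedBy: [ regions ] .
   do: [ :companyList <- groupList ] ;

#-- The count for Mass now reflects the added entry
Named Region Mass companyList count printNL ;
```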