Case Study 5: Advanced Classification Techniques


Reminder!
To run these examples, you should first start a new session and then load the sample database using:
  "/localvision/samples/general/sample.load" asFileContents evaluate ;
and load testList using:
    !testList <- Company masterList 
        rankDown: [ sales ] . 
        select: [ rank <= 20 ] ;
    
Any other files referenced can be read from the /localvision/samples/general/ directory.

Note: The sample.load path shown above assumes a Unix environment. If you are using a Windows NT platform, the path may be prefixed by a drive and an optional directory (e.g., d:/visiondb/localvision/samples/general/sample.load). Check with your Vision Administrator for further details.



Overview

The goal of the examples in this section is to illustrate some techniques that support complex classification requirements. In many of the other examples, you have seen basic techniques for grouping companies into industries and/or sectors. In this example, two additional requirements are considered:

  1. Support for a hierarchical grouping scheme with varying numbers of parent levels.
  2. Support for many-to-many relationships.


Sample Region Data

The examples in this section use the class Region. A region can reference a large area such as "Northeast" or can refer to a more specific area such as "New York City". Some regions can aggregate into larger regions. For example, "New York City" is part of the region "Tri-State Area" which is part of the "Northeast" region.

The Classification class is used to manage entity classes whose instances are primarily used for grouping purposes and which may aggregate into other instances of the same class. Classes such as Industry and Region are normally created as subclasses of Classification. The following Vision code creates the Region class and some sample instances:

     #--  Create Region subclass
     Classification createSubclass: "Region" ;

     #--  Create Region instances and link to parent region
     Region createInstance: "Northeast" .
         setNameTo: "North East Region" ;
     Region createInstance: "TriState" .
         setNameTo: "NY, NJ, Conn Tri-State Region" .
         setParentTo: Named Region Northeast ;
     Region createInstance: "NYC" .
         setNameTo: "New York City" .
         setParentTo: Named Region TriState ;
     Region createInstance: "ConnS" .
         setNameTo: "Southern Connecticut" .
         setParentTo: Named Region TriState ;
     Region createInstance: "NJN" .
         setNameTo: "Northern New Jersey" .
         setParentTo: Named Region TriState ;
     Region createInstance: "NewEngland" .
         setNameTo: "New England" .
         setParentTo: Named Region Northeast ;
     Region createInstance: "ConnN" .
         setNameTo: "Northern Connecticut" .
         setParentTo: Named Region NewEngland ;
     Region createInstance: "Mass" .
         setNameTo: "Massachusetts" .
         setParentTo: Named Region NewEngland ;
     Region createInstance: "NYS" .
         setNameTo: "New York State ex NYC" .
         setParentTo: Named Region Northeast ;
     Region createInstance: "MidAtlantic" .
         setNameTo: "Middle Atlantic" .
         setParentTo: Named Region Northeast ;
     Region createInstance: "Other" .
         setNameTo: "Rest of the World" ;

The first step is to create the new class, Region, using the createSubclass: message. Because Region is created as a subclass of Classification (which is itself a subclass of Entity), the Named Region dictionary is automatically created for storing references to individual region instances. The property parent is defined at Classification and stores a reference to another instance of the same class that represents the parent instance. The setParentTo: message sets up these relationships.

You can reference a specific region using:

     Named Region NYC name printNL ;
You can reference a specific region's parent region using:
     Named Region NYC parent name printNL ;
You can reference a specific region's parent's parent using:
     Named Region NYC parent parent name printNL ;
The message displayHierarchy displays the region's full parent hierarchy:
     Named Region NYC displayHierarchy ;

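When new instances are created, the parent property is initialized to return the new instance, so each region is its own parent until setParentTo: assigns a different one. Regions such as Northeast and Other, which are never given a parent, therefore remain their own parents. The isParent message returns TRUE if the recipient is its own parent instance.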
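To make these parent conventions concrete, here is a minimal Python model. It is an illustration only, not Vision code: the `Region` class, the `named` dictionary, and the `create_instance`, `set_parent`, `is_parent`, and `hierarchy` functions are hypothetical stand-ins for createInstance:, setParentTo:, isParent, and displayHierarchy. As described above, each new instance starts out as its own parent.

```python
class Region:
    """Hypothetical Python stand-in for a Classification instance."""

    def __init__(self, code, name):
        self.code = code
        self.name = name
        self.parent = self  # a new instance is initially its own parent

    def set_parent(self, parent):
        self.parent = parent

    def is_parent(self):
        # True when the instance is its own parent (top of its hierarchy)
        return self.parent is self

    def hierarchy(self):
        """Return region codes from this region up to its top-level ancestor."""
        chain = [self.code]
        node = self
        while not node.is_parent():
            node = node.parent
            chain.append(node.code)
        return chain


# A dictionary keyed by code plays the role of the Named Region dictionary
named = {}


def create_instance(code, name, parent_code=None):
    region = Region(code, name)
    if parent_code is not None:
        region.set_parent(named[parent_code])
    named[code] = region
    return region


create_instance("Northeast", "North East Region")
create_instance("TriState", "NY, NJ, Conn Tri-State Region", "Northeast")
create_instance("NYC", "New York City", "TriState")
create_instance("MidAtlantic", "Middle Atlantic", "Northeast")

print(named["NYC"].hierarchy())          # ['NYC', 'TriState', 'Northeast']
print(named["MidAtlantic"].hierarchy())  # ['MidAtlantic', 'Northeast']
print(named["Northeast"].is_parent())    # True
```

The depth of each chain depends only on how many setParentTo: links were made, which is why a hierarchy can have a different number of levels for different regions.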


Using the Hierarchy

The following Vision code can be used to assign a primary region to each company in the sample database:

     #--  Create property to store the region
     Company defineFixedProperty: 'primaryRegion' 
                     withDefault: Named Region Other ;

     #--  Assign Region values
     Named Company send: [ AET, CI, TIC ] .
        do: [ :primaryRegion <- ^global Named Region ConnN ] ;
     Named Company send: [ T, XON ] .
        do: [ :primaryRegion <- ^global Named Region NJN ] ;
     Named Company send: [ CMB, CCI ] .
        do: [ :primaryRegion <- ^global Named Region NYC ] ;
     Named Company send: [ DEC ] .
        do: [ :primaryRegion <- ^global Named Region Mass ] ;
     Named Company send: [ EK ] .
        do: [ :primaryRegion <- ^global Named Region NYS ] ;
     Named Company send: [ ARC, BLS, DOW, DD, MO ] .
        do: [ :primaryRegion <- ^global Named Region MidAtlantic ] ;
     Named Company send: [ C, F, GM ] .
        do: [ :primaryRegion <- ^global Named Region Northeast ] ;
     Named Company send: [ GTE, GE ] .
        do: [ :primaryRegion <- ^global Named Region NewEngland ] ;
     Named Company send: [ IBM, NYN, PEP ] .
        do: [ :primaryRegion <- ^global Named Region TriState ] ;

The first step is to define a new property at the Company class to store each company's primary region. Because the withDefault: parameter is supplied, existing company instances have this property initialized to the region Other, and new company instances created with the createInstance: message also start with Other as their primaryRegion.
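One way to picture a property with a default is a lookup that falls back to the default whenever no explicit value has been stored. The Python below is an illustrative sketch, not a description of how Vision stores properties; the `PropertyWithDefault` class is hypothetical, company codes are plain strings, and "NEWCO" is a made-up code for a company that was never assigned a region.

```python
class PropertyWithDefault:
    """Hypothetical model of a property defined with a default value."""

    def __init__(self, default):
        self.default = default
        self.values = {}  # only explicitly assigned values are stored

    def get(self, instance):
        # Any instance without an explicit value, old or new, sees the default
        return self.values.get(instance, self.default)

    def set(self, instance, value):
        self.values[instance] = value


primary_region = PropertyWithDefault("Other")

# Assign regions to groups of companies, mirroring the send:/do: blocks
for code in ["AET", "CI", "TIC"]:
    primary_region.set(code, "ConnN")
for code in ["T", "XON"]:
    primary_region.set(code, "NJN")

print(primary_region.get("XON"))    # NJN
print(primary_region.get("NEWCO"))  # Other (hypothetical, never assigned)
```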

Specific companies are then assigned to regions. Note that some companies are assigned to very specific regions such as NJN (e.g., XON in northern New Jersey) while other companies are assigned to fairly broad regions such as Northeast. The displayHierarchy message will display the number of levels appropriate for the region. For example:

     Named Company XON primaryRegion displayHierarchy
includes three levels (northern New Jersey, tri-state area, northeast) while:
     Named Company DOW primaryRegion displayHierarchy
includes two levels (mid-atlantic, northeast) and:
     Named Company GM primaryRegion displayHierarchy
includes one level (northeast). To display the region for each company:
     Company masterList
     do: [ code print: 10 ; primaryRegion code printNL ] ;
To display the region for each company, with the companies grouped by parent region, use:
     Company masterList groupedBy: [ primaryRegion parent ] .
     do: [ "Parent: " print ; name printNL ;
           groupList
           do: [ code print: 10 ; name print: 30 ; 
                 primaryRegion printNL ;
               ];
           newLine print ;
         ] ;
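The grouping step can be sketched in Python to show what groupedBy: produces: one list of companies per distinct value of the grouping expression. This is an analogy, not Vision's implementation; the `parent` and `primary_region` tables below are a hypothetical subset of the sample data, and `group_by` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical data: region code -> parent region code (tops map to themselves)
parent = {
    "Northeast": "Northeast", "TriState": "Northeast", "NJN": "TriState",
    "NYC": "TriState", "MidAtlantic": "Northeast", "Other": "Other",
}
# Hypothetical subset of company -> primaryRegion assignments
primary_region = {"XON": "NJN", "T": "NJN", "CMB": "NYC",
                  "DOW": "MidAtlantic", "GM": "Northeast", "IBM": "TriState"}


def group_by(items, key):
    """Collect items into lists keyed by key(item), similar to groupedBy:."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)


by_parent = group_by(primary_region, lambda c: parent[primary_region[c]])
for parent_code, companies in sorted(by_parent.items()):
    print("Parent:", parent_code)
    for company in companies:
        print(f"  {company:<10}{primary_region[company]}")
```

Note that GM, assigned directly to the top-level Northeast region, lands in the same group as IBM and DOW, whose regions sit one level below Northeast, because Northeast is its own parent.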
It is often useful to be able to return the most aggregate parent in a classification hierarchy. You can define a method to return the "major" region using:
     Region defineMethod: [ | major |
          isParent ifTrue: [ asSelf ] ifFalse: [ parent major ] 
     ] ;
This method checks whether an instance is its own parent. If it is, you have reached the top of the hierarchy and the instance itself is returned; otherwise, the major message is sent to the instance's parent. For example:
     Company masterList groupedBy: [ primaryRegion major ] .
     do: [ "Major: " print ; code print: 10 ; name printNL ;
           groupList
           do: [ code print: 10 ; name print: 30 ; 
                 primaryRegion printNL ;
               ];
           newLine print ;
         ] ;
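The recursion in the major method can be mirrored in a few lines of Python. This is a sketch for illustration, not Vision code; the `parent` table is a hypothetical subset of the sample regions and `major` is a hypothetical function name.

```python
# Hypothetical parent table; top-level regions are their own parents
parent = {
    "Northeast": "Northeast", "TriState": "Northeast", "NYC": "TriState",
    "NewEngland": "Northeast", "Mass": "NewEngland", "Other": "Other",
}


def major(region):
    """Recursive analogue of the Region major method."""
    if parent[region] == region:  # isParent: top of the hierarchy
        return region
    return major(parent[region])  # otherwise ask the parent


print(major("NYC"))    # Northeast
print(major("Mass"))   # Northeast
print(major("Other"))  # Other
```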


Warning!!
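This method calls the major message recursively until it reaches an instance whose parent is the instance itself. This implies that every chain of parent links in the region hierarchy must end at an instance whose parent returns itself (such as Northeast or Other in the sample data). If a chain never reaches such an instance, for example because two regions are set as each other's parents, your method will not terminate.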
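The Vision method above has no guard against such a cycle. If you want a defensive variant, one option, sketched here in Python rather than Vision, is to walk up the parent links iteratively, remember which regions have been visited, and fail loudly when one repeats. The `major_checked` function and the mis-configured regions "A" and "B" are hypothetical.

```python
def major_checked(region, parent):
    """Walk up parent links iteratively, raising if a cycle is found."""
    seen = set()
    while parent[region] != region:
        if region in seen:
            raise ValueError(f"parent cycle detected at {region!r}")
        seen.add(region)
        region = parent[region]
    return region


good = {"Northeast": "Northeast", "TriState": "Northeast", "NYC": "TriState"}
bad = {"A": "B", "B": "A"}  # hypothetical pair set as each other's parents

print(major_checked("NYC", good))  # Northeast
try:
    major_checked("A", bad)
except ValueError as err:
    print(err)  # parent cycle detected at 'A'
```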


Multiple Regions

The primaryRegion property associates a single region instance with each company instance. The following Vision code can be used to assign multiple regions to each company in the sample database:

     #--  Create property to store regions
     Company defineFixedProperty: 'regions'  ;

     #--  Assign an empty indexed list as the value for this property
     #--  for each company; this indexed list will enable easy access
     #--  to the set of regions or specific regions associated with the
     #--  company
     Company instanceList
       do: [ :regions <- ^global IndexedList clusterNew ] ;

The first step is to define a new property at the Company class to store an indexed list of the regions in which the company does business. Each company's value for this property is then initialized to a new instance of the IndexedList class. The clusterNew version of the new message is used to keep all the new indexed list objects in the same physical storage cluster. More information about clustering and other initialization issues is available elsewhere in the Vision documentation.
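An indexed list keyed by region behaves much like a per-company dictionary from key to stored object. The Python sketch below models the regions property that way and walks through the steps that follow (seeding with primaryRegion, adding extra regions, and testing membership). It is an analogy, not Vision's IndexedList implementation; the data is a hypothetical subset of the sample, `at` is a hypothetical helper, and `None` stands in for NA.

```python
# Each company gets its own empty mapping, like the empty IndexedList
companies = ["IBM", "CI", "DEC"]
primary_region = {"IBM": "TriState", "CI": "ConnN", "DEC": "Mass"}
regions = {code: {} for code in companies}

# Like at:put: with the region as both index and value
for code in companies:
    regions[code][primary_region[code]] = primary_region[code]

# Add extra regions for some companies
regions["IBM"]["Mass"] = "Mass"
regions["IBM"]["NJN"] = "NJN"
regions["CI"]["NJN"] = "NJN"


def at(index_list, key):
    """Like at:, return the stored object, or None (standing in for NA)."""
    return index_list.get(key)


print(len(regions["IBM"]))             # 3
print(at(regions["IBM"], "TriState"))  # TriState
print(at(regions["CI"], "Mass"))       # None

# Like select: [ regions at: Mass . isntNA ]
print([c for c in companies if at(regions[c], "Mass") is not None])  # ['IBM', 'DEC']
```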

Initially, the regions message returns an empty list for all companies. For example:

     Named Company IBM regions count printNL ;
displays the value 0. To add the value of the primaryRegion as an entry in the regions list, use:
     #--  Add the primaryRegion as an initial region for all companies
     Company instanceList
        do: [ regions at: primaryRegion put: primaryRegion ] ;
The at:put: message is used to add the object supplied as the second parameter to the recipient indexed list and store a direct index to this object using the first parameter. Each company will now have one region in its regions list. To access all the regions for a company use:
     Named Company IBM regions
     do: [ displayInfo ] ;
To test if a company is part of a specific region use:
     Named Company IBM regions at: Named Region TriState
If the region is in IBM's region list, the region object will be returned; otherwise, the value NA is returned. To find all the companies that do business in the Mass region use:
     Company masterList 
         select: [ regions at: ^global Named Region Mass . isntNA ] .
     do: [ displayInfo ] ;
To include additional regions for some companies use:
     #--  Add other regions for some companies
     Named Company IBM regions
          at: Named Region Mass put: Named Region Mass ;
     Named Company IBM regions
          at: Named Region NJN put: Named Region NJN ;
     Named Company CI regions
          at: Named Region NJN put: Named Region NJN ;
Now the expression:
     Named Company IBM regions count printNL ;
displays the value 3 and the expression:
     Named Company IBM regions
     do: [ displayInfo ] ;
displays the regions TriState, NJN, and Mass. Rerun the query to find the companies that do business in the Mass region:
     Company masterList 
         select: [ regions at: ^global Named Region Mass . isntNA ] .
     do: [ displayInfo ] ;
This list now includes IBM.

You can define a property at the Region class that tracks all companies that do business in that region:

     #--  Define the property
     Region defineFixedProperty: 'companyList' ;

     #--  Update the lists
     Company masterList mgroupedBy: [ regions ] .
     do: [ :companyList <- groupList ] ;
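The mgroupedBy: message provides an efficient way to generate groups from multi-valued properties: each company is placed in the group for every region in its regions list. Regions that no company references (ConnS in the sample data) do not appear as groups, so their companyList value is not set by this update.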
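The effect of mgroupedBy: on a many-to-many relationship is to invert it: from "company to regions" to "region to companies". Here is a minimal Python sketch of that inversion; it is not Vision code, the `regions` table is a hypothetical subset of the sample assignments, and `multi_group_by` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical company -> regions (a many-to-many relationship)
regions = {
    "IBM": ["TriState", "Mass", "NJN"],
    "CI": ["ConnN", "NJN"],
    "DEC": ["Mass"],
    "XON": ["NJN"],
}


def multi_group_by(items, keys_of):
    """Place each item in the group of every key it yields (like mgroupedBy:)."""
    groups = defaultdict(list)
    for item in items:
        for key in keys_of(item):
            groups[key].append(item)
    return dict(groups)


company_list = multi_group_by(regions, lambda code: regions[code])
print(company_list["Mass"])        # ['IBM', 'DEC']
print(company_list["NJN"])         # ['IBM', 'CI', 'XON']
print(company_list.get("ConnS"))   # None: no company lists ConnS
```

As in the Vision version, a region only gets a group if at least one company references it, so the inverted table must be rebuilt whenever company region lists change.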

To display the number of companies in the Mass region use:

     Named Region Mass companyList count
To display the companies in this region, use:
     Named Region Mass companyList
     do: [ displayInfo ] ;

