Vision Class: Collection

Overview

The Collection class is an abstract class used to organize the classes in the hierarchy whose instances represent sets of objects. The major subclasses of Collection are List, IndexedList, and TimeSeries.

Instances of the class List represent collections of objects that are accessed either by position or as a set. Instances of the class IndexedList represent collections of objects that are accessed by a user-defined index or as a set. Instances of the class TimeSeries represent collections of objects that are accessed by date or as a set. The elements in a collection do not need to be from the same class.

The Collection class is an indirect subclass of Object:

  Object
     |
     Function
        |
        EnumeratedFunction
           |
           Collection
              |
              IndexedCollection
              |   |-- IndexedList
              |   |-- TimeSeries
              |
              SequencedCollection
                  |-- List

The Collection classes have been optimized to organize and query large sets of data. Many of the messages defined for these classes are written in Vision and can therefore be modified and extended as needed. As always, you can define any number of new messages for these classes.


Collection Basics

Many of the frequently used List, IndexedList, and TimeSeries messages are defined at the class Collection. Each subclass defines additional messages that are unique to that class. You can send the toList message to an instance of any of the Collection subclasses to return an instance of the List class. You cannot directly convert instances of one of these subclasses to a TimeSeries or an IndexedList object.

All Collection objects respond to the message count. For example:

  Currency instanceList count print ;
prints the number of instances in the Currency class.

The do: message has been redefined at the Collection class. It performs the same function as the version defined at Object; however, instead of applying to a single object, the version defined for the Collection class applies the block supplied as the parameter to each element in the recipient object. For example:

  Currency instanceList
  do: [ code print: 10 ; name printNL ] ;
displays the value of code and name for each instance in the Currency class. The supplied block is evaluated for each element in the collection. Messages within the block are sent to the individual elements in the collection.

The send: message has also been redefined at the Collection class. It works like the do: message except for the value returned: the block is evaluated for each object in the recipient collection, and the result of each evaluation, rather than the original object, is returned. Because a block returns the value of the last statement executed, send: returns that value for each object in the recipient collection. For example:

  !names <- Currency instanceList
      send: [ name ] ;
  names count printNL ;
In this example, the currency name is returned for each instance in the Currency class. The variable names represents a collection where each element in this collection is the name of a currency. If the do: message had been used instead of send:, the variable names would represent the list of all currency instances. The collection returned by the send: message has the same number of elements in the same order as the original collection. If the original collection is a List or an IndexedList, the send: message returns a List. If the original collection is a TimeSeries, the send: message returns a TimeSeries containing the same dates as the original TimeSeries. The do: message always returns the original collection.
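The return-value difference between send: and do: can be modeled in Python as a rough analogy. This is a sketch, not Vision's implementation: the functions `send` and `do`, the `Currency` dataclass, and its sample data are hypothetical, and a Python list stands in for a Vision List.

```python
from dataclasses import dataclass
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


@dataclass
class Currency:
    # Hypothetical stand-in for a Vision Currency instance.
    code: str
    name: str


def send(collection: List[T], block: Callable[[T], R]) -> List[R]:
    # Models send: one block result per element, same count, same order.
    return [block(element) for element in collection]


def do(collection: List[T], block: Callable[[T], object]) -> List[T]:
    # Models do: evaluate the block for its effects, return the recipient.
    for element in collection:
        block(element)
    return collection


currencies = [Currency("USD", "US Dollar"), Currency("JPY", "Japanese Yen")]
names = send(currencies, lambda c: c.name)  # ["US Dollar", "Japanese Yen"]
same = do(currencies, lambda c: c.name)     # the original list object
```

The sequential loop in `do` is a simplification. As described under When to Iterate, Vision does not guarantee the order or number of block evaluations.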

The extendBy: message has been redefined at the Collection class. It performs the same function as the version defined at Object. Instead of applying to a single object, the version defined for the Collection class applies the block supplied as the parameter to each element in the recipient object, returning a collection where each element of the collection is extended by the variables defined within the block. For example:

  !xlist <- Currency instanceList
     extendBy: [ !nameLength <- name count ;
                 !name10 <- name take: 10 ;
               ] ;
defines xlist to represent a list of all currency instances that respond to the new variables nameLength and name10 in addition to the currency messages. For example:
  xlist 
  do: [ name print: 30 ;   
        nameLength printNL ;
      ] ;
All objects in xlist respond to the nameLength message just like any other message. All objects in the list continue to respond to the messages already defined such as name.

Within the square brackets, you can define as many variables as you would like. Each variable should be introduced with the symbol ! followed by a variable name. If the original collection is a List or an IndexedList, the extendBy: message returns a List. If the original collection is a TimeSeries, the extendBy: message returns a TimeSeries containing the same dates as the original TimeSeries.

The collect: message returns an extension of the original collection in which each element defines a variable named value. This variable is set to the result of evaluating the block supplied as the parameter to collect:.
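To make the delegation behavior of extendBy: and collect: concrete, here is a Python analogy. The `Extended` class, the `extend_by` and `collect` functions, and the `Currency` class are all hypothetical. Python's `__getattr__` hook stands in for the way an extended element continues to respond to the original element's messages.

```python
from typing import Any, Callable, Dict, List


class Extended:
    """Hypothetical model of an element returned by extendBy:.

    New variables are looked up first; any other attribute is delegated
    to the original element, so existing messages such as name still work.
    """

    def __init__(self, base: Any, extra: Dict[str, Any]) -> None:
        self._base = base
        self._extra = extra

    def __getattr__(self, attr: str) -> Any:
        # Called only when normal attribute lookup fails.
        extra = self.__dict__["_extra"]
        if attr in extra:
            return extra[attr]
        return getattr(self.__dict__["_base"], attr)


def extend_by(collection: List[Any], block: Callable[[Any], Dict[str, Any]]) -> List[Extended]:
    # The block returns the new variables for one element as a dict.
    return [Extended(element, block(element)) for element in collection]


def collect(collection: List[Any], block: Callable[[Any], Any]) -> List[Extended]:
    # collect: is modeled as an extension with a single variable, value.
    return extend_by(collection, lambda element: {"value": block(element)})


class Currency:
    # Hypothetical stand-in for a Vision Currency instance.
    def __init__(self, name: str) -> None:
        self.name = name


xlist = extend_by(
    [Currency("Australian Dollar")],
    lambda c: {"nameLength": len(c.name), "name10": c.name[:10]},
)
```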

The messages do:, send:, and extendBy: have been redefined for the Collection classes to operate on the elements of the collection. The messages basicDo:, basicSend:, and basicExtend: are available if you want to run any of these operations on the collection as a whole. For example:

  Currency instanceList
  basicDo: [ whatAmI print: 30 ; count printNL ] ;
sends the messages inside the block to the collection object itself, not to the individual elements.

The message numberElements can be used to extend each element in a collection by the variable position which corresponds to its position number in the collection. For example:

  Currency instanceList 
    numberElements            #- extend by !position
  do: [ code print: 10 ;  
        position printNL ;
      ] ;

The message linkElements can be used to extend each element in a collection by two variables, prior and next, which return the elements immediately before and after that element in the recipient. For example:

  3 sequence linkElements
  do: [ print ; 
        prior print ; 
        next printNL 
      ] ;
displays:
          1      NA         2
          2        1        3
          3        2        NA
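The following Python sketch models what numberElements and linkElements attach to each element. It is an analogy under stated assumptions: the function names are hypothetical, tuples replace extended objects to keep it short, and None stands in for NA.

```python
from typing import Any, List, Optional, Tuple


def number_elements(collection: List[Any]) -> List[Tuple[Any, int]]:
    # Pair each element with its 1-based position, like numberElements.
    return [(element, position) for position, element in enumerate(collection, start=1)]


def link_elements(collection: List[Any]) -> List[Tuple[Any, Optional[Any], Optional[Any]]]:
    # Pair each element with its prior and next neighbours, like linkElements.
    # None plays the role of NA at the two ends.
    n = len(collection)
    return [
        (
            collection[i],
            collection[i - 1] if i > 0 else None,
            collection[i + 1] if i < n - 1 else None,
        )
        for i in range(n)
    ]


for element, prior, nxt in link_elements([1, 2, 3]):
    print(element, prior, nxt)
```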


Creating Subsets

It is often useful to restrict a collection to objects that meet certain criteria. For example, you may wish to find the currencies with a US exchange rate greater than 2 and produce a report for those currencies:

  Currency instanceList            #- start with all currencies
     select: [ usExchange > 2 ].   #- return subset meeting criteria
  do: [ name print: 30 ;           #- display name and exchange
        usExchange printNL ;       #-   for members of subset
      ] ;

The select: message returns a new Collection object containing those elements of the original collection that meet the specified criteria. This subset can be empty (i.e., contain no objects) and will never have more elements than the original collection. If the original collection is a List or an IndexedList, the select: message returns a List. If the original collection is a TimeSeries, the select: message returns a TimeSeries containing a subset of the dates in the original TimeSeries.

The block supplied as the parameter to the select: message can contain any valid Vision program. You can create local variables in this block and use the && and || messages to produce multi-part criteria. For example:

  Currency instanceList
  select: [ !firstLetter <- name take: 1 ;
            (usExchange > 2 ) &&
            (firstLetter < "B" || firstLetter > "U")
          ] .
The value returned by the selection block is the value used to perform the selection. The subset will include all elements that return the value TRUE when this block is evaluated. Since the object returned by the select: message is another Collection, you can send another select: message to it. The previous example could have been written using:
  Currency instanceList
  select: [ usExchange > 2 ] .
  select: [ !firstLetter <- name take: 1 ;
            firstLetter < "B" || firstLetter > "U"
          ] .
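The equivalence between one multi-part selection and two chained selections can be checked with a small Python model. The `select` function and the (name, usExchange) sample pairs are hypothetical. Only an actual `True` selects an element, mirroring "return the value TRUE" above.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def select(collection: List[T], block: Callable[[T], bool]) -> List[T]:
    # Keep elements whose block returns True, preserving order.
    # The result is never longer than the input and may be empty.
    return [element for element in collection if block(element) is True]


# Hypothetical (name, usExchange) pairs for illustration only.
currencies: List[Tuple[str, float]] = [
    ("Australian Dollar", 1.4),
    ("Brazilian Real", 5.0),
    ("Yen", 110.0),
    ("Swiss Franc", 0.9),
]


def outer_letter(name: str) -> bool:
    first_letter = name[:1]
    return first_letter < "B" or first_letter > "U"


combined = select(currencies, lambda c: c[1] > 2 and outer_letter(c[0]))
chained = select(select(currencies, lambda c: c[1] > 2), lambda c: outer_letter(c[0]))
```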

The message first: is used to select the specified number of elements in the recipient that are not the default value, starting from the beginning of the collection. The message last: is used to select the specified number of elements in the recipient that are not the default value, starting from the end of the collection. For example:

  Currency instanceList last: 10 .
      select: [ usExchange > 2 ] .
This expression selects the last 10 currencies in the Currency class and then selects those with an exchange rate greater than 2 from this subset.

The message nonDefaults can be used to eliminate any default values from a Collection. For example:

  Currency instanceList nonDefaults
returns a list of all currencies excluding the default instance. It is equivalent to:
  Currency instanceList 
     select: [ isntDefault ] .
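A Python sketch of how nonDefaults, first:, and last: relate. It assumes None represents a class's default instance, which is a simplification because Vision default instances are real objects. Returning the last: result in the recipient's original order is also an assumption. The function names are hypothetical.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def non_defaults(collection: List[Optional[T]]) -> List[T]:
    # None is a hypothetical stand-in for a class's default instance.
    return [element for element in collection if element is not None]


def first(collection: List[Optional[T]], n: int) -> List[T]:
    # The first n non-default elements, counted from the start.
    return non_defaults(collection)[:n]


def last(collection: List[Optional[T]], n: int) -> List[T]:
    # The last n non-default elements, counted from the end.
    kept = non_defaults(collection)
    return kept[-n:] if n > 0 else []  # kept[-0:] would return everything
```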


Sorting and Ranking Collections

The messages sortUp: and sortDown: return the recipient collection as a List object in ascending or descending order. The collection is sorted by the value returned by the block supplied as a parameter. For example:

  Currency instanceList 
     sortUp: [ name ] .
  do: [ name printNL ] ;
This program sorts the currency objects by name and prints them in alphabetical order.

You can perform multiple sorts using the sortUpBy:then: and sortDownBy:then: messages and supplying two criteria. For example, to sort by exchange rate, then name, use:

  Currency instanceList 
     sortUpBy: [ usExchange ] then: [ name ] .
  do: [ name print: 30 ; usExchange printNL ] ;
The messages sortUpBy:then:then:, sortUpBy:then:then:then:, sortDownBy:then:then:, and sortDownBy:then:then:then: have also been defined to perform sorts with additional criteria.

Because these messages sort every criterion in the same direction, they cannot be used to sort the currencies from highest to lowest exchange rate and then alphabetically by name for ties. To accomplish this, take advantage of the fact that the sort messages produce "stable" sorts: when a sort message is applied to a collection, elements with tied values keep their original relative order. If you apply several sort criteria in reverse order (i.e., the most detailed level of sort first), you can produce the desired result. For example:

  Currency instanceList 
     sortUp: [ name ] .
     sortUp: [ usExchange ] .
  do: [ name print: 30 ; usExchange printNL ] ;
produces the same result as the previous example: an ascending sort by exchange rate, with ties sorted alphabetically by name. The expression:
  Currency instanceList 
     sortUp: [ name ] .
     sortDown: [ usExchange ] .
  do: [ name print: 30 ; usExchange printNL ] ;
produces a descending sort by exchange rate, sorting alphabetically when the exchange rates are the same.
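The same reverse-order technique works with any stable sort. Python's built-in `sorted` is documented as stable, and `reverse=True` preserves that stability, so the following sketch mirrors the sortUp: [ name ] followed by sortDown: [ usExchange ] example. The (name, usExchange) data is hypothetical, and this illustrates the technique rather than Vision's sort code.

```python
# Hypothetical (name, usExchange) pairs; two rates are tied at 20.0.
currencies = [("Peso", 20.0), ("Euro", 0.9), ("Baht", 20.0), ("Pound", 0.8)]

# Most detailed criterion first: alphabetical by name.
by_name = sorted(currencies, key=lambda c: c[0])

# Then the primary criterion, descending by rate. Because the sort is
# stable, tied rates keep the alphabetical order from the first pass.
result = sorted(by_name, key=lambda c: c[1], reverse=True)

# A single pass with a compound key gives the same order.
assert result == sorted(currencies, key=lambda c: (-c[1], c[0]))
```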

The rankUp: and rankDown: messages rank the elements in a Collection based on the value returned by evaluating the block supplied as the parameter. These messages return a List or TimeSeries extended by the variable rank, which represents the element's rank in the collection. The value of rank is an integer between 1 and the number of elements in the recipient object. The returned object has the same number of elements, in the same order, as the recipient. For example:

  Currency instanceList
       rankUp: [ usExchange ] .
  do: [ code print: 5 ; usExchange print ; rank printNL ] ;
The displayed rank indicates the currency's position in the collection when sorted by ascending exchange rate.
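A minimal Python model of rankUp: and rankDown:. It keeps the recipient's order and attaches a rank from 1 to n. The `rank` function and the sample pairs are hypothetical, and breaking ties by original position is an assumption because this section does not say how Vision ranks ties.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def rank(
    collection: List[T], block: Callable[[T], float], descending: bool = False
) -> List[Tuple[T, int]]:
    """Pair each element with its rank, keeping the recipient's order.

    descending=False models rankUp: (rank 1 = smallest value);
    descending=True models rankDown: (rank 1 = largest value).
    Ties are broken by original position (an assumption).
    """
    order = sorted(range(len(collection)), key=lambda i: block(collection[i]), reverse=descending)
    ranks = [0] * len(collection)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return list(zip(collection, ranks))


# Hypothetical (code, usExchange) pairs.
rates = [("GBP", 0.8), ("JPY", 110.0), ("EUR", 0.9)]
```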

The messages rankDown:usingCollector: and rankUp:usingCollector: provide a way to define a different variable name to hold the value of the rank. The second parameter is a named block that defines the variable name to use as a parameter and returns the value of ^current. For example, to name the variable alt instead of rank, use:

  Currency instanceList
     rankUp: [ usExchange ] usingCollector: [ | :alt | ^current ] .
  do: [ code print: 5 ; usExchange print ; alt printNL ] ;


Grouping Collections

The groupedBy: message provides you with a powerful tool for aggregation, enabling you to analyze information at a detailed and an aggregate level simultaneously. The groupedBy: message organizes your original Collection into a List of groups, which are sublists generated from the criteria supplied. You can perform summary analysis on each group and analyze the individual elements in each group.

The groupedBy: message groups the recipient collection based on the value returned by the block supplied as a parameter. It returns a List of the unique instances of this value found in the recipient collection, extended by the variable groupList, which returns the List of elements from the original collection that are associated with the specific group.

For example, if the variable companyList contains a list of companies that respond to the message sector, and each sector responds to the message name, then the expression:

  companyList groupedBy: [ sector ] .
  do: [ name printNL ;
      ] ;
can be used to group the companies into sectors and display the name of each sector present.

By asking Vision to group companyList by sector, Vision returns a List of the sectors that exist in companyList. The original set of companies is not lost. Each element in the List formed by the groupedBy: message responds to the message groupList. This message returns the List of companies from the original collection associated with the specific group. For example:

  companyList groupedBy: [ sector ] . 
  do: [ name print ;
        groupList count printNL ;
      ] ;
displays the number of elements in the group after each sector's name. Since groupList returns a List object, you can send any of the List messages to it. For example, you could display the companies in each sector using:
  companyList groupedBy: [ sector ] . 
  do: [ name print ;
        groupList count printNL ;
        groupList 
        do: [                        # for each company in sector 
            displayInfo ;            # print standard company info
            ] ;                      # end of groupList do: []
      ];                             # end of sector do: []
In this example, the summary information is displayed about the sector followed by an entry for each element in the sector.

Pictorially, the groupedBy: message works as follows:

[Figure: groupedBy: message structure]
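The nested structure that groupedBy: produces can be sketched in Python as a mapping from each distinct key to its groupList. This is an analogy, not Vision's implementation. The `grouped_by` function and the company records are hypothetical, and keys appear in first-seen order because that is how Python dicts behave, not because Vision guarantees it. Python dict keys also compare by equality, so for strings this model behaves like groupedByString: rather than the identity-based default described in the Note.

```python
from typing import Callable, Dict, Hashable, List, TypeVar

T = TypeVar("T")
K = TypeVar("K", bound=Hashable)


def grouped_by(collection: List[T], block: Callable[[T], K]) -> Dict[K, List[T]]:
    # Each distinct key maps to its groupList: the elements of the
    # original collection for which the block returned that key.
    groups: Dict[K, List[T]] = {}
    for element in collection:
        groups.setdefault(block(element), []).append(element)
    return groups


# Hypothetical (company, sector, industry) records.
companies = [
    ("Acme", "Tech", "Software"),
    ("Beta", "Energy", "Oil"),
    ("Gamma", "Tech", "Hardware"),
    ("Delta", "Tech", "Software"),
]

# Nested grouping: by sector, then by industry within each sector.
for sector, group_list in grouped_by(companies, lambda c: c[1]).items():
    print("SECTOR:", sector, len(group_list))
    for industry, members in grouped_by(group_list, lambda c: c[2]).items():
        print("  INDUSTRY:", industry, [c[0] for c in members])
```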

Grouping introduces a concept known as nested lists. In this case, the outer list represents a list of sectors. Each element in this outer list in turn responds to the message groupList which returns the set of elements from the original list that are in the current group. Since this subset is itself a list, you can perform any list operations on it, including grouping it further. For example:

  companyList groupedBy: [ sector ] .      # group by sector 
  do: [ 
      "SECTOR: " print; 
      name printNL;                        # print sector's name 
      groupList groupedBy: [ industry ] .  # group companies in sector 
      do: [                                #    by industry 
          "INDUSTRY: " print; 
          name printNL;                    # print industry's name 
          groupList                        # for each company 
          do: [                            #   in industry 
              "Company: " print ; 
              name printNL ;               # print company's name
              ] ;                          # end of companies in ind. 
          ] ;                              # end of industries in sector 
  ] ;                                      # end of sectors

This report groups companyList into sectors. The companies within a sector are then grouped into industries, and the companies are listed under their corresponding industry. This example uses the groupedBy: message twice: once to group the original list into sectors, and a second time to group the companies in a given sector into industries.


Note:

The message groupedByString: is used to correctly group lists using a criteria block that generates a String value. By default, strings that are distinct objects will be grouped into separate groups even if they have the same content. This version of the groupedBy: message groups strings with the same content into the same group, even if the initial strings are distinct objects.


A number of additional variations of the groupedBy: message have been defined and are summarized below:

  Message                  Definition
  groupedBy:in:            Groups recipient using parameter1 and returns a List
                           containing one element for each value in the List
                           supplied as parameter2
  groupedBy:intersect:     Groups recipient using parameter1 and returns a List
                           containing one element for each value that appears
                           in both the recipient and parameter2
  groupedBy:union:         Groups recipient using parameter1 and returns a List
                           containing one element for each value that appears
                           in either the recipient or parameter2
  groupedBy:usingCutoffs:  Groups recipient using parameter1 into partitions
                           based on the List of Numbers supplied as parameter2
  groupedByCriteria:       Groups recipient based on the List of Blocks supplied
                           as parameter, returning a List extended by keyList
                           as well as groupList
  groupPrintUsing:         Groups recipient using parameter, displaying the
                           count for each group
  mgroupedBy:              Groups recipient using parameter1, which can return a
                           List of objects; elements in the recipient may be
                           included in one or more groupLists


Collection Computation Messages

A number of messages have been defined to compute summary statistics for a Collection. For example:

  companyList average: [ sales ] 
computes the average sales value for the companies in companyList.
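As a rough Python model of average: and average:withWeights:, the sketch below computes a simple mean of block values and a weighted mean of the form sum(value * weight) / sum(weight). The function names and company records are hypothetical, and two details are assumptions this section does not specify: None plays the role of NA and is skipped, and an empty result returns None (NA).

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def average(collection: List[T], block: Callable[[T], Optional[float]]) -> Optional[float]:
    # Simple average of the block values; None plays the role of NA.
    values = [v for v in (block(e) for e in collection) if v is not None]
    return sum(values) / len(values) if values else None


def weighted_average(
    collection: List[T],
    block: Callable[[T], Optional[float]],
    weights: Callable[[T], Optional[float]],
) -> Optional[float]:
    # sum(value * weight) / sum(weight), skipping pairs with a missing side.
    raw = [(block(e), weights(e)) for e in collection]
    valid = [(v, w) for v, w in raw if v is not None and w is not None]
    total_weight = sum(w for _, w in valid)
    if total_weight == 0:
        return None
    return sum(v * w for v, w in valid) / total_weight


# Hypothetical company records; the third has NA sales.
companies = [
    {"sales": 100.0, "mktCap": 1.0},
    {"sales": 200.0, "mktCap": 3.0},
    {"sales": None, "mktCap": 5.0},
]
```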

The following table summarizes the various statistical messages that have been defined. Note that most of these messages are defined in Vision and can therefore be copied or modified to create other variations as needed. These messages can be sent to an instance of any Collection subclass and return a numeric value or NA. Parameters are Blocks except where explicitly noted.

  Message                    Definition                              Sample
  average                    Simple average                          5 sequence average
  average:                   Simple average based on value           list average: [ sales ]
                             returned by parameter
  average:withWeights:       Weighted average using parameter2 to    list average: [sales] withWeights: [mktCap]
                             weight the value returned by
                             parameter1
  compound                   Compounded value                        5 sequence compound
  compound:                  Compounded value based on value         list compound: [sales]
                             returned by parameter
  correlate:with:            Correlation based on evaluation of      list correlate: [sales] with: [profit]
                             blocks supplied as the two parameters
  gMean                      Geometric mean                          5 sequence gMean
  gMean:                     Geometric mean based on value           list gMean: [sales]
                             returned by parameter
  harmonicMean               Harmonic mean                           5 sequence harmonicMean
  harmonicMean:              Harmonic mean based on value            list harmonicMean: [sales]
                             returned by parameter
  harmonicMean:withWeights:  Harmonic mean using parameter2 to       list harmonicMean: [sales] withWeights: [mktCap]
                             weight the value returned by
                             parameter1
  max                        Maximum value in recipient              5 sequence max
  max:                       Maximum based on value returned by      list max: [sales]
                             parameter
  median                     Median value in recipient               5 sequence median
  median:                    Median based on value returned by       list median: [sales]
                             parameter
  min                        Minimum value in recipient              5 sequence min
  min:                       Minimum based on value returned by      list min: [sales]
                             parameter
  mode                       Mode value in recipient                 5 sequence mode
  mode:                      Mode based on value returned by         list mode: [sales]
                             parameter
  product                    Product of values in recipient          5 sequence product
  product:                   Product of values returned by           list product: [sales]
                             parameter
  rankCorrelate:with:        Correlation between relative ranks of   list rankCorrelate: [sales] with: [profit]
                             values returned by the two parameters
  regress:                   Performs linear regression between      (2,3,9,1,8,7,5) regress: (6,5,11,7,5,4,4) .
                             recipient and parameter, returning an
                             object that responds to beta, alpha,
                             pearson, rsq, and stdErr
  stdDev                     Standard deviation of values in         5 sequence stdDev
                             recipient
  stdDev:                    Standard deviation based on value       list stdDev: [sales]
                             returned by parameter
  total                      Total value in recipient                5 sequence total
  total:                     Total based on value returned by        list total: [sales]
                             parameter
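The regress: entry can be illustrated with an ordinary least-squares fit in Python. This is a sketch under assumptions. The table does not say which list is the dependent variable, so the sketch treats the recipient as y and the parameter as x. It also does not define stdErr, so stdErr is omitted. The `regress` function name mirrors the message but is hypothetical Python.

```python
import math
from typing import Dict, List


def regress(y: List[float], x: List[float]) -> Dict[str, float]:
    """Least-squares fit of y = alpha + beta * x.

    Treating the recipient as y and the parameter as x is an assumption;
    stdErr is left out because its definition is not given.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    pearson = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    return {"beta": beta, "alpha": alpha, "pearson": pearson, "rsq": pearson ** 2}


# The sample from the table, under the y = recipient assumption.
result = regress([2, 3, 9, 1, 8, 7, 5], [6, 5, 11, 7, 5, 4, 4])
```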


Tiles, Running Totals, and Other Intra-List Messages

The decileUp: and decileDown: messages assign a value from 1 to 10 to each element in the recipient Collection. The decileUp: message assigns the value 1 to the 10% of elements with the lowest values, the value 2 to the next 10%, and so on, with the value 10 assigned to the elements in the top 10%. The decileDown: message assigns the value 1 to the 10% of elements with the highest values and the value 10 to the elements in the bottom 10%. When you send one of these messages to a Collection, Vision returns the recipient extended by the variable named decile. For example:

  companyList decileDown: [ sales ] . 
  do: [ name print: 30 ; 
        sales print: 10 ;
        decile printNL ; 
      ] ;
displays the decile value from 1 to 10 for each company in companyList.

The quintileUp: and quintileDown: messages are identical to the decile messages except they assign a value from 1 to 5, returned in a variable named quintile. The percentileUp: and percentileDown: messages are identical to the decile messages except they assign a value from 1 to 100, returned in a variable named percentile.

The messages tileUp:tiles: and tileDown:tiles: can be used to provide an arbitrary number of tiles. The extended variable is named tile. For example, to group a company list into fractiles (20 groups), you could use:

  companyList 
       tileUp: [ sales ] tiles: 20 .
  do: [ name print: 30 ; 
        sales print: 10 ;
        tile printNL ;      #- number from 1-20
      ] ;
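A Python sketch of one plausible tiling rule follows. Elements are ranked by ascending value, and rank r of n is mapped to ceil(r * tiles / n), so each decile holds about 10% of the elements. That mapping, and breaking ties by original position, are assumptions. This section does not say how Vision splits counts that do not divide evenly. The function names are hypothetical.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def tile_up(collection: List[T], block: Callable[[T], float], tiles: int) -> List[Tuple[T, int]]:
    """Pair each element with a tile number from 1 to tiles.

    Rank r of n (ascending) maps to ceil(r * tiles / n), so the lowest
    values land in tile 1. The mapping and tie rule are assumptions.
    """
    n = len(collection)
    order = sorted(range(n), key=lambda i: block(collection[i]))
    tile_of = [0] * n
    for rank, index in enumerate(order, start=1):
        tile_of[index] = (rank * tiles + n - 1) // n  # integer ceiling
    return list(zip(collection, tile_of))


def tile_down(collection: List[T], block: Callable[[T], float], tiles: int) -> List[Tuple[T, int]]:
    # Highest values land in tile 1: rank on the negated value.
    return tile_up(collection, lambda element: -block(element), tiles)


def decile_up(collection: List[T], block: Callable[[T], float]) -> List[Tuple[T, int]]:
    # decileUp: corresponds to tileUp: with 10 tiles.
    return tile_up(collection, block, 10)
```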

If you want to define a different name for the extended variable representing the tile, you can use the messages decileUp:using:, decileDown:using:, quintileUp:using:, quintileDown:using:, percentileUp:using:, percentileDown:using:, tileUp:using:tiles:, and tileDown:using:tiles:. For example:

  companyList 
      decileUp: [ sales ] using: "d1" .
      decileUp: [ profit ] using: "d2" .
  do: [ name print: 30 ; 
        d1 print ;          # sales decile
        d2 printNL ;        # profit decile
      ] ;

These various "tiling" messages have all been defined in Vision using the messages tileUp:usingCollector:tiles: and tileDown:usingCollector:tiles:. The first parameter is the block to be evaluated. The second parameter is a collector block whose parameter name becomes the name of the extended variable. The third parameter indicates the number of "tiles" to create. For example, to define the message fractileUp:, use:

  Collection
  defineMethod: 
  [ | fractileUp: block |
    ^self tileUp: block 
          usingCollector: [ |:fractile| ^current ] 
          tiles: 20
  ] ;

The message runningTotal: returns the recipient Collection extended by the variable runningTotal, which represents the running sum of the values returned by the block parameter, up to and including each element. For example:

  20 sequence
     runningTotal: [ ^self ] .    
  do: [ print ; 
        runningTotal printNL ;
      ];
The message runningAverage: returns the recipient Collection extended by the variable runningAverage, which represents the running average of the values returned by the block parameter, up to and including each element.
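In Python, running totals map directly onto `itertools.accumulate`, so the two messages can be modeled as below. The function names are hypothetical, the sketch pairs each element with its value instead of extending it, and it assumes there are no NA values in the block results.

```python
from itertools import accumulate
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def running_total(collection: List[T], block: Callable[[T], float]) -> List[Tuple[T, float]]:
    # Pair each element with the sum of block values up to and including it.
    totals = accumulate(block(element) for element in collection)
    return list(zip(collection, totals))


def running_average(collection: List[T], block: Callable[[T], float]) -> List[Tuple[T, float]]:
    # Running total divided by the number of elements seen so far.
    return [
        (element, total / count)
        for count, (element, total) in enumerate(running_total(collection, block), start=1)
    ]


# Python stand-in for "20 sequence runningTotal: [ ^self ]".
seq = list(range(1, 21))
totals = running_total(seq, lambda x: x)
```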

Several additional Collection messages return the recipient collection extended by a new variable. The message normalize: returns the recipient extended by the variable norm, representing the normalized value of the element relative to the mean and standard deviation of the collection. The message weightedDecile: returns the recipient extended by the variable decile, a number from 1 to 10. The message weightedQuintile: returns the recipient extended by the variable quintile, a number from 1 to 5.
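Reading "normalized relative to the mean and standard deviation" as a z-score, normalize: can be sketched in Python as below. This interpretation is an assumption. So is the use of the sample standard deviation (`statistics.stdev`) rather than the population version, because the text does not say which Vision uses. The sketch needs at least two values and a nonzero standard deviation.

```python
import statistics
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def normalize(collection: List[T], block: Callable[[T], float]) -> List[Tuple[T, float]]:
    """Pair each element with its z-score: (value - mean) / stdDev.

    Assumes the sample standard deviation; Vision's choice is not stated.
    """
    values = [block(element) for element in collection]
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # requires at least two values
    return [(element, (v - mean) / sd) for element, v in zip(collection, values)]
```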


Inter-Collection Messages

A number of messages have been defined to perform operations between two lists. The messages can be sent to an instance of any Collection subclass and return a List object unless otherwise noted.

  Message           Definition                                Sample
  +                 Add parameter to recipient (List or       5 sequence + 5 sequence
                    TimeSeries only); parameter can be a
                    scalar number or a list of numbers
  -                 Subtract parameter from recipient (List   5 sequence - 3
                    or TimeSeries only); parameter can be
                    a scalar number or a list of numbers
  *                 Multiply recipient by parameter (List     5 sequence * 5 sequence
                    or TimeSeries only); parameter can be
                    a scalar number or a list of numbers
  /                 Divide recipient by parameter (List or    5 sequence / 2
                    TimeSeries only); parameter can be a
                    scalar number or a list of numbers
  isEquivalentTo:   Returns Boolean indicating if recipient   5 sequence isEquivalentTo: 1,2,3,4,5 .
                    and parameter have equivalent content
  union:            Returns List of elements in either or     5 sequence union: 3 sequence .
                    both the recipient and parameter
  union:using:      Returns List of elements in either or     list1 union: list2 using: [ asSelf ] .
                    both the recipient and parameter1,
                    using block supplied as parameter2 to
                    modify the elements before comparing
                    for equality
  intersect:        Returns List of elements in both the      5 sequence intersect: 3 sequence .
                    recipient and parameter
  intersect:using:  Returns List of elements in both the      list1 intersect: list2 using: [ asSelf ] .
                    recipient and parameter1, using block
                    supplied as parameter2 to modify the
                    elements before comparing for equality
  exclude:          Returns List of elements in recipient     5 sequence exclude: 3 sequence .
                    that are not in parameter
  exclude:using:    Returns List of elements in recipient     list1 exclude: list2 using: [ asSelf ] .
                    that are not in parameter1, using block
                    supplied as parameter2 to modify the
                    elements before comparing for equality
  difference:       Returns a List of 2 elements: the first   list1 difference: list2 .
                    contains the List of elements that are
                    in recipient and not in parameter; the
                    second contains the List of elements
                    that are in parameter and not in
                    recipient
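The set-style messages and their using: variants can be modeled in Python with a key function that plays the role of the parameter2 block (like asSelf). The function names are hypothetical. The ordering and de-duplication choices are assumptions the table does not settle: union: keeps the first element seen for each key, and the other operations keep the recipient's order. Keys must be hashable.

```python
from typing import Callable, Hashable, List, TypeVar

T = TypeVar("T")


def _identity(x):
    return x


def union(a: List[T], b: List[T], using: Callable[[T], Hashable] = _identity) -> List[T]:
    # Elements of a, then of b, keeping the first element seen for each key.
    result: List[T] = []
    seen = set()
    for element in a + b:
        key = using(element)
        if key not in seen:
            seen.add(key)
            result.append(element)
    return result


def intersect(a: List[T], b: List[T], using: Callable[[T], Hashable] = _identity) -> List[T]:
    # Elements of a whose key also occurs in b.
    keys_b = {using(element) for element in b}
    return [element for element in a if using(element) in keys_b]


def exclude(a: List[T], b: List[T], using: Callable[[T], Hashable] = _identity) -> List[T]:
    # Elements of a whose key does not occur in b.
    keys_b = {using(element) for element in b}
    return [element for element in a if using(element) not in keys_b]


def difference(a: List[T], b: List[T], using: Callable[[T], Hashable] = _identity) -> List[List[T]]:
    # Two lists: in a but not in b, then in b but not in a.
    return [exclude(a, b, using), exclude(b, a, using)]
```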


Creating and Updating Collections

Many of the messages described in this document implicitly create new instances of one of the Collection subclasses. Messages such as send:, select:, groupedBy:, sortUp:, decileDown:, and extendBy: all create and return new Collection objects.

There is normally no reason to create new instances of the Collection class directly. You can explicitly create a new instance of any of the Collection subclasses by sending the new message to the specific class. For example:

  !list <- List new ; 
  !ilist <- IndexedList new ;
  !ts <- TimeSeries new ;
The variables list, ilist, and ts can be accessed and updated using the rules defined for the List, IndexedList, and TimeSeries, respectively.

The copyListElements message can be used to create a copy of a Collection. The returned object is a new Collection containing the same elements as the original. When this message is sent to a List or IndexedList, a new List is created and returned. When this message is sent to a TimeSeries, a new TimeSeries object is created and returned.

The append: message can be used to append objects to a Collection. The returned object is always a new List.

The collectListElementsFrom: message is used to create a single List from a Collection of Lists. The supplied parameter should be a Block. This message evaluates the block for each element and builds a "running list" of the objects returned by the block. If the returned object is a Collection, its individual elements are appended to the running list. For example:

  5 sequence
     collectListElementsFrom: [ ^self sequence ] .
  do: [ printNL ] ;
returns a single List containing the elements 1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5.
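The "running list" behavior can be modeled in Python as a flatten step. A list result is spliced in element by element, and any other result is appended as a single element. The function names are hypothetical, and `sequence` is only a Python stand-in for Vision's n sequence (the integers 1 through n).

```python
from typing import Any, Callable, List


def collect_list_elements_from(collection: List[Any], block: Callable[[Any], Any]) -> List[Any]:
    # Build one running list from the block results.
    running: List[Any] = []
    for element in collection:
        result = block(element)
        if isinstance(result, list):
            running.extend(result)  # a collection contributes its elements
        else:
            running.append(result)  # anything else is added as one element
    return running


def sequence(n: int) -> List[int]:
    # Python stand-in for Vision's "n sequence": the integers 1..n.
    return list(range(1, n + 1))


flat = collect_list_elements_from(sequence(5), sequence)
```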


Collection Messages and the . Operator

A common requirement when working with Collections is to apply a series of operations in succession. Starting with a list, you may wish to perform a series of extensions, selections, and groupings, then sort and print the result. There are a number of different techniques that can be used to produce the same result. For example:

  !myList <- Currency instanceList ;
  !xList <- myList 
    extendBy: [ !rate <- 1 / usExchange ] ;
  !subset1 <- xList select: [ rate < 1.5 ] ;
  !subset2 <- subset1 first: 5 ;
  !final <- subset2 sortUp: [ name ] ;
  final do: [ code print: 5 ;
              name print: 15 ;
              rate printNL ;
            ] ;
This example uses a large number of temporary variables. Although there is nothing wrong with this approach, you need not save each of the intermediate steps. For example:
  Currency instanceList 
    extendBy: [ !rate <- 1 / usExchange ] .
    select: [ rate < 1.5 ] .
    first: 5 .
    sortUp: [ name ] .
  do: [ code print: 5 ;
        name print: 15 ;
        rate printNL ;
      ] ;
This example produces the same result without any intermediate variables.

Note that the . operator is used to terminate a keyword message such as select: or first: so that the next message is sent to its result. A common misconception is that this operator is automatically inserted after each collection operation. Although this particular example may look as though that is the case, the purpose of the operator is to terminate the parameters and evaluate the expression up to that point. For example, if you added the unary message numberElements to the previous expression, it would not be followed by a . because it takes no parameters:

  Currency instanceList 
    numberElements           #- . not needed since no parameters
    extendBy: [ !rate <- 1 / usExchange ] .    #- . terminates block 
    select: [ rate < 1.5 ] .                   #- . terminates block 
    first: 5 .                                 #- . terminates count 
    sortUp: [ name ] .                         #- . terminates block
  do: [ code print: 5 ;
        name print: 15 ;
        rate printNL ;
      ] ;
The . operator provides an alternative to using parentheses in expressions containing numerous keywords applied in succession such as this one. The same expression using parentheses could be written as:
  ((((Currency instanceList 
         numberElements 
         extendBy: [ !rate <- 1 / usExchange ]
     ) select: [ rate < 1.5 ]
    ) first: 5 
   ) sortUp: [ name ] 
  ) do: [ code print: 5 ;
          name print: 15 ;
          rate printNL ;
        ] ;


When to Iterate

The do: message is defined to evaluate its block parameter for each element in the collection. Although this operation may appear to process one element at a time, collection operations are optimized internally and are not evaluated sequentially.


Warning!!
Because collections are not evaluated sequentially, you do not have control over the order in which the collection is processed nor can you assume that the number of evaluations is equal to the number of elements in the list. For example:
  !count <- 0 ;
  list do: [ ^global :count <- ^global count + 1 ] ;
  count printNL ;
will not produce the results you expect. Although you might expect count to equal the number of elements in list, it would actually equal the number of internal passes needed to process the list.

The iterate: message can be used instead of do: when you want to evaluate the list in order, an element at a time. Although there are situations when you may need to do this, this message is much slower than the do: message since it does not benefit from the optimizations available. For example:

  !count <- 0 ;
  list iterate: [ ^global :count <- ^global count + 1 ] ;
  count printNL ;

Related Topics