Vision Class: Collection
Overview
The Collection class is an abstract class that is used to organize classes in the hierarchy whose instances represent sets of objects. The major subclasses of Collection are: List, IndexedList, and TimeSeries.
Instances of the class List represent collections of objects that are accessed either by position or as a set. Instances of the class IndexedList represent collections of objects that are accessed by a user-defined index or as a set. Instances of the class TimeSeries represent collections of objects that are accessed by date or as a set. The elements in a collection do not need to be from the same class.
The Collection class is an indirect subclass of Object:
Object
   |
   Function
      |
      EnumeratedFunction
         |
         Collection
            |
            IndexedCollection
            |   |-- IndexedList
            |   |-- TimeSeries
            |
            SequencedCollection
                |-- List
The Collection classes have been optimized to organize and query large sets of data. A large number of the messages defined for these classes have been written in Vision and can therefore be modified and expanded as needed. As always, you can define any number of new messages for these classes.
Collection Basics
Many of the frequently used List, IndexedList, and TimeSeries messages are defined at the class Collection. Each subclass defines additional messages that are unique to that class. You can send the toList message to an instance of any of the Collection subclasses to return an instance of the List class. You cannot directly convert instances of one of these subclasses to a TimeSeries or an IndexedList object.
All Collection objects respond to the message count. For example:
Currency instanceList count print ;
prints the number of instances in the Currency class.
The do: message has been redefined at the Collection class. It performs the same function as the version defined at Object; however, instead of applying to a single object, the version defined for the Collection class applies the block supplied as the parameter to each element in the recipient object. For example:
Currency instanceList do: [ code print: 10 ; name printNL ] ;
displays the value of code and name for each instance in the Currency class. The supplied block is evaluated for each element in the collection. Messages within the block are sent to the individual elements in the collection.
The send: message has also been redefined at the Collection class. It works identically to the do: message except for the value returned. The block is evaluated for each object in the recipient collection. Unlike the do: message, the result of evaluating the block is returned, not the original object. Blocks return the value of the last statement executed. The send: message returns this value for each object in the recipient collection. For example:
!names <- Currency instanceList send: [ name ] ;
names count printNL ;
In this example, the currency name is returned for each instance in the Currency class. The variable names represents a collection where each element is the name of a currency. If the do: message had been used instead of send:, the variable names would represent the list of all currency instances.
The collection returned by the send: message has the same number of elements in the same order as the original collection. If the original collection is a List or an IndexedList, the send: message returns a List. If the original collection is a TimeSeries, the send: message returns a TimeSeries containing the same dates as the original TimeSeries. The do: message always returns the original collection.
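The difference in return value between do: and send: can be modeled outside Vision. The following Python sketch is an analogy only, not Vision code; the helper names do_each and send_each and the sample currency records are hypothetical.

```python
# Python analogy for the return values of do: and send: (not Vision code).
# do_each, send_each, and the sample records are hypothetical illustrations.

def do_each(collection, block):
    """Evaluate block for every element, then return the original collection."""
    for element in collection:
        block(element)
    return collection


def send_each(collection, block):
    """Evaluate block for every element and return the results in the same order."""
    return [block(element) for element in collection]


currencies = [
    {"code": "USD", "name": "US Dollar"},
    {"code": "CAD", "name": "Canadian Dollar"},
]

names = send_each(currencies, lambda c: c["name"])  # one name per currency
same = do_each(currencies, lambda c: c["name"])     # block results are discarded

print(names)
print(same is currencies)
```

In this model, names holds the block results while same is the original collection, which mirrors the behavior described for send: and do:.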
The extendBy: message has been redefined at the Collection class. It performs the same function as the version defined at Object. Instead of applying to a single object, the version defined for the Collection class applies the block supplied as the parameter to each element in the recipient object, returning a collection where each element of the collection is extended by the variables defined within the block. For example:
!xlist <- Currency instanceList
   extendBy: [ !nameLength <- name count ;
               !name10 <- name take: 10 ;
             ] ;
defines xlist to represent a list of all currency instances that respond to the new variables nameLength and name10 in addition to the currency messages. For example:
xlist do: [ name print: 30 ; nameLength printNL ; ] ;
All objects in xlist respond to the nameLength message just like any other message. All objects in the list continue to respond to the messages already defined such as name.
Within the square brackets, you can define as many variables as you would like. Each variable should be introduced with the symbol ! followed by a variable name. If the original collection is a List or an IndexedList, the extendBy: message returns a List. If the original collection is a TimeSeries, the extendBy: message returns a TimeSeries containing the same dates as the original TimeSeries.
The collect: message is used to create an extension of the original list that always defines a variable named value. This variable is set to the result of evaluating the block supplied as a parameter to the collect: message.
The messages do:, send:, and extendBy: have been redefined for the Collection classes to operate on the elements of the collection. The messages basicDo:, basicSend:, and basicExtend: are available if you want to run any of these operations on the collection as a whole. For example:
Currency instanceList basicDo: [ whatAmI print: 30 ; count printNL ] ;
sends the messages inside the block to the collection object itself, not to the individual elements.
The message numberElements can be used to extend each element in a collection by the variable position which corresponds to its position number in the collection. For example:
Currency instanceList numberElements          #- extend by !position
do: [ code print: 10 ; position printNL ; ] ;
The message linkElements can be used to extend each element in a collection by the two variables: prior and next. These variables will return the elements in the recipient that are before or after each element in the recipient. For example:
3 sequence linkElements
do: [ print ; prior print ; next printNL ] ;
displays:
1 NA 2
2 1 3
3 2 NA
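The prior and next variables can be pictured as neighbor lookups. The Python sketch below is an analogy only, not Vision code; link_elements is a hypothetical helper and None stands in for NA.

```python
# Python analogy for linkElements (not Vision code); None stands in for NA.
# link_elements is a hypothetical helper written for this illustration.

def link_elements(items):
    """Pair every element with the elements before and after it."""
    linked = []
    for i, element in enumerate(items):
        prior = items[i - 1] if i > 0 else None
        following = items[i + 1] if i < len(items) - 1 else None
        linked.append((element, prior, following))
    return linked


rows = link_elements([1, 2, 3])  # [1, 2, 3] stands in for "3 sequence"
for element, prior, following in rows:
    print(element, prior, following)
```

The first element has no prior and the last has no next, matching the NA values in the Vision output.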
Creating Subsets
It is often useful to restrict a collection to objects that meet certain criteria. For example, you may wish to find the currencies with a US exchange rate greater than 2 and produce a report for those currencies:
Currency instanceList                    #- start with all currencies
   select: [ usExchange > 2 ].           #- return subset meeting criteria
do: [ name print: 30 ;                   #- display name and exchange
      usExchange printNL ;               #- for members of subset
    ] ;
The select: message returns a new Collection object containing the elements of the original collection that meet the specified criteria. This subset can be empty (i.e., contain no objects) and will never have more elements than the original collection. If the original collection is a List or an IndexedList, the select: message returns a List. If the original collection is a TimeSeries, the select: message returns a TimeSeries containing a subset of the dates in the original TimeSeries.
The block supplied as the parameter to the select: message can contain any valid Vision program. You can create local variables in this block and use the && and || messages to produce multi-part criteria. For example:
Currency instanceList
   select: [ !firstLetter <- name take: 1 ;
             (usExchange > 2 ) && (firstLetter < "B" || firstLetter > "U")
           ] .
The value returned by the selection block is the value used to perform the selection. The subset will include all elements that return the value TRUE when this block is evaluated. Since the object returned by the select: message is another Collection, you can send another select: message to it. The previous example could have been written using:
Currency instanceList
   select: [ usExchange > 2 ] .
   select: [ !firstLetter <- name take: 1 ;
             firstLetter < "B" || firstLetter > "U"
           ] .
The message first: is used to select the specified number of elements in the recipient that are not the default value, starting from the beginning of the collection. The message last: is used to select the specified number of elements in the recipient that are not the default value, starting from the end of the collection. For example:
Currency instanceList last: 10 . select: [ usExchange > 2 ] .
This expression selects the last 10 currencies in the Currency class and then selects those with an exchange rate greater than 2 from this subset.
The message nonDefaults can be used to eliminate any default values from a Collection. For example:
Currency instanceList nonDefaults
returns a list of all currencies excluding the default instance. It is equivalent to:
Currency instanceList select: [ isntDefault ] .
Sorting and Ranking Collections
The messages sortUp: and sortDown: return the recipient collection as a List object in ascending or descending order. The collection is sorted by the value returned by the block supplied as a parameter. For example:
Currency instanceList sortUp: [ name ] .
do: [ name printNL ] ;
This program sorts the currency objects by name and prints them in alphabetical order.
You can perform multiple sorts using the sortUpBy:then: and sortDownBy:then: messages and supplying two criteria. For example, to sort by exchange rate, then name, use:
Currency instanceList sortUpBy: [ usExchange ] then: [ name ] .
do: [ name print: 30 ; usExchange printNL ] ;
The messages sortUpBy:then:then:, sortUpBy:then:then:then:, sortDownBy:then:then:, and sortDownBy:then:then:then: have also been defined to perform sorts with additional criteria.
Since these messages apply the same direction (ascending or descending) to every criterion, they cannot be used to sort the currencies from highest to lowest exchange rate and then alphabetically by name for ties. To accomplish this, you need to take advantage of the fact that the sort messages produce "stable" sorts. This means that when a sort message is applied to a collection, elements with tied sort values keep their original relative order. If you apply several sort criteria in reverse order (i.e., the most detailed level of sort first), you can produce the desired results. For example:
Currency instanceList sortUp: [ name ] . sortUp: [ usExchange ] .
do: [ name print: 30 ; usExchange printNL ] ;
is identical to the previous example, producing an ascending sort by exchange rate, sorting alphabetically when the exchange rates are the same. The expression:
Currency instanceList sortUp: [ name ] . sortDown: [ usExchange ] .
do: [ name print: 30 ; usExchange printNL ] ;
produces a descending sort by exchange rate, sorting alphabetically when the exchange rates are the same.
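The two-pass technique relies only on sort stability, so it can be demonstrated with any stable sort. The Python sketch below is an analogy, not Vision code; the currency names and rates are made up for illustration.

```python
# Python analogy for the two-pass stable sort (not Vision code).
# The sample names and rates are invented for this illustration.
currencies = [
    {"name": "Dinar", "usExchange": 3.0},
    {"name": "Baht", "usExchange": 1.5},
    {"name": "Colon", "usExchange": 3.0},
    {"name": "Afghani", "usExchange": 1.5},
]

# Pass 1: the most detailed criterion first (name, ascending).
by_name = sorted(currencies, key=lambda c: c["name"])

# Pass 2: the primary criterion (rate, descending). Python's sort is stable,
# so elements with equal rates keep the alphabetical order from pass 1.
result = sorted(by_name, key=lambda c: c["usExchange"], reverse=True)

ordered_names = [c["name"] for c in result]
print(ordered_names)
```

The two currencies with the higher rate come first, and within each rate the names remain alphabetical.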
The rankUp: and rankDown: messages rank the elements in a Collection based on the value returned by evaluating the block supplied as a parameter. These messages return a List or TimeSeries extended by the variable rank, which represents the element's rank in the collection. The value of rank will be an integer between 1 and the number of elements in the recipient object. The returned object will have the same number of elements and will be in the same order as the recipient. For example:
Currency instanceList rankUp: [ usExchange ] .
do: [ code print: 5 ; usExchange print ; rank printNL ] ;
The displayed rank indicates the currency's position in the collection by ascending exchange rate.
The messages rankDown:usingCollector: and rankUp:usingCollector: provide a way to define a different variable name to hold the value of the rank. The second parameter is a block that declares the variable name to use as its parameter and returns the value of ^current. For example, to name the variable alt instead of rank use:
Currency instanceList
   rankUp: [ usExchange ] usingCollector: [ | :alt | ^current ] .
do: [ code print: 5 ; usExchange print ; alt printNL ] ;
Grouping Collections
The groupedBy: message provides you with a powerful tool for aggregation, enabling you to simultaneously analyze information at a detailed and aggregate level. The groupedBy: message is used to organize your original Collection into a List of groups, sublists that are generated based on the criteria supplied. You can perform summary analysis on each group and analyze the individual elements in each group.
The groupedBy: message groups the recipient collection based on the value returned by the block supplied as a parameter. It returns a List of the unique instances of this value found in the recipient collection, extended by the variable groupList, which returns the List of elements from the original collection that are associated with the specific group.
For example, if the variable companyList contains a list of companies that respond to the message sector, and each sector responds to the message name, then the expression:
companyList groupedBy: [ sector ] .
do: [ name printNL ; ] ;
can be used to group the companies into sectors and display the name of each sector present.
By asking Vision to group companyList by sector, Vision returns a List of the sectors that exist in companyList. The original set of companies is not lost. Each element in the List formed by the groupedBy: message responds to the message groupList. This message returns the List of companies from the original collection associated with the specific group. For example:
companyList groupedBy: [ sector ] .
do: [ name print ; groupList count printNL ; ] ;
displays the number of elements in the group after each sector's name. Since groupList returns a List object, you can send any of the List messages to it. For example, you could display the companies in each sector using:
companyList groupedBy: [ sector ] .
do: [ name print ; groupList count printNL ;
      groupList
      do: [                    # for each company in sector
           displayInfo ;       # print standard company info
          ] ;                  # end of groupList do: []
    ];                         # end of sector do: []
In this example, the summary information is displayed about the sector followed by an entry for each element in the sector.
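The relationship between the list of groups and each group's groupList can be modeled as follows. This Python sketch is an analogy only, not Vision code: grouped_by, the value and group_list keys, and the sample companies are hypothetical, and the order in which Vision returns the groups is not assumed here.

```python
# Python analogy for groupedBy: (not Vision code). Sample data is hypothetical.
companies = [
    {"name": "Acme", "sector": "Industrial"},
    {"name": "Bolt", "sector": "Technology"},
    {"name": "Crane", "sector": "Industrial"},
]


def grouped_by(collection, block):
    """Return one entry per distinct block value, each carrying its group_list."""
    groups = {}
    for element in collection:
        groups.setdefault(block(element), []).append(element)
    return [{"value": value, "group_list": members} for value, members in groups.items()]


sectors = grouped_by(companies, lambda c: c["sector"])
for sector in sectors:
    # Summary line for the group, then one line per member of the group.
    print(sector["value"], len(sector["group_list"]))
    for company in sector["group_list"]:
        print("   ", company["name"])

counts = {s["value"]: len(s["group_list"]) for s in sectors}
```

Every original company appears in exactly one group_list, so no detail is lost by grouping.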
Pictorially the groupedBy: message works as follows:
Grouping introduces a concept known as nested lists. In this case, the outer list represents a list of sectors. Each element in this outer list in turn responds to the message groupList which returns the set of elements from the original list that are in the current group. Since this subset is itself a list, you can perform any list operations on it, including grouping it further. For example:
companyList groupedBy: [ sector ] .                  # group by sector
do: [ "SECTOR: " print; name printNL;                # print sector's name
      groupList groupedBy: [ industry ] .            # group companies in sector
      do: [                                          # by industry
           "INDUSTRY: " print; name printNL;         # print industry's name
           groupList                                 # for each company
           do: [                                     # in industry
                "Company: " print ; name printNL ;   # print company's name
               ] ;                                   # end of companies in ind.
          ] ;                                        # end of industries in sector
    ] ;                                              # end of sectors
This report groups companyList into sectors. The companies within a sector are then grouped into industries. The companies are listed under their corresponding industry. This example uses the groupedBy: message twice; once to group the original list into sectors; the second time to group the companies in a given sector into industries.
Note: The message groupedByString: is used to correctly group lists using a criteria block that generates a String value. By default, strings that are distinct objects will be grouped into separate groups even if they have the same content. This version of the groupedBy: message groups strings with the same content into the same group, even if the initial strings are distinct objects.
A number of additional variations of the groupedBy: message have been defined and are summarized below:
Message | Definition |
groupedBy:in: | Groups recipient using parameter1 and returns List containing one element for each value in the List supplied as parameter2 |
groupedBy:intersect: | Groups recipient using parameter1 and returns List containing one element for each value that appears in both recipient and parameter2 |
groupedBy:union: | Groups recipient using parameter1 and returns List containing one element for each value that appears in either recipient or parameter2 |
groupedBy:usingCutoffs: | Groups recipient using parameter1 into partitions based on List of Numbers supplied in parameter2 |
groupedByCriteria: | Groups recipient based on List of Blocks supplied as parameter, returning a List extended by keyList as well as groupList |
groupPrintUsing: | Groups recipient using parameter, displaying the count for each group |
mgroupedBy: | Groups recipient using parameter1 which can return a List of objects. Elements in the recipient may be included in one or more groupLists. |
Collection Computation Messages
A number of messages have been defined to compute summary statistics for a Collection. For example:
companyList average: [ sales ]
computes the average sales value for the companies in companyList.
The following table summarizes the various statistical messages that have been defined. Note that most of these messages are defined in Vision and can therefore be copied or modified to create other variations as needed. These messages can be sent to an instance of any Collection subclass and return a numeric value or NA. Parameters are Blocks except where explicitly noted.
Message | Definition | Sample |
average | Simple average | 5 sequence average |
average: | Simple average based on value returned by parameter | list average: [ sales ] |
average:withWeights: | Weighted average using parameter2 to weight the value returned by parameter1 | list average: [sales] withWeights: [mktCap] |
compound | Compounded value | 5 sequence compound |
compound: | Compounded value based on value returned by parameter | list compound: [sales] |
correlate:with: | Correlation based on evaluation of blocks supplied as two parameters | list correlate: [sales] with: [profit] |
gMean | Geometric Mean | 5 sequence gMean |
gMean: | Geometric mean based on value returned by parameter | list gMean: [sales] |
harmonicMean | Harmonic Mean | 5 sequence harmonicMean |
harmonicMean: | Harmonic mean based on value returned by parameter | list harmonicMean: [sales] |
harmonicMean:withWeights: | Harmonic mean using parameter2 to weight the value returned by parameter1 | list harmonicMean: [sales] withWeights: [mktCap] |
max | Maximum value in recipient | 5 sequence max |
max: | Maximum based on value returned by parameter | list max: [sales] |
median | Median value in recipient | 5 sequence median |
median: | Median based on value returned by parameter | list median: [sales] |
min | Minimum value in recipient | 5 sequence min |
min: | Minimum based on value returned by parameter | list min: [sales] |
mode | Mode value in recipient | 5 sequence mode |
mode: | Mode based on value returned by parameter | list mode: [sales] |
product | Product of values in recipient | 5 sequence product |
product: | Product of values returned by parameter | list product: [sales] |
rankCorrelate:with: | Correlation between relative ranks of values returned by the two parameters | list rankCorrelate: [sales] with: [profit] |
regress: | Performs linear regression between recipient and parameter, returning object that responds to beta, alpha, pearson, rsq, and stdErr. | (2,3,9,1,8,7,5) regress: (6,5,11,7,5,4,4) . |
stdDev | Standard deviation of values in recipient | 5 sequence stdDev |
stdDev: | Standard deviation based on value returned by parameter | list stdDev: [sales] |
total | Total value in recipient | 5 sequence total |
total: | Total based on value returned by parameter | list total: [sales] |
Tiles, Running Totals, and Other Intra-List Messages
The decileUp: and decileDown: messages have been implemented to assign a value from 1 to 10 to each element in the recipient Collection. The decileUp: message assigns the value 1 to the 10% of the elements with the lowest values, the value 2 to the next 10%, and so on, with the value 10 assigned to the elements in the top 10%. The decileDown: message assigns the value 1 to the 10% of the elements with the highest values and the value 10 to the elements in the bottom 10%. When you send one of these messages to a Collection, Vision returns the recipient extended by the variable named decile. For example:
companyList decileDown: [ sales ] .
do: [ name print: 30 ; sales print: 10 ; decile printNL ; ] ;
displays the decile value from 1 to 10 for each company in companyList.
The quintileUp: and quintileDown: messages are identical to the decile messages except they assign a value from 1 to 5, returned in a variable named quintile. The percentileUp: and percentileDown: messages are identical to the decile messages except they assign a value from 1 to 100, returned in a variable named percentile.
The messages tileUp:tiles: and tileDown:tiles: can be used to provide an arbitrary number of tiles. The extended variable is named tile. For example, to group a company list into fractiles (20 groups), you could use:
companyList tileUp: [ sales ] tiles: 20 .
do: [ name print: 30 ;
      sales print: 10 ;
      tile printNL ;          #- number from 1-20
    ] ;
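One way to picture tile assignment is to rank the elements and cut the ranking into equal-sized bands. The Python sketch below is an illustration under stated assumptions, not Vision code and not Vision's actual algorithm: tile_up is a hypothetical helper, and the way Vision treats ties, NA values, and collections that do not divide evenly is not described in this document.

```python
# Python sketch of tile assignment (not Vision code).
# Assumption: elements are ranked ascending and split into equal-sized bands.
# Ties, NA values, and uneven band sizes may be handled differently by Vision.

def tile_up(values, tiles):
    """Return a tile number from 1..tiles for each value, in the original order."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # positions, lowest value first
    result = [0] * n
    for rank, position in enumerate(order):
        result[position] = rank * tiles // n + 1
    return result


sales = [50, 10, 40, 20, 30, 100, 90, 60, 80, 70]
quintiles = tile_up(sales, 5)
print(quintiles)
```

As with the Vision messages, the result keeps the original element order; only the tile number reflects the ranking.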
If you want to define a different name for the extended variable representing the tile, you can use the messages decileUp:using:, decileDown:using:, quintileUp:using:, quintileDown:using:, percentileUp:using:, percentileDown:using:, tileUp:using:tiles:, and tileDown:using:tiles:. For example:
companyList
   decileUp: [ sales ] using: "d1" .
   decileUp: [ profit ] using: "d2" .
do: [ name print: 30 ;
      d1 print ;              # sales decile
      d2 printNL ;            # profit decile
    ] ;
These various "tiling" messages have all been defined in Vision using the messages tileUp:usingCollector:tiles: and tileDown:usingCollector:tiles:. The first parameter is the block to be evaluated. The second parameter defines the name of the extended variable. The third parameter indicates the number of "tiles" to create. For example, to define the message fractileUp:, use:
Collection defineMethod:
[ | fractileUp: block |
  ^self tileUp: block usingCollector: [ |:fractile| ^current ] tiles: 20
] ;
The message runningTotal: returns the recipient Collection extended by the variable runningTotal which represents the running sum of the elements in the collection to that point. For example:
20 sequence runningTotal: [ ^self ] .
do: [ print ; runningTotal printNL ; ];
The message runningAverage: returns the recipient Collection extended by the variable runningAverage which represents the running average of the elements in the collection to that point.
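The values held by runningTotal and runningAverage can be worked out by hand for a short sequence. The Python sketch below is an analogy only, not Vision code; it uses the first five integers in place of a Vision sequence.

```python
# Python analogy for runningTotal: and runningAverage: (not Vision code).
from itertools import accumulate

values = list(range(1, 6))  # stands in for "5 sequence"

# Running sum of the elements up to and including each position.
running_total = list(accumulate(values))

# Running average: the running sum divided by the number of elements so far.
running_average = [total / (i + 1) for i, total in enumerate(running_total)]

for value, total, average in zip(values, running_total, running_average):
    print(value, total, average)
```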
Several additional Collection messages return the recipient collection extended by a new variable. The message normalize: returns the recipient extended by the variable norm, representing the normalized value of the element relative to the mean and standard deviation of the collection. The message weightedDecile: returns the recipient extended by the variable decile, a number from 1 to 10. The message weightedQuintile: returns the recipient extended by the variable quintile, a number from 1 to 5.
Inter-Collection Messages
A number of messages have been defined to perform operations between two lists. The messages can be sent to an instance of any Collection subclass and return a List object unless otherwise noted.
Message | Definition | Sample |
+ | Add parameter to recipient (List or TimeSeries only); parameter can be a scalar number or a list of numbers | 5 sequence + 5 sequence |
- | Subtract parameter from recipient (List or TimeSeries only); parameter can be a scalar number or a list of numbers | 5 sequence - 3 |
* | Multiply recipient by parameter (List or TimeSeries only); parameter can be a scalar number or a list of numbers | 5 sequence * 5 sequence |
/ | Divide recipient by parameter (List or TimeSeries only); parameter can be a scalar number or a list of numbers | 5 sequence / 2 |
isEquivalentTo: | Returns Boolean indicating if recipient and parameter have equivalent content | 5 sequence isEquivalentTo: 1,2,3,4,5 . |
union: | Returns List of elements in either or both the recipient and parameter | 5 sequence union: 3 sequence . |
union:using: | Returns List of elements in either or both the recipient and parameter1 using block supplied as parameter2 to modify the elements before comparing for equality | list1 union: list2 using: [ asSelf ] . |
intersect: | Returns List of elements in both the recipient and parameter | 5 sequence intersect: 3 sequence . |
intersect:using: | Returns List of elements in both the recipient and parameter1 using block supplied as parameter2 to modify the elements before comparing for equality | list1 intersect: list2 using: [ asSelf ] . |
exclude: | Returns List of elements in recipient that are not in parameter | 5 sequence exclude: 3 sequence . |
exclude:using: | Returns List of elements in recipient that are not in parameter1 using block supplied as parameter2 to modify the elements before comparing for equality | list1 exclude: list2 using: [ asSelf ] . |
difference: | Returns a List of 2 elements: the first contains the List of elements that are in recipient and not in parameter; the second contains the List of elements that are in parameter and not in recipient | list1 difference: list2 . |
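The set-style messages in this table can be modeled with simple list filters. The Python sketch below is an illustration only, not Vision code: the function names mirror the message names, elements are assumed to be compared by equality, and results are assumed to keep the recipient's order. How Vision orders its results or treats duplicate elements is not stated in this document.

```python
# Python sketch of the set-style messages (not Vision code).
# Assumptions: equality comparison and recipient-order results; Vision's
# ordering and duplicate handling are not described in the source.

def intersect(recipient, parameter):
    """Elements found in both lists."""
    return [x for x in recipient if x in parameter]


def exclude(recipient, parameter):
    """Elements of recipient that are not in parameter."""
    return [x for x in recipient if x not in parameter]


def union(recipient, parameter):
    """Elements found in either list."""
    return list(recipient) + exclude(parameter, recipient)


def difference(recipient, parameter):
    """Two lists: only in recipient, then only in parameter."""
    return [exclude(recipient, parameter), exclude(parameter, recipient)]


five = [1, 2, 3, 4, 5]  # stands in for "5 sequence"
three = [1, 2, 3]       # stands in for "3 sequence"

print(union(five, three))
print(intersect(five, three))
print(exclude(five, three))
print(difference(five, three))
```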
Creating and Updating Collections
Many of the messages described in this document implicitly create new instances of one of the Collection subclasses. Messages such as send:, select:, groupedBy:, sortUp:, decileDown:, and extendBy: all create and return new Collection objects.
There is normally no reason to create new instances of the Collection class directly. You can explicitly create a new instance of any of the Collection subclasses by sending the new message to the specific class. For example:
!list <- List new ;
!ilist <- IndexedList new ;
!ts <- TimeSeries new ;
The variables list, ilist, and ts can be accessed and updated using the rules defined for the List, IndexedList, and TimeSeries, respectively.
The copyListElements message can be used to create a copy of a Collection. The returned object is a new Collection containing the same elements as the original. When this message is sent to a List or IndexedList, a new List is created and returned. When this message is sent to a TimeSeries, a new TimeSeries object is created and returned.
The append: message can be used to append objects to a Collection. The returned object is always a new List.
The collectListElementsFrom: message is used to create a single List from a Collection of Lists. The supplied parameter should be a Block. This message evaluates the block for each element and creates a "running list" of the objects returned by the block. If the returned object is a Collection, the individual elements in the collection are appended to the "running list". For example:
5 sequence collectListElementsFrom: [ ^self sequence ] .
do: [ printNL ] ;
returns a single List containing the elements 1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5.
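The flattening behavior can be checked with an equivalent computation. The Python sketch below is an analogy only, not Vision code; range(1, n + 1) stands in for sending sequence to the integer n.

```python
# Python analogy for collectListElementsFrom: (not Vision code).
outer = range(1, 6)  # stands in for "5 sequence"

# For each n, build the list 1..n and append its elements to one running list.
flattened = [x for n in outer for x in range(1, n + 1)]

print(flattened)
```

The result has 1 + 2 + 3 + 4 + 5 = 15 elements, in the same order as the list described above.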
Collection Messages and the . Operator
A common requirement when working with Collections is to apply a series of operations in succession. Starting with a list, you may wish to perform a series of extensions, selections, and groupings, then sort and print the result. There are a number of different techniques that can be used to produce the same result. For example:
!myList <- Currency instanceList ;
!xList <- myList extendBy: [ !rate <- 1 / usExchange ] ;
!subset1 <- xList select: [ rate < 1.5 ] ;
!subset2 <- subset1 first: 5 ;
!final <- subset2 sortUp: [ name ] ;
final do: [ code print: 5 ; name print: 15 ; rate printNL ; ] ;
This example uses a large number of temporary variables. Although there is nothing wrong with this approach, you need not save each of the intermediate steps. For example:
Currency instanceList
   extendBy: [ !rate <- 1 / usExchange ] .
   select: [ rate < 1.5 ] .
   first: 5 .
   sortUp: [ name ] .
do: [ code print: 5 ; name print: 15 ; rate printNL ; ] ;
This example produces the same result without any intermediate variables.
Note that the . operator is used to terminate keyword messages such as select: and first:. A common misconception is that this operator is automatically required after each collection operation. Although this particular example may look as though that is the case, the purpose of the operator is to terminate the parameters and evaluate the expression up until that point. For example, if you were using the unary message numberElements in the previous expression, it would not be followed by the . operator:
Currency instanceList numberElements             #- . not needed since no parameters
   extendBy: [ !rate <- 1 / usExchange ] .       #- . terminates block
   select: [ rate < 1.5 ] .                      #- . terminates block
   first: 5 .                                    #- . terminates count
   sortUp: [ name ] .                            #- . terminates block
do: [ code print: 5 ; name print: 15 ; rate printNL ; ] ;
The . operator provides an alternative to using parentheses in expressions containing numerous keywords applied in succession such as this one. The same expression using parentheses could be written as:
((((Currency instanceList numberElements
      extendBy: [ !rate <- 1 / usExchange ] )
      select: [ rate < 1.5 ] )
      first: 5 )
      sortUp: [ name ] )
do: [ code print: 5 ; name print: 15 ; rate printNL ; ] ;
When to Iterate
The do: message is defined to evaluate its block parameter for each element in the collection. Although this operation may appear to operate an element at a time, collection operations are actually optimized internally and do not execute by sequential evaluation.
Warning!! Because collections are not evaluated sequentially, you do not have control over the order in which the collection is processed nor can you assume that the number of evaluations is equal to the number of elements in the list. For example:
!count <- 0 ;
list do: [ ^global :count <- ^global count + 1 ] ;
count printNL ;
will not produce the results you expect. Although you might expect count to equal the number of elements in list, it would actually equal the number of internal passes needed to process the list.
The iterate: message can be used instead of do: when you want to evaluate the list in order, an element at a time. Although there are situations when you may need to do this, this message is much slower than the do: message since it does not benefit from the optimizations available. For example:
!count <- 0 ; list iterate: [ ^global :count <- ^global count + 1 ] ; count printNL ;
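The counting difference between do: and iterate: can be illustrated with a toy model. The Python sketch below is not Vision code and does not describe Vision's actual implementation: it simply assumes that do: evaluates the block once per internal pass over a batch of elements, while iterate: evaluates it once per element. The names do_batched, iterate, and BATCH_SIZE are hypothetical, and the batch size is chosen only to make the effect visible.

```python
# Toy Python model of do: versus iterate: (not Vision code, not Vision internals).
# Assumption: do: runs the block once per internal pass over a batch of elements;
# iterate: runs it once per element, in order. BATCH_SIZE is hypothetical.
BATCH_SIZE = 4


def do_batched(items, block):
    """Call block once per batch of elements."""
    for start in range(0, len(items), BATCH_SIZE):
        block(items[start:start + BATCH_SIZE])


def iterate(items, block):
    """Call block once per element, in order."""
    for item in items:
        block(item)


counter = {"do": 0, "iterate": 0}


def bump_do(_batch):
    counter["do"] += 1


def bump_iterate(_item):
    counter["iterate"] += 1


items = list(range(10))
do_batched(items, bump_do)
iterate(items, bump_iterate)

print(counter["do"], counter["iterate"])
```

In this model the counter updated under do_batched ends up equal to the number of passes rather than the number of elements, which is the effect the warning describes; only the element-at-a-time version counts every element.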