Debugging Malloc Errors
Build a reproducer
First, to reproduce the error, we had to find the limit datasize
from the job that malloc-ed and then run it in a session connected to the appropriate version of the database.
It is important to note the run against an old version of the database because the update had subsequently completed; building new holdings takes more memory than overwriting holdings already in the database. It will also take at least some memory to profile the database.
% limit datasize 4000000
% batchvision -v 5099805
V> 20091016 evaluate: [ IVR Holdings doitFor: 20091016 asDate; ];
V> ?G
** MALLOC
% limit datasize 5000000
% batchvision -v 5099805
V> 20091016 evaluate: [ IVR Holdings doitFor: 20091016 asDate; ];
V> ?G
V> "starting investigation" printNL;
Run your reproducer in a “bigger” session
Then you can see above that you simply bump the limit datasize
in another session with your reproducer. This results in a running session where you can look at the structures that your update has modified!
Survey the big structures
This query quickly shows you how much memory the big containers are using:
^tmp define: 'bigContainerCount' toBe: 10;
!bigContainers <- AdminTools objectSpaceList
extendBy: [ !bc; !os <- ^self; ] .
iterate: [
:bc <- os containerList
select: [ (containerAddrType > 0) ] .
sortDown: [ containerSize ] .
first: ^tmp bigContainerCount ;
] .
collectListElementsFrom: [ bc ] .
reject: [ containerSize isNA ] . # optional but usually helpful
sortDown: [ containerSize ] .
first: ^tmp bigContainerCount;
bigContainers do: [
displayContainerInfo;
] .
total: [ containerSize ] . printWithCommasNL: 20;
TIP: In large databases you may want to do this query only on the object spaces that you know are impacted by the load. Iterating through each object spaces containerList can take a very long time and use memory in a session that won’t be reclaimed
Which looks like:
7 2073 1 Double 599384068 309834 32 3
10 300 1 Pointer 431030356 346407 32 3
7 2062 1 Pointer 322948304 309833 32 3
10 12817 1 Double 179353108 346411 32 3
7 2060 1 Property 139319916 309831 32 13
10 11997 1 Pointer 102939952 345912 32 3
10 12027 1 List 102914580 345918 32 3
10 2780 1 Pointer 98961276 345311 32 3
10 2794 1 Pointer 98961276 344594 32 3
10 2464 1 Pointer 62721356 346409 32 3
2,138,534,192.00
V> AdminTools displayMemoryStats;
V> ?g
Current allocation level: 3,353,062,696
Session maximum: 3,874,077,748
Heap limit: 10,240,000,000
Total size of mapped segments683,359,575,500
The top 10 segments account for 2.1GB of the 3.8GB needed to run this update. But what are they?
( AdminTools displayMemoryStats
missing? )
Which business objects do the containers belong to?
This query will give you tips as to what properties the biggest containers belong to:
bigContainers first: 4 . iterate: [
"=" fill: 60 . printNL;
"Container: " print;
^self displayContainerInfo;
newLine print;
^self displayLocalVisionReferents;
newLine print;
];
============================================================
Container: 7 2073 1 Double 599384068 309834 32 3
ConstituentData [ 7: 2089] '_shares'
TimeSeries ID: [ 7: 2063]
Number of TimeSeries in Store: 7619556
Number of TimePoints in Store: 5276
TimeSeries LStore profile:
LStore ID: [ 7: 2061]
Number of Lists in Store: 7619556
LStore Content profile:
Vector ID: [ 7: 2060]
Number of elements: 80737066
Number of usegments: 2
Number needing alignment: 0
Number of transitions: 11609984
============================================================
Container: 10 300 1 Pointer 431030356 346407 32 3
Account [ 10: 955] 'holdingsSeries'
TimeSeries ID: [ 10: 303]
Number of TimeSeries in Store: 54352
Number of TimePoints in Store: 5319
TimeSeries LStore profile:
LStore ID: [ 10: 302]
Number of Lists in Store: 54352
LStore Content profile:
Vector ID: [ 10: 25]
Number of elements: 107757579
Number of usegments: 5
Number needing alignment: 0
Number of transitions: 345859
============================================================
Container: 7 2062 1 Pointer 322948304 309833 32 3
ConstituentData [ 7: 2089] '_shares'
TimeSeries ID: [ 7: 2063]
Number of TimeSeries in Store: 7619556
Number of TimePoints in Store: 5276
TimeSeries LStore profile:
LStore ID: [ 7: 2061]
Number of Lists in Store: 7619556
LStore Content profile:
Vector ID: [ 7: 2060]
Number of elements: 80737066
Number of usegments: 2
Number needing alignment: 0
Number of transitions: 11609984
============================================================
Container: 10 12817 1 Double 179353108 346411 32 3
Account [ 10: 955] '_totalMarketValue'
TimeSeries ID: [ 10: 650]
Number of TimeSeries in Store: 54352
Number of TimePoints in Store: 5318
Size of Cartesian Space: 289043936
Number of MapTransitions: 69671
TimeSeries LStore profile:
LStore ID: [ 10: 648]
Number of Lists in Store: 54352
LStore Content profile:
Vector ID: [ 10: 647]
Number of elements: 107703574
Number of usegments: 20
Number needing alignment: 18
Number of transitions: 154030
The _totalMarketValue
, _shares
and holdingsSeries
taking up most of the memory from a holdings load is pretty much what one would expect. But an update was unexpectedly using too much memory these techniques would be the exact techniques would use to begin to figure out why.
Memory Usage Nuances
Container Table Traversal
Container Table traversal uses up memory that can’t be reclaimed by a session. This query demonstrates this phenomenon effectively:
"Before:" printNL;
AdminTools displayMemoryStats;
AdminTools objectSpaceList iterate: [ containerList ];
newLine print;
"After Traversal:" printNL;
AdminTools displayMemoryStats;
Utility collectSessionGarbage;
newLine print;
"After collectSessionGarbage:" printNL;
AdminTools displayMemoryStats;
You can see that the 427MB used by the traversal can’t be reclaimed by a collectSessionGarbage
:
Current allocation level: 80,892,443
Session maximum: 80,906,379
Heap limit: 10,240,000,000
Total size of mapped segments 340,008,248
After Traversal:
Current allocation level: 427,029,603
Session maximum: 471,514,023
Heap limit: 10,240,000,000
Total size of mapped segments356,799,206,488
After collectSessionGarbage:
Current allocation level: 427,029,603
Session maximum: 471,514,023
Heap limit: 10,240,000,000
Total size of mapped segments356,825,094,992
Entity Creation memory spikes
Often there are many large properties off of key Entities such as Security
or Account
. Creating a single entity here can often take much more memory than one might expect:
"Before:" printNL;
AdminTools displayMemoryStats;
Security createInstance
newLine print;
"After:" printNL;
AdminTools displayMemoryStats;
Creating a single security instance takes 500MB of memory
Before:
Current allocation level: 2,641,712
Session maximum: 2,659,968
Heap limit: 2,684,354,560
Total size of mapped segments 18,833,024
After:
Current allocation level: 482,271,800
Session maximum: 482,273,068
Heap limit: 2,684,354,560
Total size of mapped segments 784,757,728
AdminTools displayMemoryStats
In case you don’t have this method in your database:
AdminTools respondsTo: 'displayMemoryStats' . ifFalse: [
AdminTools define:'sessionAllocationHighWaterMark' toBePrimitive: 600 withControlValue: 1;
AdminTools define:'sessionAllocationResourceLimit' toBePrimitive: 600 withControlValue: 2;
AdminTools define:'sessionMappedAddressSpaceLevel' toBePrimitive: 600 withControlValue: 3;
AdminTools defineMethod: [|displayMemoryStats|
" Current allocation level: " print;
totalNetworkAllocation printWithCommasNL: 15.000000;
" Session maximum: " print;
sessionAllocationHighWaterMark printWithCommasNL: 15.000000;
" Heap limit: " print;
sessionAllocationResourceLimit printWithCommasNL: 15.000000;
" Total size of mapped segments" print;
sessionMappedAddressSpaceLevel printWithCommasNL: 15.000000;
];
];