Thursday, February 9, 2012

Practical Garbage Collection

Parallel collection

The point of having a generational collector is to optimize for throughput; in other words, the total amount of work the application gets to do in a particular amount of time. As a side-effect, most of the pauses incurred due to garbage collection also become shorter. However, no attempt is made to eliminate the periodic full collections which will imply a pause time of whatever is necessary to complete a full collection.

The throughput collector does do one thing which is worth mentioning in order to mitigate this: It is parallel, meaning it uses multiple CPU cores simultaneously to speed up garbage collection. This does lead to shorter pause times, but there is a limit to how far you can go – even in an unrealistic perfect situation of a linear speed-up (meaning, double CPU count -> half collection time) you are limited by the number of CPU cores on your system. If you are collecting a 30 GB heap, that is going to take some significant time even if you do so with 16 parallel threads.

In garbage collection parlance, the word parallel is used to refer to a collector that does work on multiple CPU cores at the same time.

No comments: