Load Stall Minimization

A common pattern in profiles of many Java programs is a large amount of time spent on object indirection stalls. The first dereference of an object pointer usually comes shortly after the reference itself is loaded, which leaves little time to hide the latency of a cache miss on the object. Placing pre-fetches could help reduce the penalty, but a more useful optimization would combine three pieces: a way of identifying which indirections are likely to be costly, pre-fetch placement, and code motion that tries to produce an optimal delay between pre-fetch and indirection. Another way of dealing with this problem would be to change the garbage collector (GC) and how it arranges objects, i.e. which objects are placed adjacent to one another and how best to group objects on the same cache line. A combination of these approaches could also be beneficial.
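The load-then-dereference pattern described above can be illustrated with a small Java sketch. Everything here (the class IndirectionDemo, the Node type, the array size) is hypothetical and chosen only for illustration. It assumes that objects allocated back to back tend to sit near each other in the heap, which a collector is free to change, and the timing in main is a rough indication rather than a rigorous benchmark.

```java
import java.util.Random;

public final class IndirectionDemo {
    /** Small heap object; reading `value` requires dereferencing the reference. */
    static final class Node {
        final int value;
        Node(int value) { this.value = value; }
    }

    /**
     * Allocates n nodes with values 0..n-1. With shuffle == true the array is
     * permuted (Fisher-Yates, fixed seed) so that array order no longer
     * matches allocation order.
     */
    static Node[] build(int n, boolean shuffle) {
        Node[] nodes = new Node[n];
        for (int i = 0; i < n; i++) {
            nodes[i] = new Node(i);
        }
        if (shuffle) {
            Random rnd = new Random(42);
            for (int i = n - 1; i > 0; i--) {
                int j = rnd.nextInt(i + 1);
                Node tmp = nodes[i];
                nodes[i] = nodes[j];
                nodes[j] = tmp;
            }
        }
        return nodes;
    }

    /** Loads each reference and dereferences it immediately. */
    static long sumValues(Node[] nodes) {
        long sum = 0;
        for (Node n : nodes) {   // load the reference from the array
            sum += n.value;      // first dereference follows right away
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1 << 21; // assumed large enough to exceed typical cache sizes
        for (boolean shuffle : new boolean[] {false, true}) {
            Node[] nodes = build(n, shuffle);
            long start = System.nanoTime();
            long sum = sumValues(nodes);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println((shuffle ? "shuffled" : "in order")
                    + ": sum=" + sum + ", ~" + micros + " us");
        }
    }
}
```

Both traversals do the same work and return the same sum; only the order in which objects are visited relative to their likely heap placement differs. Any gap between the two timings is the kind of cost that pre-fetch placement or GC-driven object arrangement would aim to reduce, though a harness such as JMH would be needed to measure it reliably.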