TY - GEN
T1 - Efficient MRC construction with SHArdS
AU - Waldspurger, Carl A.
AU - Park, Nohhyun
AU - Garthwaite, Alexander
AU - Ahmad, Irfan
PY - 2015/1/1
Y1 - 2015/1/1
N2 - Reuse-distance analysis is a powerful technique for characterizing temporal locality of workloads, often visualized with miss ratio curves (MRCs). Unfortunately, even the most efficient exact implementations are too heavyweight for practical online use in production systems. We introduce a new approximation algorithm that employs uniform randomized spatial sampling, implemented by tracking references to representative locations selected dynamically based on their hash values. A further refinement runs in constant space by lowering the sampling rate adaptively. Our approach, called SHARDS (Spatially Hashed Approximate Reuse Distance Sampling), drastically reduces the space and time requirements of reuse-distance analysis, making continuous, online MRC generation practical to embed into production firmware or system software. SHARDS also enables the analysis of long traces that, due to memory constraints, were resistant to such analysis in the past. We evaluate SHARDS using trace data collected from a commercial I/O caching analytics service. MRCs generated for more than a hundred traces demonstrate high accuracy with very low resource usage. MRCs constructed in a bounded 1 MB footprint, with effective sampling rates significantly lower than 1%, exhibit approximate miss ratio errors averaging less than 0.01. For large traces, this configuration reduces memory usage by a factor of up to 10,800 and run time by a factor of up to 204.
AB - Reuse-distance analysis is a powerful technique for characterizing temporal locality of workloads, often visualized with miss ratio curves (MRCs). Unfortunately, even the most efficient exact implementations are too heavyweight for practical online use in production systems. We introduce a new approximation algorithm that employs uniform randomized spatial sampling, implemented by tracking references to representative locations selected dynamically based on their hash values. A further refinement runs in constant space by lowering the sampling rate adaptively. Our approach, called SHARDS (Spatially Hashed Approximate Reuse Distance Sampling), drastically reduces the space and time requirements of reuse-distance analysis, making continuous, online MRC generation practical to embed into production firmware or system software. SHARDS also enables the analysis of long traces that, due to memory constraints, were resistant to such analysis in the past. We evaluate SHARDS using trace data collected from a commercial I/O caching analytics service. MRCs generated for more than a hundred traces demonstrate high accuracy with very low resource usage. MRCs constructed in a bounded 1 MB footprint, with effective sampling rates significantly lower than 1%, exhibit approximate miss ratio errors averaging less than 0.01. For large traces, this configuration reduces memory usage by a factor of up to 10,800 and run time by a factor of up to 204.
UR - http://www.scopus.com/inward/record.url?scp=85072870932&partnerID=8YFLogxK
M3 - Conference contribution
T3 - Proceedings of the 13th USENIX Conference on File and Storage Technologies, FAST 2015
SP - 95
EP - 110
BT - Proceedings of the 13th USENIX Conference on File and Storage Technologies, FAST 2015
PB - USENIX Association
T2 - 13th USENIX Conference on File and Storage Technologies, FAST 2015
Y2 - 16 February 2015 through 19 February 2015
ER -