

Performance Improvements by Deploying L2 Prefetchers with Helper Thread for Pointer-Chasing Applications

Volume 14, Number 10, October 2018, pp. 2312-2320
DOI: 10.23940/ijpe.18.10.p7.23122320

Yan Huang (a), Huidong Zhu (b), and Yuhua Li (a)

(a) College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou, 450003, China
(b) College of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou, 450003, China

(Submitted on July 10, 2018; Revised on August 12, 2018; Accepted on September 11, 2018)


Modern processor micro-architectures offer advanced prefetch mechanisms designed to hide memory latency and improve application performance. However, pointer-chasing applications that employ linked data structures expose a memory latency problem that hardware prefetchers handle poorly. Helper-threaded prefetching on a Chip Multiprocessor (CMP) is a promising technique for reducing the latency of accesses to linked data structures. In this paper, we first describe two L2 prefetchers on a Chip Multiprocessor and two different helper-threaded prefetching techniques for pointer-chasing applications. We then reveal the limitations of the L2 prefetchers for pointer-intensive applications after the two threaded prefetching techniques are applied. Finally, we optimize the deployment of the L2 prefetchers together with the two threaded prefetching techniques for pointer-chasing applications. The experimental results indicate that the effectiveness of L2 prefetchers on helper threads depends on the memory access pattern of the targeted application, and that the optimized deployment of L2 prefetchers further improves the performance of pointer-intensive applications.



