TEA-RC: Thread Context-Aware Register Cache for GPUs
- Title
- TEA-RC: Thread Context-Aware Register Cache for GPUs
- Authors
- Jeong, Ipoom; Oh, Yunho; Ro, Won Woo; Yoon, Myung Kuk
- Ewha Authors
- Yoon, Myung Kuk
- SCOPUS Author ID
- Yoon, Myung Kuk
- Issue Date
- 2022
- Journal Title
- IEEE ACCESS
- ISSN
- 2169-3536
- Citation
- IEEE ACCESS vol. 10, pp. 82049 - 82062
- Keywords
- Registers; Instruction sets; Graphics processing units; Kernel; Random access memory; Nonvolatile memory; Message systems; register file; register cache; volatile memory; non-volatile memory; hybrid register file; hierarchical register file
- Publisher
- IEEE (Institute of Electrical and Electronics Engineers, Inc.)
- Indexed
- SCIE; SCOPUS
- Document Type
- Article
- Abstract
- Graphics processing units (GPUs) achieve high throughput by exploiting a high degree of thread-level parallelism (TLP). To support such high TLP, GPUs have a large register file that stores the context of all threads and consumes around 20% of total GPU energy. Several previous studies have attempted to reduce the energy consumption of the register file by implementing it with emerging non-volatile memory (NVM), leveraging NVM's higher density and lower leakage power compared with SRAM. To amortize the long access latency of NVM, prior work adopts a hierarchical register file consisting of an SRAM-based register cache and NVM-based registers, in which the register cache works as a write buffer. To compute the register cache index, these designs use a subset of the bits of the warp ID and the register ID. This work observes that such an index calculation causes three types of contention that lead to underutilization of the register cache: inter-warp, intra-warp, and false contention. To minimize such contention, this paper proposes a thread context-aware register cache (TEA-RC) for GPUs. In TEA-RC, the cache index is calculated by considering the high correlation between the number of scheduled threads and the register usage of threads. The proposed design achieves 28.5% higher performance and 9.1 percentage points lower energy consumption than a conventional register cache that concatenates three bits of the warp ID and five bits of the register ID to compute the cache index.
- DOI
- 10.1109/ACCESS.2022.3196149
- Appears in Collections:
- College of Artificial Intelligence > Department of Computer Engineering > Journal papers
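For illustration, the baseline indexing that the abstract compares against (three bits of the warp ID concatenated with five bits of the register ID) can be sketched in Python as below. This is a minimal sketch, not the authors' code: it assumes the low-order bits of each ID are the ones selected, treats the result as a plain 8-bit index (256 values), and the names `baseline_index`, `indices_touched`, `WARP_BITS`, and `REG_BITS` are hypothetical. It does not model TEA-RC's own index function, which the abstract does not specify.

```python
# Hypothetical sketch of the baseline register-cache indexing described in the
# abstract: concatenate 3 bits of the warp ID with 5 bits of the register ID.
# ASSUMPTION: the low-order bits of each ID are the ones selected.

WARP_BITS = 3  # bits taken from the warp ID
REG_BITS = 5   # bits taken from the register ID
NUM_ENTRIES = 1 << (WARP_BITS + REG_BITS)  # 2**8 = 256 possible indices


def baseline_index(warp_id: int, reg_id: int) -> int:
    """Return the cache index formed as {warp_id[2:0], reg_id[4:0]}."""
    warp_part = warp_id & ((1 << WARP_BITS) - 1)  # keep low 3 bits of warp ID
    reg_part = reg_id & ((1 << REG_BITS) - 1)     # keep low 5 bits of register ID
    return (warp_part << REG_BITS) | reg_part


def indices_touched(num_warps: int, regs_per_thread: int) -> int:
    """Count distinct indices used when every warp accesses every register."""
    return len({baseline_index(w, r)
                for w in range(num_warps)
                for r in range(regs_per_thread)})


if __name__ == "__main__":
    # Warps whose IDs differ only above bit 2 map to the same index (warp 0 vs 8).
    print(baseline_index(0, 3), baseline_index(8, 3))   # 3 3
    # Registers of one warp whose IDs differ only above bit 4 also collide.
    print(baseline_index(1, 2), baseline_index(1, 34))  # 34 34
    # With 8 registers per thread, at most 8 * 8 = 64 of 256 indices are reachable.
    print(indices_touched(num_warps=48, regs_per_thread=8), NUM_ENTRIES)  # 64 256
```

The first two outputs show index aliasing between different warps and within a single warp, collisions of the kind the abstract appears to describe as inter-warp and intra-warp contention. The last shows how a kernel with low register usage leaves most indices unused under this fixed bit selection, which is one way a register cache indexed this way can end up underutilized.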