SAVector: Vectored Systolic Arrays

Title
SAVector: Vectored Systolic Arrays
Authors
Choi, Sangun; Park, Seongjun; Jaeyong; Kim, Jongmin; Koo, Gunjae; Hong, Seokin; Yoon, Myung Kuk; Oh, Yunho
Ewha Authors
Yoon, Myung Kuk (윤명국)
SCOPUS Author ID
Yoon, Myung Kuk (윤명국)
Issue Date
2024
Journal Title
IEEE Access
ISSN
2169-3536
Citation
IEEE Access, vol. 12, pp. 44446-44461
Keywords
energy efficiency; Inference accelerator; on-chip buffer
Publisher
Institute of Electrical and Electronics Engineers Inc.
Indexed
SCIE; SCOPUS
Document Type
Article
Abstract
Conventional DNN inference accelerators are designed with a few (up to four) large systolic arrays. Because such a scale-up architecture often suffers from low utilization, a scale-out architecture has been proposed, in which a single accelerator has tens of pods and each pod has a small systolic array. While the scale-out architecture is promising, it still incurs increased off-chip memory accesses because the pods access duplicate tiles of tensors. Prior work has proposed shared buffer structures to address the problem, but those architectures suffer from performance degradation due to shared buffer access latency. We observe that all the pods access the same rows of inputs and weights within a short time window. Based on this observation, we propose a new inference accelerator architecture called Vectored Systolic Arrays (SAVector). SAVector consists of a new two-level on-chip buffer architecture and a tensor tile scheduling technique. In the new buffer architecture, global buffers are shared by all the pods and hold the rows the pods share, and each pod has a tiny dedicated buffer. SAVector monitors memory access behavior and determines when to prefetch data and when to flush it. In our evaluation, SAVector exhibits an off-chip memory access count very similar to that of the scale-up architecture and achieves a 52% energy-delay-product (EDP) reduction. SAVector also achieves a 27% EDP reduction over prior work by mitigating the performance degradation caused by global buffer access latency. © 2013 IEEE.
DOI
10.1109/ACCESS.2024.3380433
Appears in Collections:
College of Artificial Intelligence > Department of Computer Science and Engineering > Journal papers