J. Semicond. > Volume 41 > Issue 2 > Article Number: 022402

HRM: H-tree based reconfiguration mechanism in reconfigurable homogeneous PE array

Junyong Deng 1, Lin Jiang 2, Yun Zhu 1, Xiaoyan Xie 3, Xinchuang Liu 1, Feilong He 3, Shuang Song 4, and L. K. John 4


Abstract: Reconfigurable architectures have become an effective approach in academia and industry for accommodating algorithms with differing performance requirements within a specific application while improving power efficiency. However, existing architectures suffer from performance bottlenecks caused by slow context updates and inadequate flexibility. This paper presents an H-tree based reconfiguration mechanism (HRM) that uses Huffman-coding-like addressing and mask addressing in a homogeneous processing element (PE) array and supports both programmable and data-driven modes. The proposed HRM can transfer reconfiguration instructions/contexts to a particular PE, or to a group of associated PEs simultaneously, in one clock cycle in unicast, multicast, and broadcast modes, and it can shut down unneeded PEs according to the current configuration. To verify its correctness and efficiency, we implemented it through RTL synthesis and an FPGA prototype. Compared with prior works, the experimental results show that the HRM improves the operating frequency by an average of 23.4%, increases the updating speed by 2×, and reduces the area by 36.9%. HRM can also power off unneeded PEs, which reduces dynamic power dissipation by 51% in certain application configurations. Furthermore, in the data-driven mode the system frequency reaches 214 MHz, which is 1.68× higher than in the programmable mode.

Key words: H-tree based reconfiguration mechanism (HRM), Huffman-coding-like addressing, programmable mode, data-driven mode, homogeneous PE array
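The abstract states that one address scheme covers unicast, multicast, and broadcast delivery, but it does not give the encoding. The sketch below illustrates one way mask addressing over tree-path addresses could select PEs. Everything in it is an assumption made for illustration and is not taken from the paper: the 16-PE array size, the 2-bits-per-level path code, the match rule, and the names `pe_addresses` and `selected_pes`.

```python
# Illustrative sketch only: the address width, bit layout, and match rule
# below are assumptions, not the encoding defined in the paper.

LEVELS = 2            # assumed H-tree depth: 2 levels of 4-way branching -> 16 PEs
BITS_PER_LEVEL = 2    # assumed: 2 bits select one of four branches per level
ADDR_BITS = LEVELS * BITS_PER_LEVEL


def pe_addresses():
    """Return every leaf (PE) address as an integer root-to-leaf path code."""
    return list(range(1 << ADDR_BITS))


def selected_pes(addr, mask):
    """Return the PEs that accept a packet: those whose address equals
    `addr` on every bit position where `mask` is 1."""
    return [pe for pe in pe_addresses() if (pe & mask) == (addr & mask)]


unicast = selected_pes(0b0110, 0b1111)    # all bits compared: one PE
multicast = selected_pes(0b0100, 0b1100)  # only the top-level branch compared: one quadrant
broadcast = selected_pes(0b0000, 0b0000)  # no bits compared: every PE
```

Under these assumptions a single (address, mask) pair is enough to name one PE, one subtree, or the whole array, which is the property the three delivery modes rely on.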






J Y Deng, L Jiang, Y Zhu, X Y Xie, X C Liu, F L He, S Song, L K John, HRM: H-tree based reconfiguration mechanism in reconfigurable homogeneous PE array[J]. J. Semicond., 2020, 41(2): 022402. doi: 10.1088/1674-4926/41/2/022402.


History

Manuscript received: 27 October 2019
Manuscript revised: 13 December 2019
Online: Accepted Manuscript: 06 January 2020
Uncorrected proof: 07 January 2020
Published: 11 February 2020
