Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci. 2025, Vol. 19, Issue (1): 191103. https://doi.org/10.1007/s11704-023-3239-x
Architecture
RVAM16: a low-cost multiple-ISA processor based on RISC-V and ARM Thumb
Libo HUANG, Jing ZHANG, Ling YANG, Sheng MA, Yongwen WANG, Yuanhu CHENG
College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
Abstract

The rapid development of instruction set architectures (ISAs) has brought software compatibility to the forefront in the embedded field. One promising solution is a multiple-ISA processor that supports several different ISAs. However, because of cost and performance constraints, the microarchitecture of a multiple-ISA processor must be carefully optimized to meet the specific requirements of embedded systems. By exploring the RISC-V and ARM Thumb ISAs, this paper proposes RVAM16, an optimized multiple-ISA processor microarchitecture for embedded devices based on hardware binary translation (HBT). The results show that, when running non-native ARM Thumb programs, RVAM16 achieves a speedup of over 2.73× with less area and energy consumption than hardware binary translation alone, reaching more than 70% of the performance of native RISC-V programs.

Keywords: multiple-ISA processor, architecture, binary translation, RISC-V, embedded
Corresponding Author(s): Yuanhu CHENG   
Just Accepted Date: 31 October 2023   Issue Date: 18 March 2024
 Cite this article:   
Libo HUANG, Jing ZHANG, Ling YANG, et al. RVAM16: a low-cost multiple-ISA processor based on RISC-V and ARM Thumb[J]. Front. Comput. Sci., 2025, 19(1): 191103.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-023-3239-x
https://academic.hep.com.cn/fcs/EN/Y2025/V19/I1/191103
Technique | Pros | Cons
Multi-core | Independent performance for different ISAs | High hardware cost
Multi-decoder | Same performance for different ISAs | Complex pipeline
HBT | Low hardware cost | Lower performance for non-native ISA
Tab.1  The features of different multiple-ISA processor techniques
Fig.1  Multiple-ISA processor microarchitecture based on binary translation
Fig.2  Microarchitecture of RVAM16 processor core
ARM Thumb register (encoding) | RISC-V register (encoding)
R0–R7 (000–111) | R16–R23 (10000–10111)
R8–R12 (1000–1100) | R24–R28 (11000–11100)
SP (1101) | R29 (11101)
LR (1110) | R30 (11110)
PC (1111) | PC/R31 (11111)
Tab.2  Register mapping from ARM Thumb to RISC-V
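Tab.2 follows a simple rule: each ARM Thumb register number (R0–R15, with SP, LR and PC as R13–R15) maps to the RISC-V register whose 5-bit encoding is the ARM encoding with a leading 1, so the RISC-V index is 16 plus the ARM index. The Python sketch below encodes that rule as read off the table; the function names are hypothetical, and it does not model the translator hardware of Fig.4. For 16-bit Thumb encodings with a 3-bit low-register field (R0–R7), the field is assumed to be zero-extended to 4 bits first.

```python
# Sketch of the ARM Thumb -> RISC-V register mapping in Tab.2.
# Assumption (read off the table): RISC-V index = 0b10000 | ARM index.
# Function names are hypothetical, for illustration only.

ARM_SPECIAL_NAMES = {13: "SP", 14: "LR", 15: "PC"}


def map_thumb_reg(arm_reg: int) -> int:
    """Return the 5-bit RISC-V register index for ARM register R0-R15."""
    if not 0 <= arm_reg <= 15:
        raise ValueError(f"ARM register index out of range: {arm_reg}")
    # Prepend a 1 to the 4-bit ARM encoding: R0 -> R16, ..., PC -> R31.
    return 0b10000 | arm_reg


def describe(arm_reg: int) -> str:
    """Format one row of Tab.2, e.g. 'SP (1101) -> R29 (11101)'."""
    name = ARM_SPECIAL_NAMES.get(arm_reg, f"R{arm_reg}")
    rv_reg = map_thumb_reg(arm_reg)
    return f"{name} ({arm_reg:04b}) -> R{rv_reg} ({rv_reg:05b})"


if __name__ == "__main__":
    for reg in range(16):
        print(describe(reg))
```

Because the mapping is a fixed bit concatenation, it needs no lookup table. This is an inference from the encodings in Tab.2 rather than a statement from the paper.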
Fig.3  
Fig.4  Architecture of hardware binary translator in RVAM16
Fig.5  Structure of RVAM16 ALU
Instruction category | Instruction | Translation ratio (without optimization) | Translation ratio (with optimization)
Move | MOV Rd, Rm | 1 | 1
Move | MOV PC, Rm | 1 | 1
Move | MOV Rd, PC | 2 | 2
Move | MOVS Rd, Rm | 4 | 1
Move | MOVS Rd, imm | 4 | 1
Add | ADD Rd, SP, imm | 1 | 1
Add | ADD Rd, Rd, Rm | 1 | 1
Add | ADD Rd, Rd, PC | 2 | 2
Add | ADD PC, PC, Rm | 3 | 3
Add | ADR Rd, <label> | 3 | 3
Add | ADDS Rd, Rd, imm | 8 | 1
Add | ADDS Rd, Rm, Rn | 9 | 1
Add | ADDS Rd, Rm, imm | 9 | 1
Add | ADCS Rd, Rd, Rm | 16 | 1
Subtract | SUB SP, SP, imm | 2 | 2
Subtract | SUBS Rd, Rm, Rn | 17 | 1
Subtract | SBCS Rd, Rd, Rm | 17 | 1
Subtract | RSBS Rd, Rm, 0 | 17 | 1
Subtract | SUBS Rd, Rm, imm | 18 | 1
Subtract | SUBS Rd, Rd, imm | 18 | 1
Multiply | MUL Rd, Rm, Rd | 4 | 1
Compare | CMN Rm, imm | 8 | 1
Compare | CMP Rm, Rn | 16 | 1
Compare | CMP Rn, imm | 17 | 2
Logic | LOPᵃ Rd, Rd, Rm | 4 | 1
Logic | BICS Rd, Rd, Rm | 5 | 2
Shift | SOPᵇ Rd, Rd, Rm | 8 | 1
Shift | SOPᵇ Rd, Rm, imm | 9 | 1
Shift | RORS Rd, Rd, Rm | 10 | 6
Load | LDRIᶜ Rd, Rm, imm | 1 | 1
Load | LDRRᵈ Rd, Rm, Rn | 2 | 2
Load | LDR Rd, <label> | 3 | 3
Load | LDM Rm, registers | nᵉ | nᵉ
Store | STRXᶠ Rd, Rm, imm | 1 | 1
Store | STRXᶠ Rd, Rm, Rn | 2 | 2
Store | STM Rm, registers | nᵉ | nᵉ
Stack | POP registers | nᵉ | nᵉ
Stack | PUSH registers | nᵉ | nᵉ
Branch | BXXᵍ | 1 | 1
Branch | BGT | 4 | 1
Branch | BLE | 4 | 1
Extend | EOP Rd, Rm | 2 | 2
Reverse | REVSH Rd, Rm | 5 | 5
Reverse | REV Rd, Rm | 11 | 11
Reverse | REV16 Rd, Rm | 11 | 11
Tab.3  Instruction translation ratio in ARMv6-M without and with optimization
Fig.6  Performance of running Dhrystone and CoreMark benchmarks
Fig.7  Performance of running Embench benchmark suite
Real case | Kernel program | Relative execution time (RISC-V) | Relative execution time (ARM Thumb)
Communication program | UART | 1.10 | 1.23
Temperature control system | Sensor access | 1.19 | 1.38
Flash control system | Flash access | 1.50 | 1.53
Tab.4  Relative execution time of RVAM16 running real applications
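Since both columns of Tab.4 are relative execution times, their ratio gives the overhead of running the ARM Thumb version of each kernel on RVAM16 compared with the native RISC-V version. The sketch below assumes both columns share the same normalization baseline, which Tab.4 does not state; the ratio itself does not depend on which baseline was used. The function name is hypothetical.

```python
# Relative execution times from Tab.4: kernel -> (RISC-V, ARM Thumb).
# Assumption: both columns are normalized to the same (unstated) baseline.
TAB4 = {
    "UART": (1.10, 1.23),
    "Sensor access": (1.19, 1.38),
    "Flash access": (1.50, 1.53),
}


def thumb_overhead(rv_time: float, thumb_time: float) -> float:
    """Extra execution time of the translated Thumb run over the native RISC-V run."""
    return thumb_time / rv_time - 1.0


if __name__ == "__main__":
    for kernel, (rv_time, thumb_time) in TAB4.items():
        print(f"{kernel}: {thumb_overhead(rv_time, thumb_time):.1%} slower as ARM Thumb")
```

On the Tab.4 values this gives roughly 12% for UART, 16% for sensor access and 2% for flash access.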
Category | Name | Technique | Native ISA | Non-native ISA | Slowdown
Software | QEMU [6] | DBT | X86/ARM/Other | RISC-V | 7.07× [12]
Software | QEMU | DBT | RISC-V | X86 | more than 3.50× [23]
Software | RV8 [9] | DBT | X86 | RISC-V | 2.60×
Software | Lupori et al. [12] | SBT | X86/ARM | RISC-V | 1.12×/1.35×
Hardware | Capella et al. [18] | HBT + extended MIPS | MIPS | X86/ARM/PowerPC | about 1.57×/2.03×/2.22×
Hardware | HBT | HBT | RISC-V | ARM | 4.89×
Hardware | RVAM16 | HBT + hardware optimization | RISC-V | ARM | 1.40×
Tab.5  Comparison between state-of-the-art software binary translation systems and HBT-based multiple-ISA processors
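Tab.5 reports slowdowns relative to native execution. Assuming slowdown means non-native execution time divided by native execution time, the fraction of native performance retained is its reciprocal, and the speedup of one scheme over another on the same non-native code is the ratio of their slowdowns. The sketch below applies this to the two RISC-V/ARM rows; the constant and function names are hypothetical.

```python
# Slowdowns of ARM code on a RISC-V-native core, from Tab.5.
# Assumption: slowdown = non-native execution time / native execution time.
HBT_ONLY_SLOWDOWN = 4.89
RVAM16_SLOWDOWN = 1.40


def retained_performance(slowdown: float) -> float:
    """Fraction of native performance left when running the non-native ISA."""
    return 1.0 / slowdown


def speedup(baseline_slowdown: float, improved_slowdown: float) -> float:
    """How many times faster the improved scheme runs the same non-native code."""
    return baseline_slowdown / improved_slowdown


if __name__ == "__main__":
    print(f"RVAM16 retains {retained_performance(RVAM16_SLOWDOWN):.1%} of native performance")
    print(f"RVAM16 vs. HBT alone: {speedup(HBT_ONLY_SLOWDOWN, RVAM16_SLOWDOWN):.2f}x")
```

With these inputs RVAM16 retains about 71% of native RISC-V performance (1/1.40), matching the abstract's "more than 70%", and the two aggregate slowdowns give a ratio of about 3.5 over HBT alone, which is consistent with the abstract's "over 2.73×".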
Core | Area (μm²) | Benchmark (target ISA) | Static power (mW) | Dynamic power (mW) | Total power (mW) | Relative energy
RVAM16 | 24269.28 | Dhrystone (RISC-V) | 1.69 | 1.51 | 3.20 | 0.84
RVAM16 | 24269.28 | CoreMark (RISC-V) | 1.69 | 1.59 | 3.28 | 0.90
RVAM16 | 24269.28 | Dhrystone (ARM Thumb) | 1.71 | 1.70 | 3.41 | 1.25
RVAM16 | 24269.28 | CoreMark (ARM Thumb) | 1.70 | 1.78 | 3.48 | 1.26
RISC-V + HBT | 26658.41 | Dhrystone (RISC-V) | 1.86 | 1.48 | 3.34 | 0.87
RISC-V + HBT | 26658.41 | CoreMark (RISC-V) | 1.86 | 1.55 | 3.41 | 0.94
RISC-V + HBT | 26658.41 | Dhrystone (ARM Thumb) | 1.87 | 1.77 | 3.64 | 3.65
RISC-V + HBT | 26658.41 | CoreMark (ARM Thumb) | 1.87 | 1.71 | 3.58 | 5.59
Base RISC-V Core | 20757.91 | Dhrystone (RISC-V) | 1.37 | 1.42 | 2.79 | 0.73
Base RISC-V Core | 20757.91 | CoreMark (RISC-V) | 1.36 | 1.53 | 2.89 | 0.79
Ibex | 32802.67 | Dhrystone (RISC-V) | 2.29 | 2.70 | 4.99 | 0.93
Ibex | 32802.67 | CoreMark (RISC-V) | 2.29 | 3.26 | 5.55 | 1.04
Cortex-M0 | 26264.45 | Dhrystone (ARM Thumb) | 2.41 | 1.80 | 4.21 | 1.00
Cortex-M0 | 26264.45 | CoreMark (ARM Thumb) | 2.41 | 1.85 | 4.26 | 1.00
Tab.6  Area, power and energy consumption of processors at 100 MHz and 0.81 V
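Energy is power multiplied by execution time, so at a fixed clock the relative-energy column of Tab.6 together with the total-power column implies an execution-time ratio for each core. The sketch below assumes that "Relative energy" is normalized to Cortex-M0 on the same benchmark (suggested by the 1.00 entries, but not stated in the table) and that energy equals total power times execution time at 100 MHz. It backs out the Dhrystone (ARM Thumb) time of RVAM16 and of the RISC-V + HBT core; all names are hypothetical.

```python
# Back out execution-time ratios from Tab.6 (Dhrystone, ARM Thumb rows).
# Assumptions: "Relative energy" is normalized to Cortex-M0 on the same
# benchmark, and energy = total power x execution time at the fixed 100 MHz clock.
M0_DHRYSTONE_POWER_MW = 4.21


def implied_time_ratio(rel_energy: float, power_mw: float, ref_power_mw: float) -> float:
    """Return t / t_ref, using E / E_ref = (P / P_ref) * (t / t_ref)."""
    return rel_energy * ref_power_mw / power_mw


# ARM Thumb Dhrystone on RVAM16 and on the RISC-V + HBT core (Tab.6 values).
rvam16_time = implied_time_ratio(1.25, 3.41, M0_DHRYSTONE_POWER_MW)
hbt_time = implied_time_ratio(3.65, 3.64, M0_DHRYSTONE_POWER_MW)

if __name__ == "__main__":
    print(f"RVAM16: {rvam16_time:.2f}x Cortex-M0 time")
    print(f"RISC-V + HBT: {hbt_time:.2f}x Cortex-M0 time")
    print(f"implied RVAM16 speedup over HBT alone: {hbt_time / rvam16_time:.2f}x")
```

Under these assumptions the implied Dhrystone time ratio between the two cores is roughly 2.7, in line with the speedup figure in the abstract. The two-decimal rounding in Tab.6 limits the precision of such back-calculations.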
Fig.8  Area and power distribution of RVAM16
References
[1] Adegbija T, Rogacs A, Patel C, Gordon-Ross A. Microprocessor optimizations for the internet of things: a survey. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, 37(1): 7–20
[2] Saso K, Hara-Azumi Y. Simple instruction-set computer for area and energy-sensitive IoT edge devices. In: Proceedings of the 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP). 2018, 1–4
[3] Saso K, Hara-Azumi Y. Revisiting simple and energy efficient embedded processor designs toward the edge computing. IEEE Embedded Systems Letters, 2020, 12(2): 45–49
[4] Sites R L, Chernoff A, Kirk M B, Marks M P, Robinson S G. Binary translation. Communications of the ACM, 1993, 36(2): 69–81
[5] Apple Inc. About the Rosetta translation environment. See developer.apple.com/documentation/apple-silicon/about-the-rosetta-translation-environment website, Accessed: 2023
[6] Bellard F. QEMU, a fast and portable dynamic translator. In: Proceedings of the USENIX Annual Technical Conference. 2005, 41–46
[7] Hong D Y, Hsu C C, Yew P C, Wu J J, Hsu W C, Liu P, Wang C M, Chung Y C. HQEMU: a multi-threaded and retargetable dynamic binary translator on multicores. In: Proceedings of the 10th International Symposium on Code Generation and Optimization. 2012, 104–113
[8] Ilbeyi B, Lockhart D, Batten C. Pydgin for RISC-V: a fast and productive instruction-set simulator. In: Proceedings of the 3rd RISC-V Workshop. 2016
[9] Clark M, Hoult B. rv8: a high performance RISC-V to x86 binary translator. In: Proceedings of the 1st Workshop on Computer Architecture Research with RISC-V (CARRV). 2017
[10] Sabri C, Kriaa L, Azzouz S L. Comparison of IoT constrained devices operating systems: a survey. In: Proceedings of the 14th International Conference on Computer Systems and Applications (AICCSA). 2017, 369–375
[11] Shen B Y, Chen J Y, Hsu W C, Yang W. LLBT: an LLVM-based static binary translator. In: Proceedings of 2012 International Conference on Compilers, Architectures and Synthesis for Embedded Systems. 2012, 51–60
[12] Lupori L, Rosario V, Borin E. Towards a high-performance RISC-V emulator. In: Proceedings of 2018 Symposium on High Performance Computing Systems (WSCAD). 2018, 213–220
[13] Venkat A, Tullsen D M. Harnessing ISA diversity: design of a heterogeneous-ISA chip multiprocessor. In: Proceedings of the 41st International Symposium on Computer Architecture (ISCA). 2014, 121–132
[14] Venkat A, Basavaraj H, Tullsen D M. Composite-ISA cores: enabling multi-ISA heterogeneity using a single ISA. In: Proceedings of 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). 2019, 42–55
[15] Balkind J, Lim K, Schaffner M, Gao F, Chirkov G, Li A, Lavrov A, Nguyen T M, Fu Y, Zaruba F, Gulati K, Benini L, Wentzlaff D. BYOC: a "bring your own core" framework for heterogeneous-ISA research. In: Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems. 2020, 699–714
[16] Rokicki S, Rohou E, Derrien S. Hybrid-DBT: hardware/software dynamic binary translation targeting VLIW. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2019, 38(10): 1872–1885
[17] Capella F M, Brandalero M, Carro L, Beck A C S. A multiple-ISA reconfigurable architecture. Design Automation for Embedded Systems, 2015, 19(4): 329–344
[18] Fajardo J, Rutzig M B, Carro L, Beck A C S. Towards a multiple-ISA embedded system. Journal of Systems Architecture, 2013, 59(2): 103–119
[19] Chai K, Wolff F, Papachristou C. XBT: FPGA accelerated binary translation. In: Proceedings of IEEE National Aerospace and Electronics Conference. 2021, 365–372
[20] Rokicki S, Rohou E, Derrien S. Hardware-accelerated dynamic binary translation. In: Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE). 2017, 1062–1067
[21] Waterman A, Asanovic K. The RISC-V instruction set manual: volume I: unprivileged ISA. 2019
[22] ARM. ARM® Cortex®-M0 DesignStart™ RTL Testbench: user guide. 2015
[23] Wang W, Liu X, Yu J, Li J, Mao Z, Li Z, Ding C, Zhang C. The design and building of openKylin on RISC-V architecture. In: Proceedings of the 15th International Conference on Advanced Computer Theory and Engineering (ICACTE). 2022, 88–91
Supplementary material: FCS-23239-OF-LH_suppl_1