|
|
An efficient GPU-based parallel tabu search algorithm for hardware/software co-design |
Neng HOU1,2, Fazhi HE1( ), Yi ZHOU3, Yilin CHEN1 |
1. School of Computer Science, Wuhan University,Wuhan 430072, China 2. School of Computer Science, Yangtze University, Jingzhou 434023, China 3. School of Information Science and Engineering,Wuhan University of Science and Technology,Wuhan 430081, China |
|
|
Abstract Hardware/software partitioning is an essential step in hardware/software co-design. For large size problems, it is difficult to consider both solution quality and time. This paper presents an efficient GPU-based parallel tabu search algorithm (GPTS) for HW/SW partitioning. A single GPU kernel of compacting neighborhood is proposed to reduce the amount of GPU global memory accesses theoretically. A kernel fusion strategy is further proposed to reduce the amount of GPU global memory accesses of GPTS. To further minimize the transfer overhead of GPTS between CPU and GPU, an optimized transfer strategy for GPU-based tabu evaluation is proposed, which considers that all the candidates do not satisfy the given constraint. Experiments show that GPTS outperforms state-of-the-art work of tabu search and is competitive with other methods for HW/SW partitioning. The proposed parallelization is significant when considering the ordinary GPU platform.
|
Keywords
hardware/software co-design
hardware/software partitioning
graphics processing unit
GPU-based parallel tabu search
single kernel implementation
kernel fusion strategy
optimized transfer strategy
|
Corresponding Author(s):
Fazhi HE
|
Just Accepted Date: 19 August 2019
Issue Date: 17 March 2020
|
|
1 |
G De Michell, R K Gupta. Hardware/software co-design. Proceedings of the IEEE, 1997, 85(3): 349–365
https://doi.org/10.1109/5.558708
|
2 |
W Wolf . A decade of hardware/software co-design. Computer, 2003, 6(4): 38–43
https://doi.org/10.1109/MC.2003.1193227
|
3 |
J Teich. Hardware/software co-design: the past, the present, and predicting the future. Proceedings of the IEEE, 2012, 100: 1411–1430
https://doi.org/10.1109/JPROC.2011.2182009
|
4 |
A Ouyang, X Peng, J Liu, A Sallam. Hardware/software partitioning for heterogeneous MPSoC considering communication overhead. International Journal of Parallel Programming, 2017, 45(4): 899–922
https://doi.org/10.1007/s10766-016-0466-x
|
5 |
N Hou, X Yan, F He. A survey on partitioning models, solution algorithms and algorithm parallelization for hardware/software co-design. Design Automation for Embedded Systems, 2019, 23(1–2): 57–77
https://doi.org/10.1007/s10617-019-09220-7
|
6 |
W Shi, J Wu, S Lam, T Srikanthan. Algorithms for bi-objective multiple-choice hardware/software partitioning. Computers & Electrical Engineering, 2016, 50: 127–142
https://doi.org/10.1016/j.compeleceng.2016.01.006
|
7 |
R P Dick, D L Rhodes, W Wolf. TGFF: task graphs for free. In: Proceedings of the 6th International Workshop on Hardware/Software Codesign. 1998, 97–101
https://doi.org/10.1145/278241.278309
|
8 |
J Henkel , R Ernst. An approach to automated hardware/software partitioning using a flexible granularity that is driven by high-level estimation techniques. IEEE Transactions on Very Large Scale Integration Systems, 2001, 9(2): 273–289
https://doi.org/10.1109/92.924041
|
9 |
G Jiang, J Wu, S Lam, T Srikanthan, J Sun. Algorithmic aspects of graph reduction for hardware/software partitioning. The Journal of Supercomputing, 2015, 71(6): 2251–2274
https://doi.org/10.1007/s11227-015-1381-4
|
10 |
P Arató, S Juhász, Z Mann, A Orbán, D Papp. Hardware-software partitioning in embedded system design. In: Proceedings of IEEE International Symposium on Intelligent Signal Processing. 2003, 197–202
|
11 |
P Arató, Z Mann, A Orbán. Algorithmic aspects of hardware/software partitioning. ACM Transactions on Design Automation of Electronic Systems, 2005, 10(1): 136–156
https://doi.org/10.1145/1044111.1044119
|
12 |
Y Zhou, F He, N Hou, Y Qiu. Parallel ant colony optimization on multicore SIMD CPUs. Future Generation Computer Systems, 2018, 79(2): 473–487
https://doi.org/10.1016/j.future.2017.09.073
|
13 |
R Wang, W Hung, G Yang, X Song. Uncertainty model for configurable hardware/software and resource partitioning. IEEE Transactions on Computers, 2016, 66(10): 3217–3223
https://doi.org/10.1109/TC.2016.2519895
|
14 |
X Yan, F He, N Hou, H Ai. An efficient particle swarm optimization for large scale hardware/software co-design system. International Journal of Cooperative Information Systems, 2018, 27(1): 1741001
https://doi.org/10.1142/S0218843017410015
|
15 |
A Trindade, L Cordeiro. Applying SMT-based verification to hardware/ software partitioning in embedded systems. Design Automation for Embedded Systems, 2016, 20(1): 1–19
https://doi.org/10.1007/s10617-015-9163-z
|
16 |
H Li, F He, X Yan. IBEA-SVM: an indicator-based evolutionary algorithm based on pre-selection with classification guided by SVM. Applied Mathematics—A Journal of Chinese Universities, 2019, 34(1): 1–26
https://doi.org/10.1007/s11766-019-3706-1
|
17 |
J Luo, F He, J. YongAn efficient and robust bat algorithm with fusion of opposition-based learning and whale optimization algorithm. Intelligent Data Analysis, 2020, 24(3): 500–519
|
18 |
J Yong, F He, H Li, W Zhou. A novel bat algorithm based on cross boundary learning and uniform explosion strategy. Applied Mathematics—A Journal of Chinese Universities, 2019, DOI: 10.1007/s11766-019-3714-1
https://doi.org/10.1007/s11766-019-3714-1
|
19 |
R Gupta, G Micheli. Hardware-software co-synthesis for digital systems. IEEE Design & Test of Computers, 1993, 10(3): 29–41
https://doi.org/10.1109/54.232470
|
20 |
R Ernst, J Henkel, T Benner. Hardware- software co-synthesis for microcontrollers. IEEE Design & Test of Computers, 1993, 10(4): 64–75
https://doi.org/10.1109/54.245964
|
21 |
R Dick, N Jha. MOGAC: a multi-objective genetic algorithm for hardware-software co-synthesis of distributed embedded systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1998, 17(10): 920–935
https://doi.org/10.1109/43.728914
|
22 |
G Wang, W Gong, R Kastner. Application partitioning on programmable platforms using the ant colony optimization. Journal of Embedded Computing, 2006, 2(1): 119–136
|
23 |
F Ferrandi, P Lanzi, C Pilato, D Sciuto, A Tumeo. Ant colony optimization for mapping, scheduling and placing in reconfigurable systems. In: Proceedings of IEEE NASA/ESA Conference on Adaptive Hardware and Systems. 2013, 47–54
https://doi.org/10.1109/AHS.2013.6604225
|
24 |
M Koudil, K Benatchba , A Tarabet. Using artificial bees to solve partitioning and scheduling problems in co-design. Applied Mathematics and Computation, 2007, 186(2): 1710–1722
https://doi.org/10.1016/j.amc.2006.08.166
|
25 |
M Abdelhalim, S Habib. An integrated high-level hardware/software partitioning methodology. Design Automation for Embedded Systems, 2011, 15(1): 19–50
https://doi.org/10.1007/s10617-010-9068-9
|
26 |
K Garg, Y Aung, S Lam. Knapsim-run-time efficient hardwaresoftware partitioning technique for FPGAs. In: Proceedings of the 28th IEEE International Conference on System-on-Chip. 2015, 64–69
https://doi.org/10.1109/SOCC.2015.7406912
|
27 |
Y Zhang, W Luo, Z Zhang, B Li, X Wang. A hardware/software partitioning algorithm based on artificial immune principles. Applied Soft Computing, 2008, 8(1): 383–391
https://doi.org/10.1016/j.asoc.2007.03.003
|
28 |
Y Jiang, H Zhang, X Jiao, X Song, W Hung, M Gu, J Sun. Uncertain model and algorithm for hardware/software partitioning. In: Proceedings of IEEE Computer Society Annual Symposium on VLSI. 2012, 243–248
https://doi.org/10.1109/ISVLSI.2012.14
|
29 |
G Li , J Feng, C Wang, J Wang. Hardware/software partitioning algorithm based on the combination of genetic algorithm and tabu search. Engineering Review, 2014, 34(2): 151–160
|
30 |
X Yan, F He, Y Chen. A novel hardware/software partitioning method based on position disturbed particle swarm optimization with invasive weed optimization. Journal of Computer Science and Technology, 2017, 32(2): 340–355
https://doi.org/10.1007/s11390-017-1714-2
|
31 |
A Kalavade, P Subrahmanyam. Hardware/software partitioning for multi-function systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1998, 17(9): 819–837
https://doi.org/10.1109/43.720318
|
32 |
N Govil, R Shrestha, S Chowdhury. PGMA: an algorithmic approach for multi-objective hardware software partitioning. Microprocessors and Microsystems, 2017, 54: 83–96
https://doi.org/10.1016/j.micpro.2017.09.002
|
33 |
A Farahani, M Kamal, M Salmani-Jelodar. Parallel genetic algorithm based HW/SW partitioning. In: Proceedings of International Symposium on Parallel Computing in Electrical Engineering. 2006, 337–342
|
34 |
Y Wu, H Zhang, H Yang. Research on parallel HW/SW partitioning based on hybrid PSO algorithm. In: Proceedings of International Conference on Algorithms and Architectures for Parallel Processing. 2009, 449–459
https://doi.org/10.1007/978-3-642-03095-6_43
|
35 |
Y Pan, F He, H Yu, H Li. Learning adaptive trust strength with user roles of truster and trustee for trust-aware recommender systems. Applied Intelligence, 2019, DOI: 10.1007/s10489-019-01542-0
https://doi.org/10.1007/s10489-019-01542-0
|
36 |
X Lv, F He, W Cai, Y Cheng. An optimized RGA supporting selective undo for collaborative text editing systems. Journal of Parallel and Distributed Computing, 2019, 132: 310–330
https://doi.org/10.1016/j.jpdc.2019.05.005
|
37 |
K Li, F He, H Yu. Robust visual tracking based on convolutional features with illumination and occlusion handing. Journal of Computer Science and Technology, 2018, 33(1): 223–236
https://doi.org/10.1007/s11390-017-1764-5
|
38 |
H Yu, F He, Y Pan. A novel region-based active contour model via local patch similarity measure for image segmentation. Multimedia Tools and Applications, 2018, 77(18): 24097–24119
https://doi.org/10.1007/s11042-018-5697-y
|
39 |
T Van Luong, N Melab, E Talbi. GPU computing for parallel local search meta-heuristic algorithms. IEEE Transactions on Computers, 2013, 62(1): 173–185
https://doi.org/10.1109/TC.2011.206
|
40 |
Y Zhou, F He, Y Qiu. Dynamic strategy based parallel ant colony optimization on GPUs for TSPs. Science China Information Sciences, 2017, 60(6): 068102.
https://doi.org/10.1007/s11432-015-0594-2
|
41 |
W Zhu, J Curry, A Marquez. SIMD tabu search for the quadratic assignment problem with graphics hardware acceleration. International Journal of Production Research, 2010, 48(4): 1035–1047
https://doi.org/10.1080/00207540802555744
|
42 |
K Wei, X Sun, H Chu, C Wu. Reconstructing permutation table to improve the tabu search for the PFSP on GPU. The Journal of Supercomputing, 2017, 73(11): 4711–4738
https://doi.org/10.1007/s11227-017-2041-7
|
43 |
L Bukata, P Šucha, Z Hanzálek. Solving the resource constrained project scheduling problem using the parallel tabu search designed for the CUDA platform. Journal of Parallel and Distributed Computing, 2015, 77: 58–68
https://doi.org/10.1016/j.jpdc.2014.11.005
|
44 |
N Hou, F He, Y Chen, Y Zhou. An adaptive neighborhood taboo search on GPU for hardware/software co-design. In: Proceedings of the 20th International Conference on Computer Supported Cooperative Work in Design. 2016, 239–244
https://doi.org/10.1109/CSCWD.2016.7565995
|
45 |
N Hou, F He, Y Zhou, H Ai. A GPU-based tabu search for very large hardware/software partitioning with limited resource usage. Journal of Advanced Mechanical Design, Systems, and Manufacturing, 2017, 11(5): JAMDSM0060
https://doi.org/10.1299/jamdsm.2017jamdsm0060
|
46 |
J Wu, T Srikanthan, G Chen. Algorithmic aspects of hardware/software partitioning: 1D search algorithms. IEEE Transactions on Computers, 2010, 59(4): 532–544
https://doi.org/10.1109/TC.2009.173
|
47 |
J Wu, P Wang, S Lam, T Srikanthan. Efficient heuristic and tabu search for hardware/software partitioning. The Journal of Supercomputing, 2013, 66(1): 118–134
https://doi.org/10.1007/s11227-013-0888-9
|
48 |
Z Chen, J Wu, G Song, J Chen. Noderank: an efficient algorithm for hardware/software partitioning. Chinese Journal of Computers, 2013, 36(10): 2033–2040
https://doi.org/10.3724/SP.J.1016.2013.02033
|
49 |
H Quan, T Zhang, Q Liu, J Guo, X Wang, R Hu. Comments on algorithmic aspects of hardware/software partitioning: 1D search algorithms. IEEE Transactions on Computers, 2014, 4(63): 1055–1056
https://doi.org/10.1109/TC.2012.174
|
50 |
M Billeter, O Olsson, U Assarsson. Efficient stream compaction on wide SIMD many-core architectures. In: Proceedings of the Conference on High Performance Graphics. 2009, 159–166
https://doi.org/10.1145/1572769.1572795
|
51 |
N Wilt. The Cuda Handbook: a Comprehensive Guide to GPU Programming. Pearson Education, 2013
|
52 |
K Gupta, J Stuart, J Owens. A study of persistent threads style GPU programming for GPGPU workloads. In: Proceedings of Innovative Parallel Computing. 2012, 1–14
https://doi.org/10.1109/InPar.2012.6339596
|
53 |
M Guthaus, J Ringenberg, D Ernst, T Austin, T Mudge, R Brown. MiBench: a free, commercially representative embedded benchmark suite. In: Proceedings of IEEE International Workshop on Workload Characterization. 2001, 3–14
|
54 |
Y Pan, F He, H Yu. A novel enhanced collaborative autoencoder with knowledge distillation for top-n recommender systems. Neurocomputing, 2019, 332: 137–148
https://doi.org/10.1016/j.neucom.2018.12.025
|
55 |
S Zhang, F He, W Ren, J Yao. Joint learning of image detail and transmission map for single image dehazing. The Visual Compute, 2018, DOI: 10.1007/s00371-018-1612-9
https://doi.org/10.1007/s00371-018-1612-9
|
56 |
X Chen, F He, H Yu. A matting method based on full feature coverage. Multimedia Tools and Applications, 2019, 78(9): 11173–11201
https://doi.org/10.1007/s11042-018-6690-1
|
57 |
H Yu, F He, Y Pan. A novel segmentation model for medical images with intensity inhomogeneity based on adaptive perturbation. Multimedia Tools and Applications, 2019, 78(9), 11779–11798
https://doi.org/10.1007/s11042-018-6735-5
|
58 |
F Fang, M Yi, , H Feng, S Hu, C Xiao. Narrative collage of I mage collections by scene graph recombination. IEEE Transactions on Visualization and Computer Graphics, 2018, 24(9): 2559–2572
https://doi.org/10.1109/TVCG.2017.2759265
|
59 |
Y Wu, F He, D Zhang, X Li. Service-oriented feature-based data exchange for cloud-based design and manufacturing. IEEE Transactions on Services Computing, 2018, 11(2): 341–353
https://doi.org/10.1109/TSC.2015.2501981
|
60 |
Y Pan, F He, H Yu. A correlative denoising autoencoder to model social influence for top-N recommender system. Frontiers of Computer Science, 2020, 14(3): 143301
https://doi.org/10.1007/s11704-019-8123-3
|
61 |
X Lv, F He, X Yan, Y Wu, Y Cheng. Integrating selective undo of feature-based modeling operations for real-time collaborative CAD systems. Future Generation Computer Systems, 2019, 100: 473–497
https://doi.org/10.1016/j.future.2019.05.021
|
62 |
K Li, F He, H Yu, X Chen. A parallel and robust object tracking approach synthesizing adaptive Bayesian learning and improved incremental subspace learning. Frontiers of Computer Science, 2019, 13(5): 1116–1135
https://doi.org/10.1007/s11704-018-6442-4
|
63 |
L Yang, Q Yan, Y Fu, C Xiao. Surface reconstruction via fusing sparsesequence of depth images. IEEE Transactions on Visualization and Computer Graphics, 2018, 24(2): 1190–1203
https://doi.org/10.1109/TVCG.2017.2657766
|
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
|
Shared |
|
|
|
|
|
Discussed |
|
|
|
|