Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2023, Vol. 17 Issue (3) : 173321    https://doi.org/10.1007/s11704-022-1389-x
RESEARCH ARTICLE
Vehicle color recognition based on smooth modulation neural network with multi-scale feature fusion
Mingdi HU1(), Long BAI1, Jiulun FAN1, Sirui ZHAO2, Enhong CHEN2
1. School of Communications and Information Engineering & School of Artificial Intelligence, Xi’an University of Posts & Telecommunications, Xi’an 710121, China
2. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China
Abstract

Vehicle Color Recognition (VCR) plays a vital role in intelligent traffic management and criminal investigation assistance. However, existing vehicle color datasets cover only 13 classes, which cannot meet current practical demands. Moreover, although considerable effort has been devoted to VCR, existing methods suffer from class imbalance in their datasets. To address these challenges, we propose a novel VCR method based on a Smooth Modulation Neural Network with Multi-Scale Feature Fusion (SMNN-MSFF). Specifically, to establish a benchmark for model training and evaluation, we first present Vehicle Color-24, a new VCR dataset with 24 vehicle color classes, consisting of 10,091 vehicle images extracted from 100 hours of urban road surveillance video. Then, to tackle the long-tail distribution and improve recognition performance, we propose the SMNN-MSFF model, which combines multi-scale feature fusion and smooth modulation. The former extracts feature information from local to global scales, and the latter increases the loss contribution of tail-class instances during training under class imbalance. Finally, comprehensive experiments on Vehicle Color-24 and three earlier representative datasets demonstrate that the proposed SMNN-MSFF outperforms state-of-the-art VCR methods. Extensive ablation studies further show that each module of our method is effective; in particular, smooth modulation markedly aids feature learning for the minority (tail) classes. Vehicle Color-24 and the SMNN-MSFF code are available from the corresponding author upon request.

Keywords: vehicle color recognition; benchmark dataset; multi-scale feature fusion; long-tail distribution; improved smooth L1 loss
Corresponding Author(s): Mingdi HU   

Just Accepted Date: 31 March 2022   Issue Date: 20 October 2022
 Cite this article:   
Mingdi HU, Long BAI, Jiulun FAN, et al. Vehicle color recognition based on smooth modulation neural network with multi-scale feature fusion[J]. Front. Comput. Sci., 2023, 17(3): 173321.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-1389-x
https://academic.hep.com.cn/fcs/EN/Y2023/V17/I3/173321
Fig.1  Distribution of the 24 vehicle color classes
Color Number Percentage/%
White 11827 37.87
Black 6270 20.08
Orange 2431 7.78
Silver-gray 2125 6.80
Grass-green 1766 5.65
Dark-gray 1555 4.98
Dark-red 1263 4.04
Gray 736 2.36
Red 644 2.06
Cyan 553 1.77
Champagne 466 1.49
Dark-blue 365 1.17
Blue 316 1.01
Dark-brown 230 0.74
Brown 118 0.38
Yellow 100 0.32
Lemon-yellow 92 0.29
Dark-orange 90 0.29
Dark-green 70 0.22
Red-orange 63 0.20
Earthy-yellow 52 0.17
Green 50 0.16
Pink 35 0.11
Purple 15 0.04
Tab.1  Instance counts of the 24 vehicle colors
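The distribution above is severely long-tailed: the head class (White) alone accounts for 37.87% of all instances, while the rarest class (Purple) has only 15. A minimal Python sketch of how to quantify this imbalance directly from the Tab.1 counts (the dictionary is truncated for brevity):

```python
# Minimal sketch: quantify the long-tail imbalance in Vehicle Color-24
# using the per-class instance counts from Tab.1 (head/tail entries shown).
counts = {
    "White": 11827, "Black": 6270, "Orange": 2431, "Silver-gray": 2125,
    # ... remaining classes elided for brevity ...
    "Pink": 35, "Purple": 15,
}
total = 31232  # sum over all 24 classes in Tab.1

imbalance_factor = max(counts.values()) / min(counts.values())
print(f"imbalance factor (max/min): {imbalance_factor:.0f}")  # ~788
print(f"head class share: {counts['White'] / total:.2%}")     # 37.87%
```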
Tab.2  Sample images from the Vehicle Color-24 benchmark dataset (one sample picture per color class; images not reproduced in this text rendering)
Fig.2  Comparison before and after haze removal
Fig.3  Comparison before and after brightness adjustment
Fig.4  Overview of the proposed SMNN-MSFF framework
Operation Kernel Stride Feature map size Dimension
Input – – 227×227×3 –
Conv1 7×7 2 112×112×64 9408
Maxpool 3×3 2 56×56×64 –
Res_block1 [1×1, 3×3, 1×1]×3 1 56×56×256 442368
Res_block2 [1×1, 3×3, 1×1]×4 1 28×28×512 4718592
Res_block3 [1×1, 3×3, 1×1]×4 1 14×14×1024 18874368
Res_block4 [1×1, 3×3, 1×1]×3 1 7×7×2048 56623104
Average pool 7×7 1 1×1×2048 –
Fc – – 1000 2048000
Tab.3  Structural parameters of VCR-ResNet
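For readers who want to experiment, here is a minimal PyTorch sketch of a backbone with the stage layout of Tab.3: bottleneck stages repeated [3, 4, 4, 3] times, versus [3, 4, 6, 3] in a standard ResNet-50. It builds on torchvision's generic ResNet and is not the authors' released implementation; the 24-way classifier head is an assumption based on the dataset.

```python
import torch
from torchvision.models.resnet import ResNet, Bottleneck

# Sketch of a VCR-ResNet-like backbone following Tab.3. The authors'
# actual implementation may differ in details (e.g., the fc width).
def vcr_resnet_sketch(num_classes: int = 24) -> ResNet:
    # Bottleneck stages repeated [3, 4, 4, 3] times, per Tab.3.
    return ResNet(Bottleneck, [3, 4, 4, 3], num_classes=num_classes)

model = vcr_resnet_sketch()
x = torch.randn(1, 3, 227, 227)  # input size from Tab.3
print(model(x).shape)            # torch.Size([1, 24])
```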
Fig.5  Visualization results of each layer of VCR-ResNet
Fig.6  The structure of multi-scale feature fusion
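As a rough illustration of what such a fusion module can look like, the following PyTorch sketch applies FPN-style top-down fusion to the four VCR-ResNet stage outputs (256/512/1024/2048 channels, per Tab.3). The paper's exact MSFF wiring is not given on this page, so the channel sizes and the fusion rule below are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusionSketch(nn.Module):
    """Illustrative FPN-style fusion over the four backbone stage
    outputs; only a sketch, not the paper's exact MSFF module."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convs project every stage to a common width.
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, 1) for c in in_channels])
        # 3x3 convs smooth the merged maps.
        self.smooth = nn.ModuleList(
            [nn.Conv2d(out_channels, out_channels, 3, padding=1)
             for _ in in_channels])

    def forward(self, feats):  # feats: [C2, C3, C4, C5], high-to-low resolution
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # Top-down pathway: upsample the coarser map and add it in,
        # propagating global context down to the local, high-res maps.
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [s(p) for s, p in zip(self.smooth, laterals)]
```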
Color Faster-RCNN VGG12 VGG16 ResNet34 ResNet50 MobileNet +VCR-ResNet
White 0.84 0.64 0.83 0.81 0.82 0.8 0.85
Black 0.82 0.6 0.83 0.79 0.81 0.74 0.84
Orange 0.81 0.75 0.83 0.8 0.81 0.79 0.81
Silver-gray 0.77 0.62 0.78 0.7 0.71 0.71 0.73
Grass-green 0.70 0.72 0.76 0.75 0.78 0.77 0.8
Dark-gray 0.66 0.57 0.68 0.63 0.66 0.62 0.67
Dark-red 0.78 0.61 0.76 0.73 0.72 0.70 0.72
Gray 0.18 0.50 0.26 0.53 0.55 0.58 0.65
Red 0.60 0.58 0.60 0.70 0.71 0.70 0.73
Cyan 0.75 0.72 0.80 0.77 0.78 0.81 0.80
Champagne 0.63 0.59 0.76 0.72 0.7 0.67 0.70
Dark-blue 0.66 0.63 0.75 0.63 0.68 0.62 0.70
Blue 0.73 0.62 0.77 0.65 0.65 0.61 0.66
Dark-brown 0.45 0.48 0.29 0.61 0.63 0.63 0.69
Brown 0.30 0.52 0.07 0.57 0.56 0.66 0.68
Yellow 0.51 0.70 0.56 0.51 0.49 0.49 0.44
Lemon-yellow 0.87 0.70 0.92 0.55 0.57 0.56 0.59
Dark-orange 0.65 0.64 0.52 0.58 0.57 0.51 0.62
Dark-green 0.38 0.63 0.75 0.54 0.53 0.54 0.57
Red-orange 0.24 0.58 0.43 0.49 0.51 0.5 0.51
Earthy-yellow 0.62 0.69 0.33 0.52 0.51 0.55 0.6
Green 0.61 0.73 0.97 0.63 0.65 0.67 0.69
Pink 0.50 0.68 0.50 0.70 0.72 0.73 0.71
Purple 0 0.47 0 0.55 0.57 0.58 0.60
mAP 58.59% 62.38% 61.49% 64.42% 65.38% 64.75% 68.17%
Relative increase 0% 3.79% 2.90% 5.83% 6.79% 6.16% 9.58%
Tab.4  Per-class recognition accuracy in the feature-extraction-network ablation experiments
Color Faster-RCNN +FPN L1 Loss CE Loss MSE Loss Focal Loss +VCR-Loss
White 0.84 0.96 0.96 0.95 0.99 0.98 0.98
Black 0.82 0.94 0.94 0.90 0.98 0.97 0.97
Orange 0.81 0.88 0.88 0.87 0.97 0.98 0.98
Silver-gray 0.77 0.85 0.85 0.86 0.93 0.97 0.96
Grass-green 0.70 0.81 0.81 0.83 0.95 0.98 0.98
Dark-gray 0.66 0.73 0.73 0.78 0.88 0.94 0.94
Dark-red 0.78 0.83 0.83 0.82 0.87 0.97 0.98
Gray 0.18 0.69 0.69 0.76 0.79 0.82 0.89
Red 0.60 0.78 0.78 0.75 0.74 0.96 0.96
Cyan 0.75 0.81 0.81 0.83 0.84 0.98 0.97
Champagne 0.63 0.76 0.76 0.79 0.8 0.94 0.91
Dark-blue 0.66 0.77 0.77 0.74 0.75 0.97 0.96
Blue 0.73 0.77 0.77 0.78 0.79 0.97 0.97
Dark-brown 0.45 0.79 0.79 0.77 0.77 0.88 0.97
Brown 0.30 0.71 0.71 0.75 0.78 0.80 0.88
Yellow 0.51 0.30 0.30 0.79 0.81 0.95 0.97
Lemon-yellow 0.87 0.77 0.77 0.85 0.87 0.99 0.99
Dark-orange 0.65 0.70 0.70 0.81 0.48 0.94 0.96
Dark-green 0.38 0.71 0.71 0.8 0.89 0.91 0.94
Red-orange 0.24 0.54 0.54 0.71 0.57 0.94 0.99
Earthy-yellow 0.62 0.71 0.71 0.73 0.78 0.92 0.97
Green 0.61 0.73 0.73 0.75 0.79 0.89 0.93
Pink 0.50 0.66 0.66 0.63 0.55 0.90 0.94
Purple 0 0.65 0.65 0.5 0.57 0.48 0.80
mAP 58.59% 74.38% 74.38% 78.13% 79.75% 91.79% 94.96%
Relative increase 0% 15.79% 15.79% 19.54% 21.16% 33.20% 36.37%
Tab.5  Per-class recognition accuracy in the multi-scale feature fusion and loss-function ablation experiments
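The +VCR-Loss column corresponds to the paper's smooth modulation, built on an improved smooth L1 loss (see the keywords). Its exact formulation is not reproduced on this page, but the idea stated in the abstract, increasing the loss contribution of tail-class samples, can be sketched as follows; the function name, the inverse-frequency weighting rule, and the one-hot/softmax formulation are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def weighted_smooth_l1_loss(logits, targets, class_counts, beta=1.0):
    """Illustrative only: a smooth-L1 classification loss whose
    per-class weights grow as the class gets rarer, so tail classes
    contribute more to the gradient. Not the paper's exact VCR-Loss."""
    num_classes = logits.size(1)
    one_hot = F.one_hot(targets, num_classes).float()
    probs = logits.softmax(dim=1)
    # Inverse-frequency class weights, normalized to mean 1.
    w = class_counts.sum() / (num_classes * class_counts.float())
    w = w / w.mean()
    per_class = F.smooth_l1_loss(probs, one_hot, beta=beta, reduction="none")
    return (per_class * w).sum(dim=1).mean()

# Usage sketch with a 5-class toy example (counts taken from Tab.1).
counts = torch.tensor([11827, 6270, 2431, 35, 15])
logits = torch.randn(8, 5)
labels = torch.randint(0, 5, (8,))
loss = weighted_smooth_l1_loss(logits, labels, counts)
```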
Fig.7  Comparison of training loss of different models
Color Faster-RCNN [12] SSD [13] YOLO-v3 [14] YOLO-v4 [15] Efficient-Det [16] Center-Net [17] Retina-Net [18] SMNN-MSFF
White 0.84 0.96 0.97 0.98 0.95 0.97 0.98 0.98
Black 0.82 0.95 0.96 0.96 0.93 0.94 0.97 0.97
Orange 0.81 0.96 0.97 0.98 0.96 0.97 0.98 0.98
Silver-gray 0.77 0.91 0.92 0.92 0.87 0.88 0.97 0.96
Grass-green 0.70 0.96 0.97 0.98 0.95 0.97 0.98 0.98
Dark-gray 0.66 0.84 0.82 0.93 0.73 0.73 0.94 0.94
Dark-red 0.78 0.93 0.94 0.94 0.91 0.92 0.97 0.98
Gray 0.18 0.54 0.50 0.40 0.28 0.28 0.82 0.89
Red 0.60 0.88 0.88 0.81 0.79 0.77 0.96 0.96
Cyan 0.75 0.92 0.93 0.92 0.82 0.89 0.98 0.97
Champagne 0.63 0.81 0.83 0.77 0.66 0.76 0.94 0.91
Dark-blue 0.66 0.86 0.85 0.87 0.76 0.77 0.97 0.96
Blue 0.73 0.87 0.85 0.87 0.75 0.82 0.97 0.97
Dark-brown 0.45 0.71 0.68 0.60 0.49 0.64 0.88 0.97
Brown 0.30 0.58 0.52 0.25 0.36 0.25 0.80 0.88
Yellow 0.51 0.79 0.72 0.56 0.47 0.42 0.95 0.97
Lemon-yellow 0.87 0.93 0.84 0.77 0.66 0.64 0.99 0.99
Dark-orange 0.65 0.78 0.66 0.60 0.66 0.53 0.94 0.96
Dark-green 0.38 0.58 0.63 0.00 0.23 0.66 0.91 0.94
Red-orange 0.24 0.61 0.61 0.05 0.33 0.13 0.94 0.99
Earthy-yellow 0.62 0.74 0.69 0.43 0.43 0.29 0.92 0.97
Green 0.61 0.74 0.77 0.54 0.33 0.40 0.89 0.93
Pink 0.50 0.71 0.75 0.03 0.18 0.01 0.90 0.94
Purple 0.00 0.19 0.08 0.00 0.10 0.00 0.48 0.80
mAP 58.59% 78.13% 76.38% 62.77% 60.86% 61.07% 91.79% 94.96%
Tab.6  Comparison of 24-color recognition accuracy across different networks
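The mAP rows in Tabs. 4-6 are consistent with the arithmetic mean of the 24 per-class APs; checking the SMNN-MSFF column of Tab.6 reproduces the reported 94.96%:

```python
# Per-class APs of the SMNN-MSFF column in Tab.6, top to bottom.
aps = [0.98, 0.97, 0.98, 0.96, 0.98, 0.94, 0.98, 0.89, 0.96, 0.97,
       0.91, 0.96, 0.97, 0.97, 0.88, 0.97, 0.99, 0.96, 0.94, 0.99,
       0.97, 0.93, 0.94, 0.80]
print(f"mAP = {sum(aps) / len(aps):.2%}")  # mAP = 94.96%
```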
Fig.8  Vehicle color recognition results on the 24 classes
Color Chen et al. [3] Hu et al. [7] Fu et al. [10] SMNN-MSFF
White 0.94 0.96 0.98 0.99
Black 0.97 0.97 0.99 0.98
Gray 0.85 0.87 0.96 0.96
Red 0.99 0.99 0.99 0.98
Cyan 0.98 0.99 0.96 0.98
Blue 0.95 0.97 0.94 0.97
Yellow 0.95 0.97 0.98 0.96
Green 0.78 0.83 0.97 0.96
mAP 92.63% 94.38% 97.13% 97.25%
Tab.7  Comparison of 8-color recognition accuracy across different networks
Fig.9  Recognition performance on different datasets
Algorithm Faster-RCNN [12] SSD [13] YOLO-v3 [14] YOLO-v4 [15] Efficient-Det [16] Center-Net [17] Retina-Net [18] SMNN-MSFF
Inference time/s 2.670 1.689 1.265 0.989 1.062 1.233 1.359 1.021
Tab.8  Comparison of inference speed across networks (on a CPU device)
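Such per-image CPU timings can be approximated with a simple harness like the sketch below; the exact hardware, batching, and measurement protocol used in the paper are not specified on this page, so runs and input size here are assumptions.

```python
import time
import torch

@torch.no_grad()
def cpu_inference_time(model, input_size=(1, 3, 227, 227), runs=20):
    """Average forward-pass wall time per image on CPU.
    `model` is any torch.nn.Module (e.g., the backbone sketch above)."""
    model.eval().to("cpu")
    x = torch.randn(*input_size)
    model(x)  # warm-up pass, excluded from timing
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    return (time.perf_counter() - start) / runs  # seconds per image
```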
1 X Ke, Y F Zhang. Fine-grained vehicle type detection and recognition based on dense attention network. Neurocomputing, 2020, 399: 247–257
2 A Tariq, M Z Khan, M U G Khan. Real time vehicle detection and colour recognition using tuned features of faster-RCNN. In: Proceedings of the 1st International Conference on Artificial Intelligence and Data Analytics. 2021, 262–267
3 P Chen, X Bai, W Y Liu. Vehicle color recognition on urban road by feature context. IEEE Transactions on Intelligent Transportation Systems, 2014, 15(5): 2340–2346
4 Y Jeong, K H Park, D Park. Homogeneity patch search method for voting-based efficient vehicle color classification using front-of-vehicle image. Multimedia Tools and Applications, 2019, 78(20): 28633–28648
5 D S B Tilakaratna, U Watchareeruetai, S Siddhichai, N Natcharapinchai. Image analysis algorithms for vehicle color recognition. In: Proceedings of 2017 International Electrical Engineering Congress. 2017, 1–4
6 E Dule, M Gökmen, M S Beratoğlu. A convenient feature vector construction for vehicle color recognition. In: Proceedings of the 11th WSEAS International Conference on Neural Networks, 11th WSEAS International Conference on Evolutionary Computing and 11th WSEAS International Conference on Fuzzy Systems. 2010, 250–255
7 C P Hu, X Bai, L Qi, P Chen, G J Xue, L Mei. Vehicle color recognition with spatial pyramid deep learning. IEEE Transactions on Intelligent Transportation Systems, 2015, 16(5): 2925–2934
8 R F Rachmadi, I K E Purnama. Vehicle color recognition using convolutional neural network. 2015, arXiv preprint arXiv: 1510.07391
9 L Zhuo, Q Zhang, J F Li, J Zhang, X G Li, H Zhang. High-accuracy vehicle color recognition using hierarchical fine-tuning strategy for urban surveillance videos. Journal of Electronic Imaging, 2018, 27(5): 051203
10 H Y Fu, H D Ma, G Y Wang, X M Zhang, Y F Zhang. MCFF-CNN: multiscale comprehensive feature fusion convolutional neural network for vehicle color recognition based on residual learning. Neurocomputing, 2020, 395: 178–187
11 M Nafzi, M Brauckmann, T Glasmachers. Vehicle shape and color classification using convolutional neural network. 2019, arXiv preprint arXiv: 1905.08612
12 S Q Ren, K M He, R Girshick, J Sun. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149
13 W Liu, D Anguelov, D Erhan, C Szegedy, S Reed, C Y Fu, A C Berg. SSD: single shot MultiBox detector. In: Proceedings of the 14th European Conference on Computer Vision. 2016, 21–37
14 J Redmon, S Divvala, R Girshick, A Farhadi. You only look once: unified, real-time object detection. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016, 779–788
15 A Bochkovskiy, C Y Wang, H Y M Liao. YOLOv4: optimal speed and accuracy of object detection. 2020, arXiv preprint arXiv: 2004.10934
16 M X Tan, R M Pang, Q V Le. EfficientDet: scalable and efficient object detection. In: Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, 10778–10787
17 X Y Zhou, D Q Wang, P Krähenbühl. Objects as points. 2019, arXiv preprint arXiv: 1904.07850
18 T Y Lin, P Goyal, R Girshick, K M He, P Dollár. Focal loss for dense object detection. In: Proceedings of 2017 IEEE International Conference on Computer Vision. 2017, 2999–3007
19 K H Tang, J Q Huang, H W Zhang. Long-tailed classification by keeping the good and removing the bad momentum causal effect. In: Proceedings of the 34th Conference on Neural Information Processing Systems. 2020
20 X G Wang, X Bai, W Y Liu, L J Latecki. Feature context for image classification and object detection. In: Proceedings of CVPR 2011. 2011, 961–968
21 A Krizhevsky, I Sutskever, G E Hinton. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 26th Annual Conference on Neural Information Processing Systems. 2012, 1106–1114
22 Y Cui, M L Jia, T Y Lin, Y Song, S Belongie. Class-balanced loss based on effective number of samples. In: Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 9260–9269
23 K D Cao, C L Wei, A Gaidon, N Arechiga, T Y Ma. Learning imbalanced datasets with label-distribution-aware margin loss. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019, 140
24 M Y Ren, W Y Zeng, B Yang, R Urtasun. Learning to reweight examples for robust deep learning. In: Proceedings of the 35th International Conference on Machine Learning. 2018
25 J Shu, Q Xie, L X Yi, Q Zhao, S P Zhou, Z B Xu, D Y Meng. Meta-weight-net: learning an explicit mapping for sample weighting. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019, 172
26 M A Jamal, M Brown, M H Yang, L Q Wang, B Q Gong. Rethinking class-balanced methods for long-tailed visual recognition from a domain adaptation perspective. In: Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, 7607–7616
27 B Y Kang, S N Xie, M Rohrbach, Z C Yan, A Gordo, J S Feng, Y Kalantidis. Decoupling representation and classifier for long-tailed recognition. In: Proceedings of the 8th International Conference on Learning Representations. 2020
28 B Y Zhou, Q Cui, X S Wei, Z M Chen. BBN: bilateral-branch network with cumulative learning for long-tailed visual recognition. In: Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, 9716–9725
29 X Yin, X Yu, K Sohn, X M Liu, M Chandraker. Feature transfer learning for face recognition with under-represented data. In: Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 5697–5706
30 J L Liu, Y F Sun, C C Han, Z P Dou, W H Li. Deep representation learning on long-tailed data: a learnable embedding augmentation perspective. In: Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, 2967–2976
31 Z W Liu, Z Q Miao, X H Zhan, J Y Wang, B Q Gong, S X Yu. Large-scale long-tailed recognition in an open world. In: Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 2532–2541
32 P Chu, X Bian, S P Liu, H B Ling. Feature space augmentation for long-tailed data. In: Proceedings of the 16th European Conference on Computer Vision. 2020, 694–710
33 A K Menon, S Jayasumana, A S Rawat, H Jain, A Veit, S Kumar. Long-tail learning via logit adjustment. In: Proceedings of the International Conference on Learning Representations. 2020
34 L Y Xiang, G G Ding, J G Han. Learning from multiple experts: self-paced knowledge distillation for long-tailed classification. In: Proceedings of the 16th European Conference on Computer Vision. 2020, 247–263
35 Y Li, T Wang, B Y Kang, S Tang, C F Wang, J T Li, J S Feng. Overcoming classifier imbalance for long-tail object detection with balanced group softmax. In: Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, 10988–10997
36 X D Wang, L Lian, Z Q Miao, Z W Liu, S X Yu. Long-tailed recognition by routing diverse distribution-aware experts. 2021, arXiv preprint arXiv: 2010.01809
37 X Q Xue, J K Ding, Y J Shi. Research and application of illumination processing method in vehicle color recognition. In: Proceedings of the 3rd IEEE International Conference on Computer and Communications. 2017, 1662–1666
38 C Seifert, A Aamir, A Balagopalan, D Jain, A Sharma, S Grottel, S Gumhold. Visualizations of deep neural networks in computer vision: a survey. In: Cerquitelli T, Quercia D, Pasquale F, eds. Transparent Data Mining for Big and Small Data. Cham: Springer, 2017, 123–144
39 K M He, X Y Zhang, S Q Ren, J Sun. Deep residual learning for image recognition. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016, 770–778
40 T Y Lin, P Dollár, R Girshick, K M He, B Hariharan, S Belongie. Feature pyramid networks for object detection. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017, 936–944
Supplementary material: FCS-21389-OF-MH_suppl_1