© The Institution of Engineering and Technology
Object detection in panoramic images is a key component of street view, intelligent transportation, autonomous driving and other technologies. To address the shortcomings of existing algorithms on panoramic images, a high-resolution panoramic image dataset is first introduced; then a multi-scale feature pyramid network (MS-RPN) structure is proposed and a new network with a Sim-Inception module is designed. The network extracts objects of different scales from different feature layers, so that small objects in the image can also be detected accurately. Finally, the entire detection network is trained on the dataset constructed in this study. Meanwhile, RoIPool is replaced by RoIAlign and the loss function is adjusted to match the network structure. The experimental results show that the authors' proposed algorithm significantly improves detection performance on the panoramic dataset and outperforms other deep learning algorithms, especially for small objects in the image.
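The RoIPool-to-RoIAlign replacement mentioned above can be illustrated with a minimal single-bin sketch (an assumption-laden toy, not the authors' implementation): RoIPool quantises the ROI boundaries to integer feature-map cells before pooling, discarding sub-pixel position, whereas RoIAlign keeps fractional coordinates and samples the feature map by bilinear interpolation.

```python
import numpy as np

def roi_pool_1x1(feat, x0, y0, x1, y1):
    # RoIPool: snap the ROI to integer cell boundaries, then max-pool.
    # The rounding loses sub-pixel alignment, which hurts small objects.
    xi0, yi0 = int(np.floor(x0)), int(np.floor(y0))
    xi1, yi1 = int(np.ceil(x1)), int(np.ceil(y1))
    return float(feat[yi0:yi1, xi0:xi1].max())

def bilinear(feat, y, x):
    # Sample feat at a fractional (y, x) by bilinear interpolation.
    h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx)
            + feat[y0, x1] * (1 - dy) * dx
            + feat[y1, x0] * dy * (1 - dx)
            + feat[y1, x1] * dy * dx)

def roi_align_1x1(feat, x0, y0, x1, y1):
    # RoIAlign: no rounding anywhere; sample at the (fractional)
    # bin centre with bilinear interpolation.
    cy, cx = (y0 + y1) / 2.0, (x0 + x1) / 2.0
    return float(bilinear(feat, cy, cx))

# Same fractional ROI, different answers once quantisation is removed:
feat = np.arange(16, dtype=float).reshape(4, 4)
print(roi_pool_1x1(feat, 0.5, 0.5, 2.0, 2.0))   # 5.0 (ROI snapped to [0,2)x[0,2))
print(roi_align_1x1(feat, 0.5, 0.5, 2.0, 2.0))  # 6.25 (sampled at centre (1.25, 1.25))
```

Real detectors pool each ROI into a grid of bins (e.g. 7×7) with several bilinear samples per bin; this 1×1 version only isolates the quantisation difference the paper exploits.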
http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2018.5304