Scientific Visualization, 2024, volume 16, number 2, pages 154 - 168, DOI: 10.26583/sv.16.2.11
Dual-Pass Feature-Fused SSD Model for Detecting Multi-Scale Vehicles on the Construction Site
Authors: M. Petrov1,A, S. Zimina2,A, D. Dyachenko3,B, A. Dubodelov4,B, S. Simakov5,A
A Moscow Institute of Physics & Technology, Instituskii Lane 7, Dolgoprudny 141700, Russia
B LLC Acceleration, Stolyarnii Lane 3, Moscow 115114, Russia
1 ORCID: 0000-0003-4907-4687, mikhail.petrov@phystech.edu
2 ORCID: 0000-0003-4160-9915, sofya.zimina@phystech.edu
3 ORCID: 0009-0004-5158-9377, d.dyachenko@acceleration.ru
4 ORCID: 0009-0000-9162-5863, a.dubodelov@acceleration.ru
5 ORCID: 0000-0003-3406-9623, simakov.ss@phystech.edu
Abstract
When detecting equipment on a construction site, the objects to be detected can vary widely in scale relative to the image that contains them. To improve detection and bounding box visualization of small objects, a Feature-Fused modification of the SSD detector can be used. Combined with overlapping image slicing at inference, this model copes well with the detection of small objects. However, excessive manual tuning of the slicing parameters in favor of small objects can both degrade detection on scenes that differ from those on which the model was tuned and cause significant losses in the detection of large objects, along with problems in their bounding box visualization. Therefore, to achieve the best quality, the image slicing parameters should be selected automatically by the model according to the characteristic scales of the objects in the image.
The article presents a dual-pass version of Feature-Fused SSD that determines the image slicing parameters automatically. On the first pass, a fast truncated version of the detector estimates the characteristic sizes of the objects. On the second pass, the final object detection is carried out with the slicing parameters selected from the first-pass results. Depending on the complexity of the task, the detector achieves 0.82 to 0.92 mAP (mean Average Precision).
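To make the notion of overlapping image slicing concrete, the following is a minimal sketch of how tile windows might be generated for an image. It is an illustration only, not the authors' implementation: the function names `tile_starts` and `slice_windows`, the square tile shape, and the fractional `overlap` parameter are all assumptions introduced here.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` covering [0, length).

    `overlap` is the assumed fraction (0 <= overlap < 1) shared by
    neighbouring tiles; this parameterisation is hypothetical.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    if tile >= length:
        return [0]  # a single tile already covers the whole axis
    stride = max(1, int(tile * (1.0 - overlap)))
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the border
    return starts


def slice_windows(width, height, tile, overlap):
    """Return (x0, y0, x1, y1) windows of overlapping square tiles."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]
```

With this scheme, a smaller `tile` makes small objects occupy a larger share of each window, while a tile that is too small can cut large objects across several windows, which is the trade-off described above.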
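The abstract does not state the rule that maps first-pass detections to slicing parameters, so the sketch below shows only one plausible heuristic. The function `choose_tile_size`, the use of the median box size, and the `target_ratio` and `min_tile` parameters are all hypothetical and are not taken from the article.

```python
import math
from statistics import median


def choose_tile_size(boxes, image_size, target_ratio=0.2, min_tile=300):
    """Pick a tile size from first-pass boxes (hypothetical heuristic).

    boxes        -- (x0, y0, x1, y1) detections from the fast first pass
    image_size   -- (width, height) of the full image
    target_ratio -- assumed desired object-size / tile-size ratio
    min_tile     -- assumed lower bound on the tile size, in pixels
    """
    width, height = image_size
    full = max(width, height)
    if not boxes:
        return full  # nothing detected: fall back to the unsliced image
    # Characteristic size: median of the geometric mean of box sides.
    sizes = [math.sqrt((x1 - x0) * (y1 - y0)) for x0, y0, x1, y1 in boxes]
    tile = int(round(median(sizes) / target_ratio))
    # Clamp: no smaller than min_tile, no larger than the image itself.
    return max(min_tile, min(tile, full))
```

Under these assumptions, a scene dominated by small objects yields a small tile for the second pass, while a scene with large objects yields a tile close to the full image, so large objects are not split across windows.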
Keywords: computer vision; construction site; construction vehicles; single shot detector.