Optical measurement methods are widely used
in many fields of science and technology. They include the methods of
close-range photogrammetry [1], which measure the shape of an object
surface without contact, with high spatial resolution and high accuracy over a
large area. The basic principle of these methods is to obtain three-dimensional
coordinates of the object from its two-dimensional images. The main approach
is to use two digital video cameras: two images of the same object
obtained from different angles allow its three-dimensional
coordinates to be reconstructed. To achieve this, corresponding points must be
determined in the two images. Various methods are used for this purpose:
algorithms for finding key (feature) points, structured illumination, epipolar
geometry, cross-correlation analysis, etc.
One of the photogrammetry methods is based
on cross-correlation processing of stereo pairs of images: the Image
Pattern Correlation Technique (IPCT) [2-5]. It builds on the processing
algorithms of another method, Particle Image Velocimetry (PIV) [6], and is
a variant of the Digital Image Correlation (DIC) method [7].
Despite its universality, the method
for finding corresponding points based on cross-correlation analysis has
disadvantages. The first is that the computational complexity grows in
proportion to the spatial resolution. The
second is the direct dependence of the measurement error on the magnitude of
the displacement between corresponding points in the images: the greater the
distance between the points, the greater the error. To reduce it, the
correlation function has to be calculated iteratively with a decreasing
aperture, which increases the computational cost. In our work we
attempted to apply machine learning methods in place of the cross-correlation
calculation in the IPCT method to address these problems.
Machine learning algorithms have been used in PIV tasks
for a long time [8]. However, because machine learning was poorly developed at
the time, their application was very limited and was confined to the
post-filtering stage. Since about 2012, interest in
neural networks, and in convolutional neural networks in particular, has grown
thanks to a successful solution proposed on ImageNet. The field is now
developing actively again, not least because of the appearance of affordable
and powerful GPUs, on which network training and inference are much
faster. One of the first applications of neural networks to
PIV tasks was proposed in [9]. Since these were the first attempts in a new
direction, the proposed ideas overlapped strongly with cross-correlation
algorithms: two 32×32-pixel interrogation windows from two images were
fed to the network, and the network predicted the displacement vector
corresponding to these windows. Although the first attempts [9] did not
outperform the already known methods, they demonstrated the
viability of the idea, which was developed further. A similar study was
conducted in [10], where a network with the architecture proposed in [11] was
examined. Among more recent solutions, we should mention the deep neural
network from [12], which was used in [13] for PIV tasks with some
modifications, the main one being a new training dataset. Machine learning
methods for diagnostic problems in hydrodynamics are reviewed in more detail
in [14-15].
In this work, two networks were used for
the surface shape measurement problem: LiteFlowNet [16] and PIV-LiteFlowNet-en
[17]. Both networks are successors of the FlowNet network [12],
which takes two images as input and estimates the optical flow
displacement field at the output. According to [12], the network is trained on a
large synthetic dataset and provides acceptable accuracy for estimating rigid
motion. However, the original FlowNet cannot be applied directly to PIV
problems, as was shown in [13].
The Image Pattern Correlation Technique is an
optical method for measuring the shape of a surface from its stereo images. The
idea of IPCT, as in other photogrammetric methods, is to find the position of
spatial points with unknown three-dimensional coordinates. For this purpose,
the two-dimensional coordinates of these points are located in two images
recorded by the digital video cameras of a stereo system. From these
coordinates, the required three-dimensional coordinates of each
point are calculated by triangulation, using the known intrinsic and
extrinsic parameters of the cameras.
IPCT uses a special background
pattern, usually consisting of randomly distributed dark dots on a white
background, or vice versa. Such a pattern increases the contrast of the measured
surface and significantly improves the efficiency of the cross-correlation
algorithm.
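As a rough illustration of such a background pattern, the sketch below draws black dots at random positions on a white background with NumPy. The function name, image size, dot count, and dot radius are assumed values chosen for illustration; they are not parameters of the pattern used in this work.

```python
import numpy as np

def make_dot_pattern(height=256, width=256, n_dots=400, radius=2, seed=0):
    """Return a uint8 image with black dots (0) on a white background (255)."""
    rng = np.random.default_rng(seed)
    image = np.full((height, width), 255, dtype=np.uint8)
    yy, xx = np.mgrid[0:height, 0:width]
    centers_y = rng.integers(0, height, size=n_dots)
    centers_x = rng.integers(0, width, size=n_dots)
    for cy, cx in zip(centers_y, centers_x):
        # Paint every pixel within `radius` of the dot centre black
        image[(yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2] = 0
    return image

pattern = make_dot_pattern()   # dark dots on white
inverted = 255 - pattern       # the "vice versa" variant: white dots on black
```

The inverted variant corresponds to the intensity convention more typical of PIV images.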
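Cross-correlation processing of images
consists of several consecutive steps. The first step is to divide the input
images into small areas, the so-called interrogation windows, according to the
specified parameters. Then a cross-correlation function is calculated for each
pair of corresponding windows according to the formula

$$c(x,y) = f(x,y) \circ g(x,y) = \iint f^{*}(\xi,\eta)\, g(x+\xi,\, y+\eta)\, d\xi\, d\eta, \qquad (1)$$

where $f(x,y)$ and $g(x,y)$ are two-dimensional functions of the
brightness distribution in the images, $\circ$ is the correlation operation, and
the asterisk $*$ denotes complex conjugation. Usually, the calculation is
performed using fast Fourier transform (FFT) algorithms, by the formula

$$c(x,y) = \mathcal{F}^{-1}\left[ F^{*}(u,v)\, G(u,v) \right], \qquad (2)$$

where $F(u,v)$ and $G(u,v)$ are the Fourier images of $f(x,y)$ and $g(x,y)$, and
$\mathcal{F}^{-1}$ is the inverse Fourier transform. An additional advantage is
achieved by calculating the normalized correlation function with the fast
algorithm [18], using the formula

$$c_{n}(x,y) = \frac{c'(x,y)}{N \sigma_1 \sigma_2}, \qquad (3)$$

where $c'(x,y)$ is the correlation function of the transformed interrogation
windows $f'(x,y) = f(x,y) - \mu_1$ and $g'(x,y) = g(x,y) - \mu_2$ ($\mu_1$,
$\mu_2$ are the average brightness values in the interrogation windows $f(x,y)$
and $g(x,y)$), obtained according to formula (2) using the FFT, $N$ is the
number of pixels in the interrogation window, and $\sigma_1$ and $\sigma_2$ are
the standard deviations of brightness in the interrogation windows, calculated as

$$\sigma_1 = \sqrt{\frac{1}{N} \sum_{x,y} \left( f(x,y) - \mu_1 \right)^2}, \qquad (4)$$

$$\sigma_2 = \sqrt{\frac{1}{N} \sum_{x,y} \left( g(x,y) - \mu_2 \right)^2}. \qquad (5)$$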
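The FFT-based calculation of the normalized correlation function for one pair of interrogation windows can be sketched as follows. This is a minimal NumPy illustration, not the software package used in this work: the function names are hypothetical, the correlation is circular (as it always is when computed through the discrete Fourier transform), and the normalization by the pixel count times both standard deviations is an assumed convention under which a perfect match gives a peak value of 1.

```python
import numpy as np

def normalized_cross_correlation(f, g):
    """Circular normalized cross-correlation of two equal-size windows via FFT."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    f0 = f - f.mean()   # window with its mean brightness removed
    g0 = g - g.mean()
    # Correlation theorem: inverse FFT of conj(F) * G
    c = np.fft.ifft2(np.conj(np.fft.fft2(f0)) * np.fft.fft2(g0)).real
    # Assumed normalization: pixel count times both standard deviations
    norm = f.size * f0.std() * g0.std()
    # fftshift moves the zero-displacement sample to the window centre
    return np.fft.fftshift(c / norm)

def integer_displacement(f, g):
    """Return (dy, dx) of the correlation peak relative to zero displacement."""
    c = normalized_cross_correlation(f, g)
    peak = np.unravel_index(np.argmax(c), c.shape)
    return int(peak[0]) - c.shape[0] // 2, int(peak[1]) - c.shape[1] // 2

# Synthetic check: a random window and a copy shifted by a known amount
rng = np.random.default_rng(1)
window_a = rng.random((32, 32))
window_b = np.roll(window_a, shift=(3, -2), axis=(0, 1))
shift = integer_displacement(window_a, window_b)
```

Here `shift` recovers the integer displacement applied by `np.roll`; the subpixel refinement of the peak is a separate step.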
The next step is to
find the maximum of each calculated cross-correlation function with subpixel
resolution by interpolating its peak. The most commonly used formula
approximates the maximum with a Gaussian function:

$$x'_{max} = x_{max} + \frac{\ln c_{-1} - \ln c_{+1}}{2 \ln c_{-1} - 4 \ln c_{0} + 2 \ln c_{+1}}, \qquad (6)$$

where $x'_{max}$ is the approximated horizontal coordinate of the maximum;
$x_{max}$ is the coordinate of the pixel with the maximum value of the
correlation function; $c_{0}$, $c_{-1}$ and $c_{+1}$ are the function values at
the maximum and at the neighboring pixels with coordinates $x_{max} - 1$ and
$x_{max} + 1$. The vertical coordinate of the maximum, $y'_{max}$, is found in
the same way.
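The three-point Gaussian peak approximation can be sketched in a few lines. The function name is hypothetical, and the sketch assumes that all three correlation values are positive, since their logarithms are taken.

```python
import math

def gaussian_subpixel_peak(c_minus, c_zero, c_plus, x_max):
    """Three-point Gaussian estimate of the peak position along one axis."""
    # All three correlation values must be positive for the logarithm
    lm, l0, lp = math.log(c_minus), math.log(c_zero), math.log(c_plus)
    return x_max + (lm - lp) / (2.0 * lm - 4.0 * l0 + 2.0 * lp)

# Samples of a Gaussian centred at 10.3, taken at pixels 9, 10 and 11
true_center, width = 10.3, 1.5
samples = [math.exp(-((x - true_center) ** 2) / (2 * width ** 2)) for x in (9, 10, 11)]
estimate = gaussian_subpixel_peak(*samples, x_max=10)
```

For an ideal Gaussian peak the estimate is exact up to rounding; for a real correlation peak it is only an approximation.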
The result of this algorithm is a vector
field of point displacements between the two images. It can be used directly to
triangulate three-dimensional surface points: the origin of each displacement
vector gives the two-dimensional coordinates of the interrogation window
center in the first image, and its end gives the corresponding two-dimensional
coordinates in the second image. The pinhole camera model is usually used for
triangulating the two-dimensional points.
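How a single displacement vector leads to a three-dimensional point can be sketched with linear (DLT) triangulation under the pinhole model. This is one common triangulation scheme, used here as an assumption; the text does not specify which one was applied. The camera matrices, the test point, and all names below are hypothetical.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two 3x4 projection matrices."""
    # Each image coordinate gives one linear equation in the homogeneous 3D point
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                 # right singular vector of the smallest singular value
    return X[:3] / X[3]

def project(P, X):
    """Pinhole projection of a 3D point to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical stereo rig: identical intrinsics, second camera shifted along x
K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 480.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

point = np.array([0.1, -0.05, 2.0])                 # hypothetical surface point
window_center = project(P1, point)                  # origin of the displacement vector
displacement = project(P2, point) - window_center   # the vector itself
recovered = triangulate_point(P1, P2, window_center, window_center + displacement)
```

In this noise-free example the recovered point coincides with the original one; with measured displacements the result is a least-squares estimate.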
Many of the neural networks considered
when selecting candidates for our work were
implemented with the Caffe library [19]. At present, running them
involves many technical difficulties because of the outdated code base.
PIV-LiteFlowNet-en and LiteFlowNet, selected in the first section, both have
PyTorch implementations [20], which made it possible to
apply them successfully to image processing.
An important feature of PIV-LiteFlowNet-en
in contrast to LiteFlowNet is that its output field has the same resolution
as the input image without the use of bilinear interpolation, which
increases the accuracy for small displacements.
Both networks require a CUDA-enabled GPU to
run. The amount of video memory on the card is important, as it determines the
resolution at which an image can be processed on the GPU. In our work we
used the Google Colab service [21], which allocates about 11 GiB of GPU memory
per user. This allows us to process images of 1900×2000 pixels with a
color depth of 24 bits, which is enough to process the experimental images used
in this work at full resolution in the working area.
The first step in evaluating the feasibility
of using neural networks for photogrammetry was computer modeling. It consisted
of processing synthetic images with the two selected networks and a standard
cross-correlation algorithm [22]. The modeling evaluated the accuracy with which
the different approaches determine displacements in the synthetic images.
The dataset proposed in [13] was used as test images. This set consists of
modeled PIV images of flows with different conditions and parameters. Details
of the dataset and the mean error for the three tested algorithms are given in
Table 1. The result of the modeling, in the form of the mean square error
between the true flow (defined in the modeling) and the measured flow, is shown
in Figure 1. Cross-correlation processing was performed using a software
package of our own design [23] with the following parameters: interrogation
window size 24×24 pixels, interrogation window step 4 pixels, approximation of
the correlation peak by a Gaussian distribution.
Table 1. Image parameters in the dataset and computer modeling results

| Case name | Description | Condition | Number of images | PIV-LiteFlowNet-en error, pixels | LiteFlowNet error, pixels | Cross-correlation error, pixels |
|---|---|---|---|---|---|---|
| Back-step | Backward stepping flow | Re = 800, 1000, 1200, 1500 | 600, 600, 1000, 1000 | 0.043 | 0.155 | 0.292 |
| Cylinder | Flow over a circular cylinder | Re = 40, 150, 200, 300, 400 | 50, 500, 500, 500, 500 | 0.202 | 0.315 | 0.312 |
| DNS-turbulence | A homogeneous and isotropic turbulence flow | - | 2000 | 0.204 | 0.589 | 0.783 |
| JHTDB-channel | Channel flow | - | 1900 | 0.080 | 0.218 | 0.311 |
| JHTDB-channel hd | Forced isotropic turbulence | - | 600 | 0.052 | 0.195 | 0.244 |
| JHTDB-isotropic 1024 hd | Forced isotropic turbulence | - | 2000 | 0.140 | 0.288 | 0.313 |
| JHTDB-mhd 1024 hd | Forced MHD turbulence | - | 800 | 0.090 | 0.349 | 0.382 |
| SQG | Sea surface flow driven by SQG model | - | 1500 | 0.203 | 0.652 | 0.875 |
| Uniform | Uniform flow | Displacement 0–5 pixels | 1000 | 0.033 | 0.141 | 0.253 |

Figure 1 shows that the PIV-LiteFlowNet-en
network has the best accuracy and a small error variation for all flow cases
considered. The LiteFlowNet network, which was not trained for PIV tasks,
has a larger error but still shows good results, indicating its versatility
for various applications. The cross-correlation method generally showed the
worst results, except for the "Cylinder" case, where its mean error is close to
that of LiteFlowNet but it has a significant error variation.
Cross-correlation processing has another disadvantage:
a vector field of lower density. While the neural networks produce a field with
a resolution equal to the size of the input image, the cross-correlation
field is 4 times less dense along each axis because of the 4-pixel step between
the interrogation windows. On the other hand, cross-correlation has the
advantage that it is computed entirely on the CPU, while the neural networks
need a GPU to realize their processing speed advantage.
Figure 1. Results of
computer modeling for two networks and cross-correlation
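The difference in vector field density can be made concrete by counting interrogation windows. The sketch below uses the 24-pixel window and 4-pixel step from the processing parameters; the 256×256 image size and the function name are assumptions made for illustration.

```python
def vector_grid_shape(image_height, image_width, window_size, step):
    """Number of interrogation windows that fit along each image axis."""
    rows = (image_height - window_size) // step + 1
    cols = (image_width - window_size) // step + 1
    return rows, cols

# Image size of 256x256 pixels is an assumed value for illustration
rows, cols = vector_grid_shape(256, 256, window_size=24, step=4)
sparse_vectors = rows * cols    # one vector per interrogation window
dense_vectors = 256 * 256       # one vector per pixel from a network
```

Under these assumptions cross-correlation yields a 59×59 grid (3481 vectors), whereas a per-pixel field contains 65536 vectors.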
To evaluate the compared algorithms
in physical modeling, 150 pairs of experimental
images of a surface with different deformations were used. The images were
obtained using the imitator of deformable surface (IDS) described in [24]. The
IDS makes it possible to set the shape of the surface arbitrarily by means of
digital servos.
The IPCT method, following the algorithm
described in [5], was used to reconstruct the surface shape. As pre-processing,
the background pattern images were aligned using fiducial markers so that
the measured surface was oriented perpendicular to the optical axis of
the camera. The size of each stereo pair was individual, but usually did not
exceed 1700×1500 pixels. The second stage of image processing was
cross-correlation analysis (described in 2.1), which produced vector fields
of surface point displacements between the stereo pair images. The third stage
was triangulation to determine the three-dimensional surface coordinates.
Figure 2 shows an example of stereo pair processing using PIV-LiteFlowNet-en.
The results of the perspective transformation are shown in Figure 2(c-d).
Figure 2. Example of stereo pair processing in experimental
modeling, all samples measured in pixels: a, b – original images; c, d –
results of perspective transformations; e – visualization of sparse vector
field; f – visualization of vector field in full resolution; g – representation
of deformation amplitude using color map
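The perspective transformation used in pre-processing can be illustrated by estimating a homography from four marker positions. The sketch below uses the direct linear transform; it is an assumed, generic way to obtain such a transformation and not necessarily the procedure of [5]. The marker coordinates and function names are hypothetical.

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography mapping src to dst (at least 4 point pairs) by DLT."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence in the nine entries of H
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, points):
    """Map an Nx2 array of pixel coordinates through the homography H."""
    pts = np.hstack([np.asarray(points, dtype=float), np.ones((len(points), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical marker corners as seen by the camera and in the rectified image
markers_in_image = [(120.0, 80.0), (1510.0, 140.0), (1580.0, 1390.0), (90.0, 1310.0)]
markers_rectified = [(0.0, 0.0), (1500.0, 0.0), (1500.0, 1300.0), (0.0, 1300.0)]
H = homography_from_points(markers_in_image, markers_rectified)
```

Warping both images of the stereo pair with their own homographies brings the pattern into a common, fronto-parallel frame before the displacement field is computed.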
Before comparing the two selected networks
and cross-correlation, it is necessary to establish equivalent conditions
for these algorithms. Figure 3 shows the vector fields for the same stereo
pair at different image resolutions. The figure shows that the
PIV-LiteFlowNet-en network cannot cope with displacements greater than ~12-13
pixels. For the LiteFlowNet network, this value is ~80-90 pixels.
Figure 3. PIV-LiteFlowNet-en processing results for experimental
images with different input image resolutions, colormap shows displacements in
pixels
In order to compare networks with different
ranges of measurable displacements, it was decided to perform the calculation
for images with different initial resolutions. In this case the displacements in
the images change in proportion to the image size, which allows
the algorithms to be compared on the same experimental data. The
RMS of the reprojection error was calculated for 10 resolutions of each
experimental stereo pair. The series of resolutions used in the calculations
was obtained by the formula

$$R_k = R_0 \left( 1 - \frac{k}{10} \right), \qquad (7)$$

where $k$ = 0, 1, 2, …, 9 and $R_0$ is the original size
of the side of the image. In this case, the aspect ratio of the images is
preserved. For the cross-correlation algorithm it was also necessary to define
image processing parameters. It was impossible to choose universal
parameters, because the displacements can exceed 100 pixels.
Therefore, the size of the interrogation window had to be determined
individually for each resolution. The final processing parameters of the three
tested algorithms are presented in Table 2.
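The series of resolutions can be generated with a few lines of code. The sketch assumes that each side of the image is reduced in ten equal steps of one tenth of its original size, which reproduces the resolutions listed in Table 2; the function name is hypothetical.

```python
def resolution_series(width, height, steps=10):
    """Image sizes with each side scaled by (1 - k / steps), k = 0 .. steps - 1."""
    return [
        (round(width * (steps - k) / steps), round(height * (steps - k) / steps))
        for k in range(steps)
    ]

series = resolution_series(1700, 1500)
# A displacement scales with the image: 100 px at full size shrinks with k
scaled_displacements = [100 * (10 - k) / 10 for k in range(10)]
```

Because both sides are scaled by the same factor, the aspect ratio is preserved and a displacement of 100 pixels at full resolution becomes 10 pixels at the smallest size.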
Figure 4 shows the result of testing the
three algorithms. Each curve is an average over 150 stereo pairs. To
test the influence of the overall image intensity, an inverted version
was created from each pair of images. This is motivated by the
fact that the IPCT method typically uses black dots on a white background,
while PIV images typically show white dots on a black background.
Table 2. Image processing parameters for the algorithms being tested

| Resolution, pixels | Interrogation window size, pixels | Interrogation window step, pixels |
|---|---|---|
| 1700×1500 | 256 | 128 |
| 1530×1350 | 256 | 128 |
| 1360×1200 | 196 | 98 |
| 1190×1050 | 196 | 98 |
| 1020×900 | 128 | 64 |
| 850×750 | 128 | 64 |
| 680×600 | 64 | 32 |
| 510×450 | 64 | 32 |
| 340×300 | 32 | 16 |
| 170×150 | 32 | 16 |

The following conclusions can be drawn from
the processing results:
1. The cross-correlation algorithm shows a
stable reprojection error for almost all resolutions. It is not affected by
intensity inversion.
2. LiteFlowNet shows the best results among
all the algorithms, although image inversion negatively affects its performance.
3. PIV-LiteFlowNet-en shows poor results
because of the large displacements in the images. At a resolution of 340×300
pixels, the displacements become quite small, but because of the strong
downscaling the image quality does not allow the algorithm to achieve high
accuracy. Image inversion improves the performance of the algorithm.
4. At 340×300 and 170×150
resolution, all algorithms show a decrease in accuracy due to the strong image
downscaling.
Figure 4. Average RMS value of the reprojection error for
different resolutions for the three algorithms studied in physical modeling
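The RMS of the reprojection error is not defined explicitly in the text. A minimal sketch under a common definition is given below: the triangulated points are projected back into both images with the pinhole model, and the root mean square of the pixel distances to the observed positions is taken over both views. The camera matrices, surface points, and function names are hypothetical.

```python
import numpy as np

def project(P, points_3d):
    """Project Nx3 points with a 3x4 pinhole projection matrix; returns Nx2 pixels."""
    homogeneous = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    projected = homogeneous @ P.T
    return projected[:, :2] / projected[:, 2:3]

def rms_reprojection_error(P1, P2, points_3d, observed_1, observed_2):
    """RMS pixel distance between reprojected and observed points over both images."""
    residual_1 = project(P1, points_3d) - observed_1
    residual_2 = project(P2, points_3d) - observed_2
    squared = np.sum(residual_1 ** 2, axis=1) + np.sum(residual_2 ** 2, axis=1)
    return float(np.sqrt(squared.sum() / (2 * len(points_3d))))

# Hypothetical stereo rig and surface points, for illustration only
K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 480.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])
surface = np.array([[0.0, 0.0, 2.0], [0.1, -0.1, 2.1], [-0.2, 0.05, 1.9]])

# Observations offset by (0.6, 0.8) px, i.e. 1 px away from the ideal projections
offset = np.array([0.6, 0.8])
error = rms_reprojection_error(P1, P2, surface,
                               project(P1, surface) + offset,
                               project(P2, surface) + offset)
```

In this synthetic case every observation lies one pixel from its ideal projection, so the RMS error is one pixel.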
To better understand the behavior of the
algorithms, Figure 5 shows the RMS error for two stereo pairs, one with small
and one with large displacements in the images. Figure 5(a) shows the graphs for
small and large displacements on the same axes, 5(b) shows the displacement
value at each resolution, and 5(c) shows an enlarged area of 5(a), demonstrating
the behavior of the algorithms at small displacements. Comparing the graphs, we
can once again see that the maximum displacement that can be estimated is about
10 pixels for PIV-LiteFlowNet-en and about 80 pixels for LiteFlowNet.
Figure 5. Average RMS value of the reprojection error for
different displacement amplitudes in physical modeling: a – RMS of the
reprojection error for small and large displacements; b – offset value for the
two cases; c – enlarged area of the graph (a) to show the behavior of the
algorithms at small displacements
For cross-correlation, the
maximum measurable displacement depends on the size of the interrogation window:
the maximum measured displacement should be less than 1/2 or 1/3 of the
interrogation window size. Graphs 5(b) and 5(c) show that once
the displacements in the images drop to 10 pixels or fewer, the
PIV-LiteFlowNet-en network shows better results than the others, which further
confirms its maximum estimated displacement. The average RMS of the
reprojection error in Figure 5 is quite large for all algorithms, i.e., close
to or greater than 1 pixel. This is explained by the fact that at large
displacements, as in Figure 5, the error increases dramatically, which
raises the average error.
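The rule relating the maximum displacement to the interrogation window size can be turned into a simple selection helper. The sketch below picks the smallest window from a list of candidate sizes such that the expected displacement stays below the chosen fraction of the window; the function name and the power-of-two candidate list are assumptions and differ from the sizes actually used in Table 2.

```python
def min_window_size(max_displacement, fraction=1 / 3,
                    candidates=(16, 32, 64, 128, 256, 512)):
    """Smallest candidate window whose allowed displacement exceeds the expected one."""
    for size in candidates:
        if max_displacement < fraction * size:
            return size
    return None  # no candidate window is large enough

small = min_window_size(10)                  # displacement of 10 px, 1/3 rule
large = min_window_size(100, fraction=0.5)   # displacement of 100 px, 1/2 rule
```

With these assumed candidates, a 10-pixel displacement calls for a 32-pixel window under the 1/3 rule, and a 100-pixel displacement calls for a 256-pixel window under the 1/2 rule.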
Based on the plots in Figure 5, the minimum
error for PIV-LiteFlowNet-en is achieved at 510×450 resolution and for
LiteFlowNet at 680×600. Figure 6 shows plots of the reprojection error for each
of the 150 image pairs at these resolutions; the 850×750 and 1700×1500
resolutions are also plotted for comparison. All graphs are sorted in order of
increasing error for clarity. All plots, except for the 1700×1500 case, show the
same patterns. The LiteFlowNet network has a smaller error than the
cross-correlation algorithm, with approximately the same graph shape; only in a
few cases does cross-correlation exceed the accuracy of the network. The
PIV-LiteFlowNet-en network has the best accuracy of all the algorithms in about
50 cases. At the same time, it performs better on inverted images. The exception
is the case at 510×450 resolution, where the accuracy for the inverted images
does not fall far behind that for the original images.
Figure 6. RMS error for each captured image pair for the three
investigated algorithms at different resolutions in physical modeling
The PIV-LiteFlowNet-en
network performs best in only ~50 cases because even at
510×450 resolution most image pairs have displacements greater than 10 pixels.
Therefore, this network outperforms the other algorithms in only 1/3 of the
cases. To demonstrate that none of the algorithms is able to process
full-resolution images with high accuracy, the 1700×1500 case is given. It shows
that acceptable accuracy is achieved by cross-correlation in about 10 cases and
by LiteFlowNet in about 50 cases, which is less than half of the entire set.
This is due to the large displacements in the images.
The paper describes the application of
neural networks to the reconstruction of the three-dimensional shape of an
object surface by photogrammetry. Their results were compared with
those of the well-established algorithm based on cross-correlation, which at an
acceptable speed can estimate only a sparse vector field, from which
three-dimensional points are calculated by triangulation. To address
this problem, we reviewed machine learning methods and selected two neural
networks, LiteFlowNet and PIV-LiteFlowNet-en. These networks
estimate the vector field at full image resolution and at the same time have
a higher calculation speed than cross-correlation. However, the full
gain in speed can be obtained only with a graphics processor.
It was found that the neural networks have a
limit on the magnitude of displacement they can estimate correctly. For
PIV-LiteFlowNet-en this limit was 12-13 pixels, and for LiteFlowNet about 80
pixels. For the first network, this can be explained by the training sample, and
for the second by the network design. A difference in how the networks process
original and inverted images was also revealed, which is likewise a consequence
of the training samples.
According to the processing results,
LiteFlowNet outperformed both the cross-correlation algorithm and
PIV-LiteFlowNet-en over all image resolutions taken together. However, when
compared within the limitations of the algorithms, PIV-LiteFlowNet-en has better
accuracy. At the same time, none of the methods is satisfactory for processing
images typical of photogrammetry at full resolution. For full application of
such neural networks, they must be modified for the task under investigation.
The physical modeling conducted to test
the selected image processing approaches for the photogrammetry problem
demonstrated that they are workable and efficient. However, several problems
must be solved before they can be applied in practice. The selected neural
networks are not fully suitable for the problem under study because of the limit
on the estimated displacement and the high complexity of running them. For
successful practical application of machine learning it is necessary to modify
the design of the selected neural networks, or to develop a new design, and to
train them on experimental images specific to photogrammetry.
The investigation has been carried out within the framework of the project
“Development of a machine vision system for determining the position of objects in space
based on fiducial markers” with the support of a subvention from the National Research University
“MPEI” for implementation of the internal research program “Priority 2030: Future Technologies” in 2022-2024.
[1] Luhmann T., Robson S., Kyle S., Boehm J. Close-range photogrammetry and 3D imaging // Close-Range Photogrammetry and 3D Imaging, de Gruyter, 2019. 708 p. (doi:10.1515/9783110607253)
[2] Meyer R., Kirmse T., Boden F. Optical in-flight wing deformation measurements with the image pattern correlation technique // New Results in Numerical and Experimental Fluid Mechanics IX. Springer, Cham, 2014. Vol. 124. Pp. 545–553. (doi:10.1007/978-3-319-03158-3_55)
[3] Kirmse T. Recalibration of a stereoscopic camera system for in-flight wing deformation measurements // Meas. Sci. Technol., 2016. Vol. 27. No. 5. P. 54001. (doi:10.1088/0957-0233/27/5/054001)
[4] Boden F., Lawson N., Jentink H.W., Kompenhans J. Advanced In-Flight Measurement Techniques, 2013. 344 p. (doi:10.1007/978-3-642-34738-2)
[5] Poroykov A.Yu., Surkov D.A., Ulyanov D.B., Ilyinac N.S., Shmatko E.V., Pinchukov V.V. Development of an on-board measuring system for diagnosing deformation of aerodynamic surfaces in a flight experiment // Journal of communications technology and electronics, 2021. Vol. 66. No. 11. Pp. 1274–1281. (doi:10.1134/S1064226921110073)
[6] Raffel M., Willert C.E., Scarano F. et al. Particle Image Velocimetry: A Practical Guide // Berlin: Springer, 2018. 669 p. (doi:10.1007/978-3-319-68852-7)
[7] Schreier H., Orteu J.-J., Sutton M.A. Image Correlation for Shape, Motion and Deformation Measurements // Springer: New York, NY, USA, 2009. 322 p. (doi:10.1007/978-0-387-78747-3)
[8] Grant I., Pan X. The use of neural techniques in PIV and PTV // Meas. Sci. Technol., 1997. Vol. 8. No. 12. Pp. 1399–1405. (doi:10.1088/0957-0233/8/12/004)
[9] Rabault J., Kolaas J., Jensen A. Performing particle image velocimetry using artificial neural networks: a proof-of-concept // Meas. Sci. Technol., 2017. Vol. 28. No. 12. P. 125301. (doi:10.1088/1361-6501/aa8b87)
[10] Lee Y., Yang H., Yin Z. PIV-DCNN: cascaded deep convolutional neural networks for particle image velocimetry // Experiments in Fluids, 2017. Vol. 58. No. 12. P. 171. (doi:10.1007/s00348-017-2456-1)
[11] Sun Y., Wang X., Tang X. Deep Convolutional Network Cascade for Facial Point Detection // 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013. Pp. 3476–3483. (doi:10.1109/CVPR.2013.446)
[12] Dosovitskiy A. et al. FlowNet: Learning Optical Flow with Convolutional Networks // 2015 IEEE International Conference on Computer Vision (ICCV), 2015. Pp. 2758–2766. (doi:10.1109/ICCV.2015.316)
[13] Cai S., Zhou Sh., Xu Ch., Gao Q. Dense motion estimation of particle images via a convolutional neural network // Experiments in Fluids, 2019. Vol. 60. No. 73. (doi:10.1007/s00348-019-2717-2)
[14] Brunton S.L., Noack B.R., Koumoutsakos P. Machine learning for fluid mechanics // Annual review of fluid mechanics, 2020. Vol. 52. Pp. 477–508. (doi:10.1146/annurev-fluid-010719-060214)
[15] Znamenskaya I.A. Methods for Panoramic Visualization and Digital Analysis of Thermophysical Flow Fields. A Review. // Scientific Visualization, 2021. Vol. 13. No. 3. Pp. 125–158. (doi:10.26583/sv.13.3.13)
[16] Gim Y., Jang D.K., Sohn D.K., Kim H., Ko H.S. Three-dimensional particle tracking velocimetry using shallow neural network for real-time analysis // Experiments in Fluids, 2020. Vol. 61. P. 26. (doi:10.1007/s00348-019-2861-8)
[17] Cai S., Liang J., Gao Q., Xu C., Wei R. Particle Image Velocimetry Based on a Deep Learning Motion Estimator // IEEE Transactions on Instrumentation and Measurement, 2020. Vol. 69. No. 6. Pp. 3538–3554. (doi:10.1109/TIM.2019.2932649)
[18] Yoo J. C., Han T.H. Fast Normalized Cross-Correlation // Circuits, systems and signal processing, 2009. Vol. 28. No. 6. P. 819. (doi:10.1007/s00034-009-9130-7)
[19] Caffe | Deep Learning Framework [Electronic resource] URL: https://caffe.berkeleyvision.org (accessed 16.11.2022).
[20] PyTorch [Electronic resource] URL: https://pytorch.org (accessed 16.11.2022).
[21] Google Colab [Electronic resource] URL: https://colab.research.google.com (accessed 16.11.2022).
[22] Pinchukov V.V., Poroykov A.Yu., Shmatko E.V., Bogachev A.D., Sivov N.Yu. Comparison of the neural networks with cross-correlation algorithm for the displacements on images estimation // 2022 Wave Electronics and its Application in Information and Telecommunication Systems (WECONF), 2022. Pp. 1–5. (doi:10.1109/WECONF55058.2022.9803453)
[23] Shmatko E.V., Pinchukov V.V., Bogachev A.D., Poroykov A.Yu. Crosscorrelation image processing for surface shape reconstruction using fiducial markers // Journal of Physics: Conference Series, 2021. Vol. 2127. P. 012030. (doi:10.1088/1742-6596/2127/1/012030)
[24] Ivanova Y.V., Poroykov A.Y. Estimation of the measurement error of photogrammetric techniques by controlled flexible deformable surface // 2019 International Youth Conference on Radio Electronics, Electrical and Power Engineering (REEPE), IEEE, 2019. Pp. 1–5. (doi:10.1109/REEPE.2019.8708779)