3.1 The M-SSD model
In this paper, an improved SSD model is designed. The SSD approach produces a fixed-size
collection of bounding boxes and scores for the presence of object class instances,
using a feed-forward convolutional network followed by a non-maximum suppression
step to perform the object detection. The model uses the Visual Geometry Group network
(VGG-16) as its base structure. However, it discards the last fully connected layers,
adds a set of auxiliary convolutional layers to extract features at multiple scales,
and decreases the input size to each subsequent layer. This improves the detection
accuracy for small objects compared with other existing algorithms. For this kind
of model structure, however, the number of network weights is large, considerable
disk space is required, and the detection speed is slow. It is therefore not suitable
for platforms with limited computing power or for small-storage real-time detection
systems.
Wei Liu (17) analyzed the SSD model structure and pointed out that the forward pass time is
spent mainly in the base network (nearly 80%). Therefore, for real-time applications,
using a faster base network can reduce the amount of computation and greatly improve
the speed. ResNet (18) was first proposed by Kaiming He and has proven to be an efficient
network. Lili Chen (19) replaced the base feature extraction network with ResNet-34 and
obtained fast detection speeds for vehicle counting. Note that, in our single former
USV object detection system, it is unnecessary to use very many network layers for
feature extraction. We therefore choose ResNet-18 as the base feature extraction
network, in order to obtain real-time detection performance.
The whole ResNet-18 model structure comprises a convolutional layer, four basic
block layers and a final fully connected layer, as shown in detail in Fig. 1. This
structure avoids the vanishing-gradient problem caused by deepening the neural
network, and its efficiency is simultaneously improved by the introduced basic blocks.
Fig. 1. The flowchart of ResNet-18
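To make the role of the basic blocks concrete, the following is a minimal sketch of a ResNet basic block in Python with tf.keras. It illustrates the standard two-3×3-convolution residual design rather than the authors' exact implementation; the function name and channel arguments are assumptions made for illustration.

```python
import tensorflow as tf

def basic_block(x, filters, stride=1):
    """Minimal ResNet basic block: two 3x3 convolutions with batch
    normalization, plus a shortcut connection around them."""
    shortcut = x
    y = tf.keras.layers.Conv2D(filters, 3, strides=stride,
                               padding="same", use_bias=False)(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.Activation("relu")(y)
    y = tf.keras.layers.Conv2D(filters, 3, strides=1,
                               padding="same", use_bias=False)(y)
    y = tf.keras.layers.BatchNormalization()(y)
    # Project the shortcut when the spatial size or channel count changes,
    # so that the element-wise addition is well defined.
    if stride != 1 or x.shape[-1] != filters:
        shortcut = tf.keras.layers.Conv2D(filters, 1, strides=stride,
                                          use_bias=False)(x)
        shortcut = tf.keras.layers.BatchNormalization()(shortcut)
    y = tf.keras.layers.Add()([y, shortcut])
    return tf.keras.layers.Activation("relu")(y)
```

Because the shortcut carries the input signal directly to the block output, gradients can flow through the addition unattenuated, which is what mitigates the vanishing-gradient problem mentioned above.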
In real-time object detection tasks, large and excessively numerous convolution kernels
increase the computational cost, dilute the effective features and reduce the real-time
control accuracy. The authors of (20), (21) show that 1×1 and 3×3 kernels have fewer
parameters but stronger feature generalization ability than 5×5 and 7×7 kernels. In
addition, a block of two convolutional layers with 3×3 kernels covers the same receptive
field as a single 5×5 convolutional layer as the convolutional window scans the input,
so the original throughput is kept. However, it requires fewer parameters, and the
stacked convolutional layers yield a better result.
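A quick parameter count illustrates this point; the channel width of 128 below is only an example value, not one taken from the model:

```python
def conv_params(k, c_in, c_out):
    """Number of weights in a k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

c = 128  # example channel width, chosen only for illustration
one_5x5 = conv_params(5, c, c)      # 409,600 weights
two_3x3 = 2 * conv_params(3, c, c)  # 294,912 weights
print(two_3x3 / one_5x5)  # 0.72: same 5x5 receptive field, ~28% fewer weights
```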
Table 1. Parameters of M-SSD from the FC6 to conv9_2 layers

Layer   | Input size | Output size | Kernel size | Input channels | Output channels
FC6     | 38×38      | 19×19       | 3×3         | 256            | 512
FC7     | 19×19      | 19×19       | 1×1         | 512            | 512
conv6_1 | 19×19      | 10×10       | 1×1         | 512            | 256
conv6_2 | 19×19      | 10×10       | 3×3         | 128            | 256
conv7_1 | 10×10      | 5×5         | 1×1         | 256            | 128
conv7_2 | 10×10      | 5×5         | 3×3         | 64             | 128
conv8_1 | 5×5        | 3×3         | 1×1         | 128            | 128
conv8_2 | 5×5        | 3×3         | 3×3         | 64             | 64
conv9_1 | 3×3        | 1×1         | 3×3         | 128            | 128
conv9_2 | 3×3        | 1×1         | 1×1         | 64             | 64
Inspired by these methods from the literature, two modifications are made to the
original SSD model: (a) we retain the SSD structure but replace VGG-16 with ResNet-18
as the base feature extraction network, followed by several convolutional layers to
detect the object; (b) we replace the convolutional kernels from the FC6 to conv9_2
layers and use 1×1 convolutional kernels to classify the object. The layer
specifications from FC6 to conv9_2 are detailed in Table 1, while the M-SSD model
structure is presented in Fig. 2.
Fig. 2. The overall structure of the M-SSD
In contrast to the SSD model, we choose the res3d, fc6, fc7, conv6_1, conv7_1,
conv8_1 and conv9_1 layers as the regression feature map layers to classify the
object. In each feature map layer, 1×1 denotes the size of the convolutional kernel,
3 or 6 denotes the number of prior boxes, and 4 denotes the number of bounding-box
offset values.
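As a concrete illustration, the number of output channels of each 1×1 detection head follows directly from these quantities; the two-class setting below (USV vs. background) is an assumption made for illustration:

```python
# Each 1x1 detection head outputs, per feature-map location and per prior
# box, the class scores plus the 4 bounding-box offset values.
num_classes = 2  # assumed: USV vs. background
for num_priors in (3, 6):
    out_channels = num_priors * (num_classes + 4)
    print(f"{num_priors} priors -> {out_channels} output channels")
```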
Afterwards, the M-SSD model parameters are set for the proposed real-time detection
system as follows:
Ⅰ. Select the default box parameters: in a CNN, the feature maps located at different
layers have receptive fields of different sizes. To correctly detect moving targets
at different scales, some algorithms convert the input image to different scales,
process each converted image and fuse the detection results (22), (23). The strategy
proposed in (24) is instead based on the fact that the default frame does not need
to correspond one-to-one with the receptive field of the feature map.
The default frames at different positions correspond to different regions and target
sizes. Assuming that $m$ feature maps should be predicted, the default frame size
in the $k$-th feature map is calculated as

$$S_{k}= S_{\min}+\frac{S_{\max}-S_{\min}}{m-1}(k-1),\qquad k\in[1,\: m],$$

where $S_{\min}$ is the default frame size of the lowest layer, with a value of 0.1,
and $S_{\max}$ is the default frame size of the highest layer, with a value of 0.96,
in the network structure.
The scales of the different layers are thus spaced at regular intervals. The
width-to-height ratio of the default frame is $a_{r}\in\{1,\: 2,\: 3,\: 1/2,\: 1/3\}$.
The width and height of each default frame are respectively given by

$$w_{k}^{a}= S_{k}\sqrt{a_{r}},\qquad h_{k}^{a}= S_{k}/\sqrt{a_{r}}.$$
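The sketch below evaluates these two formulas in Python; the choice of $m=7$ reflects the seven regression feature map layers listed above and is an assumption, as the paper does not state $m$ explicitly:

```python
import math

S_MIN, S_MAX, M = 0.1, 0.96, 7  # assumed m = 7 regression feature map layers

def scale(k):
    """Default frame scale S_k for the k-th feature map, k = 1..m."""
    return S_MIN + (S_MAX - S_MIN) * (k - 1) / (M - 1)

def default_box(k, a_r):
    """Width and height of a default frame with aspect ratio a_r."""
    s_k = scale(k)
    return s_k * math.sqrt(a_r), s_k / math.sqrt(a_r)

for k in range(1, M + 1):
    w, h = default_box(k, a_r=2)
    print(f"layer {k}: S_k = {scale(k):.3f}, 2:1 box = {w:.3f} x {h:.3f}")
```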
Ⅱ. Choose the matching strategy: when generating the M-SSD detection model, this
strategy selects matching default boxes for each true label box. For each true label,
it finds the default box with the highest Jaccard overlap among all candidate default
boxes, and it further matches default boxes by thresholding the Jaccard overlap
coefficient.
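A minimal sketch of this matching step follows; boxes are assumed to be in (x1, y1, x2, y2) form, and the 0.5 overlap threshold is the conventional SSD value, assumed here rather than stated in the paper:

```python
def jaccard(box_a, box_b):
    """Jaccard overlap (IoU) of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def match(gt_boxes, default_boxes, threshold=0.5):
    """For each ground-truth box, keep the best-overlapping default box
    plus every default box whose overlap exceeds the threshold."""
    matches = []
    for gt in gt_boxes:
        overlaps = [jaccard(gt, d) for d in default_boxes]
        best = max(range(len(default_boxes)), key=overlaps.__getitem__)
        matched = {best} | {i for i, o in enumerate(overlaps) if o > threshold}
        matches.append(sorted(matched))
    return matches
```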
Ⅲ. Select the loss function: the softmax loss $l_{i}= -\log(e^{S_{y_i}}/\sum_{j}
e^{S_{j}})$ is selected as the loss function, where $S_{j}$ is the score of class $j$
and $y_{i}$ is the true label of the real object. The total loss function $L$ is then
given by

$$L=\frac{1}{N}\sum_{i=1}^{N} l_{i},$$

where $N$ is the total number of images.
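In NumPy, the per-sample softmax loss and the total loss $L$ can be written as the following sketch; the max-shift is a standard numerical-stability device, not part of the formula itself:

```python
import numpy as np

def softmax_loss(scores, y):
    """l_i = -log(exp(S_y) / sum_j exp(S_j)), with a max-shift so that
    the exponentials cannot overflow."""
    shifted = scores - np.max(scores)
    return float(-shifted[y] + np.log(np.sum(np.exp(shifted))))

def total_loss(score_list, labels):
    """L = (1/N) * sum_i l_i over the N images."""
    return np.mean([softmax_loss(s, y) for s, y in zip(score_list, labels)])

# Illustrative scores for two images with true labels 0 and 1.
print(total_loss([np.array([2.0, 0.5]), np.array([0.1, 1.2])], [0, 1]))
```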
An objective function always exists during model training; the loss function is
optimized until its value reaches a minimum. The M-SSD training model is developed
on the TensorFlow deep learning framework.
Based on this design, the algorithm complexity is reduced. The advantage of the
proposed design is shown in the following comparative analysis.
We use SSD for object detection because the SSD framework is designed to be
independent of the base network and is used to accurately classify and locate
targets. It can run on any base network (such as VGG, ResNet or MobileNet).
Therefore, we can use different base networks for neural network learning and
different numbers of regression layers (from 6 to 8) to estimate their accuracy. It
is a very useful neural network framework for improving detection accuracy and
speed. YOLO and its improved editions, YOLO v3 and YOLO v5, have been proposed for
multiple-object detection. However, for real-time detection tasks on mobile
terminals, the SSD framework is still the better choice, since its combined accuracy
and speed are particularly outstanding when it is used with a lightweight structure
to detect objects.
3.2 M-SSD model training/testing
The next step consists of training and testing the proposed M-SSD model for object
detection. The hardware specifications of the experimental environment are shown in
Table 2. The CPU, with 16 GB of RAM, is used to train the M-SSD model, and the GPU
greatly improves the training speed. The CUDA 10.0/CUDNN 8.0.0 libraries and the
Python 3.6/TensorFlow 1.8 platform are used to train the model quickly and
effectively. The trained model runs on the Ubuntu 18.04 operating system, using a
camera to capture real-time objects at a resolution of 1024×768.
Table 2. Hardware specification

Hardware device  | Parameter
CPU              | Intel(R) Core(TM) i7-8750H
RAM              | 16 GB
GPU              | NVIDIA GeForce GTX 1060
Operating system | Ubuntu 18.04
CUDA/CUDNN       | CUDA 10.0/CUDNN 8.0.0
Platform         | Python, TensorFlow
Camera           | USB HD, resolution 1024×768
Table 3. The parameter initialization

Parameter    | Value
base_lr      | 0.0001
max_iter     | 50000
lr_policy    | step
gamma        | 0.1
momentum     | 0.9
weight_decay | 0.0005
image_size   | 300×300
type         | SGD
BN           | 32
An image database containing 2000 images was built. The images were collected under
different external environments and illumination intensities, with a 3:1 ratio of
positive images (containing the USV) to negative images (without the object). Some
of the images were flipped, stretched or compressed to improve the universality of
the data set. Accordingly, 80% of the images were used for training and the remaining
20% for testing. In the base network, the images captured by the camera were resized
to 300×300 before being input to the network model. The model is trained using
stochastic gradient descent (SGD) with an initial learning rate (base_lr) of 0.0001,
a momentum of 0.9, a weight decay of 0.0005 and a batch normalization (BN) size of
32. The network was trained for 50,000 iterations and successfully converged. The
other parameters are detailed in Table 3.
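A sketch of this configuration in the TensorFlow 1.x API is shown below. The decay step size of the "step" learning-rate policy is not listed in Table 3, so the value of 10,000 iterations is a hypothetical placeholder, and the variable `w` merely stands in for the model weights:

```python
import tensorflow as tf  # TensorFlow 1.x API, matching the paper's setup

global_step = tf.train.get_or_create_global_step()
# "step" policy: lr = base_lr * gamma^floor(step / decay_steps).
learning_rate = tf.train.exponential_decay(
    learning_rate=0.0001,   # base_lr from Table 3
    global_step=global_step,
    decay_steps=10000,      # hypothetical: not given in Table 3
    decay_rate=0.1,         # gamma from Table 3
    staircase=True)
optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)

w = tf.Variable(tf.zeros([3, 3, 3, 16]))  # stand-in for the model weights
l2_penalty = 0.0005 * tf.nn.l2_loss(w)    # weight_decay as an L2 term
# In the real model, the detection loss is added to the penalty:
# train_op = optimizer.minimize(detection_loss + l2_penalty,
#                               global_step=global_step)
```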
Some of the labeled images used for training/validation are illustrated in Fig. 3.
The experiment was conducted in a pool area at Kyungnam University in South Korea.
The training/validation accuracy of the proposed model is presented in Fig. 4; the
classification accuracy reaches 96.75%. Some classification and accuracy results for
successful detections are shown in Fig. 5.
To evaluate the performance of the proposed detection system, the following four
evaluation criteria are used:

$$\text{Precision}=\frac{TP}{TP+FP},\qquad \text{Recall}=\frac{TP}{TP+FN},$$

$$\text{Accuracy}=\frac{n-a}{n}=\frac{TP+TN}{TP+FP+FN+TN},\qquad F1=\frac{2\cdot\text{Precision}\cdot\text{Recall}}{\text{Precision}+\text{Recall}},$$

where $a$ and $n$ respectively denote the number of misclassified samples and the
total number of samples, TP (true positive) refers to a positive sample predicted
as positive (a correct result), FP (false positive) refers to a negative sample
predicted as positive (a false alarm), FN (false negative) refers to a positive
sample predicted as negative (a missed detection), and TN (true negative) refers to
a negative sample predicted as negative.
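Computed from the four counts, the criteria look as follows in Python; the counts passed in at the end are illustrative only, not results from the paper:

```python
def evaluate(tp, fp, fn, tn):
    """Precision, recall, accuracy and F1 from the confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # equals (n - a) / n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

print(evaluate(tp=190, fp=5, fn=8, tn=197))  # illustrative counts only
```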
Fig. 3. Part of the images used for training
The proposed M-SSD model is compared with SSD (10), R-SSD (24) and F-SSD (18), using
the four aforementioned criteria: precision, recall, accuracy and F1. The results are
shown in Fig. 6. It can be observed that the proposed M-SSD model achieves a higher
detection performance than SSD, reaching an accuracy of 96.75%. This is because
ResNet-18, whose residual structure provides stronger feature extraction, is used to
extract the basic feature information. However, M-SSD has a lower detection
performance than R-SSD and F-SSD, because the proposed model has fewer layers than
the ResNet-50 of R-SSD and the ResNet-34 of F-SSD. This conversely confirms that
higher accuracy requires deeper networks. However, higher accuracy alone does not
mean better detection performance: the computation time, given in Table 4, is another
parameter for performance estimation. It can be seen from Table 4 that the
computation time of the proposed M-SSD model is 424.36 s, which is 26.35% less than
that of the SSD model and much less than those of R-SSD and F-SSD. The proposed
design thus improves both the detection performance and the detection speed. It can
also be implemented on mobile terminals such as the Raspberry Pi and the Jetson Nano.
Fig. 4. Accuracy results of the proposed model
Fig. 5. Output of the M-SSD testing
For our collected USV data set, the FPS of SSD is about 67 at an input resolution of
300×300, while the FPS of the proposed M-SSD model is about 86 at the same
resolution. When the trained model file is downloaded to the Jetson Nano mobile
terminal, the FPS of the proposed model is about 32, which achieves real-time former
USV detection.
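For reference, FPS figures of this kind can be obtained by timing the forward pass over a batch of frames, as in the hypothetical helper below; `detect` stands in for one inference call of the trained model:

```python
import time

def measure_fps(detect, frames):
    """Average frames per second of `detect` over an iterable of frames."""
    start = time.time()
    count = 0
    for frame in frames:
        detect(frame)  # one forward pass of the trained model
        count += 1
    return count / (time.time() - start)
```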
Fig. 6. Performance comparison of different models
Table 4. Computation time of the methods (s)

Method     | Basic network | Time
SSD (10)   | VGG-16        | 576.25
R-SSD (24) | ResNet-50     | 824.36
F-SSD (18) | ResNet-34     | 720.64
M-SSD      | ResNet-18     | 424.36
Some of the failure detection images are shown in Fig. 7. It can be observed that
indistinct object characteristics and sharp changes in the ambient light around the
detected object may cause detection failures. Another labeled image data set is used
to verify this conjecture and to train a higher-accuracy model for further studies.
This collected data set mainly comprises images that we previously failed to detect,
as well as images collected in similar environments. Part of the new data set is
shown in Fig. 8, and the re-training loss for the new data set is presented in
Fig. 9. It can be seen from Fig. 9 that the training loss remains slightly high
during the re-training process; a better training loss could not be obtained after
50,000 iterations. This is because the basic network structure cannot extract more
features of the USV object from a data set with unclear features, which leads to
poor object classification. In summary, for blurred or unclear images, the network
cannot learn enough features and the loss function cannot converge to zero. It is
concluded that images with clear features are required to train the model, after
which the network models can achieve good accuracy. For former USV object detection,
this paper achieves higher detection accuracy and faster speed than the original SSD
model by replacing the base network VGG-16 with ResNet-18 and using 1×1
convolutional kernels to regress six feature maps. Although there is no significant
improvement in accuracy, the computation time is reduced by 26.35% compared with the
original SSD structure. In addition, it has the advantage that it can run on
platforms with limited computing resources.
Fig. 7. Part of the failure detection images
Fig. 8. New data set for training images
Fig. 9. The re-train loss for the new data set