2.1 Literature review on big data architecture design
To prevent such railway accidents and industrial safety accidents, prior studies
have addressed failure analysis and reliability evaluation methods such as FTA (Fault
Tree Analysis), big data and artificial intelligence technologies, and big data architecture
design.
This section introduces that prior research and the related technologies. First, in a
previous study on big data architecture for railways, a big data railway safety platform
was designed in five parts: collection, API gateway, preprocessing,
storage, and analysis [3].
Additionally, in a study of service-layer-centered big data architecture for predictive
maintenance of railroad points, the architecture was divided into four layers: the
storage layer, processing layer, service layer, and ingestion layer.
The main functions of that architecture are to collect data from external
sources, process and retrieve the collected data, and perform data aggregation, modeling,
analysis, and visualization. The architecture was designed based on Hadoop and Apache
NiFi [4].
FTA is a quantitative failure analysis and reliability evaluation method that uses
an FT (Fault Tree), which logically expresses the relationship between a system failure
and its causes, to find vulnerable parts and improve system reliability [5].
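The quantitative side of FTA can be illustrated with a minimal sketch. It assumes independent basic events combined through AND and OR gates. The event names and probabilities are hypothetical and are not taken from [5] or from any railway data.

```python
from math import prod

def and_gate(probabilities):
    """Output event occurs only if every input event occurs (independent events)."""
    return prod(probabilities)

def or_gate(probabilities):
    """Output event occurs if at least one input event occurs (independent events)."""
    return 1.0 - prod(1.0 - p for p in probabilities)

# Hypothetical basic-event probabilities, chosen only for illustration.
p_sensor_fails = 0.01
p_backup_sensor_fails = 0.02
p_power_fails = 0.001

# Intermediate event: detection is lost only if both sensors fail.
p_detection_lost = and_gate([p_sensor_fails, p_backup_sensor_fails])

# Top event: the monitoring function fails if detection is lost OR power fails.
p_top = or_gate([p_detection_lost, p_power_fails])

print(f"P(detection lost) = {p_detection_lost:.6f}")
print(f"P(top event)      = {p_top:.6f}")
```

Comparing the inputs of the top-level OR gate shows which branch contributes most to the top event, which is how a fault tree points to the vulnerable part of a system.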
FTA is a well-founded method for failure and defect analysis. If FTA is used
as a reference standard for artificial intelligence and big data analysis, the reliability of
defect analysis can be increased.
MQTT (Message Queuing Telemetry Transport) is a lightweight messaging protocol for
large-scale IoT communication among small devices, standardized as ISO/IEC 20922 in 2016.
MQTT has the following three technical characteristics [6].
ⓐ A client connects to the MQTT broker over a TCP/IP socket and remains connected
until it explicitly disconnects or the connection is lost
due to network conditions.
ⓑ In MQTT's publish-subscribe messaging pattern, clients communicate only through a broker.
When a message is published on a topic, the broker delivers it
to the clients subscribed to that topic, so both one-to-one and one-to-many communication
are possible.
ⓒ QoS has three levels: level 0 delivers a message at most once, level 1 delivers it
at least once, and level 2 delivers it exactly once.
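The broker-mediated pattern in ⓑ can be sketched with a toy in-memory broker. This is not an MQTT implementation: it has no networking, sessions, topic wildcards, or QoS handling, and the class name `ToyBroker` and the topic names are hypothetical.

```python
from collections import defaultdict

class ToyBroker:
    """In-memory stand-in for a broker: publishers and subscribers never meet directly."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        """Deliver payload to every subscriber of topic; return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, payload)
        return len(callbacks)

broker = ToyBroker()
monitor_inbox, archive_inbox = [], []

# Two subscribers on the same topic: one-to-many delivery.
broker.subscribe("rail/track/temperature", lambda t, p: monitor_inbox.append(p))
broker.subscribe("rail/track/temperature", lambda t, p: archive_inbox.append(p))

delivered = broker.publish("rail/track/temperature", "41.5")
unrouted = broker.publish("rail/track/vibration", "0.2")  # nobody subscribed

print(delivered, monitor_inbox, archive_inbox, unrouted)
```

With a single subscriber on a topic the same mechanism gives one-to-one communication. The publisher's code does not change in either case, because it only ever addresses the broker.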
Kafka was developed by LinkedIn and is a distributed data streaming platform based
on message queues that can publish, subscribe to, store, and process data streams in
real time. Unlike conventional messaging systems, Kafka manages messages
as event queues in the file system rather than in memory [7].
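The idea of keeping events in the file system instead of memory can be sketched with a toy append-only log. This is only an illustration, not Kafka's storage format or API. The class `ToyLog`, the file name, and the sensor readings are hypothetical.

```python
import json
import os
import tempfile

class ToyLog:
    """Append-only event log kept in a file; readers pick their own start offset."""

    def __init__(self, path):
        self.path = path
        open(self.path, "a", encoding="utf-8").close()  # create the file if missing

    def append(self, event):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def read_from(self, offset):
        """Return all events from the given offset; reading does not delete anything."""
        with open(self.path, encoding="utf-8") as f:
            records = [json.loads(line) for line in f]
        return records[offset:]

log_path = os.path.join(tempfile.mkdtemp(), "sensor-events.log")
log = ToyLog(log_path)
for reading in (
    {"sensor": "s1", "temp": 40.1},
    {"sensor": "s1", "temp": 40.7},
    {"sensor": "s2", "temp": 38.9},
):
    log.append(reading)

storage_view = log.read_from(0)   # a consumer that starts from the beginning
analysis_view = log.read_from(2)  # a consumer that has already handled offsets 0 and 1

print(len(storage_view), analysis_view)
```

Because events stay in the file after being read, several consumers can process the same stream independently, each tracking its own offset.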
Relational databases such as Oracle and MySQL store data in tables with a row-centered
storage structure and are accessed using SQL. In contrast, MongoDB is a NoSQL database
with a document-centered storage structure, in which data is stored
as key-value pairs in BSON (Binary JSON) format. It consists of collections, which
correspond to tables; documents, which correspond to rows; and fields, which correspond
to columns [8].
In a study of big data architecture for an IoT-based smart manufacturing system,
MQTT and Kafka were combined to collect, relay, and store sensor data, and MongoDB,
relational databases, and Elasticsearch were adopted as consumers [9].
YOLO (You Only Look Once) is a deep learning-based network for real-time object
detection. YOLO is a one-stage detection algorithm that performs classification
and localization simultaneously, and it has the advantage of being able to
detect objects faster than two-stage detection algorithms based on R-CNN, such as Fast
R-CNN and SPPNet [10].
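A minimal sketch can show what "classification and localization simultaneously" means in practice. It assumes the common YOLO-style output layout in which each candidate row holds a box (center x, center y, width, height), an objectness score, and one score per class. The class names and numbers below are hypothetical, and post-processing such as non-maximum suppression is omitted.

```python
CLASS_NAMES = ["normal_track", "obstacle"]  # hypothetical class labels

def decode(row, conf_threshold=0.5):
    """Turn one output row into (class name, confidence, corner box), or None."""
    cx, cy, w, h, objectness = row[:5]
    class_scores = row[5:]
    best = max(range(len(class_scores)), key=class_scores.__getitem__)
    confidence = objectness * class_scores[best]
    if confidence < conf_threshold:
        return None
    # Convert center/size format to (x_min, y_min, x_max, y_max).
    box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    return CLASS_NAMES[best], confidence, box

# Illustrative rows only; these are not results from any experiment.
rows = [
    [320.0, 240.0, 100.0, 60.0, 0.9, 0.1, 0.8],
    [100.0, 100.0, 40.0, 40.0, 0.2, 0.7, 0.3],
]

detections = [d for d in map(decode, rows) if d is not None]
print(detections)
```

Each row already carries both the location and the class scores, so a single forward pass yields both. A two-stage detector first proposes regions and then classifies them.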
In this study, a railway safety platform application model is presented using the
IoT-based big data platform architecture. Additionally, using YOLOv5, an object detection
algorithm, an experiment was conducted on how image data of railroad tracks can be
used in anomaly detection for safe railway operation, and the results of the experiment
are presented.
Fig. 1. Big data platform architecture design process
2.2 Research Method
Referring to previous studies, this study applied the following research method to
design the railway safety big data platform architecture. The research process is
shown in Fig. 1.
First, the essential elements for big data platform design were defined in five areas:
① data collection area, ② transmission area, ③ storage area, ④ monitoring and control
area, and ⑤ artificial intelligence analysis area.
Second, we investigated technological details and application cases to analyze whether
the technologies in the five areas defined above are appropriate for IoT device communication
and sensor data storage and analysis.
Third, we combined the technologies of each area to design the optimal railway safety
big data platform architecture.
Lastly, based on the designed railway safety big data platform architecture, we presented
an application model that identifies and classifies railroad track status images collected
from trains through a deep learning algorithm.