Front. Struct. Civ. Eng.    2020, Vol. 14 Issue (6) : 1403-1417
Heuristic solution using decision tree model for enhanced XML schema matching of bridge structural calculation documents
Sang I. PARK1,2, Sang-Ho LEE2
1. Department of Civil, Environmental and Architectural Engineering, University of Colorado at Boulder, Boulder, CO 80309, USA
2. Department of Civil and Environmental Engineering, Yonsei University, Seoul 03722, Korea
Although the structural calculation document (SCD) of a bridge serves as an essential reference throughout the lifecycle of the facility, research on the quality of the data it contains is lacking. XML Schema matching enables qualitative improvement of the stored data. This study aimed to enhance the applicability of XML Schema matching in terms of both matching speed and the quality of the information stored in bridge SCDs. First, the authors proposed a method of reducing the computing time for the schema matching of bridge SCDs: by reducing the number of correlation checks, the computing speed of schema matching was increased by a factor of 13 to 1800. Second, the authors developed a heuristic solution, based on a decision tree, for selecting the optimal weight factors used in the matching process so that high accuracy is maintained. The decision tree model was built with the content elements stored in the SCD, design companies, bridge types, and weight factors as input variables and the matching accuracy as the target variable. An inverse-calculation method was then applied to extract from the decision tree model the weight factors that yield high-accuracy schema matching results.

Keywords: structural calculation document, bridge structure, XML Schema matching, weight factor, data mining, decision tree model
Corresponding Author(s): Sang-Ho LEE   
Just Accepted Date: 20 October 2020   Online First Date: 21 December 2020    Issue Date: 12 January 2021
 Cite this article:   
Sang I. PARK,Sang-Ho LEE. Heuristic solution using decision tree model for enhanced XML schema matching of bridge structural calculation documents[J]. Front. Struct. Civ. Eng., 2020, 14(6): 1403-1417.
Fig.1  Basic process of XML application schema matching.
Fig.2  Equality constraint method-based weight selection process.
Fig.3  Changes in the accuracy of XML Schema matching according to change in weight factors: (a) wP; (b) wC; (c) wS
Fig.4  Changes in the accuracy of XML Schema matching according to change in wNE: (a) MM module; (b) SMM module
Fig.5  Comparison of computing time between MM and SMM according to the number of elements in the model.
Fig.6  Comparison of matching accuracy by SCD item.
Fig.7  KDD process.
Fig.8  Inverse-calculation applications of the decision tree.
item         | variable                  | variable type and range
target var.  | matching accuracy         | A: 100%; B: 95%–99%; C: 90%–94%; D: 85%–89%; E: 80%–84%; F: ≤79%
input var.   | ωNE                       | continuous: 0–1
input var.   | ωS                        | continuous: 0–1
input var.   | ωC                        | continuous: 0–1
input var.   | no. of elements           | continuous
input var.   | structural type of bridge | cs: cable-stayed bridge; sb: steel box girder bridge; sp: steel plate bridge; sub_v: v-type substructure; sub_t: t-type substructure
input var.   | company                   | C_D: D E & C; C_S: S Engineering; C_Y: Y Engineering; C_K: K E & C; C_M: M Engineering
Tab.1  Variables used in the decision tree model configuration
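The grading of the target variable in Tab. 1 can be illustrated with a short sketch. The function name `accuracy_grade` is hypothetical, and the handling of fractional accuracies that fall between the listed integer bins (for example 94.58%) is an assumption: each value is assigned to the grade whose lower bound it reaches.

```python
def accuracy_grade(accuracy_percent):
    """Map a matching accuracy in percent to a grade A-F as listed in Tab. 1."""
    # Grade A is listed as exactly 100%.
    if accuracy_percent >= 100:
        return "A"
    # Lower bounds of grades B to E as listed in Tab. 1.
    # Assumption: values between bins (e.g. 94.58) fall into the lower grade.
    for lower_bound, grade in ((95, "B"), (90, "C"), (85, "D"), (80, "E")):
        if accuracy_percent >= lower_bound:
            return grade
    # Everything at or below 79% is grade F.
    return "F"


if __name__ == "__main__":
    for value in (100, 97.26, 94.58, 85.22, 78.13):
        print(value, accuracy_grade(value))
```

Discretizing the accuracy in this way is what turns the problem into a classification task with six classes, which is the form a decision tree with leaves labeled A to F requires.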
Fig.9  Decision tree modeling process.
Fig.10  XML Schema matching weight-factor selection process using the decision tree.
A=: (COMPANY== C_M) && (NUM_LINE>1631.5) && (WNE>0.21111)
A=: (COMPANY== C_K) && (TYPE== sb) && (WNE<= 0.23611) && (WC>0.436505)
A=: (COMPANY== C_K) && (TYPE== sb) && (WNE>0.23611) && (WS>0.108825)
A=: (COMPANY== C_D) && (NUM_LINE>524) && (WNE>0.21111) && (TYPE== sub_v) && (WC<= 0.174245)
A=: (COMPANY== C_M) && (NUM_LINE>1631.5) && (WNE<= 0.21111) && (WS>0.207145) && (WC>0.19091)
B=: (COMPANY== C_K) && (TYPE== sub_v) && (WS<= 0.121325)
B=: (COMPANY== C_D) && (NUM_LINE<= 524) && (WC<= 0.13393) && (WNE>0.13393)
B=: (COMPANY== C_D) && (NUM_LINE>524) && (WNE>0.21111) && (TYPE== sp)
B=: (COMPANY== C_Y) && (NUM_LINE<= 1706) && (WC<= 0.267855) && (WNE<= 0.174245) && (TYPE== sub_t)
B=: (COMPANY== C_K) && (TYPE== sub_v) && (WS>0.121325) && (WNE>0.322915) && (WC>0.23611)
B=: (COMPANY== C_K) && (TYPE== sub_v) && (WS>0.121325) && (WNE<= 0.322915) && (WC>0.37647)
B=: (COMPANY== C_M) && (NUM_LINE<= 1631.5) && (TYPE== sub_t) && (WC<= 0.13393) && (WNE>0.13393)
B=: (COMPANY== C_M) && (NUM_LINE>1631.5) && (WNE<= 0.21111) && (WS>0.207145) && (WC<= 0.19091)
B=: (COMPANY== C_D) && (NUM_LINE<= 524) && (WC>0.13393) && (WNE>0.267855) && (WS>0.322915)
B=: (COMPANY== C_M) && (NUM_LINE<= 1631.5) && (TYPE== sub_t) && (WC>0.13393) && (WNE>0.174245) && (WS>0.267855)
B=: (COMPANY== C_D) && (NUM_LINE>524) && (WNE<= 0.21111) && (WC<= 0.436505) && (WS>0.39869) && (TYPE== sp)
B=: (COMPANY== C_D) && (NUM_LINE>524) && (WNE>0.21111) && (TYPE== sub_v) && (WC>0.174245) && (WS<= 0.23611)
B=: (COMPANY== C_D) && (NUM_LINE>524) && (WNE>0.21111) && (TYPE== cs) && (WS>0.23611) && (WC>0.23611)
B=: (COMPANY== C_D) && (NUM_LINE>524) && (WNE<= 0.21111) && (WC<= 0.436505) && (WS<= 0.39869) && (TYPE== cs)
B=: (COMPANY== C_S) && (NUM_LINE<= 1571.5) && (WNE<= 0.267855) && (WS<= 0.207145) && (TYPE== sub_t) && (WC>0.414285)
C=: (COMPANY== C_S) && (NUM_LINE<= 1571.5) && (WNE>0.267855)
C=: (COMPANY== C_D) && (NUM_LINE<= 524) && (WC>0.13393) && (WNE<= 0.267855)
C=: (COMPANY== C_K) && (TYPE== sb) && (WNE>0.23611) && (WS<= 0.108825)
C=: (COMPANY== C_M) && (NUM_LINE<= 1631.5) && (TYPE== sp) && (WC>0.21111)
C=: (COMPANY== C_Y) && (NUM_LINE>1706) && (WS>0.39869) && (TYPE== sb)
C=: (COMPANY== C_S) && (NUM_LINE>1571.5) && (TYPE== cs) && (WNE>0.267855)
C=: (COMPANY== C_D) && (NUM_LINE>524) && (WNE>0.21111) && (TYPE== cs) && (WS<= 0.23611)
C=: (COMPANY== C_K) && (TYPE== cs) && (WNE<= 0.39869) && (WS>0.322915) && (WC<= 0.174245)
C=: (COMPANY== C_K) && (TYPE== cs) && (WNE<= 0.39869) && (WS<= 0.322915) && (WC>0.414285)
C=: (COMPANY== C_M) && (NUM_LINE<= 1631.5) && (TYPE== sp) && (WC<= 0.21111) && (WNE>0.21111)
C=: (COMPANY== C_D) && (NUM_LINE>524) && (WNE<= 0.21111) && (WC>0.436505) && (TYPE== sp)
C=: (COMPANY== C_Y) && (NUM_LINE<= 1706) && (WC<= 0.267855) && (WNE<= 0.174245) && (TYPE== sub_v)
C=: (COMPANY== C_Y) && (NUM_LINE<= 1706) && (WC<= 0.267855) && (WNE<= 0.174245) && (TYPE== sp)
C=: (COMPANY== C_M) && (NUM_LINE>1631.5) && (WNE<= 0.21111) && (WS<= 0.207145) && (WC<= 0.21111)
C=: (COMPANY== C_M) && (NUM_LINE<= 1631.5) && (TYPE== sub_t) && (WC>0.13393) && (WNE<= 0.174245)
C=: (COMPANY== C_K) && (TYPE== sub_v) && (WS>0.121325) && (WNE>0.322915) && (WC<= 0.23611)
C=: (COMPANY== C_K) && (TYPE== sub_v) && (WS>0.121325) && (WNE<= 0.322915) && (WC<= 0.37647)
C=: (COMPANY== C_S) && (NUM_LINE>1571.5) && (TYPE== cs) && (WNE<= 0.267855) && (WS>0.39869)
C=: (COMPANY== C_K) && (TYPE== cs) && (WNE<= 0.39869) && (WS<= 0.322915) && (WC<= 0.414285)
C=: (COMPANY== C_D) && (NUM_LINE>524) && (WNE<= 0.21111) && (WC>0.436505) && (TYPE== sub_v)
C=: (COMPANY== C_S) && (NUM_LINE<= 1571.5) && (WNE<= 0.267855) && (WS>0.207145) && (WC>0.267855)
C=: (COMPANY== C_Y) && (NUM_LINE>1706) && (WS<= 0.39869) && (WNE>0.19091) && (WC>0.267855)
C=: (COMPANY== C_D) && (NUM_LINE<= 524) && (WC>0.13393) && (WNE>0.267855) && (WS<= 0.322915) && (TYPE== sub_t)
C=: (COMPANY== C_D) && (NUM_LINE>524) && (WNE>0.21111) && (TYPE== sub_v) && (WC>0.174245) && (WS>0.23611)
C=: (COMPANY== C_Y) && (NUM_LINE>1706) && (WS<= 0.39869) && (WNE<= 0.19091) && (TYPE== sb) && (WC>0.174245)
C=: (COMPANY== C_D) && (NUM_LINE>524) && (WNE>0.21111) && (TYPE== cs) && (WS>0.23611) && (WC<= 0.23611)
C=: (COMPANY== C_Y) && (NUM_LINE>1706) && (WS<= 0.39869) && (WNE<= 0.19091) && (TYPE== cs) && (WC<= 0.174245)
C=: (COMPANY== C_D) && (NUM_LINE>524) && (WNE<= 0.21111) && (WC<= 0.436505) && (WS<= 0.39869) && (TYPE== sp)
C=: (COMPANY== C_Y) && (NUM_LINE<= 1706) && (WC>0.267855) && (WS<= 0.322915) && (WNE>0.21111) && (TYPE== sub_t)
C=: (COMPANY== C_Y) && (NUM_LINE>1706) && (WS<= 0.39869) && (WNE>0.19091) && (WC<= 0.267855) && (TYPE== sb)
C=: (COMPANY== C_Y) && (NUM_LINE<= 1706) && (WC>0.267855) && (WS<= 0.322915) && (WNE<= 0.21111) && (TYPE== sp)
C=: (COMPANY== C_Y) && (NUM_LINE<= 1706) && (WC>0.267855) && (WS<= 0.322915) && (WNE<= 0.21111) && (TYPE== sub_v)
C=: (COMPANY== C_S) && (NUM_LINE<= 1571.5) && (WNE<= 0.267855) && (WS<= 0.207145) && (TYPE== sp) && (WC>0.21111)
C=: (COMPANY== C_M) && (NUM_LINE>1631.5) && (WNE<= 0.21111) && (WS<= 0.207145) && (WC>0.21111) && (TYPE== sb)
C=: (COMPANY== C_S) && (NUM_LINE<= 1571.5) && (WNE<= 0.267855) && (WS<= 0.207145) && (TYPE== sub_v) && (WC>0.21111)
C=: (COMPANY== C_S) && (NUM_LINE<= 1571.5) && (WNE<= 0.267855) && (WS<= 0.207145) && (TYPE== sub_t) && (WC<= 0.414285)
C=: (COMPANY== C_S) && (NUM_LINE<= 1571.5) && (WNE<= 0.267855) && (WS>0.207145) && (WC<= 0.267855) && (TYPE== sp)
C=: (COMPANY== C_S) && (NUM_LINE<= 1571.5) && (WNE<= 0.267855) && (WS>0.207145) && (WC<= 0.267855) && (TYPE== sub_v)
C=: (COMPANY== C_S) && (NUM_LINE<= 1571.5) && (WNE<= 0.267855) && (WS>0.207145) && (WC<= 0.267855) && (TYPE== sub_t)
C=: (COMPANY== C_M) && (NUM_LINE<= 1631.5) && (TYPE== sub_t) && (WC>0.13393) && (WNE>0.174245) && (WS<= 0.267855)
C=: (COMPANY== C_S) && (NUM_LINE>1571.5) && (TYPE== cs) && (WNE<= 0.267855) && (WS<= 0.39869) && (WC>0.436505)
C=: (COMPANY== C_Y) && (NUM_LINE<= 1706) && (WC>0.267855) && (WS<= 0.322915) && (WNE<= 0.21111) && (TYPE== sub_t)
C=: (COMPANY== C_D) && (NUM_LINE>524) && (WNE<= 0.21111) && (WC<= 0.436505) && (WS>0.39869) && (TYPE== sub_v)
C=: (COMPANY== C_Y) && (NUM_LINE<= 1706) && (WC<= 0.267855) && (WNE>0.174245) && (WS<= 0.267855) && (TYPE== sub_v)
C=: (COMPANY== C_Y) && (NUM_LINE<= 1706) && (WC<= 0.267855) && (WNE>0.174245) && (WS<= 0.267855) && (TYPE== sub_t)
C=: (COMPANY== C_M) && (NUM_LINE<= 1631.5) && (TYPE== sp) && (WC<= 0.21111) && (WNE<= 0.21111) && (WS>0.207145)
C=: (COMPANY== C_Y) && (NUM_LINE<= 1706) && (WC<= 0.267855) && (WNE>0.174245) && (WS>0.267855) && (TYPE== sub_t)
C=: (COMPANY== C_D) && (NUM_LINE>524) && (WNE<= 0.21111) && (WC<= 0.436505) && (WS>0.39869) && (TYPE== cs)
D=: (COMPANY== C_K) && (TYPE== cs) && (WNE>0.39869)
D=: (COMPANY== C_K) && (TYPE== cs) && (WNE<= 0.39869) && (WS>0.322915) && (WC>0.174245)
D=: (COMPANY== C_Y) && (NUM_LINE<= 1706) && (WC>0.267855) && (WS>0.322915) && (TYPE== sp)
D=: (COMPANY== C_D) && (NUM_LINE>524) && (WNE<= 0.21111) && (WC>0.436505) && (TYPE== cs)
D=: (COMPANY== C_M) && (NUM_LINE<= 1631.5) && (TYPE== sub_t) && (WC<= 0.13393) && (WNE<= 0.13393)
D=: (COMPANY== C_S) && (NUM_LINE>1571.5) && (TYPE== sb) && (WNE<= 0.39869) && (WS<= 0.39869)
D=: (COMPANY== C_M) && (NUM_LINE<= 1631.5) && (TYPE== sp) && (WC<= 0.21111) && (WNE<= 0.21111) && (WS<= 0.207145)
D=: (COMPANY== C_S) && (NUM_LINE<= 1571.5) && (WNE<= 0.267855) && (WS<= 0.207145) && (TYPE== sp) && (WC<= 0.21111)
D=: (COMPANY== C_D) && (NUM_LINE>524) && (WNE<= 0.21111) && (WC<= 0.436505) && (WS<= 0.39869) && (TYPE== sub_v)
D=: (COMPANY== C_Y) && (NUM_LINE<= 1706) && (WC>0.267855) && (WS<= 0.322915) && (WNE>0.21111) && (TYPE== sp)
D=: (COMPANY== C_S) && (NUM_LINE<= 1571.5) && (WNE<= 0.267855) && (WS<= 0.207145) && (TYPE== sub_v) && (WC<= 0.21111)
D=: (COMPANY== C_Y) && (NUM_LINE<= 1706) && (WC>0.267855) && (WS<= 0.322915) && (WNE>0.21111) && (TYPE== sub_v)
D=: (COMPANY== C_S) && (NUM_LINE>1571.5) && (TYPE== cs) && (WNE<= 0.267855) && (WS<= 0.39869) && (WC<= 0.436505)
D=: (COMPANY== C_Y) && (NUM_LINE>1706) && (WS<= 0.39869) && (WNE>0.19091) && (WC<= 0.267855) && (TYPE== cs)
D=: (COMPANY== C_Y) && (NUM_LINE>1706) && (WS<= 0.39869) && (WNE<= 0.19091) && (TYPE== sb) && (WC<= 0.174245)
D=: (COMPANY== C_Y) && (NUM_LINE>1706) && (WS<= 0.39869) && (WNE<= 0.19091) && (TYPE== cs) && (WC>0.174245)
E=: (COMPANY== C_S) && (NUM_LINE>1571.5) && (TYPE== sb) && (WNE>0.39869)
E=: (COMPANY== C_Y) && (NUM_LINE>1706) && (WS>0.39869) && (TYPE== cs)
E=: (COMPANY== C_D) && (NUM_LINE<= 524) && (WC<= 0.13393) && (WNE<= 0.13393)
E=: (COMPANY== C_Y) && (NUM_LINE<= 1706) && (WC>0.267855) && (WS>0.322915) && (TYPE== sub_v)
E=: (COMPANY== C_S) && (NUM_LINE>1571.5) && (TYPE== sb) && (WNE<= 0.39869) && (WS>0.39869)
E=: (COMPANY== C_Y) && (NUM_LINE<= 1706) && (WC>0.267855) && (WS>0.322915) && (TYPE== sub_t)
E=: (COMPANY== C_Y) && (NUM_LINE<= 1706) && (WC<= 0.267855) && (WNE>0.174245) && (WS>0.267855) && (TYPE== sub_v)
E=: (COMPANY== C_Y) && (NUM_LINE<= 1706) && (WC<= 0.267855) && (WNE>0.174245) && (WS<= 0.267855) && (TYPE== sp)
F=: (COMPANY== C_K) && (TYPE== sb) && (WNE<= 0.23611) && (WC<= 0.436505) && (WS>0.218255)
F=: (COMPANY== C_K) && (TYPE== sb) && (WNE<= 0.23611) && (WC<= 0.436505) && (WS<= 0.218255)
F=: (COMPANY== C_Y) && (NUM_LINE<= 1706) && (WC<= 0.267855) && (WNE>0.174245) && (WS>0.267855) && (TYPE== sp)
Tab.2  The 95 leaves derived through the decision tree
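The leaves in Tab. 2 can be read in two directions: forward, to predict the accuracy grade of a given set of weight factors, and inversely, to look up which weight-factor ranges lead to a desired grade for a given company and bridge type. The following is a minimal sketch of both readings, not the authors' implementation. It transcribes only the five leaves of Tab. 2 that cover COMPANY == C_K and TYPE == sb; the names `LEAVES`, `classify`, and `inverse_lookup` are hypothetical. A full version would also need to fix NUM_LINE, which does not occur in these five leaves.

```python
import operator

# Comparison operators that appear in the leaf conditions of Tab. 2.
OPS = {"<=": operator.le, ">": operator.gt, "==": operator.eq}

# Shared prefix of the five leaves for company C_K and steel box girder bridges.
_CK_SB = [("COMPANY", "==", "C_K"), ("TYPE", "==", "sb")]

# (grade, conditions) pairs transcribed from Tab. 2.
LEAVES = [
    ("A", _CK_SB + [("WNE", "<=", 0.23611), ("WC", ">", 0.436505)]),
    ("A", _CK_SB + [("WNE", ">", 0.23611), ("WS", ">", 0.108825)]),
    ("C", _CK_SB + [("WNE", ">", 0.23611), ("WS", "<=", 0.108825)]),
    ("F", _CK_SB + [("WNE", "<=", 0.23611), ("WC", "<=", 0.436505), ("WS", ">", 0.218255)]),
    ("F", _CK_SB + [("WNE", "<=", 0.23611), ("WC", "<=", 0.436505), ("WS", "<=", 0.218255)]),
]


def classify(record):
    """Forward use: return the grade of the first leaf whose conditions all hold."""
    for grade, conditions in LEAVES:
        if all(OPS[op](record[var], value) for var, op, value in conditions):
            return grade
    return None  # no transcribed leaf covers this record


def inverse_lookup(company, bridge_type, wanted="A"):
    """Inverse use: list the weight-factor conditions of every leaf with the wanted grade."""
    fixed = {"COMPANY": company, "TYPE": bridge_type}
    found = []
    for grade, conditions in LEAVES:
        if grade != wanted:
            continue
        # Keep the leaf only if its categorical conditions agree with the fixed inputs.
        if all(OPS[op](fixed[var], value) for var, op, value in conditions if var in fixed):
            found.append([c for c in conditions if c[0] not in fixed])
    return found


if __name__ == "__main__":
    print(classify({"COMPANY": "C_K", "TYPE": "sb", "WNE": 0.30, "WS": 0.20, "WC": 0.30}))
    print(inverse_lookup("C_K", "sb"))
```

For C_K and sb, the inverse lookup returns two candidate regions: WNE <= 0.23611 with WC > 0.436505, or WNE > 0.23611 with WS > 0.108825. Any weight combination inside either region would be predicted as grade A by these leaves.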
Fig.11  Automatic schema matching weight-factor selection modules using the DT model DB: (a) Database building module based on decision tree model; (b) Suggestion module for suitable weight factors of XML Schema matching; (c) Update module of the decision tree.
type                    | company       | no. of elements | MM module accuracy (%) | SMM module used weight values               | SMM module accuracy (%)
cable-stayed bridge     | S engineering | 1028            | 85.22                  | ωNE = 0.26, ωS = 0.21, ωC = 0.27, ωP = 0.26 | 95.08
steel plate bridge      | D engineering | 845             | 90.91                  | ωNE = 0.21, ωS = 0.40, ωC = 0.33, ωP = 0.06 | 97.26
v-type substructure     | K engineering | 549             | 87.50                  | ωNE = 0.32, ωS = 0.13, ωC = 0.38, ωP = 0.17 | 96.71
steel box girder bridge | Y engineering | 1826            | 78.13                  | ωNE = 0.19, ωS = 0.39, ωC = 0.18, ωP = 0.24 | 94.58
cable-stayed bridge     | M engineering | 1933            | 93.33                  | ωNE = 0.21, ωS = 0.21, ωC = 0.20, ωP = 0.38 | 98.65
(type, company, and no. of elements are the input variables)
Tab.3  Examples of the DT model-based SMM accuracy
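Two properties of Tab. 3 can be checked with a few lines of code. The four weight factors in each row appear to sum to 1, which would be consistent with the equality-constraint selection named in Fig. 2, although that reading is an inference from the listed values rather than a statement in the text. The second property is the accuracy gain of the SMM module over the MM module. The sketch below uses only the values listed in Tab. 3; the names `ROWS`, `weights_sum_to_one`, and `accuracy_gain` are hypothetical.

```python
import math

# Rows of Tab. 3:
# (type, company, no. of elements, MM accuracy %, SMM weights, SMM accuracy %).
ROWS = [
    ("cable-stayed bridge", "S engineering", 1028, 85.22,
     {"wNE": 0.26, "wS": 0.21, "wC": 0.27, "wP": 0.26}, 95.08),
    ("steel plate bridge", "D engineering", 845, 90.91,
     {"wNE": 0.21, "wS": 0.40, "wC": 0.33, "wP": 0.06}, 97.26),
    ("v-type substructure", "K engineering", 549, 87.50,
     {"wNE": 0.32, "wS": 0.13, "wC": 0.38, "wP": 0.17}, 96.71),
    ("steel box girder bridge", "Y engineering", 1826, 78.13,
     {"wNE": 0.19, "wS": 0.39, "wC": 0.18, "wP": 0.24}, 94.58),
    ("cable-stayed bridge", "M engineering", 1933, 93.33,
     {"wNE": 0.21, "wS": 0.21, "wC": 0.20, "wP": 0.38}, 98.65),
]


def weights_sum_to_one(weights, tol=1e-9):
    """Check the assumed equality constraint wNE + wS + wC + wP = 1."""
    return math.isclose(sum(weights.values()), 1.0, abs_tol=tol)


def accuracy_gain(row):
    """Return the SMM accuracy minus the MM accuracy, in percentage points."""
    return round(row[5] - row[3], 2)


if __name__ == "__main__":
    for row in ROWS:
        print(row[0], row[1], weights_sum_to_one(row[4]), accuracy_gain(row))
```

If the constraint holds in general, only three of the weights are free and the fourth follows from the others (for example, ωP = 1 − ωNE − ωS − ωC), which would explain why Tab. 1 lists only ωNE, ωS, and ωC as input variables. For the five rows listed, the gain of SMM over MM ranges from 5.32 to 16.45 percentage points.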
