Heuristic solution using decision tree model for enhanced XML schema matching of bridge structural calculation documents
Sang I. PARK1,2, Sang-Ho LEE2
1. Department of Civil, Environmental and Architectural Engineering, University of Colorado at Boulder, Boulder, CO 80309, USA 2. Department of Civil and Environmental Engineering, Yonsei University, Seoul 03722, Korea
Although the structural calculation document (SCD) of a bridge is used as an essential reference throughout the lifecycle of the facility, research on the quality of the data it contains is lacking. XML Schema matching enables qualitative improvement of the stored data. This study aimed to enhance the applicability of XML Schema matching, which improves the speed and quality of information stored in bridge SCDs. First, the authors proposed a method for reducing the computing time of schema matching for bridge SCDs: by reducing the correlation-checking process, the computing speed of schema matching was increased by a factor of 13 to 1800. Second, to maintain high accuracy, the authors developed a heuristic solution, based on a decision tree, for selecting the optimal weight factors used in the matching process. The decision tree model was built using the content elements stored in the SCD, design companies, bridge types, and weight factors as input variables, and the matching accuracy as the target variable. An inverse-calculation method was then applied to extract from the decision tree model the weight factors that yield high-accuracy schema matching results.
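The decision-tree step can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, only one categorical input (a hypothetical bridge type code) and one hypothetical weight factor `w` are used, and the "inverse calculation" is read here as finding the leaf with the highest predicted accuracy for a fixed bridge type and reporting the weight range along the path to that leaf. The function names `accuracy`, `build`, and `best_leaf` are assumptions made for illustration.

```python
# Hypothetical sketch: a tiny regression tree (variance-reduction splits)
# over (bridge_type, w) -> matching accuracy, followed by an inverse lookup.

def accuracy(bridge_type, w):
    """Synthetic accuracy surface; NOT data from the study."""
    if bridge_type == 0:
        return 0.9 if w >= 0.6 else 0.5
    return 0.7 if w <= 0.4 else 0.6

GRID = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
rows = [((b, w), accuracy(b, w)) for b in (0, 1) for w in GRID]

def sse(ys):
    """Sum of squared errors around the mean."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def build(rows, depth=0, max_depth=8):
    """Greedily grow a regression tree; returns nested dicts."""
    ys = [y for _, y in rows]
    best = None
    if depth < max_depth:
        for f in range(len(rows[0][0])):
            values = sorted({x[f] for x, _ in rows})
            # Candidate thresholds: midpoints between adjacent observed values.
            for lo, hi in zip(values, values[1:]):
                t = (lo + hi) / 2
                left = [r for r in rows if r[0][f] <= t]
                right = [r for r in rows if r[0][f] > t]
                cost = sse([y for _, y in left]) + sse([y for _, y in right])
                if best is None or cost < best[0]:
                    best = (cost, f, t, left, right)
    # Stop when no split reduces the error by more than a small tolerance.
    if best is None or sse(ys) - best[0] <= 1e-12:
        return {"leaf": sum(ys) / len(ys)}
    _, f, t, left, right = best
    return {"feature": f, "threshold": t,
            "left": build(left, depth + 1, max_depth),
            "right": build(right, depth + 1, max_depth)}

def best_leaf(node, fixed, bounds):
    """Inverse lookup: best reachable leaf for the fixed inputs.

    fixed  -- {feature index: value} for inputs known in advance
    bounds -- {feature index: (low, high)} for the free weight factors
    Returns (predicted accuracy, narrowed bounds of the free features).
    """
    if "leaf" in node:
        return node["leaf"], dict(bounds)
    f, t = node["feature"], node["threshold"]
    if f in fixed:
        branch = node["left"] if fixed[f] <= t else node["right"]
        return best_leaf(branch, fixed, bounds)
    lo, hi = bounds[f]
    left = best_leaf(node["left"], fixed, {**bounds, f: (lo, min(hi, t))})
    right = best_leaf(node["right"], fixed, {**bounds, f: (max(lo, t), hi)})
    return max(left, right, key=lambda r: r[0])

tree = build(rows)
for bridge_type in (0, 1):
    acc, rng = best_leaf(tree, {0: bridge_type}, {1: (0.0, 1.0)})
    print(f"bridge type {bridge_type}: predicted accuracy {acc:.2f}, w range {rng[1]}")
```

In this toy setting the lookup returns a range of `w` rather than a single value; choosing a point inside that range (for example its midpoint) would be a further assumption.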
Sang I. PARK, Sang-Ho LEE. Heuristic solution using decision tree model for enhanced XML schema matching of bridge structural calculation documents. Front. Struct. Civ. Eng., 2020, 14(6): 1403-1417.