Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2023, Vol. 17 Issue (6) : 176350    https://doi.org/10.1007/s11704-023-2740-6
Artificial Intelligence
Bi-objective evolutionary Bayesian network structure learning via skeleton constraint
Ting WU1, Hong QIAN1(), Ziqi LIU2, Jun ZHOU2, Aimin ZHOU1
1. Shanghai Institute of AI for Education and School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
2. Ant Group, Hangzhou 310023, China
Abstract

Bayesian networks are a popular approach to representing and reasoning about uncertain knowledge. Structure learning is the first step in learning a Bayesian network, and score-based methods are among the most popular ways of learning the structure. In most cases, the score of a Bayesian network is defined as the sum of a log-likelihood term and a complexity term weighted by a penalty function. If the penalty function is set unreasonably, it may hurt the performance of the structure search. Thus, Bayesian network structure learning is essentially a bi-objective optimization problem. However, existing bi-objective structure learning algorithms are applicable only to small-scale networks. To this end, this paper proposes a bi-objective evolutionary Bayesian network structure learning algorithm via skeleton constraint (BBS) for medium-scale networks. To boost search performance, BBS introduces the random order prior (ROP) initialization operator. ROP generates a skeleton to constrain the search space, which is the key to scaling structure learning up to larger problems. Acyclicity is then guaranteed by imposing variable orders on the initial skeleton. After that, BBS designs a Pareto rank based crossover operator and a skeleton guided mutation operator; both operate on the skeleton obtained by ROP to make the search more targeted. Finally, BBS provides a strategy for choosing the final solution. Experimental results show that BBS consistently finds structures closer to the ground truth than single-objective structure learning methods. Furthermore, compared with existing bi-objective structure learning methods, BBS is scalable and can be applied to medium-scale Bayesian network datasets. On the educational problem of discovering the factors that influence students' academic performance, BBS provides higher-quality solutions and offers more flexible solution selection than widely used Bayesian network structure learning methods.
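The score decomposition described above can be made concrete: a BIC-style score is a log-likelihood term minus a complexity penalty, and the bi-objective view keeps the two terms separate. Below is a minimal sketch of computing the two objectives for a discrete network; the function name `bic_objectives` and the plain-dict data representation are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter

def bic_objectives(samples, parents, states):
    """Return (negative log-likelihood, free-parameter count) for a discrete BN.

    samples: list of dicts mapping variable -> observed value
    parents: dict mapping variable -> tuple of its parent variables
    states:  dict mapping variable -> list of possible values
    """
    neg_ll, n_params = 0.0, 0
    for var, pa in parents.items():
        # count N(x, pa(x)) and N(pa(x)) by scanning the data
        joint = Counter((s[var], tuple(s[p] for p in pa)) for s in samples)
        marg = Counter(tuple(s[p] for p in pa) for s in samples)
        for (x, cfg), c in joint.items():
            # maximum-likelihood estimate of P(x | cfg) is c / N(cfg)
            neg_ll -= c * math.log(c / marg[cfg])
        # each parent configuration needs |states(var)| - 1 free parameters
        q = 1
        for p in pa:
            q *= len(states[p])
        n_params += q * (len(states[var]) - 1)
    return neg_ll, n_params
```

A single-objective BIC search would minimize `neg_ll + 0.5 * math.log(len(samples)) * n_params`; the bi-objective formulation instead searches for Pareto-optimal trade-offs between the two returned values, avoiding a fixed penalty weight.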

Keywords: Bayesian network; structure learning; multi-objective optimization; conditional independence test
Corresponding Author(s): Hong QIAN   
Just Accepted Date: 07 April 2023   Issue Date: 14 August 2023
 Cite this article:   
Ting WU, Hong QIAN, Ziqi LIU, et al. Bi-objective evolutionary Bayesian network structure learning via skeleton constraint[J]. Front. Comput. Sci., 2023, 17(6): 176350.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-023-2740-6
https://academic.hep.com.cn/fcs/EN/Y2023/V17/I6/176350
Fig.1  The network cancer and its corresponding node set and edge set
Fig.2  The CPTs of the variables in the network cancer
Fig.3  The example directed acyclic graph and its adjacency matrix
Fig.4  The example Pareto front of BBS
Fig.5  Adding different node orders on the fixed skeleton in initial stage
Fig.6  An example of Pareto rank based crossover between two parents
Fig.7  An example of skeleton based mutation
  
Scale    Name        Nodes   Edges   Search space   Parameters
Small    Earthquake  5       4       2.9×10^4       10
         Cancer      5       4       2.9×10^4       10
         Survey      6       6       3.8×10^6       21
         Asia        8       8       7.8×10^11      18
         Sachs       11      17      3.1×10^22      178
Medium   Child       20      25      2.3×10^72      230
         Alarm       37      46      3.0×10^237     509
Large    HeparII     70      123     6.5×10^815     1453
Tab.1  The benchmark datasets used in the experiment
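The "Search space" column counts labeled directed acyclic graphs on the given number of nodes, which can be reproduced with Robinson's recurrence for counting labeled acyclic digraphs. A short sketch (the function name `num_dags` is mine):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    """Robinson's recurrence: number of labeled DAGs on n nodes."""
    if n == 0:
        return 1
    # inclusion-exclusion over the k nodes with no incoming edges
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))
```

For instance, `num_dags(5)` returns 29281 ≈ 2.9×10^4, matching the Earthquake and Cancer rows of Tab. 1, and the count grows super-exponentially from there, which is why exhaustive search is hopeless beyond a handful of nodes.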
Fig.8  The Pareto fronts of the Alarm and Child datasets. (a) Pareto front of Alarm; (b) Pareto front of Child
Networks     Δscore               precision/%     recall/%        F1/%             SHD          SHD*
Cancer       0.00 ± 0.00          100.0 ± 0.0     100.0 ± 0.0     100.0 ± 0.0      0.0 ± 0.0    0.0 ± 0.0
Earthquake   0.00 ± 0.00          100.0 ± 0.0     100.0 ± 0.0     100.0 ± 0.0      0.0 ± 0.0    0.0 ± 0.0
Survey       −6.89 ± 0.00         90.91 ± 0.0     83.33 ± 0.0     100.0 ± 0.0      1.0 ± 0.0    1.0 ± 0.0
Asia         −1.99 ± 0.00         85.71 ± 1.42    85.71 ± 1.42    85.71 ± 1.42     2.0 ± 0.0    0.3 ± 0.6
Sachs        0.00 ± 0.00          86.49 ± 8.17    100.0 ± 0.0     77.06 ± 11.90    3.9 ± 2.0    1.9 ± 0.8
Child        0.00 ± 0.00          89.20 ± 3.60    100.0 ± 0.0     94.25 ± 2.01     2.7 ± 0.9    1.8 ± 0.7
Alarm        −98.16 ± 170.47      89.70 ± 2.82    95.25 ± 1.06    92.37 ± 1.59     6.6 ± 1.3    3.9 ± 1.0
HeparII      −2836.38 ± 271.04    72.49 ± 5.26    79.64 ± 2.04    72.49 ± 5.26     37.7 ± 5.8   36.1 ± 5.8
Tab.2  Results of BBS on different datasets
Fig.9  The initial populations generated by BBS and random strategy on different datasets. (a) Asia; (b) Sachs; (c) Child; (d) Alarm
Networks       HC           MMHC       H2PC       RSMAX2     BOS                  IBS                 BBS
Δscore
  Cancer       53.74        0.00       0.00       0.00       2.70 ± 2.93          0.00 ± 0.00         0.00 ± 0.00
  Earthquake   229.95       0.00       0.00       0.00       1.46 ± 2.33          0.00 ± 0.00         0.00 ± 0.00
  Survey       14.31        −6.89      −6.89      −6.89      −5.54 ± 1.90         −6.89 ± 0.00        −6.89 ± 0.00
  Asia         731.00       1888.41    1888.41    1888.41    1.67 ± 9.67          −2.00 ± 0.00        −1.99 ± 0.00
  Sachs        8837.94      2696.09    0.00       4597.81    88.16 ± 9.66         0.00 ± 1.30         0.00 ± 0.00
  Child        22145.36     3023.12    1014.62    3023.12    1711.97 ± 1064.31    0.00 ± 0.00         0.00 ± 0.00
  Alarm        63083.88     20580.58   8737.18    19981.98   overtime             627.7 ± 289.05      −98.16 ± 170.47
  HeparII      5296871.96   −1861.03   −3886.29   −1938.59   overtime             −3182.215 ± 59.91   −2836.38 ± 271.04
SHD
  Cancer       5.0          0.0        0.0        0.0        1.4 ± 1.6            0.0 ± 0.0           0.0 ± 0.0
  Earthquake   8.0          0.0        0.0        0.0        0.6 ± 1.2            0.0 ± 0.0           0.0 ± 0.0
  Survey       6.0          1.0        1.0        1.0        2.5 ± 1.9            1.0 ± 0.0           1.0 ± 0.0
  Asia         14.0         3.0        3.0        3.0        3.0 ± 3.0            1.8 ± 0.4           2.0 ± 0.0
  Sachs        21.0         12.0       10.0       8.0        2.7 ± 0.9            3.1 ± 1.3           3.9 ± 2.0
  Child        44.0         7.0        1.0        7.0        20.7 ± 4.8           7.9 ± 2.0           2.4 ± 0.5
  Alarm        73.0         12.0       9.0        13.0       overtime             9.1 ± 1.4           6.6 ± 1.3
  HeparII      189.0        82.0       55.0       80.0       overtime             69.8 ± 2.0          37.7 ± 5.8
Tab.3  SHD and Δscore of different methods. The smallest Δscore and SHD of each BN are shown in bold. The t-test is conducted at a significance level of 0.05, with markers after the entries indicating that the corresponding method is significantly better or worse than BBS, respectively
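The SHD metric used throughout Tabs. 2–4 counts the edge additions, deletions, and reversals needed to turn the learned DAG into the ground truth. One common variant can be sketched as follows; the helper name and edge-set representation are illustrative, and the paper's exact counting convention may differ.

```python
def shd(true_edges, learned_edges):
    """Structural Hamming Distance between two DAGs given as sets of
    directed edges (u, v): additions + deletions + reversals."""
    t, l = set(true_edges), set(learned_edges)
    # an edge present in both graphs but with flipped orientation
    reversals = {(u, v) for (u, v) in l - t if (v, u) in t}
    # true edges absent in the learned graph in either orientation
    missing = {(u, v) for (u, v) in t - l if (v, u) not in l}
    # learned edges absent in the true graph in either orientation
    extra = {(u, v) for (u, v) in l - t if (v, u) not in t}
    return len(reversals) + len(missing) + len(extra)
```

Under this convention a reversed edge costs 1 (one reversal), while a spurious or missing edge also costs 1 each, so SHD = 0 iff the two DAGs are identical.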
Networks     BOS          BBS
Cancer       0.7 ± 0.7    0.0 ± 0.0
Earthquake   0.3 ± 0.4    0.0 ± 0.0
Survey       1.1 ± 0.3    1.0 ± 0.0
Asia         1.5 ± 1.4    0.3 ± 0.6
Sachs        6.4 ± 2.4    1.9 ± 0.8
Child        15.3 ± 3.6   3.9 ± 1.0
Tab.4  SHD* values of BBS and BOS. The smallest SHD* for each BN is shown in bold
Fig.10  BN structure learned by HC. Blue nodes are isolated nodes
Fig.11  BN structure learned by RSMAX2, MMHC and H2PC. Blue nodes are isolated nodes
Fig.12  Different structures learned by BBS. Nodes in green are the natural information of students
Fig.13  The mean of BIC scores and standard deviation of different algorithms