Neural partially linear additive model

doi:10.1007/s11704-023-2662-3

Front. Comput. Sci.

2024, Vol. 18

Issue (6) : 186334 https://doi.org/10.1007/s11704-023-2662-3

RESEARCH ARTICLE

Liangxuan ZHU¹, Han LI¹, Xuelin ZHANG¹, Lingjuan WU¹, Hong CHEN¹,²,³,⁴

¹. College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
². Engineering Research Center of Intelligent Technology for Agriculture (Ministry of Education), Huazhong Agricultural University, Wuhan 430070, China
³. Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Shenzhen 518000, China
⁴. Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518000, China


Abstract

Interpretability has drawn increasing attention in machine learning. Most existing work focuses on post-hoc explanations rather than on building self-explaining models. We therefore propose the Neural Partially Linear Additive Model (NPLAM), which automatically distinguishes insignificant, linear, and nonlinear features within a neural network. On the one hand, the neural network construction fits data better than spline functions with the same number of parameters; on the other hand, the learnable gate design and sparsity regularization terms retain the ability to perform feature selection and structure discovery. We theoretically establish generalization error bounds for the proposed method via Rademacher complexity. Experiments on both simulated and real-world datasets verify its good performance and interpretability.

Keywords: feature selection, structure discovery, partially linear additive model, neural network

Corresponding Author(s): Han LI

Just Accepted Date: 04 July 2023 Issue Date: 07 October 2023

Cite this article:

Liangxuan ZHU, Han LI, Xuelin ZHANG, et al. Neural partially linear additive model[J]. Front. Comput. Sci., 2024, 18(6): 186334.

URL:

https://academic.hep.com.cn/fcs/EN/10.1007/s11704-023-2662-3
https://academic.hep.com.cn/fcs/EN/Y2024/V18/I6/186334

Tab.1 Properties of different methods

Fig.1 NPLAM architecture. Orange represents nonlinear features, blue represents linear features, and green represents irrelevant features. $g_1$ represents the feature selection gate and $g_2$ represents the structure discovery gate. Note that NPLAM does not require prior knowledge of the features, and the nonlinear model is estimated by a neural network with random weights
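The two gates in the caption can be illustrated with a minimal sketch. The parameterization below is an assumption, not the authors' formulation: each feature $x_j$ contributes $g_{1,j}\,(w_j x_j + g_{2,j} f_j(x_j))$, where $f_j$ is a small one-hidden-layer ReLU subnetwork. All names (`gated_additive_forward`, `classify_features`, `hidden_W`, `out_A`, `tol`) are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def gated_additive_forward(X, g1, g2, w, hidden_W, hidden_b, out_A):
    """Assumed gated additive forward pass (illustrative only).

    X: (n, p) inputs; g1, g2, w: (p,) gates and linear weights;
    hidden_W, hidden_b, out_A: (p, h) per-feature subnetwork parameters.
    """
    linear = X * w                                        # (n, p) linear part
    hidden = relu(X[:, :, None] * hidden_W + hidden_b)    # (n, p, h) hidden units
    nonlinear = (hidden * out_A).sum(axis=2)              # (n, p) nonlinear part
    per_feature = g1 * (linear + g2 * nonlinear)          # g1 selects, g2 adds nonlinearity
    return per_feature.sum(axis=1)                        # additive prediction, shape (n,)

def classify_features(g1, g2, tol=1e-3):
    """Read the feature structure off the gate values (assumed threshold tol)."""
    labels = []
    for a, b in zip(g1, g2):
        if abs(a) < tol:
            labels.append("irrelevant")
        elif abs(b) < tol:
            labels.append("linear")
        else:
            labels.append("nonlinear")
    return labels

X = np.ones((2, 3))
g1 = np.array([0.0, 1.0, 1.0])
g2 = np.array([1.0, 0.0, 1.0])
w = np.array([1.0, 2.0, 3.0])
hidden_W = np.ones((3, 2))
hidden_b = np.zeros((3, 2))
out_A = np.ones((3, 2))
y = gated_additive_forward(X, g1, g2, w, hidden_W, hidden_b, out_A)
labels = classify_features(g1, g2)
```

Under this assumed form, a feature with a closed selection gate drops out entirely, and a feature with an open selection gate but a closed structure gate keeps only its linear term.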

Simulation
Methods	TP(↑)	F1(↑)	CF(↑)	MSE(STD)(↓)
Lasso	4.1	0.759	-	0.0394(±0.0022)
NAM	3.7	0.407	-	0.0122(±0.0017)
SNAM	5.0	1.000	-	0.0037(±0.0009)
FCNN	3.3	0.733	-	0.0360(±0.0029)
FCNN($l_1$)	3.4	0.739	-	0.0297(±0.0032)
LassoNet	3.8	0.835	-	0.0327(±0.0024)
SpAM	5.0	1.000	-	0.0226(±0.0039)
SPINN	3.5	0.805	-	0.0294(±0.0015)
SPLAM	3.1	0.765	0.900	0.1494(±0.0133)
SPLAT	4.3	0.925	0.943	0.0178(±0.0040)
NPLAM	5.0	1.000	1.000	0.0025(±0.0002)

Tab.2 Partial results on the simulation dataset

Fig.2 True and estimated functions for the simulation. The ground truth function is the solid red line; the component functions estimated by NPLAM, SNAM, and NAM are the dashed blue, dashed green, and dashed orange lines, respectively

Result of $R^2$
Methods	$x_1$(↑)	$x_2$(↑)	$x_3$(↑)	$x_4$(↑)	$x_5$(↑)
NAM	0.9023	0.9674	0.9823	0.9735	0.9985
SNAM	0.8980	0.9139	0.9386	0.9925	0.9998
NPLAM	0.9788	0.9993	0.9867	0.9997	0.9998

Tab.3 Result of $R^2$. Quantitative analysis of NAM, SNAM, and NPLAM on the 30-dimension simulation dataset, using the coefficient of determination $R^2$ between the estimated component function of each feature and its corresponding ground truth
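For reference, the coefficient of determination between an estimated component function and its ground truth can be computed as below. This is a minimal sketch of the standard definition $R^2 = 1 - SS_{res}/SS_{tot}$; the evaluation grid and the example functions are hypothetical and are not taken from the paper.

```python
import numpy as np

def r_squared(f_true, f_est):
    """Standard coefficient of determination between two sampled curves."""
    f_true = np.asarray(f_true, dtype=float)
    f_est = np.asarray(f_est, dtype=float)
    ss_res = np.sum((f_true - f_est) ** 2)            # residual sum of squares
    ss_tot = np.sum((f_true - f_true.mean()) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical ground-truth component and a slightly perturbed estimate.
grid = np.linspace(-1.0, 1.0, 200)
truth = np.sin(np.pi * grid)
estimate = truth + 0.05 * np.cos(3.0 * grid)
score = r_squared(truth, estimate)
```

A perfect estimate gives a score of 1, and predicting the mean of the ground truth everywhere gives 0.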

Fig.3 The convergence paths of the double gates on the simulation dataset. $g_1$ represents the feature selection gate. $g_2$ represents the structure discovery gate. Colored lines represent important features. Black and gray lines represent unimportant features

Fig.4 Comparative experiment results of FCNN, FCNN($l_1$), LassoNet, SPINN, NAM, SNAM, and NPLAM under different numbers of floating point operations (FLOPs). FLOPs are obtained by counting the multiplications and additions during forward propagation
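Counting multiplications and additions in a forward pass can be sketched as follows for fully connected layers. The counting convention is an assumption (one multiplication per weight, one addition per accumulation and per bias, activations ignored); the paper's exact convention is not given here, and the function names and layer sizes are hypothetical.

```python
def dense_forward_flops(layer_sizes, bias=True):
    """Multiplications plus additions for one forward pass of a dense network."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        mults = n_in * n_out                          # one multiply per weight
        adds = (n_in - 1) * n_out                     # accumulate each dot product
        if bias:
            adds += n_out                             # one add per bias term
        total += mults + adds
    return total

def additive_model_flops(num_features, hidden_units):
    """One [1, hidden, 1] subnetwork per feature, plus summing the outputs."""
    per_feature = dense_forward_flops([1, hidden_units, 1])
    return num_features * per_feature + (num_features - 1)

fcnn_flops = dense_forward_flops([30, 64, 1])
additive_flops = additive_model_flops(30, 8)
```

With these hypothetical sizes, the dense network costs 3968 operations per sample and the additive model 989, which illustrates why comparing methods at matched FLOPs is informative.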

Fig.5 Effects of varying sample size. $m$ denotes the number of samples used for training, and its value satisfies $1/m \in \{0.003, 0.005, 0.01, 0.02, 0.03, 0.04, 0.05\}$

Simulation
Methods	TP(↑)	F1(↑)	CF(↑)	MSE(STD)(↓)
Lasso	4.0	0.889	-	0.0290(±0.0014)
NAM	4.6	0.102	-	0.0057(±0.0012)
SNAM	5.0	1.000	-	0.0032(±0.0002)
FCNN	3.9	0.830	-	0.0180(±0.0019)
FCNN($l_1$)	2.8	0.659	-	0.0167(±0.0025)
LassoNet	3.4	0.489	-	0.0268(±0.0011)
SpAM	5.0	1.000	-	0.0149(±0.0023)
SPINN	1.9	0.487	-	0.0357(±0.0024)
SPLAM	4.1	0.765	0.990	0.0978(±0.0555)
SPLAT	3.2	0.889	0.990	0.0245(±0.0007)
NPLAM	5.0	1.000	1.000	0.0024(±0.0002)

Tab.4 Partial results on the 300-dimension simulation dataset

Fig.6 The convergence paths of the double gates on the 300-dimension dataset

Fig.7 The convergence paths of different double gates $g_1$ and $g_2$ over training epochs on the 30-dimension dataset. Here, $g_1$ represents the feature selection gates and $g_2$ represents the structure discovery gates. The important features $x_1$, $x_2$, $x_3$, $x_4$, and $x_5$ are highlighted with colored curves, and $x_6 \sim x_{30}$ are shown as black/gray curves (mean: black curve; range: gray area). (a) shows the convergence paths of the double gates with the regularization term $\lambda_1 \|g_1\|_1 + \lambda_2 \|g_2\|_1 + \lambda_3 \|A\|_1$. (b) shows the convergence paths with the regularization term $\lambda_2 \|g_2\|_1 + \lambda_3 \|A\|_1$. (c) shows the convergence paths with the regularization term $\lambda_1 \|g_1\|_1 + \lambda_3 \|A\|_1$. (d) shows the convergence paths with the regularization term $\lambda_3 \|A\|_1$
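The four regularization terms in this caption differ only in which coefficients are set to zero. A minimal sketch of the penalty follows, assuming that $\|A\|_1$ denotes the entrywise sum of absolute values of the network weights (this reading is an assumption); the function name and the numeric values are hypothetical.

```python
import numpy as np

def l1_penalty(g1, g2, A, lam1, lam2, lam3):
    """lam1*||g1||_1 + lam2*||g2||_1 + lam3*||A||_1 with entrywise l1 norms."""
    return (lam1 * np.abs(g1).sum()
            + lam2 * np.abs(g2).sum()
            + lam3 * np.abs(A).sum())

# Hypothetical gate and weight values.
g1 = np.array([1.0, -2.0])
g2 = np.array([0.5, 0.0])
A = np.array([[1.0, -1.0], [2.0, 0.0]])

# Panels (a)-(d): drop the g1 and/or g2 term by zeroing its coefficient.
panels = {
    "a": (0.1, 0.2, 0.3),
    "b": (0.0, 0.2, 0.3),
    "c": (0.1, 0.0, 0.3),
    "d": (0.0, 0.0, 0.3),
}
penalties = {name: l1_penalty(g1, g2, A, *lams) for name, lams in panels.items()}
```

Zeroing a coefficient removes the sparsity pressure on the corresponding gate, which is the contrast the four panels are set up to show.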

Gates	F1(↑)	CF(↑)	MSE(STD)(↓)
No double gates	0.286	0.100	0.002587(±4.56E-5)
Only $g_1$	1.000	0.100	0.002565(±2.86E-5)
Only $g_2$	0.286	1.000	0.002565(±5.78E-5)
Double gates	1.000	1.000	0.002527(±5.80E-5)

Tab.5 Results of different double gates $g_1$ and $g_2$. $g_1$ represents the feature selection gate and $g_2$ represents the structure discovery gate

Fig.8 The convergence paths of NPLAM’s double gates under the different hyperparameters

Fig.9 The convergence paths of the double gates $g_1$ and $g_2$ over training epochs in three ablation experiments. Here, $g_1$ represents the feature selection gates and $g_2$ represents the structure discovery gates. (a) shows the result on the fully linear feature dataset. (b) shows the result on the fully nonlinear feature dataset. (c) shows the result on the partially linear feature dataset. In (a), the important linear features $x_1$, $x_2$, $x_3$, $x_4$, and $x_5$ are highlighted with colored curves, and the others are shown as black/gray curves. In (b), the important nonlinear features $x_1$, $x_2$, $x_3$, $x_4$, and $x_5$ are highlighted with colored curves, and the others are shown as black/gray curves. In (c), the linear features $x_1 \sim x_{15}$ are blue curves and the nonlinear features $x_{16} \sim x_{30}$ are orange curves

California Housing dataset
Methods		LON	LAT	HMA	TR	TB	POP	HH	MI	MSE(STD)(↓)
Methods for feature selection	Lasso	√	√	√	-	-	-	-	√	0.1073(±0.0021)
	NAM	√	√	√	-	√	√	-	√	0.0645(±0.0018)
	SNAM	√	√	-	-	√	√	-	√	0.0630(±0.0022)
	FCNN	√	√	-	-	√	√	-	-	0.0493(±0.0039)
	FCNN($l_1$)	√	√	-	-	√	-	-	√	0.0555(±0.0033)
	LassoNet	√	√	-	-	-	-	-	√	0.0774(±0.0046)
	SpAM	√	√	-	-	√	√	-	√	0.0792(±0.0032)
	SPINN	√	√	-	-	-	-	-	√	0.0980(±0.0031)
Methods for feature selection & structure discovery	SPLAM	-	√(L)	-	√(N)	√(L)	√(L)	-	√(L)	0.2521(±0.0084)
	SPLAT	√(N)	-	√(L)	-	-	-	-	√(L)	0.4136(±0.0187)
	NPLAM	√(N)	√(N)	-	-	√(L)	√(N)	-	√(L)	0.0628(±0.0022)

Super-Conductivity dataset
Methods		RAR	SAR	WET	WMV	WGV	MSE(STD)(↓)
Methods for feature selection	Lasso	√	-	-	√	√	0.0563(±0.0013)
	NAM	√	-	-	-	√	0.0332(±0.0086)
	SNAM	√	√	√	√	-	0.0331(±0.0075)
	FCNN	√	-	√	-	√	0.0227(±0.0016)
	FCNN($l_1$)	√	√	√	-	√	0.0219(±0.0017)
	LassoNet	√	-	-	-	√	0.0326(±0.0037)
	SpAM	√	-	√	√	√	0.0406(±0.0079)
	SPINN	√	-	√	√	√	0.0345(±0.0007)
Methods for feature selection & structure discovery	SPLAM	√(N)	√(L)	-	√(N)	-	0.1309(±0.0096)
	SPLAT	√(L)	-	-	√(L)	√(L)	0.0864(±0.0595)
	NPLAM	√(N)	√(L)	√(N)	√(N)	√(N)	0.0313(±0.0009)

Beijing Air Quality dataset
Methods		PM10	SO₂	NO₂	CO	O₃	TEM	PRE	DEW	RA	WSP	WD	MSE(STD)(↓)
Methods for feature selection	Lasso	√	-	√	√	-	-	-	-	-	-	-	0.0089(±0.0004)
	NAM	√	-	√	√	√	√	√	√	-	√	√	0.0053(±0.0004)
	SNAM	√	-	-	√	-	-	-	√	-	-	-	0.0049(±0.0005)
	FCNN	√	√	√	√	√	√	√	√	-	√	-	0.0028(±0.0002)
	FCNN($l_1$)	√	√	√	√	-	√	-	√	-	-	-	0.0031(±0.0002)
	LassoNet	√	-	√	√	-	-	-	-	-	-	-	0.0053(±0.0007)
	SpAM	√	-	-	√	-	-	-	√	-	-	-	0.0047(±0.0025)
	SPINN	√	-	√	√	-	√	√	√	-	-	√	0.0117(±0.0012)
Methods for feature selection & structure discovery	SPLAM	√(N)	√(L)	√(N)	√(L)	√(L)	-	-	-	√(L)	√(L)	√(L)	0.1241(±0.0158)
	SPLAT	√(N)	-	√(N)	√(L)	-	√(N)	√(L)	√(L)	-	-	√(L)	0.0741(±0.0024)
	NPLAM	√(N)	-	√(L)	√(N)	√(N)	√(L)	-	√(N)	-	-	-	0.0045(±0.0001)

Boston Housing dataset
Methods		CRIM	ZN	INDUS	CHAS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT	MSE(STD)(↓)
Methods for feature selection	Lasso	-	-	-	√	-	√	-	√	-	√	√	√	√	0.0515(±0.0070)
	NAM	√	-	√	-	√	√	-	√	-	√	√	-	√	0.0366(±0.0041)
	SNAM	-	-	-	-	-	-	-	-	-	-	-	-	-	0.0352(±0.0053)
	FCNN	√	-	-	-	-	√	√	√	-	√	√	-	√	0.0407(±0.0036)
	FCNN($l_1$)	-	-	-	-	-	√	-	-	√	-	√	-	√	0.0415(±0.0089)
	LassoNet	-	-	-	-	-	√	-	-	-	√	√	-	√	0.0377(±0.0043)
	SpAM	√	-	-	-	√	√	-	√	-	-	√	-	√	0.0456(±0.0197)
	SPINN	-	-	-	√	-	√	-	√	-	-	√	√	√	0.0504(±0.0084)
Methods for feature selection & structure discovery	SPLAM	√(L)	√(L)	-	-	-	√(N)	√(L)	-	√(N)	√(L)	-	√(N)	√(N)	0.1781(±0.0650)
	SPLAT	√(L)	√(N)	√(N)	-	-	√(L)	√(N)	√(N)	√(L)	-	-	√(N)	√(N)	0.2820(±0.0107)
	NPLAM	√(N)	-	-	-	√(N)	√(N)	-	√(N)	√(L)	√(L)	√(L)	-	√(N)	0.0342(±0.0075)

Tab.6 Experimental results of real-world datasets

Methods	Regularization*	Parameters	Search range of initial coarse grids
Lasso	√	Coefficient of lasso penalty term.	$\{10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}\}$
NAM	-	Learning rate.	$\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}\}$
SNAM	√	Coefficient of $l_1$ penalty term for last hidden layer weight.	$\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}\}$
SNAM	√	Learning rate.	$\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}\}$
FCNN	-	Learning rate.	$\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}\}$
FCNN($l_1$)	√	Coefficient of $l_1$ penalty term for first hidden layer weight.	$\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}\}$
FCNN($l_1$)	√	Learning rate.	$\{10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}\}$
SPINN	√	Coefficient of sparse group lasso penalty term.	$\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}\}$
SPINN	√	Learning rate.	$\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}\}$
SPLAM	√	Coefficient of SPLAM penalty term.	$\{10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}\}$
SPLAT	√	Alpha, which controls the strength of the linear fit.	$\{0.1, 0.3, 0.5, 0.7, 0.9\}$
SPLAT	√	Number of lambda values.	$\{5, 10, 15, 20, 25, 30\}$
NPLAM (ours)	√	Coefficient of $l_1$ penalty term for the feature selection gates $g_1$.	$\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}\}$
		Coefficient of $l_1$ penalty term for the structure discovery gates $g_2$.	$\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}\}$
		Coefficient of $l_1$ penalty term for the weight of neural network.	$\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}\}$
		Learning rate.	$\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}\}$

Table A1 Search range of initial coarse grids for different methods
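As an illustration of how such a coarse grid can be enumerated, the sketch below builds the NPLAM search space, reading the NPLAM rows of Table A1 as powers of ten from $10^{-4}$ to $10^{1}$ for the three penalty coefficients and from $10^{-4}$ to $10^{0}$ for the learning rate. Whether the authors search the full Cartesian product is not stated, so that choice, along with the names `nplam_coarse_grid`, `lam1`, `lam2`, `lam3`, and `lr`, is an assumption.

```python
from itertools import product

# Powers of ten as read from the NPLAM rows of Table A1.
PENALTY_GRID = [10.0 ** k for k in range(-4, 2)]   # six values, 1e-4 to 1e1
LR_GRID = [10.0 ** k for k in range(-4, 1)]        # five values, 1e-4 to 1e0

def nplam_coarse_grid():
    """Yield every combination of the three penalty coefficients and the learning rate."""
    for lam1, lam2, lam3, lr in product(PENALTY_GRID, PENALTY_GRID, PENALTY_GRID, LR_GRID):
        yield {"lam1": lam1, "lam2": lam2, "lam3": lam3, "lr": lr}

configs = list(nplam_coarse_grid())
num_configs = len(configs)   # 6 * 6 * 6 * 5 combinations
```

A full product over these ranges gives 1080 configurations, which suggests why a coarse first pass followed by refinement is a common strategy.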

Table A2 Confusion matrix

30-dimension simulation
Methods		$x_1$ (N)	$x_2$ (N)	$x_3$ (N)	$x_4$ (L)	$x_5$ (L)	Other(-)
Methods for feature selection	Lasso	√	√	-	√	√	√
	NAM	√	√	√	√	-	√
	SNAM	√	√	√	√	√	-
	FCNN	-	√	√	√	√	√
	FCNN($l_1$)	-	√	√	-	√	√
	LassoNet	-	√	√	√	√	-
	SpAM	√	√	√	√	√	-
	SPINN	-	√	-	√	√	-
Methods for feature selection & structure discovery	SPLAM	√(N)	√(L)	-	√(L)	√(L)	√
	SPLAT	√(N)	√(L)	-	√(L)	√(L)	√
	NPLAM	√(N)	√(N)	√(N)	√(L)	√(L)	-

30-dimension simulation
Methods		TP(↑)	TN(↑)	FP(↓)	FN(↓)	F1(↑)	CF(↑)	UF(↓)	OF(↓)	MSE(STD)(↓)
Methods for feature selection	Lasso	4.1	23.3	1.7	0.9	0.759	-	-	-	0.0394(±0.0022)
	NAM	3.7	15.5	9.5	1.3	0.407	-	-	-	0.0122(±0.0017)
	SNAM	5.0	25.0	0.0	0.0	1.000	-	-	-	0.0037(±0.0009)
	FCNN	3.3	24.3	0.7	1.7	0.733	-	-	-	0.0360(±0.0029)
	FCNN($l_1$)	3.4	24.2	0.8	1.6	0.739	-	-	-	0.0297(±0.0032)
	LassoNet	3.8	24.7	0.3	1.2	0.835	-	-	-	0.0327(±0.0024)
	SpAM	5.0	25.0	0.0	0.0	1.000	-	-	-	0.0226(±0.0039)
	SPINN	3.5	24.8	0.2	1.5	0.805	-	-	-	0.0294(±0.0015)
Methods for feature selection & structure discovery	SPLAM	3.1	25	0.0	1.9	0.765	0.900	0.100	0.000	0.1494(±0.0133)
	SPLAT	4.3	25	0	0.7	0.925	0.943	0.057	0.000	0.0178(±0.0040)
	NPLAM	5.0	25.0	0.0	0.0	1.000	1.000	0.000	0.000	0.0025(±0.0002)

300-dimension simulation
Methods		$x_1$ (N)	$x_2$ (N)	$x_3$ (N)	$x_4$ (L)	$x_5$ (L)	Other(-)
Methods for feature selection	Lasso	√	√	-	√	√	-
	NAM	√	√	√	√	√	√
	FCNN	-	√	√	√	√	√
	FCNN($l_1$)	-	√	√	-	√	-
	LassoNet	√	√	-	√	√	√
	SpAM	√	√	√	√	√	-
	SPINN	-	√	-	√	√	-
Methods for feature selection & structure discovery	SPLAM	√(N)	√(L)	-	√(L)	√(L)	√
	SPLAT	√(L)	√(L)	-	√(L)	√(L)	-
	NPLAM	√(N)	√(N)	√(N)	√(L)	√(L)	-

300-dimension simulation
Methods		TP(↑)	TN(↑)	FP(↓)	FN(↓)	F1(↑)	CF(↑)	UF(↓)	OF(↓)	MSE(STD)(↓)
Methods for feature selection	Lasso	4.0	295.0	0.0	1.0	0.889	-	-	-	0.0290(±0.0014)
	NAM	4.6	214.5	80.5	0.4	0.102	-	-	-	0.0057(±0.0012)
	SNAM	5.0	295.0	0.0	0.0	1.000	-	-	-	0.0032(±0.0002)
	FCNN	3.9	294.5	0.5	1.1	0.830	-	-	-	0.0180(±0.0019)
	FCNN($l_1$)	2.8	294.3	0.7	2.2	0.659	-	-	-	0.0167(±0.0025)
	LassoNet	3.4	289.5	5.5	1.6	0.489	-	-	-	0.0268(±0.0011)
	SpAM	5.0	295.0	0.0	0.0	1.000	-	-	-	0.0149(±0.0023)
	SPINN	1.9	294.1	0.9	3.1	0.487	-	-	-	0.0357(±0.0024)
Methods for feature selection & structure discovery	SPLAM	4.1	294.9	0.1	0.9	0.756	0.990	0.007	0.003	0.0978(±0.0555)
	SPLAT	3.2	295	0	0.8	0.889	0.990	0.010	0.000	0.0245(±0.0007)
	NPLAM	5.0	295.0	0.0	0.0	1.000	1.000	0.000	0.000	0.0024(±0.0002)

Table A3 Detailed results of simulation
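The F1 column in these tables appears consistent with the standard definition $F1 = 2\,TP / (2\,TP + FP + FN)$ applied to the averaged counts; for example, it reproduces the Lasso and NAM rows of the 30-dimension table. The sketch below shows that arithmetic. The function name is hypothetical, and applying the formula to averaged counts rather than averaging per-run scores is an assumption.

```python
def f1_from_counts(tp, fp, fn):
    """Standard F1 score from true positives, false positives, false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Counts taken from the 30-dimension rows of Table A3.
lasso_f1 = round(f1_from_counts(4.1, 1.7, 0.9), 3)
nam_f1 = round(f1_from_counts(3.7, 9.5, 1.3), 3)
nplam_f1 = round(f1_from_counts(5.0, 0.0, 0.0), 3)
```

NAM's low score follows from its many false positives: it retains most irrelevant features even though it recovers most of the true ones.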

