Frontiers of Computer Science

Front. Comput. Sci.    2024, Vol. 18 Issue (1) : 181301    https://doi.org/10.1007/s11704-022-2457-y
Artificial Intelligence
Towards kernelizing the classifier for hyperbolic data
Meimei YANG 1,2, Qiao LIU 1,2, Xinkai SUN 1,2, Na SHI 1,2, Hui XUE 1,2
1. School of Computer Science and Engineering, Southeast University, Nanjing 210096, China
2. MOE Key Laboratory of Computer Science and Information Integration (Southeast University), Nanjing 210096, China
Abstract

Data hierarchy, as a hidden property of data structure, exists in a wide range of machine learning applications. A common practice for classifying such hierarchical data is to first encode the data in Euclidean space and then train a Euclidean classifier. However, this paradigm suffers a performance drop because embedding hierarchical data in Euclidean space distorts it. To alleviate this issue, hyperbolic geometry has been investigated as an alternative space for encoding hierarchical data, owing to its greater capacity to capture hierarchical structures. Yet existing hyperbolic methods cannot exploit the full potential of hyperbolic geometry, because they define the hyperbolic operations in the tangent plane, which again distorts the data embeddings. In this paper, we develop two novel kernel formulations in hyperbolic space, one positive definite (PD) and the other indefinite, to solve classification tasks in hyperbolic space. The PD kernel is defined by mapping the hyperbolic data to the Drury-Arveson (DA) space, a special reproducing kernel Hilbert space (RKHS). To further increase the discrimination of the classifier, an indefinite kernel is then defined in the Kreĭn spaces. Specifically, we design a 2-layer nested indefinite kernel that first maps hyperbolic data into the DA spaces, followed by a mapping from the DA spaces to the Kreĭn spaces. Extensive experiments on real-world datasets demonstrate the superiority of the proposed kernels.
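As a concrete illustration of the PD construction, the sketch below computes the DA kernel k_da(x, z) = (1 − ⟨x, z⟩)^{−1} (defined in Tab.1) for real-valued Poincaré-ball embeddings and feeds the precomputed Gram matrix to an SVM. The scikit-learn usage and the variable names X_train, X_test, y_train are illustrative assumptions, not the authors' exact pipeline.

    import numpy as np
    from sklearn.svm import SVC

    def da_kernel(X, Z):
        # DA kernel k_da(x, z) = 1 / (1 - <x, z>) on the open unit ball.
        # Rows of X and Z have norm < 1, so Cauchy-Schwarz gives <x, z> < 1
        # and the denominator stays strictly positive.
        return 1.0 / (1.0 - X @ Z.T)

    # Hypothetical usage with Poincare-ball embeddings and labels:
    # clf = SVC(kernel="precomputed").fit(da_kernel(X_train, X_train), y_train)
    # preds = clf.predict(da_kernel(X_test, X_train))

Because the kernel is PD, the Gram matrix can be handed to any standard kernel machine without modification; the indefinite case below requires extra machinery.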

Keywords  data hierarchy   hyperbolic geometry   Drury-Arveson space   Kreĭn space
Corresponding Author(s): Hui XUE   

Just Accepted Date: 26 October 2022   Issue Date: 27 February 2023
 Cite this article:   
Meimei YANG, Qiao LIU, Xinkai SUN, et al. Towards kernelizing the classifier for hyperbolic data[J]. Front. Comput. Sci., 2024, 18(1): 181301.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-2457-y
https://academic.hep.com.cn/fcs/EN/Y2024/V18/I1/181301
Fig.1  The 7-layer tree encoded in hyperbolic space
Fig.2  The 4-layer tree encoded in hyperbolic space
Fig.3  The 4-layer tree encoded in Euclidean space
Notation        Mathematical meaning
H_c^n           n-dimensional hyperbolic space with curvature c
R^n             n-dimensional Euclidean space
R               1-dimensional Euclidean space
L^n             n-dimensional Lorentz model with curvature −1
P_c^n           n-dimensional Poincaré ball model with curvature c
P^n             n-dimensional Poincaré ball model with curvature −1
B^n             n-dimensional open unit ball
K               Kreĭn space with indefinite inner product
H_+, H_−        Hilbert spaces, where H_+ and H_− span the Kreĭn space K
DA^n            n-dimensional Drury-Arveson space
(x_i, y_i)      data set, where x_i ∈ P^n and y_i is the label of x_i (1 ≤ i ≤ m)
K               kernel matrix with elements K(i, j) = k_hsig(x_i, x_j) (1 ≤ i, j ≤ m)
K_+, K_−        kernel matrices, where K = K_+ − K_−
x, z            feature vectors
w               normal vector determining the direction of the classification hyperplane in the SVM
b               bias determining the distance between the classification hyperplane and the origin
ξ_i             slack variables of the SVM (1 ≤ i ≤ m)
f, f_+, f_−     functions in the Kreĭn space K and the Hilbert spaces H_+ and H_−
ϕ(·)            mapping function, where ϕ(x) maps x from the original space P^n into a new feature space
k(·, ·)         kernel function k: P^n × P^n → R, where k(x, z) = ⟨ϕ(x), ϕ(z)⟩
k_da(·, ·)      DA kernel function k_da: P^n × P^n → R, where k_da(x, z) = (1 − ⟨x, z⟩)^{−1}
δ_da(·, ·)      function δ_da: DA^n × DA^n → R, where δ_da(x, z) returns the metric value between x and z in DA^n
ρ(·, ·)         function ρ: P^n × P^n → R, where ρ(x, z) returns the pseudohyperbolic metric between x and z in P^n
k_das(·, ·)     DA-Sigmoid kernel function k_das: P^n × P^n → R, where k_das(x, z) = tanh(γ/(1 − ⟨x, z⟩) + θ), γ > 0 and θ < 0
φ(·)            mapping function φ: P^n → K, where x ∈ P^n and φ(x) ∈ K
φ_1(·)          mapping function φ_1: P^n → DA^n, where x ∈ P^n and φ_1(x) ∈ DA^n
φ_2(·)          mapping function φ_2: DA^n → K, where x ∈ DA^n and φ_2(x) ∈ K
Tab.1  Notation table
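The last few rows of Tab.1 translate directly into code. The following sketch, assuming real-valued Poincaré-ball embeddings, evaluates the indefinite DA-Sigmoid kernel k_das(x, z) = tanh(γ/(1 − ⟨x, z⟩) + θ) and splits its Gram matrix into the difference K = K_+ − K_− of two positive semidefinite parts via an eigendecomposition, the standard construction for Kreĭn-space kernel machines; the authors' actual training procedure may differ.

    import numpy as np

    def da_sigmoid_kernel(X, Z, gamma=1.0, theta=-1.0):
        # k_das(x, z) = tanh(gamma / (1 - <x, z>) + theta), gamma > 0, theta < 0.
        return np.tanh(gamma / (1.0 - X @ Z.T) + theta)

    def krein_split(K):
        # Decompose a symmetric indefinite kernel matrix as K = K_plus - K_minus,
        # where both parts are positive semidefinite, by separating the positive
        # and negative eigenvalues of K.
        w, V = np.linalg.eigh(K)
        K_plus = (V * np.clip(w, 0.0, None)) @ V.T
        K_minus = (V * np.clip(-w, 0.0, None)) @ V.T
        return K_plus, K_minus

The split mirrors the K = K_+ − K_− row of Tab.1: K_+ and K_− generate the Hilbert spaces H_+ and H_− whose difference of inner products gives the indefinite Kreĭn inner product.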
Fig.4  2-dimensional Lorentz model located in the 3-dimensional Kreĭn space
Fig.5  The mapping process from the Poincaré ball to the Kreĭn space induced by the DA-Sigmoid kernel
Fig.6  Linear classification of the 2-dimensional dataset Amazon Electronics Computers in the Poincaré ball model
Fig.7  Classification of the dataset Amazon Electronics Computers by the kernel SVM with the DA kernel
Dataset    Dim   Hyperbolic kernels                    Euclidean kernel
                 DA          H-Tangent    H-Linear     Linear
Facebook   #2    81.1±1.3    62.1±1.0•    61.5±0.3•    62.3±0.1•
           #5    85.5±0.5    64.4±0.8•    62.6±0.6•    66.1±0.3•
           #10   85.7±0.6    73.5±0.6•    66.5±1.6•    77.1±0.2•
           #25   85.3±0.3    87.5±0.4◦    74.1±2.2•    84.5±0.2•
Terrorist  #2    58.7±4.3    50.0±1.2•    53.1±1.7•    50.7±0.3•
           #5    68.1±2.0    49.0±2.1•    51.1±0.9•    52.8±1.3•
           #10   69.3±2.0    51.2±2.3•    53.4±1.5•    60.4±1.1•
           #25   68.2±1.4    52.5±1.9•    51.1±1.1•    61.8±1.6•
Wiki       #2    53.5±1.8    17.4±1.4•    21.9±1.1•    19.3±0.2•
           #5    66.3±1.1    41.1±2.2•    31.4±1.7•    37.8±1.0•
           #10   65.7±0.5    47.8±2.3•    32.5±0.8•    53.4±0.8•
           #25   66.7±1.1    62.4±1.0•    41.0±1.0•    62.0±0.9•
AC         #2    69.4±1.7    68.6±3.2     56.1±0.3•    56.7±2.9•
           #5    83.8±1.1    78.4±0.9•    64.4±2.1•    70.3±1.0•
           #10   86.6±0.7    82.0±0.6•    71.9±1.6•    79.7±0.7•
           #25   86.1±1.2    85.8±0.4     77.0±1.3•    81.7±0.6•
Cora ML    #2    68.7±0.6    47.5±0.7•    44.8±1.4•    36.6±1.7•
           #5    82.8±0.6    66.9±0.9•    61.7±1.6•    50.0±0.3•
           #10   85.2±0.4    75.0±0.8•    68.2±1.6•    65.8±0.4•
           #25   85.6±0.3    79.7±0.3•    74.3±1.3•    73.2±0.3•
Avg. ACC.        75.1        62.1         55.9         60.1
Top-1 times      19          1            0            0
Tab.2  Average ACC of node classification on the Facebook, Terrorist, Wiki, AC and Cora ML datasets. We use bold to indicate the best result. We place •/◦ behind a result when our method is significantly superior/inferior to the compared algorithm in that case (pairwise t-test at the 0.05 significance level)
Dataset    Dim   Hyperbolic kernels                    Euclidean kernel
                 DA          H-Tangent    H-Linear     Linear
Facebook   #2    85.0±1.0    70.8±3.5•    64.5±1.4•    73.8±0.1•
           #5    95.6±0.4    77.2±2.8•    73.5±1.1•    83.7±0.3•
           #10   96.3±0.1    88.0±0.9•    84.9±1.0•    87.8±0.2•
           #25   96.3±0.2    91.5±0.3•    90.2±0.4•    93.2±0.3•
Terrorist  #2    72.6±1.4    50.5±2.1•    63.4±0.9•    54.8±2.2•
           #5    74.8±3.2    50.2±2.2•    62.2±1.6•    59.7±2.9•
           #10   76.1±2.1    53.6±3.1•    65.2±1.8•    66.4±2.0•
           #25   74.8±2.2    56.2±3.1•    65.9±3.4•    61.8±1.6•
Wiki       #2    79.2±0.4    50.7±3.1•    57.9±0.7•    70.3±1.0•
           #5    90.2±1.7    74.3±2.1•    78.6±0.9•    83.5±0.1•
           #10   91.9±0.3    79.0±2.0•    80.1±0.5•    87.9±0.4•
           #25   92.3±0.3    87.5±0.9•    84.4±0.4•    89.1±0.2•
AC         #2    80.1±0.7    79.6±1.6     69.5±0.8•    78.5±0.9•
           #5    93.1±0.6    83.7±4.1•    84.0±0.7•    81.5±0.7•
           #10   95.0±0.2    92.6±0.5•    89.9±0.6•    90.5±0.6•
           #25   95.0±0.5    94.1±0.3•    91.0±0.2•    92.2±0.4•
Cora ML    #2    88.2±0.3    64.1±3.0•    70.9±0.2•    69.8±0.8•
           #5    92.9±0.2    87.0±0.1•    86.1±0.1•    82.7±0.1•
           #10   94.4±0.1    89.2±0.1•    88.4±0.1•    86.3±0.1•
           #25   94.6±0.4    93.2±0.1•    91.7±0.3•    89.6±0.2•
Avg. AUC.        87.9        75.7         77.1         79.2
Top-1 times      20          0            0            0
Tab.3  Average AUC of node classification on the Facebook, Terrorist, Wiki, AC and Cora ML datasets. We use bold to indicate the best result. We place •/◦ behind a result when our method is significantly superior/inferior to the compared algorithm in that case (pairwise t-test at the 0.05 significance level)
Dataset    Dim   Hyperbolic kernels                    Euclidean kernel
                 DA          H-Tangent    H-Linear     Linear
Facebook   #2    70.3±1.0    42.5±1.5•    37.5±0.4•    41.9±0.3•
           #5    86.3±1.2    50.6±2.2•    56.4±1.2•    57.1±1.1•
           #10   87.8±0.3    71.3±1.9•    70.4±2.3•    70.6±0.4•
           #25   87.9±0.6    82.1±0.7•    80.3±0.7•    82.9±0.3•
Terrorist  #2    54.0±1.7    34.3±1.4•    46.3±0.6•    36.1±0.8•
           #5    57.2±1.7    34.1±1.6•    45.5±2.1•    39.8±2.0•
           #10   58.3±1.3    38.1±2.4•    47.9±2.0•    46.1±1.1•
           #25   57.1±1.4    39.0±2.7•    47.6±2.4•    51.9±1.5•
Wiki       #2    50.4±0.8    13.5±1.8•    12.3±0.4•    20.2±0.2•
           #5    73.2±1.8    48.9±1.6•    53.1±1.1•    48.3±0.4•
           #10   75.1±0.8    57.7±1.2•    57.1±0.9•    58.4±0.4•
           #25   75.8±0.5    66.1±0.7•    61.6±1.1•    66.5±0.7•
AC         #2    66.6±1.4    65.1±1.8•    55.6±0.6•    55.1±1.2•
           #5    86.4±1.2    71.3±3.1•    70.5±0.6•    64.4±1.4•
           #10   90.2±0.6    86.8±0.6•    82.5±0.9•    82.2±1.3•
           #25   90.0±1.1    90.5±0.4     86.5±0.6•    84.9±0.9•
Cora ML    #2    72.2±0.7    38.9±1.6•    44.2±0.7•    37.4±0.5•
           #5    84.4±0.5    73.4±0.2•    72.2±0.4•    56.1±0.2•
           #10   86.8±0.4    79.4±0.2•    78.4±0.3•    67.7±0.2•
           #25   87.2±0.7    85.7±0.3•    83.2±0.6•    76.3±0.4•
Avg. AUPR.       74.9        62.1         59.5         57.2
Top-1 times      19          1            0            0
Tab.4  Average AUPR of node classification on the Facebook, Terrorist, Wiki, AC and Cora ML datasets. We use bold to indicate the best result. We place •/◦ behind a result when our method is significantly superior/inferior to the compared algorithm in that case (pairwise t-test at the 0.05 significance level)
Dataset    Dim   Hyperbolic kernels                                                     Euclidean kernels
                 DA-Sigmoid  H-Poly      H-RBF       H-Laplace   H-Binomial  E-Sigmoid   E-RBF       E-Laplace   E-Poly
Facebook   #2    85.5±0.5    69.1±0.4•   73.9±1.3•   74.6±1.8•   66.0±1.3•   71.5±1.0•   67.2±0.4•   67.2±0.3•   66.0±0.9•
           #5    90.5±0.3    82.3±0.6•   87.4±0.5•   87.6±0.5•   81.8±3.6•   86.4±0.9•   85.1±0.5•   85.3±0.6•   82.3±0.7•
           #10   90.7±0.5    89.3±0.6•   87.7±1.3•   88.3±0.5•   87.3±3.5•   88.8±0.7•   86.6±1.2•   86.9±0.6•   86.2±0.6•
           #25   91.0±0.5    91.3±0.4◦   88.0±1.6•   88.2±0.4•   86.5±1.9•   90.2±0.1    88.0±0.8•   88.0±0.5•   88.8±0.7•
Terrorist  #2    59.6±2.1    52.6±1.3•   59.0±4.7    60.5±3.9    51.3±3.0•   59.0±2.1    60.1±2.3    60.6±1.5    51.5±1.0•
           #5    68.1±1.6    53.2±2.2•   64.6±3.3•   66.6±2.0•   57.6±7.7•   63.2±1.5•   62.9±1.4•   64.4±1.5•   56.4±1.2•
           #10   68.2±2.3    59.6±1.5•   65.9±1.5    66.0±1.9•   61.4±8.0•   67.9±1.3    65.5±2.3•   66.3±2.1•   64.9±1.5•
           #25   67.8±1.5    61.6±1.2•   64.8±1.9•   66.0±2.3•   60.9±5.4•   67.6±1.6    63.2±1.5•   63.0±2.2•   66.0±1.0•
Wiki       #2    58.4±0.8    40.2±0.5•   38.6±1.9•   41.0±2.4•   27.5±3.1•   31.3±2.0•   23.6±1.3•   23.9±0.9•   19.3±0.7•
           #5    71.6±0.9    63.7±0.7•   63.0±2.1•   64.8±1.8•   47.7±4.3•   62.6±0.8•   58.8±1.6•   59.3±1.4•   45.1±3.1•
           #10   73.2±0.7    68.9±0.7•   66.5±1.5•   67.1±1.8•   49.8±5.1•   67.9±1.4•   66.3±1.0•   65.4±1.7•   54.1±4.1•
           #25   74.0±1.3    72.2±0.9    67.9±1.6•   68.9±1.6•   55.5±4.2•   72.0±0.8•   69.0±1.6•   68.1±1.3•   65.1±1.4•
AC         #2    79.1±0.6    73.6±0.9•   74.5±2.2•   75.5±1.1•   69.6±3.2•   73.4±1.0•   71.9±1.4•   72.5±1.0•   66.9±4.0•
           #5    87.2±0.9    79.1±0.7•   81.8±9.9•   86.1±1.1•   82.9±2.1•   78.7±0.5•   77.6±0.8•   79.2±0.5•   73.6±1.9•
           #10   89.8±0.4    88.0±0.8•   86.8±0.6•   87.6±0.6•   86.8±1.1•   85.3±1.0•   84.4±1.1•   84.9±0.8•   79.7±2.6•
           #25   89.6±0.6    89.7±0.6    85.4±5.0•   87.3±0.5•   87.3±1.1•   87.3±0.3•   85.9±0.8•   86.7±0.4•   84.2±0.6•
Cora ML    #2    71.5±0.6    42.2±3.9•   67.2±1.3•   68.8±0.9•   51.7±3.5•   61.3±0.8•   56.5±0.6•   57.0±0.7•   45.9±2.2•
           #5    84.9±0.6    75.5±0.2•   83.3±0.7•   83.6±1.0•   79.5±1.7•   73.8±1.3•   71.0±0.6•   71.3±0.5•   57.2±1.6•
           #10   86.1±0.3    82.7±0.4•   84.7±0.6•   85.0±0.4•   81.7±2.9•   77.1±1.5•   76.1±0.8•   76.7±0.9•   71.6±1.0•
           #25   86.4±0.5    85.7±0.6•   84.8±0.5•   85.1±0.5•   83.6±1.3•   79.3±0.8•   69.5±1.3•   77.7±1.4•   76.4±1.2•
Avg. ACC.        78.7        71.1        73.8        74.9        67.8        72.2        69.5        70.2        65.0
Top-1 times      17          2           0           0           0           0           0           1           0
Tab.5  Average ACC of node classification on the Facebook, Terrorist, Wiki, AC and Cora ML datasets. We use bold to indicate the best result. We place •/◦ behind a result when our method is significantly superior/inferior to the compared algorithm in that case (pairwise t-test at the 0.05 significance level)
Dataset    Dim   Hyperbolic kernels                                                     Euclidean kernels
                 DA-Sigmoid  H-Poly      H-RBF       H-Laplace   H-Binomial  E-Sigmoid   E-RBF       E-Laplace   E-Poly
Facebook   #2    91.5±0.6    82.0±0.3•   74.7±3.7•   83.0±3.0•   70.9±3.5•   82.2±4.1•   75.9±1.8•   80.9±2.1•   68.8±3.9•
           #5    96.6±0.3    90.9±0.3•   94.1±0.6•   94.9±1.6•   90.5±0.6•   93.2±0.5•   91.8±0.7•   93.7±0.4•   91.2±0.2•
           #10   96.7±0.4    94.2±0.5•   94.6±2.1•   95.9±0.4•   93.0±1.1•   95.0±0.4•   94.3±0.4•   95.3±0.3•   93.7±0.3•
           #25   96.8±0.4    95.4±0.4•   95.4±0.3•   96.2±0.2•   93.3±0.3•   96.0±0.4•   96.1±0.4    96.4±0.3    94.9±0.7•
Terrorist  #2    70.6±2.4    54.4±1.5•   58.0±4.2•   65.1±3.9•   56.7±5.1•   66.5±2.1•   65.0±2.9•   68.2±2.0•   55.5±1.7•
           #5    77.3±2.2    62.9±1.7•   63.3±4.3•   70.4±2.1•   70.0±4.4•   71.5±2.3•   72.4±2.5•   75.6±1.5•   64.2±2.9•
           #10   77.8±2.0    67.1±1.9•   65.9±4.8•   69.9±1.7•   74.2±4.9•   91.3±0.5•   90.6±0.4•   92.3±0.3•   89.6±0.3•
           #25   78.2±1.2    70.7±1.6•   67.5±6.5•   70.5±3.8•   68.4±4.7•   74.9±1.5•   72.8±2.4•   75.0±2.0•   72.2±2.0•
Wiki       #2    87.1±0.4    74.3±0.4•   70.4±1.6•   74.1±3.3•   69.0±3.7•   78.5±1.4•   64.9±1.0•   72.2±0.5•   61.9±2.5•
           #5    93.6±0.4    89.0±0.2•   87.3±2.1•   89.9±1.9•   80.2±2.3•   89.4±0.2•   84.9±0.7•   88.6±0.4•   82.2±4.0•
           #10   94.1±0.3    91.8±0.4•   90.2±1.4•   90.8±1.8•   79.7±5.0•   91.3±0.5•   90.6±0.4•   92.3±0.3•   89.6±0.3•
           #25   94.1±1.0    93.2±0.3•   90.2±1.2•   91.5±1.2•   82.5±3.1•   91.7±0.5•   91.8±0.3•   92.5±0.2•   89.4±0.4•
AC         #2    89.1±0.3    84.6±0.6•   83.0±1.9•   85.8±0.8•   81.4±1.7•   91.7±0.5•   91.8±0.3•   92.5±0.2•   89.4±0.4•
           #5    95.8±0.4    91.4±0.4•   93.0±2.5•   94.6±0.7•   91.0±1.4•   90.4±0.3•   88.7±0.5•   90.5±0.4•   83.7±6.6•
           #10   96.9±0.3    96.0±0.4•   94.8±2.0•   96.5±0.3•   94.5±0.6•   94.5±0.5•   94.6±0.8•   95.3±0.4•   92.1±0.6•
           #25   96.7±0.4    96.3±0.3•   92.3±2.0•   96.4±0.2•   94.3±1.2•   95.5±0.3•   96.0±0.4•   96.1±0.3•   93.1±0.8•
Cora ML    #2    90.7±0.2    76.4±2.2•   84.9±1.0•   88.2±0.3•   80.2±4.1•   84.4±0.5•   75.8±1.1•   79.5±0.8•   61.0±4.0•
           #5    95.8±0.2    90.6±0.1•   94.0±0.3•   95.1±0.3•   91.8±0.8•   90.8±1.6•   88.1±0.2•   90.3±0.3•   86.2±0.3•
           #10   96.7±0.2    95.1±0.2•   95.3±0.3•   96.3±0.2•   93.3±1.0•   92.1±1.4•   92.2±0.3•   93.1±0.3•   90.3±0.3•
           #25   96.7±0.2    96.8±0.2    96.7±0.2    96.5±0.2    93.6±2.7•   94.0±0.3•   90.4±0.9•   93.4±0.5•   91.8±0.8•
Avg. AUC.        90.6        84.7        84.3        87.1        82.4        86.6        84.1        86.6        80.6
Top-1 times      19          1           0           0           0           0           0           0           0
Tab.6  Average AUC of node classification on the Facebook, Terrorist, Wiki, AC and Cora ML datasets. We use bold to indicate the best result. We place •/◦ behind a result when our method is significantly superior/inferior to the compared algorithm in that case (pairwise t-test at the 0.05 significance level)
Dataset    Dim   Hyperbolic kernels                                                     Euclidean kernels
                 DA-Sigmoid  H-Poly      H-RBF       H-Laplace   H-Binomial  E-Sigmoid   E-RBF       E-Laplace   E-Poly
Facebook   #2    76.1±0.9    51.4±2.2•   55.8±5.9    64.1±4.1•   46.7±2.8•   61.1±3.0•   55.8±1.2•   60.0±2.2•   45.8±2.1•
           #5    90.3±0.6    73.8±0.9•   87.0±0.8•   88.4±2.7•   79.4±1.9•   80.4±1.4•   81.9±0.8•   83.7±0.8•   75.8±0.3•
           #10   90.0±0.8    86.2±1.0•   87.8±3.6•   89.9±0.4    83.1±2.6•   86.4±1.4•   87.4±0.7•   88.9±0.6•   83.7±1.4•
           #25   90.5±0.8    89.0±0.6•   89.6±0.6•   90.7±0.5◦   84.7±1.5•   88.3±1.0•   89.5±0.6•   89.9±0.5•   85.8±1.8•
Terrorist  #2    51.5±3.3    37.5±1.0•   43.3±4.0•   47.5±2.5•   41.5±3.6•   48.1±1.7•   48.2±2.3•   50.0±1.7    37.4±1.4•
           #5    60.9±3.4    44.2±2.3•   50.1±4.2•   54.7±1.0•   52.8±3.5•   53.5±2.0•   55.8±2.1•   58.5±1.7•   46.7±2.3•
           #10   61.2±2.7    49.2±2.6•   53.1±4.5•   53.4±1.9•   55.7±3.7•   56.7±2.2•   57.3±2.3•   59.5±2.6    54.0±1.4•
           #25   61.0±2.2    52.9±2.2•   53.6±5.1•   55.4±3.6•   52.3±2.9•   58.7±1.7•   56.9±2.2•   59.1±2.2    55.4±2.8•
Wiki       #2    60.9±1.0    35.6±0.7•   41.2±2.2•   45.3±3.0•   33.4±3.4•   34.1±1.1•   28.2±1.0•   32.9±0.5•   22.8±0.7•
           #5    77.6±1.0    64.5±0.2•   69.0±2.8•   73.0±2.7•   52.9±1.9•   64.2±1.1•   65.9±0.7•   69.1±0.8•   54.9±2.2•
           #10   78.1±0.8    71.7±1.4•   74.5±1.7•   75.7±2.2•   56.1±4.6•   69.0±1.8•   74.9±0.9•   76.4±0.7•   63.3±1.2•
           #25   77.9±1.3    75.3±0.8•   74.2±2.1•   76.3±1.7•   60.4±3.0•   73.7±1.0•   77.5±1.0•   78.7±0.5◦   65.4±1.1•
AC         #2    80.2±0.7    68.9±1.3•   72.9±2.4•   75.6±2.0•   66.1±2.2•   72.1±0.8•   67.4±1.4•   68.3±1.3•   58.9±3.2•
           #5    92.1±0.8    81.9±0.9•   88.6±3.7•   89.5±2.4•   83.7±2.4•   81.3±0.7•   80.0±0.5•   82.6±0.6•   72.6±5.4•
           #10   94.2±0.4    92.9±0.7•   91.8±1.7•   93.7±0.6•   89.7±1.6•   89.6±1.4•   90.8±1.1•   91.6±0.7•   84.2±1.6•
           #25   93.7±0.7    93.4±0.4    92.3±2.0•   93.5±0.6    89.7±2.2•   91.1±0.9•   92.5±0.6•   93.1±0.4    86.0±1.6•
Cora ML    #2    73.8±0.5    45.9±2.7•   69.0±1.7•   72.5±0.5•   59.8±4.0•   60.9±1.0•   54.9±1.2•   57.8±0.9•   36.4±3.2•
           #5    88.4±0.6    77.8±0.4•   87.2±0.7•   88.6±0.4    80.4±1.8•   77.7±2.0•   76.2±0.7•   77.6±0.4•   66.4±0.6•
           #10   90.4±0.4    87.2±0.7•   89.0±0.6•   90.5±0.4    84.2±2.1•   80.9±2.1•   82.8±0.6•   83.7±0.6•   75.6±0.7•
           #25   90.6±0.4    91.0±0.4    89.9±0.4•   91.0±0.2◦   85.5±2.5•   85.1±0.8•   79.1±1.3•   84.4±1.1•   79.1±2.0•
Avg. AUPR.       79.0        68.5        73.0        75.5        66.9        67.7        70.2        72.3        62.5
Top-1 times      16          1           0           3           0           0           0           1           0
Tab.7  Average AUPR of node classification on the Facebook, Terrorist, Wiki, AC and Cora ML datasets. We use bold to indicate the best result. We place •/◦ behind a result when our method is significantly superior/inferior to the compared algorithm in that case (pairwise t-test at the 0.05 significance level)
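As a reading aid for the •/◦ markers in Tab.2-Tab.7, the sketch below shows one way the per-case paired t-test at the 0.05 level could be computed from per-run scores; the exact pairing of runs is an assumption, since the tables only state that a pairwise t-test was used.

    import numpy as np
    from scipy.stats import ttest_rel

    def significance_marker(ours, baseline, alpha=0.05):
        # ours, baseline: equal-length arrays of per-run scores (e.g., ACC over
        # repeated splits). Returns '•' if our method is significantly better,
        # '◦' if significantly worse, and '' if the difference is not significant.
        _, p = ttest_rel(ours, baseline)
        if p < alpha:
            return "•" if np.mean(ours) > np.mean(baseline) else "◦"
        return ""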
Fig.8  The study of the parameter γ of the DA-Sigmoid kernel on the Facebook, Terrorist, Wiki, AC and Cora ML datasets. We report the average ACC, AUC and AUPR for the node classification task, with the data embedded in 2-, 5-, 10- and 25-dimensional (#2, #5, #10, #25) hyperbolic space. (a) Facebook (#2); (b) Facebook (#5); (c) Facebook (#10); (d) Facebook (#25); (e) Terrorist (#2); (f) Terrorist (#5); (g) Terrorist (#10); (h) Terrorist (#25); (i) Wiki (#2); (j) Wiki (#5); (k) Wiki (#10); (l) Wiki (#25); (m) AC (#2); (n) AC (#5); (o) AC (#10); (p) AC (#25); (q) Cora ML (#2); (r) Cora ML (#5); (s) Cora ML (#10); (t) Cora ML (#25)