Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236 (Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Volume 19, Issue 8

Artificial Intelligence
SharpSMT: a scalable toolkit for measuring solution spaces of SMT(LA) formulas
Cunjing GE
Front. Comput. Sci., 2025, 19(8): 198336.
https://doi.org/10.1007/s11704-024-40500-z

Abstract

In this paper, we present SharpSMT, a toolkit for measuring the solution spaces of SMT(LA) formulas, i.e., Boolean combinations of linear arithmetic constraints; this is the #SMT(LA) problem. It integrates an SMT satisfiability solving algorithm with various polytope subroutines: volume computation, volume estimation, lattice counting, and approximate lattice counting. We propose a series of new polytope preprocessing techniques, which have been implemented in SharpSMT. Experimental results show that the new preprocessing techniques are highly effective, especially on application instances. We believe that SharpSMT will be useful in a number of areas.
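
As a rough illustration of the problem SharpSMT targets, the sketch below counts the integer solutions of a small SMT(LA) formula by naive model enumeration with the z3-solver package; SharpSMT replaces this exponential blocking loop with the polytope subroutines listed above. The formula and variable names are our own toy example, not from the paper.

```python
# Naive #SMT(LA) by model enumeration (toy baseline, not SharpSMT's algorithm).
from z3 import Ints, And, Or, Solver, sat

x, y = Ints("x y")
# A Boolean combination of linear integer arithmetic constraints
formula = And(x >= 0, y >= 0, x + y <= 6, Or(x - y >= 1, y >= 3))

solver = Solver()
solver.add(formula)

count = 0
while solver.check() == sat:
    m = solver.model()
    count += 1
    # Block the current model so the next check yields a different solution
    solver.add(Or(x != m[x], y != m[y]))
print(f"lattice count: {count}")
```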

Learning from shortcut: a shortcut-guided approach for explainable graph learning
Linan YUE, Qi LIU, Ye LIU, Weibo GAO, Fangzhou YAO
Front. Comput. Sci., 2025, 19(8): 198338.
https://doi.org/10.1007/s11704-024-40452-4

Abstract

The remarkable success of graph neural networks (GNNs) has spurred interest in explainable graph learning methods. Among them, graph rationalization methods have drawn significant attention; they aim to support prediction results with explanations by identifying a small subset of the original graph (i.e., a rationale). Although existing methods have achieved promising results, recent studies have shown that they still exploit shortcuts in the data to produce task results and compose rationales. Unlike previous methods plagued by shortcuts, in this paper we propose a Shortcut-guided Graph Rationalization (SGR) method, which identifies rationales by learning from shortcuts. Specifically, SGR consists of two training stages. In the first stage, we train a shortcut guider with an early-stopping strategy to capture shortcut information. In the second stage, SGR separates the graph into rationale and non-rationale subgraphs, then lets them learn from the shortcut information generated by the frozen shortcut guider to identify which information belongs to shortcuts and which does not. Finally, we employ the non-rationale subgraphs as environments and identify invariant rationales that filter out shortcuts under environment shifts. Extensive experiments on synthetic and real-world datasets validate the effectiveness of SGR and underscore its ability to provide faithful explanations.
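
A minimal PyTorch sketch of the two-stage recipe described above; the module names, toy data, and the divergence term are our illustrative assumptions, not the authors' exact objective or architecture.

```python
# Illustrative two-stage training loop (our sketch; names and loss are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

shortcut_guider = nn.Linear(16, 2)   # stage 1: model intended to absorb shortcuts
rationalizer = nn.Linear(16, 2)      # stage 2: model that should avoid them

x = torch.randn(64, 16)              # toy graph-level embeddings
y = torch.randint(0, 2, (64,))

# Stage 1: briefly fit the shortcut guider; stopping early keeps it shortcut-reliant.
opt1 = torch.optim.Adam(shortcut_guider.parameters(), lr=1e-2)
for _ in range(3):                   # few epochs, standing in for early stopping
    loss = F.cross_entropy(shortcut_guider(x), y)
    opt1.zero_grad()
    loss.backward()
    opt1.step()
for p in shortcut_guider.parameters():
    p.requires_grad_(False)          # freeze the guider

# Stage 2: train the rationalizer on the task while pushing it away from the
# frozen guider's predictions, so shortcut-correlated evidence is rejected.
opt2 = torch.optim.Adam(rationalizer.parameters(), lr=1e-2)
for _ in range(10):
    logits = rationalizer(x)
    with torch.no_grad():
        shortcut_probs = shortcut_guider(x).softmax(dim=-1)
    kl = F.kl_div(logits.log_softmax(dim=-1), shortcut_probs, reduction="batchmean")
    loss = F.cross_entropy(logits, y) - 0.1 * kl   # minus sign: reward divergence
    opt2.zero_grad()
    loss.backward()
    opt2.step()
```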

Adapting to the stream: an instance-attention GNN method for irregular multivariate time series data
Kun HAN, Abigail M Y KOAY, Ryan K L KO, Weitong CHEN, Miao XU
Front. Comput. Sci., 2025, 19(8): 198340.
https://doi.org/10.1007/s11704-024-40449-z

Abstract

Multivariate time series (MTS) data are vital for various applications, particularly machine learning tasks. However, challenges such as sensor failures can result in irregular and misaligned data with missing values, complicating analysis. While recent advances use graph neural networks (GNNs) to manage these Irregular Multivariate Time Series (IMTS) data, they generally require a reliable graph structure, either pre-existing or inferred from sufficient data, to properly capture node correlations. This poses a challenge in applications where IMTS data arrive as a stream and waiting for future data to estimate a suitable graph structure is impractical. To overcome this, we introduce a dynamic GNN model suited to the streaming characteristics of IMTS data, incorporating an instance-attention mechanism that dynamically learns and updates graph edge weights for real-time analysis. We also tailor strategies for high-frequency and low-frequency data to enhance prediction accuracy. Empirical results on real-world datasets demonstrate the superiority of the proposed model in both classification and imputation tasks.
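
One common way to realize instance-specific edge weights is attention between per-variable embeddings, sketched below in PyTorch; the layer and its naming are our assumption of the general idea, not the paper's architecture.

```python
# Toy sketch: per-instance edge weights from attention (our naming, not the paper's).
import torch
import torch.nn as nn

class InstanceAttentionLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, num_variables, dim) -- one embedding per sensor/variable
        scores = self.q(h) @ self.k(h).transpose(-2, -1) / h.size(-1) ** 0.5
        edge_weights = scores.softmax(dim=-1)   # dynamic, instance-specific graph
        return edge_weights @ self.v(h)         # message passing over learned edges

layer = InstanceAttentionLayer(dim=32)
out = layer(torch.randn(8, 10, 32))             # 8 instances, 10 variables
print(out.shape)                                # torch.Size([8, 10, 32])
```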

Top Pass: improve code generation by pass@k-maximized code ranking
Zhicun LYU, Xinye LI, Zheng XIE, Ming LI
Front. Comput. Sci., 2025, 19(8): 198341.
https://doi.org/10.1007/s11704-024-40415-9

Abstract

Code generation has been greatly enhanced by recent advances in Large Language Models (LLMs). Nevertheless, LLM-based code generation approaches still struggle to produce error-free code within a few tries when faced with complex problems. To address this, the prevailing strategy is to sample a huge number of candidate programs in the hope that any one of them might work. However, users of code generation systems usually expect to find a correct program by reviewing or testing only a small number of candidates; otherwise, the system is unhelpful. In this paper, we propose Top Pass, a code ranking approach that identifies potentially correct solutions from a large number of candidates. Top Pass directly optimizes the pass@k loss function, enhancing quality at the top of the candidate list and enabling users to find a correct solution within as few tries as possible. Experimental results on four benchmarks indicate that Top Pass enhances the usability of code generation models by producing better rankings, in particular achieving a 32.9% relative improvement in pass@1 on CodeContests over the state-of-the-art ranking method.
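
For reference, pass@k itself has a standard closed-form unbiased estimator from the code-generation literature: given n sampled candidates of which c are correct, pass@k = 1 - C(n-c, k)/C(n, k). The snippet below computes that metric; the ranking loss Top Pass builds on it is defined in the paper itself.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn from n candidates of which c are correct, passes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 100 candidates of which 5 are correct, a user who tests only the
# top 1 vs. the top 10 candidates sees very different success rates:
print(pass_at_k(100, 5, 1))   # 0.05
print(pass_at_k(100, 5, 10))  # ~0.42
```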

EduStudio: towards a unified library for student cognitive modeling
Le WU, Xiangzhi CHEN, Fei LIU, Junsong XIE, Chenao XIA, Zhengtao TAN, Mi TIAN, Jinglong LI, Kun ZHANG, Defu LIAN, Richang HONG, Meng WANG
Front. Comput. Sci., 2025, 19(8): 198342.
https://doi.org/10.1007/s11704-024-40372-3

Abstract

Student cognitive modeling is a fundamental task in the intelligent education field. It serves as the basis for various downstream applications, such as student profiling, personalized educational content recommendation, and adaptive testing. Cognitive Diagnosis (CD) and Knowledge Tracing (KT) are the two mainstream categories of student cognitive modeling, which measure cognitive ability over a limited period (e.g., an exam) and the dynamics of learning ability over a long period (e.g., a year of learning records), respectively. Recent efforts have been dedicated to developing open-source code libraries for student cognitive modeling. However, existing libraries often focus on a particular category and overlook the relationships between them. Additionally, these libraries lack sufficient modularization, which hinders reusability. To address these limitations, we have developed EduStudio, a unified PyTorch-based library that covers both CD and KT for student cognitive modeling. The design philosophy of EduStudio is twofold. From a horizontal perspective, EduStudio employs modularization that separates the main pipeline steps of each algorithm. From a vertical perspective, we use templates in an inheritance style to implement each module. We also provide ecosystem services for EduStudio, such as a repository that collects resources on student cognitive modeling and a leaderboard that compares models. Our open-source project is available at edustudio.ai.
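
A schematic of the "templates with inheritance" idea described above; the class and method names are ours for illustration, and the real API is documented at edustudio.ai.

```python
# Sketch of pipeline modularization via template inheritance (hypothetical names).
class TrainTemplate:
    """Base template fixing the pipeline; subclasses override individual steps."""
    def run(self):
        data = self.load_data()
        model = self.build_model(data)
        self.fit(model, data)
        return self.evaluate(model, data)

    def load_data(self): ...
    def build_model(self, data): ...
    def fit(self, model, data): ...
    def evaluate(self, model, data): ...

class CognitiveDiagnosisTemplate(TrainTemplate):
    def evaluate(self, model, data):
        # CD models are scored on static mastery estimation (e.g., one exam)
        ...

class KnowledgeTracingTemplate(TrainTemplate):
    def evaluate(self, model, data):
        # KT models are scored on next-interaction prediction over time
        ...
```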

Tool learning with large language models: a survey
Changle QU, Sunhao DAI, Xiaochi WEI, Hengyi CAI, Shuaiqiang WANG, Dawei YIN, Jun XU, Ji-rong WEN
Front. Comput. Sci., 2025, 19(8): 198343.
https://doi.org/10.1007/s11704-024-40678-2

Abstract

Recently, tool learning with large language models (LLMs) has emerged as a promising paradigm for augmenting the capabilities of LLMs to tackle highly complex problems. Despite growing attention and rapid advancements in this field, the existing literature remains fragmented and lacks systematic organization, posing barriers to entry for newcomers. This gap motivates us to conduct a comprehensive survey of existing work on tool learning with LLMs. In this survey, we review the literature from two primary aspects: (1) why tool learning is beneficial and (2) how tool learning is implemented, enabling a comprehensive understanding of tool learning with LLMs. We first explore the “why” by reviewing both the benefits of tool integration and the inherent advantages of the tool learning paradigm from six specific aspects. In terms of “how”, we systematically review the literature according to a taxonomy of four key stages in the tool learning workflow: task planning, tool selection, tool calling, and response generation. Additionally, we provide a detailed summary of existing benchmarks and evaluation methods, categorizing them according to their relevance to different stages. Finally, we discuss current challenges and outline potential future directions, aiming to inspire both researchers and industrial developers to further explore this emerging and promising area.
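
The four-stage taxonomy can be pictured as a minimal pipeline; everything below (the tool registry, function names, and toy calculator) is a hypothetical placeholder of ours, not an API from any surveyed system.

```python
# Minimal sketch of the four-stage tool-learning workflow (hypothetical placeholders).
TOOLS = {
    "calculator": lambda expr: str(eval(expr)),   # toy tool for illustration only
}

def task_planning(query: str) -> list[str]:
    # An LLM would decompose the query into steps; here we pass it through as one.
    return [query]

def tool_selection(step: str) -> str:
    return "calculator" if any(ch in step for ch in "+-*/") else "none"

def tool_calling(tool: str, step: str) -> str:
    return TOOLS[tool](step) if tool in TOOLS else ""

def response_generation(query: str, observations: list[str]) -> str:
    # An LLM would compose the final answer from the tool observations.
    return f"{query} = {', '.join(observations)}"

query = "12 * (3 + 4)"
obs = [tool_calling(tool_selection(s), s) for s in task_planning(query)]
print(response_generation(query, obs))  # 12 * (3 + 4) = 84
```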

Robust semi-supervised learning in open environments
Lan-Zhe GUO, Lin-Han JIA, Jie-Jing SHAO, Yu-Feng LI
Front. Comput. Sci., 2025, 19(8): 198345.
https://doi.org/10.1007/s11704-024-40646-w

Abstract

Semi-supervised learning (SSL) aims to improve performance by exploiting unlabeled data when labels are scarce. Conventional SSL studies typically assume closed environments, where important factors (e.g., labels, features, distributions) are consistent between labeled and unlabeled data. More practical tasks, however, involve open environments, where these factors are inconsistent. It has been reported that exploiting inconsistent unlabeled data causes severe performance degradation, sometimes performing even worse than a simple supervised learning baseline. Since manually verifying the quality of unlabeled data is impractical, it is important to study robust SSL with inconsistent unlabeled data in open environments. This paper briefly introduces advances in this line of research, focusing on techniques for handling label, feature, and data distribution inconsistency in SSL, and presents evaluation benchmarks. Open research problems are also discussed for reference.
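
As one concrete example of the label-inconsistency setting, a common baseline (not a specific method from this paper) filters unlabeled samples by prediction confidence before pseudo-labeling, so out-of-class instances are mostly discarded:

```python
import torch

def select_pseudo_labels(logits: torch.Tensor, threshold: float = 0.95):
    """Keep only high-confidence unlabeled predictions for pseudo-labeling."""
    probs = logits.softmax(dim=-1)
    conf, pseudo = probs.max(dim=-1)
    mask = conf >= threshold          # drop likely-inconsistent samples
    return pseudo[mask], mask

logits = torch.randn(128, 10) * 3     # toy predictions on an unlabeled batch
pseudo, mask = select_pseudo_labels(logits)
print(f"kept {mask.sum().item()} / {len(mask)} unlabeled samples")
```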

7 articles