arXiv:2603.01241 - Information Retrieval (cs.IR)
TARSE: Test-Time Adaptation via Retrieval of Skills and Experience for Reasoning Agents
Mar 1, 2026
Abstract
Complex clinical decision making often fails not because a model lacks facts, but because it cannot reliably select and apply the right procedural knowledge and the right prior example at the right reasoning step. We frame clinical question answering as an agent problem with two explicit, retrievable resources: skills, reusable clinical procedures such as guidelines, protocols, and pharmacologic mechanisms; and experience, verified reasoning trajectories from previously solved cases (e.g., chain-of-thought solutions and their step-level decompositions). At test time, the agent retrieves both relevant skills and experiences from curated libraries and performs lightweight test-time adaptation to align the language model's intermediate reasoning with clinically valid logic. Concretely, we build (i) a skills library from guideline-style documents organized as executable decision rules, (ii) an experience library of exemplar clinical reasoning chains indexed by step-level transitions, and (iii) a step-aware retriever that selects the most useful skill and experience items for the current case. We then adapt the model on the retrieved items to reduce instance-step misalignment and to prevent reasoning from drifting toward unsupported shortcuts. Experiments on medical question-answering benchmarks show consistent gains over strong medical RAG baselines and prompting-only reasoning methods. Our results suggest that explicitly separating and retrieving clinical skills and experience, and then aligning the model at test time, is a practical approach to more reliable medical agents.
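The abstract describes a step-aware retriever that queries two separate libraries (skills and experience) at each reasoning step, but it does not say how items are represented or scored. The sketch below is a minimal illustration under stated assumptions, not the authors' retriever. It assumes bag-of-words cosine similarity in place of whatever encoder the paper uses, and it assumes the query is the case text joined with the current step. It also assumes experience items are matched on the step they start from. The names `Skill`, `Experience`, `tokens`, `cosine`, and `retrieve` are hypothetical, and the library contents are illustrative toy data.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Skill:
    # Hypothetical skill record: a guideline-style decision rule.
    name: str
    rule: str


@dataclass
class Experience:
    # Hypothetical experience record: one verified step-level transition
    # (from_step -> to_step) taken from a previously solved case.
    from_step: str
    to_step: str
    rationale: str


def tokens(text: str) -> Counter:
    """Lower-cased bag-of-words counts (an assumed stand-in for embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[t] for t, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(case, step, skills, experiences, k=1):
    """Return the top-k skills and top-k experiences for the current step.

    The query joins the case text with the current reasoning step, so the
    same case can pull different items at different steps.
    """
    query = tokens(case + " " + step)

    def skill_score(s):
        return cosine(query, tokens(s.name + " " + s.rule))

    def exp_score(e):
        # Experiences are matched on the step they start from plus their rationale.
        return cosine(query, tokens(e.from_step + " " + e.rationale))

    top_skills = sorted(skills, key=skill_score, reverse=True)[:k]
    top_exps = sorted(experiences, key=exp_score, reverse=True)[:k]
    return top_skills, top_exps


# Toy libraries (illustrative content, not from the paper).
skills = [
    Skill("anticoagulation",
          "if atrial fibrillation and CHA2DS2-VASc >= 2 consider anticoagulation"),
    Skill("hyperkalemia",
          "if potassium high with ECG changes give calcium gluconate"),
]
experiences = [
    Experience("identify rhythm", "assess stroke risk",
               "irregularly irregular rhythm suggests atrial fibrillation; "
               "next compute stroke risk"),
    Experience("check electrolytes", "treat hyperkalemia",
               "peaked T waves with high potassium need membrane stabilization"),
]

s, e = retrieve("72-year-old with irregularly irregular rhythm",
                "assess stroke risk for atrial fibrillation", skills, experiences)
print(s[0].name, "|", e[0].to_step)  # anticoagulation | assess stroke risk
```

Keeping the two libraries separate, each with its own top-k, mirrors the abstract's point that skills and experience are distinct resources. A single merged index would let one kind of item crowd out the other.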
Repository Summary
TARSE frames clinical QA as an agent problem with two explicit retrievable resources (skills drawn from guideline-like procedures and experience drawn from verified reasoning traces), performs lightweight test-time adaptation on the retrieved items, and shows consistent gains over strong medical RAG and prompting-only reasoning baselines.
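The summary says the model is adapted on the retrieved items at test time, but it does not specify which parameters change, the loss, or the number of steps. The toy below is therefore only an analogy for the per-query pattern: copy the base parameters, take a few gradient steps on the retrieved items, predict, and discard the copy. It swaps the language model for a linear-logistic scorer over hand-made features trained with plain SGD on log-loss. The functions `adapt` and `predict`, the feature names, and the hyperparameters `lr` and `steps` are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: dict, feats: dict) -> float:
    """Probability that a candidate is correct under a linear-logistic scorer."""
    return sigmoid(sum(weights.get(f, 0.0) * v for f, v in feats.items()))


def adapt(base_weights: dict, retrieved, lr: float = 0.5, steps: int = 20) -> dict:
    """Toy per-query adaptation: copy the base weights, then run a few SGD
    epochs of logistic loss on the retrieved (features, label) items.

    base_weights is left untouched, so every new query starts fresh.
    """
    w = dict(base_weights)
    for _ in range(steps):
        for feats, label in retrieved:
            err = predict(w, feats) - label  # gradient of log-loss w.r.t. the logit
            for f, v in feats.items():
                w[f] = w.get(f, 0.0) - lr * err * v
    return w


base = {}  # untrained scorer: every prediction is 0.5
retrieved = [
    ({"afib": 1.0, "high_stroke_risk": 1.0}, 1),  # label 1: the rule applies
    ({"afib": 1.0, "low_stroke_risk": 1.0}, 0),   # label 0: the rule does not apply
]
query = {"afib": 1.0, "high_stroke_risk": 1.0}

adapted = adapt(base, retrieved)
print(predict(base, query), predict(adapted, query) > 0.5)  # 0.5 True
```

Returning a fresh copy instead of updating `base` in place reflects the test-time setting. Each case gets its own short-lived adjustment, and what was learned from one case's retrieved items does not leak into the next.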
Bibliographic Data
- Title: TARSE: Test-Time Adaptation via Retrieval of Skills and Experience for Reasoning Agents
- Authors: Junda Wang, Zonghai Tao, Hansi Zeng, Zhichao Yang, Hamed Zamani, Hong Yu
- Publication date: 2026/03/01
- Identifier: arXiv:2603.01241
- DOI: 10.48550/arXiv.2603.01241
- PDF size: 2.2 MB