Medical Agents — MED-SPIRAL

✦ Agentic Systems with Tools and Reasoning

Medical Agents

Clinical decision-making demands more than pattern recognition — it requires tool use, multi-step reasoning, and the ability to integrate heterogeneous information across a patient journey. We build medical AI agents that call specialized tools, reason through complex diagnostic workflows, and operate reliably in realistic clinical settings.

Tool Use

Meta-Tool: Unleash Open-World Function Calling Capabilities of General-Purpose Large Language Models

ACL, 2025

A framework that equips general-purpose LLMs with open-world function calling, enabling them to discover, select, and invoke tools beyond a fixed predefined set — a critical capability for medical agents operating in dynamic clinical environments.

MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling

NAACL, 2025 · GitHub

Connects LLM agents with medical calculators through a nested tool-calling mechanism, in which one tool call can supply the inputs of another, allowing agents to run multi-step calculator pipelines as part of complex clinical reasoning chains.
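
As a rough illustration of what nested tool calling can look like, the sketch below resolves a call plan in which the arguments of a calculator are themselves tool calls (here, unit conversions feeding a body-mass-index calculator). It is not MeNTi's implementation: the registry `TOOLS`, the `call` resolver, the plan format, and the tool names are all hypothetical and chosen only for this example.

```python
from typing import Any, Callable, Dict

# Hypothetical tool registry; names and signatures are illustrative only.
TOOLS: Dict[str, Callable[..., float]] = {}


def tool(name: str):
    """Register a function under a tool name."""
    def register(fn: Callable[..., float]) -> Callable[..., float]:
        TOOLS[name] = fn
        return fn
    return register


@tool("lb_to_kg")
def lb_to_kg(pounds: float) -> float:
    # 1 lb is defined as exactly 0.45359237 kg.
    return pounds * 0.45359237


@tool("in_to_m")
def in_to_m(inches: float) -> float:
    # 1 inch is defined as exactly 0.0254 m.
    return inches * 0.0254


@tool("bmi")
def bmi(weight_kg: float, height_m: float) -> float:
    # Body mass index: weight in kilograms divided by height in metres squared.
    return weight_kg / height_m ** 2


def call(spec: Any) -> Any:
    """Resolve a call spec; arguments that are specs are resolved first."""
    if not isinstance(spec, dict):
        return spec  # plain value, nothing to resolve
    args = {key: call(value) for key, value in spec["args"].items()}
    return TOOLS[spec["tool"]](**args)


# A nested plan: the calculator's inputs come from unit-conversion tools.
plan = {
    "tool": "bmi",
    "args": {
        "weight_kg": {"tool": "lb_to_kg", "args": {"pounds": 154.0}},
        "height_m": {"tool": "in_to_m", "args": {"inches": 68.0}},
    },
}
result = call(plan)
print(round(result, 1))
```

Resolving inner calls before the outer one keeps each calculator simple: it only ever sees values in the units it expects, while the plan records where those values came from.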

MedMCP-Calc: Benchmarking LLMs for Realistic Medical Calculator Scenarios via MCP Integration

ACL, 2026 · GitHub

A benchmark that evaluates LLMs on realistic medical calculator scenarios through Model Context Protocol integration, probing the ability to select the right calculator, extract required parameters, and interpret results in clinical context.

Reasoning

O1 Replication Journey — Part 3: Inference-time Scaling for Medical Reasoning

arXiv, 2025 · GitHub

Investigates inference-time compute scaling for medical reasoning, replicating and extending OpenAI o1-style chain-of-thought strategies to complex clinical question answering and diagnostic tasks.

DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models

ACL Findings, 2026 · HuggingFace

A comprehensive benchmark for evaluating LLM diagnostic reasoning across diverse clinical specialties, testing multi-step differential diagnosis, evidence integration, and alignment with clinical guidelines.

Multi-agent Applications

GI-AgentX: A Multi-modal Agentic System for Gastrointestinal Cancer

Manuscript in preparation

The first agentic system for unified patient-journey care of gastrointestinal cancer. GI-AgentX integrates an LLM-based scheduler agent for orchestration, planning, and recommendation; three dedicated vision agents for multimodal image analysis; and a retrieval-augmented generation module for guideline invocation and evidence grounding.

World Models

EHRWorld: A Patient-Centric Medical World Model for Long-Horizon Clinical Trajectories

arXiv, 2025

A patient-centric world model built on electronic health records that simulates long-horizon clinical trajectories, enabling agents to plan, predict, and reason about how patient outcomes evolve over time.

✦ Agentic AI is a central focus of our lab right now — we are actively working on memory, skills, and world models for medical agents, as well as building benchmarks to rigorously evaluate agent capabilities in clinical settings. More coming soon.