Projects

Large Models in the Medical Domain: VFMs, LLMs, and VLMs

Large-scale pretrained models are reshaping medical AI, replacing task-specific pipelines with unified representations that generalize across organs, modalities, and clinical tasks. We build and adapt visual foundation models, large language models, and vision-language models for medicine, covering pretraining strategies, efficient finetuning, and rigorous clinical evaluation.

[Figure: large models project overview]

Visual Foundation Models

MedLSAM: Localize and Segment Anything Model for 3D CT Images

Extends the Segment Anything Model to 3D medical imaging by introducing a localization prior that guides mask generation in volumetric CT scans, achieving strong zero-shot segmentation across diverse anatomical structures.
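The localize-then-segment idea can be illustrated with a minimal toy sketch: a localization step proposes a coarse 3D bounding box, and mask generation is restricted to that box. Everything here is a stand-in assumption for illustration only (the `localize` template box and the thresholding "decoder" are hypothetical placeholders, not the MedLSAM implementation).

```python
import numpy as np

def localize(volume, template_box):
    """Hypothetical localization prior: return a coarse 3D bounding box
    (z0, z1, y0, y1, x0, x1) for the target structure. A fixed template
    box stands in for a learned localizer."""
    return template_box

def segment_in_box(volume, box, threshold=0.5):
    """Stand-in for a promptable mask generator: segment only inside the
    localized box (toy thresholding instead of a real SAM-style decoder)."""
    z0, z1, y0, y1, x0, x1 = box
    mask = np.zeros(volume.shape, dtype=bool)
    crop = volume[z0:z1, y0:y1, x0:x1]
    mask[z0:z1, y0:y1, x0:x1] = crop > threshold
    return mask

# Toy CT-like volume with a bright "organ" in a known region.
rng = np.random.default_rng(0)
vol = rng.random((32, 64, 64)) * 0.3
vol[10:20, 20:40, 20:40] += 0.6

box = localize(vol, template_box=(8, 22, 16, 44, 16, 44))
mask = segment_in_box(vol, box)
print(int(mask.sum()))  # only voxels inside the localized box are segmented
```

The point of the sketch is the division of labor: the prior narrows the search space so the mask generator never has to reason about the whole volume.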

A Data-efficient Pan-tumor Foundation Model for Oncology CT Interpretation

A foundation model for oncology CT that achieves broad tumor coverage with significantly reduced annotation requirements, enabling scalable pretraining across heterogeneous cancer types and imaging protocols.

Towards Generalist Intelligence in Dentistry: Vision Foundation Models for Oral and Maxillofacial Radiology

A generalist vision foundation model for dental and maxillofacial radiology, pretrained on large-scale oral imaging data and evaluated across detection, segmentation, and diagnostic classification tasks.

MedDiff-FM: A Diffusion-based Foundation Model for Versatile Medical Image Applications

A diffusion-based foundation model that unifies medical image generation, restoration, and segmentation within a single framework, demonstrating strong generalization across modalities including CT, MRI, and X-ray.

Large Language Models

PULSE: Pretrained and Unified Language Service Engine

A Chinese medical large language model open-sourced in 2023 and continuously developed since. PULSE covers a wide range of clinical NLP tasks including diagnosis, drug information, medical Q&A, and report generation, with ongoing expansion of training data and capabilities.

Vision-Language Models

CAT: Coordinating Anatomical-Textual Prompts for Multi-organ and Tumor Segmentation

A vision-language segmentation model that coordinates anatomical and textual prompts to achieve unified multi-organ and tumor segmentation, enabling open-vocabulary queries over 3D medical volumes.
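The coordination of the two prompt types can be sketched as follows, with all components as toy stand-ins (random feature maps and embeddings, a simple averaged query, and cosine-similarity thresholding; none of this is the CAT architecture):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: a 3D per-voxel feature map from a vision encoder,
# plus two prompt embeddings (random placeholders for illustration).
D = 16
feat = rng.normal(size=(8, 16, 16, D))   # per-voxel features
text_emb = rng.normal(size=(D,))         # textual prompt, e.g. "liver tumor"
anat_emb = rng.normal(size=(D,))         # anatomical prompt, e.g. an organ/atlas query

# Coordinate the two prompts into a single query, score every voxel by
# cosine similarity, and threshold into an open-vocabulary binary mask.
query = 0.5 * text_emb + 0.5 * anat_emb
query /= np.linalg.norm(query)
feat_n = feat / np.linalg.norm(feat, axis=-1, keepdims=True)
score = feat_n @ query                   # cosine similarity per voxel
mask = score > 0.5

print(mask.shape)  # (8, 16, 16): one mask for the queried concept
```

The design choice being illustrated: because the query is built from language, new target classes can be expressed at inference time without retraining the segmentation head.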

Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment

Leverages vision-language pretraining to enable zero-shot lesion segmentation in 3D medical images through mask-attribute alignment, bridging the gap between language descriptions and volumetric spatial understanding.

✦ Vision-language models for medical imaging are a current research focus of our lab — more work coming soon.