ECG-WM: A Physiology-Informed ECG World Model for Clinical Intervention Simulation

Physiology-Informed Predictive Simulation for Action-Conditioned Cardiac Dynamics

Zhikang Chen1*, Yue Wang2, Sen Cui2, Yu Zhang3,
Changshui Zhang2, Tianling Ren2, Tingting Zhu1
1University of Oxford, 2Tsinghua University, 3Southern University of Science and Technology
*Corresponding author: zhikang.chen@eng.ox.ac.uk
🚧 Code will be released soon!

Abstract

Electrocardiogram (ECG)-based models have achieved strong performance in diagnostic tasks, yet they remain limited in modeling how cardiac dynamics evolve under external interventions. In particular, existing approaches focus primarily on static prediction and lack mechanisms to capture ECG variations under different pharmacological conditions. In this work, we propose an ECG World Model for action-conditioned predictive simulation of cardiac electrophysiology. Moving beyond disjoint pipelines, our framework features a principled integration of physiological ordinary differential equation (ODE) priors into latent diffusion dynamics via energy regularization. This structural constraint enables the synthesis of physiologically plausible post-intervention ECG trajectories while effectively mitigating generative hallucinations. Building on this simulation process, we introduce an uncertainty-aware evaluation strategy that leverages the stochasticity of diffusion sampling to characterize both the expected clinical risk and its variability, allowing a more reliable comparative assessment of candidate interventions. We evaluate our method across diverse settings, including controlled drug-response scenarios and real-world clinical records. Beyond standard waveform metrics, experimental results demonstrate improved risk calibration and strong alignment with expert-informed treatment preferences. These results establish our approach as a robust foundation for safe and intervention-aware clinical decision support.

Method Overview

Our generation paradigm integrates drug-conditioned simulation with uncertainty-aware risk evaluation. Top: Unlike the traditional prediction paradigm, which maps static patient data to a risk score without counterfactual reasoning, our method generates post-dose ECGs via a cardiac-ODE-enhanced world model, enabling actionable drug recommendations. Middle: From a pre-dose ECG, the model simulates drug-specific post-dose trajectories, evaluates them through an uncertainty-aware risk predictor, and ranks candidate interventions. Bottom: In single-step and multi-step rollout experiments, our model aligns more closely with the oracle ground truth than the GPT baselines.

Model Architecture

Given a pre-dose ECG and a clinical query, an ECG Foundation Model extracts domain-informed features. The ODE-enhanced World Model then simulates latent cardiac dynamics forward in time via an ODE solver, producing a predicted post-dose state. A VAE decoder reconstructs the post-dose ECG waveform, while an inverse dynamics module and an Action Projection head infer the drug action underlying the transition. The generated post-dose ECG is fed into a frozen risk predictor, and a Vision-Language Model provides the final drug recommendation.
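As a rough illustration of the "simulate latent dynamics forward in time via an ODE solver" step, the sketch below integrates a toy action-conditioned latent ODE with a fixed-step Euler scheme. The vector field, the dimensions, the one-hot drug encoding, and the names `latent_dynamics` and `euler_rollout` are all hypothetical stand-ins; this page does not specify the actual dynamics or solver.

```python
import numpy as np

def latent_dynamics(z, action, W_z, W_a):
    """Hypothetical vector field dz/dt = tanh(W_z z + W_a a) (assumption)."""
    return np.tanh(W_z @ z + W_a @ action)

def euler_rollout(z0, action, W_z, W_a, t_end=1.0, n_steps=20):
    """Integrate the latent state from t=0 to t_end with fixed-step Euler."""
    dt = t_end / n_steps
    z = z0.copy()
    for _ in range(n_steps):
        z = z + dt * latent_dynamics(z, action, W_z, W_a)
    return z

rng = np.random.default_rng(0)
latent_dim, action_dim = 8, 3
W_z = 0.1 * rng.standard_normal((latent_dim, latent_dim))
W_a = 0.1 * rng.standard_normal((latent_dim, action_dim))

z_pre = rng.standard_normal(latent_dim)  # stand-in for an encoded pre-dose ECG
drug_a = np.array([1.0, 0.0, 0.0])       # one-hot stand-ins for two drug actions
drug_b = np.array([0.0, 1.0, 0.0])

# Same pre-dose state, different actions, different predicted post-dose states.
z_post_a = euler_rollout(z_pre, drug_a, W_z, W_a)
z_post_b = euler_rollout(z_pre, drug_b, W_z, W_a)
```

In the described pipeline, a state such as `z_post_a` would then be passed to the VAE decoder to obtain a post-dose waveform.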

Experimental Results

Table 1: Post-dose ECG Prediction Error

                 Latent ECG            Signal ECG
Method           MSE ↓      MAE ↓      MSE ↓      MAE ↓
Qwen2.5-VL-7B    2.148      0.368      0.195      0.201
GPT-5 mini       0.719      0.166      0.066      0.128
GPT-4o           0.550      0.414      0.578      0.386
GPT-3.5          0.424      0.285      0.265      0.232
GLM-4.5          0.095      0.164      0.062      0.125
MedGemma         5.655      0.899      0.561      0.349
Ours             0.045      0.159      0.053      0.110

* Post-dose ECG prediction error on the MIMIC-IV-ECG benchmark. Lower MSE/MAE indicates more accurate simulation of drug-induced ECG changes across both latent and waveform spaces.
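For reference, the two error measures in Table 1 can be computed as follows. This is a minimal sketch on toy arrays; the array shape (12 leads by 500 samples) and the helper name `prediction_errors` are assumptions for illustration, and the same function would apply to latent vectors as well as waveforms.

```python
import numpy as np

def prediction_errors(pred, target):
    """Return mean squared error and mean absolute error over all elements."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    diff = pred - target
    return {"mse": float(np.mean(diff ** 2)), "mae": float(np.mean(np.abs(diff)))}

# Toy example: a prediction that is off by a constant 0.1 on every sample.
target = np.zeros((12, 500))
pred = np.full((12, 500), 0.1)
errors = prediction_errors(pred, target)
```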

Table 2: Performance comparison of different methods on three ECG intervals.

                QTc Interval              PR Interval               Tpeak−Tend Interval
Method          Acc (%) ↑    Rec (%) ↑    Acc (%) ↑    Rec (%) ↑    Acc (%) ↑    Rec (%) ↑
WGAN            75.23±0.69   77.23±0.71   57.67±0.64   58.04±0.54   85.49±0.12   87.35±0.16
StyleGAN        72.71±1.06   74.29±1.49   52.15±3.54   55.35±1.20   88.71±1.96   90.35±1.87
ECG ODE-GAN     72.81±3.24   73.95±2.25   54.63±1.49   56.16±1.39   87.03±0.52   88.19±0.79
TTS-CGAN        86.43±1.65   87.21±0.64   70.09±2.08   72.82±2.34   83.43±1.85   85.48±1.57
CECG-GAN        79.92±1.27   82.75±1.31   61.08±1.43   60.62±0.71   89.29±0.13   91.26±0.05
DiffuSETS       86.81±0.98   90.07±1.24   72.62±1.34   74.20±1.54   89.84±0.98   92.10±0.96
DADM            89.46±1.36   93.69±1.49   68.02±1.14   72.26±1.72   91.63±0.76   93.04±0.58
Ours            90.55±0.76   96.26±0.82   76.04±0.97   80.84±1.24   94.70±1.13   95.06±0.49

* Evaluation on the ECGRDVQ and ECGDMMLD drug-response databases. A prediction is considered correct only when all three interval labels (QTc, PR, Tpeak−Tend) simultaneously match the ground truth. Our method consistently outperforms baselines, demonstrating that the physiology-informed prior effectively constrains the generative process toward clinically plausible waveforms.
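The sketch below shows one way per-interval accuracy and recall, plus the stricter "all three labels match" criterion, could be computed. It assumes binary labels per interval (1 for an abnormal or changed interval, 0 otherwise), which this page does not state; the toy labels and function names are hypothetical.

```python
import numpy as np

INTERVALS = ["QTc", "PR", "Tpeak-Tend"]

def accuracy(y_true, y_pred):
    """Fraction of samples whose label matches the ground truth."""
    return float(np.mean(y_true == y_pred))

def recall(y_true, y_pred):
    """Fraction of positive (label 1) samples that are predicted positive."""
    positives = y_true == 1
    if positives.sum() == 0:
        return float("nan")
    return float(np.mean(y_pred[positives] == 1))

def joint_match_rate(y_true, y_pred):
    """Fraction of samples where every interval label matches simultaneously."""
    return float(np.mean(np.all(y_true == y_pred, axis=1)))

# Rows are samples; columns follow INTERVALS. Labels are toy values.
y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 0]])
y_pred = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [1, 1, 0]])

per_interval = {
    name: (accuracy(y_true[:, i], y_pred[:, i]), recall(y_true[:, i], y_pred[:, i]))
    for i, name in enumerate(INTERVALS)
}
joint = joint_match_rate(y_true, y_pred)
```

The joint rate is never higher than the lowest per-interval accuracy, since a single mismatched interval makes the whole sample count as incorrect.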

Generalization and Risk Modeling

Out-of-Distribution Generalization

Cross-dataset generalization. Models trained on the controlled drug-response ECG datasets (ECGRDVQ, ECGDMMLD) are evaluated on unseen MIMIC-IV ICU patients administered Verapamil, testing robustness to comorbidity-induced distribution shifts.

Table 3: Drug-Induced Risk Modeling

Core Evaluation Metric          Ours      DADM
ΔRisk Pearson Correlation ↑     0.620     0.266
ΔRisk Spearman Correlation ↑    0.598     0.255
ΔRisk Sign Agreement ↑          76%       58%
MAE ↓                           0.0973    0.1291
RMSE ↓                          0.1253    0.1625

* Evaluation on 200 test samples from the MIMIC-IV-ECG cohort across five drug candidates: Propofol, Regular Insulin, Heparin Sodium, Furosemide, and Norepinephrine. ΔRisk Sign Agreement measures the consistency of predicted risk-change direction against clinical ground truth, reflecting alignment with expert-informed treatment preferences.
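The metrics in Table 3 can be reproduced from paired predicted and reference ΔRisk values along the following lines. This is a NumPy-only sketch on made-up numbers; it assumes ΔRisk is a signed per-sample change in risk, and the helper names are illustrative rather than taken from the authors' code.

```python
import numpy as np

def average_ranks(x):
    """1-based ranks, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tie = x == value
        ranks[tie] = ranks[tie].mean()
    return ranks

def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])

def spearman(a, b):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

def delta_risk_report(pred, true):
    pred, true = np.asarray(pred, dtype=float), np.asarray(true, dtype=float)
    err = pred - true
    return {
        "pearson": pearson(pred, true),
        "spearman": spearman(pred, true),
        "sign_agreement": float(np.mean(np.sign(pred) == np.sign(true))),
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
    }

# Toy values only; not data from the paper.
true_delta = [0.10, -0.05, 0.20, -0.15]
pred_delta = [0.08, 0.02, 0.25, -0.10]
report = delta_risk_report(pred_delta, true_delta)
```

In this toy case the second sample has the wrong predicted direction, so sign agreement is 3 out of 4 even though the rank ordering is preserved exactly.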

Drug Response Simulation

Comparing pre-dose and simulated post-dose ECG waveforms.

Pre-dose ECG

Post-dose (True)

Cross-Drug Physiological Fidelity

Dofetilide

Ranolazine

Verapamil

Simulated post-dose ECG waveforms under three pharmacologically distinct interventions from the ECGRDVQ and ECGDMMLD databases. Dofetilide (IKr blocker), Ranolazine (late INa blocker), and Verapamil (L-type calcium channel blocker) each induce characteristic ECG morphological changes that our model faithfully reproduces.

Robustness to Missing ECG Leads

Abnormal ECG Lead Recovery

Post-dose ECG reconstruction when one or more leads exhibit abnormal or flat waveforms. Our model recovers physiologically consistent 12-lead signals, outperforming DADM in preserving clinically meaningful morphology.

Pre-dose ECG

Ours

DADM

True Post-dose ECG

Normal ECG Leads Reconstruction

Structured evaluation across varying numbers of missing leads. Each row shows a condition (1 or 2 leads missing); each column shows a processing stage: input (Pre-dose with missing leads), model output (Generated Post-dose), and ground truth (True Post-dose ECG).

1 Lead Missing · Pre-dose

1 Lead Missing · Generated

True Post-dose ECG

2 Leads Missing · Pre-dose

2 Leads Missing · Generated

Ablation Studies

EPK Loss Ablation

Ablation of the External Physiological Knowledge (EPK) loss weight. Higher EPK weighting enforces stronger physiological consistency, reducing generative hallucinations at the cost of slightly increased reconstruction error.
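To make the role of the loss weight concrete, the following sketch combines a reconstruction error with a weighted physiological-consistency penalty. Everything here is an assumption for illustration: the relaxation prior dz/dt = -z, the finite-difference energy, and the names `prior_field`, `epk_energy`, and `total_loss` are hypothetical and are not the EPK loss defined in the paper.

```python
import numpy as np

def prior_field(z):
    """Hypothetical physiological prior: simple relaxation dz/dt = -z."""
    return -z

def epk_energy(traj, dt):
    """Mean squared mismatch between finite-difference velocity and the prior."""
    velocity = (traj[1:] - traj[:-1]) / dt
    return float(np.mean((velocity - prior_field(traj[:-1])) ** 2))

def total_loss(recon_error, traj, dt, epk_weight):
    """Reconstruction error plus a weighted prior-consistency penalty."""
    return recon_error + epk_weight * epk_energy(traj, dt)

dt = 0.01
t = np.arange(0.0, 1.0, dt)
consistent = np.exp(-t)[:, None]    # trajectory that follows the toy prior
inconsistent = np.exp(t)[:, None]   # trajectory that grows instead of relaxing

low_energy = epk_energy(consistent, dt)
high_energy = epk_energy(inconsistent, dt)
```

With a larger `epk_weight`, the inconsistent trajectory is penalized more heavily relative to its reconstruction error, which mirrors the trade-off described in the ablation.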

Multi-step Rollout Stability

Multi-step rollout stability. Our model maintains predictive accuracy over extended horizons, while baseline methods exhibit compounding drift in generated waveforms.
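Compounding drift in an autoregressive rollout can be illustrated with a toy example in which each prediction is fed back as the next input. The linear "true" dynamics and the two biased one-step models below are invented for illustration only and are unrelated to the models compared on this page.

```python
import numpy as np

def rollout(step_fn, x0, n_steps):
    """Apply a one-step model repeatedly, feeding each output back as input."""
    states = [x0]
    for _ in range(n_steps):
        states.append(step_fn(states[-1]))
    return np.stack(states)

def per_step_mse(pred_traj, true_traj):
    """MSE against the reference trajectory at every rollout step."""
    return np.mean((pred_traj - true_traj) ** 2, axis=1)

def true_step(x):
    return 0.9 * x

def good_model(x):
    return 0.9 * x + 0.001  # small one-step bias (toy value)

def poor_model(x):
    return 0.9 * x + 0.05   # larger one-step bias (toy value)

x0 = np.ones(4)
horizon = 10
reference = rollout(true_step, x0, horizon)
good_err = per_step_mse(rollout(good_model, x0, horizon), reference)
poor_err = per_step_mse(rollout(poor_model, x0, horizon), reference)
```

Both error curves start at zero and grow with the horizon, but the model with the larger one-step bias accumulates error much faster.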

Risk-Aversion Coefficient λ

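Effect of the risk-aversion coefficient λ in the uncertainty-aware scoring function S(a) = μ_R(a) + λ·σ_R(a), where μ_R(a) is the expected risk of candidate action a and σ_R(a) is its variability across diffusion samples. Larger λ penalizes high-variance drug candidates, enabling clinicians to trade off expected risk against uncertainty.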
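A minimal sketch of this scoring rule, assuming that the mean and spread are estimated from repeated risk predictions over sampled post-dose ECGs and that a lower score is preferred. The drug names, the sample values, and the use of the population standard deviation are illustrative assumptions.

```python
import numpy as np

def risk_score(risk_samples, lam):
    """S(a) = mean risk + lam * standard deviation of risk over samples."""
    samples = np.asarray(risk_samples, dtype=float)
    return float(samples.mean() + lam * samples.std())

def rank_candidates(samples_by_drug, lam):
    """Return drug names sorted from lowest to highest score, plus the scores."""
    scores = {drug: risk_score(s, lam) for drug, s in samples_by_drug.items()}
    return sorted(scores, key=scores.get), scores

# Toy risk predictions from four hypothetical diffusion samples per drug.
samples_by_drug = {
    "drug_A": [0.30, 0.30, 0.30, 0.30],  # moderate risk, no spread
    "drug_B": [0.05, 0.45, 0.05, 0.45],  # lower mean risk, high spread
}

risk_neutral_order, risk_neutral_scores = rank_candidates(samples_by_drug, lam=0.0)
risk_averse_order, risk_averse_scores = rank_candidates(samples_by_drug, lam=1.0)
```

With λ = 0 the ranking depends only on mean risk, so `drug_B` comes first; with λ = 1 its spread is penalized and `drug_A` is preferred instead.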

Limitations and Impact

Impact

This work proposes an ECG World Model for simulation-based analysis of cardiac interventions, enabling AI systems to explore potential physiological responses under different treatment scenarios. By combining a physiological prior with data-driven generative modeling, the framework supports hypothesis generation and in-silico evaluation of treatment effects.

Limitations

  1. Data Coverage. The model is trained on observational and simulated data, which may not fully capture real-world complexity, variability, or rare edge cases. This can lead to deviations from true clinical outcomes under distributional shifts.
  2. Simplified Prior. The physiological prior is simplified and may be misspecified, limiting its ability to represent full cardiac dynamics, especially in pathological regimes.
  3. Evaluation Scope. Evaluation focuses on aggregate stability metrics rather than clinical endpoints, and does not fully explain the interaction between prior-driven and data-driven components.
  4. Safety Guarantees. The model lacks calibrated uncertainty estimates and formal safety guarantees, constraining its use in high-stakes settings.

Important. This system is not intended as a standalone clinical tool. It should be used only as a decision-support framework requiring expert validation. Generated scenarios should be treated as exploratory hypotheses rather than definitive predictions.

Despite these limitations, the model exhibits stable and bounded behavior across varying levels of prior mismatch, with variability increasing smoothly rather than collapsing. The influence of the prior is also continuous and controllable, as partial prior injection yields intermediate performance. These results suggest a degree of robustness under imperfect assumptions.

Overall, this work takes a step toward more robust and interpretable generative models for physiological simulation, while underscoring the need for further validation, mechanistic understanding, and safety-aware design prior to real-world deployment.

Citation

If you use this work or find it helpful, please consider citing:

@article{chen2026ecg,
  title={ECG-WM: A Physiology-Informed ECG World Model for Clinical Intervention Simulation},
  author={Chen, Zhikang and Wang, Yue and Cui, Sen and Zhang, Yu and Zhang, Changshui and Ren, Tianling and Zhu, Tingting},
  journal={arXiv preprint arXiv:2605.17580},
  year={2026}
}