PRAGMA-VL: Towards a Pragmatic Arbitration of Safety and Helpfulness in MLLMs
Introduction
Multimodal Large Language Models (MLLMs) pose critical safety challenges, as they are susceptible not only to adversarial attacks such as jailbreaking but also to inadvertently generating harmful content for benign users. While internal safety alignment via Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) is a primary mitigation strategy, current methods often face a safety-utility trade-off: they either refuse benign queries out of excessive caution or overlook latent risks in cross-modal interactions.
To resolve this, we introduce Pragma-VL, an end-to-end alignment algorithm that enables MLLMs to pragmatically arbitrate between safety and helpfulness. First, we enhance visual risk perception with a novel cold-start SFT stage, achieved by applying risk-aware clustering to the visual encoder and training on an interleaved dataset of risk descriptions and high-quality data. Second, we introduce a theoretically guaranteed reward model that leverages synergistic learning. We train it with a novel data augmentation method that assigns dynamic weights conditioned on the query, enabling contextual arbitration between safety and helpfulness.
Extensive experiments show that Pragma-VL effectively balances safety and helpfulness, outperforming baselines by 5% to 20% on most multimodal safety benchmarks while preserving its general capabilities in areas such as mathematics and knowledge reasoning.
Figure 1: Safety-Helpfulness Trade-off in Multimodal Large Language Models
Current MLLMs face challenges in balancing safety and helpfulness, often refusing benign queries or overlooking latent risks.
Dataset
We introduce Pragma-Safe, a comprehensive multimodal safety dataset designed to evaluate and improve safety alignment in MLLMs. The dataset contains diverse risk scenarios with carefully annotated safety and helpfulness labels.
Figure 2: Pragma-Safe Dataset Construction Overview
Dataset Construction Process
- Multi-Dimensional Annotation: We generate diverse responses using six MLLMs and employ GPT-4o to assign granular helpfulness, harmlessness, and trade-off weight labels.
- Consensus-Driven Aggregation: Multiple rounds of position annotations are processed via majority voting to establish stable baseline scores and minimize individual rater bias.
- Variance-Aware Weight Refinement: A stochastic adjustment mechanism refines weight vectors based on rater consensus, preventing overfitting and ensuring robust context-dependent arbitration.
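The aggregation and refinement steps above can be sketched as follows. This is a minimal illustration, not the authors' released pipeline: the function names, the tie-breaking rule, and the exact form of the variance-scaled perturbation are assumptions for demonstration only.

```python
import numpy as np

def aggregate_annotations(rounds):
    """Majority-vote annotation rounds into a stable baseline label.

    `rounds` is a (num_raters, num_samples) integer array of labels;
    ties break toward the lower label (an assumption of this sketch).
    """
    return np.apply_along_axis(
        lambda col: np.bincount(col).argmax(), 0, rounds)

def refine_weights(base_weights, rater_scores, noise_scale=0.05, rng=None):
    """Variance-aware stochastic refinement of trade-off weight vectors.

    Samples with high rater disagreement (variance) receive a larger
    stochastic perturbation, discouraging the downstream reward model
    from overfitting to any single contested weight assignment.
    """
    rng = rng or np.random.default_rng(0)
    variance = rater_scores.var(axis=0)      # per-sample disagreement
    noise = rng.normal(0.0, noise_scale, size=base_weights.shape)
    refined = base_weights + noise * variance[:, None]
    return np.clip(refined, 0.0, 1.0)        # keep weights valid
```

Note that a sample with zero rater variance is left untouched, so the stochastic adjustment only spreads probability mass where consensus is genuinely weak.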
Dataset Statistics and Distribution
Dataset Composition Table
Risk Detection Examples
Helpfulness Evaluation Examples
Algorithm
Pragma-VL employs a two-stage alignment approach: (1) Risk-aware pre-alignment via cold-start SFT with visual risk clustering, and (2) Context-dependent reward modeling with synergistic learning for dynamic safety-helpfulness arbitration.
Figure 3: Pragma-VL Algorithm Architecture
End-to-end alignment with risk-aware pre-training and contextual reward modeling.
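The contextual arbitration in stage (2) can be summarized as a weighted combination of safety and helpfulness rewards, where the weight is predicted per query. The function below is a hedged sketch of that idea; the weighting scheme and the `w_safety` predictor are assumptions, since the paper's exact reward formulation is not reproduced here.

```python
def arbitrated_reward(r_safety, r_helpful, w_safety):
    """Context-dependent arbitration between two reward signals.

    `w_safety` in [0, 1] would be produced by the reward model's
    predicted trade-off weight for the query: near 1 for high-risk
    queries, near 0 for clearly benign ones.
    """
    if not 0.0 <= w_safety <= 1.0:
        raise ValueError("w_safety must lie in [0, 1]")
    return w_safety * r_safety + (1.0 - w_safety) * r_helpful
```

Under this view, an over-cautious refusal on a benign query (low `w_safety`) scores poorly because the helpfulness term dominates, while the same refusal on a risky query (high `w_safety`) is rewarded.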
Experiment Results
Pragma-VL achieves significant improvements across multiple multimodal safety benchmarks while maintaining general capabilities, improving safety metrics by 5% to 20% over baselines.
Quantitative Results on Safety Benchmarks
Safety Evaluation Demo
Helpfulness Assessment
Comparative Analysis
BibTeX
@inproceedings{
wen2026pragmavl,
title={Pragma-{VL}: Towards a Pragmatic Arbitration of Safety and Helpfulness in {MLLM}s},
author={Ming Wen and Kun Yang and Xin Chen and Jingyu Zhang and Dingding Han and Shiwen Cui and Yuedong Xu},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=KwWYvt547M}
}