Sitong Fang

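I am an undergraduate majoring in Artificial Intelligence at Yuanpei College, Peking University, advised by Assistant Professor Yaodong Yang. I am a member of the PKU-Alignment group and a researcher at Physis AI (逆矩阵科技).

My research focuses on trustworthy multimodal AI and world models. I proposed TruthfulVQA (ACL 2026), the first benchmark for evaluating multimodal truthfulness; Debate with Images (preprint, 2025), a multi-agent debate framework that detects deception using visual evidence; and MM-DeceptionBench, the first benchmark for multimodal deception. I am also a co-first author of the AI Deception Survey, the first international systematic survey of AI deception, with Turing Award laureate and Academician Andrew Yao (姚期智) as corresponding author.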

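News

2026.04 One paper accepted to ACL 2026.
2025.11 The AI Deception survey was released: the first international systematic report on AI deception, with Turing Award laureate and Academician Andrew Yao (姚期智) as corresponding author.
2025.09 Debate with Images was released, introducing MM-DeceptionBench and a multi-agent debate framework for deception detection.
2025.06 Eval-Anything was open-sourced under PKU-Alignment.
2025.05 Received funding from the Beijing Natural Science Foundation Undergraduate "Qiyan" (启研) Program.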

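Honors and Awards

2025 Yuanpei Young Scholar (only 10 recipients)
2025 Soong Ching Ling Future Scholarship (宋庆龄未来助学金)
2025 Funded by the Beijing Natural Science Foundation Undergraduate "Qiyan" (启研) Program (the only recipient among undergraduates in the 2023 Artificial Intelligence cohort)
2024 Peking University Boya Scholarship
2024 Peking University China Merchants Securities Scholarship
2024 Peking University Award for Academic Excellence
2024 Peking University Award for Social Work
2023 Peking University Freshman Scholarship (First Class)
2023 Ranked first in Fujian Province in the Gaokao (National College Entrance Examination), physics track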

Selected Publications

2026

ACL 2026
When Slower Isn’t Truer: Inverse Scaling Law of Truthfulness in Multimodal Reasoning

Sitong Fang, Wenjing Cao, Jiahao Li, Xuyao Wang, Chi-Min Chan, Sirui Han, Juntao Dai, Yike Guo, Yaodong Yang, and Jiaming Ji

In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2026

We investigate truthfulness in multimodal large language models and discover an inverse scaling law: slower reasoning models are less truthful in multimodal settings. We propose TruthfulVQA, the first benchmark for multimodal truthfulness evaluation, and TruthfulJudge, a reliable human-in-the-loop evaluation framework.
@inproceedings{fang2026truthful,
  title = {When Slower Isn't Truer: Inverse Scaling Law of Truthfulness in Multimodal Reasoning},
  author = {Fang, Sitong and Cao, Wenjing and Li, Jiahao and Wang, Xuyao and Chan, Chi-Min and Han, Sirui and Dai, Juntao and Guo, Yike and Yang, Yaodong and Ji, Jiaming},
  booktitle = {Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)},
  year = {2026},
}

Under Review
Debate with Images: Detecting Deceptive Behaviors in Multimodal Large Language Models

Sitong Fang, Shiyi Hou, Kaile Wang, Boyuan Chen, Donghai Hong, Jiayi Zhou, Juntao Dai, Yaodong Yang, and Jiaming Ji

Under Review at the International Conference on Machine Learning (ICML), 2026

We introduce MM-DeceptionBench, the first benchmark for evaluating deceptive behaviors in multimodal LLMs, and propose Debate with Images, a multi-agent debate framework requiring models to ground claims in visual evidence. Our approach significantly improves deception detection accuracy and human agreement.
@article{fang2026debate,
  title = {Debate with Images: Detecting Deceptive Behaviors in Multimodal Large Language Models},
  author = {Fang, Sitong and Hou, Shiyi and Wang, Kaile and Chen, Boyuan and Hong, Donghai and Zhou, Jiayi and Dai, Juntao and Yang, Yaodong and Ji, Jiaming},
  journal = {Under Review at the International Conference on Machine Learning (ICML)},
  year = {2026},
}

2025

Under Review

AI Deception: Risks, Dynamics, and Controls

Boyuan Chen*, Sitong Fang*, Jiaming Ji*, 51 more authors, Yaodong Yang, Tiejun Huang, Ya-Qin Zhang, HongJiang Zhang, and Andrew Yao (* co-first authors)

Under Review at ACM Computing Surveys, 2025

@article{chen2025deception,
  title = {AI Deception: Risks, Dynamics, and Controls},
  author = {Chen, Boyuan and Fang, Sitong and Ji, Jiaming and Zhu, Yanxu and Wen, Pengcheng and Wu, Jinzhou and Tan, Yingshui and Zheng, Boren and Yuan, Mengying and Chen, Wenqi and Hong, Donghai and Qiu, Alex and Chen, Xin and Zhou, Jiayi and Wang, Kaile and Dai, Juntao and Zhang, Borong and Yang, Tianzhuo and Siddiqui, Saad and Duan, Isabella and Duan, Yawen and Tse, Brian and Huang, Jen-Tse and Wang, Kun and Zheng, Baihui and Liu, Jiaheng and Yang, Jian and Li, Yiming and Chen, Wenting and Liu, Dongrui and Vierling, Lukas and Xi, Zhiheng and Fu, Haobo and Wang, Wenxuan and Sang, Jitao and Shi, Zhengyan and Chan, Chi-Min and Shi, Eugenie and Li, Simin and Li, Juncheng and Yang, Jian and Ji, Wei and Li, Dong and Yang, Jinglin and Song, Jun and Dong, Yinpeng and Fu, Jie and Zheng, Bo and Yang, Min and Guo, Yike and Torr, Philip and Trager, Robert and Zeng, Yi and Wang, Zhongyuan and Yang, Yaodong and Huang, Tiejun and Zhang, Ya-Qin and Zhang, HongJiang and Yao, Andrew},
  journal = {Under Review at ACM Computing Surveys},
  year = {2025},
}

Experience

Physis AI (逆矩阵科技) · Researcher · 2026.02 – Present
Working on physically realistic world foundation models and reinforcement learning.

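PKU-Alignment group, Peking University · Research Intern · 2024.12 – Present

Core contributor: Align-Anything (4.6k+ ★), an all-modality alignment framework
Core contributor: Eval-Anything, an all-modality safety evaluation framework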

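HKGAI R&D Center / The Hong Kong University of Science and Technology · Research Intern · 2024.12 – 2025.03
Contributed to the development of HKGAI-V1, the Hong Kong government's first locally fine-tuned generative AI model based on DeepSeek. It supports Cantonese, Mandarin, and English and includes localized safety alignment.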

Education

2023 – Present Yuanpei College, Peking University, major in Artificial Intelligence