We investigate truthfulness in multimodal large language models and discover an inverse scaling law: slower reasoning models are less truthful in multimodal settings. We propose TruthfulVQA, the first benchmark for multimodal truthfulness evaluation, and TruthfulJudge, a reliable human-in-the-loop evaluation framework.
@inproceedings{fang2026truthful,
  title={When Slower Isn't Truer: Inverse Scaling Law of Truthfulness in Multimodal Reasoning},
  author={Fang, Sitong and Cao, Wenjing and Li, Jiahao and Wang, Xuyao and Chan, Chi-Min and Han, Sirui and Dai, Juntao and Guo, Yike and Yang, Yaodong and Ji, Jiaming},
  booktitle={Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)},
  year={2026},
}
Under Review
Debate with Images: Detecting Deceptive Behaviors in Multimodal Large Language Models
Sitong Fang, Shiyi Hou, Kaile Wang, Boyuan Chen, Donghai Hong, Jiayi Zhou, Juntao Dai, Yaodong Yang, and Jiaming Ji
Under Review at the International Conference on Machine Learning (ICML), 2026
We introduce MM-DeceptionBench, the first benchmark for evaluating deceptive behaviors in multimodal LLMs, and propose Debate with Images, a multi-agent debate framework requiring models to ground claims in visual evidence. Our approach significantly improves deception detection accuracy and human agreement.
@article{fang2026debate,
  title={Debate with Images: Detecting Deceptive Behaviors in Multimodal Large Language Models},
  author={Fang, Sitong and Hou, Shiyi and Wang, Kaile and Chen, Boyuan and Hong, Donghai and Zhou, Jiayi and Dai, Juntao and Yang, Yaodong and Ji, Jiaming},
  journal={Under Review at the International Conference on Machine Learning (ICML)},
  year={2026},
}
2025
Under Review
AI Deception: Risks, Dynamics, and Controls
Boyuan Chen*, Sitong Fang*, Jiaming Ji*, Yanxu Zhu, Pengcheng Wen, Jinzhou Wu, Yingshui Tan, Boren Zheng, Mengying Yuan, Wenqi Chen, Donghai Hong, Alex Qiu, Xin Chen, Jiayi Zhou, Kaile Wang, Juntao Dai, Borong Zhang, Tianzhuo Yang, Saad Siddiqui, Isabella Duan, Yawen Duan, Brian Tse, Jen-Tse Huang, Kun Wang, Baihui Zheng, Jiaheng Liu, Jian Yang, Yiming Li, Wenting Chen, Dongrui Liu, Lukas Vierling, Zhiheng Xi, Haobo Fu, Wenxuan Wang, Jitao Sang, Zhengyan Shi, Chi-Min Chan, Eugenie Shi, Simin Li, Juncheng Li, Jian Yang, Wei Ji, Dong Li, Jinglin Yang, Jun Song, Yinpeng Dong, Jie Fu, Bo Zheng, Min Yang, Yike Guo, Philip Torr, Robert Trager, Yi Zeng, Zhongyuan Wang, Yaodong Yang, Tiejun Huang, Ya-Qin Zhang, HongJiang Zhang, and Andrew Yao
The first systematic international report on AI deception. We formally define AI deception using signaling theory, analyze the deception life cycle, and propose mitigation strategies.
@article{chen2025deception,
  title={AI Deception: Risks, Dynamics, and Controls},
  author={Chen, Boyuan and Fang, Sitong and Ji, Jiaming and Zhu, Yanxu and Wen, Pengcheng and Wu, Jinzhou and Tan, Yingshui and Zheng, Boren and Yuan, Mengying and Chen, Wenqi and Hong, Donghai and Qiu, Alex and Chen, Xin and Zhou, Jiayi and Wang, Kaile and Dai, Juntao and Zhang, Borong and Yang, Tianzhuo and Siddiqui, Saad and Duan, Isabella and Duan, Yawen and Tse, Brian and Huang, Jen-Tse and Wang, Kun and Zheng, Baihui and Liu, Jiaheng and Yang, Jian and Li, Yiming and Chen, Wenting and Liu, Dongrui and Vierling, Lukas and Xi, Zhiheng and Fu, Haobo and Wang, Wenxuan and Sang, Jitao and Shi, Zhengyan and Chan, Chi-Min and Shi, Eugenie and Li, Simin and Li, Juncheng and Yang, Jian and Ji, Wei and Li, Dong and Yang, Jinglin and Song, Jun and Dong, Yinpeng and Fu, Jie and Zheng, Bo and Yang, Min and Guo, Yike and Torr, Philip and Trager, Robert and Zeng, Yi and Wang, Zhongyuan and Yang, Yaodong and Huang, Tiejun and Zhang, Ya-Qin and Zhang, HongJiang and Yao, Andrew},
  journal={Under Review at ACM Computing Surveys},
  year={2025},
}
Preprint
Mitigating Deceptive Alignment via Self-Monitoring
Jiaming Ji*, Wenqi Chen*, Kaile Wang, Donghai Hong, Sitong Fang*, Boyuan Chen*, Jiayi Zhou, Juntao Dai, Sirui Han, Yike Guo, and Yaodong Yang
We propose CoT Monitor+, a framework that embeds a Self-Monitor inside chain-of-thought reasoning to detect and suppress deceptive alignment. It reduces deceptive behaviors by 43.8% while preserving task accuracy.
@article{ji2025selfmonitoring,
  title={Mitigating Deceptive Alignment via Self-Monitoring},
  author={Ji, Jiaming and Chen, Wenqi and Wang, Kaile and Hong, Donghai and Fang, Sitong and Chen, Boyuan and Zhou, Jiayi and Dai, Juntao and Han, Sirui and Guo, Yike and Yang, Yaodong},
  journal={arXiv preprint},
  year={2025},
}