Publications

Publications in reverse chronological order.

2026

  1. When Slower Isn’t Truer: Inverse Scaling Law of Truthfulness in Multimodal Reasoning
    Sitong Fang, Wenjing Cao, Jiahao Li, 5 more authors, Yaodong Yang, and Jiaming Ji
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2026
  2. Debate with Images: Detecting Deceptive Behaviors in Multimodal Large Language Models
    Sitong Fang, Shiyi Hou, Kaile Wang, 4 more authors, Yaodong Yang, and Jiaming Ji
    Under Review at the International Conference on Machine Learning (ICML), 2026

2025

  1. AI Deception: Risks, Dynamics, and Controls
    Boyuan Chen*, Sitong Fang*, Jiaming Ji*, 51 more authors, Yaodong Yang, Tiejun Huang, Ya-Qin Zhang, HongJiang Zhang, and Andrew Yao
    Under Review at ACM Computing Surveys, 2025
  2. Mitigating Deceptive Alignment via Self-Monitoring
    Jiaming Ji*, Wenqi Chen*, Kaile Wang, 6 more authors, Yike Guo, and Yaodong Yang
    arXiv preprint, 2025