Eval-Anything

Comprehensive Safety Evaluation for Any-to-Any Multimodal Models

A comprehensive evaluation framework for assessing the safety and capabilities of any-to-any multimodal large language models.

PKU-Alignment

Key Features

  1. Broad Modality Coverage: Evaluates across text, image, video, speech, and action (embodied) modalities.

  2. 50+ Integrated Benchmarks: Aggregates a wide range of open-source datasets alongside custom-developed ones.

  3. Structured Safety Taxonomy: Defines 5 core evaluation dimensions and 35 sub-dimensions for fine-grained safety measurement.

  4. Embodied Safety Evaluation: Uniquely addresses execution safety, trajectory safety, and hardware safety for robotic agent scenarios.
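
To make the two-level taxonomy in feature 3 concrete, here is a minimal sketch of how sub-dimension scores might roll up into dimension-level scores. It is an illustration only, not Eval-Anything's actual API: the `TAXONOMY` mapping, the `aggregate_scores` function, the placeholder dimension names, and the choice of a plain average are all assumptions, since this overview does not name the dimensions or say how scores are combined.

```python
from statistics import mean

# Hypothetical placeholder taxonomy. The real 5 dimensions and 35
# sub-dimensions are not listed in this overview, so generic labels
# stand in for them here.
TAXONOMY = {
    "dimension_a": ["sub_a1", "sub_a2"],
    "dimension_b": ["sub_b1"],
}


def aggregate_scores(sub_scores, taxonomy=TAXONOMY):
    """Average sub-dimension scores into one score per dimension.

    sub_scores maps a sub-dimension name to a numeric score.
    Sub-dimensions without a score are skipped; a dimension with no
    scored sub-dimensions maps to None.
    """
    result = {}
    for dimension, subs in taxonomy.items():
        scored = [sub_scores[s] for s in subs if s in sub_scores]
        result[dimension] = mean(scored) if scored else None
    return result


if __name__ == "__main__":
    # Only dimension_a has scored sub-dimensions in this example.
    print(aggregate_scores({"sub_a1": 0.5, "sub_a2": 1.0}))
```

Returning `None` for an unscored dimension, rather than 0, keeps "not evaluated" distinct from "evaluated and failed", which matters when a model supports only some of the modalities.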