EgoNormia

Can large models make normative decisions in physical-social embodied situations?

Verified Split: A high-quality subset of 200 videos with full agreement on the correct answers among 5 independent annotators.
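
The agreement filter described above could be sketched as follows. This is only an illustration under stated assumptions: the record fields (`video_id`, `gold`, `annotator_labels`) and both function names are hypothetical and are not taken from the EgoNormia release.

```python
from typing import Dict, List


def has_full_agreement(labels: List[str], gold: str, n_annotators: int = 5) -> bool:
    """True only if exactly n_annotators labels exist and all match the gold answer."""
    return len(labels) == n_annotators and all(label == gold for label in labels)


def verified_split(items: List[Dict]) -> List[Dict]:
    """Keep the items on which every annotator chose the correct answer.

    Each item is assumed (hypothetically) to carry a 'gold' answer and a
    list of 'annotator_labels'.
    """
    return [
        item for item in items
        if has_full_agreement(item["annotator_labels"], item["gold"])
    ]


if __name__ == "__main__":
    # Toy records, invented for illustration only.
    items = [
        {"video_id": "v1", "gold": "A", "annotator_labels": ["A"] * 5},
        {"video_id": "v2", "gold": "B", "annotator_labels": ["B", "B", "C", "B", "B"]},
    ]
    print([item["video_id"] for item in verified_split(items)])
```

Requiring exactly five matching labels is a strict reading of "full agreement"; a looser pipeline might accept a majority vote, which would yield a larger but noisier subset.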

Input Modality Types:

Video: Models receive the video (frames sampled at 1 fps and concatenated into a single image) together with the questions.
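
A minimal sketch of the video preprocessing described above, using plain Python lists in place of real image arrays. The function names, the grayscale frame representation, and the left-to-right layout are assumptions made for illustration; the source says only that frames are taken at 1 fps and concatenated into a single image.

```python
from typing import List

# A frame is modeled as a list of pixel rows (grayscale values) to keep the
# example dependency-free; a real pipeline would use image arrays.
Frame = List[List[int]]


def sample_indices(num_frames: int, native_fps: float, target_fps: float = 1.0) -> List[int]:
    """Indices of the frames kept when downsampling to target_fps."""
    if num_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        return []
    step = native_fps / target_fps  # source frames per kept frame
    indices = []
    k = 0
    while int(k * step) < num_frames:
        indices.append(int(k * step))
        k += 1
    return indices


def concat_horizontal(frames: List[Frame]) -> Frame:
    """Join equally tall frames side by side into one image (assumed layout)."""
    if not frames:
        return []
    height = len(frames[0])
    if any(len(frame) != height for frame in frames):
        raise ValueError("all frames must have the same height")
    return [[px for frame in frames for px in frame[row]] for row in range(height)]


if __name__ == "__main__":
    # A 3-second clip at 30 fps keeps one frame per second.
    print(sample_indices(num_frames=90, native_fps=30.0))  # [0, 30, 60]
    left = [[1, 2], [3, 4]]
    right = [[5, 6], [7, 8]]
    print(concat_horizontal([left, right]))  # [[1, 2, 5, 6], [3, 4, 7, 8]]
```

A grid layout would serve equally well for longer clips; the horizontal strip is used here only because it is the simplest arrangement to show.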

Model	Modality	Both	Act	Jus	Sen	Date
🥇 Gemini 2.5 Pro (05-06-2025 Preview) Google	Video	67.8	74.4	68.9	56.7	2025-05-20
🥈 Gemini 2.5 Flash (04-17-2025 Preview) Google	Video	58.4	69.7	59.6	58.9	2025-05-20
🥉 o4-mini OpenAI	Video	58.3	66.7	66.7	64.6	2025-05-20
Gemini 2.0 Thinking Google	Video	50.0	70.6	50.0	56.1	2025-05-20
Gemini 1.5 Pro Google	Video	49.0	56.5	50.5	61.8	2025-05-20
Gemini 1.5 Flash Google	Video	48.0	53.0	50.5	56.8	2025-05-20
Qwen2.5 VL (72B) Alibaba	Video	47.0	57.5	48.0	68.2	2025-05-20
GPT-4.1 OpenAI	Video	46.4	50.0	50.0	57.7	2025-05-20
GPT-4o OpenAI	Video	45.5	53.0	50.0	62.7	2025-05-20
QWQ-32B Alibaba	Video	37.5	37.5	37.5	39.6	2025-05-20
Claude 3.7 Sonnet Anthropic	Video	33.3	40.0	41.7	40.8	2025-05-20
Claude 3.5 Sonnet Anthropic	Video	22.7	27.3	27.3	47.7	2025-05-20
InternVL 2.5 Shanghai AI Lab	Video	13.0	16.5	15.0	52.1	2025-05-20
Llama 3.2 Meta	Video	4.0	18.0	10.5	55.6	2025-05-20