Finding: State-of-the-art Vision Language Models achieve 100% accuracy when counting in images of popular subjects (e.g. knowing that the Adidas logo has 3 stripes and a dog has 4 legs), but are only ~17% accurate when counting in counterfactual images (e.g. counting stripes in a 4-striped Adidas-like logo or counting legs in a 5-legged dog). VLMs don't actually "see" - they rely on memorized knowledge instead of visual analysis because of this bias.

Figure 1: VLM Failures Overview. VLMs fail on 6 counting tasks (a–e & g) and one low-level vision task (f) across seven domains. State-of-the-art models achieve perfect performance on original images but fail catastrophically when objects are subtly modified, defaulting to memorized knowledge rather than actual visual analysis.

The Problem: VLMs Can't Count When It Matters

Imagine asking GPT-4o to count the legs of an animal, and it gets it right every time. Impressive, right? Now imagine adding just one extra leg to that animal and asking again. Suddenly, it fails completely.

The Dog Experiment

- Original dog (4 legs): all models get it right.
- Same dog with 5 legs: all models still say "4".

They're not counting - they're just recalling "dogs have 4 legs" from their training data.

Figure 3: Subtle Modification Failures. VLMs fail to detect subtle changes in counterfactuals (CF) and default to biased answers. Despite clear visual modifications (extra legs, extra stripes), all models consistently output the expected "normal" values rather than counting what they actually see.

The Core Issue: VLMs suffer from severe confirmation bias. When they see familiar objects, they default to memorized knowledge instead of performing actual visual analysis. This isn't a minor glitch - it's a fundamental flaw in how these models process visual information.

How We Test VLM Bias: The VLMBias Framework

Our testing meth...
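To make the dog experiment above concrete, here is a minimal sketch of that kind of counterfactual counting probe: send an original image and a subtly modified one to a VLM with the same counting question and compare the answers. It assumes the OpenAI Python client; the model name, prompt wording, and image file names (dog_original.png, dog_five_legs.png) are illustrative choices, not the paper's actual VLMBias harness.

```python
# Illustrative counterfactual counting probe (assumptions: OpenAI client,
# gpt-4o, local PNG files; not the paper's VLMBias code).
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def count_legs(image_path: str) -> str:
    """Ask the VLM to count legs in one image and return its raw answer."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Count the legs of the animal in this image. Answer with a number only."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()

# Identical question on the original vs. the counterfactual (5-legged) image;
# a biased model answers "4" in both cases.
print("original (4 legs):      ", count_legs("dog_original.png"))
print("counterfactual (5 legs):", count_legs("dog_five_legs.png"))
```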