QVQ-Max: Think with Evidence

Introduction

Last December, we launched QVQ-72B-Preview as an exploratory model, but it had many issues. Today, we are officially releasing the first version of QVQ-Max, our visual reasoning model. This model can not only "understand" the content in images and videos but also analyze and reason over that information to provide solutions. From math problems to everyday questions, from programming code to artistic creation, QVQ-Max has demonstrated impressive capabilities. Though this is just our first version, its potential is already eye-catching.

MathVision is a benchmark that aggregates challenging multimodal mathematical problems, and we use performance on it to gauge a model's ability to solve complex math problems. As shown in the figure, increasing the maximum length of the model's thinking process yields a continuous improvement in accuracy on MathVision, demonstrating the immense potential of the model.

In the following sections, we discuss the design philosophy behind QVQ-Max, its actual capabilities, and what it can do for you.

Why Do We Need Visual Reasoning?

Traditional AI models mostly rely on text input: answering questions, writing articles, or generating code. In real life, however, much information is expressed not in words but in images, charts, or even videos. A single image can contain rich details such as colors, shapes, and spatial relationships. These elements are often more intuitive, but also more complex, than text.

For example, to determine whether an architectural blueprint is reasonable, a textual description alone might not be enough. But if you could see the blueprint and analyze it using professional knowledge, the task becomes much easier. This is the significance of visual reasoning: it allows AI to not just "see," but also "understand" and "think."

Our goal in designing QVQ-Max was simple: to create an...
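To make the visual-reasoning workflow described above concrete, here is a minimal sketch of what a call to such a model might look like through an OpenAI-compatible endpoint. The model identifier ("qvq-max"), the DashScope-compatible base URL, the DASHSCOPE_API_KEY environment variable, and the reasoning_content delta field are assumptions drawn from Qwen's published API conventions, not something this post specifies; check the current documentation before relying on them.

```python
# Hedged sketch: sending an image plus a question to a visual-reasoning
# model via an OpenAI-compatible endpoint. Model name, base URL, and the
# "reasoning_content" field are assumptions (see lead-in above).
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],  # assumed env var
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

stream = client.chat.completions.create(
    model="qvq-max",  # assumed model identifier
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/blueprint.png"}},
            {"type": "text",
             "text": "Is this architectural blueprint reasonable? Explain."},
        ],
    }],
    stream=True,  # reasoning models typically stream their output
)

thinking, answer = [], []
for chunk in stream:
    if not chunk.choices:
        continue  # e.g. a trailing usage-only chunk
    delta = chunk.choices[0].delta
    # The long thinking trace, if exposed, arrives in a separate field
    # from the final answer (an assumption based on DashScope's API).
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        thinking.append(reasoning)
    if delta.content:
        answer.append(delta.content)

print("thinking:", "".join(thinking))
print("answer:", "".join(answer))
```

Separating the streamed thinking trace from the final answer, as sketched here, is what would let a caller observe the "think with evidence" process the post describes rather than only its conclusion.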
