Building an agentic image generator that improves itself

https://news.ycombinator.com/rss

Summary

The goal was to build a system that automatically improves the quality of images generated by the OpenAI API. To do this, we needed a robust evaluator to detect imperfections, such as distorted text or weak visual appeal, and an iterative feedback loop to refine the image with each pass.

An evaluator is an AI system that assesses the quality or characteristics of generated content. In the context of LLMs, evaluators examine outputs against specific criteria, providing feedback that can be used to improve generation quality. This creates a closed-loop system where one model critiques another model's work.

Defining an Initial Prompt

We began by defining an initial prompt to generate our ad. We settled on a prompt that combines several distinct components, each challenging for an image generation model to render:

An ad for Redbull's summer campaign. It should include multiple flavors of RedBull, with lots of colors surrounding it. The image should be on a rooftop in SF, with lots of people socializing like a party. Include a discount code in plain text, on the bottom right.

We found that gpt-image-1 struggled to generate high-quality images from this prompt. While the overall concepts were present, the result felt like a blurry abstraction. The distinct visual elements seemed to overwhelm the model's ability to render each one in detail.

Approach 1: LLM-as-a-Judge for Text Improvement

Text Blurriness Detection

LLM-as-a-Judge was the first evaluation method we picked for detecting blurry and distorted text. LLM-as-a-Judge is an evaluation approach where a large language model is prompted to assess the quality of generated content. In this context, we use it to identify issues in AI-generated images, particularly focusing on text clarity and visual coherence.

We started by prompting o3 to identify discrepancies in the initially generated image. We took the output image from the prompt above and asked o3 to identify all of the issues with the image related to text blurriness or di...
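The evaluate-and-refine loop described in this summary can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `Evaluation`, `refine`, `fake_generate`, and `fake_judge` are hypothetical names, and the two stub functions stand in for the real image-generation call and the o3 judge so the sketch runs without any API access. The way issues are folded back into the prompt is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Evaluation:
    """Judge output: a list of problems found in one image (hypothetical shape)."""
    issues: List[str]

    @property
    def passed(self) -> bool:
        return not self.issues


def refine(
    prompt: str,
    generate: Callable[[str], bytes],
    evaluate: Callable[[bytes], Evaluation],
    max_rounds: int = 3,
) -> Tuple[bytes, List[Evaluation]]:
    """Generate, judge, and regenerate until the judge finds no issues
    or max_rounds (assumed >= 1) is reached."""
    history: List[Evaluation] = []
    current_prompt = prompt
    image = b""
    for _ in range(max_rounds):
        image = generate(current_prompt)
        result = evaluate(image)
        history.append(result)
        if result.passed:
            break
        # Assumption: feedback is appended to the original prompt as plain text.
        current_prompt = prompt + "\nFix these issues: " + "; ".join(result.issues)
    return image, history


# Stubs standing in for the image model and the LLM judge (illustration only).
def fake_generate(prompt: str) -> bytes:
    return prompt.encode("utf-8")


def fake_judge(image: bytes) -> Evaluation:
    if b"Fix these issues" in image:
        return Evaluation(issues=[])
    return Evaluation(issues=["discount code text is blurry"])


if __name__ == "__main__":
    final_image, rounds = refine("An ad with a discount code", fake_generate, fake_judge)
    print(len(rounds), rounds[-1].passed)
```

Passing the generator and judge in as callables keeps the loop independent of any particular API client, so the real model calls can replace the stubs without changing `refine`.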

First seen: 2025-05-21 13:20

Last seen: 2025-05-21 16:21

