Pico-Banana-400k

https://news.ycombinator.com/rss Hits: 14
Summary

🍌 Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

Pico-Banana-400K is a large-scale dataset of ~400K text–image–edit triplets designed to advance research in text-guided image editing. Each example contains:

- an original image (from Open Images),
- a human-like edit instruction, and
- the edited result generated and verified by the Nano-Banana model.

The dataset spans 35 edit operations across 8 semantic categories, covering diverse transformations that range from low-level color adjustments to high-level object, scene, and stylistic edits.

🧩 Key Features

| Feature | Description |
| Total Samples | ~257K single-turn text–image–edit triplets for SFT; ~56K single-turn text–image (positive)–image (negative)–edit examples for preference learning; ~72K multi-turn text–image–edit sequences for multi-turn applications |
| Source | Open Images |
| Edit Operations | 35 across 8 semantic categories |
| Categories | Pixel & Photometric, Object-Level, Scene Composition, Stylistic, Text & Symbol, Human-Centric, Scale & Perspective, Spatial/Layout |
| Image Resolution | 512–1024 px |
| Prompt Generator | Gemini-2.5-Flash |
| Editing Model | Nano-Banana |
| Self-Evaluation | Automated judging pipeline using Gemini-2.5-Pro for edit quality |

🏗️ Dataset Construction

Pico-Banana-400K is built using a two-stage multimodal generation pipeline:

1. Instruction Generation. Each Open Images sample is passed to Gemini-2.5-Flash, which writes concise, natural-language editing instructions grounded in visible content. We also provide short instructions summarized by Qwen-2.5-Instruct-7B. Example: { "instruction": "Change the red car to blue." }

2. Editing + Self-Evaluation. The Nano-Banana model performs the edit, then automatically evaluates the result using a structured quality prompt that measures:

   - Instruction Compliance (40%)
   - Editing Realism (25%)
   - Preservation Balance (20%)
   - Technical Quality (15%)

   Only edits scoring above a strict threshold (~0.7) are labeled as successful, forming the main dataset; the remaining ~56K are ret...

First seen: 2025-10-26 02:48

Last seen: 2025-10-26 16:05