Large generative models are becoming increasingly capable and more widely deployed to power production applications, but getting these models to produce exactly what's desired can still be challenging. Fine-grained control over these models' outputs is important to meet user expectations and to mitigate potential misuse, ensuring the models' reliability and safety. To address these issues, Apple machine learning researchers have developed a new technique that is modality-agnostic and provides fine-grained control over a model's behavior with negligible computational overhead, while minimally impacting the model's abilities. Activation Transport (AcT) is a general framework, guided by optimal transport theory, for steering model activations, and it generalizes many previous activation-steering methods. The work will be presented as a Spotlight at ICLR 2025, and code is available here.

To help generative models produce output that aligns with their users' expectations, researchers often rely on reinforcement learning with human feedback (RLHF) or instruction fine-tuning, but these approaches are resource-intensive and become increasingly impractical as models grow in complexity. In addition, changing a model's parameters can have unintended consequences, affecting its overall performance on other tasks.

To control the output of these generative models, users often try crafting precise prompts. While this is more accessible, it offers limited control: even with carefully constructed prompts, a model's output can be unpredictable and lack the nuance a user might need. For example, it's common for models to fail when prompted with instructions not to include something (see Figure 1):

Figure 1: Text-to-image models such as SDXL and FLUX.1.dev tend to generate a pink elephant even when instructed not to. Left: "An astronaut in a space station. Do not show a pink elephant". Right: "A dog running on the beach splashing water. Do not show a pink elephant".

In many applic...