Hi HN, we’re the cofounders of Augento (https://augento.ai/). We’re building DeepSeek R1-like fine-tuning as a service. You connect your agent, tell us when it’s right or wrong, and we deliver an LLM optimized for that agent. There’s a demo video at https://www.youtube.com/watch?v=j5RQaTdRrKE, our docs are at https://docs.augento.ai/, and it’s open for anyone to use at https://augento.ai.

Agents fail all the time, especially when you try to use them for something actually useful. Current approaches suck: prompting has intrinsic limits, and supervised fine-tuning requires big explicit datasets that are hard to collect.

Two months ago, the DeepSeek R1 paper outlined a way to post-train LLMs with (almost) pure reinforcement learning. We took up their research and built a fine-tuning platform around it.

You let us intercept your agent's data flow, and we deliver a fine-tuned open-source model trained on the agent's specific task. Instead of providing big datasets of explicit fine-tuning samples, you provide a reward function that judges the model's outputs.

Here are some examples of what this can be used for:

Coding Agent: We fine-tuned a coding agent that was constantly making syntax errors and failing to handle semantic edge cases. With a reward function that evaluated its code against the compiler, the agent learned not to produce these errors. The fine-tuned model reduced critical bugs by 40% with just 20 training samples. (A sketch of what such a reward function could look like is at the end of this post.)

MCP Tool Specialization: Imagine you have a custom set of internal tools using the MCP protocol, but your agent keeps selecting the wrong tool or passing incompatible parameters. You could fine-tune with a reward function that scores tool selection and parameter matching (also sketched below).

Browser Agent Navigation: If you're building a browser agent that struggles with complex web UIs or specific sites, you could fine-tune it to better understand UI elements and navigation patterns, with a reward function that scores successful task completion (lik...
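
To make the reward-function idea concrete, here's a minimal sketch for the coding-agent case. The reward(prompt, completion) -> float shape is an assumption for illustration, not our actual hook, and the "compiler" here is just Python's own parser:

    # Minimal sketch of a compiler-style reward, assuming the platform
    # calls reward(prompt, completion) -> float for every sampled output.
    def reward(prompt: str, completion: str) -> float:
        try:
            compile(completion, "<completion>", "exec")  # syntax check only
            return 1.0  # parses cleanly
        except SyntaxError:
            return 0.0  # the failure mode we're training away

A real version might shell out to gcc, run a linter, or execute the project's test suite and hand back partial credit instead of a binary score.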
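
And a sketch for the MCP case, assuming the agent emits tool calls as JSON and each training sample carries a label for which tool it should have used. TOOL_SCHEMAS and expected_tool are placeholders you'd derive from your MCP server's tool definitions and your own labels, not anything the platform provides:

    # Sketch of a tool-selection reward: half credit for picking the
    # right tool, half for passing all required parameters.
    import json

    TOOL_SCHEMAS = {
        "create_ticket": {"title", "priority"},  # required parameters
        "search_docs": {"query"},
    }

    def reward(prompt: str, completion: str, expected_tool: str) -> float:
        try:
            call = json.loads(completion)  # {"tool": ..., "arguments": {...}}
        except json.JSONDecodeError:
            return 0.0  # malformed tool call
        score = 0.0
        if call.get("tool") == expected_tool:
            score += 0.5  # picked the right tool
        required = TOOL_SCHEMAS.get(call.get("tool"), set())
        if required and required <= set(call.get("arguments", {})):
            score += 0.5  # all required parameters present
        return score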