Hi HN, we’re the cofounders of Augento (https://augento.ai/). We’re building DeepSeek R1-like fine-tuning as a service. You connect your agent, tell us when it’s right or wrong, and we deliver an LLM optimized for that agent. There’s a demo video at https://www.youtube.com/watch?v=j5RQaTdRrKE, our docs are at https://docs.augento.ai/, and it’s open for anyone to use at https://augento.ai.

Agents fail all the time, especially when you try to use them for something actually useful. Current approaches suck: prompting has intrinsic limits, and supervised fine-tuning requires big explicit datasets that are hard to collect.

Two months ago, the DeepSeek R1 paper outlined a way to post-train LLMs with (almost) pure reinforcement learning. We took up their research and built a fine-tuning platform around it.

You let us intercept your agent's data flow, and we deliver a fine-tuned open-source model trained on the agent's specific task. Instead of providing big datasets of explicit fine-tuning samples, you provide a reward function that judges the model's outputs.

Here are some examples of what this can be used for:

Coding Agent: We fine-tuned a coding agent that was constantly making syntax errors and failing to handle semantic edge cases. With a reward function that evaluated its code against the compiler, the agent learned not to produce these errors. The fine-tuned model reduced critical bugs by 40% with just 20 training samples. (A sketch of what such a reward function could look like is at the end of this post.)

MCP Tool Specialization: Imagine you have a custom set of internal tools using the MCP protocol, but your agent keeps selecting the wrong tool or passing incompatible parameters. You could fine-tune with a reward function that scores tool selection and parameter matching (also sketched below).

Browser Agent Navigation: If you're building a browser agent that struggles with complex web UIs or specific sites, you could fine-tune it to better understand UI elements and navigation patterns, with a reward function that scores successful task completion (lik...
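
To make the reward-function idea concrete, here's a minimal sketch for the coding-agent case. The reward(prompt, completion) -> float shape is an assumption for illustration, not our actual hook, and the "compiler" here is just Python's own parser:

    # Minimal sketch of a compiler-style reward, assuming the platform
    # calls reward(prompt, completion) -> float for every sampled output.
    def reward(prompt: str, completion: str) -> float:
        try:
            compile(completion, "<completion>", "exec")  # syntax check only
            return 1.0  # parses cleanly
        except SyntaxError:
            return 0.0  # the failure mode we're training away

A real version might shell out to gcc, run a linter, or execute the project's test suite and hand back partial credit instead of a binary score.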
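
And a sketch for the MCP case, assuming the agent emits tool calls as JSON and each training sample carries a label for which tool it should have used. TOOL_SCHEMAS and expected_tool are placeholders you'd derive from your MCP server's tool definitions and your own labels, not anything the platform provides:

    # Sketch of a tool-selection reward: half credit for picking the
    # right tool, half for passing all required parameters.
    import json

    TOOL_SCHEMAS = {
        "create_ticket": {"title", "priority"},  # required parameters
        "search_docs": {"query"},
    }

    def reward(prompt: str, completion: str, expected_tool: str) -> float:
        try:
            call = json.loads(completion)  # {"tool": ..., "arguments": {...}}
        except json.JSONDecodeError:
            return 0.0  # malformed tool call
        score = 0.0
        if call.get("tool") == expected_tool:
            score += 0.5  # picked the right tool
        required = TOOL_SCHEMAS.get(call.get("tool"), set())
        if required and required <= set(call.get("arguments", {})):
            score += 0.5  # all required parameters present
        return score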