Domain Adaptation of Base Models + ShadowdarkQA Bench


Summary

There’s a fast way and a slow way to go about developing an autonomous LLM Game Master. The fast way - the agentic way - would probably be to build some MCP servers to act as a harness, providing access to maps and rulesets, then plug the results into a frontier model and see what happens. I was intending to go that way, developing some environments I could do GRPO on immediately. The thought was to provide an MCP server that allowed rule search, and then come up with some combat scenarios that had verifiable outcomes. Those would work both as evals for frontier models and as external rewards for any model being trained, and would certainly get to the final desired result a lot faster.

However, that would only make sense if my goal was primarily or exclusively to get to the end product: an LLM that can run TTRPGs. It’s not. The goal is to get a better understanding of the front-to-back development of model capabilities and get as much painful hands-on experience as possible with every part of the stack. Besides, some research suggests RL may just be focusing skills and knowledge a pretrained model already has. At any rate, it makes sense to bake in priors specific to TTRPGs - their rulesets are particular, and a baseline understanding of their structure would make any tool-assisted lookups easier to do, and the results of those lookups easier for the model to interpret.

So, we’ll start with base models and work our way up from scratch. Before we have an LLM GM, we’ll create a model that can act as an assistant to a GM or player playing a TTRPG, and see how such a system can make a transition to a more agentic GM.

Getting Started

The additional constraint is around compute. This project is GPU poor. Due to those constraints, we’ll only choose models we can reasonably train without blowing our budget. We’ll be seeking to make this model as small as possible while still being useful.
I’m an admirer of Alexander Doria’s work, which inspired me to ...

First seen: 2025-05-29 15:06

Last seen: 2025-05-29 18:07

