What we’ll do is set a low-ish turn limit and see how much they manage to accomplish in that time. (Another alternative for more linear games is running them multiple times with a turn limit and seeing how often they get past a particular point within that turn limit.) Given how much freedom is offered to players of text adventures, this is a difficult test. It’s normal even for a skilled human player to immerse themselves in their surroundings rather than make constant progress. I wouldn’t be surprised if I got a score of zero if someone plopped me down in front of this test. But still, maybe it’s the best we can do with limited resources. (Another idea is to give them a far-off goal and then somehow have them request hints when they are stuck, and count how many hints they need to get there. However, given how little they used the hints they were given in the previous article, I doubt this would work very well either.)

What we’ll do is define a set of achievements for a game. These achievements will be clustered around the first few turns of the game, because we’ll only give the llm a few turns to earn them. Here’s an example for 9:05.

    TURN_LIMIT      40
    ANSWER_PHONE    Click.
    EXIT_BED        You get out of bed.
    OPEN_DRESSER    revealing some clean
    ENTER_BATHROOM  far from luxurious
    REMOVE_SOILED   You take off the soiled
    REMOVE_WATCH    You take off the watch
    ENTER_SHOWER    dawdle
    WEAR_CLEAN      You put on the clean
    OPEN_FRONT      You open the front
    UNLOCK_CAR      Unlocked.
    ENTER_CAR       Las Mesas
    OPEN_WALLET     open the wallet
    CARD_SLOT       green LED lights

It should be fairly clear how this works: the TURN_LIMIT specifies how many turns the llm has to collect achievements. Every line other than that specifies an achievement: the name is on the left, and it counts as earned when the game prints the text on the right.

The llm knows nothing of these achievements. It tries to get through the game, and in the background we use the achievements to count how far it gets.
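To make the mechanism concrete, here is a minimal sketch of how such an achievement file could be parsed and scored. It is not the code used for the article: the function names `parse_achievements` and `score_transcript` are hypothetical, and it assumes one entry per line, a case-sensitive substring match against the game output, and one game output per turn.

```python
def parse_achievements(spec: str) -> tuple[int, dict[str, str]]:
    """Split an achievement file into its turn limit and name -> trigger text."""
    turn_limit = 0
    achievements: dict[str, str] = {}
    for line in spec.splitlines():
        line = line.strip()
        if not line:
            continue
        # The name is everything up to the first space; the rest is the trigger.
        name, _, text = line.partition(" ")
        if name == "TURN_LIMIT":
            turn_limit = int(text)
        else:
            achievements[name] = text.strip()
    return turn_limit, achievements


def score_transcript(spec: str, outputs: list[str]) -> list[str]:
    """Return the achievements earned within the turn limit, in earning order.

    `outputs` holds the game's printed text, one entry per turn (an assumption
    of this sketch). Output produced after the turn limit is ignored.
    """
    turn_limit, achievements = parse_achievements(spec)
    earned: list[str] = []
    for output in outputs[:turn_limit]:
        for name, trigger in achievements.items():
            # Assumed matching rule: plain substring, each achievement once.
            if name not in earned and trigger in output:
                earned.append(name)
    return earned
```

The score for a run is then simply `len(score_transcript(spec, outputs))`. Because the llm never sees the file, the triggers only need to be distinctive enough that the game does not print them by accident.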
It might seem like the turn limit must be calibrated such that ...