We have all been there. You build an agent. It works perfectly in the demo. You deploy it. And then, on a Tuesday at 3 PM, it decides that the URL for the API documentation is api.stripe.com/v1/users (a 404), but it looks so plausible that you waste 20 minutes debugging network errors.

Worse, it says this with 100% confidence.

When we try to fix this today, the industry tells us to use “LLM-as-a-Judge.” We are told to ask GPT-4o to grade GPT-3.5. We are told to fix the “vibes.”

But this creates a dangerous circular dependency. If the underlying models suffer from sycophancy (agreeing with the user) or hallucination, a Judge model often hallucinates a passing grade.

We are trying to fix probability with more probability. That is a losing game.

I believe we need to stop treating Agents like magic boxes and start treating them like software. Software has assertions. Software has unit tests. Software has `return False`.

We need to re-introduce Determinism into the stack.

Don’t ask an LLM if a URL is valid. It will hallucinate a 200 OK. Run `requests.get()`.

Don’t ask an LLM if a SQL query is safe. It will miss subtle injections. Parse the AST.

Don’t ask an LLM if “Springfield” is ambiguous. It will guess Illinois. Check the database count.

If the code says “No,” it doesn’t matter how confident the LLM is. The action is blocked.

I got tired of debugging these errors by reading logs after the fact. I wanted a firewall that would catch these “Confident Idiot” moments in real-time.

So I built Steer.

It isn’t a heavy observability platform. It’s a simple Python library that wraps your agent functions and enforces hard guardrails.

```python
# The “Steer” way: Hard Rules.
@capture(verifiers=[
    # 1. Enforce SSN Format
    RegexVerifier(pattern=r"^\d{3}-\d{2}-\d{4}$"),
    # 2. Block Markdown
    JsonVerifier(strict=True)
])
def update_user_profile(data):
    # If the LLM messes up the format, this code never runs.
    # The error is caught, logged, and sent to a dashboard for correction.
    db.update(data)
```

The most interes...
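To make the "wrap the function, run hard checks first" idea concrete, here is a minimal standard-library sketch of that pattern. It is not Steer's implementation or API: the names `guarded`, `regex_check`, `strict_json_check`, `VerificationError`, and the in-memory `saved` list are all hypothetical stand-ins chosen for illustration.

```python
import functools
import json
import re


class VerificationError(Exception):
    """Raised when a deterministic check rejects the agent's output."""


def regex_check(pattern):
    """Build a check that passes only if the whole string matches `pattern`."""
    compiled = re.compile(pattern)

    def check(data):
        if not compiled.fullmatch(data):
            raise VerificationError(f"value does not match {pattern!r}")

    return check


def strict_json_check(data):
    """Pass only if the string parses as JSON; Markdown fences make it fail."""
    try:
        json.loads(data)
    except json.JSONDecodeError as exc:
        raise VerificationError(f"not valid JSON: {exc}") from exc


def guarded(checks):
    """Hypothetical decorator: run every check before the wrapped function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(data):
            for check in checks:
                check(data)  # raises here, so func never runs on bad input
            return func(data)
        return wrapper
    return decorator


saved = []  # stand-in for a real database


@guarded(checks=[strict_json_check])
def update_user_profile(data):
    saved.append(json.loads(data))


@guarded(checks=[regex_check(r"\d{3}-\d{2}-\d{4}")])
def store_ssn(data):
    saved.append(data)
```

Calling `update_user_profile` with a JSON payload wrapped in a Markdown fence raises `VerificationError` before anything is written, which is the "the code says No" behavior described above. A real system would presumably log the failure rather than only raising.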