Production tests: a guidebook for better systems and more sleep

https://news.ycombinator.com/rss Hits: 5
Summary

Your customers expect your site to be fully working whenever they need it. This means you need to aim for near-perfect uptime not just for the site itself, but for every feature customers may use. Modern software engineering uses quality control measures such as automated test suites and observability tools (tracing, metrics, and logs) to ensure availability. Often overlooked in this landscape are production tests (also known as synthetics), which can give you immediate notification of failures in production. Production tests can be set up with minimal fuss, usually within one sprint, and can provide a high return on investment. In this post, I will cover how best to set up production tests and how they can help with reliability, deployments, and observability.

While I have always liked production tests, I gained a real appreciation for them at Atlassian, where they are used extensively and are called “pollinators”. I have seen firsthand how they can give early warning of problems, which can then be fixed before they become incidents.

What are production tests?

A production test is any automated test that runs against the production environment. The test runs on a frequent schedule, typically every minute, so that an on-call engineer can respond quickly. The test might use a headless browser to emulate user actions, or it may call an API directly to emulate the actions of browser code or a backend service.

A production test should run in a reasonable time. I suggest 30 seconds or less, so that you can easily run the test once per minute. A test that takes longer than 30 seconds is probably too complex for a production test anyway.

How the test deals with failure is up to your team. It could integrate with your on-call paging system, send a Slack notification, or just log an error to your logging system.

How do production tests help?

Production tests help make your production environment more reliable by giving immediate warning of a regression.
This means you can pot...
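
To make the description of a production test concrete, here is a minimal Python sketch of a single test run. It assumes an external scheduler (cron, for example) invokes it once per minute. The names `Result`, `check_http_ok`, `run_production_test`, and `log_failure` are hypothetical and do not come from the post. The 30-second budget is the post's suggested limit, and the notifier shown only logs an error; a paging or Slack integration could be passed in its place.

```python
import logging
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("production_test")

# The post suggests a run should finish in 30 seconds or less.
TIME_BUDGET_SECONDS = 30.0


@dataclass
class Result:
    name: str
    passed: bool
    seconds: float
    detail: str = ""


def check_http_ok(url: str, timeout: float = 10.0) -> None:
    """Hypothetical API-level check: raise unless the URL answers HTTP 200."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        if response.status != 200:
            raise AssertionError(f"unexpected status {response.status}")


def run_production_test(
    name: str,
    check: Callable[[], None],
    notify: Callable[[Result], None],
    budget: float = TIME_BUDGET_SECONDS,
) -> Result:
    """Run one check, time it, and call notify only when it fails."""
    started = time.monotonic()
    try:
        check()
        passed, detail = True, ""
    except Exception as exc:  # any exception counts as a test failure
        passed, detail = False, f"{type(exc).__name__}: {exc}"
    seconds = time.monotonic() - started

    # A check that succeeds but exceeds the time budget is also a failure.
    if passed and seconds > budget:
        passed, detail = False, f"took {seconds:.1f}s, budget is {budget:.0f}s"

    result = Result(name, passed, seconds, detail)
    if not passed:
        notify(result)
    return result


def log_failure(result: Result) -> None:
    """Simplest failure handler from the post: log an error."""
    logger.error("production test %s failed: %s", result.name, result.detail)
```

A scheduled job could then call, for instance, `run_production_test("homepage", lambda: check_http_ok("https://example.com/"), log_failure)`, where the URL is a placeholder. Passing the check and the notifier as plain callables keeps the runner independent of whether the check uses an API call or a headless browser, and of how the team chooses to be alerted.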

First seen: 2025-05-20 16:11

Last seen: 2025-05-20 20:14