Lessons from building an AI data analyst

Summary

TL;DR

- Text-to-SQL is not enough. Answering real user questions requires going further: multi-step plans, external tools (coding), and external context.
- Context is the product. A semantic layer (we use Malloy) encodes business meaning and sharply reduces SQL complexity.
- Use a multi-agent, research-oriented system. Break problems down using context and domain knowledge, retrieve precisely, write code, interact with the environment, and learn from it.
- Retrieval is a recommendation problem. Mix keyword search, embeddings, and a fine-tuned reranker; optimise for precision, recall, and latency.
- Benchmarks ≠ production. Users expect human-level answers, drill-downs, and defensible reasoning, not just pass@k.
- Latency and quality are a tight bar. Route between fast and reasoning models; cache aggressively; keep contexts short. Continuous model evaluation is needed to avoid drift as new models are launched.

The short story

I spent years on ML for Analytics and Knowledge Discovery at Google and Twitter. For the past 3 years I've been building an AI data analyst at Findly (findly.ai). We entered Y Combinator with a different idea, but quickly realised the real problem for most teams wasn't "lack of data"; it was data discovery and use.

We started the company as Conversion Pattern, tackling post-iOS 14 attribution and the privacy-driven collapse of cookie-based measurement. What we kept seeing: our customers already had most of the data they needed. They either didn't know it existed or couldn't stitch it together to answer business questions. The job wasn't to generate new data; it was to unlock the value of existing data.

We started with a toy problem, text-to-SQL, and then let users pull us forward. The product evolved into a generative BI platform: it generates SQL, draws charts, writes Python for complex calculations, grounds itself in enterprise context, and pulls in external sources (web, PDFs) when the data story demands it.

Why text-to-SQL isn't enough

Real questions ...
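To make the TL;DR's retrieval point concrete, here is a minimal sketch of mixing a keyword signal with an embedding signal and then reranking. It is an illustration under stated assumptions, not Findly's pipeline: the character-trigram `embed` function stands in for a real embedding model, reciprocal rank fusion is one common way to combine rankings (the post does not say which method is used), and `rerank` takes any scoring function where a fine-tuned reranker would go. All function names are hypothetical.

```python
import math
from collections import Counter


def keyword_score(query, doc):
    """Fraction of query tokens that also appear in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def embed(text):
    """Toy stand-in for an embedding model: character trigram counts."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def ranks(scores):
    """Map document index -> 1-based rank, highest score first."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return {i: r for r, i in enumerate(order, start=1)}


def hybrid_retrieve(query, docs, top_k=2, rrf_k=60):
    """Fuse keyword and embedding rankings with reciprocal rank fusion."""
    kw = ranks([keyword_score(query, d) for d in docs])
    q_vec = embed(query)
    em = ranks([cosine(q_vec, embed(d)) for d in docs])
    fused = {i: 1 / (rrf_k + kw[i]) + 1 / (rrf_k + em[i])
             for i in range(len(docs))}
    best = sorted(fused, key=fused.get, reverse=True)[:top_k]
    return [docs[i] for i in best]


def rerank(query, candidates, score_fn):
    """Reorder candidates with a (hypothetical) learned relevance scorer."""
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)


if __name__ == "__main__":
    docs = [
        "monthly revenue by region",
        "customer churn rate by plan",
        "office seating chart",
    ]
    candidates = hybrid_retrieve("revenue by region", docs)
    # keyword_score is used here only as a placeholder for a real reranker.
    print(rerank("revenue by region", candidates, keyword_score))
```

The two-stage shape is where the precision, recall, and latency trade-off shows up: the fusion step is cheap and favours recall, while the reranker only scores the `top_k` survivors, so a slower and more precise model can be used there.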

First seen: 2025-09-01 18:48

Last seen: 2025-09-03 23:58