Chrome's New Embedding Model: Smaller, Faster, Same Quality

Summary

TL;DR: Chrome's latest update ships a new text embedding model that is 57% smaller than its predecessor (35.14 MB vs. 81.91 MB) while maintaining virtually identical performance on semantic search tasks. The size reduction comes primarily from quantizing the embedding matrix from float32 to int8 precision, with no measurable degradation in embedding quality or search ranking.

Discovery and Extraction

During routine analysis of Chrome's binary components, I discovered a new version of the embedding model in the browser's optimization guide directory. This model is used for history clustering and semantic search.

Model directory: ~/AppData/Local/Google/Chrome SxS/User Data/optimization_guide_model_store/57/A3BFD4A403A877EC/

Technical Analysis Methodology

To analyze the models, I developed a multi-faceted testing approach:

1. Model Structure Analysis: Used TensorFlow's interpreter to extract the model architecture, tensor counts, shapes, and data types.
2. Binary Comparison: Analyzed compression ratios, binary patterns, and weight distributions.
3. Weight Quantization Assessment: Examined specific tensors to determine the quantization techniques used.
4. Output Precision Testing: Estimated the effective precision of the output embeddings by analyzing the minimum differences between adjacent values.
5. Semantic Search Evaluation: Compared similarity scores and result rankings across multiple queries using a test corpus.

Key Findings

1. Architecture Comparison

Both models share an identical architecture, with similar tensor counts (611 vs. 606) and identical input/output shapes ([1, 64] input and [1, 768] output). This suggests they were derived from the same base model, likely a transformer-based embedding architecture similar to BERT.

2. Quantization Details

The primary difference is in the embedding matrix, which stores the token representations:

Old model: arith.constant30: [32128, 512], <class 'numpy.float32'>, 62.75 MB
New model: tfl.pseudo_qconst57: [32128, 512], <class 'numpy.int8'>, 15.69 MB...
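The model-structure step can be sketched with TensorFlow's TFLite interpreter; the function name and model path here are hypothetical, a minimal sketch rather than the exact script used:

```python
import tensorflow as tf

def summarize_model(path):
    """Print tensor counts, shapes, and dtypes for a TFLite model."""
    interp = tf.lite.Interpreter(model_path=path)
    interp.allocate_tensors()
    details = interp.get_tensor_details()
    print(f"{path}: {len(details)} tensors")  # e.g. 611 (old) vs. 606 (new)
    for d in details:
        print(d["name"], d["shape"], d["dtype"])
    print("input:", interp.get_input_details()[0]["shape"])    # [1, 64]
    print("output:", interp.get_output_details()[0]["shape"])  # [1, 768]
```

Running this over both model files is what surfaces quantized constants such as tfl.pseudo_qconst57 alongside their shapes and dtypes.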
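The storage arithmetic above can be reproduced with a stand-in matrix of the same shape. This is a sketch of symmetric per-tensor int8 quantization; the actual model may use per-channel scales, and the random weights here are purely illustrative:

```python
import numpy as np

# Hypothetical stand-in for the [32128, 512] float32 embedding matrix.
rng = np.random.default_rng(0)
emb = rng.standard_normal((32128, 512)).astype(np.float32)

# Symmetric per-tensor int8 quantization (scale choice is an assumption).
scale = float(np.abs(emb).max()) / 127.0
q = np.clip(np.round(emb / scale), -127, 127).astype(np.int8)

print(f"float32: {emb.nbytes / 2**20:.2f} MB")  # 62.75 MB
print(f"int8:    {q.nbytes / 2**20:.2f} MB")    # 15.69 MB

# At inference time the int8 weights are dequantized back to float.
deq = q.astype(np.float32) * scale
```

Four bytes per weight shrinking to one accounts for the 62.75 MB to 15.69 MB drop in the embedding matrix, which dominates the overall model-size reduction.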
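The output-precision test in the methodology can be sketched as follows: if the output embedding values lie on a discrete grid, the smallest nonzero gap between sorted values reveals the grid step, and hence the effective bit depth. The helper name is hypothetical:

```python
import numpy as np

def effective_bits(embedding, eps=1e-9):
    """Estimate the effective precision (in bits) of an output vector."""
    vals = np.sort(np.unique(np.asarray(embedding, dtype=np.float64).ravel()))
    gaps = np.diff(vals)
    step = gaps[gaps > eps].min()   # smallest nonzero gap = grid step
    span = vals[-1] - vals[0]
    # Distinguishable levels across the span -> bits of effective precision.
    return float(np.log2(span / step + 1))
```

For example, an output snapped to a 256-level grid reports roughly 8 bits, while a genuinely float32 output reports a much higher figure.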
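The semantic-search evaluation reduces to ranking a test corpus by cosine similarity for each query and checking that both models produce the same ordering. A minimal sketch, with hypothetical names:

```python
import numpy as np

def rank_results(query_vec, corpus_vecs):
    """Return corpus indices ordered best-first, plus the similarity scores."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity per document
    return np.argsort(-sims), sims    # descending order of similarity

# Comparing the old and new models amounts to embedding the same test
# corpus and queries with both, then checking that rank_results returns
# the same ordering (and near-identical scores) for every query.
```

Identical orderings across queries is what "no measurable degradation in search ranking" means operationally here.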
