Chrome's New Embedding Model: Smaller, Faster, Same Quality

Summary

TL;DR: Chrome's latest update ships a new text embedding model that is 57% smaller than its predecessor (35.14 MB vs. 81.91 MB) while maintaining virtually identical performance on semantic search tasks. The size reduction comes primarily from quantizing the embedding matrix from float32 to int8 precision, with no measurable degradation in embedding quality or search ranking.

Discovery and Extraction

During routine analysis of Chrome's binary components, I discovered a new version of the embedding model in the browser's optimization guide directory. This model is used for history clustering and semantic search.

Model directory: ~/AppData/Local/Google/Chrome SxS/User Data/optimization_guide_model_store/57/A3BFD4A403A877EC/

Technical Analysis Methodology

To analyze the models, I developed a multi-faceted testing approach:

1. Model Structure Analysis: Used TensorFlow's interpreter to extract the model architecture, tensor counts, shapes, and data types.
2. Binary Comparison: Analyzed compression ratios, binary patterns, and weight distributions.
3. Weight Quantization Assessment: Examined specific tensors to determine the quantization techniques used.
4. Output Precision Testing: Estimated the effective precision of the output embeddings by analyzing the minimum differences between adjacent values.
5. Semantic Search Evaluation: Compared similarity scores and result rankings across multiple queries using a test corpus.

Key Findings

1. Architecture Comparison

Both models share an identical architecture, with similar tensor counts (611 vs. 606) and identical input/output shapes ([1, 64] input and [1, 768] output). This suggests they were derived from the same base model, likely a transformer-based embedding architecture similar to BERT.

2. Quantization Details

The primary difference is in the embedding matrix, which stores the token representations:

Old model: arith.constant30: [32128, 512], <class 'numpy.float32'>, 62.75 MB
New model: tfl.pseudo_qconst57: [32128, 512], <class 'numpy.int8'>, 15.69 MB...
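The model-structure step can be sketched with TensorFlow's TFLite interpreter; the function name and model path here are hypothetical, a minimal sketch rather than the exact script used:

```python
import tensorflow as tf

def summarize_model(path):
    """Print tensor counts, shapes, and dtypes for a TFLite model."""
    interp = tf.lite.Interpreter(model_path=path)
    interp.allocate_tensors()
    details = interp.get_tensor_details()
    print(f"{path}: {len(details)} tensors")  # e.g. 611 (old) vs. 606 (new)
    for d in details:
        print(d["name"], d["shape"], d["dtype"])
    print("input:", interp.get_input_details()[0]["shape"])    # [1, 64]
    print("output:", interp.get_output_details()[0]["shape"])  # [1, 768]
```

Running this over both model files is what surfaces quantized constants such as tfl.pseudo_qconst57 alongside their shapes and dtypes.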
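The storage arithmetic above can be reproduced with a stand-in matrix of the same shape. This is a sketch of symmetric per-tensor int8 quantization; the actual model may use per-channel scales, and the random weights here are purely illustrative:

```python
import numpy as np

# Hypothetical stand-in for the [32128, 512] float32 embedding matrix.
rng = np.random.default_rng(0)
emb = rng.standard_normal((32128, 512)).astype(np.float32)

# Symmetric per-tensor int8 quantization (scale choice is an assumption).
scale = float(np.abs(emb).max()) / 127.0
q = np.clip(np.round(emb / scale), -127, 127).astype(np.int8)

print(f"float32: {emb.nbytes / 2**20:.2f} MB")  # 62.75 MB
print(f"int8:    {q.nbytes / 2**20:.2f} MB")    # 15.69 MB

# At inference time the int8 weights are dequantized back to float.
deq = q.astype(np.float32) * scale
```

Four bytes per weight shrinking to one accounts for the 62.75 MB to 15.69 MB drop in the embedding matrix, which dominates the overall model-size reduction.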
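The output-precision test in the methodology can be sketched as follows: if the output embedding values lie on a discrete grid, the smallest nonzero gap between sorted values reveals the grid step, and hence the effective bit depth. The helper name is hypothetical:

```python
import numpy as np

def effective_bits(embedding, eps=1e-9):
    """Estimate the effective precision (in bits) of an output vector."""
    vals = np.sort(np.unique(np.asarray(embedding, dtype=np.float64).ravel()))
    gaps = np.diff(vals)
    step = gaps[gaps > eps].min()   # smallest nonzero gap = grid step
    span = vals[-1] - vals[0]
    # Distinguishable levels across the span -> bits of effective precision.
    return float(np.log2(span / step + 1))
```

For example, an output snapped to a 256-level grid reports roughly 8 bits, while a genuinely float32 output reports a much higher figure.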
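The semantic-search evaluation reduces to ranking a test corpus by cosine similarity for each query and checking that both models produce the same ordering. A minimal sketch, with hypothetical names:

```python
import numpy as np

def rank_results(query_vec, corpus_vecs):
    """Return corpus indices ordered best-first, plus the similarity scores."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity per document
    return np.argsort(-sims), sims    # descending order of similarity

# Comparing the old and new models amounts to embedding the same test
# corpus and queries with both, then checking that rank_results returns
# the same ordering (and near-identical scores) for every query.
```

Identical orderings across queries is what "no measurable degradation in search ranking" means operationally here.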
