A few days ago, someone called PixelMelt published a way for Amazon's customers to download their purchased books without DRM. Well… sort of.

In their post "How I Reversed Amazon's Kindle Web Obfuscation Because Their App Sucked" they describe the process of spoofing a web browser, downloading a bunch of JSON files, reconstructing the obfuscated SVGs used to draw individual letters, and running OCR on them to extract text.

There were a few problems with this approach.

Firstly, the downloader was hard-coded to only work with the .com site. That fix was simple - do a search and replace on amazon.com with amazon.co.uk. Easy!

But the harder problem was with the OCR.

The code was designed to visually centre each extracted glyph. That gives a nice amount of whitespace around the character, which makes it easier for OCR to run. The only problem is that some characters are ambiguous when centred. When I ran the code, lots of full-stops became midpoints, commas became apostrophes, and various other characters went a bit wonky.

That made the output rather hard to read. This was compounded by the way line breaks were treated. Modern eBooks are designed to be reflowable - no matter the size of your screen, lines should only break on a new paragraph. The extracted text had forced line breaks at the end of every displayed line, rather than at the end of each paragraph.

So I decided to fix it.

I decided that OCRing an entire page would yield better results than OCRing single characters. I was (mostly) right.

After de-obfuscation and reconstruction, a typical page has good typesetting for the body text, but the title comes out skew-whiff. Bold and italics are preserved. There are no links or images.

Here's how I did it.

As in the original code, I took the SVG path of each character and rendered it as a monochrome PNG. Rather than centring the glyph, I used the height and width provided in the glyphs.json file.
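
To see why centring loses information, here's a minimal sketch. It is not the code from either project: the `GlyphBox` class, the two helper functions, and all the pixel values are hypothetical, and it assumes each glyph's ink can be described by a bounding box inside a fixed-size cell. It only compares where the ink would land under each placement strategy.

```python
from dataclasses import dataclass


@dataclass
class GlyphBox:
    """Bounding box of a glyph's ink inside its cell, in pixels.

    Hypothetical structure for illustration; y grows downward.
    """
    left: int
    top: int
    right: int
    bottom: int


def centred_offset(box: GlyphBox, cell_w: int, cell_h: int) -> tuple[int, int]:
    """Top-left corner of the ink when the glyph is centred in the cell."""
    ink_w = box.right - box.left
    ink_h = box.bottom - box.top
    return ((cell_w - ink_w) // 2, (cell_h - ink_h) // 2)


def preserved_offset(box: GlyphBox) -> tuple[int, int]:
    """Top-left corner of the ink when the glyph keeps its original position."""
    return (box.left, box.top)


# Made-up numbers: a 20x40 cell, with two dots of identical size.
CELL_W, CELL_H = 20, 40
full_stop = GlyphBox(left=8, top=32, right=12, bottom=36)  # near the bottom of the cell
midpoint = GlyphBox(left=8, top=18, right=12, bottom=22)   # halfway up the cell

# Centred, both dots land in exactly the same place.
print(centred_offset(full_stop, CELL_W, CELL_H))  # (8, 18)
print(centred_offset(midpoint, CELL_W, CELL_H))   # (8, 18)

# With the original position kept, they stay distinguishable.
print(preserved_offset(full_stop))  # (8, 32)
print(preserved_offset(midpoint))   # (8, 18)
```

Once both dots are centred, nothing in the image tells the OCR engine which one it is looking at. The same reasoning applies to commas and apostrophes.
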
That gave me a directory full of individual letters, numbers,...