Show HN: Defuddle, an HTML-to-Markdown alternative to Readability

Source: https://news.ycombinator.com/rss
Summary

de·fud·dle /diˈfʌdl/, transitive verb: to remove unnecessary elements from a web page and make it easily readable.

Beware! Defuddle is very much a work in progress!

Defuddle extracts the main content from web pages. It cleans up a page by removing clutter such as comments, sidebars, headers, footers, and other non-essential elements, leaving only the primary content.

Try the Defuddle Playground.

Features

Defuddle aims to output clean and consistent HTML documents. It was written for Obsidian Web Clipper with the goal of creating more useful input for HTML-to-Markdown converters such as Turndown.

Defuddle can be used as a replacement for Mozilla Readability, with a few differences:

- It is more forgiving and removes fewer uncertain elements.
- It provides consistent output for footnotes, math, code blocks, etc.
- It uses a page's mobile styles to guess at unnecessary elements.
- It extracts more metadata from the page, including schema.org data.

Installation

    npm install defuddle

For Node.js usage, you'll also need to install JSDOM:

    npm install jsdom

Usage

Browser

    import { Defuddle } from 'defuddle';

    // Parse the current document
    const defuddle = new Defuddle(document);
    const result = defuddle.parse();

    // Access the content and metadata
    console.log(result.content);
    console.log(result.title);
    console.log(result.author);

Node.js

    import { JSDOM } from 'jsdom';
    import { Defuddle } from 'defuddle/node';

    // Top-level await requires an ES module (e.g. an .mjs file)

    // Parse HTML from a string
    const html = '<html><body><article>...</article></body></html>';
    const fromString = await Defuddle(html);

    // Parse HTML from a URL
    const dom = await JSDOM.fromURL('https://example.com/article');
    const fromDom = await Defuddle(dom);

    // With options
    const result = await Defuddle(dom, {
      debug: true, // Enable debug mode for verbose logging
      markdown: true, // Convert content to markdown
      url: 'https://example.com/article' // Original URL of the page
    });

    // Access the content and metadata
    console.log(result.content);
    cons...
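
The feature list says Defuddle removes clutter such as sidebars, headers, and footers, and uses a page's mobile styles to guess at unnecessary elements. The block below is a toy sketch of that general idea in plain JavaScript, not Defuddle's actual code. It assumes hypothetical plain-object stand-ins for DOM nodes ({ tag, classes, text, children }) and CSS rules ({ maxWidth, minWidth, selector, display }), an assumed 600px "mobile" width, and only single-class selectors; a real implementation would work on the live DOM and the page's CSSOM instead.

```javascript
// Toy sketch only, NOT Defuddle's implementation. The node and rule shapes
// below are hypothetical stand-ins for DOM elements and CSSOM rules.

// Assumed viewport width (px) that counts as "mobile".
const MOBILE_WIDTH = 600;

// Tags treated as clutter regardless of styling (assumption).
const CLUTTER_TAGS = new Set(["nav", "aside", "header", "footer"]);

// Collect class names that a max-width media query hides at `width`.
// Rule shape: { maxWidth?, minWidth?, selector: ".name", display }.
// Rules without maxWidth (e.g. desktop-only min-width rules) are ignored.
function mobileHiddenClasses(rules, width = MOBILE_WIDTH) {
  const hidden = new Set();
  for (const rule of rules) {
    const appliesOnMobile = rule.maxWidth !== undefined && width <= rule.maxWidth;
    const isClassSelector = /^\.[\w-]+$/.test(rule.selector);
    if (appliesOnMobile && rule.display === "none" && isClassSelector) {
      hidden.add(rule.selector.slice(1));
    }
  }
  return hidden;
}

// Return a pruned copy of the tree, or null if the node itself is clutter.
// Node shape: { tag, classes?, text?, children? }.
function prune(node, hiddenClasses) {
  if (CLUTTER_TAGS.has(node.tag)) return null;
  if ((node.classes ?? []).some((c) => hiddenClasses.has(c))) return null;
  const children = (node.children ?? [])
    .map((child) => prune(child, hiddenClasses))
    .filter((child) => child !== null);
  return { ...node, children };
}

// Join the text of a pruned tree (every node has a children array), depth first.
function textOf(node) {
  return [node.text ?? "", ...node.children.map(textOf)].filter(Boolean).join(" ");
}

// Usage with a made-up stylesheet and page.
const rules = [
  { maxWidth: 768, selector: ".related-posts", display: "none" },
  { maxWidth: 768, selector: ".article", display: "block" },
  { minWidth: 1024, selector: ".promo", display: "none" }, // desktop-only, ignored
];

const page = {
  tag: "body",
  children: [
    { tag: "header", text: "Site name" },
    {
      tag: "article",
      classes: ["article"],
      text: "Main text.",
      children: [{ tag: "p", text: "More main text." }],
    },
    { tag: "div", classes: ["related-posts"], text: "You may also like" },
    { tag: "footer", text: "Copyright" },
  ],
};

const hidden = mobileHiddenClasses(rules);
const cleaned = prune(page, hidden);
console.log(textOf(cleaned)); // Main text. More main text.
```

The appeal of the mobile-styles signal is that the site itself has already decided what to drop when space is tight, which often includes sidebars, related-post widgets, and promos. In this sketch it is trusted outright; a practical extractor would likely combine it with other signals before removing anything.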

First seen: 2025-05-22 22:27

Last seen: 2025-05-23 19:31