Scraperr – A Self Hosted Webscraper

https://news.ycombinator.com/rss Hits: 19

Summary

A powerful self-hosted web scraping solution

📋 Overview

Scraperr enables you to extract data from websites with precision using XPath selectors. This self-hosted application provides a clean interface to manage scraping jobs, view results, and export data.

📚 Check out the docs for a comprehensive quickstart guide and detailed information.

✨ Key Features

XPath-Based Extraction: Precisely target page elements
Queue Management: Submit and manage multiple scraping jobs
Domain Spidering: Option to scrape all pages within the same domain
Custom Headers: Add JSON headers to your scraping requests
Media Downloads: Automatically download images, videos, and other media
Results Visualization: View scraped data in a structured table format
Data Export: Export your results in various formats
Notification Channels: Send completion notifications through various channels

🚀 Getting Started

make up

⚖️ Legal and Ethical Guidelines

When using Scraperr, please remember to:

Respect robots.txt: Always check a website's robots.txt file to verify which pages permit scraping
Adhere to each website's terms regarding data extraction
Rate Limiting: Implement reasonable delays between requests to avoid overloading servers

Disclaimer: Scraperr is intended for use only on websites that explicitly permit scraping. The creator accepts no responsibility for misuse of this tool.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

👏 Contributions

Development made easier with the webapp template.
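To make the XPath-based extraction idea in the summary more concrete, here is a minimal sketch of mapping named XPath selectors to extracted text. It is not Scraperr's code: the sample page, the `extract` helper, and the selector names are all hypothetical, and it uses Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath and requires well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (not taken from Scraperr).
SAMPLE_PAGE = """
<html>
  <body>
    <h1>Example Store</h1>
    <ul>
      <li class="item"><a href="/a">Alpha</a></li>
      <li class="item"><a href="/b">Beta</a></li>
      <li class="ad"><a href="/x">Sponsored</a></li>
    </ul>
  </body>
</html>
"""


def extract(page: str, selectors: dict) -> dict:
    """Apply each named XPath selector and collect the matched elements' text."""
    root = ET.fromstring(page.strip())
    results = {}
    for name, xpath in selectors.items():
        # findall() accepts ElementTree's limited XPath subset.
        results[name] = [(el.text or "").strip() for el in root.findall(xpath)]
    return results


# Hypothetical selector names chosen for illustration only.
selectors = {
    "title": ".//h1",
    "item_names": ".//li[@class='item']/a",
}
rows = extract(SAMPLE_PAGE, selectors)
print(rows)
```

The attribute predicate `[@class='item']` is what keeps the "Sponsored" entry out of the results. Real-world HTML is rarely well-formed XML, so an actual scraper would likely rely on a tolerant HTML parser with fuller XPath support.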
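The robots.txt and rate-limiting guidelines in the summary can be sketched with Python's standard library alone. This is an illustration under stated assumptions, not part of Scraperr: the robots.txt content, the `example-bot` user agent, the URLs, and the helper names `allowed_urls` and `polite_iter` are all hypothetical, and no network requests are made.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, supplied inline to avoid network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_urls(urls, robots_txt, user_agent="example-bot"):
    """Return only the URLs that the given robots.txt rules permit."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]


def polite_iter(urls, delay_seconds=1.0, sleep=time.sleep):
    """Yield URLs one at a time, pausing between consecutive items."""
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # fixed delay between requests
        yield url


urls = [
    "https://example.com/",
    "https://example.com/private/data",
    "https://example.com/products",
]
permitted = allowed_urls(urls, ROBOTS_TXT)
for url in polite_iter(permitted, delay_seconds=0.1):
    print("would fetch", url)
```

Passing `sleep` as a parameter keeps the delay logic testable without real waiting. A fixed delay is the simplest policy; a site's own guidance, where it gives any, should take precedence.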

First seen: 2025-05-11 19:23

Last seen: 2025-05-12 13:27

