Scraperr – A Self Hosted Webscraper

https://news.ycombinator.com/rss Hits: 19
Summary

A powerful self-hosted web scraping solution

Overview

Scraperr enables you to extract data from websites with precision using XPath selectors. This self-hosted application provides a clean interface to manage scraping jobs, view results, and export data.

Check out the docs for a comprehensive quickstart guide and detailed information.

Key Features

- XPath-Based Extraction: Precisely target page elements
- Queue Management: Submit and manage multiple scraping jobs
- Domain Spidering: Option to scrape all pages within the same domain
- Custom Headers: Add JSON headers to your scraping requests
- Media Downloads: Automatically download images, videos, and other media
- Results Visualization: View scraped data in a structured table format
- Data Export: Export your results in various formats
- Notification Channels: Send completion notifications through various channels

Getting Started

    make up

Legal and Ethical Guidelines

When using Scraperr, please remember to:

- Respect robots.txt: Always check a website's robots.txt file to verify which pages permit scraping
- Terms of Service: Adhere to each website's terms of service regarding data extraction
- Rate Limiting: Implement reasonable delays between requests to avoid overloading servers

Disclaimer: Scraperr is intended for use only on websites that explicitly permit scraping. The creator accepts no responsibility for misuse of this tool.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributions

Development made easier with the webapp template.
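The robots.txt and rate-limiting guidelines in the summary can be illustrated with a minimal Python sketch that uses only the standard library. This is not Scraperr's own code or API. The robots.txt content, the `example-bot` user agent, the example.com URLs, and the helper names (`build_parser`, `allowed_urls`, `polite_delay`, `crawl`) are all hypothetical and chosen for illustration.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real run would download it from the target site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def allowed_urls(parser, urls, user_agent="example-bot"):
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def polite_delay(parser, user_agent="example-bot", default=1.0):
    """Use the site's Crawl-delay when present, otherwise fall back to a default."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def crawl(parser, urls, fetch, sleep=time.sleep, user_agent="example-bot"):
    """Fetch permitted URLs one at a time, pausing between requests.

    `fetch` is a caller-supplied function (an assumption of this sketch),
    so the loop itself performs no network access.
    """
    delay = polite_delay(parser, user_agent)
    results = {}
    for index, url in enumerate(allowed_urls(parser, urls, user_agent)):
        if index > 0:
            sleep(delay)  # wait between consecutive requests
        results[url] = fetch(url)
    return results
```

Passing `fetch` and `sleep` in as parameters keeps the politeness logic separate from the HTTP client, so the same loop could wrap whatever request function a scraping job actually uses.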

First seen: 2025-05-11 19:23

Last seen: 2025-05-12 13:27