Online image scraper from websites and their pages | ImageThief

Published: 27.11.2023
Updated: 21.05.2025
Views: 1511
Likes: 2
Tags: Internal tool Scraper Django app Webtool ImageThief
Web tool
Django app
Terminal user interface
Scraper

ImageThief


About image parser

Common

This is a web scraping tool that finds and downloads all images from a site. It works in three modes. In single-page mode, it finds and downloads images only from the specified page. In multi-page mode, it parses the list of pages you provide. Finally, the third mode looks for images across the entire site and downloads them where possible. You can't stop a scrape once it has started, but you can close the tab and later continue from the last link: enter the same address and mode, then click the Start button.
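To illustrate how resuming by address and mode could work, here is a minimal sketch. It is not the code of ImageThief itself: the names `job_key`, `scrape`, `handle_page` and `max_pages` are hypothetical, and a plain dictionary stands in for whatever persistent storage the server really uses. The idea is only that progress is saved under the (address, mode) pair after every page, so a second run with the same pair picks up the remaining links.

```python
def job_key(address, mode):
    """Identify a scraping job by its address and mode (hypothetical scheme)."""
    return f"{mode}|{address}"


def scrape(store, address, mode, start_links, handle_page, max_pages=None):
    """Process pending links one by one, recording progress after each page.

    `store` is any dict-like object; here it stands in for a database or cache.
    `max_pages` exists only to simulate the user closing the tab early.
    """
    key = job_key(address, mode)
    # Reuse saved progress if this (address, mode) pair was seen before.
    state = store.setdefault(key, {"pending": list(start_links), "done": []})
    processed = 0
    while state["pending"]:
        if max_pages is not None and processed >= max_pages:
            break
        link = state["pending"][0]
        handle_page(link)          # download images from this page
        state["pending"].pop(0)    # mark as finished only after success
        state["done"].append(link)
        processed += 1
    return state
```

Calling `scrape` twice with the same `store`, address and mode continues where the first call stopped; the `start_links` of the second call are ignored because saved state already exists.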
Scraping runs in a single thread and rotates user agents and proxies. Each user agent and proxy is picked at random using weights. That is, the more and the longer you scrape a site, the better and faster the scraper selects the most effective proxies and user agents for it.
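The weighted random selection described above can be sketched roughly as follows. This is an illustration under my own assumptions, not the exact code of the tool: the class name `WeightedPool`, the multipliers 1.5 and 0.5, and the minimum weight are all made up for the example. Each item starts with the same weight, a successful request raises the weight, a failed one lowers it, and `random.choices` picks the next item in proportion to the current weights.

```python
import random


class WeightedPool:
    """Pick proxies or user agents at random, favouring ones that worked."""

    def __init__(self, items, initial_weight=1.0, min_weight=0.05, rng=None):
        self.weights = {item: initial_weight for item in items}
        self.min_weight = min_weight  # keeps bad items selectable, just rarely
        self.rng = rng or random.Random()

    def pick(self):
        """Return one item, chosen with probability proportional to its weight."""
        items = list(self.weights)
        weights = [self.weights[item] for item in items]
        return self.rng.choices(items, weights=weights, k=1)[0]

    def report(self, item, ok):
        """Raise the weight after a success, lower it after a failure."""
        factor = 1.5 if ok else 0.5  # assumed values, tune as needed
        self.weights[item] = max(self.min_weight, self.weights[item] * factor)
```

In use, the scraper would call `pick()` before each request and `report(item, ok)` after it. Keeping a small minimum weight means a proxy that failed early still gets an occasional second chance.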
To save space on the server, I delete all collected results every day at 0:00 Moscow time.
This tool comes in two variants: a Django application and a separate CLI tool. An important note: I constantly update and improve the Django application, but not the CLI version. Keep this in mind. Here is a link to the Django app. Here is a link to the script.

About proxies

This tool supports proxies. Only public ones for now, but still. Here is an example of a file with proxies. It works with the http, https, socks4 and socks5 proxy protocols. Because the ProxyChecker tool is not ready yet, automatic generation and selection of proxies for a specific site is not available.
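As a rough illustration of filtering a proxy list by the supported protocols, here is a small sketch. The file format is an assumption: one proxy per line in the form `scheme://host:port`, with blank lines and `#` comments ignored. The real example file may look different, and `parse_proxy_lines` is a hypothetical helper, not part of the tool.

```python
from urllib.parse import urlsplit

# Protocols named in the text above.
SUPPORTED_SCHEMES = {"http", "https", "socks4", "socks5"}


def parse_proxy_lines(lines):
    """Return normalized proxy URLs, skipping anything unsupported or malformed.

    Assumes one `scheme://host:port` entry per line (an assumption, see above).
    """
    proxies = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        try:
            parts = urlsplit(line)
            port = parts.port  # raises ValueError if the port is not a number
        except ValueError:
            continue
        scheme = parts.scheme.lower()
        if scheme in SUPPORTED_SCHEMES and parts.hostname and port:
            proxies.append(f"{scheme}://{parts.hostname}:{port}")
    return proxies
```

Reading from a file then comes down to `parse_proxy_lines(open(path, encoding="utf-8"))`, where `path` is wherever the list is stored.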

Limitations and disclaimer

This tool has several limitations: it does not scrape SVG files, and it does not scrape background images specified in styles. Dynamic scraping mode is not implemented yet, but it will be soon. This web tool is absolutely free. The only thing I ask is that you add it to your bookmarks or share a link to it. Thank you.
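To make the limitations above more concrete, here is a simplified sketch of image discovery that behaves the same way. It uses only the Python standard library and is not the actual parsing code of the tool; `ImgCollector` and `find_images` are hypothetical names. Because it looks only at the `src` attribute of `<img>` tags, background images set in styles are never seen, and anything whose path ends in `.svg` is skipped.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ImgCollector(HTMLParser):
    """Collect absolute URLs from <img src="..."> tags, skipping SVG files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return  # style attributes and CSS backgrounds are not inspected
        src = dict(attrs).get("src")
        if not src:
            return
        url = urljoin(self.base_url, src)  # resolve relative paths
        if urlsplit(url).path.lower().endswith(".svg"):
            return
        if url not in self.urls:  # keep first occurrence only
            self.urls.append(url)


def find_images(html, base_url):
    """Return image URLs found in static HTML, in document order."""
    parser = ImgCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

The sketch also works only on the HTML as delivered by the server, so images that a page adds later with JavaScript would need the dynamic mode mentioned above.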
Also, the author of this tool bears no responsibility for what visitors scrape. It was created solely to save the time and nerves of those who simply need to collect all the images from a site.

Similar tools

Financial statistics parser from Yahoo

Terminal user interface
Scraper
This parser is implemented as a command-line tool that allows you to choose which financial instruments or categories to parse and how to save them.

A dynamic parser of ingredients and recipes

Terminal user interface
Scraper
This is a parser for dynamic sites that bypasses blocking and keeps waiting for the site to load its content. It runs on Selenium, but is quite slow.

Cyberforum parser

Terminal user interface
Scraper
This parser parses all questions in the forum in multithreaded mode. Nothing special, just an example.

Scraper of skins and store items from CSGO website

Terminal user interface
Scraper
This is a scraper of CSGO skins and online store items. It works in multi-threaded mode and can filter and paginate all skins.

SEO blog parser

Terminal user interface
Scraper
This parser parses available content on the site in multi-threaded mode with user agent rotation. Simple example.

An online scraper of links from websites and their pages | LinkThief

Web tool
Django app
Telegram bot
With graphical interface
Terminal user interface
Scraper
This tool is a web version and skin for my library for parsing links from websites. The library has several more skins: a CLI script, a GUI application, a Telegram bot and a regular Python library (link-thief) available through PyPI.

