Online image scraper from websites and their pages | ImageThief

27.11.2023 / 21.05.2025
Web tool, Django app, Terminal user interface, Scraper

ImageThief

Start

Scrape images from dynamic websites

Proxy

An example of the file you can get below
The tool for selecting proxies for a specific site has not yet been transferred to the site.
An example: http:123.22.44.1:801

Mode

About image parser

Common

This is a web scraping tool that finds and downloads all images from a site. It works in three modes. In single-page mode, it searches for and downloads images only from the specified page. In multi-page mode, it parses the list of pages you provide. Finally, the third mode looks for images across the entire site and downloads them where possible. Although you can't stop a scrape in progress, you can close the tab and later continue from the last link: enter the same address and mode and click the Start button.
Scraping runs in a single thread, with rotation of user agents and proxies. Both are selected at random using weights, so the more and the longer you scrape a site, the better and faster the scraper picks the most effective proxies and user agents.
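The weighted random selection described above can be sketched roughly as follows. This is only an illustration, not the actual ImageThief code: the `WeightedPool` class, its method names, and the reward and penalty values are all assumptions made for the example.

```python
import random


class WeightedPool:
    """Pick items at random, favouring those that worked before (hypothetical helper)."""

    def __init__(self, items, initial_weight=1.0, min_weight=0.1):
        self.weights = {item: initial_weight for item in items}
        self.min_weight = min_weight

    def pick(self, rng=random):
        items = list(self.weights)
        weights = [self.weights[item] for item in items]
        # random.choices draws proportionally to the given weights.
        return rng.choices(items, weights=weights, k=1)[0]

    def report(self, item, success):
        # Assumed update rule: reward a working item, halve the weight of a
        # failing one, but never drop below min_weight so it can be retried.
        if success:
            self.weights[item] += 1.0
        else:
            self.weights[item] = max(self.min_weight, self.weights[item] * 0.5)
```

With a pool like this, the scraper would call `pick()` before each request and `report()` after it, so proxies and user agents that keep working are gradually chosen more often.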
To save space on the server, every day at 0:00 Moscow time I delete all collected parsing results.
This tool exists in two variants: a Django application and a separate CLI tool. An important note: I constantly update and improve the Django application, but not the CLI version. Keep this in mind. Here is a link to the Django app. Here is a link to the script.

About proxies

This tool supports proxies. Only public ones for now, but still. Here is an example of a file with proxies. It can work with the http, https, socks4 and socks5 proxy protocols. Because the ProxyChecker tool is not ready yet, automatic generation and selection of proxies for a specific site is not available.
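As a rough illustration of reading such a proxy file, here is a minimal sketch. The exact file format is not specified on this page, so the code assumes one proxy per line, written either as `scheme://host:port` or as `scheme:host:port`. The function name `parse_proxy_lines` is hypothetical.

```python
import re

SUPPORTED_SCHEMES = {"http", "https", "socks4", "socks5"}
# Assumed formats: "scheme://host:port" and "scheme:host:port".
PROXY_RE = re.compile(
    r"^(?P<scheme>[a-z0-9]+):(?://)?(?P<host>[^:/\s]+):(?P<port>\d{1,5})$"
)


def parse_proxy_lines(lines):
    """Return (valid, rejected): normalised proxy URLs and the lines that were skipped."""
    valid, rejected = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        match = PROXY_RE.match(line.lower())
        if (
            match is None
            or match["scheme"] not in SUPPORTED_SCHEMES
            or not 0 < int(match["port"]) < 65536
        ):
            rejected.append(line)
            continue
        valid.append(f"{match['scheme']}://{match['host']}:{match['port']}")
    return valid, rejected
```

Returning the rejected lines separately makes it easy to show the user which entries were ignored instead of failing silently.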

Limitations and disclaimer

This tool has several limitations: it does not scrape SVG files, and it does not scrape background images specified in styles. The dynamic scraping mode is also not implemented yet, but it will be soon. This web tool is absolutely free; the only thing I ask is that you add it to your bookmarks or share a link to it. Thank you.
Also, the author of this tool bears no responsibility for what visitors scrape. It was created solely to save the time and nerves of those who simply need to collect all the images from a site.
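To make the scraping limitations above concrete, here is a minimal sketch of an extractor that behaves the same way: it collects only `<img src>` URLs, skips SVG files, and never looks at CSS backgrounds. It uses only the Python standard library. The names `ImgCollector` and `find_images` are assumptions for the example, and the real tool may work differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ImgCollector(HTMLParser):
    """Collect absolute URLs from <img src=...>, skipping SVG files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return  # style attributes and stylesheets are never inspected
        src = dict(attrs).get("src")
        if not src:
            return
        url = urljoin(self.base_url, src)  # resolve relative paths
        if urlsplit(url).path.lower().endswith(".svg"):
            return  # SVG files are skipped
        if url not in self.urls:
            self.urls.append(url)  # keep first occurrence only


def find_images(html, base_url):
    collector = ImgCollector(base_url)
    collector.feed(html)
    return collector.urls
```

Since this parser reads static HTML only, images that a page inserts with JavaScript would also be missed, which is what a dynamic scraping mode would have to address.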

Notes about this tool (devnotes)

Published the 9th version of ImageThief

21.11.2024 / 21.11.2024
Proxies are now available for use. To be honest, the implementation of this feature leaves much to be desired, but as I usually say: first make "it" work, then make "it" work better. Or something like that.

Working on proxy support for ImageThief tool

20.11.2024 / 21.11.2024
Today I worked on ImageThief. I was busy with the layout and with preparing the backend to work with proxies. And spoiler: I did everything right. I probably could have done more, but I was too lazy. By the end of this year, I plan to finish ImageThief and add two smaller tools, ProxyChecker and ProxyParser.

Cleaning up the ImageThief tool

17.11.2024 / 21.11.2024
Successfully migrated the .09 version of ImageThief to the server, with some major changes: removed the ability to stop scraping and replaced Process-based threading with Thread-based threading. Also replaced several timers. More to come.

Similar tools

01.11.2023 / 14.05.2025
Terminal user interface, Scraper
This parser is implemented as a command-line tool that allows you to choose which financial instruments or categories to parse and how to save them.

11.09.2024 / 14.05.2025
Terminal user interface, Scraper
This is a parser for dynamic sites that bypasses blocking and keeps waiting for the site to load its content. It runs on Selenium, but is quite slow.

11.09.2024 / 14.05.2025
Terminal user interface, Scraper
This parser parses all questions in the forum in multithreaded mode. Nothing special, just an example.

11.09.2024 / 21.05.2025
Terminal user interface, Scraper
This is a scraper of CS:GO skins and online store items. It works in multi-threaded mode, with the ability to filter and paginate all skins.

11.09.2024 / 14.05.2025
Terminal user interface, Scraper
This parser parses available content on the site in multi-threaded mode with user agent rotation. Simple example.

04.05.2025 / 21.05.2025
Web tool, Django app, Telegram bot, With graphical interface, Terminal user interface, Scraper
This tool is a web version of, and a skin for, my library for parsing links from websites. The library has several more skins: a CLI script, a GUI application, a Telegram bot, and a regular Python library (link-thief) available through PyPI.
