About image parser
Common
This is a web scraping tool that finds and downloads all images from a site. It works in three modes. In single-page mode, it searches for and downloads images only from the specified page. In multi-page mode, it parses the list of pages you provide. Finally, the third mode analyzes the entire site for images and downloads them where possible. You cannot pause scraping, but you can close the tab and later continue from the last link: enter the same address and mode and click the Start button.
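As a rough illustration of what single-page mode involves, the sketch below collects image URLs from one page's HTML using only the Python standard library. It is not the tool's actual code: the names `ImgCollector` and `find_images` are hypothetical, and it assumes the page HTML has already been downloaded. It looks only at `<img>` tags and skips SVG files, in line with the limitations listed further down.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImgCollector(HTMLParser):
    """Collects absolute URLs from <img src="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Resolve relative paths against the page address.
        absolute = urljoin(self.base_url, src)
        # SVG files are skipped (assumption: detected by file extension).
        if urlparse(absolute).path.lower().endswith(".svg"):
            return
        if absolute not in self.urls:
            self.urls.append(absolute)


def find_images(html, base_url):
    """Return the unique image URLs found in one page's HTML."""
    parser = ImgCollector(base_url)
    parser.feed(html)
    return parser.urls
```

Multi-page mode could then be a loop that calls `find_images` once for each page in the provided list.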
Scraping runs in a single thread and rotates user agents and proxies. Both are selected at random using weights, so the more and the longer you scrape a site, the better and faster the scraper picks the most effective proxies and user agents.
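Weighted random selection of this kind can be sketched as follows. This is only an illustration under stated assumptions, not the tool's implementation: the `WeightedPool` class, the starting weight and the update rule (add 1 on success, halve on failure, never go below a floor) are all hypothetical.

```python
import random


class WeightedPool:
    """Picks items at random, favouring those that have worked before."""

    def __init__(self, items, initial_weight=1.0, min_weight=0.1):
        self.weights = {item: initial_weight for item in items}
        self.min_weight = min_weight

    def pick(self, rng=random):
        """Return one item, chosen with probability proportional to its weight."""
        items = list(self.weights)
        weights = [self.weights[item] for item in items]
        return rng.choices(items, weights=weights, k=1)[0]

    def report(self, item, success):
        """Update an item's weight after a request (hypothetical rule)."""
        if success:
            self.weights[item] += 1.0
        else:
            # Halve on failure, but keep a floor so the item can recover.
            self.weights[item] = max(self.min_weight, self.weights[item] * 0.5)
```

The same pool could hold either proxies or user agents. After each request, the scraper would call `report` with the outcome, so items that keep working are picked more often over time.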
To save space on the server, I delete all collected scraping results every day at 0:00 Moscow time.
This tool comes in two variants: a Django application and a separate CLI tool. An important note: I constantly update and improve the Django application, but not the CLI version. Keep this in mind. Here is a link to the Django app, and here is a link to the script.
About proxies
This tool supports proxies. Only public ones for now, but still.
Here is an example of a file with proxies
It works with the http, https, socks4 and socks5 proxy protocols. Because the ProxyChecker tool is not ready yet, automatic generation and selection of proxies for a specific site is not available.
Limitations and disclaimer
This tool has several limitations: it does not scrape SVG files, and it does not scrape background images specified in styles. Dynamic scraping mode is not implemented yet, but will be soon. This web tool is absolutely free; all I ask is that you add it to your bookmarks or share a link to it. Thank you.
The author of this tool bears no responsibility for what visitors scrape. It was created solely to save the time and nerves of those who simply need to collect all the images from a site.
Notes about this tool (devnotes)
Cleaning up the ImageThief tool
17.11.2024
Working on proxy support for ImageThief tool
20.11.2024
Published the 9th version of ImageThief
21.11.2024