About the text scraper
An online tool for scraping text, headers and source code from websites, individual web pages and lists of pages (just specify a CSS selector). The scraped text then goes through basic processing: counting the total number of words and the number of unique words, and building a list of how often each word occurs in the text.
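The basic processing step can be sketched in Python. This is a minimal, standard-library-only sketch, not the text-thief API. The helper names `html_to_text` and `text_stats` are hypothetical. The sketch extracts all visible text rather than applying a CSS selector, and it assumes a "word" is a run of Unicode letters or digits (optionally joined by apostrophes), compared case-insensitively.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class _TextCollector(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip = 0  # nesting depth inside <script>/<style>
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Hypothetical helper: return the visible text of an HTML document."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def text_stats(
    text: str, top: Optional[int] = None
) -> Tuple[int, int, List[Tuple[str, int]]]:
    """Hypothetical helper: (word count, unique word count, word frequencies).

    Assumption: words are runs of Unicode letters/digits, optionally joined
    by apostrophes ("don't"), and case is ignored.
    """
    words = re.findall(r"[^\W_]+(?:'[^\W_]+)*", text.lower())
    freq = Counter(words)
    # most_common() sorts by descending frequency; `top` limits the list.
    return len(words), len(freq), freq.most_common(top)
```

For example, for a page whose body is `<h1>The cat</h1><p>The hat and the cat.</p>`, `text_stats(html_to_text(page))` reports 7 words, 4 unique words, and "the" as the most frequent word (3 occurrences).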
The tool works in 3 modes: parsing a single page, parsing a list of pages and parsing an entire site.
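The entire-site mode needs some way to discover pages. One plausible approach, assumed here rather than taken from the tool, is a breadth-first crawl that follows only links pointing to the starting host; the single-page and list modes would skip discovery and process the given URL or URLs directly. `crawl_site` and `same_site_links` are hypothetical names, and the download step is passed in as a `fetch` callable so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collect the href values of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(base_url: str, html: str) -> List[str]:
    """Absolute http(s) links on the same host as base_url, fragments removed."""
    parser = _LinkCollector()
    parser.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in parser.hrefs:
        url, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            links.append(url)
    return links


def crawl_site(start_url: str, fetch: Callable[[str], str], limit: int = 100) -> List[str]:
    """Breadth-first list of same-site pages, at most `limit` of them.

    Assumption: no URL normalization beyond dropping #fragments.
    """
    start, _ = urldefrag(start_url)
    seen = {start}
    queue = deque([start])
    order: List[str] = []
    while queue and len(order) < limit:
        url = queue.popleft()
        order.append(url)
        for link in same_site_links(url, fetch(url)):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order
```

In real use, `fetch` could wrap `urllib.request.urlopen(url).read().decode("utf-8", "replace")`, and each returned page would then go through the same text extraction and word statistics as in single-page mode. The `limit` parameter keeps a crawl of a large site bounded.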
This web page text parser is also a web implementation of the text-thief Python library, which provides general functionality for working with text. The library is also available as a command-line tool, which is much easier to understand and study. The library can be installed from PyPI, or you can install it directly from its source code here.