German government parser

Get to know all the people in power in Germany

Because the job is large and the results need to be checked, the parser is split into two parts.

The first part simply downloads all the necessary pages for further processing.

The second part deals with data collection.

The results are output in JSON and CSV formats.

Goals

  • Save all data in CSV format.

  • Save copies of government member pages.

  • Save the First Name, Last Name, Position, Contacts, and Description of each government member.

Solution

Website parsing is divided into two stages.

The first stage collects the raw material: the pagination pages and the individual cards of government members.

For this, I used the Python package requests.
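The download stage could look roughly like the sketch below. It is not the code from the repository: the function names `fetch_html` and `save_pages`, the `page_0001.html` file naming, and the 30-second timeout are all assumptions made for illustration. The only requests calls used are `requests.get` and `Response.raise_for_status`.

```python
from pathlib import Path
from typing import Callable, Iterable


def fetch_html(url: str) -> str:
    """Download one page with requests and return its HTML text."""
    # Third-party package; imported here so save_pages can be used
    # with any other fetch function as well.
    import requests

    response = requests.get(url, timeout=30)  # timeout value is an assumption
    response.raise_for_status()  # fail on 4xx/5xx instead of saving an error page
    return response.text


def save_pages(
    urls: Iterable[str],
    out_dir: Path,
    fetch: Callable[[str], str] = fetch_html,
) -> list[Path]:
    """Save each URL as a local HTML file and return the file paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls, start=1):
        target = out_dir / f"page_{index:04d}.html"  # hypothetical naming scheme
        if not target.exists():
            # Pages already on disk are skipped, so the stage can be re-run.
            target.write_text(fetch(url), encoding="utf-8")
        saved.append(target)
    return saved
```

Keeping the saved copies on disk is what makes the second stage repeatable: the parsing can be checked and re-run without requesting the site again.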

The second stage then analyzes the downloaded pages and extracts the data.

The beautifulsoup4 package is used for this.
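As a rough illustration of this stage, the sketch below pulls the five target fields out of one saved member page. The CSS selectors (`.member-name` and so on) are hypothetical, since the real markup of the site is not shown here, and the name split is deliberately naive. `parse_member` and `text_of` are illustrative names, not the repository's.

```python
from bs4 import BeautifulSoup


def text_of(node, selector: str) -> str:
    """Return the stripped text of the first match, or "" if nothing matches."""
    match = node.select_one(selector)
    return match.get_text(" ", strip=True) if match else ""


def parse_member(html: str) -> dict:
    """Extract one government member's fields from a saved page."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib-backed parser, no extra install

    # All selectors below are assumptions; adjust them to the real page markup.
    full_name = text_of(soup, ".member-name")
    # Naive split: the first word is the first name, the rest is the last name.
    first_name, _, last_name = full_name.partition(" ")

    return {
        "first_name": first_name,
        "last_name": last_name,
        "position": text_of(soup, ".member-position"),
        "contacts": text_of(soup, ".member-contacts"),
        "description": text_of(soup, ".member-description"),
    }
```

Returning an empty string for a missing field keeps every record the same shape, which makes the later CSV export straightforward.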

As a result, we get two files: one in JSON format and one in CSV format.
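Writing both output files needs only the standard library. The sketch below assumes each member is a dict with the five fields listed under Goals; the file names `members.json` and `members.csv` and the function name `write_results` are placeholders, not the names used in the repository.

```python
import csv
import json
from pathlib import Path

# Column order for the CSV file; field names are assumptions based on the goals.
FIELDS = ["first_name", "last_name", "position", "contacts", "description"]


def write_results(members: list[dict], out_dir: Path) -> tuple[Path, Path]:
    """Write the collected members to a JSON file and a CSV file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    json_path = out_dir / "members.json"
    csv_path = out_dir / "members.csv"

    # ensure_ascii=False keeps umlauts such as "ü" readable in the JSON file.
    json_path.write_text(
        json.dumps(members, ensure_ascii=False, indent=2), encoding="utf-8"
    )

    # newline="" lets the csv module control line endings itself.
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(members)

    return json_path, csv_path
```

JSON preserves the records as structured data, while the CSV file is convenient for checking the results by eye in a spreadsheet.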

Result

We have a parser that can parse all members of the German government.

Sources can be viewed here

Repository

Download the script here

Archive

Additional materials

