    Someone was parsing my website from China, so I blocked them via the .htaccess file

    Published: 22.09.2025 / Updated: 11.03.2026 / Reading time: 11 minutes / Views: 3426

    Backstory

    So, I'm sitting there, glued to the charts. Yes, I'm obsessed with looking at charts and graphs, especially when they're trending upwards. :) And I'm seeing some incredible numbers for views and bounce rates.
    Metrica usually shows my site's bounce rate at around 15-20%; in Google Analytics, it's around 50%. Now it's a 50% bounce rate in Metrica and 60% in Google Analytics.
    This is what the site's monthly traffic graph looks like (from August 19, 2025, to September 18, 2025):
    Interestingly, it's mostly the pagination pages being visited. Of course, other pages get visits too, but not nearly as often. Here's the graph:
    What could this graph mean? I don't know; you tell me. Maybe someone decided to scrape all the links on my site?
    Naturally, I got worried and started looking for the cause, because I'd heard about this kind of traffic from China before. Besides, my site isn't adapted to Asian audiences at all, not to mention translated. I could have made a joke like:
    But I don't find it funny. And what if there really are real Chinese people behind it? If so, no offense - just a joke.

    My investigation into where they came from and why

    First, I decided to find out where they were coming from. Maybe Google started sending this kind of traffic, or Yandex. Considering the friendship between Russia and China, I wouldn't be surprised. Or maybe Baidu got to me? No, no, and no. It's all direct traffic, and no search engine has anything to do with it.
    And there could be several options:
    1. Either DDoS
    2. Or clickers
    3. Or parsers
    4. Or content copying
    Let's figure it out.

    DDoS Attack

    DDoS (Distributed Denial-of-Service) is a type of attack in which multiple requests are sent to a server from multiple devices, with the goal of making the server unavailable to regular users.
    There are also DoS attacks, the same thing except the requests originate from a single machine. Since Metrica records visits from many IP addresses, it looks more like a DDoS attack. But not quite.
    You see, there aren't that many requests. A real DDoS would mean thousands, even tens of thousands of requests per hour. I'm sure it's not a DDoS attack. On a graph, a DDoS attack might look something like this:
    If you're interested, you can read Google's report on the most powerful DDoS attack in their history here - https://cloud.google.com/blog/products/identity-security/google-cloud-mitigated-largest-ddos-attack-peaking-above-398-million-rps/. More than 398 million requests per second!

    Clickers, or a type of negative SEO

    A clicker is a special program designed to simulate bad (or good) behavioral factors on a website, with the goal of deceiving search engines.
    Clear and obvious signs of clickers are:
    1. Abnormally low or high click depth
    2. Short time on site
    3. The source of traffic is a search engine.
    The most important thing here is that the source must be a search engine; otherwise, it's just an annoying bot clogging up your statistics. That's all.
    As you remember, my traffic is direct. And the pages this bot views are pagination pages, meaning pages that generally don't contribute to rankings.
    Direct traffic is traffic that comes from bookmarks, from typing a URL directly into the address bar, or from any other source the analytics system can't identify.
    Therefore, I don't think it's a clicker, or at least not the smartest one.

    Perhaps it's a parser?

    A parser is a program designed to collect the content it needs quickly and in bulk, without human intervention.
    And I'm probably right in saying this is the most likely explanation. The thing is, parsers tend to favor pagination pages: after all, those pages let you find all the content on a website.
    But a couple of questions still arise. For example:
    1. Why not look at the sitemap.xml if you really need that list of articles on the website?
    2. Why make it so complicated? Why rotate user agents and use proxies? My anti-parser protection isn't that strong.
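    If you want to check this hypothesis on your own site, the raw access log is more telling than any analytics dashboard. Here's a minimal sketch for spotting scraper-like traffic; the helper name, example path, and combined log format are my assumptions, not something from this site's setup:

```shell
#!/usr/bin/env bash
# Sketch: find scraper-like clients in an Apache access log.
# The function name and example path below are assumptions.

# Print the N most frequent client IPs in a combined-format access log.
# Scrapers and bot farms tend to cluster at the top of this list.
top_ips() {
    local log="$1" n="${2:-10}"
    awk '{print $1}' "$log" | sort | uniq -c | sort -rn | head -n "$n"
}

# Usage (path is an assumption):
# top_ips /var/log/apache2/access.log 10
```

    An IP that requests hundreds of pagination URLs per hour is a strong parser signal; real visitors rarely walk a page list sequentially.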

    My solution

    In any case, it doesn't really matter why someone decided to sic a bunch of bots on my site. What matters more is what I do about it. Will I be a softie and just tolerate these bots, or will I take some countermeasures to make life harder for them (and whoever sicced them on me)?
    What I can do:
    1. Block an entire country via the .htaccess file on the server.
    2. Change the site structure to make it harder to parse the content.
    3. Use some kind of bot check, like adding a captcha or something like that.
    Well, since I'm in a hurry to minimize the damage to my site, whatever it may be, I've decided to block an entire country from visiting it. At least that will hold them off temporarily.

    How to block an entire country using an .htaccess file

    Before we start doing things that could harm the site, let's do some investigation. Firstly, we need to know what an .htaccess file is. Secondly, how it works. Thirdly, why we would do this. At least, until we find another, more flexible solution.
    .htaccess (hypertext access) is a file that lets you control the operation of the Apache web server, that is, access to content, through special directives.
    The Apache web server is an open-source web server, also known as the Apache HTTP server, which stores website files and responds to user browser requests, delivering web pages, images, and other content via the HTTP(S) protocol.
    A directive in .htaccess is a command that specifies which pages and who should be allowed or denied access.
    Directives consist of three parts:
    1. Evaluation order
        1. Order Deny,Allow - Deny rules are evaluated first, then Allow rules; use it to deny everyone except the addresses you explicitly allow
        2. Order Allow,Deny - Allow rules are evaluated first, then Deny rules; use it to allow everyone except the addresses you explicitly deny
    2. Action
        1. Deny from - deny (refuse to respond to) requests from the given addresses
        2. Allow from - allow (respond to) requests from the given addresses
    3. Address/range/classless addressing
        1. 1.0.8.0 - a specific IP
        2. 1.0.8 - all IPs from 1.0.8.0 to 1.0.8.255 will be blocked
        3. 1.0.8.0/24 - classless inter-domain routing (CIDR) notation, which lets you specify exactly the ranges you need
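    To sanity-check a CIDR range before putting it into .htaccess, you can compute the span of addresses it actually covers. Below is a small helper sketch; the function name is mine, not an Apache feature:

```shell
#!/usr/bin/env bash
# Sketch: print the first and last address of a CIDR block,
# e.g. to check what "Deny from 1.0.8.0/21" actually covers.
cidr_range() {
    local ip="${1%/*}" bits="${1#*/}"
    local a b c d n mask first last
    IFS=. read -r a b c d <<< "$ip"
    # Pack the dotted quad into a 32-bit integer
    n=$(( (a << 24) | (b << 16) | (c << 8) | d ))
    # Network mask for the given prefix length
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    first=$(( n & mask ))                    # network address
    last=$(( first | (~mask & 0xFFFFFFFF) )) # broadcast address
    printf '%d.%d.%d.%d - %d.%d.%d.%d\n' \
        $(( first >> 24 )) $(( (first >> 16) & 255 )) $(( (first >> 8) & 255 )) $(( first & 255 )) \
        $(( last >> 24 ))  $(( (last >> 16) & 255 ))  $(( (last >> 8) & 255 ))  $(( last & 255 ))
}

cidr_range 1.0.8.0/21   # prints: 1.0.8.0 - 1.0.15.255
```

    So a /21 covers 2048 addresses, while a /24 covers exactly one last octet, 256 addresses.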
    In my specific case, blocking all addresses from China looks like this:
    PassengerEnabled On
    PassengerPython /home/t/timachuduk/timthewebmaster.com/venv_new/bin/python
    Order Allow,Deny
    Allow from all
    # China
    Deny from 1.0.1.0/24
    Deny from 1.0.2.0/23
    Deny from 1.0.8.0/21
    ...
    ## China
    Some clarification is needed. So, the first line activates the web server for Python applications – Phusion Passenger. Then, for this web server, we select a virtual environment. But that's a different topic. Let's move on to the web content access rules.
    1. Order allow,deny – allows access to the web resource to everyone except those listed below.
    2. Allow from all – allows access to everyone.
    3. Deny from ADDRESS – denies access to the resource to specific addresses.
    You can reverse it and, for example, deny access to everyone except those listed, like this:
    PassengerEnabled On
    PassengerPython /home/t/timachuduk/timthewebmaster.com/venv_new/bin/python
    Order Deny,Allow
    Deny from all
    # China
    Allow from 1.0.1.0/24
    Allow from 1.0.2.0/23
    Allow from 1.0.8.0/21
    ...
    ## China
    Now only Chinese visitors will be able to see my content :)
    But there are a lot of Chinese IP addresses - how do you fit them all into one file? And not only are there a lot of them, they also change, so keeping track of hundreds of millions of addresses becomes an impossible task.
    Cloud services like CloudFlare are suggested as a solution. But I have slightly different suggestions. The first is fairly fast but temporary. The second takes longer to implement and requires individual configuration on each hosting provider, but it provides an up-to-date list of IP addresses.
    First, you need to find the corresponding IP addresses and their ranges by the countries they're located in. I recommend this website; it not only provides a convenient way to quickly copy addresses but also allows for parsing, making it very convenient for custom solutions like mine.

    A simple way: blocking traffic from a specific country

    On this website, you can download the file you need with IP addresses by country, then copy its contents directly into your .htaccess file.
    Don't forget to add an action before each range, either "Deny from" or "Allow from". You can download a special Python script for this, adding the appropriate prefix to each line.
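    If you'd rather skip a separate script, a one-line sed does the same prefixing. A sketch, where the file name cn.zone follows ipdeny's naming convention and the sample contents are made up for illustration:

```shell
# Example zone file contents (a made-up sample of what you'd download)
printf '1.0.1.0/24\n1.0.2.0/23\n' > cn.zone

# Prepend "Deny from " to every range and append the result to .htaccess
sed 's/^/Deny from /' cn.zone >> .htaccess
```

    The same sed with "Allow from " as the prefix builds the reversed, allow-only variant.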
    Save the file and restart the web server. If everything is done correctly, nothing will change for you, but the site will now be inaccessible to Chinese users. Be careful, though: you could accidentally block yourself and your own country. This is what that would look like:
    This is if you accidentally blocked yourself and your country
    If there was an error in the syntax of the .htaccess file, you will see something like this:
    Something is wrong on the server's side.
    I'd like to point out that my site has custom 403 (Forbidden) and 500 (Server Error) error pages. They are not shown when a visitor is blocked this way.

    The hard way: writing a custom bash script

    The core of this method is essentially the same; only here the entire process is automated. I'll write a Bash script and schedule it to run at the beginning of each day. The script below will:
    1. Download and update the list of IP ranges.
    2. Prepend "Deny from" to each range.
    3. Insert the result into the .htaccess file.
    #!/usr/bin/bash

    COUNTRY_CODE="$2"
    INPUT_FILE="$COUNTRY_CODE-ip-groups.txt"
    OUTPUT_FILE=".htaccess"
    TO_DOWNLOAD_URL="https://www.ipdeny.com/ipblocks/data/countries/$COUNTRY_CODE.zone"
    BLOCK_START="##$COUNTRY_CODE START"
    BLOCK_END="##$COUNTRY_CODE END"
    PREFIX="Deny from"
    ALLOWED_ACTIONS=("--add" "--remove" "--update")
    # Every two-letter code aa..zz, generated by brace expansion
    ALLOWED_CODES=({a..z}{a..z})

    # Print a message depending on the exit status of the previous command
    status() {
        if [ $? -eq 0 ]; then
            echo " OK: $1"
        else
            echo " ERR: $2"
            exit
        fi
    }

    # Check whether a value is present in an array
    has_arg() {
        local term="$1"
        local array_name="$2[@]"
        local array=("${!array_name}")
        shift
        for array_element in "${array[@]}"; do
            if [[ $array_element == "$term" ]]; then
                return 0
            fi
        done
        return 1
    }

    # Download the file containing the IP ranges
    download_the_ips() {
        STATUS_CODE="$(curl -s -o /dev/null -w "%{http_code}" $TO_DOWNLOAD_URL)"
        if [ "$STATUS_CODE" != "200" ]; then
            echo "ERROR: Couldn't find the source: $TO_DOWNLOAD_URL"
            exit
        fi
        curl -L $TO_DOWNLOAD_URL > $INPUT_FILE
        echo "SUCCESS: Downloaded file: $TO_DOWNLOAD_URL"
    }

    remove_block() {
        # Clear everything between the markers.
        # Don't redirect the output with > or >> - that would wipe the file; edit in place
        sed "/$BLOCK_START/,/$BLOCK_END/d" $OUTPUT_FILE -i
        status "The block $COUNTRY_CODE removed" "Couldn't remove the block $COUNTRY_CODE"
    }

    insert_ips() {
        # Insert the prefix before each line
        echo "$BLOCK_START" >> $OUTPUT_FILE
        sed "s/^/$PREFIX /" $INPUT_FILE >> $OUTPUT_FILE
        echo "$BLOCK_END" >> $OUTPUT_FILE
        echo "SUCCESS: Inserted IPs"
    }

    # Proceed only if both positional arguments are valid
    if has_arg "$1" "ALLOWED_ACTIONS" && has_arg "$2" "ALLOWED_CODES"; then
        if [ "$1" == "--add" ] || [ "$1" == "--update" ]; then
            download_the_ips
            remove_block
            insert_ips
            rm $INPUT_FILE
        elif [ "$1" == "--remove" ]; then
            remove_block
        fi
        exit 0
    fi
    exit 1
    For example, if you run the command below:
    ./block-country.sh --add cn
    Then you'll block all traffic from China. You can find more commands and options in my repository, where you can download this script.
    Now, to keep the list of IP addresses constantly updated, you can create a Cron task that will run the block-country.sh script daily, for example.
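    As a sketch, such a Cron task could look like the crontab entry below; the script location is an assumption, and --update cn refreshes the Chinese ranges daily at midnight:

```shell
# Crontab entry (config fragment; paths are assumptions):
# run block-country.sh daily at 00:00 to refresh the blocked ranges
0 0 * * * /home/user/block-country.sh --update cn >> /home/user/block-country.log 2>&1
```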
    This article is already quite long, and creating and configuring a Cron task deserves a separate article, which will appear soon. A link to it will appear here too - don't forget to subscribe to email notifications and the RSS feed.

    Conclusions and results

    After adding the Chinese IP ranges to .htaccess, traffic dropped as expected, and bounce rates and time on site improved dramatically (average viewing time and average bounce rate, that is).
    For reference: it was Sunday, and I blocked China at 12:00.
    That's how it is. Unpleasant, but there's more to come. :) I hope this article helped you figure out what's wrong with your site, why you're suddenly so inundated with Chinese traffic, and what can be done about it.

    Don't forget to share, like, and leave a comment :)

