Hosting On Cloudflare ‘Cause I Need To
So, Cloudflare was down today, again, and this incident brought down a huge part of the Internet with it. It’s actually pretty impressive how much of the Internet infrastructure depends on Cloudflare these days.
Should we try to be more decentralized? Probably. But will I migrate my sites away from Cloudflare? Sadly, maybe not. And here is why.
-
Some of my machines just don’t have a public IP (neither IPv4 nor IPv6).
I won’t go into why they don’t have one: maybe you’re sharing a network with many other people, or maybe your ISP isn’t kind enough to allocate one for you. Anyway, if you really don’t have one, Cloudflare Tunnel is actually pretty nice.
Of course, you can self-host a tunneling reverse proxy like frp, though I have absolutely no idea how to configure one for all my services and domains. But setting that aside, even though I do have a VPS with public IPs that could act as a reverse proxy—
-
I don’t want to expose my public IP.
When I was trying out all the Fediverse server software a few years ago, I occasionally saw servers get DDoSed. And my personal takeaway from this Pleroma setup in tinfoil mode is: never expose your source IP. I’m not planning to get rid of this new habit of mine any time soon.
Currently, I use the VPS as an outbound proxy for my Friendica instance hosted on a Raspberry Pi (with no public IP). Since the VPS is a low-end one, I don’t expect its uptime to be higher than Cloudflare’s, so using Cloudflare Tunnel for inbound traffic actually keeps the service usable even when the VPS is down (which has happened a few times).
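In case that sounds abstract, here is a minimal sketch of what “the VPS as an outbound proxy” buys me. The proxy address and the IP-echo service are placeholders; the point is simply that traffic leaving the Pi shows the VPS’s address instead of my home connection’s:

```python
# Minimal sketch (hypothetical proxy address): confirm that outbound requests from
# the Pi leave via the VPS rather than the home connection, so the real source IP
# never shows up in outgoing federation traffic.
import requests

PROXY = "http://vps.example.com:3128"  # assumed forward proxy running on the VPS

direct = requests.get("https://ifconfig.me/ip", timeout=10).text.strip()
proxied = requests.get(
    "https://ifconfig.me/ip",
    proxies={"http": PROXY, "https": PROXY},
    timeout=10,
).text.strip()

print("direct egress IP: ", direct)   # the home connection (what I want to hide)
print("proxied egress IP:", proxied)  # the VPS address (what the outside world sees)
```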
-
Now I have to plead guilty: I host some projects (including this blog) on GitHub and use GitHub Actions.
When I want to publish a new blog post like this one, I commit it to the git repo, push it to GitHub, let GitHub Actions build the HTML pages, and publish the result to Cloudflare Pages. Yeah, GitHub is another huge centralization beast, and I’ve tried to move away from it. To do that, I might need to:
- Move to Codeberg, or self-host Forgejo.
- Oh, let’s not forget to prevent crawlers from querying diff pages.
- Self-host a Forgejo runner, because the actions in some of the projects are quite heavy and I don’t want to put extra burden on Codeberg.
- Rewrite the workflow script, which seems hard enough already.
- Publish, uh, from Forgejo runners to my Raspberry Pi? No idea how to do that. I guess there’s existing self-hosted software for it?
Alternatively, I guess I could use GitHub Actions to build the site, and then let my Raspberry Pi download the artifacts via webhooks. But then there are still public IP issues, reverse proxies, webhook setup, authentication… Or maybe the real alternative is to use WordPress.com instead of a static site generator.
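To make that webhook idea a little more concrete, here is a rough sketch of what a receiver on the Pi could look like. Everything specific in it is an assumption for illustration: the repository name, the web root, the port, and the premise that a GitHub webhook for workflow_run events can actually reach this endpoint (through a tunnel, naturally):

```python
# Rough sketch of a webhook receiver on the Pi: GitHub Actions finishes a build,
# GitHub POSTs a workflow_run event here, and the Pi pulls the newest artifact
# and unpacks it into the web root. Repo name, paths, and port are hypothetical.
import hashlib, hmac, io, json, os, zipfile
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests  # drops the Authorization header when GitHub redirects to blob storage

REPO = "example-user/blog"                      # hypothetical repository
TOKEN = os.environ["GITHUB_TOKEN"]              # token that can read Actions artifacts
SECRET = os.environ["WEBHOOK_SECRET"].encode()  # shared secret configured on the webhook
WEBROOT = "/var/www/blog"                       # hypothetical web root on the Pi
API = {"Authorization": f"Bearer {TOKEN}", "Accept": "application/vnd.github+json"}


class Hook(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        # Reject anything that doesn't carry a valid GitHub HMAC signature.
        want = "sha256=" + hmac.new(SECRET, body, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(want, self.headers.get("X-Hub-Signature-256", "")):
            self.send_response(403)
            self.end_headers()
            return
        event = json.loads(body)
        run = event.get("workflow_run")
        if event.get("action") == "completed" and run and run["conclusion"] == "success":
            # List the finished run's artifacts and unpack the first one into the web root.
            arts = requests.get(
                f"https://api.github.com/repos/{REPO}/actions/runs/{run['id']}/artifacts",
                headers=API, timeout=30,
            ).json()["artifacts"]
            if arts:
                zipped = requests.get(arts[0]["archive_download_url"], headers=API, timeout=120)
                zipfile.ZipFile(io.BytesIO(zipped.content)).extractall(WEBROOT)
        self.send_response(204)
        self.end_headers()


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), Hook).serve_forever()
```

Even this toy version already implies a token, a shared secret, and a route for GitHub to reach the Pi, which is exactly the pile of yak-shaving I listed above.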
-
Finally, I despise AI crawlers.
It’s not because my sites were affected in any way: most of them are static and should do just fine. It’s because I feel for people impacted by AI crawlers or corporate AI.
Yes, you can block bots with self-hosted software. But it’s just too overwhelming for me, considering how fast the crawlers keep changing.
As for people who still think AI crawlers are not much of a hassle: before you pile on Cloudflare, Anubis, or the web hosts using them, please understand that:
- Not all sites are static;
- Not all people have the energy to configure/optimize their site to cater for random crawlers;
- Some dynamic pages, like git diffs, can be really expensive to compute and/or meaningless to cache;
- There are plenty of bad bots out there;
- Real people have suffered from crawlers hammering their sites, sometimes even with substantial counter-measures in place;
- You cannot block all bots by blocking IPs and IP ranges, and some bots now might be using hacked IoT devices, making requests from networks owned by actual human users;
- You cannot block all bots by filtering user agents, because they can lie;
- Anubis exists precisely to stop bots from passing themselves off as human visitors, by posing challenges that real browsers can solve (a rough sketch of the idea follows this list), and it should be used together with user-agent blocks;
- Anubis (or Cloudflare, I guess) can be bypassed, but please understand that it has worked, is still working for many sites, and will continue to evolve. In an ever-worsening cat-and-mouse game like this, what exactly is it that you want?
- There are other projects working on blocking crawler requests, and I believe the same arguments apply to them.
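To be clear about what “posing challenges” means, here is a sketch of the general idea, not Anubis’s actual implementation: the visitor’s browser burns a bit of CPU once, and the server verifies the answer with a single hash.

```python
# Not Anubis's actual code, just the general shape of a proof-of-work challenge:
# the server hands out a random nonce and a difficulty, the visitor's browser has to
# find a counter whose hash clears that difficulty, and the server verifies cheaply.
import hashlib
import secrets

DIFFICULTY = 4  # leading zero hex digits required; purely illustrative


def issue_challenge() -> str:
    return secrets.token_hex(16)


def solve(challenge: str) -> int:
    # What the visitor's side has to do: brute-force a counter. Trivial for one
    # human page load, expensive for a crawler requesting thousands of pages.
    counter = 0
    while not hashlib.sha256(f"{challenge}:{counter}".encode()).hexdigest().startswith("0" * DIFFICULTY):
        counter += 1
    return counter


def verify(challenge: str, counter: int) -> bool:
    # What the server has to do: a single hash, no brute force.
    return hashlib.sha256(f"{challenge}:{counter}".encode()).hexdigest().startswith("0" * DIFFICULTY)


challenge = issue_challenge()
print(verify(challenge, solve(challenge)))  # True
```

One page load costs a human essentially nothing; fetching thousands of pages this way adds up, which is the whole point.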
So with all that, finally, please understand: it’s not the CDN, it’s not the software, and it’s not the people hosting the sites; the blame lies with whatever forced websites into a self-protecting posture in the first place, where they either filter requests (and risk false positives) or go down and serve no requests at all.
Why, when you’re on a sinking ship, would you choose to complain that the remaining air chambers are too airtight?