How I Load Balanced My Portfolio, Applications, and Databases


Your portfolio should be the best showcase of yourself. For a website, that means it should be extremely fast and there should be zero bugs. Yes, zero. I know that sounds tedious, even impossible. However, if someone is interested in you and finds bugs or has a bad experience with your site, it reflects poorly on you. Speed and availability matter just as much as the content itself. I self-host my website on a cloud server. I used to deploy on the edge with Cloudflare, then Netlify, but neither option did what I needed.

The problem with hosting on a single cloud server is maintenance. If I have to reboot for updates, my portfolio goes down. It’s very rare for a server to just “stop working” when you haven’t touched it (which is why the CrowdStrike incident was such a huge deal). However, there’s always a chance of things going wrong when doing updates or redeploying. That’s what led me to load balance everything on the server: not just my portfolio website, but the demo applications and the databases as well. This post is more of a summary of my experiences than a tutorial.

When you load balance an application, you have to make sure that all the servers hosting it have the same application and data. If there’s a database involved, this means replicating it between each server, where one server is typically the master and all the others are replicas. The code matters just as much. You wouldn’t want half of your servers on version 2.0 and half on version 3.0 of an application. Things could easily break if they’re not on the same version.

The first thing I load balanced was my actual portfolio. This was easy since there’s no database to keep in sync. I created a Node.js script to accept webhook events from GitHub, so every time I push a new commit, it signals both servers to redeploy the portfolio. That means I don’t have to log in, pull the changes, rebuild the application, and restart Node.js. As for the actual load balancing, I just set two A records pointing to both servers. This is the most basic load balancing you can do, and it doesn’t give you much flexibility or control over which server people connect to. What I found was that basically every request hit only the second server. When I shut it down, traffic wouldn’t switch to the first. This was likely due to DNS caching in my router. Routers cache IP addresses so they don’t have to query DNS every time, which makes websites feel like they load faster. However, the downside is that if the IP address changes, you run into errors.

Because of all these issues, I switched to the load balancer offered by Hetzner. My servers are already hosted there, so it made sense to use theirs. Cloudflare was also an option, but their load balancer is more complex and expensive. By far the biggest issue was getting TLS working. TLS is one of the technologies used to encrypt traffic between you (the client) and the server. It took some effort to get the Hetzner load balancer to issue an SSL certificate and get everything communicating properly. The way I ended up solving it was by removing HTTPS from the applications themselves. TLS now terminates at the load balancer, not the backend. This is technically less secure because the traffic between the load balancer and my servers is unencrypted, but since it’s on a private network that I control, it’s an acceptable tradeoff. In a true enterprise environment, you’d encrypt that hop as well.

The Survey Builder application was incredibly easy to load balance. It uses MongoDB for the database and is built using Astro and Svelte, just like my portfolio. Getting the same GitHub webhook system working was just repeating the same steps. However, having the same code running on all servers means nothing if the data between them isn’t in sync. It turns out MongoDB has first-class support for replication via replica sets. It’s a built-in feature, and it took minutes to set up. I modified my connection string to include both IP addresses and everything just worked without any changes to the actual application code. MongoDB uses an election system where it communicates between all of the servers to automatically figure out who is the master. I was honestly shocked at how easy the whole thing was.
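A rough sketch of that setup, with placeholder private IPs and database names: the `rs.initiate()` call runs once in mongosh, and the helper below just shows the multi-host connection string format the driver understands.

```javascript
// Hypothetical sketch of a two-member MongoDB replica set.
// IPs, the "rs0" name, and the database name are placeholders.

// Run once in mongosh on the first server, something like:
//   rs.initiate({
//     _id: "rs0",
//     members: [
//       { _id: 0, host: "10.0.0.2:27017" },
//       { _id: 1, host: "10.0.0.3:27017" },
//     ],
//   });

// The application lists every member in its connection string, so the
// driver can follow elections and fail over on its own.
function buildReplicaSetUri(hosts, db, replicaSet) {
  return `mongodb://${hosts.join(",")}/${db}?replicaSet=${replicaSet}`;
}

console.log(buildReplicaSetUri(["10.0.0.2:27017", "10.0.0.3:27017"], "surveys", "rs0"));
// mongodb://10.0.0.2:27017,10.0.0.3:27017/surveys?replicaSet=rs0
```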

Next was the Mailroom Management System. That application was originally built with MySQL/MariaDB as the database engine. I have done replication with MariaDB before, and it is an awful experience. Not something I ever want to do again. Since the Mailroom Management System uses LINQ and EF Core, switching the database engine is trivial. LINQ abstracts the actual SQL queries, so there’s nothing to rewrite. I decided to go with Postgres as a replacement. It has far more options for replication and is a stronger database engine overall. All I had to do was delete the EF Core migrations (since they were made for MySQL), have EF Core recreate them for Postgres, change the connection string, and everything just worked. However, that was only one server. I still didn’t have replication set up.
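The provider swap boils down to a few CLI steps with the standard dotnet-ef tooling. This is a sketch of the general procedure, not my exact commands: the MySQL provider package and the migration name are guesses, since the post doesn’t name them.

```shell
# Swap the EF Core provider from MySQL to Postgres (Npgsql).
# Package names below are assumptions; check which provider the project uses.
dotnet remove package Pomelo.EntityFrameworkCore.MySql
dotnet add package Npgsql.EntityFrameworkCore.PostgreSQL

# In the DbContext setup, change UseMySql(...) to UseNpgsql(connectionString)
# and point the connection string at Postgres.

# Throw away the MySQL-specific migrations and regenerate them for Postgres.
rm -rf Migrations/
dotnet ef migrations add InitialPostgres
dotnet ef database update
```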

The tool I chose for Postgres replication was Patroni. It uses another tool called etcd to communicate between the servers and determine a leader, and then it replicates the master to the replicas. It took quite a bit of time to get set up because it’s inherently more complex than what MongoDB offers. Most of my issues were with authentication, and that came down to me not fully understanding how Patroni authenticated to the database. Postgres also has its own host-based access control (pg_hba.conf), where you have to explicitly allow every host that’s permitted to connect. It took a lot of debugging and access-rule configuration to get everything communicating between servers. I also had to write systemd unit files so etcd and Patroni would run in the background. It was definitely much more work than MongoDB. I’m an experienced Linux system admin and have done this kind of thing before, but there was no easy-to-follow tutorial for any of this. A lot of it was trial and error.
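For a sense of what Patroni wants, here is a heavily abridged, hypothetical patroni.yml with placeholder addresses and passwords. A real file needs more (data_dir, superuser credentials, DCS tuning), so treat this as a shape rather than a working config.

```yaml
scope: pg-cluster            # shared cluster name
name: node1                  # unique per server

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.2:8008

etcd3:                       # Patroni elects the leader via etcd
  hosts:
    - 10.0.0.2:2379
    - 10.0.0.3:2379

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.2:5432
  authentication:
    replication:
      username: replicator
      password: CHANGE_ME
  pg_hba:                    # host-based access rules: every peer that
    - host replication replicator 10.0.0.0/24 md5   # replicates or
    - host all all 10.0.0.0/24 md5                  # connects must be listed
```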

Even after all of that, I still wasn’t done with the Mailroom Management System. The application itself had no idea which server was the master. I had to add some logic to query the etcd API from both servers to determine the current master, and that IP address gets put in the connection string. However, this only runs at build time. If the master changes while the application is running, it needs to know about it. So I added an interceptor to detect database errors and re-resolve the master. I could have set up something like HAProxy to dynamically route traffic to the master, but since I wrote the Mailroom application, it was easier to just handle this logic myself.
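The resolution logic can be sketched as follows, with one substitution: instead of reading the leader key out of etcd as described above, this sketch probes Patroni’s own REST API, where GET /primary returns 200 only on the current leader. Hosts and the port are placeholders, and the real application does this in C#.

```javascript
// Hypothetical sketch of resolving the current Postgres primary by
// asking each node's Patroni REST API (default port 8008).
// GET /primary answers 200 only on the leader.

async function isPrimary(host) {
  try {
    const res = await fetch(`http://${host}:8008/primary`);
    return res.status === 200;
  } catch {
    return false; // unreachable node is definitely not our primary
  }
}

// Pure helper: given probe results, pick the primary's address.
function pickPrimary(results) {
  const primary = results.find((r) => r.isPrimary);
  if (!primary) throw new Error("no primary found");
  return primary.host;
}

async function resolvePrimary(hosts) {
  const results = await Promise.all(
    hosts.map(async (host) => ({ host, isPrimary: await isPrimary(host) }))
  );
  return pickPrimary(results); // goes into the connection string
}
```

On a database error, the interceptor can simply call `resolvePrimary` again and rebuild the connection string.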

Everything is working now, but it’s worth thinking about where it can still fail. If both servers crash simultaneously, the load balancer can’t do anything about it. Both servers are in the same datacenter, so a geographic event would take them both out. The load balancer also takes around 30 seconds to recognize that a server is unhealthy. Before doing any maintenance, it’s important to “drain” the existing connections first, which basically means gracefully pulling one server out of rotation and letting the other take over. If I had been using a more advanced load balancer, I could do more advanced health checks and explicitly set a server’s status to “draining” so it stops receiving new traffic.

It’s also worth noting that I didn’t try to do all of this at once. If I had jumped straight into Postgres replication on day one, I would have been in over my head. Starting out with my portfolio was simpler and taught me the basics before moving on to database replication. There are more advanced ways of doing this too. I could buy two more cloud servers and run my own load balancers on them, kept in sync behind a floating IP. However, that’s significantly more expensive than just using the Hetzner load balancer.

I could also set up something like Ansible to automatically deploy cloud servers with identical configurations and keep them in sync. However, that would mean learning how to dynamically join and remove servers from the MongoDB and Postgres clusters, and also dynamically adding and removing IP addresses from the load balancer. There’s room to grow here, but trying to learn everything at once is a good way to learn nothing.

Yes, I am probably “overbuilding” this infrastructure. It is more complex to load balance two servers and all the applications than to just host everything on one box. After all, there are more points of failure. However, this was always a learning exercise for me. It’s about understanding how these systems actually work, not building the same infrastructure Google or Facebook uses.