
How I Load Balanced My Portfolio, Applications, and Databases
One of the things I believe is that your portfolio should be the best showcase of yourself. In the context of a website, this means it should be extremely fast and there should be zero bugs. Yes, zero. I know that sounds tedious, even impossible. However, if someone is interested in you but finds bugs or has a bad experience with your portfolio website, it can diminish your reputation. Two of the biggest factors in that experience are the speed and availability of your website. I self-host my website on a cloud server. I used to deploy on the edge with Cloudflare, then Netlify, but neither option met my ever-growing needs.
There is one problem with hosting my website on a single cloud server: maintenance. If I have to reboot the server for updates, it could impact the availability of my portfolio. It’s very rare for a server to just “stop working” when you haven’t touched it (which is why the CrowdStrike issue last year was such a huge deal). However, there is always a chance of things going wrong when applying updates or redeploying my portfolio. This is why I decided to load balance the entire stack: not just my portfolio website, but the demo applications and the databases as well. This blog post will be more of a summary of my experiences than a tutorial.
When you load balance an application, you have to ensure that all the servers hosting it have the same application and data. If there’s a database involved, this means replicating it between the servers. One server is typically the master and all the others are replicas. Keeping the code itself identical is just as important. You wouldn’t want half of your servers running version 2.0 of an application and the other half running version 3.0; the two versions could behave differently, and things could easily break.
The first thing I load balanced was my actual portfolio. This was an easy task, as there is no database to keep in sync. I created a Node.js script to accept webhook events from GitHub. Every time I push a new commit, it sends a signal to the two servers to redeploy the portfolio. This means I don’t have to log in, pull the changes, rebuild the application, and restart Node.js by hand. As for the actual load balancing, I started by creating two A records, one pointing to each server. This is the most basic load balancing setup there is, and it doesn’t give you much flexibility or control over which server people connect to. I found out that basically every request was only being sent to the second server, and when I shut that server down, clients didn’t switch over to the first one. This is likely due to DNS caching within my router: it holds on to a resolved IP address for some time so it doesn’t have to look it up on every request. That’s a feature built into most routers to make websites feel faster, but the downside is that if the IP address changes, you can run into errors until the cache expires.
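For reference, the webhook listener doesn’t need to be anything fancy. Here is a minimal sketch of the kind of Node.js receiver I’m describing; the port, secret, and redeploy script path are placeholders, and the real script on my servers differs in the details.

```js
// Minimal GitHub webhook receiver sketch. The secret, port, and redeploy
// script path are hypothetical placeholders.
const http = require('node:http');
const crypto = require('node:crypto');
const { execFile } = require('node:child_process');

const SECRET = process.env.WEBHOOK_SECRET; // shared secret configured in GitHub

http.createServer((req, res) => {
  if (req.method !== 'POST' || req.url !== '/deploy') {
    res.writeHead(404).end();
    return;
  }
  let body = '';
  req.on('data', (chunk) => (body += chunk));
  req.on('end', () => {
    // GitHub signs the payload with HMAC-SHA256 using the shared secret.
    const expected =
      'sha256=' + crypto.createHmac('sha256', SECRET).update(body).digest('hex');
    const signature = req.headers['x-hub-signature-256'] || '';
    if (
      signature.length !== expected.length ||
      !crypto.timingSafeEqual(Buffer.from(signature), Buffer.from(expected))
    ) {
      res.writeHead(401).end();
      return;
    }
    // Pull, rebuild, and restart in the background (hypothetical script).
    execFile('/opt/portfolio/redeploy.sh', (err) => {
      if (err) console.error('redeploy failed:', err);
    });
    res.writeHead(202).end('redeploying');
  });
}).listen(9000);
```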
Due to all of these issues, I switched to the load balancer offered by Hetzner. My servers are hosted there, so it made sense to use their load balancer. Cloudflare was also an option, but their load balancer is more complex and expensive. By far the biggest issue was getting TLS working. TLS is one of the technologies used to encrypt traffic between you (the client) and the server. It took some effort to get the Hetzner load balancer to issue an SSL certificate and get everything working between the load balancer and my servers. I ended up removing HTTPS from the applications themselves: TLS now terminates at the load balancer, not the backend. This setup is technically less secure because the traffic between the load balancer and my servers is unencrypted, but since it travels over a private network that I control, it’s an acceptable tradeoff for my use case. In a true enterprise environment, traffic between the load balancer and the backend would typically be encrypted as well.
The Survey Builder application was incredibly easy to load balance. It uses MongoDB for the database, and it’s built with Astro and Svelte, just like my portfolio. Getting the same GitHub webhook system I built for my portfolio working here was just a matter of repeating the same steps. However, having the same code running on all servers means nothing if the data between them isn’t in sync. MongoDB has first-class support for replication. It’s a built-in feature, and it took minutes to get set up. I then modified my connection string to include both IP addresses and everything worked flawlessly; I didn’t have to change any of the Survey Builder’s application code. MongoDB uses an election system: the servers communicate with each other and automatically figure out which one is the master (the primary, in MongoDB terms). I was shocked at how easy it was to get replication working.
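For anyone curious what that looks like, here is a rough sketch of initiating a two-member replica set in mongosh. The replica set name, IP addresses, and database name are placeholders, and each mongod would already need to be started with the matching replica set name; my actual configuration differs in the details.

```js
// Run once in mongosh on the server chosen to start the replica set.
// Assumes each mongod was started with --replSet rs0 (placeholder name).
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "10.0.0.2:27017" },
    { _id: 1, host: "10.0.0.3:27017" }
  ]
});

// The application then lists both hosts in its connection string, and the
// driver automatically follows whichever member is currently the primary:
//   mongodb://10.0.0.2:27017,10.0.0.3:27017/surveys?replicaSet=rs0
```

One caveat worth noting with exactly two members: if one goes down, the survivor can’t form a majority on its own, which is why MongoDB generally recommends an odd number of voting members.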
Next is the Mailroom Management System. That application was originally built with MySQL/MariaDB as the database engine, and I have done replication with MariaDB before. It is an awful experience and not something I ever wish to do again. Since the Mailroom Management System uses LINQ and EF Core, switching out the underlying database engine is trivially easy: there are no queries to rewrite, because LINQ abstracts away the actual SQL. I decided to use Postgres as a replacement. It has numerous options for replication and is a much more robust database engine overall. All I had to do was delete the EF Core migrations, which were generated for MySQL, have EF Core recreate them for Postgres, and change the connection string, and everything just worked. However, that was only one server; I didn’t have replication set up yet.
I decided to use a tool called Patroni to handle the Postgres replication. It uses another tool called etcd to communicate between the servers and elect a leader, then replicates from the leader to the replicas and keeps the data consistent. It took quite a bit of time to get set up because it’s inherently more complex. There were many issues with authentication, but those came down to me not fully understanding how Patroni authenticates to the database. Postgres is also strict about which hosts are allowed to connect (via pg_hba.conf), so you have to explicitly allow the IPs of the other servers. It took quite a bit of debugging and firewalling to get everything communicating between the servers. I then had to write systemd unit files so etcd and Patroni run in the background. It was definitely much more work than MongoDB. However, I am an experienced Linux system admin and have done this kind of thing before. There was no easy-to-follow tutorial for any of this, and there was a lot of trial and error.
But we’re still not done with the Mailroom Management System yet. The application has no idea which server is the master. I had to add some logic that queries the etcd API on both servers to determine which one currently holds that role; that server’s IP address gets put in the connection string. However, this only ran once at build time, so I also had to add an interceptor that detects database errors and re-resolves the master if it changed while the application was running. This was a personal preference. I could have set up something like HAProxy to dynamically route traffic to the master, but since I wrote the Mailroom application, it was easier to just add this logic myself.
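The Mailroom application itself is .NET, so the real logic lives in C# around EF Core; the Node.js sketch below only shows the shape of the idea. It also uses Patroni’s REST API, which answers 200 on /primary only on the current leader, as a stand-in for querying etcd directly, and the hosts, port, and connection string details are all placeholders.

```js
// Sketch of resolving the current Postgres primary (Node.js 18+ for fetch).
// Hosts and the Patroni REST port are placeholders.
const CANDIDATES = ['10.0.0.2', '10.0.0.3'];

async function resolvePrimary() {
  for (const host of CANDIDATES) {
    try {
      // Patroni's REST API returns HTTP 200 from /primary only on the leader.
      const res = await fetch(`http://${host}:8008/primary`, {
        signal: AbortSignal.timeout(2000),
      });
      if (res.ok) return host;
    } catch {
      // Node unreachable or timed out; try the next candidate.
    }
  }
  throw new Error('no primary found');
}

// Called at startup, and again whenever a query fails with a connection or
// read-only error, so the app can rebuild its connection string.
async function primaryConnectionString() {
  const primary = await resolvePrimary();
  return `postgresql://mailroom:${process.env.DB_PASSWORD}@${primary}:5432/mailroom`;
}
```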
Everything is working now, but a retrospective of the remaining failure points is still important. If both servers crash simultaneously, the load balancer can’t save me. Both servers are also located in the same datacenter, so if something were to happen to that location, there are no backup servers in other regions. The load balancer itself takes around 30 seconds to recognize that one of the servers is unhealthy, which is why it’s important to “drain” the existing connections before doing any maintenance. This basically means letting in-flight connections finish and sending new traffic elsewhere before gracefully taking one of the servers down and making the other the master. If I were using a more advanced load balancer, I could run more advanced health checks, set a server’s status to “draining,” and tell the load balancer to stop directing traffic to it.
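As a concrete illustration of what that could look like at the application level, here is a hedged sketch of a tiny Node.js health endpoint a load balancer could poll; the port and drain-file path are hypothetical, and this is not something my current Hetzner setup actually does.

```js
// Minimal health endpoint a load balancer could poll before routing traffic.
// Touching the (hypothetical) drain file makes it report 503, so the balancer
// marks the server unhealthy and stops sending new requests before maintenance.
const http = require('node:http');
const fs = require('node:fs');

const DRAIN_FILE = '/var/run/portfolio.drain';

http.createServer((req, res) => {
  if (req.url === '/healthz') {
    const draining = fs.existsSync(DRAIN_FILE);
    res.writeHead(draining ? 503 : 200, { 'Content-Type': 'text/plain' });
    res.end(draining ? 'draining' : 'ok');
    return;
  }
  res.writeHead(404);
  res.end();
}).listen(9001);
```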
It’s important not to try to overbuild everything at once, though. It would have been a tremendous learning curve to try to replicate Postgres first. Starting out with my portfolio was simpler and taught me important concepts before jumping to more advanced load balancing. There are also more advanced ways of doing this whole setup. I could buy two more cloud servers, run my own load balancers on them, and keep them in sync behind a floating IP. However, that is significantly more expensive than just using the Hetzner load balancer.
I could also set up something like Ansible to automatically deploy cloud servers with identical configurations and keep them in sync. However, that would mean learning how to dynamically join servers to, and remove them from, the MongoDB and Postgres clusters, and I’d also have to dynamically add and remove their IP addresses from the load balancer. There’s a lot of room for this system to grow, but it’s better to take incremental steps toward making it more robust than to attempt everything at once.
Fundamentally, all of this boils down to a learning experience for me. Some could argue that I am “overbuilding” this infrastructure and creating more room for errors than if I just had one server. After all, load balancing two servers and all of the applications is more complex than hosting everything on a single server. However, this is more about learning than it is about building the same infrastructure Google or Facebook uses.