This article is written by Shaun Smith, an engineering fellow at ExpressVPN and the creator of TrustedServer.
In this article, we’re taking a deep dive into TrustedServer, our industry-leading VPN server technology. We will look at this system by comparing it to a gourmet meal that’s been designed with creativity, taste-tested by experts, and executed to perfection by world-class chefs.
First, we will look at a diagram showing the broad design of TrustedServer. Don’t worry about understanding this diagram right now, but do refer back to it to help visualize the system and how it all bolts together.
We’ll dive into the various components in this diagram and walk through each one at a high level to give you an idea of what it means for you, as a user, and how it all comes together.
Jump to…
Writing the recipe: The operating system
The taste test: Build verification and signing
Soft opening: Deployment process
Mass market: Orchestration
Fat-free: No logs
Summary: You really can have your cake and eat it
Writing the recipe: The operating system
The operating system powering the core of our TrustedServer platform is a Linux distribution whose make-up is defined as source code. That isn’t to say we are building the Linux kernel from source, since that would be extremely involved and introduce unnecessary risk (we leave that to the experts building operating systems for a living). This source code is responsible for defining how the operating system is bolted together: which packages are installed, how services are configured, and what exactly should be running.
This is essentially a recipe describing all the ingredients needed, and when and how they are put together to produce the tasty morsel that is ExpressVPN’s TrustedServer.
The result of cooking up this recipe is a single special file known as an ISO image. If you’ve ever installed Ubuntu Linux or even digital copies of Windows, you may be familiar with the idea of downloading an ISO file, booting it and using it before having to write any data to your hard drive (this is also known as a Live build). That special ISO file contains the entire operating system, allowing you to load it up simply by restarting your computer.
Some of the benefits to having the entire operating system defined as source code in this way include:
- Versioning. Every code change results in a new version of TrustedServer. That means we know exactly what is running across our fleet based on this unique version identifier alone.
- Two-person code reviews. All code changes to TrustedServer must be reviewed by a second individual. In fact, this is a requirement for all of our code at ExpressVPN, not only TrustedServer. For the core code bases, a third review is often required by a Code Owner. They are experts who are intimately familiar with TrustedServer and responsible for shepherding its development in line with our vision.
- Tested. We can run automated test suites on every single code change to ensure the reliability, quality, and integrity of the changes that are being made.
- Repeatability. This means that we can take the same point in the source code’s history (i.e., a commit, in Git parlance) and rebuild the TrustedServer operating system as it would have been at that point in time. That is great for rollbacks, debugging, and auditing.
The taste test: Build verification and signing
We have a clear recipe that outlines exactly what TrustedServer will look like. But how do we ensure that the recipe is followed to the letter every time we whip up a build? How can we have enough confidence in the outcome that we are proud to put our brand on it?
We must ensure that nobody manages to slip in their own ingredients along the way. This is a hugely complex problem because it involves many moving parts: the source code, the build servers responsible for taking the source code and turning it into the final product, and the engineers themselves who build and release it.
We’ve already covered some of our source code controls above, including two-person code reviews and code owners, but we also defend against impersonation of an ExpressVPN engineer by ensuring that every code change is cryptographically signed using a YubiKey, a physical signing key that uniquely identifies each engineer. This key requires physical touch confirmation in order to limit remote threats.
You’ll see this concept of multiple layers of security throughout. Defense in depth.
Securing build machines is much more difficult. These are the servers that take the source code, interpret it, and compile it into a finished product ready for release. Think of them as the chef putting the dish together. This is the point in the process where a spurious ingredient may slip in, souring our recipe.
The way we ensure that every release of TrustedServer has been built to our exact recipe is through a concept called reproducibility. What this means is that two or more chefs (build servers) take the exact same source (the recipe) and execute it independently, each producing its own OS image. We then compare those OS images using what is known as a cryptographic hash. If the OS images are identical, so too are their hashes. If even a single character changes in a single file within an OS image, the hashes will no longer match.
We confirm that the hashes are identical before proceeding. This process is broadcast openly within the organization, and every engineer has the ability to independently verify the reproducibility of a release at any time. This level of transparency helps ensure consistency of every release.
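The comparison described above can be sketched in a few lines of Python. This is only an illustration, not our build tooling: the choice of SHA-256 and the function names are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_are_reproducible(image_paths: list[Path]) -> bool:
    """True only if every independently built image has the same hash."""
    if len(image_paths) < 2:
        raise ValueError("need at least two independent builds to compare")
    # Identical files give identical digests, so a reproducible build
    # collapses to a single distinct digest.
    digests = {sha256_of_file(path) for path in image_paths}
    return len(digests) == 1
```

Passing the paths of two or more independently built images returns True only when they are byte-for-byte identical. A single changed byte in any one of them produces a different digest and the check fails.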
Once TrustedServer is confirmed reproducible, it’s time to add our brand to it. Think of this like a famous cake maker adding their name to the three-tier masterpiece they just spent a week creating. Only ours is digital: a digital signature, again using hardware keys requiring physical presence.
This special digital signature uniquely identifies the signatory, but more than that, it is only valid alongside the exact file that has been signed. Similar to reproducibility, should the OS image be changed by a single character, this signature will no longer be valid. That is a very important concept; it protects against tampering of the OS image even after it is distributed around the world.
This is an involved process. But thankfully, we only need to do this once per TrustedServer release at the source. That’s thanks to our Build Once, Ship Everywhere policy.
Soft opening: Deployment process
We now have a signed OS image. The signature signifies that it was built from our verified source code, and that multiple build environments interpreted that source code in the exact same way, producing an identical result. More than that, a person has been involved in broadcasting this process within the organization and a TrustedServer engineer has put their unique signature on it, locking it in its current state for release.
Now is the time to send it out into the world. The first step here is deployment. That means taking the OS image and uploading it to our CDN for distribution to the fleet. But there is a very important step to take before we consider doing that: validating that all-important signature.
The OS image must be paired with a signature that is both approved, meaning it belongs to a pre-compiled list of authorized engineers, and valid, meaning the contents of the OS image are an exact match to what was originally signed. In fact, we use signature verification for security and data integrity in much of what we do.
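To make the "approved and valid" distinction concrete, here is a minimal sketch in Python. It is not our deployment code. To stay within the standard library it uses HMAC-SHA256 as a stand-in for a real public-key signature scheme, and the signer names, keys, and function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical allow-list of authorized signers and their keys.
# HMAC-SHA256 is a stand-in here; a real deployment would verify
# public-key signatures produced by hardware-backed keys.
AUTHORIZED_SIGNERS = {
    "engineer-a": b"demo-key-a",
    "engineer-b": b"demo-key-b",
}


def sign_image(image: bytes, key: bytes) -> bytes:
    """Produce a signature bound to the exact bytes of the image."""
    return hmac.new(key, image, hashlib.sha256).digest()


def may_deploy(image: bytes, signer_id: str, signature: bytes) -> bool:
    """Allow deployment only if the signature is approved and valid."""
    key = AUTHORIZED_SIGNERS.get(signer_id)
    if key is None:
        return False  # not approved: signer is not on the allow-list
    expected = sign_image(image, key)
    # valid: the image must match what was originally signed
    return hmac.compare_digest(expected, signature)
```

A signature from an unknown signer is rejected, and so is a signature from an approved signer when the image differs by even one byte from what was signed.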
It’s time for a taste test. First this new release will go to our special internal-use VPN servers, which we like to call “dogfood.” These special servers are used by our own employees on a daily basis as early adopters. That way we pick up any minor issues that might have slipped through our many layers of testing. Following that, we push it out further to selected pre-production servers, which hold a variety of different setups to ensure we haven’t introduced any unforeseen regressions across our many features.
And from there to the global fleet.
Mass market: Orchestration
ExpressVPN has numerous servers located in 105 countries. In order to upgrade a server, we ship an entire operating system and reboot into it, effectively reinstalling it afresh—which is obviously quite an involved process. A pertinent question at this point might be: How often do we upgrade?
The answer may surprise you: Every week.
That’s right, we effectively reinstall our servers globally every single week, without any customer downtime. We do that using our custom orchestration systems that are responsible for upgrading servers in waves, based on the service offerings they provide. This ensures that every single VPN protocol, within every location, always has enough capacity to handle our customers’ traffic while groups of servers reboot into their new OS.
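The idea of upgrading in waves can be sketched as follows. This is a simplified illustration and not our orchestration system: the Server fields, the 25% limit, and the function name are assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    location: str
    protocol: str


def plan_waves(servers: list[Server],
               max_offline_fraction: float = 0.25) -> list[list[Server]]:
    """Split servers into waves so that no (location, protocol) group
    ever has more than max_offline_fraction of its servers rebooting."""
    groups: dict[tuple[str, str], list[Server]] = defaultdict(list)
    for server in servers:
        groups[(server.location, server.protocol)].append(server)

    waves: list[list[Server]] = []
    for key, members in groups.items():
        batch = int(len(members) * max_offline_fraction)
        if batch < 1:
            # Rebooting any server here would exceed the offline limit.
            raise ValueError(f"group {key} is too small to upgrade safely")
        # Wave i takes the i-th batch from every group.
        for index, start in enumerate(range(0, len(members), batch)):
            if index == len(waves):
                waves.append([])
            waves[index].extend(members[start:start + batch])
    return waves
```

With four servers per protocol in each location and a 25% limit, this yields four waves, each rebooting one server from every group while the other three keep serving traffic.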
We do this for a few key reasons.
First, the more frequently we push out these upgrades, the closer we track the upstream software from the Linux distribution. That means we pull in updates not only to the software installed on the server, but also to the core OS such as the Linux kernel and critical libraries like OpenSSL. Not only that, but because our upgrades necessitate a server reboot, those new packages are also immediately in use. The same is not true for traditional auto-upgrades in Linux, where running software keeps using the old versions until it is restarted; consider how you need to restart your laptop every time Windows has updates available.
Second, upgrades are atomic. That is to say the entire operating system ships as a single unit with all of the new packages included. There is no need to install updates in a particular order or modify any configurations along the way. There are no blips when applications restart. There is no inconsistency between servers. The entire operating system boots up in its upgraded state: it either works everywhere or nowhere.
Finally, because we are running in RAM only, every weekly upgrade cycle also results in the server forgetting everything it may have known from the prior week. As a result there are never any secrets to obtain from a hard disk, such as cryptographic keys. In the unlikely event that a server is compromised by an attacker, this upgrade cycle also makes it difficult to persist across a reboot—partly because they are unable to modify the OS image, but also because we are pulling in all security patches from upstream at least weekly.
Fat-free: No logs
We’ve touched on the fact that TrustedServer runs entirely in RAM. We know it forgets everything when it reboots, and that we go through a reboot cycle every single week. But that wasn’t enough for us. Our thinking goes: What if an attacker were to compromise a server toward the end of that week? There would be a treasure trove of information in the logs that are about to be forgotten. This will simply not do; personally identifiable information (PII) is a toxic asset.
Since “no logs” is so core to what we do, when building TrustedServer we took a layered approach to enforcing it.
Starting at the very top: VPN processes themselves (such as OpenVPN) have a natural desire to produce logs, so it’s vitally important that we tell them not to via their configuration. The exception is Lightway, which was built from the ground up not to log. Some older protocols don’t offer any such configuration, so in those cases we had to modify the source code of the application to forcefully remove its log lines.
Wherever an application can be configured not to log, we have done so. But we weren’t comfortable with how easily that configuration could change by mistake, so we also implemented a suite of code tests that assert that no logging-related configuration has crept in. These tests run on every single configuration change and give us high confidence in our no-logging configuration.
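A test of this kind might look something like the sketch below. It is an illustration rather than our actual test suite: the list of forbidden OpenVPN directives and the rule that verbosity must be explicitly set to 0 are assumptions made for the example.

```python
# Illustrative set of OpenVPN directives that write logs or status files.
FORBIDDEN_DIRECTIVES = {"log", "log-append", "status", "syslog"}


def logging_violations(config_text: str) -> list[str]:
    """Return a list of problems found in an OpenVPN-style config."""
    violations = []
    verbosity = None
    for raw_line in config_text.splitlines():
        # Drop comments ('#' anywhere, ';' at line start) and blank lines.
        line = raw_line.split("#", 1)[0].strip()
        if not line or line.startswith(";"):
            continue
        directive, *args = line.split()
        if directive in FORBIDDEN_DIRECTIVES:
            violations.append(line)
        elif directive == "verb":
            verbosity = args[0] if args else None
    # Assumed rule for this sketch: verbosity must be explicitly zero.
    if verbosity != "0":
        violations.append("verb must be set explicitly to 0")
    return violations
```

A test runner can then assert that logging_violations returns an empty list for every VPN configuration file in the repository, failing the change as soon as a logging directive appears.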
Should any logging events creep through those stringent measures, they would then be received by the VPN Process Manager, the application responsible for starting and maintaining the various VPN services running on a VPN server. This application is configured to automatically discard any output received from the VPN application, sending it to a special location in Linux known as /dev/null (a black hole). This layer therefore protects against failure of any layer above. But the VPN process manager itself may also create logs, despite being configured not to do so.
For that reason, the system-wide process manager that starts the VPN process manager also directs any and all output from the VPN process manager to the same black hole, /dev/null.
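The effect of pointing a child process at /dev/null can be shown with a short Python sketch. Our process managers are not written this way; the function name is hypothetical and Python merely stands in for whatever supervises the process.

```python
import subprocess


def run_discarding_output(command: list[str]) -> int:
    """Run a command with all of its output sent to the null device."""
    completed = subprocess.run(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,  # anything printed is thrown away
        stderr=subprocess.DEVNULL,  # error output is thrown away too
        check=False,
    )
    # Only the exit status survives; nothing the process wrote is kept.
    return completed.returncode
```

Whatever the child writes to standard output or standard error is discarded by the operating system before any other program can read or store it. Applying the same redirection at two levels means a mistake in one layer is still caught by the other.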
After all of these layers, we are extremely confident that any stray output from any VPN instance isn’t getting anywhere. But as we are discussing defense in depth, we still have some way to go. For TrustedServer, the location where logs would normally be written is actually not a disk drive at all, but a virtual “in-memory” disk which can only store information temporarily. It is ephemeral. Once the next upgrade+reboot cycle comes around the following week, all data that might be written to this “disk” is forgotten. It is effectively wiped.
The thoroughness and completeness of these protections, with the number of layers upholding them, provide us with very high confidence in our no-logs policy.
Summary: You really can have your cake and eat it
TrustedServer has it all: security, flexibility, reproducibility and, importantly, agility. It provides all the security you expect from a modern operating system and world-class VPN service. It enables weekly rolling releases, consistency across our entire fleet and confidence in our privacy policy. It allows us to respond extremely quickly with security patches. All this, while providing a full-featured operating system on which we can build our innovative technologies such as Lightway.
Comments
Strange that there’s no mention of the server hardware itself.
Are you running this on physical machines? Or virtual machines? Any BMC controllers involved? Firmware updates? Servers can be compromised and/or monitored even if the software side works flawlessly.
Good question.
It’s funny to hear you guys talk about trust. I was mysteriously updated to 10.28 and Express doesn’t know anything about it.
Well written article put in a very understandable context.
Once an intruder gets in to your network there should be a way to get them out. Otherwise it’s working great 👍.
This is why I pay more for a solid VPN.
El Cheapo VPN providers can’t give you the Express VPN quality.