Upgrading a NATS Cluster in Production


This blog post explains how to upgrade a NATS cluster from one version to another. The process is fairly straightforward, assuming that:

  • You have a good Highly Available (HA) setup (topology) for the NATS cluster, and all your data is replicated. For example, if you have JetStream enabled, every stream has multiple replicas, not just one (a single-replica stream is unavailable while its server restarts)

  • You are upgrading to a new patch or minor version without breaking changes. For example:

    • New Patch version - v2.9.6 to v2.9.22

    • New Minor version - v2.9.22 to v2.10.0

    • New Minor version - v2.9.22 to v2.10.1

  • The release does not require any special upgrade steps

    • Check the official release notes ( https://github.com/nats-io/nats-server/releases ) for the version you want to upgrade to, and for every release in between if you are skipping versions. For example, to go from v2.9.22 to v2.10.1, you need to check the notes for both v2.10.0 and v2.10.1, since those are the releases between v2.9.22 and v2.10.1 (including v2.10.1 itself)

  • You don’t plan to downgrade if there is an issue with the upgrade

    Downgrade compatibility note

    2.10.x brings on-disk storage changes which bring significant performance improvements. Upgrading existing server versions will handle the new storage format transparently. However, if a downgrade from 2.10.x occurs, the old version will not understand the format on disk, with the exception of 2.9.22 and any subsequent patch releases for 2.9. So if you upgrade from 2.9.x to 2.10.0 and then need to downgrade for some reason, it must be back to 2.9.22+ to ensure the stream data can be read correctly.

  • Client connections dropped when a server restarts during the upgrade are re-established automatically, as long as the clients reconnect on connection failure. Make sure your NATS clients are configured to retry and reconnect to the NATS servers when a connection fails
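
Before starting, it can help to sanity-check the versions involved. The following is a minimal Bash sketch, assuming GNU `sort -V` is available; the function names are hypothetical helpers, not part of NATS. It checks that an upgrade stays within the same major version (a patch or minor upgrade) and, following the downgrade note above, whether a downgrade target is 2.9.22 or later. It cannot tell you whether a release has breaking changes or special upgrade steps, so reading the release notes is still required.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of NATS) for sanity-checking versions.
# Assumes GNU coreutils `sort -V` for version-aware ordering.

# version_ge A B: succeeds if version A >= version B, e.g. 2.10.1 >= 2.9.22
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# is_same_major_upgrade FROM TO: succeeds if TO is not older than FROM and
# both share the same major version (a patch or minor upgrade).
# A leading "v" (as in release tags like v2.10.1) is ignored.
is_same_major_upgrade() {
  local from="${1#v}" to="${2#v}"
  [ "${from%%.*}" = "${to%%.*}" ] && version_ge "$to" "$from"
}

# is_safe_downgrade_target VERSION: succeeds if VERSION is 2.9.22 or later,
# i.e. a version that can read the 2.10.x on-disk format per the note above.
is_safe_downgrade_target() {
  version_ge "${1#v}" "2.9.22"
}
```

For example, `is_same_major_upgrade v2.9.22 v2.10.1` succeeds, while `is_safe_downgrade_target v2.9.21` fails.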

NATS server is easy to manage in this respect: it’s just a single binary, so upgrading mostly comes down to replacing that binary with the new version and restarting the server.
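
Since the upgrade comes down to swapping that one binary, it is worth confirming which version a given binary actually is, both for the freshly extracted binary and for the one on your `PATH` after the restart. Here is a minimal Bash sketch, assuming `nats-server --version` prints a line of the form `nats-server: v2.10.1`; `check_nats_version` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the version a nats-server binary reports with
# the version you expect. Assumes `--version` output like "nats-server: v2.10.1".

# check_nats_version EXPECTED [BINARY]: succeeds if BINARY (default: the
# nats-server found on PATH) reports EXPECTED; a leading "v" is optional.
check_nats_version() {
  local expected="${1#v}" binary="${2:-nats-server}" actual
  actual="$("$binary" --version 2>&1)" || return 1
  actual="${actual##*v}"   # keep only the text after the last "v", e.g. 2.10.1
  if [ "$actual" != "$expected" ]; then
    echo "expected nats-server $expected, found $actual" >&2
    return 1
  fi
}
```

For example, run `check_nats_version v2.10.1 ./nats-server-v2.10.1-linux-amd64/nats-server` before moving the new binary into place, and `check_nats_version v2.10.1` after the restart.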

The upgrade steps are given below. Run them on each NATS server in the cluster, one server at a time.

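Note: Upgrade the NATS servers strictly one by one, and confirm each upgraded server is back up and healthy before moving to the next. With replicated streams, the rest of the cluster keeps serving while a single server restarts, which is what makes the rollout smooth.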
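
Going one server at a time matters because a problem on one server should stop the rollout before it reaches the others. The loop below is a minimal Bash sketch of that discipline. `rolling_upgrade` and the per-server step it calls are hypothetical names, and how the step reaches each server (for example over SSH) depends on your environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run an upgrade step on each server strictly in order,
# stopping at the first failure so only one server is being upgraded at a time.

# rolling_upgrade STEP SERVER...: calls "STEP SERVER" for each server in turn.
# STEP is any command or function that upgrades one server and returns
# non-zero if the upgrade (or the post-upgrade checks) failed.
rolling_upgrade() {
  local step="$1" server
  shift
  for server in "$@"; do
    echo "==> upgrading $server"
    if ! "$step" "$server"; then
      echo "upgrade of $server failed; not touching the remaining servers" >&2
      return 1
    fi
  done
  echo "all servers upgraded"
}
```

For example, with a hypothetical `upgrade_one` function that SSHes into a server, runs the steps below, and checks that the server is healthy: `rolling_upgrade upgrade_one nats-server-1 nats-server-2 nats-server-3` (the host names are placeholders too).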

# SSH into the NATS server

# go to your home directory 🏠 🏡
cd $HOME

# check the NATS server version
nats-server --version

# download the NATS server tarball for the version you want to upgrade to.
# for example, to upgrade to v2.10.1, do this

wget https://github.com/nats-io/nats-server/releases/download/v2.10.1/nats-server-v2.10.1-linux-amd64.tar.gz

# extract the tarball

tar xvzf nats-server-v2.10.1-linux-amd64.tar.gz

# find where the existing NATS server binary is installed
ls -al $(which nats-server)

# replace the existing NATS server binary with the new one, in the same location. For example, for v2.10.1 with the existing binary in /usr/bin, do this

sudo mv nats-server-v2.10.1-linux-amd64/nats-server /usr/bin

# check the status of the existing version of NATS server

systemctl status nats-server.service

# restart the NATS server - so that it can use the new version of the NATS server binary

sudo systemctl restart nats-server.service

# check the status of the new version of NATS server

systemctl status nats-server.service
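
If you upgrade more than one or two servers, the commands above can be wrapped in a function so the version is typed only once. This is a minimal Bash sketch, not an official script: it assumes a Linux amd64 host, the release asset naming used in the v2.10.1 download above, the binary living in `/usr/bin`, and a systemd unit named `nats-server.service`. `nats_release_url` and `upgrade_nats_server` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only: the manual upgrade steps above, wrapped in a function.
# Assumptions: Linux amd64, binary installed in /usr/bin, systemd unit
# nats-server.service, and release assets named like the v2.10.1 example.

# nats_release_url VERSION [OS] [ARCH]: print the release tarball URL.
nats_release_url() {
  local version="$1" os="${2:-linux}" arch="${3:-amd64}"
  printf 'https://github.com/nats-io/nats-server/releases/download/%s/nats-server-%s-%s-%s.tar.gz\n' \
    "$version" "$version" "$os" "$arch"
}

# upgrade_nats_server VERSION [INSTALL_DIR] [SERVICE]: download, install, restart.
upgrade_nats_server() {
  local version="$1" install_dir="${2:-/usr/bin}" service="${3:-nats-server.service}"
  local name="nats-server-${version}-linux-amd64" workdir
  workdir="$(mktemp -d)" || return 1

  # download and extract the release tarball into a scratch directory
  wget -q -O "${workdir}/${name}.tar.gz" "$(nats_release_url "$version")" || return 1
  tar -xzf "${workdir}/${name}.tar.gz" -C "$workdir" || return 1

  # replace the existing binary, then clean up the scratch directory
  sudo mv "${workdir}/${name}/nats-server" "${install_dir}/nats-server" || return 1
  rm -rf "$workdir"

  # restart so the new binary is used, then show the service status
  sudo systemctl restart "$service" || return 1
  systemctl status "$service" --no-pager
}
```

For example, `upgrade_nats_server v2.10.1`, run on one server at a time.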

Look at the NATS server logs. The server logs its version every time it starts, so after an upgrade to v2.10.1 the startup logs should look like this:

$ sudo tail -n 100 -f /data/nats-cluster/logs/nats-server.log
....
[12052] 2023/09/27 08:22:48.561553 [INF] Server Exiting..
[5436] 2023/09/27 08:22:48.586120 [INF] Starting nats-server
[5436] 2023/09/27 08:22:48.586241 [INF]   Version:  2.10.1
[5436] 2023/09/27 08:22:48.586246 [INF]   Git:      [d3ef745]
[5436] 2023/09/27 08:22:48.586249 [INF]   Cluster:  togai-nats-cluster
[5436] 2023/09/27 08:22:48.586272 [INF]   Name:     togai-nats-cluster-nats-server-1
[5436] 2023/09/27 08:22:48.586277 [INF]   Node:     2M6l87aT
[5436] 2023/09/27 08:22:48.586281 [INF]   ID:       <id>
[5436] 2023/09/27 08:22:48.586288 [WRN] Plaintext passwords detected, use nkeys or bcrypt
[5436] 2023/09/27 08:22:48.586295 [INF] Using configuration file: /data/nats-cluster/nats-server.conf
[5436] 2023/09/27 08:22:48.587988 [INF] Starting http monitor on 0.0.0.0:8222
[5436] 2023/09/27 08:22:48.588076 [INF] Starting JetStream
[5436] 2023/09/27 08:22:48.588238 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[5436] 2023/09/27 08:22:48.588245 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[5436] 2023/09/27 08:22:48.588248 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[5436] 2023/09/27 08:22:48.588251 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[5436] 2023/09/27 08:22:48.588254 [INF] 
[5436] 2023/09/27 08:22:48.588257 [INF]          https://docs.nats.io/jetstream
[5436] 2023/09/27 08:22:48.588260 [INF] 
[5436] 2023/09/27 08:22:48.588263 [INF] ---------------- JETSTREAM ----------------
[5436] 2023/09/27 08:22:48.588272 [INF]   Max Memory:      953.67 MB
[5436] 2023/09/27 08:22:48.588276 [INF]   Max Storage:     4.66 GB
[5436] 2023/09/27 08:22:48.588300 [INF]   Store Directory: "/data/nats-cluster/jetstream-store/jetstream"
[5436] 2023/09/27 08:22:48.588304 [INF] -------------------------------------------
[5436] 2023/09/27 08:22:48.590676 [INF]   Starting restore for stream '$G > revenue_events'
[5436] 2023/09/27 08:22:48.591326 [INF]   Restored 0 messages for stream '$G > revenue_events' in 1ms
[5436] 2023/09/27 08:22:48.591380 [INF]   Recovering 1 consumers for stream - '$G > revenue_events'
[5436] 2023/09/27 08:22:48.592916 [INF] Starting JetStream cluster
[5436] 2023/09/27 08:22:48.592926 [INF] Creating JetStream metadata controller
[5436] 2023/09/27 08:22:48.596012 [INF] JetStream cluster recovering state
[5436] 2023/09/27 08:22:48.597572 [INF] Listening for client connections on 0.0.0.0:4222
[5436] 2023/09/27 08:22:48.597726 [INF] Server is ready
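
Besides reading the logs, you can check the HTTP monitor that the log above shows listening on port 8222 before moving on to the next server. NATS serves a `/healthz` endpoint on that monitoring port; the following minimal Bash sketch polls it with `curl` until it answers successfully or a timeout passes. It assumes the monitor is reachable at `localhost:8222` and that `/healthz` returns an error HTTP status while the server is not ready; `wait_for_nats_healthz` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: poll the NATS monitoring /healthz endpoint until it reports healthy.
# Assumes the monitor listens on localhost:8222 (as in the log above), that
# curl is installed, and that /healthz answers with a non-2xx status when unhealthy.

# wait_for_nats_healthz [URL] [TIMEOUT_SECONDS]: succeeds once URL answers
# with a 2xx status, fails if that does not happen within the timeout.
wait_for_nats_healthz() {
  local url="${1:-http://localhost:8222/healthz}" timeout="${2:-60}"
  local waited=0
  # curl -f makes HTTP error statuses count as failures
  until curl -fsS --max-time 5 "$url" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "NATS server at $url not healthy after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "NATS server at $url is healthy"
}
```

For example, on the server you just restarted: `wait_for_nats_healthz && echo 'ok to upgrade the next server'`.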