Monitoring NATS using New Relic: Instrumentation

At Togai, we use NATS as our messaging system. Our microservices use it to communicate with each other.

We wanted to set up monitoring for all the systems that we run and manage. We self-host NATS, so we wanted to monitor it too, alongside our other systems.

We use New Relic as our monitoring platform: we push all our monitoring data (metrics, events, etc.) to it and then build dashboards and alerts on top of that data.

When trying to integrate NATS with New Relic, we noticed that there was no existing on-host integration for NATS. In New Relic, an on-host integration is a piece of software that obtains metrics and/or event data from a given system. There are on-host integrations for PostgreSQL, Redis, and more, which we use here at Togai.

New Relic does have a page for NATS (https://newrelic.com/instant-observability/nats), but it talks about monitoring a Golang service. NATS is built using Golang, so the Golang service monitoring can surely help NATS developers, who can integrate New Relic into the Golang code that they write for NATS. But to monitor NATS from the outside, from a separate process, we found two ways that would work for us.

Prometheus NATS exporter

We found a Prometheus NATS exporter that hits the NATS monitoring endpoint and exports the data in Prometheus format, and we tried to use it. One way to use it is with Prometheus itself: host a Prometheus server, scrape the Prometheus NATS exporter endpoint, and use New Relic as a remote write endpoint. We didn't want to host and manage a Prometheus server just to scrape one exporter endpoint.

Another way is to use Vector, which we were already using for our logging needs, to centralize logs from the different servers that we run and manage. Vector supports many sources and sinks, so we tried the Prometheus scrape source with the New Relic sink. This worked, but New Relic only received the metric name and metric value, with no labels. I couldn't find a way to add the labels to this data by configuring the Prometheus scrape source and the New Relic sink. My conclusion was that it's not possible, but I would be curious to see if someone else has found a way.
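For readers less familiar with the Prometheus text format, here is a rough Python sketch of where labels sit in a sample line, which is the part that did not reach New Relic in our setup. The metric name and label names in the usage note are hypothetical, not copied from the exporter's real output, and the parser is deliberately simplified (it does not unescape label values).

```python
import re

# One sample line in the Prometheus text format looks like:
#   metric_name{label="value",other="value"} 12.5
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'   # metric name
    r'(?:\{(?P<labels>.*)\})?'               # optional {label="value",...}
    r'\s+(?P<value>\S+)'                     # sample value
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one sample line into (name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))
```

For example, `parse_sample('nats_connections{server_id="A"} 3')` splits a made-up line into the name `nats_connections`, the label `server_id`, and the value. In our Vector setup only the name and the value showed up in New Relic.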

I spent a few days on this and then reached out to the awesome support team at New Relic. They were prompt in getting back. They gave me two options: either run Prometheus and use the remote write configuration, or build a custom integration.

I chose to build a custom integration. The support team pointed me to Flex, and I knew there were also SDKs in different languages that one could use to push data to New Relic.

Using New Relic Flex to flex 💪 ;)

We installed the nri-flex binary on the instances that run our NATS servers. For now, we use Chef cookbooks to manage the installation and configuration of the New Relic Infrastructure agent, the New Relic Flex integration, etc.

nri-flex is pretty cool because you can just give it JSON, and it pushes that to New Relic, which then shows the data.

NATS exposes different kinds of data on its monitoring port, at different paths. We ended up scraping every available path to get all the data we could. The New Relic Flex integration with NATS looks like this:

integrations:
  - name: nri-flex
    interval: 30s
    timeout: 5s
    config:
      name: NATS
      apis:
        - event_type: NatsGeneralSample
          url: http://localhost:8222/varz
        - event_type: NatsJetStreamSample
          url: http://localhost:8222/jsz
        - event_type: NatsConnectionsSample
          url: http://localhost:8222/connz
        - event_type: NatsAccountsSample
          url: http://localhost:8222/accountz
        - event_type: NatsAccountStatsSample
          url: http://localhost:8222/accstatz
        - event_type: NatsSubscriptionsSample
          url: http://localhost:8222/subsz
        - event_type: NatsRoutesSample
          url: http://localhost:8222/routez
        - event_type: NatsLeafNodesSample
          url: http://localhost:8222/leafz
        - event_type: NatsGatewaysSample
          url: http://localhost:8222/gatewayz
        - event_type: NatsHealthProbeSample
          url: http://localhost:8222/healthz

8222 is the monitoring port that we have configured in NATS. With the above configuration, the nri-flex binary runs, hits the various URLs, gets the JSON data from NATS, and pushes it to New Relic.
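Before wiring the URLs into Flex, it can help to confirm that each path actually answers with JSON on your server. Below is a minimal Python sketch of such a check, not part of our setup: the paths and port come from the configuration above, while the `check_endpoints` and `http_get` helpers are names made up for this illustration.

```python
import json
from urllib.request import urlopen

# Monitoring paths taken from the Flex configuration above.
NATS_PATHS = (
    "varz", "jsz", "connz", "accountz", "accstatz",
    "subsz", "routez", "leafz", "gatewayz", "healthz",
)


def http_get(url, timeout=5.0):
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def check_endpoints(base_url="http://localhost:8222", paths=NATS_PATHS, fetch=http_get):
    """Return {path: "ok" or "failed: <reason>"} for each monitoring path.

    `fetch` is injectable so the check can be exercised without a live server.
    """
    results = {}
    for path in paths:
        try:
            json.loads(fetch(f"{base_url}/{path}"))
            results[path] = "ok"
        except Exception as exc:  # connection errors, HTTP errors, invalid JSON
            results[path] = f"failed: {exc}"
    return results
```

Calling `check_endpoints()` on a NATS host returns one entry per path. Any path marked as failed is worth checking by hand before relying on its event type in New Relic.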

Conclusion

So that's how we instrumented NATS and sent the data to New Relic. Please post your comments to share your feedback and questions. Let me know if the post helped you :)
