Zabbix alternative

Posted by zatset@reddit | sysadmin | 64 comments

Hello, colleagues.

What kind of open-source Zabbix alternatives have you tried and would recommend?
Yes, Zabbix is a decent piece of software and I have actually written templates for it, as well as modifications and so on. But lately, the complexity has started to annoy me. Simple things require 3-4 levels of menus and are all over the place. It is cumbersome.

I use my main Zabbix install mainly to poll/monitor SNMP-capable devices and send automated alerts when defined triggers fire, which in most cases means either numeric values or ping drops. Other features would be nice, but honestly Zabbix is rather overcomplicated and cumbersome... And its documentation, until I learned it, proved to be rather unreliable: major feature and template syntax changes and so on, which made and still makes finding information a rather... interesting... experience.

[-]

omn1p073n7@reddit

I was evaluating Zabbix for my org due to the VC Solarwinds price hikes. Should I pass? We were hoping for something that we could keep onprem without some company pressuring a cloud migration

[-]

blackvelvet58@reddit

Stick with it. We're in the midst of moving from Solarwinds for the same reasons. It has been refreshing and stupid simple if you're just doing ping/snmp. Installation on Debian with Postgresql was very quick. Had to tune the cache memory a bit, but out of the box we found it was just as capable for what we were using Orion for. Solarwinds was headed to 4x hikes, no thanks!

[-]

Fluffy_Regular4054@reddit

Prometheus SNMP and Grafana, I'm never going back to *cinga after experiencing real dashboards.

[-]

unixuser011@reddit

Personally, CheckMK has been great: just install the agent and select what services you want to monitor. Writing plugins for it is pretty easy too, and I think Nagios plugins will work with it as well.

[-]

Jotadog@reddit

But… he said no agents

[-]

zatset@reddit (OP)

Well... First of all, agents are not required for SNMP monitoring; it is about polling OIDs.

Second... You are right. I have started to despise agents. Every piece of software nowadays comes with some kind of agent... And most of the time all those agents just waste hardware resources and slow things down, as well as being a security risk.

The software might have agents and capability for such, but I just won't use them.

[-]

unixuser011@reddit

How are agents a security risk? Most of these agents let you lock them down to only communicate with the monitoring server and only over specific ports

[-]

zatset@reddit (OP)

Agents are a security risk because the server they connect to becomes a single point of failure. A weakness in the agent combined with a breach of that server could lead to privilege escalation, as agents usually run with Administrator rights.

One example is the open source Wazuh SIEM/XDR, which isn't a monitoring system but uses agents. If I compromise the server, I can send arbitrary commands to all network-connected workstations and servers.

[-]

Spirited-Background4@reddit

Isn’t it much much worse if your SIEM is compromised? That’s all your security logs? SIEM must be well protected

[-]

Zahninator@reddit

No agent is also a security risk because incoming traffic to each device has to be allowed. I would much rather have TLS encrypted outbound communication from an agent to a centralized locked down server than the alternative personally.

[-]

zatset@reddit (OP)

I cannot agree. First of all, establishing any communication requires it to be able to pass the firewall. Second, there are native ways of encrypting management communication in Windows, for example. And you can get values directly using WinRM.

In the case of using "agents", on top of the inherent operating system vulnerabilities you now have to take into account the potential agent vulnerabilities and the "server application" they connect to.

[-]

Zahninator@reddit

"In the case of using "agents", on top of the inherent operating system vulnerabilities now you have to take into account and worry about the potential agent vulnerabilities and the "server application" they connect to."

Isn't this also true for whatever is doing the "reading" of values from the device/server? SNMP and/or WinRM.

[-]

zatset@reddit (OP)

The good thing about SNMP is that you can use read-only user/creds. Even if compromised, the most one can do with those credentials is getting the current OID values. I cannot imagine that you can do very much with having information about the current RAM usage of a network switch.

[-]

420GB@reddit

Arguably, having an agent and disabling WinRM (it is blocked by the firewall out of the box; you need to specifically allow remote access) is safer, because WinRM does so much that its complexity and feature set make it very dangerous.

[-]

zatset@reddit (OP)

Actually, Microsoft is talking about how many bugs there are in the OpenSSH implementation. But there is truth in what you say. Poorly secured WinRM is a very bad thing. That's why it is not enabled by default. What I meant is that "agents" are just applications sending shell commands while running as admin/root. Any remote access carries risk; this is true without any doubt.

[-]

id0lmindapproved@reddit

I think you just need to be honest about what your threat model is and what kind of posture you have. Nothing is 100% secure. If I compromise a Domain Controller I own the whole domain. Or if I compromise a Global Admin or Owner of an Azure subscription, or if I compromise private keys. The list goes on and on. You are going to have creds or agents everywhere; it's how you lock them down that matters.

[-]

zatset@reddit (OP)

My model is simple: reduce the attack surface by minimizing the use of third-party software that runs as a service or requires admin/root privileges.

[-]

420GB@reddit

lol, perfect moment to post this: https://youtu.be/3Lyex2tSUyA

[-]

NormanNormieNup@reddit

CheckMK can also be set to monitor snmp only

[-]

unixuser011@reddit

Well, most of these solutions are going to use agents. So shrug

[-]

trail-g62Bim@reddit

I tried checkmk out and had the hardest time wrapping my head around it. Maybe I'm just stupid because I do see it recommended quite a bit.

[-]

unixuser011@reddit

I did have a bit of a struggle using it the first time around but their docs explain it pretty well

[-]

ihaxr@reddit

I haven't used it in a while, but I went with PRTG over Zabbix due to how stupidly simple PRTG was to get up and running.

[-]

Substantial-Reach986@reddit

If you want something simpler to use than Zabbix, you're looking for a vendor-specific monitoring system. Zabbix has a steep learning curve, yes, but that's mostly because you have to deal with SNMP or some other generic way to poll basically anything.

Any actually good one-click systems will be vendor-specific.

[-]

DeadOnToilet@reddit

If you think Zabbix is overly complex as a monitoring tool, you’re in for a wild ride looking for alternatives.

[-]

zatset@reddit (OP)

I would say cumbersome. It more or less does the job.

[-]

Akmetra@reddit

Zabbix (as I see it, can't say that I've tested each release) gets more reasonable with updates. The problems start when you need to update templates, sometimes things.. break.

LibreNMS is an option, but it's quite different (SNMP - great, agents/scripting .. not so much) - I've got a test deployment running at the moment, and don't see it being a complete replacement for Zabbix in our case.

But we're using SNMP / Agent / API requests together, bringing in data from different sources.

[-]

NoDistrict1529@reddit

LibreNMS.

[-]

Specialist_Cow6468@reddit

For network gear Zabbix tied into netbox via the nbxsync plugin is the gold standard. There’s a bit of a learning curve but it means that as long as Netbox, my source of truth for other automation, is accurate then everything will have the proper monitoring templates pushed out automatically

[-]

rayferrell@reddit

The trade-off nobody mentions: LibreNMS and CheckMK are simpler because they do less. Zabbix's menu depth exists because it can model almost anything, and that flexibility is what makes it annoying for simple SNMP polling but invaluable when you need to track something outside the standard template ecosystem. I switched a fleet to LibreNMS for exactly your use case, and it was great for two years until I needed to track a SaaS API's rate limits, which LibreNMS doesn't do well without hacking at it. Zabbix handled it in an hour. If your environment is truly just SNMP devices with standard OIDs, LibreNMS will serve you well. Just know what you're trading away.

[-]

LINAWR@reddit

"don't really want any type of agents or additional software on any server machine, unless it is actually absolutely required and unavoidable, as third party "agents" and so on are always a security risk..."

Well you're not going to have that many choices unless you do SNMP or REST API calls only. CheckMK is incredibly robust and easy to install / upgrade.

[-]

zatset@reddit (OP)

"unless you do SNMP"

And that's exactly what I want. Ping and get a value, be able to see the values (any kind of dashboard), compare with a threshold; if there is no ping or the value is abnormal, alert.

As for the servers... I would rather pull everything from them using WinRM than install third-party agents, which is what I actually do.
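
The ping-then-threshold workflow described in this comment can be sketched in a few lines. This is only a minimal illustration, not how any of the tools in this thread implement it: the `Reading` type and the `evaluate` function are hypothetical names, and the actual ping and SNMP polling are assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    host: str
    reachable: bool          # result of the ping check
    value: Optional[float]   # polled value, None if nothing was returned


def evaluate(reading: Reading, low: float, high: float) -> Optional[str]:
    """Return an alert message, or None when the reading looks healthy."""
    if not reading.reachable:
        return f"{reading.host}: no ping response"
    if reading.value is None:
        return f"{reading.host}: no value returned"
    if not (low <= reading.value <= high):
        return f"{reading.host}: value {reading.value} outside [{low}, {high}]"
    return None
```

Whatever tool ends up doing the polling, this is roughly the decision it has to make per device and per OID; the rest is dashboards and notification plumbing.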

[-]

HaplessMegalosaur@reddit

So, you implicitly trust all SNMP agents then? It's all software one way or another and open to vulnerabilities just like the rest.

[-]

zatset@reddit (OP)

It adds attack complexity. You poll the device. It responds. No agent on the device.
You need to compromise the monitoring application. Then you need to gain write access to the device while having only read-only creds. This means attacking the device and exploiting firmware vulnerabilities, which are different for any given device and depend on the firmware version, programming and so on.

In the case of having an agent running as root/admin, it means that by compromising a single agent version that is used everywhere and successfully escalating privileges, you gain access to any and every device on the network running that agent.

[-]

pointandclickit@reddit

Why would you need to compromise the monitoring host? In every snmp implementation I've seen, the only real option for locking down is by IP, so they would simply need to know the IP of the monitoring host.

You're not eliminating risk by using snmp over an agent. You're just moving it. Instead of having to trust that the agent doesn't have vulnerabilities, you have to trust that your snmp implementation doesn't have any vulnerabilities.

[-]

zatset@reddit (OP)

SNMP v3 includes authentication and encryption. With the older versions like v1 and v2, only the community name was actually required.

Anyway, in the context of switches and routers, you cannot install anything on them. So in every case you rely on the SNMP implementation. Even worse, if you don't secure them, some ship with default communities like "public" (read-only) and "private" (read-write). So, one must disable SNMP v1 and SNMP v2, unless there is a serious reason to keep them around... and even then only if the devices are isolated, for example on a separate VLAN.
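
A rough way to check an inventory against the two rules above (no v1/v2 unless justified, no default communities) is sketched below. The device dictionary shape (`snmp_version`, `community`) is an assumption made up for this example and does not come from any particular tool.

```python
from typing import Dict, List

# Community strings that commonly ship as factory defaults.
DEFAULT_COMMUNITIES = {"public", "private"}


def audit_snmp(device: Dict[str, str]) -> List[str]:
    """Return a list of findings for one device record (hypothetical shape)."""
    findings = []
    version = device.get("snmp_version")
    if version in ("1", "2c"):
        findings.append(f"legacy SNMP v{version} enabled")
        community = device.get("community", "")
        if community.lower() in DEFAULT_COMMUNITIES:
            findings.append(f"default community '{community}' in use")
    return findings
```

A device that only speaks v3 comes back with an empty list; anything else gets flagged for review or isolation.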

[-]

LINAWR@reddit

Then CheckMK has great support for that. We use SNMP monitoring for most of our network equipment besides windows / linux, which use agent based polling.

[-]

MrNegativ1ty@reddit

Currently going through the same thing.

I don’t really LOVE any of the current big names for this, but I will say I like the idea of using netbox as a “source of truth”, then configuring Prometheus to collect the device data and IP/DNS names. Then, any updates made in netbox automatically propagate to your monitoring and that’s one less thing you have to keep always updated when you make changes.
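
To illustrate the "source of truth" idea: Prometheus can read scrape targets from JSON files (file-based service discovery), where each entry has a `targets` list and a `labels` map. The sketch below turns a device inventory into that structure. The inventory fields (`name`, `address`, `site`, `status`) are assumptions for the example; a real export from netbox would need to be mapped onto them.

```python
import json
from typing import Dict, List


def to_file_sd(devices: List[Dict[str, str]]) -> List[dict]:
    """Convert inventory records (hypothetical shape) to file_sd target groups."""
    groups = []
    for dev in devices:
        if dev.get("status") != "active":
            continue  # skip devices that should not be monitored
        groups.append({
            "targets": [dev["address"]],
            "labels": {"device": dev["name"], "site": dev["site"]},
        })
    return groups


if __name__ == "__main__":
    inventory = [
        {"name": "sw1", "address": "10.0.0.1", "site": "hq", "status": "active"},
        {"name": "sw2", "address": "10.0.0.2", "site": "hq", "status": "offline"},
    ]
    # Write this output to a file referenced by file_sd_configs in prometheus.yml.
    print(json.dumps(to_file_sd(inventory), indent=2))
```

Regenerating the file on a schedule is what makes inventory changes propagate to monitoring without manual edits.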

[-]

brekfist@reddit

Cacti

[-]

Highpanurg@reddit

Prom + grafana

[-]

Dax420@reddit

Prometheus

[-]

DULUXR1R2L1L2@reddit

Am I wrong or doesn't Prometheus also require a lot of manual configuration?

[-]

420GB@reddit

Yes and it also requires an agent / exporter

[-]

Prox_The_Dank@reddit

I use this + Grafana / Loki

[-]

MRdecepticon@reddit

I set up an ObserviumCE server and it wasn't too difficult. Yes, it requires some deeper configuration if you want to customize it heavily, but it is free and does not require agents.

[-]

wasteoide@reddit

I'm also using Observium for some monitoring.

[-]

khobbits@reddit

LibreNMS

It is basically an SNMP monitoring tool, but it has a first-party understanding of networks.

That means it can work out what device is plugged into a switch, and start monitoring that device automatically.

It does support Nagios / CheckMK plugins if you have some services that need a little more coverage, but it's primarily an SNMP tool.

The addons/exporters do work pretty well as well, so if you want Grafana dashboards, or want switch config backups (oxidized) it can operate as a source for those as well.

[-]

sembee2@reddit

Another vote for LibreNMS. For pure SNMP monitoring it does the job really well. The Docker setup is quite straightforward if you follow their instructions.

[-]

bumbo79@reddit

+1 for LibreNMS. We've been running it for about 6 years now and while there have been both growing pains and adaptations to updates over the years, laf and the crew are always willing to assist on Discord when you're ready to pull your hair out....

https://www.librenms.org/

[-]

FloiDW@reddit

Once Icinga, always Icinga2 😬

[-]

M1D1M@reddit

Netdata

[-]

AdInevitable8483@reddit

Prometheus and Grafana are great, but there is no comparison to Zabbix. It's the best. I have not seen anything Zabbix can't do. Used it for 10+ years.

[-]

Ok_Signature_6030@reddit

have you looked at librenms? it's basically zabbix's lightweight cousin - snmp-first, web ui, alerts to email out of the box. lot less template wrangling for the basic ping + snmp threshold workflow you described, no agents required since it just polls oids on switches/routers/printers.

observium is the simpler-still option if you only need polling without much alerting flexibility. community edition is fine for that use case.

if you ever want sms escalation for critical alerts down the line, just point alerts at an email-to-sms gateway. that part is independent of which monitor you pick, so don't let it influence the choice now.
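
a minimal sketch of the email-to-sms idea, using only the python standard library. the gateway address format depends entirely on your carrier or provider, so the addresses in the usage note are placeholders, and `build_sms_alert` is a hypothetical helper name.

```python
from email.message import EmailMessage


def build_sms_alert(gateway_addr: str, sender: str, text: str,
                    limit: int = 160) -> EmailMessage:
    """Build a short plain-text mail meant for an email-to-SMS gateway."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = gateway_addr
    msg["Subject"] = "ALERT"
    # Gateways typically truncate long bodies, so cut the text up front.
    msg.set_content(text[:limit])
    return msg
```

usage would look like `build_sms_alert("5551234@sms.example.com", "monitor@example.com", "sw1: no ping response")`, with the result handed to `smtplib.SMTP(...).send_message(msg)` against whatever relay you already use for alert mail.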

[-]

Silent_Title5109@reddit

Since you want SNMP, Prometheus has an SNMP exporter.

https://prometheus.io/

https://github.com/prometheus/snmp_exporter
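
For context on how that works: the exporter sits between Prometheus and the device, and Prometheus asks it to probe a device by calling its `/snmp` endpoint with `target` and `module` query parameters. A small sketch of building such a probe URL follows; the exporter address and the `if_mib` module name are assumptions here (they match the exporter's usual defaults, but check your own `snmp.yml`).

```python
from urllib.parse import urlencode


def snmp_probe_url(exporter: str, target: str, module: str = "if_mib") -> str:
    """Build the URL that asks an snmp_exporter instance to probe one device."""
    query = urlencode({"target": target, "module": module})
    return f"http://{exporter}/snmp?{query}"
```

Opening `snmp_probe_url("localhost:9116", "192.0.2.10")` in a browser or with curl is a quick way to see what metrics a device returns before wiring it into a scrape job.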

[-]

placated@reddit

Problem with using Prometheus for this is it becomes a mib management nightmare. A lot of the more network focused monitoring solutions have canned preloaded mib data for a large swath of network equipment.

[-]

zatset@reddit (OP)

I don’t mind MIBs. Zabbix actually did not come with any templates or MIBs that can monitor the specific devices I monitor. I had to manually poll devices and manually create all the templates.

[-]

Molasses_Major@reddit

We stuck with LibreNMS after growing tired of installing agents. It works well in large environments, has auto-discovery, and is highly customizable.

[-]

retiredaccount@reddit

At a previous engagement, when the zabbix host died, we went with observium, which still had comprehensive data reporting just without all the extras that end up confusing or burying important signals. In practice, I only used zabbix for a couple years, but it seems geared toward a place where you have a dedicated masseuse who can spend lots of time and effort massaging it. When you don’t have that, simple is better. Once off of zabbix, hidden signals immediately rose out of the noise.

[-]

zatset@reddit (OP)

Thank you, but complexity and too many features are what I am trying to avoid. That's why I am looking for something simple and lightweight that doesn't require much in the way of hardware resources.

[-]

nv1t@reddit

what about icinga? :)