what do you prefer as monitoring software/system?
Posted by satisfaction_olaf@reddit | sysadmin | 96 comments
We are currently trying Zabbix and Icinga2/Nagios at our company for monitoring our hardware and software.
What do you guys recommend that is stable and cost-efficient?
crreativee@reddit
Check out ManageEngine OpManager
Key-Brilliant9376@reddit
Zabbix is the best monitoring tool I have found, hands down. Once you learn it, you can monitor just about anything. Nagios is a joke compared to it. But I prefer Zabbix even over Solarwinds, New Relic, or ManageEngine.
serverhorror@reddit
Are you running a large setup?
Multiple locations spread over different continents, ideally able to converge upwards.
I know it from a different life way back but I didn't dare to take another stab.
How did the API evolve, is that a first class method of configuring things nowadays?
Sylogz@reddit
We use it over multiple locations and monitored hosts/objects.
There is good optimization info around, and using proxies helps control the load on your main instance.
JwunsKe@reddit
I actually like Kaseya Traverse
ESCASSS@reddit
We recently started using Datto RMM for monitoring, and let me tell you, it is pretty solid, it really stays ahead of potential issues with real-time alerts and monitoring.
ROvAES@reddit
For monitoring we use Network Detective Pro cause it offers robust monitoring and detailed reporting.
oddeeea@reddit
My RMM, VSA X has great features for monitoring and has integrated antivirus and policy management features.
BossSAa@reddit
I like Traverse and the real-time monitoring it includes. It also helps you identify trends and prevent problems before they occur.
andrea_ci@reddit
Right now I'm testing out CheckMK.
Nice, Nagios based, "it works". A little complicated to configure.
wezelboy@reddit
But once you figure it out it scales well.
andrea_ci@reddit
yes, right now I am having a few problems... all of them because of the sh*tty SNMP implementations from HPE/ARUBA.
for anything else? it works.
savekevin@reddit
Oh, we're a large Aruba shop, and I was about to try CheckMK. Am I asking for a headache?
andrea_ci@reddit
No, only "SOME" hosts will completely refuse to publish SNMP data or similar. But it's not a problem with the software here, it's with the iLO/switch software.
SiAnK0@reddit
Not in particular. We use it too with about 3000 hosts that are monitored, just use snmpv3 and ansible scripts to activate shit on Aruba and set users.
Just use the same user for anything, let the network scan get your things into a folder and rename it with dns. On folders you can set the snmp user and pw. Done
I wouldn’t do it all again, but the labels and tags are pretty useful to get rules all over the place and create team-based dashboards.
Sometimes SNMP bugs out a little bit, but if you are patient (like 10 min) it sorts itself out most of the time!
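For anyone wanting to sanity-check the shared SNMPv3 user on a switch before letting the network scan pick it up, here is a minimal Python sketch. It assumes net-snmp's `snmpget` is installed on the monitoring host; the host name, user name and passphrases are placeholders, and SHA/AES are assumed protocols that may not match what your devices are set to.

```python
import subprocess

SYS_DESCR_OID = "1.3.6.1.2.1.1.1.0"  # standard sysDescr.0 object


def build_snmpget_cmd(host, user, auth_pass, priv_pass, oid=SYS_DESCR_OID):
    """Build a net-snmp `snmpget` command line for an SNMPv3 authPriv query."""
    return [
        "snmpget",
        "-v3",                      # SNMP version 3
        "-l", "authPriv",           # authentication and encryption
        "-u", user,                 # the shared monitoring user (placeholder)
        "-a", "SHA", "-A", auth_pass,  # assumed auth protocol
        "-x", "AES", "-X", priv_pass,  # assumed privacy protocol
        host,
        oid,
    ]


def host_answers_snmp(host, user, auth_pass, priv_pass, timeout=10):
    """Return True if the host answers the query; needs snmpget on PATH."""
    cmd = build_snmpget_cmd(host, user, auth_pass, priv_pass)
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Calling `host_answers_snmp("sw1.example.net", "monitoring", "authpass", "privpass")` in a loop over new devices gives a quick yes/no list before you blame the monitoring tool.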
savekevin@reddit
Thank you!
SiAnK0@reddit
The only problems I get with any switch, PDU or UPS are usually with very old hardware, like 2005-2010 things. But hey, I do monitoring full time (lmao), and even when it’s all running I like checkmk for its new features every now and then!
The only thing that really tires me out is the JSM integration (there is none) and the need to go through Opsgenie instead (we’re integrating this at the moment, and our JSM is pretty much a unicorn right now). Everything else can be a problem too, but I don’t think I’ve ever spent more than 2 days on anything, and usually once you find the solution you need, you can just roll it out to whatever part of the network you want.
But I think I would not use it without ansible/automation for the rollout of agents! We can’t use the automatic updates because our security team is doing its job, but that’s an option for many people I guess!
wezelboy@reddit
I'm having problems with HPE SNMP also. The newer iLO implementation sucks.
andrea_ci@reddit
iLO5 isn't reporting everything either.
Not only to CheckMK, but to any SNMP reader.
IAmTheM4ilm4n@reddit
That's not so unusual - non-standard SNMP definitions (looking at you RoomAlert) will drive you crazy.
I hooked ours to mail alerts to a Teams channel - everyone gets a toast message for those.
An alternative for that is NagStaMon, but that becomes a pain when screen-sharing.
fragwhistle@reddit
Us too. We're looking to use it for a distributed monitoring system.
I've used Zabbix a fair bit in the past but CheckMK has my attention at the moment, especially because it's got Proxmox and VMware support baked in.
Informal_Plankton321@reddit
Same here, they support a lot of workloads.
I’m a bit tired of Zabbix with constant post-update problems and customization wipes on template update.
networknymph@reddit
We switched from a barely configured PRTG to a fully configured CheckMK RAW.
I got this as a project one and a half years into my trainee job, and it did take a lot of time and nerves to properly configure it to our needs, and I could've saved myself so much trouble if we asked for CMK Enterprise.
But in hindsight, with a stable and informative monitoring now, I am super glad we chose RAW because damn, it taught me SO so much.
So +1 for CMK! 💚
savekevin@reddit
Can you expand on how CMK Ent would have been easier to deploy? I was just about to download RAW to try it out and would prefer the easy way. lol
networknymph@reddit
It's fundamentally a different product. CMK RAW is Nagios-based and acts in pull mode, and has to ideally be coupled with something like Ansible or Puppet. But it is also completely free.
CMK Enterprise is using the CheckMK microcore and acts in push mode, which will reduce load on the target hosts. It also comes with the Agent Bakery that does the agent packaging with plugins, configuration and provisioning to the target hosts.
Let's just say, in about 1 1/2-ish years of usage, there have been a multitude of features where it would've been done in a couple of minutes with Enterprise, and took 2 hours to get done via RAW.
Also, depending on the size of your org, RAW might not even be an option if you do not want to run many installations just to escape the Nagios-core limitations. But for us with about 7.5k Services on 150 Hosts it's still super fine.
savekevin@reddit
Thanks!
krystmantsje@reddit
We put grafana behind it. The dashboards of cmk kinda suck.
patjuh112@reddit
Still rolling with PRTG here :)
cvilsmeier@reddit
I write and use https://www.google.com/search?q=monibot
little_pimpi@reddit
NetCrunch - this is the way.
Wrzos17@reddit
What’s your priority? Are you monitoring infrastructure, apps, virtualization, or all of the above? Need dashboards, auto-alerting, or REST API integration? For Windows-heavy environments, check out NetCrunch. It handles network topology, device/config monitoring, and even telemetry. On-prem or self-hosted, modern interface, and low system requirements (embedded database). Licensing is flexible (permanent/annual).
Overall_Protection45@reddit
Centreon
uptimefordays@reddit
Prometheus and Grafana are the gold standard for monitoring, but require more internal engineering support/commitment than Nagios, NewRelic, Zabbix, etc. That said, commercial monitoring solutions can be very expensive and their support often doesn’t include “work with platform’s domain specific language to build custom monitoring integrations we require.” So you may end up requiring the skill set that can build/run/manage Prometheus/Grafana anyway while spending $300k a year on your monitoring tool!
krystmantsje@reddit
Also add a UX engineer to that tally. A customer of mine has over 9000 metrics and wants a dashboard....for hardware, application, k8s on rhel9... They needed to hire two additional guys to make heads or tails of it.
thekdubmc@reddit
Currently using Zabbix and quite happy with it.
-SPOF@reddit
Prometheus + Grafana if you like metrics-based monitoring.
KindlyGetMeGiftCards@reddit
Cost-efficient means different things to different companies.
If you're in government or a non-profit that can afford the time but not the license purchase, then LibreNMS or Zabbix. It takes time to set up and to understand how it fits your needs, but the license is free.
If you're in a private company that has budget but no time to spare, PRTG. It just works and is easy enough to get up and running quickly; you don't need an expert, just a team that is savvy enough. The cost is the license.
I've used all 3, in the above mentioned use cases
cwk9@reddit
Prometheus with Grafana might be worth a look. I found the learning curve similar to icinga or nagios.
TK-CL1PPY@reddit
PRTG. I've used nagios in the past. PRTG recently had an investment, but not outright purchase, by private capital. Their prices are going up significantly.
I've heard good things about Zabbix. I'd definitely spend time getting to know it well.
pauleewalnuts@reddit
I use PRTG and just stay under the 100 node paywall
domainnamesandwich@reddit
Isn't PRTG licensing based on sensors, not nodes? Must be a really small environment if you can get away with 100 sensors.
TK-CL1PPY@reddit
It's sensors, not nodes, correct. So you can load a server up with 70 sensors and monitor every damn thing, and pay a ton of money if you have a lot of servers.
Or you can just ping it to make sure it's up, or anything in between. That gives a sysadmin a lot of flexibility with a quality product. I feel like I'm being a shill, so: there are things just as good. They just aren't as easy to set up. You can spend a long time getting to know something like nagios.
Honestly, I have no idea why I am writing this book except that it's the end of the workday. I have on premise licensing at really excellent pricing, and over two years left on the contract. Starting by May, I'll have one of my guys start setting up nagios, so I can help teach him with what I remember from ages ago, and I'll be trialing new products with both him and a desktop support person.
I fully expect PRTG will massively increase price and force people to cloud based products, unless the buyers are a huge company and can negotiate better on premise pricing. I don't think anyone loves PRTG that much.
So if you're a PRTG lurker, take heed. I'm not going to be the only decision maker feeling this way.
Admirable-Fail1250@reddit
I have roughly 3 years. I love PRTG and I've come to really depend on it but I just cannot justify the price increase. I don't even have to get the purchase approved - it's my call. I still can't do it - I won't do it.
My guess is they're hoping enough of their larger clients will stay and it'll more than make up for the loss of us little 500, 1000, or 2500 sensor clients.
domainnamesandwich@reddit
Have no intention of moving away from Zabbix to be honest.
Admirable-Fail1250@reddit
I have a few small clients that use the 100 sensor version. It's tight but at least the key systems are being monitored. And some sensors have a lot of channels so if you use them right you can kind of get more than 100 "sensors".
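To put rough numbers on the sensor-based licensing discussed above, here is a small Python sketch. The tier sizes (100, 500, 1000, 2500) and the 70-sensors-per-server figure are simply the numbers mentioned in this thread, not an official price list, and the function names are made up for illustration.

```python
# Sensor counts mentioned in this thread; not an official PRTG price list.
TIERS = (100, 500, 1000, 2500)


def hosts_within_budget(sensor_budget, sensors_per_host):
    """How many hosts fit in a sensor budget at a given monitoring depth."""
    if sensors_per_host <= 0:
        raise ValueError("sensors_per_host must be positive")
    return sensor_budget // sensors_per_host


def tier_table(sensors_per_host):
    """Map each tier to the number of hosts it covers."""
    return {tier: hosts_within_budget(tier, sensors_per_host) for tier in TIERS}


if __name__ == "__main__":
    for depth in (1, 10, 70):  # ping-only, moderate, "every damn thing"
        print(depth, tier_table(depth))
```

At 70 sensors per server the 100-sensor tier covers exactly one server, while a ping-only setup covers 100 hosts, which is the flexibility (and the cost trap) described above.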
pauleewalnuts@reddit
Ah yes, definitely sensors. My coffee hadn't fully kicked in yet.
judgethisyounutball@reddit
Zabbix ftw
Admirable-Fail1250@reddit
Well for 10 years I've been using PRT.... oh - cost efficient?
I think I'm going to bookmark this thread.
kris1351@reddit
If you are looking to stick with open source then Zabbix is the most complete product. CheckMK is a good alternative, but the community version lacks a lot that the commercial version and even Zabbix contain. LibreNMS with Graylog integrations is another good alternative; I use it for my network and for equipment like PDUs that are SNMP only. I like the graphs better and it is just simple.
Break2FixIT@reddit
Zabbix for everything.
I pull snmp to get asset information while also pulling things like low toner or paper out / paper drawer open.
I also pull network stats that report to me.
Things I pull are battery up times, if they fail tests I get notified
I pull input voltage to push maintenance to get an electrician while also pulling network closet temps for faster reaction to dirty filters.
satisfaction_olaf@reddit (OP)
nobody is using icinga? why?
exekewtable@reddit
Lots of people are, they just aren't on Reddit. Icinga2 and Netbox for monitoring automation is my favourite combo. Making monitoring config sustainable with changing network data is the end goal. It's one thing to have pretty graphs and blinking lights, another to build a system that scales and lasts.
jup1ke@reddit
Currently running
Checkmk
zabbix
prometheus
icinga2
my favorite of the bunch: icinga2 + prometheus for the performance data
i_andmic@reddit
Checkmk
whetu@reddit
I've worked with BigBrother, Nagios, CheckMK, Prometheus and others throughout my career. With CheckMK, I do have a contributor tag on their github, so if you're running CheckMK, some code that I wrote is buried in there.
Currently supporting an inherited PRTG system. I'm not a fan, and I'm looking to get rid of it either this quarter or next. My employer also has Datadog in the mix for APM purposes, but it's fucking expensive, so I don't have much taste for ballooning that bill.
Zabbix, I've POC'd it, it's fine, it just feels old and clunky. I'd take it over PRTG any day of the week though.
As someone else has said: The gold standard is Prometheus and Grafana, but they require a high amount of effort to get set up.
Next up on my POC list is Netdata. It looks like easy-mode Prometheus/Grafana and in some cases uses Prometheus exporters, which makes a lot of sense.
safesploit@reddit
I’m going to presume that for your monitoring solution, your primary focus is on infrastructure monitoring, with the expectation to expand into application monitoring later on.
CheckMK (Infrastructure Monitoring)
At work, we use CheckMK for monitoring, which has been solid for our needs. One of the things I like about it is that it allows custom scripts to be written. For example, I’ve created Bash scripts to check if a licence has less than 30 days before expiring, and similar checks for other systems. CheckMK excels at infrastructure monitoring and is great for quickly setting up checks for servers (Linux/Windows), network devices, and basic service status.
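The licence-expiry check described above could look roughly like this. The original was written in Bash; this is a Python sketch of the same idea, not the commenter's script. It assumes the Checkmk local-check convention of printing one line per service in the form `<status> <service_name> <metrics> <details>` with status 0/1/2 for OK/WARN/CRIT. The service name, the 7-day critical threshold and the hard-coded dates are illustrative assumptions; only the 30-day threshold comes from the comment.

```python
from datetime import date


def license_check_line(expires_on, today, warn_days=30, crit_days=7):
    """Build one local-check output line for a licence expiry date.

    warn_days=30 follows the comment above; crit_days=7 is an assumption.
    """
    days_left = (expires_on - today).days
    if days_left < crit_days:
        status = 2  # CRIT
    elif days_left < warn_days:
        status = 1  # WARN
    else:
        status = 0  # OK
    return (
        f"{status} License_Expiry days_left={days_left} "
        f"License expires in {days_left} days ({expires_on.isoformat()})"
    )


if __name__ == "__main__":
    # Hypothetical expiry date; a real check would read it from the product.
    print(license_check_line(date(2030, 1, 1), date.today()))
```

Dropped into the agent's local checks directory as an executable, a script like this would show up as its own service on the host, provided the assumptions above hold for your Checkmk version.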
Prometheus (Application and Infrastructure Monitoring)
In my homelab, I've been dabbling with Prometheus to explore more application-focused monitoring. Prometheus doesn’t run an agent per se, which is a big plus if you’re cautious about running additional agents on systems. Instead, it uses a pull model to scrape metrics directly from endpoints via HTTP, which is great if you want to avoid managing extra agents. Prometheus is more flexible and allows for detailed metrics collection, especially useful when monitoring applications, services, and containerised environments. It gives more granular insights into system performance but can require more setup for custom metrics collection compared to CheckMK.
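To make the pull model concrete: the monitored side only has to answer an HTTP GET with metrics in Prometheus's plain-text exposition format, and Prometheus does the scraping on its own schedule. Below is a minimal standard-library Python sketch of such an endpoint. The metric name, its value and the port are made up for illustration, and a real application would more likely use an official Prometheus client library or an existing exporter.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(gauges):
    """Render {name: (help_text, value)} in the text exposition format."""
    lines = []
    for name, (help_text, value) in sorted(gauges.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    # Hypothetical metric; a real app would update this as it runs.
    gauges = {"demo_queue_depth": ("Jobs waiting in the demo queue.", 3)}

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(self.gauges).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve /metrics until interrupted; the port is an arbitrary choice."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` and pointing a scrape job at `http://<host>:8000/metrics` is all the "agent" there is in this model, which is the trade-off described above: less to install, more to wire up yourself.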
New Relic (Infrastructure and Application Monitoring)
New Relic has been mentioned, but personally, I’m not fond of it simply due to being a SaaS solution that I can't self-host. Otherwise, New Relic is nice for both infrastructure and application monitoring, with a straightforward dashboard and integration with a wide range of services.
Datadog (Infrastructure and Application Monitoring)
I studied Datadog for a few weeks. It's a powerful tool with excellent capabilities for both infrastructure and application monitoring, but it has a steep learning curve. The setup and configuration can be complex, especially when you dive deeper into custom metrics and integrations. Still, it’s a solid choice once you get the hang of it. It’s been an interesting shift as I dive into both infrastructure and application monitoring in different environments!
chancamble@reddit
We use Zabbix in our environment. It works great. NetXMS is also a nice solution.
Ziegelphilie@reddit
Mostly prometheus, visualized by Grafana
NilByM0uth@reddit
I just did the Zabbix Specialist course. Definitely worth the cost to round off your knowledge after you've been using it for a while.
nurbleyburbler@reddit
I want one that does not require learning a whole new skillset to configure and an FTE to maintain. I just want simple monitoring. Is there nothing that does this with the simplicity of PRTG and the price of Observium? My team has too many projects to devote weeks of learning for something as basic as monitoring.
Silent331@reddit
We use OMD Labs. It's a Naemon/Icinga2 (Nagios 3 compatible) core. We used Nagios before and OMD Labs includes everything we used, packaged together. Makes updates easier. It's not fancy but it does what we need it to. Windows checks are custom PowerShell, network is done over SNMP.
Xzenor@reddit
Been rocking Zabbix for at least a decade now. Higher ups want to replace Zabbix with Datto but that shouldn't even be called monitoring software..
koliat@reddit
I'm surprised I haven't seen SCOM here for Microsoft shops - it's the best tool for MS infra deployments
fdeyso@reddit
Because people are happy if they can finally abandon it.
koliat@reddit
It is true that hardly any company has spent serious time properly designing and deploying SCOM on their premises - but for those who did the homework, the software is powerful and enables a lot of scenarios for distributed discovery and monitoring.
A basic SCOM setup without knowledge and architecture insight can be garbage that people want to get rid of, but ultimately that's a people problem, not a product problem
fdeyso@reddit
If the product is not intuitive, requires a perfect understanding of the whole environment at any given time and is not flexible, then that’s the user’s fault. I even know people who believe SCCM is the best possible tool, and it turns out they never tried anything else.
koliat@reddit
Same principle applies for AD, SQL Server, Exchange, SharePoint and others - it takes expertise, and it's fair to demand such expertise to run a serious tool. Assuming it can be “intuitive”, “plug and play” etc. for a product capable of running enterprise monitoring at gigantic scales only hints at a limited perspective.
While the aforementioned products like AD are core to operations and were given enough budget and attention, the monitoring bit never was. I don't blame people for not being willing to become experts on yet another tool, as it requires a team to deliver.
RFilms@reddit
We just switched to logic monitor
coolstorysimp@reddit
Logic monitor is probably the best but it is expensive as hell
AviationLogic@reddit
Good lord, you weren't kidding...
RFilms@reddit
Oh, is it, hahahaha. That was cyber security’s choice, haha. We were on like a 10-year-old version of Nagios but they wanted to switch
TrexVsBigfoot@reddit
We have this as well, the best of breed.
Time_Dot_6918@reddit
CheckMK Raw Edition
Superfluxus@reddit
It really depends how many endpoints you're monitoring, how complicated your checks are, and how much time you can dedicate to learning your product.
I love Zabbix but some of the external scripts and trapper items take a decent chunk of time to learn properly. It's as good (or bad) a product as the effort you invest in it. If you don't have the time or patience to learn it, you might benefit from a more "works out of the box" solution like PRTG (free under 100 sensors), or one of the Nagios-based ones floating around such as checkMK.
N0bleC@reddit
Prometheus
Ok_Size1748@reddit
Old school Nagios. Over 25k checks here
almightyloaf666@reddit
Centreon. Not that straightforward imo, but they got a free/open source tier
hightechcoord@reddit
Nagios Core....Hold the hate
raffey_goode@reddit
we have checkMK being built out, I might dabble in some zabbix as well. seems to be the 2 people love the most.
We pay for WUG, there is some convenience with it, but we also aren't paying for addons so we don't get additional "good" monitoring. A lot of bugs in recent versions that they want you to upgrade to, because they keep finding security flaws requiring you to go to next version. Seems like they had potential but just never put much effort into it. Progress bought them and seem to be trying somewhat, but we will be attempting to replace.
twisymctwist@reddit
LibreNMS
PlaneAdmirable5177@reddit
rapid 7
analogliving71@reddit
Zabbix. 100%
sysacc@reddit
For infrastructure teams I generally recommend LibreNMS if you need something easy to set up and easy to manage, or Zabbix if you have more complex needs and a team that can manage it.
For DevOps teams I usually see Prometheus or InfluxDB with Grafana being used.
PRTG is what I recommend to smaller teams who have limited knowledge. It's pricey but easy to use.
In bigger orgs or more mature orgs I tend to see Zabbix for the Network/Servers and Prometheus for everything else and a central Grafana server.
datenresilienz@reddit
Zabbix it is
jr_sys@reddit
I've mentioned before but have been delighted with PA Server Monitor for a number of years.
mbahmbuh@reddit
Try: Observium
Aware_Ad4598@reddit
Zabbix
GleithCZ@reddit
Zabbix
Kind_Philosophy4832@reddit
Depends. I use ninjaone & netlock RMM (OSS) as backup solution
nakkipappa@reddit
I think it is more about how you want to visualize it, if required. We use zabbix and prometheus + azure and have it shown in grafana with nice graphs so big boss gets a happy face.
xMarGeta@reddit
I have worked with a wide variety of monitoring software and zabbix is by far my favorite.
dogcmp6@reddit
The screams of end users... Great for cost efficiency, but does have a lot more false positive alerts.
Depending on the size of your shop and environment, I would look into LibreNMS or Nagios. SolarWinds is a great product, but not cost effective unless you have a massive environment.
Maxplode@reddit
I like Zabbix as it's free and is a good way to learn about web servers and linux if you've not dabbled before.
We also use Security Onion to keep a record of all logs, and using Kibana on it has helped me diagnose a few issues. I'm still a Linux noob tho
rthonpm@reddit
Zabbix for me. Been using it since version 3 and it has been steadily improving. The setup time is drastically shorter now than it used to be and the number of built-in templates is growing.
sharpied79@reddit
LibreNMS
Significant_Sky_4443@reddit
checkmk