How do you keep track of what’s really running inside your Windows VMs?
Posted by tommipani@reddit | sysadmin | View on Reddit | 28 comments
Hi everyone,
I'm 21 and currently doing an internship in IT, working in an environment with a decent number of Windows VMs on vSphere. One of the biggest challenges I've faced so far is simply trying to keep track of what’s actually running inside those machines.
Over time, I noticed a few recurring issues that caused unnecessary stress:
- Certificate expirations no one tracked, leading to unexpected service outages.
- Audit requests like "give us all the Java or Log4j versions across the fleet", which usually mean hours or days of scripting and manual digging.
- A server starts acting up and there’s no easy way to figure out what changed—was it a new app? a scheduled task? a misconfigured service?
I looked for tools to help with this, but most of what I found was either part of large enterprise suites we can’t afford, or required agents everywhere, which isn't always realistic.
So, as a side project, I built a PowerShell script that:
- Connects to vCenter to list powered-on VMs
- Tries multiple sets of credentials to connect via WinRM
- Collects system info, installed software, certificates, Windows services, scheduled tasks
- Uses UUIDs to track VMs over time (even if their names change)
- Exports everything to CSV and marks removed items instead of deleting them, to keep a historical view
- Outputs progress clearly to the console with status info for each VM
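For anyone curious, the rough shape of it is something like this (heavily simplified sketch, with a placeholder vCenter name and interactive credential prompts standing in for the real credential handling):

# Sketch only - assumes VMware PowerCLI is installed and WinRM is enabled on the guests
Import-Module VMware.PowerCLI
Connect-VIServer -Server 'vcenter.example.local' | Out-Null

# Candidate credentials to try against each guest
$credentials = @( (Get-Credential -Message 'Domain admin'), (Get-Credential -Message 'Local admin') )

$results = foreach ($vm in Get-VM | Where-Object { $_.PowerState -eq 'PoweredOn' }) {
    $uuid   = $vm.ExtensionData.Config.Uuid      # BIOS UUID, survives renames
    $target = $vm.Guest.HostName
    if (-not $target) { Write-Warning "$($vm.Name): no guest hostname reported"; continue }

    foreach ($cred in $credentials) {
        try {
            # Pull the inventory from inside the guest over WinRM
            $inventory = Invoke-Command -ComputerName $target -Credential $cred -ErrorAction Stop -ScriptBlock {
                [pscustomobject]@{
                    Software = Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\*' |
                               Select-Object DisplayName, DisplayVersion
                    Certs    = Get-ChildItem Cert:\LocalMachine\My | Select-Object Subject, NotAfter
                    Services = Get-Service | Select-Object Name, Status, StartType
                    Tasks    = Get-ScheduledTask | Select-Object TaskName, State
                }
            }
            [pscustomobject]@{ Uuid = $uuid; Name = $vm.Name; Inventory = $inventory }
            break   # first credential that works wins
        } catch {
            Write-Warning "$($vm.Name) with $($cred.UserName): $($_.Exception.Message)"
        }
    }
}

$results | Export-Csv -Path 'vm-inventory.csv' -NoTypeInformation

The nested objects obviously need flattening before the CSV export is actually useful, and the full script also compares against the previous export so removed items get marked rather than deleted.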
This isn’t a product or anything—just something I built to help myself and maybe my team. But it got me thinking:
- Is this a problem others are dealing with too?
- Do your teams use internal tools or existing solutions to manage this kind of inventory and visibility?
- Is there something obvious I’m missing?
I’d really appreciate hearing how more experienced teams approach this. I'm trying to learn, improve what I built, or at least understand if I’ve been solving a problem that already has a better answer.
Thanks in advance for any insights.
Dersafterxd@reddit
SaltStack would be a tool for you.
If you don't set up much, it allows you to execute remote commands from a central server and filter targets with regex.
For example you can run
salt '*' cmd.run 'hostname'
which gives you the hostnames of all machines that are managed by your salt master.
You can also do much more, but you have to look into it.
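For the regex part, targeting supports PCRE matching with -E, for example (the minion-name pattern here is just an example):
salt -E '^win-.*' cmd.run 'hostname'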
Akai-Raion@reddit
A lot of the answers here answer your questions, so this is unrelated. Maybe it's just me, but a lot of your responses feel AI generated. Are you a bot, or do you use it to generate responses? 🤖😉
tommipani@reddit (OP)
English isn't my first language, so I'm using a translator/AI to help me polish my responses and make sure my technical questions come across clearly. Sometimes it probably ends up sounding a bit too formal.
Akai-Raion@reddit
I see, that explains it. It's not just the formality; the wording, structure... among other things give it away.
I was just curious, best of luck.
Kind_Philosophy4832@reddit
NetLock RMM is open source and already massive. If you don't compile it yourself, the free plan is 25 devices and then unlimited. With the PowerShell sensor you can cover anything automatically.
tommipani@reddit (OP)
Thanks for the suggestion, I hadn't come across NetLock RMM before and I'll definitely check it out.
The "PowerShell sensor" approach is really interesting. It seems like a lot of modern RMMs are heading in that direction, providing a platform to run custom logic.
I guess the script I wrote comes from a slightly different place. In our environment with 400+ VMs, we needed a tool that was laser-focused on deep discovery right out of the box. My script is essentially that "custom logic" part, but designed to run standalone without needing the RMM framework around it. The goal was to get an answer to "what's out there?" immediately, without the overhead of another platform.
Appreciate you adding another great tool to the discussion!
bitslammer@reddit
OP, be very cautious of this person. Their post history is all them pushing NetLock, and they never disclose the fact that they are likely an employee.
Kind_Philosophy4832@reddit
Yes, I am pushing NetLock, rustdesk and other open source solutions when I see the opportunity. Doesn't change that these tools are good tho
tommipani@reddit (OP)
Thanks for the heads-up, I appreciate it.
TheDawiWhisperer@reddit
Our monitoring suite does most of it; it monitors a lot of standard metrics like disk space and CPU usage via WMI, along with certificates and automatic services.
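Under the hood that's mostly just CIM/WMI queries, something like this (sketch, hypothetical server name):

# The kind of query a monitoring suite runs for disk space
Get-CimInstance -ClassName Win32_LogicalDisk -Filter 'DriveType=3' -ComputerName 'srv01.example.local' |
    Select-Object DeviceID,
                  @{ n = 'FreeGB'; e = { [math]::Round($_.FreeSpace / 1GB, 1) } },
                  @{ n = 'SizeGB'; e = { [math]::Round($_.Size / 1GB, 1) } }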
cpz_77@reddit
I'd suggest looking into a third party solution for this. That's awesome that you wrote your own script to fill the gaps, but there are many RMMs out there that can do this for you. Most will require agents - agent bloat is something we're constantly fighting off; it's a balance between getting the functionality you need and not just putting unnecessary agents everywhere. But having one solid agent-based RMM solution could help a lot here - I think you'd be surprised how much. And you could then set up custom scripts to check anything that isn't built in (e.g. if there's a specific cert or certs on your servers you need to check the expiry date of and send an alert when it gets close). I really think it would be worth it… we didn't have one for years, and when we added it, it gave us a lot more visibility than we had before. On top of that, it can serve as a great way for you or your support team to support your users if you license it for your workstations as well.
tommipani@reddit (OP)
Thanks for the thoughtful reply, I really appreciate it. You've perfectly captured the dilemma: "agent bloat is something we're constantly fighting off." That's the exact phrase I've had in my head.
Your approach of using one solid RMM as a base and then running custom scripts on top for the gaps (like the certificate checks you mentioned) is a really smart compromise.
That actually sparks another question for me. My main drive for building this script was to see if I could get that deep inventory data natively, without the RMM platform acting as a middleman. My thinking was that maintaining custom scripts inside another platform can sometimes become its own kind of technical debt.
SpiceIslander2001@reddit
About 25(!) years ago, I developed a simple program to collect data off our domain-connected devices, and supplemented that info with a collection of scripts that I created, all run by scheduled tasks implemented via group policy, so no client installation was required. They all dumped data into a central location, and I put together another web-based reporting tool to turn that data into reports. The system is still in use, albeit in greatly expanded form, because I've tweaked and improved it almost every year, and I'm due to retire from the company this year, LOL. It's going to be fun unwinding all of that ...
tommipani@reddit (OP)
Wow, what an incredible story. It's both inspiring and a bit terrifying to see that you built almost the exact same system 25 years ago, right down to the web-based reporting tool. It's a huge validation that the core problem has been around forever.
Your comment about retiring and "unwinding all of that" absolutely nails the biggest fear with homegrown tools. That's the exact problem I'm trying to solve now – turning this concept into a supported, maintainable product so it doesn't just become "that one guy's script" that no one else can touch.
Honestly, it's an honor to get your perspective.
Cormacolinde@reddit
For certificates you can automate their deployment and monitor them. As long as they expose some port/interface with TLS it’s fairly easy.
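A quick sketch of that kind of check in PowerShell (hypothetical host and port; a real monitor would loop over a list of endpoints and alert below a threshold):

# Grab whatever cert the endpoint presents and check how long it has left
$target = 'app01.example.local'; $port = 443
$tcp = [System.Net.Sockets.TcpClient]::new($target, $port)
$ssl = [System.Net.Security.SslStream]::new($tcp.GetStream(), $false,
       [System.Net.Security.RemoteCertificateValidationCallback]{ $true })  # accept anything, we only inspect it
$ssl.AuthenticateAsClient($target)
$cert = [System.Security.Cryptography.X509Certificates.X509Certificate2]::new($ssl.RemoteCertificate)
'{0} expires {1:d} ({2} days left)' -f $target, $cert.NotAfter, [int]($cert.NotAfter - (Get-Date)).TotalDays
$ssl.Dispose(); $tcp.Dispose()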
For auditing software, especially vulnerable ones, that’s a job for management software, EDR or vulnerability management. I’ve done this with SCCM, Microsoft Defender and Nessus for example. Yes, you will need agents.
For auditing configuration, that is more complex. One possibility is to make sure configuration changes are audited and logged properly, then feed those into a SIEM or other log aggregator. That works better with stuff like firewalls than Windows or even Linux servers though. Another option is to centralize all configuration, using Infrastructure as Code for example, or centralized configuration stores. Tools like SCCM, Ansible and the like can help with this, or even just AD GPOs. If all configuration changes are made in GPOs, then you can audit the GPOs.
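For the GPO route, even something as simple as watching modification timestamps goes a long way (assumes the GroupPolicy RSAT module on a domain-joined machine):

# Recently modified GPOs - a crude but useful change audit
Get-GPO -All |
    Where-Object { $_.ModificationTime -gt (Get-Date).AddDays(-7) } |
    Sort-Object ModificationTime -Descending |
    Select-Object DisplayName, ModificationTime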
tommipani@reddit (OP)
This is a fantastic, high-level overview, thank you. You're absolutely right that in a perfect world, everything would be managed via IaC tools like Ansible and all changes would flow into a SIEM.
I guess the script I wrote comes from a more "brownfield" perspective. It's designed for those real-world environments where you have years of manually configured servers, a mix of auto-deployed and self-signed certs, and you don't yet have a mature IaC or SIEM practice in place.
The goal was to get an immediate, actionable baseline of "what's out there right now" without having to boil the ocean with a full SIEM implementation. Your point about needing agents for deep software audit also reinforces my drive to see how far we can get with a purely agent-less approach for that initial discovery.
Really appreciate you sharing your consultant's perspective, it's super valuable.
Psychological_Luck37@reddit
Zabbix is awesome for this. You can pair it with Grafana and you get visibility and trending dashboards (stretch goal). Both products you can get for free.
We had an intern upgrade and configure our systems with these two tools and present it at the end of their internship.
This looks like a great project you could implement for your place of internship. Hopefully you have good support from your mentor.
tommipani@reddit (OP)
Thanks! Zabbix/Grafana is a powerful combo for sure. It's a fantastic learning project, and you're right, it's very similar in spirit to what I've built. I guess my goal was to create something that could deliver the deep inventory results in minutes rather than as a full-scale integration project. It's great to see what other people are using to tackle the same problem!
RamboPeng@reddit
We use LanSweeper for that kind of thing, it's very good. Nessus for vulnerability scans. We're way off the ball with certs; the move to 90-day lifespans is something I've buried my head in the sand about, but I'll need to look at it sooner rather than later and set up some automation.
tommipani@reddit (OP)
Thanks for sharing your stack, Lansweeper is a beast for sure.
And honestly, you're not alone on the certificate front. That was the number one driver for the script I mentioned earlier. We were getting blindsided by expirations, and the new 90-day lifespans are only going to make manual tracking impossible.
Setting up automation just for that piece was my main goal. Having a script that scans all the stores on every server and dumps a clean "certs expiring in the next 90 days" report has been a total game-changer for us. It sounds like you're in the exact same boat.
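The cert part of it boils down to something like this (simplified; the real script gets its target list from vCenter rather than a text file, and handles credentials per server):

# All LocalMachine stores, anything expiring in the next 90 days
$servers = Get-Content .\servers.txt
Invoke-Command -ComputerName $servers -ScriptBlock {
    Get-ChildItem Cert:\LocalMachine -Recurse |
        Where-Object { $_.NotAfter -and $_.NotAfter -lt (Get-Date).AddDays(90) } |
        Select-Object Subject, NotAfter, PSParentPath
} | Export-Csv .\expiring-certs.csv -NoTypeInformation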
DidYouTryToRestart@reddit
I just leave it alone and know it won't hurt anyone
tommipani@reddit (OP)
Thanks for circling back with the serious note, and man, that story hits close to home. I think every sysadmin has a story about building a custom tool because management had very... specific ideas about budget and technology.
You've raised a super valid point about the long-term maintenance. That's the eternal struggle with custom scripts, right? They solve the immediate problem perfectly, but then you become the sole maintainer for life.
That's kind of what I'm grappling with now. The script I wrote to solve this works great for our needs, but I'm thinking about how to make it a more robust, maintainable "product" instead of just "that one script I wrote," precisely to avoid the maintenance trap you're talking about.
And the Grafana part is hilarious, I was literally thinking the same thing for visualization.
Pickle-this1@reddit
Cert expiry is a common one because it's set and forget. I have a calendar entry 1 month before expiry; I don't deal with them often, so that works for me.
Action1 handles a lot of our inventory for Windows as well as Intune (endpoint / servers)
A1 is deployed at build time on every PC in the estate; once the agent has finished its collection, we put it into a group depending on department.
It will tell me system info like serial numbers, hardware specs, installed software, missing patches and vulns, etc. Before I started there was a massive Excel sheet with all this; I've since binned off that list and rely on A1 for inventory. It can pull reports of installed apps, or tell me when someone installs something (however the fix here is to take away the ability for users to install, which is in the works!).
We use Qualys for vulnerabilities in terms of config; this is set up to meet the requirements for CE+ in the UK, and I get a daily report.
Windows Defender can do this also: if you have Biz Prem across the board, you get data about new vulns in the environment, and I get email alerts when they are found.
For the server thing, you either want something like Netdata, PRTG, CheckMK or Splunk, or you can build an event forwarding server.
Servers shouldn't be changing that often; they're usually set and forget for the most part, and you treat them like cattle, not pets.
tommipani@reddit (OP)
Wow, thanks for taking the time to write such a detailed, real-world breakdown. This is incredibly helpful.
It's interesting to see how you've built a solid "best-of-breed" system by combining different tools for their specific strengths (Action1 for inventory, Qualys for vulns, etc.). It confirms my suspicion that there isn't one simple tool that does it all well.
That actually gets to the core of my question. All the powerful tools you mentioned, like Action1 and the RMMs, seem to rely on deploying an agent. My main curiosity was whether a similar depth of inventory could be achieved without adding another agent to every server, maybe just using native tools like PowerShell and WinRM. The overhead of managing yet another agent is a headache we're trying to avoid.
Is the agent management side of things with tools like Action1 or Ninja a non-issue for you, or is it just a necessary evil to get the data you need?
Pickle-this1@reddit
Agents are better than scripts unless you're amazing at writing scripts.
Agents often sit at the system level in Windows, so they can interrogate more data than a script, and then show it in an easy way.
It's about filtering the noise out to focus on the data I want.
Action1 doesn't need any management really: I stuck their deployer agent on all our servers, it scans anything in our domain, then pulls it in, and that's it. Then you set up things like patching policies etc. It took about half a day to roll out, quite painless.
I come from MSP-land, so agents are a big thing; everything telemetry-related has an agent. Security like Defender, S1 or Huntress is all agent based. They are just more efficient than scripts.
I see no reason not to use agents. Scripts have their place, but I prefer to let devs do the dev work; I'll stick to deployment and management from a sysadmin perspective, that is where I excel.
tommipani@reddit (OP)
That's a fantastic perspective, and it makes perfect sense, especially coming from an MSP background where reliable, scalable agents are the standard. I completely get the "let the devs do the dev work" philosophy.
You've hit on the exact trade-off I was thinking about. It sounds like for you, the reliability and ease-of-use of a well-made agent like Action1's outweighs the overhead of having an agent in the first place.
That's actually what led me to build my own solution. I spent a good amount of time writing a pretty comprehensive PowerShell script that does exactly this: it connects to vCenter, uses WinRM to talk to the guests, and pulls a deep, incremental inventory of certs, software, scheduled tasks, etc., all without any agents.
It started as an internal tool to solve our own "agent bloat" problem and to get visibility into the specific gaps that RMMs don't always cover easily, like the certificate and scheduled task inventories.
Thanks again for sharing your experience, it’s been super insightful to see how different people tackle the same challenge.
Lower_Fan@reddit
Yes, this is the function of RMM software. Look into them.