Scripting project for SharePoint sites’ cleaning
Posted by amaretto_sh@reddit | sysadmin | View on Reddit | 19 comments
Hello!
I’m an intern and just got the mission of cleaning useless sites from SharePoint by hand. A lot of it is repetitive and I’m pretty sure there is a way of automating it. This project concerns sites under 2 GB.
My top goals are:
- Adding myself as admin on all targeted sites so I can freely manipulate them
- Gathering all sites created by obsolete users AND under 1 GB AND unmodified (not “last visited” but “last modified”) since 2024, and deleting them
- Deleting all directories unmodified since 2024 (by checking dates on all sub-directories and their contents; this one is a sensitive case because if a directory contains elements modified after 2024 but the directory itself wasn’t modified, I really need my script to NOT delete it)
I’m an admin at my company, with an OnMicrosoft address. I’ve already tried the first goal but to no avail, and I feel like I’m not going in the right direction (I get errors concerning my ID even though I have all the rights and can do most of the manipulations by hand).
Is this attainable? Is it too hard for my level? Where should I dig first? What tools do I have at my disposal?
A part of me is convinced that if I can do it with a GUI, there is a way to do it even better with a CLI, but I’m not familiar enough with PowerShell and Microsoft’s limitations to get there.
Thank you all!
Previous-Low4715@reddit
Just buy a single Microsoft 365 Copilot license and you unlock the entire SharePoint Advanced Management feature set (nothing to do with Copilot), which gains you SharePoint lifecycle management for inactive sites. Inactive sites can then be set to switch to read-only, then auto-archived to archive-tier Azure storage, which costs about a quarter as much as active SharePoint storage. Sites are still queryable via eDiscovery while in read-only/archive.
But this isn’t really your job, this is a job for a data/knowledge team and you can get in very serious hot water deleting data outside of official policies and retention requirements. But you can recommend this as a solution and get buy in from management.
https://learn.microsoft.com/en-us/sharepoint/site-lifecycle-management
R3luctant@reddit
Acquaint yourself with the Graph API.
bbqwatermelon@reddit
You misspelled PnP.
Wishful_Starrr@reddit
I was so resistant to the idea until this year. It really is the way to go.
Ummgh23@reddit
As soon as we can finally start using M365 I’ll get right to it lol. It’s been ages just for all the GDPR stuff etc.
R3luctant@reddit
I am pretty ingrained in PowerShell scripting, but adapting your script to do Graph API pulls opens up a lot of possibilities that don’t have native PowerShell commands.
Wishful_Starrr@reddit
It really does. I was working on an off boarding script and found something similar to what I was looking for and they called the API. I wanted to create a few more functions and found I had a much easier time finding the calls than trying to string something together with cmdlets.
R3luctant@reddit
Are you me? That's exactly how I got into it. API call to permission termed mailboxes was plugged into the end of my offboarding script and it worked perfectly.
Wishful_Starrr@reddit
Great minds haha! I was looking for a good way to bulk remove users from groups, distro lists, mail groups and found this script.
https://michev.info/blog/post/6062/remove-user-from-all-microsoft-365-groups-and-roles-and-more-via-the-graph-api-non-interactive
Took it, used it as a base, and made separate functions for resetting their password, removing their MFA, clearing company name and manager, revoking sessions, etc.
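For anyone curious what the underlying calls look like, here is a minimal sketch of the "remove a user from their groups" pattern via raw Graph requests. It assumes you are already connected with Connect-MgGraph using a Group.ReadWrite.All scope; $userId is a placeholder for the user's object ID:

```powershell
# Hedged sketch: strip a user from the Microsoft 365 groups they belong to.
# Assumes Connect-MgGraph has already run with sufficient scopes.
$userId = "<user-object-id>"   # placeholder

# List the user's directory memberships (paged: follow @odata.nextLink for more)
$memberships = Invoke-MgGraphRequest -Method GET `
    -Uri "https://graph.microsoft.com/v1.0/users/$userId/memberOf"

foreach ($group in $memberships.value) {
    # memberOf also returns directory roles; only handle plain groups here
    if ($group.'@odata.type' -eq '#microsoft.graph.group') {
        Invoke-MgGraphRequest -Method DELETE `
            -Uri "https://graph.microsoft.com/v1.0/groups/$($group.id)/members/$userId/`$ref"
    }
}
```

Note that classic distribution lists and mail-enabled security groups are managed through Exchange, not Graph, so a full offboarding script (like the one linked above) needs an Exchange leg too.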
AdeelAutomates@reddit
I have a full tutorial on the Graph API if you are interested: https://youtu.be/kdu6TSOnqYE?si=weUTLrIcPeBXqUCh
GratefulGolfer@reddit
Did you have the correct modules installed locally in powershell?
Also, test, test, then test again. Don't run this on live data until you're damn sure it's working as intended. You can create test files and modify creation dates with powershell to simulate how it will act.
Since you're dealing with live data I'd test with files you've created in a test site set up just for you to play with. Once your script is doing what you want it to, test using '-WhatIf' and observe logs/output to see if it's behaving as expected. I'm super conservative, so my next move would be to target a reasonably sized directory and do a 'move' instead of a delete. This way I can check the moved files to ensure only the ones I wanted were targeted. If all looks well you can manually delete the files and run the live script.
At this point you've done 3 rounds of testing and aren't moving on until each round is perfect.
Lastly, is this on-prem or cloud? Consider how moving large amounts of files will affect users and whether this is something that needs to happen off-hours. Also, do you have access to an always-on VM?
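A concrete way to do the "test files" idea from above: generate throwaway files locally, backdate their LastWriteTime, then dry-run the age filter with -WhatIf. This sketch is purely local, no SharePoint involved:

```powershell
# Build a sandbox of test files and backdate some of them
New-Item -ItemType Directory -Path .\sandbox -Force | Out-Null
1..5 | ForEach-Object {
    $f = New-Item -ItemType File -Path ".\sandbox\file$_.txt" -Force
    if ($_ -le 3) {
        # Pretend these haven't been touched since mid-2023
        $f.LastWriteTime = Get-Date "2023-06-01"
    }
}

# Dry run: -WhatIf prints what WOULD be deleted and deletes nothing
Get-ChildItem .\sandbox -File |
    Where-Object { $_.LastWriteTime -lt (Get-Date "2024-01-01") } |
    Remove-Item -WhatIf
```

Once the -WhatIf output lists exactly the files you expect (the three backdated ones here), swap the filter onto real data, still with -WhatIf first.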
amaretto_sh@reddit (OP)
First off, I absolutely love your answer; this kind of procedural discipline is lacking at work, so thank you!
Concerning the files themselves, this only concerns 0 to 2 GB sites (mostly empty/unused), prioritizing untouched ones from before 2024. My post was actually badly written: we are deleting sites from the cloud by moving them to an archive server, just in case. Btw there's a very sketchy method consisting of first downloading them to our local machines and then putting them on the network, but I'm trying to find a cleaner way to do it.
Thank you again for your answer, it is REALLY helpful and gives me a few hints on how to proceed! Please don't hesitate to add further advice if you have some more :)
GratefulGolfer@reddit
Use the OneDrive desktop app to sync data locally. Use robocopy to move data to the archive. Do this on an "always on" VM instead of a local machine if at all possible.
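A minimal sketch of that flow, run from PowerShell on the VM. The sync path and archive share below are placeholders, not real paths:

```powershell
# Move the synced library (including empty subfolders) to the archive share.
# /E  = recurse, /MOVE = copy then delete source, /R /W = limited retries,
# /LOG = keep an audit trail of exactly what was moved.
robocopy "C:\Users\you\OneDrive - Contoso\SiteName" `
         "\\archive\sp-archive\SiteName" `
         /E /MOVE /R:2 /W:5 /LOG:C:\temp\archive-SiteName.log
```

Keeping the /LOG output gives you the paper trail the other commenters are (rightly) insisting on before anything gets deleted from the cloud side.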
amaretto_sh@reddit (OP)
This is really useful, thank you.
Do you happen to know if there are some other tools/options with robocopy that match my other criteria/can match several criteria at once (i.e., size of the site + minimum age, ...)?
GratefulGolfer@reddit
Robocopy can do a lot; look into it and see if it will work for you.
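To the criteria question above: robocopy can combine age and size filters in a single run. A sketch with placeholder paths (/MINAGE takes days or a YYYYMMDD date; /MAX is a per-file size in bytes):

```powershell
# Only move files last modified before 2024-01-01 AND smaller than ~1 GB.
# /L = list-only dry run; drop it once the log looks right.
robocopy "C:\sync\SiteName" "\\archive\sp-archive\SiteName" `
         /E /MOVE /MINAGE:20240101 /MAX:1073741824 `
         /L /LOG:C:\temp\dryrun.log
```

One caveat: these filters apply per file, not per site, so the "whole site under 1 GB" check has to happen upstream (e.g. from the SharePoint admin cmdlets) before you decide to archive the site at all.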
anmghstnet@reddit
Could you do this? Yes, more than likely.
Should you do this? Not without full, written buy-in from management.
Historical data is usually there for a reason, although not all the time.
Also, use a service account, not your account, if you do end up doing this.
amaretto_sh@reddit (OP)
I have both admin and user account. Is the service account the admin one?
anmghstnet@reddit
Not exactly, it depends on what your organization considers a service account vs an administrative account.
Service accounts are generally used for single functions. Administrative accounts are used for actions that you do on a regular basis that require administrative elevation.
ShadoWolf@reddit
You’ll probably want to start with the SharePoint Online Management Shell:
Install-Module -Name Microsoft.Online.SharePoint.PowerShell
Once that’s installed, you can use it to pull a list of sites along with useful metadata like URL, owner, storage usage, and last content modification date:
Get-SPOSite | Select Url, Owner, StorageUsageCurrent, LastContentModifiedDate
That gives you enough to start filtering things down. For example, you can narrow it to sites under 1 GB that haven’t been modified since 2024.
Get-SPOSite | Where-Object {
$_.StorageUsageCurrent -lt 1024 -and
$_.LastContentModifiedDate -lt (Get-Date "2024-01-01")
}
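Before acting on that filter, it's worth dumping the candidates to a CSV and having a human (ideally your manager) sign off on the list. A sketch using the same filter:

```powershell
# Export candidate sites for review before anything destructive happens.
Get-SPOSite -Limit All |
    Where-Object {
        $_.StorageUsageCurrent -lt 1024 -and   # StorageUsageCurrent is in MB
        $_.LastContentModifiedDate -lt (Get-Date "2024-01-01")
    } |
    Select-Object Url, Owner, StorageUsageCurrent, LastContentModifiedDate |
    Export-Csv -Path .\cleanup-candidates.csv -NoTypeInformation
```

That CSV doubles as your audit trail for which sites were targeted and why.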
Before doing anything destructive, you’ll likely want to add yourself as a site collection admin so you have full access:
Set-SPOUser -Site https://yourtenant.sharepoint.com/sites/TargetSite -LoginName your@domain.com -IsSiteCollectionAdmin $true
And once you’re confident in your filtering, you can remove sites with:
Remove-SPOSite -Identity https://yourtenant.sharepoint.com/sites/TargetSite
That handles most of the site-level cleanup pretty cleanly.
The folder cleanup is where things get harder. The SharePoint Online module is really geared toward tenant and site administration, not walking through document libraries. If you need to safely delete folders based on modification dates, you can’t just rely on the folder’s own timestamp. You have to look at everything inside it and make sure nothing has been modified more recently. For that part, you’ll probably want to use Microsoft Graph or PnP PowerShell so you can recurse through the structure properly.
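For that last part, here is a PnP PowerShell sketch of the "only a candidate if NOTHING inside is newer" rule. The cmdlet names are real PnP cmdlets, but treat the property handling and folder URL as assumptions to verify against your own tenant:

```powershell
# Assumes: Install-Module PnP.PowerShell
#          Connect-PnPOnline -Url https://yourtenant.sharepoint.com/sites/TargetSite -Interactive
# Returns the most recent modification date found anywhere under a folder.
function Get-NewestModifiedDate {
    param([string]$FolderSiteRelativeUrl)

    $newest = [datetime]::MinValue
    $items = Get-PnPFolderItem -FolderSiteRelativeUrl $FolderSiteRelativeUrl
    foreach ($item in $items) {
        if ($item.GetType().Name -eq 'Folder') {
            # Recurse: a subfolder's own timestamp is not enough
            $childNewest = Get-NewestModifiedDate -FolderSiteRelativeUrl "$FolderSiteRelativeUrl/$($item.Name)"
            if ($childNewest -gt $newest) { $newest = $childNewest }
        }
        elseif ($item.TimeLastModified -gt $newest) {
            $newest = $item.TimeLastModified
        }
    }
    return $newest
}

# A folder is only a deletion candidate if nothing inside was modified since 2024
$cutoff = Get-Date "2024-01-01"
if ((Get-NewestModifiedDate -FolderSiteRelativeUrl "Shared Documents/OldProject") -lt $cutoff) {
    Write-Host "Candidate for archival (still review before deleting!)"
}
```

Because the recursion takes the maximum date across all descendants, a stale-looking folder with one fresh file deep inside will never pass the cutoff check, which is exactly the sensitive case the OP described.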