I did the thing (SharePoint Versioning Cleanup)
Posted by PorreKaj@reddit | sysadmin | 53 comments
We've hit the storage limit a few times now, forcing us to purchase 11 TB of extra SharePoint storage, with no end in sight.
SharePoint previously had no clear ownership in our organization. It recently became mine, and inspired by that guy, I went ahead and spent several days running scripts to set Automatic Versioning and queue the batch delete jobs.
Fun facts:
Set-SPOSite -Identity $siteUrl -EnableAutoExpirationVersionTrim $true -confirm:$false
New-SPOSiteFileVersionBatchDeleteJob -Identity $siteUrl -Automatic -confirm:$false
Takes about 3-4 seconds to run per site, meaning I could get through around 6,000-8,000 sites during one activation of my SharePoint admin role (out of 33,000 sites).
In the end we managed to reduce our storage consumption beyond our wildest dreams, from 98.1% of capacity to 50.3%, or 54 TB of storage released!
Don't be like that guy, consider your file version policies!
Next on the agenda: the fact that only 4% of our sites are considered 'active'
Skrunky@reddit
I've found the total storage of versioning ends up being around 2-3x the raw size of the file. It's such a pain.
We've started creating document libraries under some sites called 'Online Archive'; OneDrive sync is disabled on that doc library and versioning is completely disabled. Gives the business a nice way to archive data off to a place where versions aren't going to cause a problem.
Also, god help you if you have people embed video files into PowerPoint or other Office apps. Watch that 500 MB PPT file grow to 25 GB of consumed space because of versioning.
RunningAfterRabbits@reddit
I had a guy at my office who had three PowerPoint files, each about 50 MB, but the versions made each file take up about 40-60 GB... He used PowerPoint instead of OneNote.
Chellhound@reddit
Straight to jail.
PorreKaj@reddit (OP)
Our versioning setting was set to 1000! :D bonkers design.
willdeleteacct1year@reddit
Wait until you learn that Microsoft not only allows all your users to opt in for trials of stuff but TO ACTUALLY PURCHASE THEM AS WELL.
How that is even fucking legal is beyond me.
Elensea@reddit
You can block that. I had to figure that one out.
BillSull73@reddit
wow default is 500. Someone did that ON PURPOSE!!!!!!!
PorreKaj@reddit (OP)
Christ 😆
flyguydip@reddit
Probably Lucifer. lol
simple1689@reddit
But if you pay me more money for storage... /s
Good stuff
xtigermaskx@reddit
I really want to do this, but first I want to generate reporting info to actually show how much space our current setting (500 major versions, never delete) is consuming.
Did you use any tools to collect data or some other scripts?
AcidBuuurn@reddit
I did a backup to a Synology NAS that didn't include versioning, and it was about 1/4 the size of the SharePoint storage.
Top-Perspective-4069@reddit
MS has scripts to do that. It's not super complex, but it does take a while for the report to generate, depending on how much stuff you have.
xtigermaskx@reddit
Thanks. I actually found one that's supposed to do it as an all-in-one as well. I'm gonna kick it off today and see how it goes.
Top-Perspective-4069@reddit
I wouldn't recommend just yoloing the first one. Generate the report and then do the what-if analysis. You have to do it per site, but it does get you where you need to go.
https://learn.microsoft.com/en-us/sharepoint/tutorial-run-what-if-analysis
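For anyone following along, the flow in that tutorial is: queue an expiration report per site, wait for it to generate, then review the CSV before committing to a trim. A minimal sketch using the SPO module's report cmdlets (the URLs here are hypothetical placeholders; double-check parameter names against the linked docs before relying on this):

```powershell
# Queue a what-if version expiration report for one site. The CSV is written
# to a document library your session can reach (hypothetical URLs below).
$siteUrl   = 'https://contoso.sharepoint.com/sites/BigSite'
$reportUrl = 'https://contoso.sharepoint.com/sites/AdminReports/Shared Documents/BigSite-versions.csv'

New-SPOSiteFileVersionExpirationReportJob -Identity $siteUrl -ReportUrl $reportUrl

# Poll until Status shows the report is complete, then open the CSV
Get-SPOSiteFileVersionExpirationReportJobProgress -Identity $siteUrl -ReportUrl $reportUrl
```

Since -ReportUrl is explicit, the report shouldn't have to land in the site being analyzed, just in a library you can write to.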
xtigermaskx@reddit
Sorry, I meant I found some reporting scripts. The only thing I'd yolo is the data collection. I saw the Microsoft ones and want to use them, but some of our departments get touchy when we place files in their SharePoint sites, and it doesn't seem like I can send the report to a different site. If I can, I'll give those a try.
Top-Perspective-4069@reddit
Never tried but I believe you can, it just has to be within a document library that's accessible via whatever your session lets you see. Give it a shot and see what it gives you.
It's also worth bringing it to your leadership to tell them what you're doing and why and let them figure out if they want to bother with telling those other departments to shut up about it.
xtigermaskx@reddit
Yeah, leadership told me to just do it, but I wasn't really thrilled with the idea of changing every SharePoint site's version limit and time period without knowing who I'd be affecting.
PorreKaj@reddit (OP)
There are commands for it, but the time they take to return results was too much; it took more than a day for our biggest site.
I started out setting the versioning to automatic for our 10 biggest sites and then running the cleanup job. Within a few hours, 2 TB of storage was released just from two of the sites. Those numbers alone were justification enough to just send it.
Largest sites, change in GiB:
3588 -> 3580
2062 -> 737
1837 -> 484
1640 -> 491
1577 -> 465
1453 -> 515
1372 -> 1309
1327 -> 1326
1278 -> 253
1267 -> 1246
1267 -> 477
Sadly, the biggest one was too new to be affected by the version trimming.
Elensea@reddit
We had like 900 GB of data being used by one 20 MB Excel file because it had hundreds of versions saved.
Master-IT-All@reddit
Oh hey, this is interesting. I had to do some version cleanup, but I didn't come across these commands during research, so I ended up writing some bastard thing that queries every file and prunes all but the 50 most recent versions. This took days to run.
When this is run, you're enabling automated version control correct?
How quickly from running the commands to seeing versions removed and space freed?
PorreKaj@reddit (OP)
I started with our 10 largest sites, the first TB was released within 2 hours and visible on the graph in SharePoint admin portal the next day. It took less than a day for the top 10 to clear.
Master-IT-All@reddit
Nice. Well done.
PorreKaj@reddit (OP)
And yes, these commands apply site-level versioning and order the trim job to run.
PorreKaj@reddit (OP)
These are the commands I've been using:
Import-Module Microsoft.Online.SharePoint.PowerShell -UseWindowsPowerShell -Scope Global -DisableNameChecking
$SharepointAdminURL = 'https://shouldhavedonethissooner-admin.sharepoint.com'
Connect-SPOService -Url $SharepointAdminURL -UseSystemBrowser $true

$allsites = Get-SPOSite -Limit All
$allsitescount = $allsites.Count

# Get batch delete status for every site
$list = New-Object 'Collections.Generic.List[psobject]'
[int]$i = 0
foreach ($siteUrl in $allsites.Url) {
    $list.Add($(Get-SPOSiteFileVersionBatchDeleteJobProgress -Identity $siteUrl |
        Select-Object Url, Status, FilesProcessed, StorageReleasedInBytes,
            @{Name = 'ReleasedGB'; Expression = { [math]::Round(((($_.StorageReleasedInBytes / 1000) / 1000) / 1000), 2) }}))
    $i++
    Write-Output "$i of $allsitescount"
}
($list.ReleasedGB | Measure-Object -Sum).Sum

# Order cleanup of 'NoRequestFound' sites
[int]$ii = 0
[int]$nocleanupcount = ($list | Where-Object -Property Status -EQ 'NoRequestFound').Count
foreach ($siteUrl in ($list | Where-Object -Property Status -EQ 'NoRequestFound').Url) {
    Set-SPOSite -Identity $siteUrl -EnableAutoExpirationVersionTrim $true -Confirm:$false
    New-SPOSiteFileVersionBatchDeleteJob -Identity $siteUrl -Automatic -Confirm:$false
    $ii++
    Write-Output "$ii of $nocleanupcount"
}

admiralspark@reddit
Thyg0d@reddit
Does that also remove the versions outside of the limit?
PorreKaj@reddit (OP)
No, it only trims versions that fall outside the rules of the site's versioning settings. See example 2 here:
New-SPOSiteFileVersionBatchDeleteJob (Microsoft.Online.SharePoint.PowerShell) | Microsoft Learn
Thyg0d@reddit
Hmmm, need to check that command. I've written a script for it which works, but it's like 350 lines, and while my sites are huge, it takes 4 hours to clean one site since it goes file by file.
Sab159@reddit
Intelligent versioning and expiration by age are fairly new settings.
lawno@reddit
I did this recently, too. Make sure your retention policies are set correctly as they will override automatic versioning. I reduced our retention policy significantly but only because we have nightly backups of M365.
log1k@reddit
What did you do with your retention policies to have this run? I thought if you have any retention policies in place on the sites, the trim job on version history can't run at all. Every 6 months I have to exclude dozens of sites, queue up the trim job and then let it run. Then I place the retention policies back on for those sites.
lawno@reddit
I shortened it from years to months. It will trim versions but only versions that have aged out of the site's retention policy.
Szeraax@reddit
Are you going to enable file level archiving next? Bet you can do some crazy work if you do
BeautifulMulberry948@reddit
I am waiting for this. Automatic Versioning + File Level Archiving
Szeraax@reddit
We're using file level archive and seeing good space saving. People like that they can restore things without putting in IT tickets. Wish I could tell SPO to archive anything that hasn't been accessed in 3 years.
log1k@reddit
How were you able to see the results so fast? My understanding was running New-SPOSiteFileVersionBatchDeleteJob queued up the job. But the actual job processed over a week or so, no?
PorreKaj@reddit (OP)
Didn't do anything in particular for that. It took a few hours for the first job to change from New to InProgress, almost half a day for the first few sites to complete, and more than a day for the top 20 to complete.
I checked the status every once in a while using:
Get-SPOSiteFileVersionBatchDeleteJobProgress -Identity $siteUrl | Select-Object Url, Status, FilesProcessed, StorageReleasedInBytes, @{Name = 'ReleasedGB'; Expression = { [math]::Round(((($_.StorageReleasedInBytes / 1000) / 1000) / 1000), 2) }}
Took about 6 days for all 31,000+ sites to complete.
BlockBannington@reddit
We set the limit to 10 versions and remove the rest. Got a lot of assholes saying 'hey, can you get the version from 3 years ago back' or some shit, and they act irrational when we explain it doesn't work that way. All was communicated well in advance.
BillSull73@reddit
Can't go lower than 100 in the tenant settings or the GUI. Are you doing this with PowerShell?
BlockBannington@reddit
Yep
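For reference, the thread's suggestion is that the site-level cmdlet accepts limits below the admin-center minimum. A hedged sketch of an explicit (non-automatic) version policy via Set-SPOSite (the site URL is a placeholder, and the accepted minimum may vary, so verify the parameters against the docs and test on a throwaway site first):

```powershell
# Explicit version policy: automatic trimming off, a hard cap of 10 major
# versions, and expiry after 90 days. This governs new versions going forward;
# existing excess versions still need a batch delete job to clean up.
Set-SPOSite -Identity 'https://contoso.sharepoint.com/sites/SomeSite' `
    -EnableAutoExpirationVersionTrim $false `
    -MajorVersionLimit 10 `
    -ExpireVersionsAfterDays 90
```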
PorreKaj@reddit (OP)
10 versions is pretty strict; those would be used up by autosave pretty quickly, yes?
With Automatic Versioning, smaller changes don't count as versions, and once files hit 1 month of age, the number of versions gets trimmed down to a few.
BlockBannington@reddit
I believe we only target major versions. Not sure, my colleague took over when I was out but we really needed to save some room. 23 TB now and almost every month we have to buy more as even that strict shit doesn't really help. But then again, they're saving video files at around 30 gigs in there when they definitely shouldn't
BeautifulMulberry948@reddit
I did a lot of cleanup using ShareGate last year for our tenant. It went from only 5 TB free space to 15 TB. Not sure how long it's been since anyone really cleaned it up. I thought changing the version history settings on the site level doesn't retroactively do a cleanup? Or at least that's what I experienced before hence the manual cleanup.
I looked into the automatic setting but unfortunately in our org we have many departments who have different needs when it comes to keeping version history, so the compromise was they can only keep 100 versions and/or within 90 days.
FearlessAwareness469@reddit
does anyone have a recommendation on what they are setting theirs to? ours is 500. saw 10 is too little. so 100 maybe?
PorreKaj@reddit (OP)
There is an "automatic" option. It trims versions automatically as they age.
Long_Inflation_7524@reddit
I bet that only 4% of your sites being active is largely down to O365 Teams/groups getting a site by default.
How recently did you prune SharePoint? On some of our more active sites, we have some serious issues with users checking in/out documents. I've been wondering if that's partly due to the sheer volume of minor versions floating around creating conflicts.
CHRDT01@reddit
This was one of a number of reasons that we disabled group creation by users, the other major ones being the creation of redundant DGs and teams.
If people want to create a new group, it needs to be requested in a ticket with written department head approval. It's a policy that scales horribly, but it works at our size. I've got nothing but respect for the large business admins that get stuck managing that mess.
PorreKaj@reddit (OP)
Started less than a week ago.
Long_Inflation_7524@reddit
Thanks - going to check this out and reclaim/remove. We're at about 90% capacity and need a serious trim. I'm sure much of it is old fluff.
almethai@reddit
ahh, miss the times where I could work on Sharepoint, now stuck in devops, no more onprem sharepoint digging for me ;F
Top-Perspective-4069@reddit
I turned on automatic versioning and it saved us for about 4 months. We are well over capacity again because no one in the company wants to decide what the retention policies should actually be.
Curious201@reddit
I like this, but I would be careful with cleanup commands until you have reporting good enough to explain the impact to users and management. SharePoint version bloat is real, especially with libraries that have years of Office files and default versioning, but the angry ticket comes when someone expects an older version to exist and it is gone. Before running broad cleanup, I would export the biggest sites/libraries, current version settings, storage used, item counts, and maybe the top offenders by file type or library. Then set sane version limits going forward and only clean old versions where the business owner signs off. The storage win is nice, but the real win is getting versioning under control before the next 11TB surprise.