Compress a file with PowerShell

Jeff Hicks over at The Lonely Administrator recently pointed out one of his older blog posts on the topic of compressing a file. This has always struck me as one of the odd gaps in PowerShell’s core capabilities. There are so many ways to accomplish it, yet no one at Microsoft thought to include one with PowerShell. What makes it odder is that Microsoft lists ‘Compress’ as an approved verb in its MSDN documentation, yet no cmdlet with that verb ships in PowerShell core.

What I found interesting about Jeff’s post was that I learned yet another way to compress a file. His code uses the Compress method on the WMI CIM_DataFile class. I did it for years using “Shell.Application”. I ended up writing this one to use .NET, specifically System.IO.Compression.GZipStream. I’ve added to it and cleaned it up over time. With the current gap in PowerShell core it’s definitely one of my go-to cmdlets/functions. Seeing Jeff’s post prompted me to dig it out and post it up here.

Here is the complete function with embedded help and everything.

CompressFile.ps1

I’ll break it down below.

Function Compress-JRFile {
[CmdletBinding()]
Param(
    [Parameter(Position=0,ValueFromPipelineByPropertyName=$true,Mandatory=$True)]
    [Alias("FullName")]
    [string]$InputFile,
    [string]$OutputPath,
    # GZipStream writes gzip data, not a zip archive, so the default is .gz
    [string]$CompressExtension = ".gz",
    [switch]$Delete
)

The usual CmdletBinding for advanced functions. The basic parameters are InputFile, the file you want to compress, and OutputPath, for putting the compressed file somewhere other than next to the input. I also created parameters for your choice of extension and for whether to delete the original.
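The FullName alias combined with ValueFromPipelineByPropertyName is what lets Get-ChildItem output bind to InputFile. Below is a minimal sketch of that binding. Show-JRInputFile is a hypothetical helper, not part of CompressFile.ps1, and it assumes a Process block so that every piped item is handled. Without one, a function body runs as an End block and only sees the last piped item.

```powershell
# Hypothetical helper, for illustration only; it is not part of CompressFile.ps1.
Function Show-JRInputFile {
    [CmdletBinding()]
    Param(
        [Parameter(Position=0,ValueFromPipelineByPropertyName=$true,Mandatory=$true)]
        [Alias("FullName")]
        [string]$InputFile
    )
    Process {
        # Runs once per pipeline object; its FullName property binds to $InputFile.
        Write-Output "Would compress: $InputFile"
    }
}

# Build two throwaway files in a temp folder and pipe them in.
$DemoDir = Join-Path -Path ([IO.Path]::GetTempPath()) -ChildPath ("jrdemo_" + [Guid]::NewGuid().ToString("N"))
New-Item -ItemType Directory -Path $DemoDir | Out-Null
Set-Content -Path (Join-Path $DemoDir "a.log") -Value "one"
Set-Content -Path (Join-Path $DemoDir "b.log") -Value "two"

$Result = @(Get-ChildItem -Path $DemoDir -Filter *.log | Show-JRInputFile)
$Result

# Clean up the demo folder.
Remove-Item -Path $DemoDir -Recurse
```

Each FileInfo object from Get-ChildItem carries a FullName property, so no explicit -InputFile argument is needed in the pipeline.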

$InputFile = (Resolve-Path -Path $InputFile).Path            

If ($OutputPath -eq "")
{
    $OutputFile = [IO.Path]::ChangeExtension($InputFile,$CompressExtension)
}
Else
{
    $CompletePath = Join-Path -Path ($(Resolve-Path -Path $OutputPath).Path) -ChildPath ((Get-ItemProperty -Path $InputFile).Name)
    $OutputFile = [IO.Path]::ChangeExtension($CompletePath,$CompressExtension)
}

I resolve and prepare the input and output paths, whether or not they are relative, and handle both cases of the OutputPath parameter: specified or omitted.
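To make the two branches concrete, here is a small sketch of what the path logic yields. The file and folder names are made up for illustration, .gz is an assumed extension value, and [IO.Path]::Combine stands in for Join-Path so that nothing has to exist on disk.

```powershell
# Illustrative names only; nothing here touches the disk.
$Base = [IO.Path]::GetTempPath()
$InputFile = [IO.Path]::Combine($Base, "u_ex120101.log")
$CompressExtension = ".gz"   # assumed value for this sketch

# OutputPath omitted: swap the extension and stay in the input's folder.
$NextToInput = [IO.Path]::ChangeExtension($InputFile, $CompressExtension)

# OutputPath given: keep only the file name, re-root it, then swap the extension.
$OutputPath = [IO.Path]::Combine($Base, "Archive")
$FileName = [IO.Path]::GetFileName($InputFile)
$Elsewhere = [IO.Path]::ChangeExtension([IO.Path]::Combine($OutputPath, $FileName), $CompressExtension)

$NextToInput
$Elsewhere
```

Because ChangeExtension replaces the existing extension rather than appending to it, two inputs that differ only by extension (say report.log and report.txt) would map to the same output name.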

Try
{
    $SourceFile = New-Object System.IO.FileStream $InputFile, ([IO.FileMode]::Open), ([IO.FileAccess]::Read), ([IO.FileShare]::Read)
    $SourceFileSize = $(Get-ItemProperty -Path $InputFile).Length
}
Catch [System.IO.IOException]
{
    # Throw already terminates the function, so no Exit is needed after it
    Throw "Unable to access $InputFile"
}

$SourceBuffer = New-Object byte[]($SourceFile.Length)
$SourceBytes = $SourceFile.Read($SourceBuffer,0,$SourceFile.Length)
$SourceFile.Close()

I move on to prepping the source file. I have used this cmdlet predominantly for large IIS log files. In an early version I used Get-Content to read the file in, but I learned that Get-Content doesn’t perform well on large files, so I switched over to reading the file using .NET. I also have my Try/Catch in place with a Throw in case the file is unreadable for some reason. I think this is a place I could improve. I don’t think I’ve ever expressly tested or even encountered the Catch in real-world usage, so I’m not sure whether I could or should be doing something differently here.

Try
{
    $GzipFile = New-Object System.IO.FileStream $OutputFile, ([IO.FileMode]::Create), ([IO.FileAccess]::Write), ([IO.FileShare]::None)
}
Catch [System.IO.IOException]
{
    # Throw already terminates the function, so no Exit is needed after it
    Throw "Unable to create $OutputFile"
}

$GzipStream = New-Object System.IO.Compression.GzipStream $GzipFile, ([IO.Compression.CompressionMode]::Compress)
$GzipStream.Write($SourceBuffer, 0, $SourceBuffer.Length)
$GzipStream.Close()
$GzipFile.Close()            

$GzipFileSize = $(Get-ItemProperty -Path $OutputFile).Length

The actual writing of the compression stream is pretty similar to the reading portion as far as the Try/Catch/Throw goes. Most of the .NET was taken from examples I found on MSDN and elsewhere. I’m not expressly a programmer by trade or training, but I have learned a fair bit of C# and .NET while attempting to solve my more atypical PowerShell problems.
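The read and write steps above hold the whole file in a single byte array, which ties memory use to file size and cannot work once the length no longer fits in an Int32. One possible alternative, sketched below, is to copy the file through GZipStream in fixed-size chunks. Compress-JRFileStreamed and its BufferSize parameter are hypothetical names for this sketch, not part of CompressFile.ps1, and the approach has not been tested against multi-gigabyte logs.

```powershell
# Hypothetical helper: streams a file through GZipStream in fixed-size chunks.
Function Compress-JRFileStreamed {
    Param(
        [Parameter(Mandatory=$true)][string]$InputFile,
        [Parameter(Mandatory=$true)][string]$OutputFile,
        [int]$BufferSize = 1MB   # assumed chunk size; tune as needed
    )
    $Source = New-Object System.IO.FileStream $InputFile, ([IO.FileMode]::Open), ([IO.FileAccess]::Read), ([IO.FileShare]::Read)
    Try {
        $Target = New-Object System.IO.FileStream $OutputFile, ([IO.FileMode]::Create), ([IO.FileAccess]::Write), ([IO.FileShare]::None)
        Try {
            $Gzip = New-Object System.IO.Compression.GZipStream $Target, ([IO.Compression.CompressionMode]::Compress)
            Try {
                $Buffer = New-Object byte[] $BufferSize
                # Read returns 0 at end of file; only $Read bytes of the buffer are valid.
                While (($Read = $Source.Read($Buffer, 0, $Buffer.Length)) -gt 0) {
                    $Gzip.Write($Buffer, 0, $Read)
                }
            }
            Finally { $Gzip.Close() }   # flushes the gzip trailer
        }
        Finally { $Target.Close() }
    }
    Finally { $Source.Close() }
}

# Demo on a throwaway temp file with repetitive content.
$DemoIn = [IO.Path]::GetTempFileName()
$DemoOut = "$DemoIn.gz"
Set-Content -Path $DemoIn -Value ("sample log line " * 1000)
Compress-JRFileStreamed -InputFile $DemoIn -OutputFile $DemoOut -BufferSize 4KB

$InSize = (Get-Item -Path $DemoIn).Length
$OutSize = (Get-Item -Path $DemoOut).Length
$Magic = [IO.File]::ReadAllBytes($DemoOut)[0..1]   # gzip files start with 0x1F 0x8B
Remove-Item -Path $DemoIn, $DemoOut
```

The output is a gzip stream rather than a zip archive, so it needs a tool that understands gzip to open it, whatever extension the file is given.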

If($Delete)
{
    Remove-Item -Path $InputFile
    $IsDeleted = $true
}
Else
{
    $IsDeleted = $false
}            

$ObjectProperties = @{ "SourceFile" = $InputFile;
                       "SourceSize" = $SourceFileSize;
                       "OutputFile" = $OutputFile;
                       "OutputSize" = $GzipFileSize;
                       "SourceFileDeleted" = $IsDeleted;
                       "CompressionRatio" = [int]$(($GzipFileSize / $SourceFileSize) * 100 )
                     }            

$Object = New-Object -Type PSObject -Property $ObjectProperties            

Write-Output $Object
}

This is mostly the tail end of a typical advanced function, and what makes PowerShell uniquely PowerShell. I handle my Delete parameter and do the needful. Lastly I create a hashtable with all the properties that I intend to send along with my object. I’ve seen so much great (and not so great) PowerShell code over the years, but so many people go to all the trouble of writing code to solve their problems and then shoot themselves in the foot by not outputting a simple object to close out their code (ideally a function).

I tend to capture and output anything that I might need now or in the future so I have it. You could probably get carried away, but I have found most of what I output useful over time. One thing on my to-do list for this and my stable of other cmdlets/functions is to prepare format.ps1xml files for my custom functions. This way they would look a little prettier in the shell. I would need to modify the $Object further before sending it along to make this work. To this point I mostly just use Select-Object to filter out what I’m really after, or I just don’t care because I never see the object on its way to some other cmdlet.
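As a rough illustration of why the output object pays off, the sketch below builds an object of the same shape by hand and trims it with Select-Object. The path and sizes are sample values chosen for the example, not output from a real run.

```powershell
# Sample values for illustration; a real run would fill these in from the files.
$SourceFileSize = 77346
$GzipFileSize = 23009

$Object = New-Object -TypeName PSObject -Property @{
    "SourceFile"        = "C:\Logs\sample.csv"
    "SourceSize"        = $SourceFileSize
    "OutputFile"        = "C:\Logs\sample.gz"
    "OutputSize"        = $GzipFileSize
    "SourceFileDeleted" = $false
    # Compressed size as a percentage of the original; the [int] cast rounds 29.75 to 30.
    "CompressionRatio"  = [int](($GzipFileSize / $SourceFileSize) * 100)
}

# Keep only the properties of interest for display or further piping.
$Summary = $Object | Select-Object -Property SourceFile, CompressionRatio
$Summary
```

Note that CompressionRatio here is the compressed size relative to the original, so lower numbers mean better compression.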

Well hopefully I can retire this someday when Microsoft adds this functionality to the PowerShell core. Until then I’ll happily continue to use this one, as well as my other gap fillers. The nice thing about a properly built PowerShell advanced function is that anyone with basic PowerShell skills can use your cmdlet and never need to have the programming skills necessary to accomplish a common task like this.

7 responses to “Compress a file with PowerShell”

  1. Hi Joel, this is a very handy and well written script you have supplied. I agree, you would think by now powershell would have some kind of built in cmdlet to compress files. For someone who is new to powershell I was able to easily follow your script and successfully compress some IIS weblogs. I was wondering, have you had success using this script to compress log files over 1 GB? I was testing it out successfully using a handful of web logs ranging from 100 ~ 200 MB in size, but trying to compress a weblog that’s ~1.3GB results in an error:

    New-Object : Exception calling “.ctor” with “1” argument(s): “Exception of type
    ‘System.OutOfMemoryException’ was thrown.”

    Thinking it was an issue with memory on the machine I ran the script on, I tried on a Windows 2003 server with 5 GB of ram, but alas the same error (unless that’s not enough either). I’ve also tried supplying the input file by piping the path with *.log using get-childitem and just supplying the file path to the exact log file to be sure it wasn’t an issue with my get-childitem piping. Either method results in the error as well. Would you have any suggestions for compressing weblogs over 1 GB? Any help you could provide would be greatly appreciated.

  2. Ben, I’m glad you found it useful. You didn’t mention whether you were running it on a 32-bit or 64-bit version of the OS, or which version of the PowerShell console you were running it under. I suspect it has something to do with that. I have not expressly tried it with a 1GB-plus file, but I do use similar code to regularly compress my IIS logs and I’m pretty sure some are over 1GB.

    I use mostly 2008R2 servers which are 64bit with 8GB or more of ram. If you are running on 64bit, make sure you are using the 64bit version of PowerShell console which would also in turn utilize the 64bit version of the .NET Framework that hosts the objects and methods the script is calling.

    I suspect that the System.OutOfMemoryException you’re seeing is not from a lack of system memory (i.e. the 5 gigs) but from a lack of process memory, as each 32-bit process can only utilize 2GB of memory. While the log file itself may be only 1.2GB in size, there is likely some added .NET and PowerShell overhead in the process memory space as well. Take a look at the PowerShell.exe process in Task Manager while you are running the script against your log; you’ll probably see it top out just over or under 2GB.

    Thanks,
    Joel.

    • Thanks for the quick reply. Sorry I left out some details in my original post. I am using PowerShell 1.0 on 32-bit Windows Server 2003 Enterprise Edition. I’m guessing this could be the issue since I’m using this on an outdated environment compared to what you’ve had success running it on. I tested this out a few more times, keeping an eye on the powershell.exe process in task manager and the process peaked at ~33 MB of memory usage each time until the script errors.

      Thanks again,
      Ben

  3. Thanks for the quick reply. Sorry I missed a few details in my first post. I am using the 32-bit version of Windows Server 2003 Enterprise Edition and PowerShell v1.0. I’m guessing that could be the issue. I tested this out a few times again keeping an eye on the powershell.exe process each time and the memory usage for the powershell.exe process peaked at about 33 MB each time.

    Thanks again,
    Ben

  4. Hi Joel,

    I’m looking to use your function to help me automate the compression of my database backup files, but when I try to run it on a large file I receive the following error:

    New-Object : Cannot convert argument “0”, with value: “19533531648”, for “Byte[]” to type “System.Int32”: “Cannot convert value “19533531648” to type “System.Int32”. Error: “Value was either too large or too small for an Int32.””
    At CompressFile.ps1:74 char:31
    + $SourceBuffer = New-Object <<<< byte[]($SourceFile.Length)
    + CategoryInfo : InvalidOperation: (:) [New-Object], MethodException
    + FullyQualifiedErrorId : ConstructorInvokedThrowException,Microsoft.PowerShell.Commands.NewObjectCommand

    I believe this is related to the max value for System.Int32 being 2,147,483,647 and I'm trying to compress a file that is larger. Can you help me get around this?

  5. Hi, the function runs and creates files just the way it should, but I get a “file does not exist” error when trying to extract and a “file is invalid” error when trying to open. The file size looks reasonable and the output shows the original size and compressed size.
    Example:
    OutputSize : 23009
    SourceFile : w:\qtrtst\bj10-Oct-12.csv
    SourceSize : 77346
    SourceFileDeleted : False
    OutputFile : F:\tmptst\bj10-Oct-12.zip
    CompressionRatio : 30
