Ever since I started using PowerShell (back in 2012) to automate boring or complex tasks, I assumed the built-in cmdlets were the best choice, not only for ease of use but also for performance. I always thought cmdlets would outperform any other alternative available. However, when I recently had to work with large text files (yes, we still have to work with text files, and they can be very large), I found myself waiting much longer than usual for results.
Before looking at the scripts and comparing their performance, let’s look at the requirement:
I needed to extract around 10% of the lines from a 22 GB file containing roughly 260 million lines, to be precise, 29,331,402 of them. As always, I reached for a simple PowerShell script. It ran for more than half an hour before I cancelled it; I had never seen a PowerShell script take that long on any large task. So I decided to try some alternatives and benchmark them. I hope the results will be useful to you as well.
To benchmark this process, I took a smaller file: 2,405 MB in size with 29.33 million lines. The test machine is a Dell Latitude E7450 with an i7-5600U 2.6 GHz CPU, 16 GB of RAM, and a solid-state drive. The task: take the first 1 million lines from this text file and write them to a new one.
Measure-Command {
    Get-Content "D:\largefile.txt" -TotalCount 1000000 |
        Out-File "D:\largefile1m.txt" -Encoding ascii
}
Execution Time: 6 Minute(s) 41 Second(s) 479 Milliseconds.
Not quite impressive, right?
Let’s try another alternative, again through a PowerShell script.
Measure-Command {
    $file = [IO.File]::OpenText("D:\largefile.txt")
    $output = New-Object System.IO.StreamWriter("D:\largefile1m.txt")
    $count = 0
    # $count++ in the condition is the only increment; a second one
    # inside the loop body would copy just half the intended lines
    while ($count++ -lt 1000000) {
        $line = $file.ReadLine()
        $output.WriteLine($line)
    }
    $file.Dispose()
    $output.Close()
}
Execution Time: 22 Second(s) 445 Millisecond(s)
In the above script, I used the .NET Framework’s IO classes through PowerShell to read and write the files. The second script is about 18 times faster than the first one.
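There is also a middle ground worth trying between raw stream calls and Get-Content: let .NET enumerate the lines lazily and keep the PowerShell pipeline for the rest. This is a sketch I have not benchmarked on this file (same paths as above assumed); Select-Object -First stops the enumeration as soon as 1 million lines have passed through, so the whole file is never read.

```powershell
Measure-Command {
    # File.ReadLines returns a lazy IEnumerable<string>; nothing is
    # read until the pipeline pulls lines, and Select-Object -First
    # terminates the pipeline after the millionth line
    [System.IO.File]::ReadLines("D:\largefile.txt") |
        Select-Object -First 1000000 |
        Set-Content "D:\largefile1m.txt" -Encoding Ascii
}
```

This keeps the one-liner convenience of the first attempt while avoiding Get-Content's per-line object overhead, though it still pays the pipeline cost for each line.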
Why not try a simple C# console application that does the same thing, and benchmark that too?
using System;
using System.Diagnostics;
using System.IO;

static void Main(string[] args)
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    StreamReader fileReader = new StreamReader(@"D:\largefile.txt");
    StreamWriter fileWriter = new StreamWriter(@"D:\largefile1m.txt");
    int line = 0;
    while (line++ < 1000000)
    {
        fileWriter.WriteLine(fileReader.ReadLine());
    }
    fileWriter.Flush();
    fileWriter.Close();
    fileReader.Close();
    sw.Stop();
    Console.WriteLine("Total Time: {0} Minutes {1} Seconds {2} Milliseconds",
        sw.Elapsed.Minutes, sw.Elapsed.Seconds, sw.Elapsed.Milliseconds);
}
Execution Time: 0 Minute(s) 2 Second(s) 816 Milliseconds.
I was surprised once again: .NET completed the whole task in under 3 seconds and produced the same output. If a simple console application compiled against the .NET Framework is roughly 133 times faster than the first script, why not open Visual Studio Code and write a few lines of code?
If you’ve made it this far, I’d love to hear your feedback.