A More Interesting Windows Subsystem for Linux Benchmark

Submitted by co60ca on Thu, 09/15/2016 - 19:11

I'll preface this article by saying I'm not a Windows or Linux expert. I know some of the internals of how a general-purpose operating system works and have applied that knowledge to WSL and Linux.

If you haven't been on the internet or watching the news for some time, you may not have heard of Bash for Windows or, perhaps more correctly, Windows Subsystem for Linux (WSL). This tool is groundbreaking for Microsoft not because it lets you use bash (the Bourne Again Shell) but because it lets you run native Linux binaries on a Windows operating system. Many tech reporters are missing the point of WSL's utility.

One such article provides some arguably misleading and uninformative statistics. In this article I intend instead to provide an extreme case that will allow you to make your own inferences about the performance of WSL and the circumstances under which it performs best, rather than statistics based on a naïve understanding of the tools. The article in question ran one test that was rather deceiving:
 

dd if=/dev/zero of=testfile bs=1G count=1 oflag=dsync

Apparently, the WSL translation layer ignores the synchronization flag. Therefore, whereas oflag=dsync would obviously result in poor performance on Linux, the same command performs so-so on WSL. Instead of redoing the test without the flag, the author chose to include this one. Why they did this, I don't know; I suppose they forgot the reason they were writing the article and wanted to show an edge case.
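To see how much the flag matters on a given system, the same write can be run with and without it. The following is only a sketch, assuming GNU dd (oflag= is a GNU extension); the function names, the 64M default block size, and the file path are illustrative choices and do not come from the article being criticized.

```shell
#!/bin/bash
# Sketch: run the same dd write with and without synchronized output.
# Assumes GNU dd. Function names, sizes, and paths are illustrative.

# Write one block of zeros to outfile, optionally with an oflag value.
dd_write() {
    local outfile=$1 block=$2 flag=$3
    if [ -n "$flag" ]; then
        dd if=/dev/zero of="$outfile" bs="$block" count=1 oflag="$flag" 2>&1
    else
        dd if=/dev/zero of="$outfile" bs="$block" count=1 2>&1
    fi
}

# Print dd's summary line (bytes copied, elapsed time, rate) for both runs.
compare_dsync() {
    local outfile=${1:-/tmp/dd_testfile} block=${2:-64M}
    echo "with oflag=dsync:"
    dd_write "$outfile" "$block" dsync | tail -n 1
    echo "without oflag=dsync:"
    dd_write "$outfile" "$block" "" | tail -n 1
    rm -f "$outfile"
}
```

Calling compare_dsync /tmp/dd_testfile 1G would mirror the size used in the original command. If the two summary lines report similar throughput, that would be consistent with the flag being ignored, though caching effects can blur the picture.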

For a more interesting benchmark, we look at how WSL works under the hood. More details are available here; however, the only important part to know is that operations that do not involve syscalls will be fast on both systems, and anything that involves syscalls will be noticeably slower on WSL. We then iterate over these slow syscalls to get a program that runs long enough that we can be confident the difference in time is not due to chance and is large relative to the resolution of the timer.
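The distinction between work that stays inside the process and work that goes through the kernel can be sketched with two small loops. This is an illustration only: the function names and iteration counts are made up, and no timings for it were measured for this article.

```shell
#!/bin/bash
# Sketch: a loop that stays inside the bash process versus one that asks
# the kernel for file metadata on every iteration.
# Function names are illustrative.

# Pure shell arithmetic: no deliberate syscalls per iteration.
compute_loop() {
    local n=$1 i=0 sum=0
    while [ "$i" -lt "$n" ]; do
        sum=$((sum + i))
        i=$((i + 1))
    done
    echo "$sum"
}

# The -e test is a shell builtin that issues a stat-family syscall each time.
syscall_loop() {
    local n=$1 i=0 hits=0
    while [ "$i" -lt "$n" ]; do
        [ -e /tmp ] && hits=$((hits + 1))
        i=$((i + 1))
    done
    echo "$hits"
}
```

Running time compute_loop 100000 and time syscall_loop 100000 on both systems should, if the reasoning above holds, show a similar compute time on each and a wider gap for the syscall loop on WSL.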

Diagram showing that Linux programs go through Lxss.sys, which remaps their syscalls to Windows-native ones.
Microsoft blog post with a high-level description of the execution of a native Linux application

From the image you can see that the Linux program is routed through the translation layer. To show a stark contrast, we use only operations that we expect will force the translation layer to do extra work on WSL, while on Linux they execute directly without translation. Instead of compiling a program we use a bash script. It is Bash on Windows, right?

#!/bin/bash

# 1001 iterations (0 through 1000 inclusive)
for i in {0..1000}
do
    # Make a new file and ensure we write some text to it
    echo "test" >> /tmp/testfile"$i"
    # Send the file's path to the sinkhole file; note that echo writes the
    # path itself and does not read the file's contents (cat would do that)
    echo /tmp/testfile"$i" >> /dev/null
    # Remove the file once we've read it
    rm /tmp/testfile"$i"
    # Attempt to prevent the OS from asynchronously deleting the file
    # This code never actually runs the inner section and was just a safeguard,
    # It however attempts to stat the file which is also a syscall
    while [ "$(stat /tmp/testfile"$i" 2> /dev/null)" ]; do
        echo "File was not deleted yet"
    done
done

Additionally, the above script was run five times on each system to get a range of timings.
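Repeated trials like these can be collected with a small wrapper around bash's built-in time keyword. This is a sketch rather than the exact procedure used for the tables; run_trials and ./benchmark.sh are hypothetical names.

```shell
#!/bin/bash
# Sketch: run a command several times and print real/user/sys per trial.
# Relies on bash's `time` keyword and its TIMEFORMAT variable.
# run_trials is a hypothetical helper name.

run_trials() {
    local trials=$1 t=1 timing
    shift
    echo "Trial real user sys"
    while [ "$t" -le "$trials" ]; do
        # TIMEFORMAT is set inside the command substitution so it does not
        # leak into the calling shell. The command's own output is
        # discarded; only the timing report (on stderr) is captured.
        timing=$( TIMEFORMAT='%3R %3U %3S'; { time "$@" > /dev/null 2>&1; } 2>&1 )
        echo "$t $timing"
        t=$((t + 1))
    done
}
```

With the script above saved as ./benchmark.sh, run_trials 5 ./benchmark.sh would print one line of real, user, and sys seconds per trial.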

WSL Performance
Trial   real       user     sys
1       27.268s    0.875s   25.719s
2       28.000s    0.781s   26.516s
3       28.219s    0.891s   26.563s
4       27.323s    0.953s   25.578s
5       27.437s    0.875s   25.469s
Avg     27.6494s   0.875s   25.969s

 

Linux Performance
Trial   real       user     sys
1       1.526s     0.063s   0.233s
2       1.526s     0.083s   0.217s
3       1.508s     0.100s   0.200s
4       1.491s     0.087s   0.203s
5       1.519s     0.097s   0.200s
Avg     1.514s     0.086s   0.2106s
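The average real times in the two tables, and how they compare, can be double-checked with a few lines of shell and awk. The numbers are copied from the tables; mean is an illustrative helper name.

```shell
#!/bin/bash
# Sketch: recompute the average "real" time for each table and their ratio.
# The values are copied from the tables; mean is an illustrative helper.

# Print the arithmetic mean of all arguments to four decimal places.
mean() {
    printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.4f\n", total / NR }'
}

wsl_avg=$(mean 27.268 28.000 28.219 27.323 27.437)
linux_avg=$(mean 1.526 1.526 1.508 1.491 1.519)

echo "WSL average real time:   ${wsl_avg}s"
echo "Linux average real time: ${linux_avg}s"
# Ratio of the two averages, i.e. how many times longer WSL took.
awk -v w="$wsl_avg" -v l="$linux_avg" 'BEGIN { printf "Slowdown: %.2fx\n", w / l }'
```

This reproduces the 27.6494s and 1.514s averages and gives a ratio of 18.26.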

 

In conclusion, we can see that WSL took 18.26 times as long as native Linux on this benchmark (27.6494s versus 1.514s average real time). Obviously this is not a perfect example of a realistic use case, but the performance of syscalls on WSL should be a concern if you intend to write or optimize applications for WSL. Applications that rely heavily on system calls will perform significantly worse on WSL. Where possible, you can optimize by reducing the number of syscalls your application makes.
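As one small illustration of reducing syscalls in a shell script, consider two ways of writing many lines to a file. This is a sketch with made-up function names; it was not part of the benchmark above and no timings were taken for it.

```shell
#!/bin/bash
# Sketch: two ways to write n lines to a file.
# Function names are illustrative.

# Opens and closes the file once per iteration (n open/close pairs).
write_per_line() {
    local outfile=$1 n=$2 i=0
    : > "$outfile"
    while [ "$i" -lt "$n" ]; do
        echo "test $i" >> "$outfile"
        i=$((i + 1))
    done
}

# Opens the file once for the whole loop. Each echo is still a write,
# but the per-iteration open and close are gone.
write_batched() {
    local outfile=$1 n=$2 i=0
    while [ "$i" -lt "$n" ]; do
        echo "test $i"
        i=$((i + 1))
    done > "$outfile"
}
```

Both functions produce the same file, so the second form is a drop-in change. The same idea applies to replacing an external command such as stat with the shell's built-in -e test, which avoids creating a new process on every iteration.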