Restless Bytes

A few weeks ago I came across this intriguing little post: Hello world.

I found that to be an interesting investigation and decided to do something similar. But instead of simply repeating the whole process with some new languages plugged in, I wanted to make it more thematic and test a few hypotheses:

Script vs Executable

The author of the original post has “deliberately chosen not to account for” the difference between “interpreted” and “compiled” modes of execution. I do want to account for that and so the first hypothesis is:

1. Executables need fewer syscalls than scripts.

That should be the case, considering that you fire up your whole interpreter every time you run a script.

It’s a Lisp!

I’m a big Lisp fan. I like to dive into and tinker with various flavours of that family of languages. Also, since most languages in the original post are Algol/C-like languages, I want to add some Lisp flavour to the list.

Now, the thing is that Lisp isn’t known for its speed, with the notable exception of Common Lisp. So my second hypothesis is:

2. Algol family languages do better than Lisp languages.

The (not-yet-holy) Graal¹ of the JVM

I’ve been experimenting a lot with Oracle’s GraalVM lately and am kind of hooked on what you can do with it. I won’t go into it here, but two of the features of GraalVM are better overall performance and the ability to compile JVM bytecode to native binaries called “native images”. I want to include both features in my test, and so my third and fourth hypotheses are going to be:

3. Java on GraalVM performs better than Java on OpenJDK (“HotSpot”).
4. Native images perform better than “ordinary” binaries.

Tests and results

Disclaimer 1: A program that makes fewer system calls isn’t necessarily better than one that makes more.

While I agree with the statement in the original post that a lot of syscalls add a lot of complexity and impact performance negatively, I don’t think it is that big of an issue. Most of the programs were run only once, effectively performing a cold start, so a lot of time is spent on context switching, memory allocation, populating caches and pre-loading frequently used objects and libraries. This initial overhead pays off eventually, especially if you’re doing a bit more than printing “hello” and exiting.

Disclaimer 2: It should go without saying that all of the hypotheses above were made up before the tests. I had no idea which ones would hold and which ones would break.

With that out of the way, let’s have a quick look at the test setup:

First of all, I’ve settled on 7 languages in total, with 4 of them being Algol-like (C, Java, JavaScript and Python) and 3 being Lisps (Common Lisp, Racket, Clojure). Furthermore, Common Lisp comes in two flavours: Steel Bank Common Lisp (SBCL) and Embeddable Common Lisp (ECL).

5 of those languages are also featured on GraalVM: Java, Clojure, C, Python and JavaScript; the latter two are tested exclusively on GraalVM. Besides simply running those languages on GraalVM, another important aspect that we want to cover is the performance of native images created from those languages (where possible).

Most languages in our test support more than one “mode of execution”, which means that you can run them directly as a script or compile them first and run them via an executable. One language, however - namely C - can only be executed as a compiled executable, while two others - Python and JavaScript - cannot be compiled and are therefore always run as scripts.

Speaking of modes of execution, some degree of variation persists for executables as well, due to the different ways of creating them. For instance, binaries can be built with libraries linked dynamically or statically. Java can be executed as bytecode or compiled and executed as a jar file. And don’t forget about GraalVM native images!

By the way, all source code, setup, build, and run scripts etc. used in this post can be found on GitHub - check it out (pun intended) in case you’re interested :)
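The write-up does not say which tool produced the counts. The time, syscalls and errors columns of the results table look like the summary that `strace -c` prints, so here is a minimal sketch, assuming that tool, of how the totals could be pulled out of such a summary. The sample text and the name `parse_total` are made up for illustration and are not taken from the repository.

```python
import re

# Hypothetical sample laid out like a `strace -c` summary (an assumption; the
# post does not name its tool). The figures are invented, not measured.
ROW = "{:>6} {:>11} {:>11} {:>9} {:>9} {}"
RULER = ROW.format("-" * 6, "-" * 11, "-" * 11, "-" * 9, "-" * 9, "-" * 16)
SAMPLE = "\n".join([
    ROW.format("% time", "seconds", "usecs/call", "calls", "errors", "syscall"),
    RULER,
    ROW.format("0.00", "0.000000", "0", "2", "", "write"),
    ROW.format("0.00", "0.000000", "0", "1", "1", "access"),
    RULER,
    ROW.format("100.00", "0.000000", "", "3", "1", "total"),
])


def parse_total(summary: str) -> dict:
    """Return seconds, calls and errors from the 'total' line of a summary."""
    lines = summary.splitlines()
    # The dashed ruler marks where each column starts and ends, so an empty
    # cell (e.g. no errors) cannot shift the remaining values.
    ruler_index = next(i for i, line in enumerate(lines) if line.startswith("---"))
    spans = [m.span() for m in re.finditer(r"-+", lines[ruler_index])]
    names = [lines[ruler_index - 1][a:b].strip() for a, b in spans]
    total = next(line for line in reversed(lines) if line.rstrip().endswith("total"))
    row = dict(zip(names, (total[a:b].strip() for a, b in spans)))
    return {
        "seconds": float(row["seconds"]),
        "calls": int(row["calls"]),
        "errors": int(row["errors"] or 0),
    }


print(parse_total(SAMPLE))
```

Slicing by the ruler's column positions rather than splitting on whitespace is a deliberate choice here: a blank errors or usecs/call cell would otherwise make the remaining fields ambiguous.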

But now, with the preparations done and working hypotheses in place, we can finally dive straight into testing and evaluation - and here we go - the results!²

rank	Language	Approach	time (sec)	syscalls	errors	size (script or binary)
1	C	static	0.000000	12	1	852K
2	C	native img (LLVM, static)	0.000000	12	1	860K
3	C	dynamic	0.000000	34	2	17K
4	C	native img (LLVM)	0.000000	34	2	21K
5	Clojure	native img (jar, static)	0.000248	63	2	9.7M
6	Clojure	native img (jar)	0.005229	127	3	8M
7	Java	native img (class)	0.000687	128	3	6.9M
8	Java	graalVM (java)	0.010598	158	40	428
9	Java	javac / java	0.012145	159	41	428
10	Clojure	uberjar	0.044244	175	41	4.4M
11	JavaScript	graalVM (node)	0.001197	196	101	38
12	SBCL	binary	0.000497	225	4	37M
13	SBCL	Lisp image	0.000860	233	4	43M
14	JavaScript	graalVM (js)	0.000292	262	5	38
15	ECL	binary	0.000740	298	16	15K
16	ECL	script	0.001654	336	32	66
17	SBCL	script	0.000806	334	16	66
18	C	graalVM (lli)	0.003084	411	12	21K
19	Clojure	script	0.112218	520	83	43
20	Clojure	lein run	0.377057	780	114	130
21	Racket	binary	0.003183	1612	11	1.2M
22	Python	graalpython	0.001441	1703	43	28
23	Racket	script	0.001598	2904	89	49
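The footnote on the ranking gives the sort order as syscalls first, then time, then size. A minimal sketch of that ordering, using a handful of rows from the table, could look like this. How the size column is converted to bytes is my assumption (K and M as binary multiples, a bare number as bytes), and the helper names are made up for illustration.

```python
# A few rows copied from the table above, deliberately out of order:
# (language, approach, seconds, syscalls, size as printed)
rows = [
    ("Java", "graalVM (java)", 0.010598, 158, "428"),
    ("C", "native img (LLVM)", 0.000000, 34, "21K"),
    ("C", "native img (LLVM, static)", 0.000000, 12, "860K"),
    ("Clojure", "native img (jar, static)", 0.000248, 63, "9.7M"),
    ("C", "dynamic", 0.000000, 34, "17K"),
    ("C", "static", 0.000000, 12, "852K"),
]

# Assumption: K and M are binary multiples and a bare number is in bytes.
UNITS = {"K": 1024, "M": 1024 ** 2}


def size_in_bytes(text: str) -> float:
    """Convert a size such as '852K', '9.7M' or '428' to bytes."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


def rank_key(row):
    _language, _approach, seconds, syscalls, size = row
    # Tuples compare element by element, which gives the tie-breaking order:
    # syscalls first, then time, then size.
    return (syscalls, seconds, size_in_bytes(size))


ranked = sorted(rows, key=rank_key)
for position, (language, approach, *_rest) in enumerate(ranked, start=1):
    print(position, language, approach)
```

The printed positions refer to this six-row subset only, not to the ranks in the full table. The two statically linked C builds tie on syscalls and time, so only the size decides which one comes first.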

Discussion

Description of the results

The languages in question were tested in a total of 23 different setups, taking into consideration different modes of execution per language and platform.

The overall best language in terms of “least number of syscalls” is C in the form of statically linked binaries created via gcc, with a total of 12 syscalls. It was ranked no. 1 ahead of the statically linked C native image, which makes the same number of syscalls, due to the slightly smaller size of the binary file.

The worst language turned out to be Racket with 2904 system calls for a simple “Hello World” which is 2892 system calls more than the first place.

The upper half of the ranking (i.e. ranks 1 to 12 inclusive) consists of C, Java, Clojure, JavaScript and SBCL. Counting each mode of execution separately, C appears 4 times, Java and Clojure 3 times each, and JavaScript and SBCL once each. Looking at modes of execution only, 9 setups were run as executables while 3 were run as scripts.

The lower half of the ranking (i.e. ranks 13 to 23) consists of SBCL, ECL, JavaScript, Python, C, Clojure and Racket. Counting each mode of execution separately, Clojure, SBCL, ECL and Racket each appear twice, and JavaScript, C and Python only once. Furthermore, 7 setups were run as scripts while 4 were run via executables.
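The per-language counts for the two halves can be checked by tallying the Language column of the results table. This is only a sketch; the list below is copied from that column in rank order, and the variable names are my own.

```python
from collections import Counter

# Language column of the results table, in rank order (ranks 1 to 23).
languages_by_rank = [
    # ranks 1-12
    "C", "C", "C", "C", "Clojure", "Clojure", "Java", "Java", "Java",
    "Clojure", "JavaScript", "SBCL",
    # ranks 13-23
    "SBCL", "JavaScript", "ECL", "ECL", "SBCL", "C", "Clojure", "Clojure",
    "Racket", "Python", "Racket",
]

upper_half = Counter(languages_by_rank[:12])
lower_half = Counter(languages_by_rank[12:])

print("upper:", dict(upper_half))
print("lower:", dict(lower_half))
```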

Hypothesis checks

1. Executables need fewer syscalls than scripts.

Yes, but that does not really come as a surprise. Only 3 of 12 places in the upper half go to “interpreted” modes of execution (Java (2x) and JavaScript). Running scripts via an interpreter means that you first have to fire up your interpreter. This also means a “cold start” in a lot of cases, so no libraries, caches, etc. are available from previous runs.

2. Algol family languages do better than Lisp languages.

Yes, that’s true as well. Although Clojure’s native images rank as high as fifth, most Lisp setups are found in the lower half of the ranking, and with Racket a Lisp even takes last place.

3. Java on GraalVM performs better than Java on OpenJDK (“HotSpot”).

No, that one doesn’t really hold: Java on GraalVM took 158 syscalls vs 159 syscalls by Java on OpenJDK which doesn’t make much of a difference.

4. Native images perform better than “ordinary” binaries.

Yes, native images fare better, not only in terms of syscalls but performance as well.

Since we only produced native images from Clojure and C (and Java), I won’t compare them with all binaries but only with the binaries and “executables” built from Clojure and C (Java is left out since I didn’t create a jar from it):

C native images are pretty much on par with C binaries though they tend to be slightly “bigger” (as in “a few kB bigger”).

Comparing (uber-) jars with native images, things get a little more interesting: the Clojure uberjar makes 48 more syscalls than the dynamically linked Clojure native image (175 vs 127). Linking the image statically saves another 64 syscalls, bringing it down to 63 (175 vs 63).

Add to that the fact that the uberjar took 0.044 seconds while the native images took only 0.005 (dyn.) and 0.0002 (stat.) seconds.

The only drawback is that native images were roughly double the size of uberjars.
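The uberjar comparison boils down to a few subtractions and ratios, which can be redone from the Clojure rows of the results table. This is a rough sketch; the dictionary layout and the `compare` helper are made up for illustration, and since the programs were run only once the speedups should be read as ballpark figures.

```python
# Clojure rows from the results table: syscalls, time in seconds, size in MB.
uberjar = {"syscalls": 175, "seconds": 0.044244, "size_mb": 4.4}
native_dynamic = {"syscalls": 127, "seconds": 0.005229, "size_mb": 8.0}
native_static = {"syscalls": 63, "seconds": 0.000248, "size_mb": 9.7}


def compare(base: dict, other: dict) -> dict:
    """Describe how `other` differs from `base`."""
    return {
        "syscalls_saved": base["syscalls"] - other["syscalls"],
        "speedup": base["seconds"] / other["seconds"],
        "size_ratio": other["size_mb"] / base["size_mb"],
    }


for name, image in [("dynamic", native_dynamic), ("static", native_static)]:
    result = compare(uberjar, image)
    print(
        f"{name}: {result['syscalls_saved']} fewer syscalls, "
        f"{result['speedup']:.1f}x faster, "
        f"{result['size_ratio']:.1f}x the size"
    )
```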

Conclusions

3 of our 4 hypotheses turned out to be correct which is pretty solid.

I was also delighted to see how well native images performed. I think that this will become a pretty important topic in the Java / Container world in the coming years.

What really surprised me in a not-so-good way was Racket’s performance: ~2,900 system calls is awful!

On the bright side, despite making thousands of syscalls, Racket manages to be pretty fast and to produce relatively small binaries. So there’s some redemption to be achieved here, I guess.

But, overall, a funny and intriguing little experiment :)

¹ “graal” is an archaic form of “grail”.
² Ranked in ascending order; ordered by no. of syscalls > time (sec) > size.