by spankalee 93 days ago
Avoiding Java's string footguns is an interesting problem in programming language design.

The String.format() problem is most immediately a bad compiler and bad implementation, IMO. It's not difficult to special-case literal strings as the first argument, do parsing at compile time, and pass in a structured representation. The method could also do runtime caching. Even a very small LRU cache would fix a lot of common cases. At the very least they should let you make a formatter from a specific format string and reuse it, like you can with regexes, to explicitly opt into better performance.
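
As a rough illustration of the "make a formatter from a specific format string and reuse it" idea, here is a minimal sketch in the spirit of Pattern.compile. CompiledFormat is a hypothetical class, not a JDK API, and it only understands %s and %% rather than the full java.util.Formatter syntax. The format string is parsed once into literal segments, and every later call only appends.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the JDK): parse a format string once, reuse it many times. */
final class CompiledFormat {
    // literals[i] is the text before argument i; the last entry is the trailing text.
    private final String[] literals;

    private CompiledFormat(String[] literals) {
        this.literals = literals;
    }

    /** One-time parse step, analogous to Pattern.compile. Only %s and %% are supported. */
    static CompiledFormat compile(String format) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < format.length(); i++) {
            char c = format.charAt(i);
            if (c != '%') {
                current.append(c);
                continue;
            }
            if (i + 1 >= format.length()) {
                throw new IllegalArgumentException("dangling % at end of format");
            }
            char spec = format.charAt(++i);
            if (spec == '%') {
                current.append('%');            // escaped percent sign
            } else if (spec == 's') {
                parts.add(current.toString());  // close the literal before this argument
                current.setLength(0);
            } else {
                throw new IllegalArgumentException("unsupported specifier: %" + spec);
            }
        }
        parts.add(current.toString());
        return new CompiledFormat(parts.toArray(new String[0]));
    }

    /** Hot path: no parsing, just appends of precomputed literals and arguments. */
    String format(Object... args) {
        if (args.length != literals.length - 1) {
            throw new IllegalArgumentException(
                    "expected " + (literals.length - 1) + " arguments, got " + args.length);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < args.length; i++) {
            sb.append(literals[i]).append(args[i]);
        }
        return sb.append(literals[args.length]).toString();
    }
}
```

Usage would look like keeping `CompiledFormat.compile("%s scored %s%%")` in a static final field and calling `format("Ada", 97)` on it per request. A small LRU cache keyed by the format string could hide the same object behind a String.format-style call.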

But ultimately the string templates proposal should come back and fix this at the language level. Better syntax and guaranteed compile-time construction of the template. The language should help the developer do the fast thing.

String concatenation is a little trickier. In a JIT'ed language you have a lot of options for making a hierarchy of string implementations that optimize different usage patterns, and still be fast - and what you really want for concatenation is a RopeString, like JS VMs have, that simply references the other strings. The issue is that you don't want virtual calls for hot-path string method calls.

Java chose a single final class so all calls are direct. But they should have been able to have a very small sealed class hierarchy where most methods are final and directly callable, and the virtual methods for accessing storage are devirtualized in optimized methods that only ever see one or two classes through a call site.

To me, that's a small complexity cost to make common string patterns fast, instead of requiring StringBuilder.
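
To make the sealed-hierarchy idea concrete, here is a small sketch using Java 17 sealed types. The names Str, Flat, and Concat are made up for illustration, and this is not how java.lang.String works or a proposal for its internals. Concatenation only allocates a node that references both operands, and the characters are copied once when the rope is flattened.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical two-class string hierarchy: a flat string and a rope node. */
sealed interface Str permits Flat, Concat {
    int length();

    static Str of(String s) {
        return new Flat(s);
    }

    /** O(1): references both operands instead of copying their characters. */
    default Str concat(Str other) {
        return new Concat(this, other);
    }

    /** Copies every character exactly once; iterative so deep ropes cannot overflow the call stack. */
    default String flatten() {
        StringBuilder sb = new StringBuilder(length());
        Deque<Str> pending = new ArrayDeque<>();
        pending.push(this);
        while (!pending.isEmpty()) {
            Str s = pending.pop();
            if (s instanceof Concat c) {
                pending.push(c.right());   // visited after the left side
                pending.push(c.left());
            } else {
                sb.append(((Flat) s).value());
            }
        }
        return sb.toString();
    }
}

/** Plain storage, the common case. */
record Flat(String value) implements Str {
    public int length() {
        return value.length();
    }
}

/** Rope node; the length is cached so length() stays O(1). */
final class Concat implements Str {
    private final Str left;
    private final Str right;
    private final int length;

    Concat(Str left, Str right) {
        this.left = left;
        this.right = right;
        this.length = left.length() + right.length();
    }

    Str left() { return left; }
    Str right() { return right; }

    public int length() { return length; }
}
```

Because the interface is sealed with only two implementations, a call site such as `s.length()` can only ever see Flat or Concat, which is the situation where a JIT can plausibly devirtualize the call. Whether a given JVM actually does so is an assumption here, not something this sketch demonstrates.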

5 comments

It's interesting to see how something like Common Lisp handles the format issue.

In CL, there's a general infrastructure called "compiler macros" that is intended as a hint to the compiler to expand calls as macros at compile time. The macro is also allowed to just leave the form unexpanded, in which case it defaults to an unexpanded function call. And the function can be turned into a value itself and passed around, even if the compiler macro exists.

For CL's format, this means an implementation will typically have a compiler macro (or some similar mechanism) that does an expansion if the format is a string constant.

CL also has a macro called formatter that takes a format string and returns a function that acts like (lambda (stream &rest args) (apply #'format stream <the format string> args)). It can be implemented as something that expands the format string into code and then compiles the code.

The mechanisms in CL would allow a user to implement the equivalent of a format compiler macro (and formatter) even if the implementation didn't provide them.

> But ultimately the string templates proposal should come back and fix this at the language level.

They tried, but its opponents diluted it to the point of uselessness, and now they will forever use this failed attempt as a wedge.

I'm sorry, I don't believe Java will get sensible String templates in our lifetime.

Yeah, Java is pretty fast despite the fact that it still has these kinds of obviously suboptimal things going on.

I love how Zig, D and Rust do exactly what you say: parse the format string at compile time, making it super efficient at runtime (no parsing, no regex, just the optimal code to get the string you need).

I say this, but I write most of my code in Java/Kotlin :D . I just wish I could write in more low-level languages for super efficient code, but for what I do, Java is more than enough.

Kotlin string interpolation turns into the fancy invokedynamic based string concatenation behind the scenes so it's very optimized: https://openjdk.org/jeps/280
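
For readers who have not looked at JEP 280, the bootstrap method behind it can also be called by hand. The sketch below is only an approximation of what a compiler-emitted invokedynamic call site does: a compiler emits the instruction and the JVM links it once, whereas here the call site is built explicitly and its handle is kept in a static field. IndyConcatSketch and greeting are illustrative names. In the recipe string, each \u0001 marks where an argument is spliced in.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

final class IndyConcatSketch {
    // Built once, roughly like a linked invokedynamic call site.
    private static final MethodHandle GREETING = bootstrap();

    private static MethodHandle bootstrap() {
        try {
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),
                    "greeting", // name of the call site; arbitrary for this sketch
                    MethodType.methodType(String.class, String.class, int.class),
                    "Hello, \u0001! You have \u0001 new messages.");
            return site.dynamicInvoker();
        } catch (Exception e) { // StringConcatException is a checked exception
            throw new ExceptionInInitializerError(e);
        }
    }

    /** The recipe was processed at bootstrap time; this call only runs the generated strategy. */
    static String greeting(String name, int count) {
        try {
            return (String) GREETING.invokeExact(name, count);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

The point of the design is that the shape of the concatenation (constants, argument types, argument order) is known when the call site is linked, so the runtime can pick a strategy once instead of re-deriving it on every call.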

I admire Rich Hickey's approach of building on top of the Java ecosystem for this reason, adding a functional-first approach with emphasis on data structures, where using the right algorithms comes naturally.

> Zig, D and Rust

Also C++, which works the same way.

Sometimes when you run strace on JVM software, you will see syscall patterns that are incredibly inefficient.
Perhaps something like Zig's comptime would help a bit.