Hacker News
by fauigerzigerk 4594 days ago
>As an aside, there is a well-known gotcha in Java and other languages that when you slice a string to save a small piece, the reference to the original keeps the entire original string in memory even though only a small amount is still needed. Go has this gotcha too.

That is no longer the case in Java. String.substring() now makes a copy. I think it doesn't matter much which of the two approaches a language takes as long as everybody knows it. This needs to be in the language spec and can't be an implementation issue.


For historical note, this change was made in May 2012, in Java 7u6.
That's kind of a big deal for a point release...
Still, it is a change that could dramatically affect the behavior of some programs. A program that takes many substrings of one large string might consume a lot more memory now that those substrings are copies rather than references into the original.
Here's more nitty-gritty from Oracle: http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-Ma...

If I'm following, a Java String used to be a char[] plus offset and count ints, and this change let them drop those ints. You save RAM if you have a lot of little strings, but pay for extra copying if you take lots of substrings.

Go slices/strings don't have a pointer to the "original" backing array, just a pointer to the first byte of this (sub)string plus a length (and, for slices, a capacity). So Go doesn't need extra fields to take substrings by reference.
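One way to see the pointer-plus-length header is to look at the data pointer of a string and of a substring. This is a minimal sketch, assuming Go 1.20+ for `unsafe.StringData`; the `header` helper is hypothetical, and the `uintptr` conversion is used only for printing the offset, never converted back to a pointer.

```go
package main

import (
	"fmt"
	"unsafe"
)

// header (hypothetical helper) returns the two words a Go string header
// holds: the address of its first byte and its length in bytes.
func header(s string) (uintptr, int) {
	return uintptr(unsafe.Pointer(unsafe.StringData(s))), len(s)
}

func main() {
	s := "hello, world"
	sub := s[7:] // "world"

	sp, sn := header(s)
	subp, subn := header(sub)

	// sub's pointer lands 7 bytes into s's bytes: an internal pointer,
	// with no separate offset field stored anywhere.
	fmt.Println(subp-sp, sn, subn) // 7 12 5
}
```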

I think part of the technical reason for the different string headers is that the Java designers didn't want their GC to have to handle "internal pointers" into strings/objects (maybe for performance reasons?), whereas the Go designers decided to support 'em (maybe to support more C-like code in Go?).

Go does not support internal pointers into strings. You have to use slicing for that.
Sorry, I mean that there's an internal pointer in Go's in-memory representation of the string, not that there's a naked byte pointer directly visible to the programmer.

Because Go's GC supports internal pointers, Go can use a pointer-and-length representation for substring references. Java's lack of support for them means its string representation needs a pointer to the start of the char array and a separate offset and count in order to do the same substring-reference trick. (And, I'm saying, that helps explain why Java and Go now do substrings differently.)

There are other places where Go's ability to use internal pointers is exposed more directly to the programmer: for example, Go lets you take the address of an array element or struct field and pass around the resulting pointer.
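A small illustration of that: the same function can take a pointer to a standalone int, an array element, or a struct field. The names `point` and `bump` are made up for the example.

```go
package main

import "fmt"

// point is a hypothetical struct used only for this example.
type point struct {
	X, Y int
}

// bump increments the int that p points at; it neither knows nor cares
// whether that int is a standalone variable, an array element, or a field.
func bump(p *int) { *p++ }

func main() {
	arr := [3]int{10, 20, 30}
	pt := point{X: 1, Y: 2}

	bump(&arr[1]) // internal pointer into the array
	bump(&pt.Y)   // internal pointer into the struct

	fmt.Println(arr, pt) // [10 21 30] {1 3}
}
```

The flip side is the same as with substrings: while such an internal pointer is live, the GC keeps the whole enclosing allocation alive, not just the element or field it points at.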