Go Data Structures (2009) (research.swtch.com)
72 points by zachorr 4594 days ago
4 comments

>As an aside, there is a well-known gotcha in Java and other languages that when you slice a string to save a small piece, the reference to the original keeps the entire original string in memory even though only a small amount is still needed. Go has this gotcha too.

That is no longer the case in Java. String.substring() now makes a copy. I think it doesn't matter much which of the two approaches a language takes as long as everybody knows it. This needs to be in the language spec and can't be an implementation issue.

For the historical record, this change was made in May 2012, in Java 7u6.
That's kind of a big deal for a point release...
Still, it is a change that could dramatically affect the behavior of some programs. A program that takes many references to one string might consume a lot of memory when those references become copies.
Here's more nitty-gritty from Oracle: http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-Ma...

If I'm following, a Java string used to be a []char and offset/count ints, and this change let them drop those ints. You saved RAM if you had a lot of little strings, but paid for extra copying if you took lots of substrings.

Go slices/strings don't have a pointer to the "original" backing array, just a pointer to the first byte in this (sub)string. It doesn't need extra fields to do substrings by reference.
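The by-reference substring behavior described above can be checked directly. This is a minimal sketch, assuming the standard gc toolchain and Go 1.20 or later (for unsafe.StringData); the helper names sharesBacking and demo are made up for illustration. It also shows strings.Clone (Go 1.18+) as one way to detach a small piece from a large original.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// sharesBacking reports whether sub starts exactly off bytes into the
// bytes that back s, i.e. whether slicing avoided copying any data.
// (Hypothetical helper, not part of any library.)
func sharesBacking(s, sub string, off int) bool {
	base := unsafe.Pointer(unsafe.StringData(s))
	return unsafe.Pointer(unsafe.StringData(sub)) == unsafe.Add(base, off)
}

// demo returns (substring shares bytes, clone shares bytes).
func demo() (bool, bool) {
	// Build the string at run time so nothing is folded at compile time.
	s := strings.Repeat("ab", 8) // 16 bytes
	sub := s[4:10]
	return sharesBacking(s, sub, 4), sharesBacking(s, strings.Clone(sub), 4)
}

func main() {
	var s string
	// With gc, a string header is two words: a data pointer and a length.
	fmt.Println(unsafe.Sizeof(s) == 2*unsafe.Sizeof(uintptr(0)))

	shared, cloned := demo()
	fmt.Println(shared) // the substring points into the original bytes
	fmt.Println(cloned) // the clone has its own bytes
}
```

Because the substring keeps a pointer into the original bytes, the whole original stays reachable; cloning the small piece is what lets the large string be collected.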

I think part of the technical reason for the different string headers is that the Java designers didn't want their GC to have to handle "internal pointers" into strings/objects (maybe for performance reasons?), whereas the Go designers decided to support 'em (maybe to support more C-like code in Go?).

Go does not support internal pointers into strings. You have to use slicing for that.
Note that this is from 2009. Although the main details have not changed, the int type is more commonly 64 bits now (since 64-bit architectures are much more common).
Do you know what version that happens in? I tested on my 32-bit and 64-bit platforms with Go 1.1, and a static definition of an integer results in type int (which is explicitly 32-bit).

  package main

  import (
      "fmt"
      "reflect"
  )

  func main() {
      i := 3 // untyped constant 3 defaults to type int
      z := reflect.ValueOf(i)
      fmt.Printf("%s\n", z.Kind()) // prints the kind's name, not its size
  }
  // $ ./test
  // int
  // $
It's my understanding that this is intentional and won't change; only explicit declarations of int64 are 64-bit.
It is implementation-specific. From the spec:

There is also a set of predeclared numeric types with implementation-specific sizes:

  uint     either 32 or 64 bits
  int      same size as uint
  uintptr  an unsigned integer large enough to store the uninterpreted bits of a pointer value

The size of int on 64-bit systems was increased to 64 bits as of Go 1.1: http://golang.org/doc/go1.1#int
Cool, thanks for the clarification, this makes sense!
i := 3 means declare i to be an "int", which is the default numeric type. The size of that int will vary from platform to platform. See http://golang.org/ref/spec#Numeric_types
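The test program earlier in the thread prints the kind's name, which is "int" everywhere, so it cannot reveal the width. A small sketch that measures the size instead (intBits is a made-up helper name; strconv.IntSize is the standard library constant for the same value):

```go
package main

import (
	"fmt"
	"strconv"
	"unsafe"
)

// intBits returns the width in bits of this implementation's int type.
func intBits() int {
	var i int
	return int(unsafe.Sizeof(i)) * 8
}

func main() {
	i := 3 // untyped constant; i gets the default type int
	fmt.Printf("%T\n", i) // the type name is "int" on every platform

	fmt.Println(intBits())                    // 32 or 64, depending on the implementation
	fmt.Println(strconv.IntSize == intBits()) // the library constant agrees
}
```

Running this on a 32-bit and a 64-bit build should show the difference that the type name alone hides.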
Awesome, thanks!
I wish there were a way to create custom data structures without casting to and from interface{} all the time. Heck, it would already help if there were a shorthand for interface{}, like "any" or something.
The usual pattern is to use a type that requires the thing you pass in to have methods that you use for the data structure, like sort.Interface[1]. It's faster, safer, and better than using interface{}.

As for shorthand, behold!

    type any interface{}
[1]: http://golang.org/pkg/sort/#Interface
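For anyone who hasn't used the sort.Interface pattern mentioned above, here is a rough sketch. The byLength type and sortedByLength helper are invented for illustration; only sort.Interface and sort.Sort come from the standard library.

```go
package main

import (
	"fmt"
	"sort"
)

// byLength is a hypothetical type that orders strings by length.
// It satisfies sort.Interface by providing Len, Less, and Swap.
type byLength []string

func (b byLength) Len() int           { return len(b) }
func (b byLength) Less(i, j int) bool { return len(b[i]) < len(b[j]) }
func (b byLength) Swap(i, j int)      { b[i], b[j] = b[j], b[i] }

// sortedByLength returns a copy of words ordered from shortest to longest.
func sortedByLength(words []string) []string {
	out := append([]string(nil), words...) // copy so the input is untouched
	sort.Sort(byLength(out))               // no interface{} casts needed
	return out
}

func main() {
	fmt.Println(sortedByLength([]string{"slice", "go", "map"})) // [go map slice]
}
```

The sorting code only ever talks to the three methods, so the element type never has to pass through interface{}.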
That introduces a new named type though, i.e., the "any" in your package is different from the "any" in mine, which is not what I want.

(Unless I'm mistaken here, which might very well be the case.)

Check it out: http://play.golang.org/p/l9yn0PRbrd

Anyway, that's the point of Go's implicit interface satisfaction: if the object implements the necessary parts of the interface, it counts as that kind of object.
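In case the playground link stops working, here is a sketch of the same idea. The names anyA and anyB are made up and stand in for an "any" declared separately in two packages; this is my reconstruction, not necessarily what the linked snippet contains.

```go
package main

import "fmt"

// anyA and anyB are two distinct named types with the same (empty) method set.
type anyA interface{}
type anyB interface{}

// takesA accepts only the anyA spelling of the empty interface.
func takesA(v anyA) string { return fmt.Sprint(v) }

// roundTrip receives a value as anyB and hands it to code that wants anyA.
func roundTrip(v anyB) string {
	var a anyA = v // plain assignment compiles: every value implements anyA
	return takesA(a)
}

func main() {
	fmt.Println(roundTrip(42)) // 42
}
```

The two names are different types, but since any value implements an empty interface, values move between them without a conversion.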

make(*Point) seems much better than having a separate new keyword. Surprised to hear that was changed after just a few days.
The "new" keyword is practically unused in modern Go development, but is kept for backwards compatibility. The usual way to make a point is "p := &Point{}", without using any keyword.
Not true. I count "new" being used about half as often as "&Point{}" in the Go standard library. That's not "practically unused".

  g% cg -c -f 'g/go/src/pkg.*\.go' '\bnew\(' | total 2
  1485
  g% cg -c -f 'g/go/src/pkg.*\.go' '\&[A-Za-z0-9_.]+\{' | total 2
  3051
  g% cg -c -f 'g/go/src/pkg.*\.go' . | total 2
  430482
  g%
So: 430,482 non-blank lines of code, 1,485 lines with new, and 3,051 lines that look like a struct pointer literal.
If I had to guess, I think they meant that anyone writing new code will avoid using 'new'.
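For reference, the two spellings being counted above behave the same for structs. A minimal sketch, with Point as a hypothetical struct and bothZero as a made-up helper; note that new is a built-in function in Go rather than a keyword.

```go
package main

import "fmt"

// Point is a hypothetical struct used only for this comparison.
type Point struct{ X, Y int }

// bothZero reports whether new(Point) and &Point{} yield equal zero values
// stored in two distinct allocations.
func bothZero() bool {
	p := new(Point) // pointer to a zeroed Point
	q := &Point{}   // same result, written as a composite literal
	return *p == *q && p != q
}

func main() {
	fmt.Println(bothZero()) // true

	// new also works for types with no composite literal form, such as int.
	n := new(int)
	*n = 7
	fmt.Println(*n) // 7
}
```

The literal form has the advantage that fields can be set inline, for example &Point{X: 1}, which may be part of why it shows up more often in the counts.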