Hacker News
by mwsherman 268 days ago
I move between Go and C#. I wrote a zero-allocation package in Go [1] and then ported it to C#, and the allocations exploded!

I had forgotten, or perhaps never realized, that substrings in C# allocate. The solution was Spans.

Notably, it caused me to realize that Go had “spans” designed in from the start.

[1] https://github.com/clipperhouse/uax29

2 comments

Strings in C# are immutable.

To work with strings you should use StringBuilder.

  > Strings in C# are immutable.
Yes, but

  > To work with strings you should use StringBuilder.
It helps combine strings together. The author needed the opposite - split/slice strings.
Eric Lippert describes the difference between immutability and what he calls "persistence" and explains why C#/.NET copies the string contents to make a substring: https://stackoverflow.com/a/6750591/814422

Go's strings are also immutable and yet substrings share the same internal memory. Java/JVM also has immutable strings and yet substrings shared the char[] array of the parent string up until Java 7, when they switched to copying instead (for the same reason as .NET): https://mail.openjdk.org/pipermail/core-libs-dev/2012-June/0...
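To make the sharing claim concrete, here is a minimal sketch that checks whether a Go substring points into its parent's bytes. The helper names `sharesBacking` and `substringDemo` are made up for illustration, and `unsafe.StringData` needs Go 1.20 or later. `strings.Clone` stands in for the copying behaviour described for .NET and newer JVMs.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// sharesBacking reports whether sub begins exactly offset bytes into the
// memory that backs parent, i.e. whether slicing avoided a copy.
// Hypothetical helper, for illustration only.
func sharesBacking(parent, sub string, offset int) bool {
	if len(sub) == 0 || offset < 0 || offset >= len(parent) {
		return false
	}
	base := unsafe.Pointer(unsafe.StringData(parent))
	return unsafe.Pointer(unsafe.StringData(sub)) == unsafe.Add(base, offset)
}

// substringDemo builds a string at run time, then compares a plain
// substring with an explicit copy of that substring.
func substringDemo() (sliceShares, cloneShares bool) {
	parent := strings.Repeat("go", 4) // "gogogogo"
	sub := parent[2:6]                // slicing: new header, same bytes
	cp := strings.Clone(sub)          // explicit copy: fresh allocation
	return sharesBacking(parent, sub, 2), sharesBacking(parent, cp, 2)
}

func main() {
	sliceShares, cloneShares := substringDemo()
	fmt.Println("substring shares parent memory:", sliceShares)
	fmt.Println("clone shares parent memory:", cloneShares)
}
```

The flip side of sharing is that a small substring keeps the whole parent alive, which is the trade-off the linked discussions weigh.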

That SO link is really good, thank you for the comment.
No, slices in Go are more akin to ArraySegment but with resizing/copy-on-append. It does not have the same `byref` mechanism .NET supports, which can reference arbitrary memory (GC-owned or otherwise) in a unified way as a single (special) pointer type.
This is wrong.

Slices in Go are not restricted to GC memory. They can also point to stack memory (simply slice a stack-allocated array; though this often fails escape analysis and spills onto the heap anyway), global memory, and non-Go memory.

The three things in a slice are the (arbitrary) pointer, the length, and the capacity: https://go.dev/src/runtime/slice.go

Go's GC recognizes internal pointers, so unlike ArraySegment<T>, there's no requirement to point at the beginning of an allocation, nor any need to store an offset (the pointer is simply advanced instead). Go's GC also recognizes off-heap (foreign) pointers, so the ordinary slice type handles them just fine.
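A small sketch of the interior-pointer point, using an ordinary array rather than foreign memory. The `view` type and `inspect` function are invented for this example. It shows that a subslice starts at an advanced pointer and stores no offset.

```go
package main

import "fmt"

// view records how a subslice relates to the array it was cut from.
// Hypothetical type, for illustration only.
type view struct {
	samePtr  bool // does s[0] live at the same address as backing[2]?
	length   int
	capacity int
}

// inspect slices the middle of an array and reports the resulting
// pointer, length, and capacity.
func inspect() view {
	backing := [5]int{10, 20, 30, 40, 50}
	s := backing[2:4] // pointer advanced by two elements; no offset field
	return view{
		samePtr:  &s[0] == &backing[2],
		length:   len(s), // 4 - 2
		capacity: cap(s), // 5 - 2: elements from the new start to the array's end
	}
}

func main() {
	fmt.Printf("%+v\n", inspect())
}
```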

The practical differences between a Go slice []T and a .NET Span<T> are only that:

  1. []T has an extra field (capacity), which is only really used by append()
  2. []T itself can spill onto the managed heap without issue (*)
Go 1.17 even made it easy to construct slices around off-heap memory with unsafe.Slice: https://pkg.go.dev/unsafe#Slice

(*): Span<T> is a "ref struct" which restricts it to the stack (see https://learn.microsoft.com/en-us/dotnet/csharp/language-ref...); whereas, []T can be safely stored anywhere *T can
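On point 1, a short sketch of what the capacity field does for append(). The function name `appendDemo` is made up for this example. Appending within capacity writes into the shared backing array, while appending past it moves the result to a new array.

```go
package main

import "fmt"

// appendDemo shows both outcomes of append relative to capacity.
// Hypothetical function, for illustration only.
func appendDemo() (aliased, detached bool) {
	backing := []int{1, 2, 3, 4}

	// len 2, cap 4: there is room, so append reuses the backing array
	// and overwrites backing[2].
	s := backing[:2]
	s = append(s, 99)
	aliased = backing[2] == 99

	// len 4, cap 4: no room, so append allocates a new array and copies.
	full := backing[:4:4]
	grown := append(full, 5)
	grown[0] = -1
	detached = backing[0] == 1 // the original is untouched by the write

	return aliased, detached
}

func main() {
	aliased, detached := appendDemo()
	fmt.Println("append within capacity aliased the parent:", aliased)
	fmt.Println("append past capacity detached from the parent:", detached)
}
```

A type without a capacity field, such as Span<T>, has no in-place growth of this kind, which is why the field is described above as mattering mostly to append().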

(can't respond directly and don't have the rep to vouch)

> Span bounds are guaranteed to be correct at all times and compiler explicitly trusts this (unless constructed with unsafe), because span is larger than a single pointer, its assignment is not atomic, therefore observing a torn span will lead to buffer overrun, heap corruption, etc. when such access is not synchronized, which would make .NET not memory safe

Indeed, the lack of this restriction is actually a (minor) problem in Go. It is possible to have a torn slice, string, or interface (the three fat pointers) by mutably sharing such a variable across goroutines. This is the only (known) source of memory unsafety in otherwise safe Go, but it is a notable hole: https://research.swtch.com/gorace

Go pointers can point at the stack or inside objects just fine; they are exactly as expressive as C# unsafe pointers (i.e. more expressive than `ref`).

What Go can't do is create a single-element slice out of a variable or pointer to it. But that just means code duplication if you need to cover both cases, not that it's not expressible at all.

> What Go can't do is create a single-element slice out of a variable or pointer to it.

  var x int
  s := unsafe.Slice(&x, 1)
  fmt.Println(&x == &s[0])
  // Output: true
Good catch! That takes care of the unsafe pointer case, but not the safe ref case.

There's no reason for this to be unsafe - you're asking for a 1-element slice, and the compiler knows that the variable is always going to be there as long as the reference exists.

In C#, `Span<T>` has a (safe) constructor from `ref T`.