by scoresmoke 731 days ago
The most important changes are deprecations of certain public APIs: https://numpy.org/devdocs/release/2.0.0-notes.html#deprecati...

One new interesting feature, though, is the support for string routines: https://numpy.org/devdocs/reference/routines.strings.html#mo...
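
For a rough idea of what those routines look like in use, here is a minimal sketch. It assumes NumPy 2.0 or later, where the `numpy.strings` namespace exists, and falls back to the older `numpy.char` namespace otherwise. The sample words are made up for illustration.

```python
import numpy as np

# Assumption: np.strings is available in NumPy >= 2.0; older releases expose
# similarly named functions under np.char, so fall back to that.
S = getattr(np, "strings", np.char)

words = np.array(["numpy", "string", "routines"])
suffix = np.array(["!"])

upper = S.upper(words)             # element-wise upper-casing
lengths = S.str_len(words)         # element-wise string length
tagged = S.add(words, suffix)      # element-wise concatenation, suffix broadcasts
has_n = S.startswith(words, "n")   # element-wise prefix test

print(upper, lengths, tagged, has_n)
```

Each call operates on the whole array at once instead of looping over Python `str` objects.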

4 comments

Interesting that the new string library mirrors the introduction of variable-length string arrays in Matlab in 2016 (https://www.mathworks.com/help/matlab/ref/string.html).
> One new interesting feature, though, is the support for string routines

Sounds almost like they're building a language inside a language.

No. Native python ops in string suck in performance. String support is absolutely interesting and will enable abstractions for many NLP and LLM use cases without writing native C extensions.
> Native python ops in string suck in performance.

That’s not true? Python's string implementation is heavily optimized and probably performs similarly to C.

It is absolutely true that there is a massive amount of room for performance improvement in Python strings, and that performance is generally subpar due to implementation decisions and restrictions.

Strings are immutable, so there is no efficient truncation, concatenation, or modification of any kind; you're always reallocating.
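
A small sketch of the immutability point, using only built-in types. The variable names are just for illustration, and `bytearray` is shown as one mutable alternative for raw bytes, not as a drop-in replacement for `str`.

```python
s = "hello"

# str does not support item assignment, so there is no in-place edit.
try:
    s[0] = "J"
    in_place_ok = True
except TypeError:
    in_place_ok = False

# Any "modification" builds a new string object; the original is unchanged.
t = "J" + s[1:]

# Collecting pieces and joining once avoids building many intermediate strings.
parts = [str(i) for i in range(5)]
joined = "".join(parts)

# bytearray is mutable, so a single byte can be changed in place.
buf = bytearray(b"hello")
buf[0] = ord("J")

print(in_place_ok, s, t, joined, bytes(buf))
```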

There's no native support for a view of string, so operations like iteration over windows or ranges have to allocate or throw away all the string abstractions.

By nature of how the interpreter stores objects, strings are always going to have an extra level of indirection compared to what you can do in a language like C.

Python strings have multiple potential underlying representations, and thus have some overhead for managing and dealing with those representations without exposing the details to user code.
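
One way to observe the multiple representations is to measure how much memory each extra character costs. This is a sketch that assumes CPython 3.3 or later (the flexible string representation from PEP 393); other interpreters may behave differently, and `bytes_per_char` is a helper made up for this example.

```python
import sys

def bytes_per_char(ch, n=1000):
    # Subtracting two sizes cancels the fixed per-object header,
    # leaving only the cost of n additional characters.
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) / n

ascii_width = bytes_per_char("a")            # ASCII-only string
bmp_width = bytes_per_char("\u20ac")         # euro sign, outside Latin-1
astral_width = bytes_per_char("\U0001F600")  # emoji, outside the BMP

print(ascii_width, bmp_width, astral_width)
```

On CPython the per-character cost grows with the widest code point in the string, which is the storage choice the interpreter has to track internally.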

There is a built-in memoryview, but it only works on bytes or other objects supporting the buffer protocol, not on strings.
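
A quick sketch of that difference. Encoding to UTF-8 first is one possible workaround shown here as an assumption, and note that offsets into the encoded bytes only match character offsets for ASCII text.

```python
data = b"hello world"
view = memoryview(data)[6:]   # slicing a memoryview does not copy the bytes
word = bytes(view)            # a copy is made only when materializing

# str does not support the buffer protocol, so this raises TypeError.
try:
    memoryview("hello world")
    str_view_ok = True
except TypeError:
    str_view_ok = False

# Possible workaround: encode once, then take views over the encoded bytes.
encoded = "hello world".encode("utf-8")
window = memoryview(encoded)[0:5]

print(word, str_view_ok, bytes(window))
```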
stringzilla[1] has 10x perf on some string operations - maybe they don't suck, but there's definitely room for improvement

[1] - https://github.com/ashvardanian/StringZilla?tab=readme-ov-fi...

For NumPy applications you always have to box a value to get a new Python string. It's quite far from fast.
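
A minimal sketch of the boxing, assuming the classic fixed-width unicode dtype that NumPy picks for a list of Python strings. The array contents are made up.

```python
import numpy as np

arr = np.array(["alpha", "be", "gamma"])

# Fixed-width storage: every element reserves room for the longest string,
# at 4 bytes per character.
kind = arr.dtype.kind
itemsize = arr.dtype.itemsize

# Each element access builds a fresh scalar object from the raw buffer.
a = arr[0]
b = arr[0]

print(kind, itemsize, type(a).__name__, a == b, a is b)
```

The two accesses compare equal but are distinct objects, so a Python-level loop over the array pays an allocation per element.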
Yeah, operating on strings has historically been a major weak point of NumPy's. I'm looking forward to seeing benchmarks for the new implementation.
It's already very much a DSL, and has been for the decade-ish that I've used it.

They're not building a language. They're carefully adding a newly-in-demand feature to a mature, already-built language.

This one will be rough :|

> arange’s start argument is positional-only

Looks like that might get reverted [0].

[0] https://github.com/numpy/numpy/pull/25955

Does numpy use GPU?
No.

You may want to check out cupy

https://cupy.dev/