| HN Mirror


	by fbdab103 740 days ago
	Any notable highlights for a consumer of Numpy who rarely interfaces directly with it? Most of my work is pandas+scipy, occasionally dropping into a specific numpy algorithm when required. I am much more of an "upgrade when there is an X.1 release" kind of guy, so my hat is off to those who will bravely be testing the version on my behalf.

2 comments

scoresmoke 740 days ago

The most important changes are deprecations of certain public APIs: https://numpy.org/devdocs/release/2.0.0-notes.html#deprecati...

One new interesting feature, though, is the support for string routines: https://numpy.org/devdocs/reference/routines.strings.html#mo...
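For anyone who wants a feel for what these routines look like, here is a minimal sketch. It assumes NumPy 2.0's `np.strings` namespace and falls back to the older `np.char` namespace so it also runs on 1.x; the array contents are made up for illustration.

```python
import numpy as np

# np.strings is new in NumPy 2.0. Falling back to np.char is an assumption
# made only so this sketch also runs on NumPy 1.x.
strings = getattr(np, "strings", np.char)

# Hypothetical sample data.
words = np.array(["numpy", "pandas", "scipy"])

upper = strings.upper(words)              # element-wise upper-casing
has_py = strings.find(words, "py") >= 0   # element-wise substring search
lengths = strings.str_len(words)          # element-wise string length

print(upper, has_py, lengths)
```

Each call operates on the whole array at once instead of looping over Python `str` objects one by one.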

etbebl 740 days ago

Interesting that the new string library mirrors the introduction of variable-length string arrays in Matlab in 2016 (https://www.mathworks.com/help/matlab/ref/string.html).

amelius 740 days ago

> One new interesting feature, though, is the support for string routines

Sounds almost like they're building a language inside a language.

ssahoo 740 days ago

No. Native python ops in string suck in performance. String support is absolutely interesting and will enable abstractions for many NLP and LLM use cases without writing native C extensions.

ayhanfuat 740 days ago

> Native python ops in string suck in performance.

That’s not true? Python's string implementation is very optimized and probably has similar performance to C.

mirashii 739 days ago

It is absolutely true that there is a massive amount of room for performance improvements for Python strings and that performance is generally subpar due to implementation decisions/restrictions.

Strings are immutable, so there is no efficient truncation, concatenation, or modification of any kind; you're always reallocating.

There's no native support for a view of a string, so operations like iteration over windows or ranges have to allocate or throw away all the string abstractions.

By nature of how the interpreter stores objects, strings are always going to have an extra level of indirection compared to what you can do with a language like C.

Python strings have multiple potential underlying representations, and thus have some overhead for managing and dealing with those multiple representations without exposing those details to user code
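
A small sketch of the immutability point, using only the standard library. The variable names are made up for illustration, and the `bytearray` comparison is an addition here rather than something the comment itself proposes.

```python
s = "hello world"

# Strings are immutable: item assignment raises TypeError.
try:
    s[0] = "H"
    mutated = True
except TypeError:
    mutated = False

# "Truncating" via slicing builds a new str object; s itself is unchanged.
head = s[:5]

# A bytearray, by contrast, is mutable and can be edited and truncated in place.
buf = bytearray(s, "ascii")
buf[0] = ord("H")
del buf[5:]

# When assembling a string from many pieces, str.join builds the result
# in one pass instead of creating an intermediate string per step.
joined = "".join(str(i) for i in range(5))

print(mutated, head, buf, joined)
```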

Too 739 days ago

There is a built in memoryview. But it only works on bytes or other objects supporting the buffer protocol, not on strings.
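
A quick illustration of that difference, as a minimal sketch with made-up sample data:

```python
data = b"hello world"

# Slicing a memoryview gives a zero-copy window onto the same bytes object.
view = memoryview(data)[6:]
shares_buffer = view.obj is data

# Copying out of the view is an explicit step.
tail = bytes(view)

# str does not implement the buffer protocol, so memoryview rejects it.
try:
    memoryview("hello world")
    str_supported = True
except TypeError:
    str_supported = False

print(shares_buffer, tail, str_supported)
```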

csjh 740 days ago

stringzilla[1] has 10x perf on some string operations - maybe they don't suck, but there's definitely room for improvement

[1] - https://github.com/ashvardanian/StringZilla?tab=readme-ov-fi...

bvrmn 739 days ago

For numpy applications you always have to box a value to get a new Python string. It's quite far from fast.

topper-123 740 days ago

Yeah, operating on strings has historically been a major weak point of Numpy's. I'm looking forward to seeing benchmarks for the new implementation.

nerdponx 740 days ago

It's already very much a DSL, and has been for the decade-ish that I've used it.

They're not building a language. They're carefully adding a newly-in-demand feature to a mature, already-built language.

ahurmazda 740 days ago

This one will be rough :|

> arange’s start argument is positional-only

haiguise 739 days ago

Looks like that might get reverted [0].

[0] https://github.com/numpy/numpy/pull/25955

brcmthrowaway 740 days ago

Does numpy use GPU?

ahurmazda 740 days ago

No.

You may want to check out cupy

https://cupy.dev/

nerdponx 740 days ago

As a more or less daily user, I was surprised at how not-breaking the 2.0 changes will be for 90% of Numpy users. Unless their dependencies/environments break, I expect that casual users won't even notice the upgrade.

Even the new string dtype I expect would go unnoticed by half of users or more, because they won't be using it (because Numpy historically only had fixed-length strings and generally poor support for them) and so won't even think to try it. Pandas meanwhile has had a proper string dtype for a while, so anyone interested in doing serious work on strings in data frames / arrays would presumably be using Pandas anyway.

Most of the breaking changes are in long-deprecated oddball functions that I literally have never seen used in the wild, and in the internal parts that will be a headache for library developers.

The only change that a casual user might actually notice is the change in repr(np.float64(3.0)), from "3.0" to "np.float64(3.0)".
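
To see which behaviour an installed version has, a minimal check along these lines should work. It assumes default print options, and the version parsing is a simplification for illustration.

```python
import numpy as np

x = np.float64(3.0)

# Simplified major-version parse (assumes a version string like "2.0.0").
major = int(np.__version__.split(".")[0])

# repr() of a scalar names its type on NumPy 2.x; on 1.x it is the bare number.
expected_repr = "np.float64(3.0)" if major >= 2 else "3.0"

print(repr(x))  # depends on the installed NumPy version
print(str(x))   # str() is not affected by the change
```

Code that compares `repr()` output against fixed strings, doctests for example, is where this is most likely to show up.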

cozzyd 739 days ago

I suspect the C ABI break will be the biggest issue, though maybe fewer packages than I imagine compile against the numpy C ABI...
