| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Zach_the_Lizard 2524 days ago

> Among other things, I had to parse a 80000 line csv file. Splitting and the rest of the processing created so many temporary strings the system ran out of ram. We eventually gave up

Let us all hope that Project Valhalla, the effort to add value types to Java, comes to a swift completion. It would be very helpful in these sorts of scenarios.

Though at this point I'd wonder (as you did) why I'm using Java in the first place given the resource constraints of the system, I've had some success using byte[] arrays and using `yourInputStreamHere.read(someBuffer, yourOffset, YOUR_PAGE_SIZE)` in a few less constrained but relatively more performance critical areas.

Depending on your exact needs, this can sometimes result in more or less constant memory consumption; it's possible to reuse the array on each read. You can also take it a step further by finding the indexes where you would split the array (e.g. the index of the next comma for a CSV) and using methods that operate on an array plus start and end indexes. That sort of strategy can allow you to avoid additional copies the array, at the cost of somewhat annoying and less maintainable code that is definitely not Java-esque.

There are also less severe solutions that won't buffer the whole file in memory, using StringBuilder or other techniques, etc. depending on what you're doing.

This all might be for naught though; I'm assuming the records have to be parsed into some structure, so tricks like the above might not be good enough given Java's seemingly insatiable hunger for heap space.

While I'm not the language's greatest fan, this is one area where Go has been able to shine. Having value types as a possibility makes solving this sort of problem a typically less expensive proposition.

2 comments

dionian 2524 days ago

I'm a little confused, trying to parse an entire file in memory instead of streaming it is one of the mistakes I might make in my first year or two of development. parsing 80k lines of CSV in java is pretty easy as long as you write your code to be efficient and release memory line by line

link

pjmlp 2524 days ago

You can already play with Valhalla.

https://mail.openjdk.java.net/pipermail/valhalla-dev/2019-Ju...

link