	by DanielBryars 798 days ago
	What's the utility of defining the "Error" exception? Why not use an existing one, say InvalidOperationException, or a plain Exception? Is making your own better practice?

3 comments

gnabgib 798 days ago

There is no utility. It's perhaps written for JavaScript developers who are used to Error, but it's not idiomatic C#. It might be indicative of a copilot, too.

The use of a class-scoped `StringBuilder` that only one method uses, and `ReadQuotedColumn`/`ReadNonQuotedColumn` yielding one character at a time rather than accepting the builder, isn't a good sign either (for efficiency). Nor is casting everything to a `char` (this won't support UTF8), or assuming an end quote followed by anything (:71) is a valid way to end a field.

neonsunset 798 days ago

C# `char` is a UTF-16 code unit. It does not indicate a byte which is just `byte`.

Having StringBuilder be a private field on the parser instance is not an issue either - it is simply reused.

giaour 798 days ago

Iterating over the `char`s does not support the full range of what can be stored in a C# string (for instance, characters that are serialized as surrogate pairs occupy two `char`s in a C# string).

.Net provides a TextElementEnumerator that will iterate over graphemes instead: https://learn.microsoft.com/en-us/dotnet/api/system.globaliz...

There's a fairly comprehensive guide to working with .net character encodings at https://learn.microsoft.com/en-us/dotnet/standard/base-types... .
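
A minimal sketch of the difference being discussed, assuming a string that contains one character outside the Basic Multilingual Plane. The class and method names (`TextUnits`, `CodeUnits`, `TextElements`) are made up for illustration; `StringInfo.GetTextElementEnumerator` is the standard way to obtain the `TextElementEnumerator` mentioned above.

```csharp
using System.Globalization;

public static class TextUnits
{
    // Counts UTF-16 code units: this is what string.Length and
    // char-by-char iteration see.
    public static int CodeUnits(string s) => s.Length;

    // Counts text elements (a surrogate pair counts once) by walking
    // the string with a TextElementEnumerator.
    public static int TextElements(string s)
    {
        int count = 0;
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(s);
        while (e.MoveNext())
        {
            count++;
        }
        return count;
    }
}
```

For `"a\U0001F600b"` the first method reports 4 and the second reports 3, because U+1F600 is stored as a surrogate pair. Whether that matters for a CSV parser depends on whether the parser ever needs to split inside a field; copying `char`s through unchanged does not.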

neonsunset 798 days ago

The return value of StreamReader.Read() will always be within bounds of -1 and char.MaxValue.

All surrogate pairs will be drained into the StringBuilder, working correctly. Most implementations usually agree that torn UTF-16 surrogate pairs (which are strictly the code points outside of basic multilingual plane) may exist in the input and will be passed as is, which is different to what UTF-8 implementations choose (Rust is strict with this, Go lets you tear code points arbitrarily).

We, as a community, can do better than to jump to immediate criticism of this type.
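
The claim that surrogate pairs survive a char-by-char copy can be checked with a small sketch. This is not the parser's code: `Drain.ReadAll` is a hypothetical helper, and it takes a `TextReader` (the base class of `StreamReader`) so that it can be fed from a `StringReader` as well.

```csharp
using System.IO;
using System.Text;

public static class Drain
{
    // Reads every UTF-16 code unit from the reader and appends it
    // to a StringBuilder unchanged. Read() returns -1 at end of input,
    // otherwise a value that fits in a char.
    public static string ReadAll(TextReader reader)
    {
        var sb = new StringBuilder();
        int c;
        while ((c = reader.Read()) != -1)
        {
            sb.Append((char)c);
        }
        return sb.ToString();
    }
}
```

Because both halves of a pair are appended in order, the output string is identical to the input, emoji included.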

gnabgib 798 days ago

If you (a consuming dev) want the world's smallest (in your code) - use the .net built in parser[0]. Bonus, it's RFC4180 compliant.

If you (competing/learning) want to write the world's smallest (code golf style)... this isn't it, and has some weird superfluous lines (if that's your measure - per the original question).

If you (learning) want to write an efficient parser.. this isn't it. You don't need a StringBuilder, you can seek the Stream to collect the (already formed) strings directly from source vs char-by-char memory copy and rebuild. Yes; that limits your stream choices, but since the example/tests only use FileStreams (which are seekable) you might not come across other kinds. If you need to use un-seekable streams, then you'll need to use a large enough buffer.

[0]: https://learn.microsoft.com/en-us/dotnet/api/microsoft.visua...

neonsunset 798 days ago

This is not a correct link (it refers to VB.NET). There are better parsers out there (Sep).

I'm not sure what your point is, but it certainly misses the idea behind this HN submission, and that makes me sad: it would be nice to see words of encouragement in .NET submissions here instead.

vilark 797 days ago

The unit tests have an emoji test (which uses a surrogate pair). I thought I would have to use Runes, but it's not necessary. https://github.com/kjpgit/SmallestCSVParser/blob/master/Smal...

Jerrrry 798 days ago

You imply that a string, reversed, would have the same length as the original.

This is not true.

esdf 798 days ago

Where are they implying this and why would the strings not have the same length? Is there normalization implied somewhere?

Jerrrry 798 days ago

If they weren't reversing it, what other operation would separate grapheme clusters?

Someone 797 days ago

> Having StringBuilder be a private field on the parser instance is not an issue either - it is simply reused.

It doesn’t matter for this API, but it is a code smell. It makes the class not reentrant.

Talking of the API, I would make it simpler to use and more idiomatic by making the entire public API

   static IEnumerable<List<String>> parse(StreamReader sr)

That call would store the parser state (currently just the StreamReader and that reused StringBuilder) in a private inner class. There would not be a constructor of the publicly visible class, removing that code smell.
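
A rough sketch of that API shape, under loud assumptions: the names `CsvSketch` and `State` are invented, the method takes a `TextReader` rather than a `StreamReader` so it is easy to try with a `StringReader`, and the row reader is a placeholder that splits on commas and newlines without any quote handling. Only the structure (one static entry point, per-call state in a private nested class) is the point.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CsvSketch
{
    // The entire public surface: one static method, no public constructor.
    public static IEnumerable<List<string>> Parse(TextReader reader)
    {
        // Each enumeration gets its own state object, so nothing is shared.
        var state = new State(reader);
        while (state.TryReadRow(out List<string> row))
        {
            yield return row;
        }
    }

    // Parser state lives here, hidden from callers.
    private sealed class State
    {
        private readonly TextReader _reader;
        private readonly StringBuilder _sb = new StringBuilder();

        public State(TextReader reader) { _reader = reader; }

        // Placeholder logic: no quoted fields, '\r' is ignored.
        public bool TryReadRow(out List<string> row)
        {
            row = new List<string>();
            if (_reader.Peek() == -1) return false; // end of input
            _sb.Clear();
            while (true)
            {
                int c = _reader.Read();
                if (c == -1 || c == '\n')
                {
                    row.Add(_sb.ToString());
                    return true;
                }
                if (c == ',')
                {
                    row.Add(_sb.ToString());
                    _sb.Clear();
                }
                else if (c != '\r')
                {
                    _sb.Append((char)c);
                }
            }
        }
    }
}
```

A caller would then write `foreach (var row in CsvSketch.Parse(reader)) { ... }` and never see the reused `StringBuilder`.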

vilark 797 days ago

I will add a micro benchmark to see if the `yield return` is slowing things down, compared to just calling _sb.Append() inside Read*(). I will also see if it looks cleaner that way. To be honest, the `yield return` is currently in there just because I thought it's "cool".

vilark 796 days ago

30% performance improvement after removing the `yield return`, and readability is probably better too.
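
For readers who have not seen the code, the two shapes being compared look roughly like the sketch below. It is not taken from the repository: `ColumnReader` and its members are hypothetical, only unquoted columns are handled, and it does not reproduce the measurement quoted above.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public sealed class ColumnReader
{
    private readonly TextReader _reader;
    private readonly StringBuilder _sb = new StringBuilder();

    public ColumnReader(TextReader reader) { _reader = reader; }

    private static bool AtColumnEnd(int c) =>
        c == -1 || c == ',' || c == '\r' || c == '\n';

    // Variant A: an iterator yields one char at a time and the caller appends.
    private IEnumerable<char> ReadNonQuotedColumnLazy()
    {
        int c;
        while (!AtColumnEnd(c = _reader.Peek()))
        {
            _reader.Read();
            yield return (char)c;
        }
    }

    public string ReadColumnViaIterator()
    {
        _sb.Clear();
        foreach (char ch in ReadNonQuotedColumnLazy())
        {
            _sb.Append(ch);
        }
        return _sb.ToString();
    }

    // Variant B: the reading loop appends to the builder directly,
    // with no iterator object in between.
    public string ReadColumnDirect()
    {
        _sb.Clear();
        int c;
        while (!AtColumnEnd(c = _reader.Peek()))
        {
            _reader.Read();
            _sb.Append((char)c);
        }
        return _sb.ToString();
    }
}
```

Both variants return the same string; the second simply avoids the per-character iterator step.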

jayd16 797 days ago

It's good practice to throw an exception from your own namespace if you're writing a library.

You don't want to expose an implementation detail like some specific exception as part of your public API and have to worry about breaking that later.

You could overload some built in exception but IMO that's not the best practice. You muddy your API and a caller has to wrap your exception if they want to bubble it up and catch it specifically, anyway.

taspeotis 798 days ago

> Why not use ... a plain Exception.

It is forbidden.

https://learn.microsoft.com/en-us/dotnet/standard/exceptions...

> Exception ... None (use a derived class of this exception).

https://learn.microsoft.com/en-us/dotnet/standard/design-gui...

> DO NOT throw System.Exception or System.SystemException.

throwaway2990 798 days ago

It’s a guideline not to. It’s not a hard rule or forbidden.

recursive 798 days ago

I'm not seeing where it says not to throw InvalidOperationException.

TillE 797 days ago

InvalidOperationException means "the object is in an inappropriate state". That does not describe a parse error.

C# conventions for exceptions are admittedly a bit confusing. There are a handful of very specific scenarios where you're supposed to use a built-in exception (most commonly ArgumentException). For everything else, you want to define your own type.

recursive 797 days ago

The state of the stream, such that it's pointing to an illegal character, actually does seem to be invalid though. Maybe this is an overly pedantic argument. I've been writing quite a bit of c# for over a decade and have basically been doing this the whole time. I thought that I knew how to use exceptions, but it seems I do not.

flowerlad 797 days ago

If you want to recover from CSV errors you should ideally throw InvalidCSVException.
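
A minimal sketch of what such a library-owned exception could look like. The type name comes from the comment above; the namespace `MyCsvLibrary` and the `QuotedField.Unquote` helper are hypothetical, and the helper only checks the outer quotes of an already isolated field, so it is not a complete validator.

```csharp
using System;

namespace MyCsvLibrary
{
    // Derives from Exception so callers can catch CSV problems
    // specifically, without depending on a built-in exception type.
    public class InvalidCSVException : Exception
    {
        public InvalidCSVException(string message) : base(message) { }

        public InvalidCSVException(string message, Exception inner)
            : base(message, inner) { }
    }

    public static class QuotedField
    {
        // Strips the surrounding quotes and un-doubles embedded quotes.
        // Throws the library's own exception if the field does not start
        // and end with a quote character.
        public static string Unquote(string field)
        {
            if (field.Length < 2 || field[0] != '"' || field[field.Length - 1] != '"')
            {
                throw new InvalidCSVException("Quoted field is not terminated: " + field);
            }
            return field.Substring(1, field.Length - 2).Replace("\"\"", "\"");
        }
    }
}
```

A caller that wants to skip bad rows can then write `catch (InvalidCSVException)` and leave every other exception alone.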