by hfbff 1220 days ago
`pyarrow`'s `read_csv` function [0] has just four optional arguments (all defaulting to None): three option objects and one memory pool.

```
pyarrow.csv.read_csv(input_file, read_options=None, parse_options=None, convert_options=None, MemoryPool memory_pool=None)
```

You can then pass a `ReadOptions`[1] object if needed.

For example:

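```
import io
from pyarrow import csv

# `s` is assumed to be a str holding the CSV text, header row included
read_options = csv.ReadOptions(
    column_names=["animals", "n_legs", "entry"], skip_rows=1)
csv.read_csv(io.BytesIO(s.encode()), read_options=read_options)
```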

You can see how `ReadOptions` is written at this link [2]. It's interesting that they use a Cython `cdef class` for this.

This doesn't solve every issue (the `ReadOptions` object and the others will inevitably have a bunch of default arguments of their own), but I do think it's safer, and it makes it easier to keep a mental map of what you need to decide versus what's decided for you.
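
To make the pattern concrete without depending on pyarrow, here is a minimal stdlib-only sketch of the same idea. Everything in it is hypothetical: this `ReadOptions`, `ParseOptions` and `read_csv` only borrow the names for illustration and are not pyarrow's implementation, and the fields shown are a small subset I picked for the example.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReadOptions:
    """Hypothetical options object; not pyarrow's actual class."""
    column_names: Optional[list] = None  # None: take names from the header row
    skip_rows: int = 0                   # rows to drop before parsing starts


@dataclass(frozen=True)
class ParseOptions:
    """Hypothetical options object grouping the tokenizer settings."""
    delimiter: str = ","


def read_csv(input_file, read_options=None, parse_options=None):
    # Fall back to the defaults when the caller does not pass an object.
    read_options = read_options or ReadOptions()
    parse_options = parse_options or ParseOptions()

    rows = list(csv.reader(input_file, delimiter=parse_options.delimiter))
    rows = rows[read_options.skip_rows:]
    if read_options.column_names is not None:
        names = read_options.column_names
    else:
        # No explicit names: treat the first remaining row as the header.
        names, rows = rows[0], rows[1:]
    return [dict(zip(names, row)) for row in rows]


data = "animal;legs\nflamingo;2\nhorse;4\n"
opts = ReadOptions(column_names=["animals", "n_legs"], skip_rows=1)
table = read_csv(io.StringIO(data), opts, ParseOptions(delimiter=";"))
```

The top-level signature stays at three parameters no matter how many knobs each group grows, and a caller who only cares about the delimiter never has to look at the reading options at all.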

[0] https://arrow.apache.org/docs/python/generated/pyarrow.csv.r...

[1] https://arrow.apache.org/docs/python/generated/pyarrow.csv.R...

[2] https://github.com/apache/arrow/blob/master/python/pyarrow/_...