#1:
elif not isinstance(fn_transform, FunctionType) or not isinstance(fn_transform, LambdaType):
raise TypeError('Transformation parameter should be a function or lambda i.e. fn = lambda x: x.replace(a,b)')
What about using callable()? There's no reason you couldn't use a functor, for example.
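A minimal sketch of the callable() idea. Replacer and check_transform are hypothetical names invented for illustration, not part of the library. As a side note, types.LambdaType is the same object as types.FunctionType, so the second isinstance test in the quoted line adds nothing.

```python
class Replacer:
    """A functor: an instance that can be called like a function."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def __call__(self, text):
        return text.replace(self.old, self.new)


def check_transform(fn_transform):
    # Hypothetical helper: accepts any callable (function, lambda,
    # bound method, functor), not only FunctionType instances.
    if not callable(fn_transform):
        raise TypeError("Transformation parameter should be callable, "
                        "e.g. fn = lambda x: x.replace(a, b)")
    return fn_transform


dash_to_underscore = check_transform(Replacer("-", "_"))
result = dash_to_underscore("a-b-c")
```

With an isinstance(fn_transform, FunctionType) test the Replacer instance would be rejected, even though it behaves exactly like the lambda in the error message.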
#2:
curr = file_handle.read(chunk_size)
if encoding:
curr = curr.decode(encoding)
That assumes a single-byte encoding. Consider a multi-byte encoding where a read of chunk_size bytes ends partway through a character:
>>> s="ü"
>>> s.encode("utf8")[:1].decode("utf8")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 0: unexpected end of data
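One possible fix, sketched with the standard library's incremental decoder, which holds back a trailing partial character until the next chunk arrives. read_decoded_chunks is a hypothetical name, and the sketch assumes file_handle was opened in binary mode.

```python
import codecs
import io


def read_decoded_chunks(file_handle, chunk_size, encoding):
    """Yield decoded text from a binary file, one chunk at a time."""
    # The incremental decoder buffers an incomplete trailing character
    # and completes it when the following bytes are fed in.
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        raw = file_handle.read(chunk_size)
        if not raw:
            break
        text = decoder.decode(raw)
        if text:
            yield text
    # final=True raises UnicodeDecodeError if the input ends mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


data = "aüb".encode("utf8")  # 4 bytes, because "ü" takes two
chunks = list(read_decoded_chunks(io.BytesIO(data), 2, "utf8"))
```

Here the first 2-byte read ends in the middle of "ü", yet nothing is lost: the decoder emits "a" first and "üb" once the second byte arrives. A multi-character terminator split across chunks would still need separate handling.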
#3:
if chr(terminator) in chunk:
lines = chunk.split(chr(terminator))
Might want to compute chr(terminator) once, rather than re-evaluate it each time.
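Something along these lines, as a rough sketch. split_chunks is a hypothetical name, and the sketch only shows hoisting chr(terminator) out of the loop; it does not carry partial lines over from one chunk to the next.

```python
def split_chunks(chunks, terminator):
    """Split each text chunk on a separator given as an integer code point."""
    separator = chr(terminator)  # computed once, outside the loop
    for chunk in chunks:
        if separator in chunk:
            yield chunk.split(separator)
        else:
            yield [chunk]


parts = list(split_chunks(["a\nb", "c"], 10))  # 10 is the code point of "\n"
```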
#4:
Since you've already checked for the two cases, set a flag to remember what fn_transform contains. Then branch on that, rather than use the try/except.
Otherwise, consider what happens if one of the callables raises a TypeError because of an internal error, rather than because of an expected structural mismatch.
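A sketch of the flag approach. It assumes the two accepted shapes are a single callable or a list/tuple of callables; that is a guess about the original code, and make_apply is a hypothetical name.

```python
def make_apply(fn_transform):
    """Return a function applying fn_transform, deciding its shape up front."""
    # Assumption: fn_transform is either one callable or a list/tuple of
    # callables. The real library may accept different shapes.
    if callable(fn_transform):
        is_single = True
    elif (isinstance(fn_transform, (list, tuple))
          and all(callable(fn) for fn in fn_transform)):
        is_single = False
    else:
        raise TypeError("fn_transform should be a callable or a sequence of callables")

    def apply(value):
        # Branch on the remembered flag; no try/except around the call.
        if is_single:
            return fn_transform(value)
        for fn in fn_transform:
            value = fn(value)
        return value

    return apply


def broken(value):
    # Internal bug: str + int raises TypeError inside the callable.
    return value + 1


apply_chain = make_apply([str.strip, str.upper])
chained = apply_chain("  abc ")
```

Calling make_apply(broken)("x") lets the TypeError from the bug propagate to the caller, instead of being mistaken for a signal to try the other shape.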
Thank you @eesmith.
Comments appreciated, and PRs to the repo as well. ;-)
The multi-byte issue is a great catch! I wrongly assumed single-byte characters and separators. Perhaps that is a library limitation we accept if we want to keep the logic simple. Any ideas for a fix?