|
|
|
|
|
by crote
619 days ago
|
|
The correct thing to do is to not do it at all. If text is 3rd-party supplied, treat it like an opaque byte sequence. Alternatively, pay a well-trained human to do it by hand. All other options are going to result in edge cases where you're not handling it properly. It's like trying to programmatically split a full name into a first name and a last name: language doesn't work like that. |
|