For a regex approach, take a look at the work from Jina.ai. Among other things, they have a chunker/tokenizer [1] that is now part of a bigger API service [2]. They also developed an interesting late-interaction (ColBERT-like) chunking system that fits certain use cases. But the regex is enough all by itself: