https://en.wikipedia.org/wiki/Byte_pair_encoding
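To make the linked algorithm concrete, here is a minimal, character-level sketch of BPE merge learning. It repeatedly counts adjacent symbol pairs (weighted by word frequency) and merges the most frequent one. It uses the Wikipedia page's example string `aaabdaaabac` as a one-word corpus. The function names (`merge_symbols`, `learn_bpe`, `apply_merges`) and the `min_count` stopping rule are illustrative assumptions. This is not GPT-2's actual implementation, which runs BPE over bytes after a regex pre-split.

```python
from collections import Counter


def merge_symbols(symbols, pair):
    """Replace each left-to-right, non-overlapping occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(word_counts, num_merges, min_count=2):
    """Learn up to `num_merges` merges from a {word: frequency} dict.

    Each word starts as a list of single characters. Learning stops early when
    the most frequent adjacent pair occurs fewer than `min_count` times
    (an assumed stopping rule: "no pair occurs more than once").
    """
    vocab = {word: list(word) for word in word_counts}
    merges = []
    for _ in range(num_merges):
        # Count adjacent pairs across all words, weighted by word frequency.
        pairs = Counter()
        for word, symbols in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += word_counts[word]
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair counted
        if pairs[best] < min_count:
            break
        merges.append(best)
        vocab = {word: merge_symbols(symbols, best) for word, symbols in vocab.items()}
    return merges, vocab


def apply_merges(word, merges):
    """Segment a new word by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return symbols


merges, vocab = learn_bpe({"aaabdaaabac": 1}, num_merges=10)
print(merges)                        # [('a', 'a'), ('aa', 'a'), ('aaa', 'b')]
print(vocab["aaabdaaabac"])          # ['aaab', 'd', 'aaab', 'a', 'c']
print(apply_merges("aaac", merges))  # ['aaa', 'c']
```

The final segmentation (`aaab d aaab a c`) matches the end state of the Wikipedia walkthrough. Because ties are broken by first-counted pair, the intermediate merges may differ from the ones the article picks.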
You can use HuggingFace's GPT-2 tokenizer as well (some of OpenAI's GPT-3 notebooks do just that).