by slimsag
607 days ago
Has anyone worked on making tokens 'clusters of words with specific semantic meaning'? E.g., instead of the tokens ['I', 'am', 'beautiful'], having the tokens ['I am', 'beautiful'], on the premise that 'I am' is a common sequence of bytes that works as a single semantic token identifying a 'property of self'? Or taking that further and having much larger tokens, based on statistical analysis of common phrases of ~5 words or so?
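To make the question concrete, here is a minimal sketch of the idea, not an established tokenizer. It assumes a toy corpus, whitespace word splitting, and a raw frequency threshold as the "statistical analysis"; the names `learn_phrases`, `tokenize`, `max_len`, and `min_count` are all hypothetical. It only captures how often a phrase occurs, not whether it carries a specific semantic meaning.

```python
from collections import Counter


def learn_phrases(corpus, max_len=5, min_count=2):
    """Collect word n-grams (2..max_len words) seen at least min_count times."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        for n in range(2, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return {gram for gram, c in counts.items() if c >= min_count}


def tokenize(sentence, phrases, max_len=5):
    """Greedy longest-match: take the longest known phrase at each position."""
    words = sentence.lower().split()
    tokens, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, down to two words.
        for n in range(min(max_len, len(words) - i), 1, -1):
            if tuple(words[i:i + n]) in phrases:
                tokens.append(" ".join(words[i:i + n]))
                i += n
                break
        else:
            # No known phrase starts here, so fall back to a single word.
            tokens.append(words[i])
            i += 1
    return tokens


# Toy corpus (an assumption for illustration): "i am" occurs twice.
corpus = ["i am beautiful", "i am tired", "you are beautiful"]
phrases = learn_phrases(corpus)
print(tokenize("i am beautiful", phrases))  # ['i am', 'beautiful']
```

With this toy corpus only 'i am' clears the threshold, so it becomes one token while 'beautiful' stays a single-word token. Raising `max_len` or lowering `min_count` admits longer or rarer phrases, at the cost of a much larger vocabulary.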