by ggerganov
1250 days ago
I was recently playing with the GPT-2 and GPT-J models. The results are often nonsensical for any practical purpose, but I think they can be used to make something fun, similar to your IRC bot idea.

If you are interested in running these models yourself without a beefy GPU, you can try my custom inference implementation. It's in pure C/C++ without any third-party dependencies, runs straight on the CPU, and builds very easily. I think it is relatively well optimised: for example, on a MacBook M1 Pro I can run GPT-2 XL (1.5B params) at 42 ms/token and GPT-J / GPT-JT (6B params) at 125 ms/token.

Here are a couple of generated examples using GPT-J: https://github.com/ggerganov/ggml/tree/master/examples/gpt-j

These examples use a zero-shot prompt, where the model auto-completes a text given a starting prompt. You can try to make a conversation bot with a few-shot prompt, but it's not great. The model probably needs some fine-tuning for that to become feasible.
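To make the zero-shot versus few-shot distinction concrete, here is a minimal sketch of how the two prompt styles might be assembled as plain strings before being handed to a model. The speaker labels, the example exchanges, and the function names are all illustrative assumptions; nothing here is specific to ggml or to any particular model.

```python
def zero_shot_prompt(text):
    """Zero-shot: the model simply continues the given text."""
    return text


def few_shot_prompt(examples, user_message, user="User", bot="Bot"):
    """Few-shot: prepend example exchanges so the model continues the pattern.

    `examples` is a list of (user_line, bot_line) pairs. The speaker labels
    are hypothetical; any consistent pair of labels would do.
    """
    lines = []
    for question, answer in examples:
        lines.append(f"{user}: {question}")
        lines.append(f"{bot}: {answer}")
    lines.append(f"{user}: {user_message}")
    # End on the bot's label so the model's completion is the bot's reply.
    lines.append(f"{bot}:")
    return "\n".join(lines)


if __name__ == "__main__":
    examples = [
        ("Hello, how are you?", "Doing fine, thanks for asking."),
        ("What are you up to?", "Just idling in the channel."),
    ]
    print(few_shot_prompt(examples, "Any plans for the weekend?"))
```

With a prompt shaped like this, the generated text would likely need to be cut at the next `User:` line, since a plain auto-completing model may keep writing both sides of the conversation.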
|
Oddly enough, any processing delay is good in an "AI" chat bot, within reason: it makes the bot feel more natural than getting a response ping instantly. A chat version of the uncanny valley or something, haha.
Something it also did in its Markov form was pick randomly from the longest words in the sentence it had decided to reply to, build the rest of the reply forward from that word, then run itself "backwards" from the picked word to a sentence-starter word it knew.
Thank you for the reply! Looking forward to some tinkering.
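The Markov trick described above (seed the reply with one of the longest words, generate forward from it, then walk "backwards" to a sentence starter) could be sketched roughly as follows. This is only an illustration under assumptions: it uses a word-level bigram chain, whitespace tokenization, and made-up names such as `TwoWayMarkov`; the original bot's chain order and details are not known.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"  # sentinel tokens marking sentence boundaries


class TwoWayMarkov:
    """Hypothetical bigram chain that can be walked in both directions."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.forward = defaultdict(list)   # word -> words seen right after it
        self.backward = defaultdict(list)  # word -> words seen right before it

    def learn(self, sentence):
        words = [START] + sentence.split() + [END]
        for left, right in zip(words, words[1:]):
            self.forward[left].append(right)
            self.backward[right].append(left)

    def _walk(self, table, word, stop, max_len):
        # Follow random transitions until the stop sentinel or the length cap.
        out = []
        for _ in range(max_len):
            word = self.rng.choice(table[word])
            if word == stop:
                break
            out.append(word)
        return out

    def reply(self, sentence, top_n=3, max_len=20):
        known = {w for w in sentence.split()
                 if w in self.forward and w != START}
        if not known:
            return None
        # Pick the seed at random from the few longest known words.
        candidates = sorted(known, key=lambda w: (-len(w), w))[:top_n]
        seed = self.rng.choice(candidates)
        tail = self._walk(self.forward, seed, END, max_len)     # seed -> end
        head = self._walk(self.backward, seed, START, max_len)  # seed -> start
        return " ".join(head[::-1] + [seed] + tail)


if __name__ == "__main__":
    bot = TwoWayMarkov()
    bot.learn("the quick brown fox jumps over the fence")
    bot.learn("a lazy dog sleeps all day")
    print(bot.reply("that brown dog looks lazy"))
```

Keeping a second table of predecessors is what makes the backward walk possible; the `max_len` cap is there only so that a cycle in the chain cannot loop forever, and a capped walk may stop before reaching a real sentence starter.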