Hacker News
ChatGPT-J: The Privacy-First, Self-Hosted Chatbot Built on GPT-J's Powerful AI (colab.research.google.com)
54 points by jarrell_mark 1193 days ago
3 comments

"Privacy-First", but it also runs in a Colab notebook, meaning it runs on someone else's machine? That doesn't seem very private.
Download the notebook and run locally?
Yes, the GitHub has the Jupyter .ipynb notebook that can be run locally: https://github.com/jarrellmark/chatgpt-j

And even in Colab, it's privacy first in the sense that user input or model output isn't being sent anywhere. The data is local to your Colab session.

Can this be run locally without beefy GPUs by any chance?
ggml (https://github.com/ggerganov/ggml) has a GPT-J example. The 6B parameter model runs happily on a CPU with 16 GB of RAM and 8 cores at a couple of words per second, no GPU necessary.

    gptj_model_load: ggml ctx size = 13334.86 MB
    gptj_model_load: memory_size =  1792.00 MB, n_mem = 57344
    gptj_model_load: model size = 11542.79 MB / num tensors = 285
    main: number of tokens in prompt = 12

    An example of GPT-J running on the CPU is shown in Fig. [4](#Fig4

    main: mem per token = 16179460 bytes
    main:     load time =  7463.20 ms
    main:   sample time =     3.24 ms
    main:  predict time =  4887.26 ms / 232.73 ms per token
    main:    total time = 13203.91 ms
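For a rough sense of what the timings above mean, here is a minimal sketch that turns the two "predict time" figures from the log into a throughput number. The token count is inferred from those two figures rather than reported directly by the log, and whether it includes the prompt tokens is not stated in the output, so treat it as an approximation.

```python
# Figures copied from the "predict time" line of the ggml log above.
predict_time_ms = 4887.26
ms_per_token = 232.73

# Inferred, not reported by the log: how many tokens the predict timer covered.
tokens_timed = round(predict_time_ms / ms_per_token)

# Throughput implied by the per-token latency.
tokens_per_second = 1000.0 / ms_per_token

print(f"{tokens_timed} tokens timed, about {tokens_per_second:.1f} tokens/s")
```

That works out to a little over four tokens per second, which is in line with "a couple of words per second" since a word is often more than one token.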
There have been CPU implementations of LLaMA (7B parameters, comparable in size) with very impressive performance.
I haven't used this yet, but I am currently running GPT-J on my Mac Studio, so I suspect so.
It should work with about 12 GB of GPU RAM.

I got it to load on a GTX 1070 with 8GB GPU RAM, but then it crashed before it could generate a response.

It needs less RAM than regular GPT-J because the weights are converted to 8-bit.
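As a back-of-the-envelope illustration of why 8-bit weights help, here is a small sketch that estimates weight storage alone at different precisions. It assumes a round 6 billion parameters and counts only the weights; activations, attention caches, and framework overhead are not included, so real usage will be higher. The function name `weight_gb` is made up for this example.

```python
# Assumption: a round 6 billion parameters for GPT-J 6B.
PARAMS = 6_000_000_000

# Bytes needed to store one weight at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

def weight_gb(params: int, bytes_per_param: int) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return params * bytes_per_param / 1e9

for dtype, size in BYTES_PER_PARAM.items():
    print(f"{dtype}: ~{weight_gb(PARAMS, size):.0f} GB of weights")
```

Under these assumptions the weights drop from roughly 12 GB at 16-bit to roughly 6 GB at 8-bit, before counting any runtime overhead.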