by fl7305 1167 days ago

Thanks, I'm testing various models myself, and am very interested in hearing about people's experiences with them.

It looks like the 4-bit Alpaca 30B model does better when asked to think out loud. The session starts with some prompt engineering:

  Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

  User: Hello, Bob.
  
  Bob: Hello. How may I help you today?
  
  User: Please tell me the largest city in England.
  
  Bob: Sure. The largest city in England is London, the capital of the United Kingdom.
  
  User: A glass door has ‘push’ written on it in mirror writing. Should you push or pull it? Please think out loud step by step.
  
  Bob: Hmmm... I'm not sure what to do. Let me think about this for a moment.
  Okay, so the door says 'push', but it is written in mirror writing. That means that if I were to follow the instructions on the door, I would have to push the door in the opposite direction of the way it is written. So, if the door says 'push' and it is written in mirror writing, then I should pull the door instead.
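
If anyone wants to script this instead of typing it in, here is a minimal sketch of how the prompt above could be put together. The preamble and example turns are copied from the transcript; the function name `build_prompt` and the constant names are made up for illustration and are not part of llama.cpp.

```python
# Hypothetical helper: assembles the few-shot "Bob" prompt from the transcript
# and appends a step-by-step cue to the user's question.

PREAMBLE = (
    "Transcript of a dialog, where the User interacts with an Assistant named Bob. "
    "Bob is helpful, kind, honest, good at writing, and never fails to answer "
    "the User's requests immediately and with precision."
)

EXAMPLE_TURNS = [
    ("User", "Hello, Bob."),
    ("Bob", "Hello. How may I help you today?"),
    ("User", "Please tell me the largest city in England."),
    ("Bob", "Sure. The largest city in England is London, the capital of the United Kingdom."),
]

STEP_BY_STEP_CUE = "Please think out loud step by step."


def build_prompt(question: str) -> str:
    """Return the full prompt text, ending with 'Bob:' so the model answers as Bob."""
    lines = [PREAMBLE, ""]
    for speaker, text in EXAMPLE_TURNS:
        lines.append(f"{speaker}: {text}")
    # The cue is what asks the model to reason before committing to an answer.
    lines.append(f"User: {question.strip()} {STEP_BY_STEP_CUE}")
    lines.append("Bob:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(
        "A glass door has 'push' written on it in mirror writing. "
        "Should you push or pull it?"
    ))
```

Ending the prompt on "Bob:" is a design choice in this sketch, so that the model continues in Bob's voice; in the interactive session above the same effect comes from the reverse prompt `-r "User:"`.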

Command line and model load log:

  ./main -m ./models/alpaca-lora-30B-ggml/ggml-model-q4_1.bin --temp 0.2 --top_k 30 --top_p 0.30 -c 2048 -b 2048 -n 1024 --keep 1024 --repeat_penalty 1.1 --color -i -r "User:" -f prompts/chat-with-bob.txt
  main: seed = 1681138223
  llama_model_load: loading model from './models/alpaca-lora-30B-ggml/ggml-model-q4_1.bin' - please wait ...
  llama_model_load: n_vocab = 32000
  llama_model_load: n_ctx   = 2048
  llama_model_load: n_embd  = 6656
  llama_model_load: n_mult  = 256
  llama_model_load: n_head  = 52
  llama_model_load: n_layer = 60
  llama_model_load: n_rot   = 128
  llama_model_load: f16     = 3
  llama_model_load: n_ff    = 17920
  llama_model_load: n_parts = 4
  llama_model_load: type    = 3
  llama_model_load: ggml map size = 23269.46 MB
  llama_model_load: ggml ctx size = 151.25 KB
  llama_model_load: mem required  = 25573.60 MB (+ 3124.00 MB per state)
  llama_model_load: loading tensors from './models/alpaca-lora-30B-ggml/ggml-model-q4_1.bin'
  llama_model_load: model size = 23269.01 MB / num tensors = 543
  llama_init_from_file: kv self size  = 3120.00 MB
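
Some of the numbers in the log can be cross-checked against each other. The sketch below is a sanity check, not llama.cpp's actual code: the hyperparameters are copied from the log, the `n_ff` rounding rule is the usual LLaMA one as far as I know, and the 2 bytes per KV value (16-bit floats) is an assumption that happens to reproduce the logged figure.

```python
# Hyperparameters copied from the llama_model_load log lines above.
n_ctx = 2048
n_embd = 6656
n_mult = 256
n_head = 52
n_layer = 60

# Rotary dimension per head: the embedding split evenly across heads.
n_rot = n_embd // n_head

# Feed-forward width: 2/3 of 4*n_embd (integer division), then rounded up
# to the next multiple of n_mult.
n_ff = ((2 * (4 * n_embd) // 3 + n_mult - 1) // n_mult) * n_mult

# KV cache: one K and one V vector of n_embd values per layer per context
# position. ASSUMPTION: each value is stored as a 16-bit float (2 bytes).
BYTES_PER_VALUE = 2
kv_bytes = 2 * n_layer * n_ctx * n_embd * BYTES_PER_VALUE
kv_mb = kv_bytes / (1024 * 1024)

print(f"n_rot = {n_rot}")                 # n_rot = 128
print(f"n_ff = {n_ff}")                   # n_ff = 17920
print(f"kv self size = {kv_mb:.2f} MB")   # kv self size = 3120.00 MB
```

All three values agree with the log, so the KV cache size is determined by `-c 2048` and the model shape rather than by the quantization of the weights.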