|
Here's some example output showing how fast a 13B model runs on a 3090 with a Ryzen 9 5900X:

In [5]: output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
llama_print_timings: load time = 209.23 ms
llama_print_timings: sample time = 11.39 ms / 32 runs ( 0.36 ms per token)
llama_print_timings: prompt eval time = 209.16 ms / 15 tokens ( 13.94 ms per token)
llama_print_timings: eval time = 1806.98 ms / 31 runs ( 58.29 ms per token)
llama_print_timings: total time = 3033.91 ms

In [6]: print(output)
{'id': '', 'object': 'text_completion', 'created': 1684604167, 'model': './models/Wizard-Vicuna-13B-Uncensored.ggml.q5_1.bin', 'choices': [{'text': 'Q: Name the planets in the solar system? A: 1. Mercury, 2. Venus, 3. Earth, 4. Mars, 5. Jupiter, 6. Saturn', 'index': 0, 'logprobs': None, 'finish_reason': 'length'}], 'usage': {'prompt_tokens': 15, 'completion_tokens': 32, 'total_tokens': 47}}
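To turn the llama_print_timings lines into tokens per second, divide each stage's token count by its elapsed time in seconds. Below is a minimal sketch that assumes the log format shown in the session. The regex and the `throughput` helper are hypothetical illustrations and are not part of llama.cpp or llama-cpp-python.

```python
import re

# The two timing lines that report token counts, copied from the session.
TIMINGS = """\
llama_print_timings: prompt eval time = 209.16 ms / 15 tokens ( 13.94 ms per token)
llama_print_timings: eval time = 1806.98 ms / 31 runs ( 58.29 ms per token)
"""

# Hypothetical pattern for lines shaped like "<stage> time = <ms> ms / <n> runs|tokens".
LINE_RE = re.compile(
    r"llama_print_timings:\s*(?P<stage>[\w ]+?) time\s*=\s*"
    r"(?P<ms>[\d.]+) ms / (?P<n>\d+) (?:runs|tokens)"
)


def throughput(log: str) -> dict[str, float]:
    """Return tokens per second for every timing line that reports a count."""
    rates: dict[str, float] = {}
    for match in LINE_RE.finditer(log):
        seconds = float(match["ms"]) / 1000.0
        rates[match["stage"]] = int(match["n"]) / seconds
    return rates


for stage, rate in throughput(TIMINGS).items():
    print(f"{stage:>12}: {rate:6.1f} tokens/s")
```

With these numbers, prompt processing runs at about 71.7 tokens/s and generation at about 17.2 tokens/s, which matches the reported 58.29 ms per generated token.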