by mr_magoo 1161 days ago
I've also been struggling to run anything but the smallest model you have shared on Paperspace:

    import torch
    from transformers import pipeline

    generate_text = pipeline(
        model="databricks/dolly-v2-6-9b",
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
        device=0,
    )
    generate_text("Explain to me the difference between nuclear fission and fusion.")

This causes the kernel to crash, even though the GPU should be plenty:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 510.73.05    Driver Version: 510.73.05    CUDA Version: 11.6     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Quadro P6000        Off  | 00000000:00:05.0 Off |                  Off |
    | 26%   45C    P8    10W / 250W |   6589MiB / 24576MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
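As a rough sanity check on the "GPU should be plenty" claim, the sketch below estimates the weight footprint from the numbers in this post. The 6.9 billion parameter count is an assumption read off the model name, and `weights_mib` is a hypothetical helper; activations and the CUDA context need extra memory on top of the weights.

```python
def weights_mib(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights in MiB.

    bfloat16 stores each parameter in 2 bytes.
    """
    return n_params * bytes_per_param / 2**20


# Assumption: "6-9b" in the model name means roughly 6.9e9 parameters.
needed = weights_mib(6.9e9)

# From the nvidia-smi output above: 24576 MiB total, 6589 MiB already in use.
free = 24576 - 6589

print(f"weights ~ {needed:.0f} MiB, free GPU memory = {free} MiB")
print("weights fit in free GPU memory:", needed < free)
```

By this estimate the weights alone would fit in the free GPU memory, so the crash may come from somewhere else, such as host RAM while the checkpoint is being loaded. That is a guess, not a diagnosis.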

I'm extremely excited to try these models, but this has been by far the most difficult experience I've ever had trying to do basic inference.

1 comment

I’ve never used Paperspace, so I’ll give it a try this weekend. How much RAM do you have attached to the compute? We don’t think it should be any harder to run this via HF pipelines than other similarly sized models, but I’ll look into it.
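One way to answer the RAM question from inside the notebook is sketched below. It assumes a Linux machine and uses only the standard library; `total_ram_gib` is a hypothetical helper name. A container memory limit lower than the physical total would not show up in this number.

```python
import os


def total_ram_gib() -> float:
    """Total physical RAM in GiB, read through POSIX sysconf (Linux)."""
    page_size = os.sysconf("SC_PAGE_SIZE")   # bytes per memory page
    n_pages = os.sysconf("SC_PHYS_PAGES")    # number of physical pages
    return page_size * n_pages / 2**30


print(f"total system RAM: {total_ram_gib():.1f} GiB")
```

If the reported total is close to or below the size of the checkpoint, running out of host memory during loading would be a plausible cause of the kernel crash.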