Hacker News
by brucethemoose2 967 days ago
Are y'all doing anything in particular to stretch the context out? Or is it just plain old RoPE scaling?
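
For anyone unfamiliar: "plain" RoPE scaling usually means linear position interpolation, where each token position is divided by a scale factor before the rotary angles are computed, so a longer sequence maps back into the position range seen in training. A rough sketch of that idea follows. The function names, the `scale` parameter, the base of 10000, and the interleaved pairing of dimensions are all assumptions for illustration, not taken from any particular model's code.

```python
import math

def rope_angles(position, dim, base=10000.0, scale=1.0):
    """Rotation angles for one token position under linearly scaled RoPE.

    `scale` is a hypothetical name for the interpolation factor:
    scale=1.0 is unscaled RoPE, scale=4.0 squeezes 4x more positions
    into the same angular range.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    scaled_pos = position / scale  # the only change versus unscaled RoPE
    return [scaled_pos * base ** (-2.0 * i / dim) for i in range(dim // 2)]

def apply_rope(vec, position, base=10000.0, scale=1.0):
    """Rotate consecutive pairs (vec[2i], vec[2i+1]) by the angles above."""
    angles = rope_angles(position, len(vec), base, scale)
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out

# Position 32000 with scale 4 gets the same angles as position 8000 unscaled.
same = rope_angles(32000, 8, scale=4.0) == rope_angles(8000, 8, scale=1.0)
```

The point is that nothing else in the attention computation changes, which is why it counts as the "plain old" option compared with approaches that alter the frequencies per dimension.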

While it seems to be a base model, I'm pretty excited about upcoming finetunes of this one: https://huggingface.co/SciPhi/SciPhi-Mistral-7B-32k