| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by 098799 942 days ago
	This is quite intriguing, mostly because of the author. I don't understand very well how llamafiles work, so it looks a little suspicious to just call it every time you want completion (model loading etc), but I'm sure this is somehow covered withing the llamafile's system. I wonder about the latency and whether it would be much impacted if a network call has been introduced such that you can use a model hosted elsewhere. Say a team uses a bunch of models for development, shares them in a private cluster and uses them for code completion without the necessity of leaking any code to openai etc.

2 comments

jart 942 days ago

I've just added a video demo to the README. It takes several seconds the first time you do a completion on any given file, since it needs to process the initial system prompt. But it stores the prompt to a foo.cache file alongside your file, so any subsequent completions start generating tokens within a few hundred milliseconds, depending on model size.

link

098799 942 days ago

Thanks, this showcases the product very well.

Looks like I won't use it though, cause I like how Microsoft's copilot and it's implementations in emacs work: suggest completions with greyed out text after cursor, in one go, without the need to ask for it and discard it if it doesn't fit. Just accept the completion if you like it. For reference: https://github.com/zerolfx/copilot.el

That, coupled with speed, makes it usable for slightly extended code completion (up to one line of code), especially in a highly dynamic programming languages that have worse completion support.

link

jart 942 days ago

Fair enough. Myself on the other hand, I want the LLM to think when I tell it to think (by pressing the completion keystroke) and I want to be able to supervise it while it's thinking, and edit out any generated prompt content I dislike. The emacs-copilot project design lets me do that. While it might not be great for VSCode users, I think what I've done is a very appropriate adaptation of Microsoft's ideas that makes it a culture fit for the GNU Emacs crowd, because Emacs users like to be in control.

link

098799 942 days ago

While I understand the general sentiment, I don't understand the specific point. After all, company-mode and it's numerous lsp-based backends are often used as an _unprompted_ completion (after typing 2 or 3 characters) which the user has the option to select or move on. It's the first time I hear of this being somehow against the spirit of GNU. Would you argue this is somehow relinquishing control? I like it, since it's very quick and cheap, I don't mind it running more often than I use it, because it saves me the keyboard clicks to explicitly ask for completion.

FYI I'm not trying to diminish your project, and I'm glad you've made something which scratches your exact itch. I'm also hopeful others will like it.

link

csdvrx 942 days ago

> Would you argue this is somehow relinquishing control? I like it, since it's very quick and cheap, I don't mind it running more often than I use it, because it saves me the keyboard clicks to explicitly ask for completion.

I can't answer for others, but personally I don't like the zsh-like way to "show the possible completions in dark grey after the cursor" because it disrupts my thoughts.

It's pull vs push: whether on the commandline or using an AI, I want the results only when I feel I need them - not before.

If they are pushed into me (like the mailbox count, or other irrelevant parameters), they are distracting and interrupting my thoughts.

I love optimization and saving a few clicks, but here the potential for distraction during an activity that requires intense concetration would be much worse.

link

vidarh 941 days ago

I don't mind a single completion so much, as long as there's a reasonable degree of precision there. But otherwise I agree with you. I feel like they're only useful if you start typing without knowing what you want to do or how to do it, but if that is the case I know that is the case. Having a keypress to turn on that behavior temporarily just for that might not be so bad.

link

vidarh 941 days ago

It's a massive distraction to me, and I refuse to have it turned on anywhere I can turn it off and will actively choose away software that forces it on me.

I can somewhat accept it showing an option if 1) it's the only one, 2) it's not rapidly changing with my typing. I know what I want to type before I type it or know I'm unsure what to type. In the former, a completion is only useful if it correctly matches what I wanted to type.

In the latter, what I'm typing is effectively a search query, and then completion on typing might not be so bad, but that's the exception, not the norm.

link

regularfry 942 days ago

Eh, it's a mixed bag. The way Github Copilot offers suggestions means that it's very easy to discover the sorts of things it can autocomplete well, which can be surprising. I've certainly had it make perfect suggestions in places I thought I was going to have to work at it a bit - like, say, thinking I'm going to need to insert a comment to tell it what to generate, pressing enter, and it offering the exact comment I was going to write. Having tried both push and pull modes I found it much harder to build a good mental model of LLM capabilities in pull-mode.

It's annoying when a pushed prediction is wrong, but when it's right it's like coding at the speed of thought. It's almost uncanny, but it gets me into flow state really fast. And part of that is being able to accept suggestions with minimal friction.

link

ParetoOptimal 941 days ago

This seems like tab complete vs autocomplete. The resolution to that has been making it configurable.

Perhaps that would be advantageous here too?

link

vidarh 942 days ago

I agree with that. The constant stream of completions with things like VS Code even without copilot is infuriatingly distracting, and I don't get how people can work like that.

I don't use Emacs any more, but I'll likely take pretty much the same approach for my own editor.

link

ParetoOptimal 941 days ago

Do you find auto complete on type similarly distracting? I do in some contexts but not others.

link

vidarh 941 days ago

Yes, I find it absolutely awful. It covers things I want to see and most keypresses it provides no value. I'm somewhat more sympathetic to UI's if they provide auto-complete in a static separate panel that doesn't change so quickly. It feels to me like a beginner's crutch, but even when I'm working in languages I don't know I'd much rather call it up as needed so I actually get a chance to improve my recall.

link

tarruda 942 days ago

Also not familiar with llamafiles, but if it uses llama.cpp under the hoods, it can probably make use of mmap to avoid fully loading on each run. If the GPU on Macs can access the mmapped file, then it would be fast.

link

jart 942 days ago

Author here. It does make use of mmap(). I worked on adding mmap() support to llama.cpp back in March, specifically so I could build things like Emacs Copilot. See: https://github.com/ggerganov/llama.cpp/pull/613 Recently I've been working with Mozilla to create llamafile, so that using llama.cpp can be even easier. We've also been upstreaming a lot of bug fixes too!

link