Hacker News
by NitpickLawyer 502 days ago
> vs most things related to hardware or low level work.

Counterpoint:

https://github.com/ggerganov/llama.cpp/pull/11453

> This PR provides a big jump in speed for WASM by leveraging SIMD instructions for qX_K_q8_K and qX_0_q8_0 dot product functions.

> Surprisingly, 99% of the code in this PR is written by DeekSeek-R1. The only thing I do is to develop tests and write prompts (with some trials and errors)

2 comments

A single PR doesn't really "prove" anything. Optimization passes on well-tested narrowly scoped code are something that LLMs are already pretty good at.
I think DeepThink is something different though.

It is able to figure out some things that I know do not have much training data at all.

It is looking at the manual and figuring things out. "That doesn't make sense. Wait, that can't be right. I must have the formula wrong."

I just saw that in the chain of thought.

Nah, in my experience, if there is the slightest error in the first sentence of the chain of thought, it tends to get worse and worse. I've had prompts that would generate a reasonable response in Llama but turn into utter garbage in DeepThink.
But how is this any different from real humans? They are not always right either. Sure, humans can understand things better, but are we really going to act like LLMs can't get better in the next year? And what about the next 6 months? I bet there are unknown startups like DeepSeek that can push the frontier further.
The ways in which humans err are very different. You have a sense of your own knowledge on a topic and if you start to stray from what you know you're aware of it. Sure, you can lie about it but you have inherent confidence levels in what you're doing.

Sure, LLMs can improve but they're ultimately still bound by the constraints of the type of data they're trained on and don't actually build world models through a combination of high bandwidth exploratory training (like humans) and repeated causal inference.

At a certain point though, one wonders if you can trust people to accurately report how much is written by an LLM. (Not even implying bad faith, but if you're constantly re-reading, selecting and re-combining snippets written by LLMs, it's not really "written" by LLMs in the way that's implied.)
We kinda went through this with images when Photoshop and similar tools appeared. I remember a lot of people asking questions in the late 90s/early 00s in particular about whether an image was “real” or not, and about the distinction between smart photography and digital composition. Nowadays we just assume everyone is using such tools as a baseline, and genuinely clever photography is celebrated as an exception. Perhaps ditto with CGI and prop/set making in movies. Unless a director crows about how genuine the effects are, we assume CGI.
Yeah, I never know exactly what this means. The PR says one variant was done in one shot, and the other took re-prompting 4 to 8 more times.
> at a certain point though, one wonders if you can trust people to accurately report how much is written by an LLM.

That's an interesting thought. I think there are ways to automate this, and some IDEs / tools track this already. I've seen posts by both Google and Amazon providing percentages of "accepted" completions in their codebases, and that's probably something they track across codebases automatically.

Also on topic, here's aider's "self written code" statistics: https://aider.chat/HISTORY.html

But yeah I agree that "written by" doesn't necessarily imply "autonomously", and for the moment it's likely heavily curated by a human. And that's still ok, IMO.