| HN Mirror



	by embedding-shape 124 days ago
	> a near-frontier model

	Is Kimi K2 near-frontier though? At least when run in an agent harness, and for general coding questions, it seems pretty far from it. I know what the benchmarks say, they always say it's great and close to frontier models, but is this others' impression in practice? Maybe my prompting style works best with GPT-type models, but I'm just not seeing that for the type of engineering work I do, which is fairly typical stuff.


crystal_revenge 124 days ago

I’ve been running K2.5 (through the API) as my daily driver for coding through Kimi Code CLI and it’s been pretty much flawless. It’s also notably cheaper, and I like having the option to run everything in-house if my vibe-coded side projects ever become more than side projects.

I’ve been pretty active in the open model space and 2 years ago you would have had to pay 20k to run models that were nowhere near as powerful. It wouldn’t surprise me if in two more years we continue to see more powerful open models on even cheaper hardware.

vuldin 124 days ago

I agree with this statement. Kimi K2.5 is at least as good as the best closed source models today for my purposes. I've switched from Claude Code w/ Opus 4.5 to OpenCode w/ Kimi K2.5 provided by Fireworks AI. I never run into time-based limits, whereas before I was running into daily/hourly/weekly/monthly limits all the time. And I'm paying a fraction of what Anthropic was charging (from well over $100 per month to less than $50 per month).

hjordache 124 days ago

Beyond agree. Was spending crazy amounts on Claude and it was sporadic at best. Some moments, Opus was a rockstar, others, it couldn’t solve the simplest of problems. Switched to Kimi K2.5 and honestly didn’t think it would do anything other than destroy my code. Crazy enough, it solved the problem I had in less than 60 seconds and I was hooked. Not to say it doesn’t have issues, it does: it starts repeating itself over and over, forgets things after so much context, etc. But it writes damn good code when it does work properly, and for an absolute fraction of the price Anthropic charges.

cadamsdotcom 124 days ago

Saw you wrote that you moved away from Opus 4.5. If you haven’t tried Opus 4.6, there’s only one number different in the name, but the common experience is it’s significantly better.

Have you tried 4.6 as a comparison to Kimi K2.5?

giancarlostoro 124 days ago

> OpenCode w/ Kimi K2.5 provided by Fireworks AI

Are you just using the API mode?

hjordache 124 days ago

API mode, and Kimi K2.5 is currently free on OpenCode. Enjoy!

giancarlostoro 120 days ago

What? Like self-hosted or what? Because I'm leery of using any API services if they're not US-based, I don't need all my IP going overseas.

varispeed 124 days ago

Depends what you see as flawless. From my perspective even GPT 5.2 produces mostly garbage-grade code (yes, it often works, but it is not suitable for anywhere near production) and takes several iterations to get it to a remotely workable state.

crystal_revenge 124 days ago

> not suitable for anywhere near production

I've increasingly come to see this as the wrong way to understand how LLMs are changing things.

I fully agree that LLMs are not suitable for creating production code. But the bigger question you need to ask is 'why do we need production code?' (and to be clear, there are and always will be cases where we do, just increasingly fewer of them)

The entire paradigm of modern software engineering is fairly new. I mean it wasn't until the invention of the programmable microprocessor that we even had the concept of software and that was less than 100 years ago. Even if you go back to the 80s, a lot of software didn't need to be distributed or serve an endless variety of users. I've been reading a lot of old Common Lisp books recently and it's fascinating how often you're really programming Lisp for you and your experiments. But since the advent of the web and scaling software to many users with diverse needs we've increasingly needed to maintain systems that have all the assumed properties of "production" software.

Scalable, robust, adaptable software is only a requirement because it was previously infeasible for individuals to build non-trivial systems for solving more than one or two personal problems. Even software engineers couldn't write their own text editor and still have enough time to also write software.

All of the standard requirements of good software exist for reasons that are increasingly becoming less relevant. You shouldn't rely on agents/LLMs to write production code, but you also should increasingly question "do I need production code?"

munksbeer 123 days ago

This is a very interesting aspect. I've been thinking along these lines.

Consider design patterns, or clean code, or patterns for software development, or any other system that people use to write their code, and reviewers use to review the code. What are they actually for? This question is going to seem bizarre to most programmers at first, because it is so ingrained in us, that we almost forget why we have those patterns.

The entire point is to ensure the code is maintainable. In order to maintain it, we must easily understand it, and be sure we're not breaking something when we do. That is what design patterns solve: making code easier to understand and more maintainable.

So, I can imagine a future where the definition of "production code" changes.

varispeed 124 days ago

> Scalable, robust, adaptable software is only a requirement because it was previously infeasible for individuals to build non-trivial systems for solving more than one or two personal problems. Even software engineers couldn't write their own text editor and still have enough time to also write software.

That's a wild assumption. I personally know engineers who _alone_ wrote things like compilers, emulators, editors, complex games, and management systems for factories and robots. That was before the internet was widely available and they had to use physical books to learn.

embedding-shape 123 days ago

Yeah, that jumped out at me too. Plenty of hackers could write their own text editor and still have time to be professional developers doing other things. How do people think most of FOSS actually happened 15-20 years ago? Most of us were hacking on stuff in our free time, but still had day jobs.

bspinner 124 days ago

In terms of security: yes, everyone needs production code.

e12e 124 days ago

In my mind, "yolo ai" applications (throwaway code on one hand, unrestrained assistants on the other) are a little like what better spreadsheets and smart documents were in the 90s: just run macros! Everywhere! No need for developers - just Word and macros!

Then came macro viruses, and practically everyone cut back hard on distributing code via Word and Excel (in favour of web apps, and we got the dot-com bubble).

embedding-shape 124 days ago

> it’s been pretty much flawless

So above and beyond frontier models? Because they certainly aren't "flawless" yet, or we have a very different understanding of that word.

crystal_revenge 124 days ago

I have increasingly changed my view on LLMs and what they're good for. I still strongly believe LLMs cannot replace software engineers (they can assist yes, but software engineering requires too much 'other' stuff that LLMs really can't do), but LLMs can replace the need for software.

During the day I am working on building systems that move lots of data around, where context and understanding of the business problem is everything. I largely use LLMs for assistance. This is because I need the system to be robust, scalable, maintainable by other people and adaptable to a large range of future needs. LLMs will never be flawless in a meaningful sense in this space (at least in my opinion).

When I'm using Kimi I'm using it for purely vibe-coded projects where I don't look at the code (and if I do I consider this a sign I'm not thinking about the problem correctly). Are these programs robust, scalable, generalizable, adaptable to future use cases? No, not at all. But they don't need to be, they need to serve a single user for exactly the purpose I have. There are tasks that used to take me hours that now run in the background while I'm at work.

In this latter sense I say "flawless" because 90% of my requests solve the problem on the first pass, and the 10% of the time where there is some error, it is resolved in a single request, and I don't have to ever look at the code. For me that "don't have to look at the code" is a big part of my definition of "flawless".

mhitza 124 days ago

Your definition of flawless is fine for you, but it requires a big asterisk. Without being called out on it, look how your message would have read to someone who's not in the know about LLM limitations: it would have contributed further to the disillusionment with the field and the gaslighting that's already going on by big companies.

fullstackchris 124 days ago

regardless, it's been 3 years since the release of chatgpt. literally 3. imagine in just 5 more years how much low-hanging fruit (or even big breakthroughs) will get into the pricing, things like quantization, etc. no doubt in my mind the question of "price per token" will head towards 0