by mynameisjody 211 days ago
Every time I see an article like this, it's always missing --- but is it any good, is it correct? They always show you the part that is impressive - "it walked the tricky tightrope of figuring out what might be an interesting topic and how to execute it with the data it had - one of the hardest things to teach."

Then it goes on, "After a couple of vague commands (“build it out more, make it better”) I got a 14 page paper." I hear..."I got 14 pages of words". But is it a good paper, that another PhD would think is good? Is it even coherent?

When I see the code these systems generate within a complex system, I think okay, well that's kinda close, but this is wrong and this is a security problem, etc etc. But because I'm not a PhD in these subjects, am I supposed to think, "Well of course the 14 pages on a topic I'm not an expert in are good"?

It just doesn't add up... Things I understand, it looks good at first, but isn't shippable. Things I don't understand must be great?


It's gotten more and more shippable, especially with the latest generation (Codex 5.1, Sonnet 4.5, now Opus 4.5). My metric is "wtfs per line", and it's been decreasing rapidly.

My current preference is Codex 5.1 (Sonnet 4.5 as a close second, though it got really dumb today for "some reason"). It's been good to the point where I shipped multiple projects with it without a problem (with eg https://pine.town being one I made without me writing any code).

I feel it sometimes tries to be overly correct. Like using BigInts when working with offsets in big files in JavaScript. My files are big, but not 53-bits-of-mantissa big. And no file APIs work with BigInts. This was from Gemini 3 thinking btw
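For scale, here is a minimal sketch of the 53-bit point. It uses Python floats, which are the same IEEE 754 doubles as JavaScript Numbers; `is_exact_offset` is a hypothetical helper made up for illustration, not part of any file API.

```python
# Python floats are IEEE 754 doubles, the same representation JavaScript
# uses for Number, so the precision limit is identical.
MAX_SAFE_INTEGER = 2**53 - 1  # mirrors JavaScript's Number.MAX_SAFE_INTEGER


def is_exact_offset(offset: int) -> bool:
    """Return True if the offset survives a round trip through a double."""
    return int(float(offset)) == offset


# Largest exactly representable offset, expressed in pebibytes (2**50 bytes).
largest_safe_pib = MAX_SAFE_INTEGER / 2**50
```

Under these assumptions, byte offsets stay exact until a file approaches 8 PiB, so a plain double is enough for any offset well below that size.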
I just whack-a-mole these things in AGENTS.md for a while until it codes more like me.
Coding LLMs were almost useless for me, until my AGENTS.md crossed some threshold of completeness and now they are mostly useful. I now curate multiple different markdown files in a /docs folder, that I add to the context as needed. Any time the LLM trips on something and we figure it out, I ask it to document its learnings in a markdown doc, and voila, it can do it correctly from then on.
> https://pine.town

how many prompts did it take you to make this?

how did you make sure that each new prompt didn't break some previous functionality?

did you have a precise vision for it when you started or did you just go with whatever was being given to you?

Judging by the site, they don't have insightful answers to these questions. It's broken with weird artifacts, errors, and amateurish console printing in PROD.

https://i.ibb.co/xSCtRnFJ/Screenshot-2025-11-25-084709.png

https://i.ibb.co/7NTF7YPD/Screenshot-2025-11-25-084944.png

I definitely don't have insightful answers to these questions, just the ones I gave in the sibling comment an hour before yours. How could someone who uses LLMs be expected to know anything, or even be human?

Alas, I did not realize I was being held to the standard of having no bugs under any circumstance, and printing nothing to the console.

I have removed the amateurish log entries, I am pitiably sorry for any offense they may have caused. I will be sure to artisanally hand-write all my code from now on, to atone for the enormity of my sin.

It also doesn't seem to work right now.
Yeah, all of the above was a single bug in the plot allocation code, the exception that handled the transaction rollback had the wrong name. It's working again.
> how many prompts did it take you to make this?

Probably hundreds, I'd say.

> how did you make sure that each new prompt didn't break some previous functionality?

For the backend, I reviewed the code and steered it to better solutions a few times (fewer than I thought I'd need to!). For the frontend, I only tested and steered, because I don't know much about React at all.

This was impossible with previous models, I was really surprised that Codex didn't seem to completely break down after a few iterations!

> did you have a precise vision

I had a fairly precise vision, but the LLM made some good contributions. The UI aesthetic is mostly the LLM, as I'm not very good at that. The UX and functionality is almost entirely me.

did you not run into this problem described by ilya below

https://www.youtube.com/watch?v=aR20FWCCjAs&list=PLd7-bHaQwn...

this has been my experience purely vibecoding. i am surprised it works well for others.

btw the current production bug. how did you discover that and why it slip out. looks like site wasn't working at all when you posted that comment?

> did you not run into this problem described by ilya below

I used to run into a related issue, where fixing a bug would add more bugs, to the point where it would not be able to progress past a given codebase complexity. However, Codex is much better at not doing that. There are some cases where the model kept going back and forth between two bugs, but I discovered that that was because I had misunderstood the constraints and was telling the model to do something impossible.

> how did you discover that and why it slip out.

Sentry alerted me but I thought it was an edge case, and I didn't pay attention until hours later.

I use a spiral allocation algorithm to allocate plots, so new users are clustered around the center. Sometimes plots are emptied (when the user isn't active), so you can have gaps in the spiral, which the algorithm tries to fill, and it's meant to go to the next plot if the current one can't be assigned.
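A rough sketch of what such a spiral allocator could look like, assuming a square grid centered on (0, 0) and an in-memory set of taken plots. The names `spiral` and `allocate` are hypothetical; this is not the site's actual code.

```python
from itertools import count
from typing import Iterator, Set, Tuple

Plot = Tuple[int, int]


def spiral() -> Iterator[Plot]:
    """Yield grid coordinates in a square spiral starting at the center."""
    x = y = 0
    yield (x, y)
    dx, dy = 1, 0  # first arm moves along +x
    for arm_length in count(1):
        # Each arm length is walked twice before it grows by one.
        for _ in range(2):
            for _ in range(arm_length):
                x, y = x + dx, y + dy
                yield (x, y)
            dx, dy = -dy, dx  # turn 90 degrees


def allocate(taken: Set[Plot]) -> Plot:
    """Claim the first free plot along the spiral, so gaps near the center fill first."""
    for plot in spiral():
        if plot not in taken:
            taken.add(plot)
            return plot
    raise RuntimeError("unreachable: the spiral never ends")
```

Because every allocation restarts from the center, a plot that was emptied is reused before the spiral grows outward.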

For one specific plot, however, conditions were such that the database was giving an integrity error. The exception handling code that was supposed to handle that didn't take into account that it needed to roll back before resuming, so the entire request failed, instead of resuming gracefully. Just adding an atomic() context manager fixed it.
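The pattern described here (roll back the failed attempt, then move on to the next plot) can be sketched with the standard library's `sqlite3` and an explicit savepoint standing in for Django's `atomic()`. The table layout and the helpers `savepoint` and `assign_first_free` are assumptions made up for illustration. Note that SQLite does not leave a transaction in a failed state after a constraint error the way PostgreSQL does, so this shows the shape of the fix rather than reproducing the original failure.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def savepoint(conn, name="plot_attempt"):
    """Run a block inside a savepoint; undo only that block if it raises."""
    conn.execute(f"SAVEPOINT {name}")
    try:
        yield
    except Exception:
        conn.execute(f"ROLLBACK TO SAVEPOINT {name}")
        conn.execute(f"RELEASE SAVEPOINT {name}")
        raise
    else:
        conn.execute(f"RELEASE SAVEPOINT {name}")


def assign_first_free(conn, user, candidates):
    """Try candidate plots in order and skip any that violate integrity."""
    for x, y in candidates:
        try:
            with savepoint(conn):
                conn.execute(
                    "INSERT INTO plots (x, y, owner) VALUES (?, ?, ?)",
                    (x, y, user),
                )
            return (x, y)
        except sqlite3.IntegrityError:
            continue  # the savepoint was rolled back; try the next plot
    return None


# isolation_level=None disables implicit transactions, so BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE plots (x INTEGER, y INTEGER, owner TEXT, PRIMARY KEY (x, y))"
)
conn.execute("BEGIN")
first = assign_first_free(conn, "alice", [(0, 0), (1, 0)])
second = assign_first_free(conn, "bob", [(0, 0), (1, 0)])
conn.execute("COMMIT")
```

The second call hits the already-taken plot, rolls back only that attempt, and continues with the next candidate instead of failing the whole request.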

> looks like site wasn't working at all when you posted that comment?

It was working for a few hundreds (thousands?) of visitors, then the allocation code hit the plot that caused the bug, and signup couldn't proceed after that.

> Just adding an atomic() context manager fixed it.

ok looks like you are intimately familiar with the code that is being produced and are using AI as a code generator rather than pure vibe coding. That makes sense to me.

Btw did AI add that line when you explained what the error was, or did you add it manually?

It's not really any different in my experience
Stochastic parrot? Autocomplete on steroids? Fancy autocorrect? Bullshit generator? AI snake oil? Statistical mimicry?

You don't hear that anymore.

Feels like whole generation of skeptics evaporated.

I certainly hold those opinions still, because the models still have yet to prove they are anything worth a person's time. I don't bother posting that because there's no way an AI hype person and I are ever going to convince each other, so what's the point?

The skeptics haven't evaporated, they just aren't bothering to try to talk to you any more because they don't think there's value in it.

So you don't even try LLMs regularly?

And what about everything else in ML progress, like image generation, 3D world generation, etc.?

I vibe coded plenty of small things I never had the time for. You don't have anything you wanted to do that can fit in a single-page HTML application? It can even use local storage etc.

I think the stochastic part is true and useless. It can be applied to anyone or anything. Yes, the models give you probabilities, but any algorithm gives you probabilities (only zero or one for deterministic ones). You can definitely view the human mind as a complex statistical model of the world.

Now, that being said, do I think they are as good as a skilled human on most things? No, I don't. My trust issues have increased after the GPT-5 presentation. The very first question was to showcase its "PhD-level" knowledge, and it gave a wrong answer. It just happened to be in a field I know enough about to notice, but most didn't.

So, while I think they can be considered as having some form of intelligence, I believe they have more limits than a lot of people seem to realise.

> Feels like whole generation of skeptics evaporated.

https://www.youtube.com/watch?v=aR20FWCCjAs&list=PLd7-bHaQwn...

Ilya Sutskever this week.

Have you also looked at the rest of the 1h 36m or just those out-of-context 30s?
have you ever made a non annoying comment
Maybe your bubble flew away from those voices? I see them all the time, and am glad.
still haven't seen anything proving it was not autocomplete on steroids or statistical mimicry
It is all those things.

The Bitter Lesson is that with enough VC-subsidised compute those things are useful.

Those echoes have grown louder over the past year or so. The only way you've heard less of it is if you buried your head under sand.
It is all those things. It consistently fails to make truly novel discoveries, everything it does is derived from something it trained on from somewhere.

No point in arguing about it though with true believers, they will never change their minds.

Have you tried Gemini 3 yet? I haven't done any coding with it, but on other tasks I've been impressed compared to gpt 5 and Sonnet 4.5.
It's very good but it feels kind of off-the-rails in comparison to Sonnet 4.5 - at least with Cursor it does strange things like putting its reasoning in comments that are about 15 lines long, deleting 90% of a file for no real reason (especially when context is reaching capacity) and making the same error that I just told it not to do.
The computer science field is going to be an absolute shitshow within 5 years (it already kinda is). On one side you'll have ADHD dog attention span zoomers trying out all these nth party model apis and tools every 5 seconds (switching them like socks, insisting the latest one is better, but ultimately producing the same slop) and on the other side you'll have all these applied math gurus squeezing out the last bits of usable AI compute on the planet... and nothing else.

We used to joke that "The internet was a mistake.", making fun of the bad parts... but LLMs take the fucking cake. No intelligent beings, no sentient robots, just unlimited amounts of slop.

The tech basically stopped evolving right around the point of it being good enough for spam and slop, but not going any further. There are no cures, no new laws of physics or math or anything else being discovered by these things. All AI use in science I can see is based on finding patterns in data, not intelligent thought (as in novel ideas). What a bust.

Completely disagree, what I see agentic coding agents do in combination with LLMs is seriously mind-blowing. I don't care how much knowledge is compressed into an LLM. What is way more interesting is what it does when it misses some knowledge. I see it come up with a plan to create the knowledge by running an experiment (running a script, sometimes asking me to run a script or try something), evaluating the output, and then replanning based on the output. Full Plan-Do-Check-Act. Finding answers systematically to things you don't know is way more impressive than remembering lots of stuff.
I don't see a big difference to humans, we say many unreasonable things too, so validation is necessary. Whether you use the internet, books or AI, it is your job to test their validity. Anything can be bullshit, written by human or AI.

In fact I fear the humans optimize for attention and cater to the feed ranking Algorithm too much, while AI is at least trying to do a decent job. But with AI it is the responsibility of the user to guide it, what AI does depends on what the user does.

There are some major differences though. Without these tools, individuals are pretty limited in how much bullshit they can output, for many reasons, including that they are not mere digital puppets with no need to survive in society.

It’s clear pro-slavery-minded elitists are happy to sell the line that people should become a "good complement to AI", that is, even more disposable than these puppets. But unlike these mindless entities, people have the will to survive deeply engraved as a primary behavior.

Humans can output serious amounts of unproven bullshit, e.g., 3000 incompatible gods and all the religions that come with them...
The worst part is when the AI spits out dogshit results --people show up at lightspeed in the comments to say how "you're not using it right" / "try this other model, it's better"

Anecdotally, the people I see the most excited about AI are the people that don't do any fucking work. I can create a lot of value with plain ol' for loop style automation in my niche. We're still so far from the limit of what we can do with automation that I don't give a fuck about what AI can do. Bruh, in Windows 10 copy and fuckin paste doesn't work for me anymore, but instead of fixing that they're adding AI

LLMs help a lot of users with making FOR loops and things like that. At least it's been the case for me, I'd never tried to use PowerShell before but with a bit of LLM guidance was able to cobble together some useful (for me) one-liner commands to do things like "use this CSV of file names and pixel locations, and make cropped PNG thumbnails of these locations from these images".

Stuff like that which regular users often do by hand, they can ask an LLM for the command (usually just a few lines of a scripting language if they only know the magic words to use).

The only people I see complaining about AI are those that have the most to lose.
Using it isn't optional though, it's forced through corporate policy. If my boss would shut up about it that would be enough for me
My wife and I are both paid to work on AI products and we both think the whole thing’s only sorta useful in fact. Not nothing, but… not that much, either.

I’m not worried about AI taking our jobs, I’m worried about the market crash when the reality of the various failed (… to actually reduce payroll) or would’ve-been-cheaper-and-better-without-AI initiatives the two of us have been working on non-stop since this shit started breaks through the hype of investment and the music stops.

The LLM only reflects the input of what it's fed. If the results are unintelligent then so is the input.
It's been three years of amazing use cases and discoveries, and in those same years we got things like Ozempic. You can be skeptical of all the hyped things that are said that may be exaggerated without negating the good side.
The patent for Ozempic was filed nearly 20 years ago: https://patents.google.com/patent/US8129343B2/en?oq=US812934...

Ozempic’s FDA approval was in 2017, the same year transformers were invented.

Whatever you can credit to LLMs, GLP-1s aren’t one of them.

Ozempic has nothing to do with LLMs, so I'm a bit confused about the point you're making here?
My chatbot told me that chatbots invented drugs.
Only a tiny bit, but I should. When you say GPT-5, do you mean 5.1? Codex or regular?
Sorry, yeah, 5.1 regular chatbot.
Ahh, try 5.1 Codex (with codex cli), it's much better, I've found.
imo don't waste your time for coding with Gemini 3. Perhaps worth it if it's something Claude's not helping with, as Gemini 3's reasoning is very good supposedly.
Maybe the wtfs per line are decreasing because these models aren't saying anything interesting or original.
No, it's because they write correct code. Why would I want interesting code?
Oh, my bad. I still had the comment someone made about the model writing a PhD-level paper in my head and didn't realize you were talking about code.

Fully agree.

:D made my day
I guess you have a couple of options.

You could trust the expert analysis of people in that field. You can hit personal ideologies or outliers, but asking several people seems to find a degree of consensus.

You could try a variety of tasks that do something complex but produce results that are easy to test.

When I started trying chatbots for coding, one of my test prompts was

    Create a JavaScript function edgeDetect(image) that takes an ImageData object and returns a new ImageData object with all direction Sobel edge detection.  
That was about the level where some models would succeed and some would fail.
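For reference, a minimal sketch of the kind of output that prompt is looking for, written in Python over a plain grayscale grid rather than JavaScript's ImageData. It assumes "all direction" means combining the horizontal and vertical Sobel responses into a gradient magnitude and that border pixels are left at zero; `edge_detect` is a hypothetical name.

```python
import math
from typing import List

# Standard 3x3 Sobel kernels for horizontal (GX) and vertical (GY) gradients.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def edge_detect(gray: List[List[float]]) -> List[List[float]]:
    """Return the Sobel gradient magnitude of a grayscale image.

    Border pixels have no full 3x3 neighborhood and are left at 0.0.
    """
    height, width = len(gray), len(gray[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            for ky in range(3):
                for kx in range(3):
                    value = gray[y + ky - 1][x + kx - 1]
                    gx += GX[ky][kx] * value
                    gy += GY[ky][kx] * value
            # Combine both directions into one edge strength.
            out[y][x] = math.hypot(gx, gy)
    return out
```

A check like this is easy to eyeball: a flat image should come back all zeros, and a sharp vertical step should light up only the columns next to the step.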

Recently I found

    Can you create a webgl glow blur shader that takes a 2d canvas as a texture and renders it onscreen with webgl boosting the brightness so that #ffffff is extremely bright white and glowing,
Produced a nice demo with slider for parameters, a few refinements (hierarchical scaling version) and I got it to produce the same interface as a module that I had written myself and it worked as a drop in replacement.

These things are fairly easy to check because if it is performant and visually correct then it's about good enough to go.

It's also worth noting that as they attempt more and more ambitious tasks, they are quite probably testing around the limit of capability. There is both marketing and science in this area. When they say they can do X, it might not mean it can do it every time, but it has done it at least once.

> You could trust the expert analysis of people in that field

That’s the problem - the experts all promise stuff that can’t be easily replicated. The promises the experts make don’t match the model. The same request might succeed and might fail, and might fail in such a way that subsequent prompts might recover or might not.

The experts I am talking about trusting here are the ones doing the replication, not the ones making the claims.
That's how working with junior team members or open source project contributors goes too. Perhaps that's the big disconnect. Reviewing and integrating LLM contributions slotted right into my existing workflow on my open source projects. Not all of them work. They often need fixing, stylistic adjustments, or tweaking to fit a larger architectural goal. That is the norm for all contributions in my experience. So the LLM is just a very fast, very responsive contributor to me. I don't expect it to get things right the first time.

But it seems lots of folks do.

Nevertheless, style, tweaks, and adjustments are a lot less work than banging out a thousand lines of code by hand. And whether an LLM or a person on the other side of the world did it, I'd still have to review it. So I'm happy to take increasingly common and increasingly sophisticated wins.

Juniors grow into mids, and eventually into seniors. OSS contributors eventually learn the codebase, you talk to them, you all get invested in the shared success of the project and sometimes you even become friends.

For me, personally, I just don't see the point of putting that same effort into a machine. It won't learn or grow from the corrections I make in that PR, so why bother? I might as well have written it myself and saved the merge review headache.

Maybe one day it'll reach perfect parity of what I could've written myself, but today isn't that day.

I wonder if that difference in mentality is a large part of the pro- vs anti-AI debate.

To me the AI is a very smart tool, not a very dumb co-worker. When I use the tool, my goal is for _me_ to learn from _its_ mistakes, so I can get better at using the tool. Code I produce using an AI tool is my code. I don't produce it by directly writing it, but my techniques guide the tool through the generation process and I am responsible for the fitness and quality of the resulting code.

I accept that the tool doesn't learn like a human, just like I accept that my IDE or a screwdriver doesn't learn like a human. But I myself can improve the performance of the AI coding by developing my own skills through usage and then applying those skills.

> It won't learn or grow from the corrections I make in that PR, so why bother?

That does not match my experience. As the codebases I've worked with LLMs on become more opinionated and stylized, it seems to do a better job of following the existing work. And over time the models have absolutely improved in terms of their ability to understand issues and offer solutions. Each new release has solved problems for me that the previous ones have struggled with.

Re: interpersonal interactions, I don't find that the LLM has pushed them out or away. My projects still have groups of interested folk who talk and joke and learn and have fun. What the LLMs have addressed for me in part is the relative scarcity of labor for such work. I'm not hacking on the Linux Kernel with 10,000 contributors. Even with a dozen contributors, the amount of contributed code is relatively low and only in areas they are interested in. The LLM doesn't mind if I ask it to do something super boring. And it's been surprisingly helpful in chasing down bugs.

> Maybe one day it'll reach perfect parity of what I could've written myself, but today isn't that day.

Regardless of whether or not that happens, they've already been useful for me for at least 9 months. Since O3, which is the first one that really started to understand Rust's borrow checker in my experience. My measure isn't whether or not it writes code as well as I do, but how productive I am when working with it compared to not. In my measurements with SLOCCount over the last 9 months, I'm about 8x more productive than the previous 15 years without (as long as I've been measuring). And that's allowed me to get to projects which have been on the shelf for years.

This article by an AI researcher I happen to have worked with neatly sums up feelings I've had about comments like yours: https://medium.com/@ahintze_23208/ai-or-you-who-is-the-one-w...

> Things I don't understand must be great?

Couple it with the tendency to please the user by all means and it ends up lying to you but you won’t ever realise, unless you double check.

> Couple it with the tendency to please the user by all means

Why aren't foundational model companies training separate enterprise and consumer models from the get go?

I think they get to that a couple of paragraphs later:

> The idea was good, as were many elements of the execution, but there were also problems: some of its statistical methods needed more work, some of its approaches were not optimal, some of its theorizing went too far given the evidence, and so on. Again, we have moved past hallucinations and errors to more subtle, and often human-like, concerns.

Well, that's why people still have jobs but I appreciate the idea of the post that the neat demo was a coherent paragraph or silly poem. The silly poems were all kind of similar, not very funny, and the paragraphs were a good start but I wouldn't use them for anything important.

Now the tightrope is a whole application or a 14 page paper and the short pieces of code and prose are now professional quality more often than not. That's some serious progress.

The author goes into the strengths and weaknesses of the paper later in the article.
I keep trying out different models. Gemini 3 is pretty good. It’s not quite as good at one shotting answers as Grok but overall it’s very solid.

Definitely planning to use it more at work. The integrations across Google Workspace are excellent.

The author actually discusses the results of the paper. He's not some rando but a Wharton Professor and when he is comparing the results to a grad student, it is with some authority.

"So is this a PhD-level intelligence? In some ways, yes, if you define a PhD level intelligence as doing the work of a competent grad student at a research university. But it also had some of the weaknesses of a grad student. The idea was good, as were many elements of the execution, but there were also problems..."

I think the point is we’re getting there. These models are growing up real fast. Remember 54% of US adults read at or below the equivalent of a sixth-grade level.
> Remember 54% of US adults read at or below the equivalent of a sixth-grade level.

The sane conclusion would be to invest in education, not to dump hundreds of billions into LLMs, but ok

Education is not just a funding issue. Policy choices, like making it impossible for students to fail, which means they have no incentive to learn anything, can be more impactful.
But holy shit is it also a funding issue when teachers make nothing.
As far as I understand it, the problem isn’t that teachers are shit. Giving more money would bring in better teachers, but I don’t know that they’d be able to overcome the other obstacles
> Giving more money would bring in better teachers, but I don’t know that they’d be able to overcome the other obstacles

Start with the easiest thing to control? Give more money and see what it does?

We seem to believe in every other industry that to get the best talent you pay a high salary, but for some reason we expect teachers to do it out of compassion for the children while they struggle to pay bills. It's absurd.

Probably one of the single most important responsibilities of a society is to prepare the next generation, and it pays an enormous return. But because we can't measure it with quarterly profits we just ignore it.

The rate of return on providing society with a good education is insane.

I think you need to research the issue more. Teachers are well remunerated in most states. Educational outcomes are largely a function of policy settings. Have a look at the amazing turnaround in literacy rates in Mississippi after they started teaching phonics again.
I date a lot of teachers. My last one was in the San Ramon (CA) Valley School district, she makes about $90k a year at 34 years old. Talking to her basically makes me want to homeschool my kids to make sure someone like her isn't their teacher. Paying teachers more won't do ANYTHING until we become a lot more selective about who gets to become and stay a teacher. It can't be like most government jobs where getting it is like winning the lottery and knowing you can make above market money for below market performance.
There is so much wrong with this. You cannot judge the class of teachers based on a small sample of your taste in women. You didn't actually communicate anything materially wrong with her. You listed a high income area to make us think teachers are overpaid but we have no insight by default into median income in the area or her qualifications.

Lastly, it's entirely impossible to attract better candidates without more money; that's just not how the world works.

For reference, the median household income in San Ramon is about 200k, so 2 teachers would be below average. A cop with her experience in the same town makes 158k

It's interesting to hear you say that you date a lot of teachers while simultaneously holding this view of their level of competence. Or just not the ones you date?
If teachers made as much as half the people on this site, perhaps things would be better. 90k in San Ramon is more or less the median wage [1]. It's not _that_ much money.

[1] https://en.wikipedia.org/wiki/San_Ramon,_California#2020_cen...

This is so basic that I feel I shouldn't need to say it, but you can't be selective if you don't pay. You take what you get.

The reason teaching became largely a women's profession when teachers used to be exclusively men is because we wanted to make education universal and free, so we did that by paying less, and women who needed to work also had to take what they could get. The reason it has become a moron's profession is because we have made it uniquely undesirable. If you think that teachers should be amazing and eminently qualified and infinitely safe to have around children, pay them like programmers.

Instead, the middle-class meme is to pay them nothing, put them in horrible conditions, and resent them too. Typical "woman's work" model.

I guess the problem isn't only the pay, it's the opportunity cost which only a certain kind of people are willing to pay for the whole career. If you select those people out... you're left with zero candidates.
It's not just investing in education, it's using tools proven to work. WA spends a ton of money on education, and on reading, Mississippi, the worst state for almost every metric, has beaten them. The difference? Mississippi went hard on supporting students and using phonics, which is proven to work. WA still uses the hippie theory of guessing words from pictures (https://en.wikipedia.org/wiki/Whole_language) for learning how to read.
Investing in education is a trap because no matter how much money is pumped into the current model, it’s not making a difference.

We need different models and then to invest in the successes, over and over again…forever.

Because education alone in a vacuum won't fix the issues.

Even if the current model was working, just continuing to invest money in it while ignoring other issues like early childhood nutrition, a good and healthy home environment, environmental impacts, etc. will just continue to fail people.

Schooling alone isn't going to help the kid with a crappy home life, with poor parents who can't afford proper nutrition, and without the proper tools to develop the mindset needed to learn (because these tools were never taught by the parents, and/or they are too focused on simply surviving).

We, as a society, need to stop allowing people to be in a situation where they can't focus on education because they are too focused on working and surviving.

Exactly correct.
It's so hilarious to look at 10k years of education history and be like "Nah, funding doesn't make a difference."

Incredible.

The US already spends more per student than almost any other country (5th globally) and the outcomes are getting constantly worse.

It’s not a funding problem.

A lot of that funding in the US goes to pay teachers money they then use to pay for health insurance -- which in other countries is often provided by the tax base at large and not counted as an education expense.
That's half true. You have to think about cost of living, you can't just compare across the globe like that. And especially opportunity cost. In the US, teacher pay lags behind similarly educated professionals.

But you're right after a certain point other factors matter more than simple $ per student. Unfortunately one of those factors is teacher pay <=> teacher quality.

It's incredibly unfair that you get to just lie online or worse that you actually believe what you're saying.
Education funding is highest in places that have the worst results. Try again.
Yes, for example, it's very well known that Angola has a top tier education system while Swedish people can barely read or count
Well, if you actually look at the data:

https://nces.ed.gov/programs/coe/indicator/cmd/education-exp...

We spend far more than most countries per pupil, for much poorer results

https://worldpopulationreview.com/country-rankings/pisa-scor...

It's pretty clear that while spending is a factor, it's probably not the biggest one. The countries that seem to do best are those that combine adequate funding with real rigor in instruction.

I posted elsewhere you can't just compare across the globe like that. You have to think about cost of living and especially opportunity cost. In the US, teacher pay lags behind similarly educated professionals, which means they get stretched thin and the best with options will leave.
New Mexico (where I live) is dead last in education out of all 50 states. They are currently advertising for elementary school teachers between 65-85K per year. Summers off. Nice pension. In this low cost of living state that is a very good salary, particularly the upper bands.

I don't think it's a money issue at this point.

Because they use whole language theory (https://en.wikipedia.org/wiki/Whole_language) instead of phonics for teaching how to read.
Just flatly not true.
In theory yeah, but in practice 54% will also vote against funding education. Catch-22.
In WA they always pass levies for education funding at local and state level however results are not there.

Mississippi is doing better on reading, the biggest difference being that they use a phonics approach to teaching how to read, which is proven to work, whereas WA uses whole language theory (https://en.wikipedia.org/wiki/Whole_language), which is a terrible idea; I don't know how it got traction.

So the gist of it is: yes, spend on education, but ensure that you are using the right tools, otherwise it's a waste of money.

First time hearing of whole language theory, and man, it sounds ridiculous. Sounds similar to the old theory that kids who aren't taught a language at all will simply speak perfect Hebrew.
I almost agree, but too many people will take that to mean “we need to do more with less”. It’s a feature of capitalism. Teachers are stretched thin in most places, that’s always the main problem. Are WA teachers compensated about the same as other similarly educated professionals? As cops?

Hire smart motivated people, pay them well, leave them alone, they’ll figure this one out. It’s not hard, anyone can google what Finland does.

> WA teachers compensated about the same as other similarly educated professionals

WA teacher salaries are among the best in the country (within the top 5). You start at around $84k, I think, or $90k+ if you have a master's degree, at least in Seattle, and it can scale up to $150k with enough seniority, plus a pension plan.

> Hire smart motivated people, pay them well, leave them alone, they’ll figure this one out. It’s not hard, anyone can google what Finland does.

The problem is not the teachers themselves, it's what the system tells them to teach. You can have the best teacher in the world, but if they use BS curricula, students will unfortunately learn BS.

Think about it: you can have brilliant engineers but an idiot CEO, and the company will fail despite the engineers.

Not true, most people are not upper-middle class anti-tax wackos. They benefit from those people being taxed.
In my own social/family circle, there’s no correlation between net worth and how someone leans politically. I’ve never understood why given the pretty obvious pros/cons (amount paid in taxes vs. benefits received)
That's interesting b/c I see it very obviously in mine, with the partial exception of myself. The more professional and private sector their job or their spouse's, the more conservative they are. E.g., a real estate lawyer is conservative, a lawyer for the state is liberal, a software engineer is a communist, and the musicians are libertarians or socialist-lite.

Professional or artisanal work are petit bourgeois positions, so are flexible in their outlook regardless of income.

The electorate in the U.S. commonly votes against its own interests.
Pithy, but not true.
That's why you phrase it as "woke liberals turning your children gay!"

In USA K-12 education costs about $300k

350 million people, want to get 175 million of them better educated, but we've already spent $52 trillion on educating them so far

The people most vociferously for conservative values are middle class, small business owners, or upper class, though the true upper class are libertine (notice who participated in the Epstein affair). The working class is filled with all kinds of very diverse people united by the fact they have to work for a living and often can't afford e.g. expensive weddings. Some of them are religious, a whole bunch aren't. It's easy to be disillusioned with formal institutions that seem to not care at all about you.

Unfortunately, a lot of these people have either concluded it is too difficult to vote, can't vote, or that their votes don't matter (I don't think they're wrong). Their unions were also destroyed. Some of them vote against their interests, but it's not clear that their interests are ever represented, so they vote for change instead.

You don't need an educated workforce if you have machines that can do it reliably. The more important question is: who will buy your crap if your population is too poor due to lack of well paying jobs? A look towards England or Germany has the answer.
The top 10% of households already account for more than half of consumer spending in the US
Hmmm, that doesn't seem right. I'm having a hard time finding an actual consumption number, but I am confident it's well below 50%.

The top 10% of households by wage income do receive ~50% of pre-tax wage income, but:

1) our tax system is progressive, so actual net income share is less

2) there's significant post-wage redistribution (social security/medicaid)

3) that high income households consume a smaller percent of their net income is a well established fact.

Unfortunately, people are born with a certain intellectual capacity and can't be improved beyond that with any amount of training or education. We're largely hitting people's capacities already.

We can't educate someone with 80 IQ to be you; we can't educate you (or me) into being Einstein. The same way we can't just train anyone to be an amazing basketball player.

From what I've read, IQ is one of the more heritable traits, but only about 50% of one's intelligence is attributable to one's genes.

That means there are absolutely still massive benefits to be had in trying to ensure that kids grow up in safe, loving homes, with proper amounts of stimulation and enrichment, and are taught with a growth, not a fixed potential mindset.

Sad to say, but your own fixed mindset probably held you back from what you could truly achieve. You don't have to be Einstein to operate on the cutting edge of a field; I think most Nobel Prize winners have an IQ of ~120.

This is extremely not settled science. Education in fact does improve IQ and we don't know how fixed intelligence is and how it responds to different environmental cues.
Other countries have better outcomes. I doubt it's just because of the genetics.
https://en.wikipedia.org/wiki/Comparative_advantage

Modern society benefits a lot from specialization. It's like the dumbest kid in France is still better at French than you.

A question for the not-too-distant future:

What use is an LLM in an illiterate society?

Automatic speech recognition and speech to text models are also growing up real fast.
But will an illiterate person be able to articulate themselves well enough to get the LLM to do what they want, even with a speech interface?

Will they possess the skills (or even the vocabulary) to understand the output?

We won't know for another 20 years, perhaps.

Thinking that speech recognition is a solution to illiteracy is like thinking that low-code tools can replace traditional programming tools. The bottleneck is and has always been the cognitive capacity limits of your average human. No interface can solve the issue of humans being illiterate.
> What use is an LLM in an illiterate society?

The ability to feign literacy such that critical thought and ability to express same is not a prerequisite.

Absurd question. The correct one is "what use is an illiterate in an LLM society".
> But because I'm not a PhD in these subjects, am I supposed to think, "Well of course the 14 pages on a topic I'm not an expert in are good"?

https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect

You don't use it that way. You use it to help you build and run experiments, and help you discuss your findings, and in the end help you write your discoveries. You provide the content, and actual experiments provide the signal.
Like clockwork. Each time someone criticizes any aspect of any LLM there's always someone to tell that person they're using the LLM wrong. Perhaps it's time to stop blaming the user?
If someone says that they can't get a camera to work, you tell them how to fix it, right? I can't think of what other response is appropriate.
Why would their response be appropriate when even the creators of the LLM don't clearly state the purpose of their software, let alone instruct users how to use it? The person I replied to said that this software should be used to "help you build and run experiments, and help you discuss your findings, and in the end help you write your discoveries" - I dare anyone to find any mention of this workflow being the "correct" way of using any LLM in the LLM's official documentation.
Validation that cameras will never work and photographs aren't real.
You wouldn't use a screwdriver to hammer a nail. Understanding how to use a tool is part of using the tool. It's early days and how to make the best use of these tools is still being discovered. Fortunately a lot of people are experimenting on what works best, so it only takes a little bit of reading to get more consistent results.
What if the company selling the screwdriver kept telling you that you could use it as a hammer? What if you were being bombarded with marketing saying that hammers are being replaced by screwdrivers?
You can recognise that the technology has a poor user interface and is fraught with subtleties without denying its underlying capabilities. People misuse good technology all the time. It's kind of what users do. I would not expect a radically new form of computing which is under five years old to be intuitive to most people.
> It just doesn't add up... Things I understand, it looks good at first, but isn't shippable. Things I don't understand must be great?

It’s like the Gell-Mann amnesia effect applied to AI. :)

https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect

This is a variation of the Gell-Mann amnesia effect: https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect
One could say, the GeLLMann amnesia effect. ( ͡° ͜ʖ ͡°)
Thanks for introducing me to this article
Loads of AI chatter is the Murray Gell-Mann Amnesia Effect on steroids
For what it's worth, I have been using Gemini 2.5/3 extensively for my master's thesis and it has been a tremendous help. It's done a lot of math for me that I couldn't have done on my own (without days of research), suggested many good approaches to problems that weren't on my mind and helped me explore ideas quickly. When I ask it to generate entire chapters they're never up to my standard, but that's mostly an issue of style. It seems to me that LLMs are good when you don't know exactly what you want or you don't care too much about the details. Asking it to generate a presentation is an utter crapshoot, even if you merely ask for bullet points without formatting.
> It's done a lot of math for me that I couldn't have done on my own (without days of research),

Isn't the point of doing the master's thesis that you do the math and research, so that you learn and understand the math and research?

I bet they were talking about how people didn't do long division when the calculator first came out too. Is using matlab and excel ok but AI not? Where do we draw the line with tools?
OP said they "generated entire chapters"
Apparently not. This is the most perfect example of "I can recite it, but I don't understand it so I don't know if it's really right or not" that I've seen in a while.
I do understand it. I just don't have the overview of all the algorithms that LLMs have.
The truth is you still need a human to review all of it, fix it where needed, guide it when it hallucinates, and write correct instructions and prompts.

Without knowing how to use this “PROBABILISTIC” slot machine to get better results, you are only wasting the energy those GPUs need to run and answer questions.

Majority of ppl use LLMs incorrectly.

Majority of ppl selling LLMs as a panacea for everything are lying.

But we need hype or the bubble will burst, taking the whole market with it, so shuushh me.