Hacker News
by moonchrome 1172 days ago
All this talk about chatgpt replacing programmers and skill atrophy - meanwhile all I'm getting out of chatgpt is bullshit and hallucinations. Copilot is amazing at boilerplate, but that's about it - I don't even read any suggestions that don't fall into that category anymore.

Copilot is amazing because it lets me stay in the flow when I need to churn out stuff (when I already know what I want). I would pay over 100$/month for a faster/less jittery copilot.

ChatGPT is cheap at 20$/month but not even worth that price.

6 comments

I think there's two camps of think-piece authors emerging - the ones who've tried a bunch of different examples of things and got passable-to-great-looking results; and the ones going deeper into specific areas and hitting the wall in terms of expertise and specificity. Using the GPT 4 API, I'm definitely often hitting limitations, especially around depth of information, and having to "prompt engineer" my way around them. After using a dozen prompt variants to try to prod it in the direction I want without seeing it reflect those changes, a bit of the magic wears off.

I'm bearish on the idea of long-term prompt engineering being a big skillset since I imagine the "understanding the prompt" side of the tools will get better, but I don't see it necessarily getting around the need for specificity of input. It feels like writing a task ticket and giving it to a junior person - what you get back might not be what you need, and a lot of time the true difficulty is knowing exactly what you need up front. Reducing that cycle time is wonderful, but doesn't replace the hard earned skills of knowing what to make.

I am in the camp of people who see the current limitations but also see the rate of progress and think those limits may not stand for long. It is yet unclear if it will progress like autonomous driving or like playing go …
Prompt engineering won't go away, it'll get more "engineer"-like. Knowing how to describe a point in a model's latent space for the generation you want is here to stay, but the black magic and art aspect of it will go away.

For example, in stable diffusion land, lots of people have intuition about the relationship between certain prompts and the output they produce. That intuition is embedding and training data specific, so it's not really transferrable (even to different fine tuned models for stable diffusion 1.5). However, I use clip interrogation to map the portions of the latent that my prompt is pointing to, evaluate the embedding text to find desirable/undesirable elements, then adjust the prompt or add negative prompts to navigate my generations towards what I want.

Prompt engineering is merely the entry-drug to AI-wrangling.

SD has gotten to the point that someone can fine tune a model (LORAs) with 2 days of time and $2 of GPU time.

There'll be roles for AI wranglers in every large company, where you'll be gathering the dataset and building LORA plugins for the AI to adapt specifically for your codebase/customerbase/documentation etc.

There's also processes involved in building APIs for the AI (AIPI?) to use and interface with your documentation and systems, setting up vector databases, monitoring AI output etc.

People who think there won't be jobs for expert AI users are just coping. Thinking "haha AI will kill your job too". The steam engine was more powerful than 100 men. In the end it required like 30 people up and down the value chain to support the engines, from coal mining, to coal shoving, to maintenance, to manufacturing.

I'm not sure most codebases are unique enough for that. There will certainly be some of that at places that are doing new things, but for the average online service backend or frontend app programming tasks, I think things like Copilot will see enough and get trained well enough out of the box to be pretty one-size-fits-all.

There will be a lot of business pressure towards using the "good enough" out of the box ones too. If you've got a team of less than a hundred people, rolling your own "datasets, LORA plugins, APIs for AI, vector databases, monitoring, etc" is a multi-person team and a significant chunk of new expense. So is the incremental gain there for small to medium teams with relatively "standard" problems?

Kinda like self-hosting at that scale vs using a cloud vendor.

"Language model, write me a python script to finetune a language model."
I agree with this, for me scenarios where I know what I want are better handled by copilot - I'm way better at writing code than gpt prompts - copilot then picks up the boilerplate as I go along - and since I know what I want it's easy to fact check.

There are some scenarios where it would be useful to have chat like interface in editor to prototype fast - hopefully copilot x delivers.

I think there are two camps of authors using it. The ones who tried a bunch of different examples and got terrible results and wrote the whole thing off, and the ones who pushed on through that, kept exploring the capabilities of the model and couldn't believe how useful it could be once they figured out how best to use it.
That has been somewhat my experience as well. I found ChatGPT sometimes useful for providing a hint for something I don't know how to do.

However, I never was able to get it to write a successful function for anything that would have been useful. It got it wrong every time.

I'm surprised with this response. I myself have found it extremely useful and ChatGPT has saved me tons of time with programming and non-programming tasks.
> all I'm getting out of chatgpt is bullshit and hallucinations

> ChatGPT is cheap at 20$/month but not even worth that price.

This is so general obviously it's not true. It's providing lots of value to lots of people. To me this sounds like someone with the goal of confirming their own biases.

What's my bias ? I want this to work - I want to expend less effort to do my job - but so far, for the problems where ChatGPT would fit in the workflow, it's taken me more time to fact check the plausible bullshit it generates than doing it the old fashioned way.

Copilot is way better at generating boilerplate.

The one task I did find it useful was converting model types to open API spec - out of trying to use it for a month.

Your bias is that you want it to do your job, but you're extrapolating that since it's no good at that (for whatever reason), it's no good for anything, and forgetting that there are many other jobs out there.
I agree with your general sentiment and have been slumming on r/singularity lately where there has been a ton of hype. One thing that I think I've gleaned from my reading the comments there, is that people who aren't as skilled at using search engines find ChatGPT to be magical.

To my way of thinking, crafting the perfect prompt is about the same, or more, effort than crafting the perfect Google search. In both cases I'll probably have to double check the sources if I want a critical analysis of the results.

It might be worth considering, for those that GPT is helping a lot, what are you using it for? And for those that GPT is not helping, what are you trying to do with it?

"Programming" is a pretty broad activity description. I can readily imagine that AI tools, trained on publicly-available data, would be more helpful with, say, Wordpress plugins than with flight control systems.

Yup. I always wonder why is my experience not like others? Are those PR people for Microsoft?

Example:

I gave chatgpt a list. Which looked like

st street

av avenue

Convert this to yaml format as

  st:
    name:
      street
And so on.

It failed spectacularly. Not even once but about 10 times. Even if it succeeded, it kept changing the output by doing ops which I never mentioned in the prompt (like reordering and merging duplicated values to a single key)
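
For a purely mechanical conversion like this one, a short script sidesteps the problem. This is only a minimal sketch, assuming each input line is an abbreviation, a space, then the full name, and that the nested `name:` layout shown above is the target. The function name `to_yaml` is made up for illustration. It writes the text directly instead of building a dict, so input order and repeated entries come out exactly as they went in:

```python
def to_yaml(lines):
    """Convert 'abbr full-name' lines to the nested layout, one entry per input line."""
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between entries
        # Split on the first space only, so multi-word names stay intact.
        abbr, _, full = line.partition(" ")
        out.append(f"{abbr}:")
        out.append("  name:")
        out.append(f"    {full.strip()}")
    return "\n".join(out)


if __name__ == "__main__":
    print(to_yaml(["st street", "", "av avenue"]))
```

One caveat: the YAML spec requires mapping keys to be unique, so whether repeated abbreviations should really survive depends on what ends up reading the file.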

> Even if it succeeded, it kept changing the output by doing ops which I never mentioned in the prompt (like reordering and merging duplicated values to a single key)

That's something I don't see mentioned enough; if you change the input to an LLM, that may potentially change the probabilities of all the output tokens. Most of us would be surprised if we told a junior developer to fix a bug in a specific module, and they submitted a PR which modified literally every file in the source tree, but that's entirely plausible with an LLM. Asking it to "fix" one thing may change/break completely unrelated things.
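
One way to guard against this is to diff the model's output against the input and reject anything that touches lines outside the part you asked about. A rough sketch using Python's standard `difflib`; the helper names `changed_line_numbers` and `edit_stays_in_scope` are hypothetical, and it assumes the text can be compared line by line:

```python
import difflib


def changed_line_numbers(before, after):
    """Return 1-based line numbers of `before` that were touched in `after`."""
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    changed = set()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            changed.update(range(i1 + 1, i2 + 1))
        elif tag == "insert":
            # Attribute an insertion to the line it follows (0 means "before line 1").
            changed.add(i1)
    return changed


def edit_stays_in_scope(before, after, allowed):
    """True if every touched line falls inside the allowed set of line numbers."""
    return changed_line_numbers(before, after) <= set(allowed)


if __name__ == "__main__":
    before = ["a = 1", "b = 2", "c = 3"]
    after = ["a = 10", "b = 20", "c = 3"]  # asked to change line 2 only
    print(edit_stays_in_scope(before, after, {2}))
```

This only checks where the changes landed, not whether they are correct, so it complements review rather than replacing it.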

I have human intelligence and I can't figure out what the output you're hoping for here is.

Update: I tried that with GPT4 and got this:

    st:
      name: street
    av:
      name: avenue
GPT 3.5 didn't know what to do with it.
It’s a yaml key value format using a map. And even if it nailed the format, it kept changing the output on its own. I mentioned not to change order or remove duplicates, it kept doing that anyway. I gave it 100 elements, it kept giving me around 80. And yeah, it was GPT 3.5.
> meanwhile all I'm getting out of chatgpt is bullshit and hallucination

GiGo, basically

Doubt it, I've tried prompts where a simple Google query would lead to an answer to see if it could save me time (eg. query about AWS service usage, some azure stuff). In every instance I got a plausible sounding solution to my problem that would directly contradict documentation, it invents capabilities/features and misleads.

I've tried using it for code review on a few functions and tasked it with improving the provided code - every time it wrote worse code. E.g. I had some logic that would filter to a new list and then append a replacement - its refactor did filter -> add or replace for already filtered items. The reasoning was bullshit: fake performance claims about avoiding allocation when the "allocation" in question was a value type, and the suggested alternative was replacing a vector with a hash map, which is both logically wrong because it loses order and slower for the use case.
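
To make the shape of that logic concrete, here is a rough sketch of "filter to a new list, then append the replacement". It is an illustration only: the original code is not shown in this thread and was presumably in another language (given the mention of value types and vectors), and `Item`, `key` and `replace_item` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    key: str
    value: int


def replace_item(items, replacement):
    """Filter out entries sharing the replacement's key, then append the replacement."""
    kept = [item for item in items if item.key != replacement.key]
    kept.append(replacement)
    return kept


if __name__ == "__main__":
    items = [Item("a", 1), Item("b", 2), Item("c", 3)]
    print(replace_item(items, Item("b", 20)))
```

The remaining items keep their relative order and the replacement lands at the end, which is the ordering guarantee a rewrite around an unordered hash map would give up.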

For generating small stuff like a regex the pain you have to go through to get a correct prompt is higher than writing the thing and you still need to double check it.

I see no use case where chatgpt would improve my workflow in its current stage, and I've seen so many idiotic bugs recently where, when pressing the devs that introduced them, the answer is basically "ChatGPT".

The one time it was useful was when I had to convert a model definition to open API spec - was easy to fact check and give feedback to get a decent solution.

> all I'm getting out of chatgpt is bullshit and hallucinations

Are you using GPT3 or 4?

4 and full agree and still confusedly wondering.. I know I will be shouted down, but what are all you "tech guys" doing in your daily biz, just glueing hyperdocumented boilerplate together and spitting out the same textbook examples, I don't get it.
That's my impression too: for every engineer with a master's or more you have 10 developers who just got out of a 3-week JavaScript boot camp.

Of course they think Chatgpt is a revolution and will replace developers, that's because they don't see the bigger picture.

If you work on anything remotely hard you already know coding is like 10% of the job, and out of this only 10% is trivial, and this is the only part gpt will get right.

straight up don't understand the dialogue around chatGPT replacing tech jobs lmao.

Ok can chatgpt understand the super ambiguous requirement demanded by the customer, translate it into something meaningful to implement, anticipate what the customer actually wants (or will need in the future) and make sure the implementation meets that nuanced complexity? doubt it.

are you all writing hello world for a living?

Parts of our job that require skill and parts of our job that require time are not the same.

"understand the super ambiguous requirement demanded by the customer, translate it into something meaningful to implement, anticipate what the customer actually wants (or will need in the future) and make sure the implementation meets that nuanced complexity" takes skill, but that is NOT what takes up most of the workday, implementation does - and if a tool saves some meaningful time on the implementation part, then the same project can be done in the same time with less people, i.e. replacing some jobs.

> that is NOT what takes up most of the workday, implementation does - and if a tool saves some meaningful time on the implementation part

Sorry no, completely untrue here, it's about debugging and inter system complexity and getting stuff like debug artifacts together, using various tools, debugger, sniffers here, trace analyzers there, having a clue and figuring out the bug, and then fixing it, but please not the surface quick fix, but understanding the root cause (though ChatGPT couldn't even do the first thing well even if guided to most of these I guess, unless trained on multiple 100k to million loc code bases, which would not happen for other reasons)...

Pure implementation is the easy part (even if actually hard) and not taking up the workday, I'd wished it would more often..

It can help on those fun tasks like doing a visualization of some data for these things sometimes.. but there it is 50% great, other 50% I would have better used google skills and directly headed to docs or Stackoverflow where I can judge answers better, or transfer them to my problem more easy.

I personally doubt even ChatGPT10 will be able to do all these various tasks and reason between them...and even if, how much computing power should be there for how many tech people world-wide? I wonder I never read about scaling and limits..

I simply asked whether they were using 3 or 4. I don't have access to 4 yet.
Yes that's exactly what I do, except they're not really all that well documented.

What are you doing in your daily biz? Could you provide some specific examples? I'd like to see how chatgpt reacts to them.

I'm using 4 and I still am constantly having to babysit it and challenge it to get a working result out of it. Don't get me wrong it is absolutely saving me time, but it is very much like being the teacher of a very fast typing junior dev.