by danielmarkbruce
630 days ago
That prompts aren't science means little. If anything it makes them more important because you can't systematically arrive at good ones. If one spends a lot of time building an application to achieve an actual goal they'll realize the prompts make a gigantic difference and it takes an enormous amount of fiddly, annoying work to improve them. I do this (and I built an agent system, which was more straightforward to do...) in financial markets. It's so much work that people build systems just to be able to iterate on prompts (https://www.promptlayer.com/). I may be wrong - but I'll speculate you work on infra and have never had to build a (real) application that is trying to achieve a business outcome. I expect if you did, you'd know how much (non-sexy) work is involved in prompting that is hard to replicate. Hell, papers get published that are just about prompting! https://arxiv.org/abs/2201.11903 This line of thought effectively led to Gpt-4-o1. Good prompts -> good output -> good training data -> good model.
|
Important and easy to make are not the same
I never said prompts didn’t matter, just that they’re so easy to make and so similar to others that they aren’t a moat.
> I may be wrong - but I'll speculate you work on infra and have never had to build a (real) application that is trying to achieve a business outcome.
You’re very wrong. Don’t make assumptions like this. I’ve been a full stack (mostly backend) dev for about 15 years and started working with natural language processing back in 2017 around when word2vec was first published.
Prompts are not difficult, they are time consuming. It’s all trial and error. Data entry is also time consuming, but isn’t difficult and doesn’t provide any moat.
> that is hard to replicate.
Because there are so many factors at play besides prompting. Prompting is the easiest thing to do in any agent or RAG pipeline. It’s all the other settings and infra that are difficult to tune to replicate a given result (good chunking of documents, ensuring only high-quality data gets into the system in the first place, etc.).
Not to mention needing to know the exact model and seed used.
Nothing on ChatGPT is reproducible, for example, simply because they include the timestamp in their system prompt.
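A rough sketch of the timestamp point, under stated assumptions: it does not call any real API, and the model name, seed, prompt text, and both helper functions are hypothetical. It only shows that if a timestamp is baked into the system prompt, two otherwise identical requests no longer have identical inputs, so there is nothing stable to replay even with the same model and seed.

```python
import hashlib
import json
from typing import Optional


def build_system_prompt(timestamp: Optional[str] = None) -> str:
    # Hypothetical system prompt; a timestamp is appended only if one is given.
    base = "You are a helpful assistant."
    return f"{base} Current time: {timestamp}" if timestamp else base


def request_fingerprint(system_prompt: str, user_prompt: str, model: str, seed: int) -> str:
    # Hash everything that defines the request. Two requests are only
    # "the same input" if this fingerprint matches.
    payload = json.dumps(
        {"model": model, "seed": seed, "system": system_prompt, "user": user_prompt},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


USER_PROMPT = "Summarize the attached filing."  # made-up example prompt
MODEL = "example-model"                         # placeholder, not a real model id
SEED = 42

# Same model, seed, and prompts: the inputs are identical.
fixed_a = request_fingerprint(build_system_prompt(), USER_PROMPT, MODEL, SEED)
fixed_b = request_fingerprint(build_system_prompt(), USER_PROMPT, MODEL, SEED)

# Same everything, except the system prompt carries a timestamp one second apart.
stamped_a = request_fingerprint(build_system_prompt("2024-01-01T00:00:00Z"), USER_PROMPT, MODEL, SEED)
stamped_b = request_fingerprint(build_system_prompt("2024-01-01T00:00:01Z"), USER_PROMPT, MODEL, SEED)

print(fixed_a == fixed_b)      # True
print(stamped_a == stamped_b)  # False
```

Matching inputs are necessary for reproducing a result, not sufficient; the sketch only covers the input side.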
> Good prompts -> good output -> good training data -> good model.
This is not correct at all. I’m going to assume you made a mistake since this makes it look like you think that models are trained on their own output, but we know that synthetic datasets make for poor training data. I feel like you should know that.
A good model will give good output. Good output can be directed and refined with good prompting.
It’s not hard to make good prompts, just time consuming.
They provide no moat.