Hacker News new | ask | show | jobs
by bennyschmidt 1033 days ago
GPT 3.5 is so bad it's useless to me - for writing it's too repetitive of the same kind of jargon, for coding it's wrong way too often. The NLP is also worse, I have to be more explicit. It's just an average chat bot IMO.

GPT 4 @ $20/mo. is significantly better at everything, I use it for doing stuff in Angular lol - when you have an AI explaining the why behind everything, this over-engineered mess of a framework starts to actually make sense. Definitely nice to have around as a translator/teacher or troubleshooting assistant. Can't imagine googling for answers to problems if this gets any better. The main thing is just habit - GPT 4 is lower effort to arrive at more direct, bespoke answers.

The one feature I want is built-in prompt-splitting, so we don't have to use third-party tools. In my all-wise random person's opinion: Forget the old versions of GPT, and forget the phony ethics, and focus on the best version of this technology, sell it for $20/month, make billions and disrupt a lot of things online.

7 comments

> forget the phony ethics, and focus on the best version of this technology

I’ve experimented a lot between the censored and uncensored versions of Llama 2.

Based on this, I’ve concluded that fine-tuning for political correctness and ethics negatively affects all answers. They become repetitive and washed out.

I hope this technology keeps improving to the point we can run it on our own machines. It's too good to be censored.
> In one hour, the chatbots suggested four potential pandemic pathogens, explained how they can be generated from synthetic DNA using reverse genetics, supplied the names of DNA synthesis companies unlikely to screen orders, identified detailed protocols and how to troubleshoot them, and recommended that anyone lacking the skills to perform reverse genetics engage a core facility or contract research organization.

https://arxiv.org/pdf/2306.03809.pdf

I'm sorry. Are you implying I'm not supposed to know about any of those things you cited? That it's "sensitive" information, not meant for people like me?

Your post is the exact reason why we need uncensored models running in a distributed manner.

> explained how they can be generated from synthetic DNA using reverse genetics

Was that a secret? https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9066064/

> supplied the names of DNA synthesis companies unlikely to screen orders

My naive Google search implies that'd be most of them...

https://arstechnica.com/science/2022/12/experts-debate-the-r...

> identified detailed protocols and how to troubleshoot them

Googling "reverse genetics for influenza" gets the same protocols...

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5297655/

> recommended that anyone lacking the skills to perform reverse genetics engage a core facility or contract research organization.

I googled "Who to hire for reverse genetics" and the first result was a CRO

https://www.wur.nl/en/research-results/research-institutes/b...

Please feel free to contact the expert of our contract research organization (CRO) if you have a question concerning reverse genetics and reverse vaccinology.

-

LLMs have the sum knowledge of a lot of Google searches. I wish we'd stop drumming up the most ludicrous risk profiles when they're capable of damage in much more boring ways.

> Angular […] over-engineered mess of a framework

Good to know I’m not the only one feeling that way

I think for fine-tuned GPT-3.5 to be competitive with GPT-4 on your use cases (assistance with Angular), you'd have to fine-tune on enough data that it really resembles pre-training more than fine-tuning. And it wouldn't be worth the hassle unless you're building a product around it.

That said, many valuable LLM products / features are more narrow in scope and can see a huge lift from fine-tuning. We've run a bunch of experiments on this (e.g., SQL query generation is a good example), where fine-tuning even the 7B Llama-2 model outperforms GPT-4 (surprisingly) [1]. That's a very different type of problem from teaching software engineering of course.

[1] https://www.anyscale.com/blog/fine-tuning-llama-2-a-comprehe...

Use code interpreter to upload your files and prompt it to ask you a serires of questions to know what to do next
Code interpreter is quite good. I used it to create graphs, convert csv to JSON, write a complex Bash script, and regex. It's impressive.
Uploading a file to Code Interpreter does not magically increase the prompt context length. It will just read in part of the file or write code that operates on the file, depending on your prompt
I've found that LLMs serve best as fuzzy searchers. It may be hard to ask Google the right questions, but this is where LLMs shine. Googling any form of "I remember hearing about a study that Google did awhile back about new hires and they found that if a GPA was above 3.0 that there was no difference. Can you link me that study? Was there any followup?" is quite difficult and you'll likely end up with tons of links about questions of minimum GPA for getting a job at Google, but Bard will give you information about "Laszlo Bock" and his book, when enables more refined Googling. Simple "Laszlo Bock Google GPA" now provides a useful search.

This is where I find LLMs shine, when I'm struggling to cite the correct incantation to Google to filter our all the junk that has been SEO optimized. (foreshadowing LLM search optimization...)

What's also interesting is I tried this exact sentence in multiple LLMs.

- ChatGPT gives me the standard knowledge limit response despite all the results for our refined search being June 2013.

- Bard didn't need any coaxing (a bit surprising).

- Hugging Face Chat also gave me Bock and Project Oxygen and Project Aristotle (Bard didn't have either). HuggingFace is providing by far the best result.

- Claude did not find the study but at least suggested some others.

- LLaMa doesn't seem to be able to find it either, but suggests that Google has done studies and gives some names.

sheepscreek is exactly right about the fine tuning for correctness degrading results. There is an interesting thing going on right now, as alignment is strangely not being recognized as also disalignment. You cannot have one without the other. There is always a trade since you are shifting the probability distribution. But I think unfortunately it is not only unpopular to research this area, but the methods needed would involve quite unpopular networks and require a deep discussion of probability and distributions, which currently appears to be resulting in rejection from top conferences if my Twitter feed and personal experience are any indication. The conferencing system is so noisy at this point that I personally feel that it is worse than were it to not exist. Much like my ChatGPT result for the question.

It is also worth mentioning that the tuning process being performed may have additional consequences which aren't being openly discussed or addressed, despite it being in the name. Tuning for human preference is not exactly tuning for factual knowledge, but the preferred results that humans like. While tuning may include pressure to increase factual output one needs to also be highly aware that the bias we're introducing to these models is that which specifically hacks the evaluation metric (i.e. us humans). This has the ability to make LLMs worse off than before, as they become more likely to be convincing when they return incorrect information, even if the average factual accuracy is higher. Need to be highly aware of both Simpson's and Berkson's paradoxes, as they deal with poor evaluation due to the way in which data (results) are aggregated. We are literally tuning through Goodhart's Law.

I wish I could point the AI at huge GitHub codebases and have it explain the whole thing to me. Would make contributing to open source software so much easier.
There is a GTP-4 plugin available for this purpose. I have not tested it myself, but it may be worth trying out?
You are referring to this one, yes? https://recombinant.ai
That's seriously awesome. Does this require ChatGPT 4 subscription? I can't justify paying to work on open source.
For your use case of troubleshooting assistant are you pasting code into ChatGPT or using something like cursor.so ?