by ta93754829 599 days ago
we keep moving the goalposts, and that's not a bad thing.

remember when Doom came out? How amazing and "realistic" we thought the graphics were? How ridiculous that seems now? We'll look back at ChatGPT4 the same way.


Or is ChatGPT4 the 4K TV: good enough for almost all of us, which would mean we are plateauing already?

https://www.reddit.com/r/OLED/comments/fdc50f/8k_vs_4k_tvs_d...

For the work that I do, ChatGPT accuracy is still garbage. Like it makes obvious factual errors on very simple technical issues which are clearly documented in public specifications. I still use it occasionally as it does sometimes suggest things that I missed, or catch errors that I made. But it's far from "good enough" to send the output to co-workers or customers without careful review and correction.

I do think that ChatGPT is close to good enough for replacing Google search. This is, ironically, because Google search results have deteriorated so badly due to falling behind the SEO spammers and much of the good content moving off the public Internet.

I am going to offer you some tips for using ChatGPT.

1. Just because something is in a public specification does not mean that GPT knows about this specification. If you want to work on something, and that something is documented, share the document with the AI. Don't just assume it has read it!

2. Share your existing code, don't just ask for one-off functions. You can do this with a tool like Aider.

3. Context is king. Do you have code (in a different language?) which does what you want? Do you have ideas/comments from JIRA tickets? GitHub discussions? Include it all. Ask questions. Don't just ask for code, but ask for a plan to implement a given feature or refactor. Add the plan to the context. Work from the plan. Have the AI update the plan as it works.
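A minimal sketch of what tips 1 to 3 look like in practice, assuming you are pasting context into a single prompt by hand or via a script. The `Source` class, the `build_prompt` function, and the section labels are all hypothetical names made up for illustration; this is not how Aider or any other tool does it internally.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """One piece of context: a spec excerpt, existing code, a ticket comment, etc."""
    label: str  # hypothetical label, e.g. "existing code" or "ticket comment"
    text: str


def build_prompt(task: str, sources: list[Source], plan: str = "") -> str:
    """Assemble one prompt from the task, every source, and the current plan."""
    parts = [f"# Task\n{task.strip()}"]
    # Include every document explicitly instead of assuming the model has read it.
    for src in sources:
        parts.append(f"# Context: {src.label}\n{src.text.strip()}")
    if plan.strip():
        # An existing plan goes back into the context so the model works from it.
        parts.append(f"# Current plan (update it as you work)\n{plan.strip()}")
    else:
        # No plan yet: ask for one before asking for code.
        parts.append("# Plan\nBefore writing any code, propose a step-by-step plan.")
    return "\n\n".join(parts)


prompt = build_prompt(
    "Refactor the retry logic.",
    [
        Source("existing code", "def fetch(): ..."),
        Source("ticket comment", "Retries must back off."),
    ],
)
print(prompt)
```

On the next turn you would pass the plan the model returned as `plan=...`, so the plan stays in the context as it changes.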

4. AIs are really good at prompting AIs. Ask Claude for help writing your prompts. Tell it what you want to do. Tell it the things you're concerned about, or possible issues you expect to have.

Here's an example of augmenting a prompt before you use it:

Original prompt: Help me write a prompt for a smart AI to review a webaudio library and find bugs.

Result:

You are an expert developer specializing in web technologies, with deep knowledge of the Web Audio API and common pitfalls in audio programming. Your task is to thoroughly review the provided WebAudio library code and identify potential bugs, performance issues, and areas for improvement.

## Instructions:

1. Analyze the library's architecture and overall design.
2. Review each function and method for:
   - Correct implementation of Web Audio API concepts
   - Proper error handling and edge cases
   - Memory leaks or performance bottlenecks
   - Browser compatibility issues
3. Check for common WebAudio-specific bugs, such as:
   - Incorrect audio graph connections
   - Timing and synchronization issues
   - Sample rate and buffer size mismatches
   - Potential audio glitches or artifacts
4. Evaluate the library's handling of:
   - Audio context creation and management
   - Node creation, connection, and disconnection
   - Parameter automation and scheduling
   - Audio worklet usage (if applicable)
5. Assess the library's documentation and API design for clarity and consistency.

## Output:

Provide a detailed report including:

1. A summary of the overall code quality and architecture
2. A prioritized list of identified bugs and issues
3. Specific code examples highlighting problematic areas
4. Recommendations for fixes and improvements
5. Suggestions for additional features or optimizations

Please be thorough in your analysis and explain your reasoning for each identified issue or suggestion.
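If you do this often, the "ask an AI to write the prompt" step from tip 4 can be templated. A rough sketch, assuming plain string assembly; `meta_prompt` is a hypothetical helper name, and the wording of the request is only one way to phrase it:

```python
def meta_prompt(goal: str, concerns: list[str]) -> str:
    """Build a request asking a model to draft a prompt, not to do the task itself."""
    lines = [
        "Help me write a prompt for a smart AI to do the following task.",
        f"Task: {goal.strip()}",
    ]
    if concerns:
        # Tell it what you are worried about so the drafted prompt covers it.
        lines.append("Things I'm concerned about or expect to be a problem:")
        lines.extend(f"- {c.strip()}" for c in concerns)
    lines.append("Reply with the prompt only, including instructions and the output format.")
    return "\n".join(lines)


request = meta_prompt(
    "review a webaudio library and find bugs",
    ["timing and synchronization issues", "memory leaks"],
)
print(request)
```

You send `request` to one model, then use whatever it returns as the prompt for the actual review.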

Thanks, I appreciate you trying to help but I was already aware of those tips and they don't help. Either the accuracy is still bad, or going through the extra steps takes so long that it's faster to just do everything myself. Maybe the next version will be better.

I am not dealing with code or code reviews but rather complex written specifications where understanding what's going on requires integrating multiple sources.

There’s absolutely room for improvement. I think models themselves are plateauing, but our interfaces to them are not.

Chat is probably not the best way to use LLMs. v0.dev has some really innovative ideas.

That’s where there’s innovation to be had here imo.

I don't think we're at a plateau. There's still a lot GPT-4 can't do.

Given the progress we've seen so far with scaling, I think the next iterations will be a lot better. It might even take 10 or even 100x scale, but with increased investment and better hardware, that's not out of the question.

I thought we’d seen diminishing returns on benchmarks with the last wave of foundation models.

I doubt we’ll see a linear improvement curve with regard to parameter scaling.

And now we have LLMs feeding their own output back into their models (which may be either good or bad). This shouldn’t lead to broad, AGI-like gains in the short term. I bet this is a challenge.