Hacker News
ChatGPT powered GitHub code review app (chatcody.umso.co)
90 points by winkmaster 1199 days ago
18 comments

As someone who has been using ChatGPT to code recently - code reviews are one of the worst ways to use ChatGPT. Sure, it'll catch some stuff every now and then, but realize that it's still just a language model and things like design or code reviews require real understanding.

Using ChatGPT to code review is very much like posting a picture of your code on StackOverflow or Reddit.

The bot is not meant to completely replace code review but to improve the code to be more production ready and catch any errors that might not be caught by the developers. In my experience more than 40% of code review comments relate to the specific changes and do not need a global context from the project.
It feels weird to see so many companies sending their codebases to an AI company in exchange for feedback on their practices. People used to be more secretive about their tech years ago.
I thought of this recently as well. SaaS and the modern "cloud" services really have radically changed the game in this regard. Even as recently as 10 years ago, the idea that you would willingly exfiltrate your most precious data to some company, where they would then essentially "own" it (in the sense that your data is in their database, and you don't have direct database access to it. You can only access it through what they provide an API for), would have been really difficult to fathom. We had nearly everything "on prem."

In many ways I think the change is good. It's certainly more convenient. But the radical shift in mindset is something I deeply regret. The SaaS-ification of everything, where the user has zero control over anything (even installation of updates) has had a majorly negative impact on my life. I loathe the fact that every few days, something important to my workflow/routine is probably going to break in some way from a CI/CD deploy that contained a bug, and I'll be stuck until they patch it. And of course there are times when you are mid-something, and the app starts 500ing and you click refresh, and congratulations! You're the first user of the massive UI overhaul that you didn't want or need, so now you get to re-learn how to use the app!

As a dev I love SaaS. It's so wonderfully convenient for me! As a user, I hate it.

I think it's because we're more used to sending our codebases to Github, Vercel, and other places?

Also, it seems that for most web sites and apps, it's more about the craft and speed of execution than the code itself, so I've stopped being paranoid about this kind of stuff. I don't think deep tech engineering co's should use these things though.

Isn’t most tech just assembling API calls in a nice looking UI? Very little innovation happening at most companies.
My employer has banned Copilot (and presumably this app as well), for what it's worth. I'm guessing most large employers will end up doing the same if they're at all paranoid about their code, although by that time Copilot may offer an on-site version, or at least a version that can be deployed to an enterprise's cloud instance.
Maybe people have finally realized that code is not an asset.
Seems like a pretty naive comment, do you think the Google maps codebase isn’t an asset, for example ?
Well, I think there's value to both points - how much of the maps code would you need to be able to get something working, and how much work would it take to implement all the custom internal dependencies that would probably be missing?

I'm sure there are very complex valuable pieces of code within the maps codebase (or any other), but it would be a fairly massive task finding those pieces, extracting them, and getting them to work properly in a different codebase...

The data is way more valuable than the codebase. Being able to display a bunch of image tiles is cool, but not a mysterious unsolved problem.
Why doesn’t Google open source it then?
Open sourcing something requires extra work.
It's not an asset for engineers. It is for companies.
The code is already sent to GitHub and the hosting provider. We just have to trust that multi-billion-dollar companies have more to lose by using our code. And most of the code that people work with is not rocket science, so unless OpenAI directly sells the code to a competitor, the potential loss is likely not big.
Given the recent EU cybersecurity directive requiring code to be audited it's not so weird. However they will partly break this model with their AI regulation directive that is in the works.
It will revert as consumer computers get more powerful and the LLMs simultaneously get more efficient to run with fewer computational resources
Wouldn't it be better to do this locally before the PR? If the bot's suggestions are valid, the user would then have to make another commit.

maybe this would work if a reviewer wanted to highlight code for a question/comment and then the bot could add their two cents

It has to be an online service until they get the models more refined and hardware advanced enough that you can run these locally. If you can't run the model locally then it might as well be a service you tie into at the service (eg. github) level. That is, IMO, services are most useful with other services.
I think they are saying the app could be a pre-commit hook where their API is hit before code goes up to remote
Yes. I'm saying that due to this relying on an external service it would be inappropriate for a pre-commit hook as you could only commit while online.
> [...] it would be inappropriate for a pre-commit hook as you could only commit while online.

Why does that matter?

Hey, some of us live in caves.
git commit --no-verify ?
If the developer wants to slow their own development workflow by delaying the code review, that’s.. concerning.
pre-push hook?
to clarify, i mean the processing would still be done remotely, but it could be done earlier in the cycle
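For anyone wondering what "earlier in the cycle" might look like, here is a minimal sketch of a pre-push hook that sends the outgoing diff to a remote review service and fails open when offline, so the "you could only commit while online" objection does not apply. Everything service-specific is an assumption: `REVIEW_URL`, the `{"diff": ...}` payload and the `comments` response field are hypothetical, not the API of the app being discussed.

```python
#!/usr/bin/env python3
"""Sketch of a git pre-push hook that asks a remote service for review comments.

Hypothetical parts: REVIEW_URL, the request payload, and the response shape.
"""
import json
import subprocess
import sys
import urllib.request

REVIEW_URL = "https://review.example.invalid/v1/review"  # hypothetical endpoint


def outgoing_diff(base: str = "@{upstream}") -> str:
    """Return the changes on HEAD since it diverged from the upstream branch."""
    result = subprocess.run(
        ["git", "diff", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def request_review(diff: str, timeout: float = 10.0) -> list[str]:
    """POST the diff as JSON and return the list of review comments."""
    body = json.dumps({"diff": diff}).encode("utf-8")
    req = urllib.request.Request(
        REVIEW_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return list(json.load(resp).get("comments", []))


def run_hook(get_diff=outgoing_diff, review=request_review) -> int:
    """Print any review comments; always return 0 so the push is never blocked."""
    try:
        diff = get_diff()
    except (subprocess.CalledProcessError, OSError):
        return 0  # no upstream configured or git unavailable: nothing to review
    if not diff.strip():
        return 0
    try:
        comments = review(diff)
    except (OSError, ValueError):
        # Network errors (URLError is an OSError) or a malformed reply: fail open.
        print("review service unreachable, skipping review", file=sys.stderr)
        return 0
    for comment in comments:
        print(f"review: {comment}", file=sys.stderr)
    return 0

# In an actual .git/hooks/pre-push file, finish with: sys.exit(run_hook())
```

Since the suggestions are advisory, the sketch only prints them. Returning a non-zero code instead would make the hook blocking, and `git push --no-verify` would still bypass it.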
No, because that would be way too close to the user, it would be almost impossible to market it as a service you have to pay for ;)
Hmm. Not seeing your argument. The company has an API key. Employees query the local service which passes it up to the pay service. What does user executed vs. github commit triggered have to do with anything? What am I missing?
This solution doesn't require the installation of software, or any changes to the existing workflow.
The bot does not work like a linter or testing framework, these are suggestions that are subject to discussions, and the pull request is a great place to get reviews from the team
This looks like someone got the output of $linter and asked ChatGPT to write a happy explanation in plain English. Is there more to it?
Agreed. I was expecting the AI to uncover and offer more nuanced suggestions. The examples on the site were highly local issues, with most of them being one-liners our linters are already catching. I was excited to see what the AI can do when it has the context of the entire repo, but I was not wowed.
I wouldn't discard the capabilities of the LLM. I've been using it to do relational reasoning of Java comparators and it is surprisingly accurate. https://twitter.com/marceloabsousa/status/163289383288582963...
Apparently not. And for being its sole value-add, I find the fake-happy chattiness to be quite grating in fact.
The examples on the site are meant to showcase the features and the areas where ChatGPT shines. Depending on the project's context the bot will make great reviews; ChatGPT already does a good job at reviewing code if given enough context, and that's what the bot is trying to achieve. In the future the bot can be onboarded to a codebase and the overall project for better reviews and possibly code contributions.
Rather than a natural language code review, having a machine learning model that can spot code that "looks wrong" might be really valuable, especially if it was able to get a "good enough" SNR. Static checkers today can be limited in their applicability. But if you trained a model on { block of code } => "defect class <X>" it could be really powerful. Perhaps seeing examples of applied fixes might be a good way to convince someone whether or not there was really a bug there.

Maybe it could even collaborate with static checkers as a quick screen -- seeing as how some of static checkers today are a compromise between the execution time of the checker and the comprehensiveness of the check.
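The "quick screen" idea could be wired up roughly like this: a cheap scorer decides which blocks are worth handing to a slower, more thorough checker. This is only a sketch of the plumbing. `toy_score` is a stand-in for a trained model (no model is trained here), and `screen_then_check` is a made-up name.

```python
from typing import Callable, Iterable


def screen_then_check(
    blocks: Iterable[str],
    score: Callable[[str], float],
    check: Callable[[str], list[str]],
    threshold: float = 0.5,
) -> dict[str, list[str]]:
    """Run the expensive `check` only on blocks the cheap `score` flags."""
    findings: dict[str, list[str]] = {}
    for block in blocks:
        if score(block) >= threshold:
            issues = check(block)
            if issues:
                findings[block] = issues
    return findings


def toy_score(block: str) -> float:
    """Placeholder for a model's "looks wrong" score: flags a bare `except:`."""
    return 1.0 if "except:" in block else 0.0
```

The threshold is where the SNR trade-off lives: lowering it sends more blocks to the slow checker, raising it risks skipping real defects.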

Great idea, but you stopped too early. By being a bit bolder, you could have solved the problem of open-source funding.

The idea: Train your bot based on the code reviews of prominent open-source contributors. Next, let the user choose the bot they want a review from (e.g. "Dan Abramov Bot", "Linus Torvalds Bot", etc.). Each time the bot does a code review, the open-source contributor gets paid for that. This also solves the legal problem of the licence of training data.

I explained this idea in a blog post a couple weeks ago: https://marmelab.com/blog/2023/02/27/copilot-code-review.htm...

I would pay for that product, because it's fair and it addresses a real problem: the sustainability of open-source projects.

I wonder if most companies will endorse their IP going through this kind of thing. Probably going to happen at any rate.
All of the code is already in Github (aka Microsoft).
It is, but Github's terms do not allow your code to be shared with others. Although ChatGPT represents that it does not retain information provided in conversations, it does “learn” from every conversation. There is currently little reassurance as to how those "learnings" are leveraged outside of your own usage.
ChatGPT doesn't "learn from every conversation"; they trained it once and the output isn't backpropagated through the network. Keep in mind that GitHub's terms are the same ones they willingly ignored to make Copilot.
They clarified it with latest update on 1st of March 2023 that it's opt-in:

> "Data submitted through the API is no longer used for service improvements (including model training) unless the organization opts in"

source https://openai.com/blog/introducing-chatgpt-and-whisper-apis

They probably don't want this AI to turn Nazi in less than a week, like the one they put on Twitter, thanks to the "learnings" it gets from users.
It's year 0 of this being useful. I would imagine sooner rather than later we'll start seeing it licensed and run internally/easily spun up in your organization's AWS or Azure or what have you.
Why not IDE suggestions: not auto-completions (I already use GH Copilot, and the latency is so high) but more like advanced linter suggestions?

Would love to see this evolve into a sourcery.ai-like product. Or sourcery.ai could benefit from ChatGPT/Codex.

How does this generally work? I've played with openai's API and I can send it questions and get answers. Is this app just parsing new additions to commits and passing in the whole code block as a prompt, with an additional line saying "please review this code"?

Additionally, how is this app able to handle so many potential requests to openai? Are they already paying for a bunch of tokens in anticipation? Or do you pay for how many tokens used per month? (The openai pricing is a little confusing since there's so much you can do.)

Thanks for any insight

The goal is to provide more context to the model and use the changes with the title and description of the PR to get a review, while asking for improvements and suggestions.
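As a rough illustration of that description (not the app's actual code), a prompt could be assembled from the PR title, description and diff along these lines. The `PullRequest` type, the prompt wording and the `max_chars` cut-off are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    title: str
    description: str
    diff: str  # unified diff text


def added_lines(diff: str) -> list[str]:
    """Return lines added by a unified diff, without the leading '+'.

    Lines starting with '+++' are treated as file headers and skipped,
    which is a rough heuristic.
    """
    return [
        line[1:]
        for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]


def build_review_prompt(pr: PullRequest, max_chars: int = 6000) -> str:
    """Combine PR metadata and a (possibly truncated) diff into one prompt."""
    diff = pr.diff[:max_chars]  # crude cap to keep the request small
    return (
        "You are reviewing a pull request.\n"
        f"Title: {pr.title}\n"
        f"Description: {pr.description}\n\n"
        "Changes (unified diff):\n"
        f"{diff}\n\n"
        "Point out bugs or vulnerabilities and suggest improvements."
    )
```

The resulting string would be sent as the prompt of a single API request. Truncating by characters is a simplification, since API usage is metered in tokens rather than characters.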
I'd imagine you'd use a specific sequence of prompts to interrogate the code according to a predefined script?
This is harder than it seems. The way we are doing with Robin (Reviewpad's AI Reviewer) is by feeding it context about the PR and results from our static analysers and allowing the developer to prompt directly through the PR comment. For example, https://github.com/marcelosousa/robin-preview/pull/2
At Reviewpad we're also working on a similar product.

Just yesterday, I tweeted about it: https://twitter.com/marceloabsousa/status/163289383288582963....

Some pull request examples at: https://github.com/marcelosousa/robin-preview/pulls

Senior/Mid-level devs - just do your code reviews. Based on the screenshots it looks like a useless cashgrab/datagrab.
Most PR reviews contain comments and suggestions that could easily be automated with ChatGPT, reducing the time it takes for a PR to be merged, which makes the team more productive.
Surely you would be very comfortable to use this to 'code review' and send all your code, API keys, ENV files and secret urls to a random online service?

I don't think so.

There is a reason why large companies ban the use of unaudited third-party services like this, especially when they aren't compliant with security standards.

ironically, said review bot will likely help you prevent checking in private API keys
Yeah haha, one more reason to use the bot and catch these mistakes
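For what it's worth, catching obvious keys does not need an LLM, and the same check could redact secrets before anything is sent to a third party. A minimal sketch, assuming two illustrative patterns only (the well-known shape of an AWS access key ID and a generic `name = value` assignment); real secret scanners use far larger rule sets.

```python
import re

# Illustrative patterns only; not an exhaustive or authoritative rule set.
AWS_KEY_ID = re.compile(r"AKIA[0-9A-Z]{16}")
ASSIGNMENT = re.compile(
    r"((?:api[_-]?key|secret|token|password)\s*[:=]\s*['\"]?)([^\s'\"]{8,})",
    re.IGNORECASE,
)


def find_secrets(text: str) -> list[int]:
    """Return 1-based line numbers that look like they contain a secret."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if AWS_KEY_ID.search(line) or ASSIGNMENT.search(line)
    ]


def redact(text: str) -> str:
    """Replace anything matching the patterns with a placeholder."""
    text = AWS_KEY_ID.sub("[REDACTED]", text)
    return ASSIGNMENT.sub(r"\g<1>[REDACTED]", text)
```

Running `redact` on a diff before it leaves the machine addresses part of the concern above, though pattern matching will miss secrets that do not fit these shapes.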
Why does this tool exist in an interactive form? Just do the thing and compile the code, and if that works check it in.
When working with ChatGPT the term, “Confidently wrong” comes up.

Applies to self driving cars too, though they don’t have rollback capabilities in the event of an unhandled exception that involves physics.

Ok. Compilation + automated tests then.
The bot is built to act as an extra team member dedicated to code review, these suggestions could then be discussed on the thread within the team and new commits can then be made
I will give it a shot in our company

Edit: downvote? So, I received a couple PRs, it commented way too much. Sometimes a comment would say "Everything looks good". Needs better "calibrating".

The Github bot is built to review pull requests by analyzing the changes made and providing detailed comments on the changes, by suggesting improvements or highlighting vulnerabilities or bugs.
Looks useful for small teams where things can fall through the cracks. I've signed up to try it!
More useful than a linter?
It's quicker to set this up than linter-enforced rules in GitHub actions so could be useful when resource is tight.
There are linters in the marketplace.
Is there one you'd recommend? I'm a solo dev and this sounds useful.
I don't use them myself but it depends entirely on what language(s) you're using and the rest of your CI flow. Do a quick search and use the one you're using locally. It will most likely use the same config.

But since you're solo I'd recommend discipline before pushing instead of adding another layer on your CI, unless the point is to learn of course then go ahead.

Reminds me of ourselves at pullpilot.ai.

Looking at the product it seems very solid. I don't believe in competition in this case but rather that we're trying to help more and more devs.

If they play their cards right, such as PP, they can add modular verifications of security issues, code change recommendations/optimizations, plagiarism detection, etc.

Looks good @winkmaster!

“Great job! By the way, why don’t you rewrite this in Rust?”