Hacker News
by skepticATX 1081 days ago
The funny thing about the 10x claims (I've even seen folks say 100x) is that results like these would be readily apparent. Even a broad 1.5x productivity increase would be world changing. And yet, we clearly are not seeing this currently.

I'm not saying it will never happen, just that it will be very obvious if it does. 5-10% sounds about right to me.


The problem with these claims is that you can't really quantify a "10x" change. I have found a lot of emergent benefits to using LLMs. For example, because my cognitive load of wrangling APIs and understanding and refactoring legacy code and all the other nonsense of my day to day can be so heavily delegated, I actually feel refreshed at the end of the day, and can bang out a decently chunked feature on my opensource software on the couch (admittedly also boilerplate heavy code).

This means that I went from 0 opensource commits to 4000 since chatgpt came out.

Not just that: I've gotten more adventurous, and I now have the time to consider drastic refactors and to spend much more time thinking about my software.

I won't call it 10x or 100x, because that wouldn't mean anything, but surely it is a paradigm shift for me, completely world changing.

Side projects are also where I found it invaluable. I'm spending most of my time thinking about actual problems in linguistics and language rather than how to set up an NLTK pipeline or a docker image for your own wikimedia database or whatnot.

I never heard of type hints before but ironically I use them on everything now, since it's easier to lint.

I'd be really curious if you're willing to expand more on how it has helped with those workflows. Do you copy/paste chunks in and ask it to explain them? Have it try to refactor them and then clean up?

For legacy code:

  - generate comments (hit or miss, but at least it can rewrite my random notes into consistent notes)
  - generate type annotations
  - refactor "broadly" (say, "rename all variables to match the following style" or "turn this class into a dataclass like XXX" or "transform the SQL queries into builder queries using XYZ"). Often requires some manual work but it gets a lot of tedious stuff out of the way
  - reverse-engineer clean API specs by just pasting in recorded HTTP logs
  - clean up logs into proper enums by generating the regexps
  - write CLI tools to probe the system (say, CLI tool to exercise the APIs mentioned above)
  - generate synthetic test data
  - transform HTML garbage into using a modern component system / react
  - transform legacy react/js into consistent redux actions
  - generate SQL queries at the speed of mouth
I could go on forever...
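To make the "clean up logs into proper enums by generating the regexps" item concrete, here is a minimal sketch of the kind of result you might end up with. The event names and the log wording are invented for illustration; they do not come from any real system.

```python
import re
from enum import Enum
from typing import Optional


class LogEvent(Enum):
    """Hypothetical event kinds; the names are made up for illustration."""
    LOGIN_OK = "login_ok"
    LOGIN_FAILED = "login_failed"
    TIMEOUT = "timeout"


# One regexp per event kind. The log wording matched here is invented sample data.
PATTERNS = [
    (re.compile(r"user \S+ logged in", re.IGNORECASE), LogEvent.LOGIN_OK),
    (re.compile(r"(invalid|wrong) password", re.IGNORECASE), LogEvent.LOGIN_FAILED),
    (re.compile(r"timed? ?out", re.IGNORECASE), LogEvent.TIMEOUT),
]


def classify(line: str) -> Optional[LogEvent]:
    """Return the first matching event kind, or None if nothing matches."""
    for pattern, event in PATTERNS:
        if pattern.search(line):
            return event
    return None
```

The regexps are the part worth reviewing by hand: a pattern that is slightly too loose will silently file lines under the wrong enum member.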
>- refactor "broadly" (say, "rename all variables to match the following style" or "turn this class into a dataclass like XXX" or "transform the SQL queries into builder queries using XYZ"). Often requires some manual work but it gets a lot of tedious stuff out of the way

Can you go on more about this, please? This sounds, frankly, heavenly, but the second sentence gives me pause. I guess it's not necessarily a question of how reliably it can "broadly" refactor but rather how broadly "broadly" is meant to be taken...

>- generate SQL queries at the speed of mouth

...and this? I'm not really a database guy, but I do keep hearing from them about how (eg, a database guy's) stateful knowledge of a database can result in much, much more efficient queries than eg a sales guy with a query builder. Are the robut's queries more like the former or the latter?

I gave some insights on the SQL thing above. For the refactor broadly, it's useful when I have something that's a bit too squishy for my IDE refactoring tools/multicursor editing/vim macros, but easy enough to do or provide an example for. One thing I mentioned is having consistent variable names.

I would highly recommend taking a piece of code (any code) and just experimenting. Here are a few prompt ideas:

  - make this a singleton
  - use more classes
  - use fewer classes
  - create more functions
  - use lambdas
  - rewrite in a functional pipeline style
  - extract higher order types
  - use fluent APIs
  - use a query builder
  - transform to a state machine
  - make it async
  - add cancellation
  - use a work queue
  - turn it into a microservice pipeline
  - turn it into a text adventure
  - create a declarative DSL to simplify the core logic
  - list the edge cases
  - write unit tests for each edge case
  - transform the unit tests into table-driven tests
  - create a fuzzing harness
  - transform into a REST API
  - write a CLI tool
  - write a websocket server to stream updates into a graph
  - generate an HTML frontend
  - add structured logging
  - create a CPU architecture to execute this in hardware
  - create a config file
  - generate test data
  - generate a bayesian model to generate test data
  - generate an HTML frontend to generate a bayesian model to generate test data and download as a csv
  - etc...

If you are not feeling inspired, take a random computer science book, open at a random page, and literally just paste some jargon in there and see what happens. You don't need correct sentences or anything, just random words.

There really is nothing that can go wrong. In the worst case, the result is gibberish. The code doesn't even need to build or be correct for it to be useful. These models are trained to be plausible, and even more importantly, self-consistent.

When prompted with code in-context, these things are amazing at figuring out consistent, plausible, elegant, mainstream APIs. Implementing them correctly is something I usually tend to do manually instead of bludgeoning the LLM.
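As an illustration of one prompt from the list, "transform the unit tests into table-driven tests", here is a minimal sketch of what the output shape tends to look like. The `slugify` function and its cases are hypothetical stand-ins, not code from this thread.

```python
from typing import List


def slugify(title: str) -> str:
    """Hypothetical function under test: lowercase, words joined by '-'."""
    words = "".join(c if c.isalnum() else " " for c in title.lower()).split()
    return "-".join(words)


# Each row is (description, input, expected output).
CASES = [
    ("plain words", "Hello World", "hello-world"),
    ("punctuation is dropped", "Hello, World!", "hello-world"),
    ("repeated spaces collapse", "a   b", "a-b"),
    ("empty string", "", ""),
    ("only punctuation", "?!", ""),
]


def run_table() -> List[str]:
    """Run every row and return a description of each row that failed."""
    failures = []
    for description, given, expected in CASES:
        actual = slugify(given)
        if actual != expected:
            failures.append(
                f"{description}: {given!r} -> {actual!r}, expected {expected!r}"
            )
    return failures
```

Adding an edge case is then a one-line change to `CASES`, which pairs well with the "list the edge cases" prompt.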

> - generate SQL queries at the speed of mouth

Because, of all the points, this is the one nearest to the work that some of my colleagues do, I'll analyze this point (but you could ask similar questions about many of the other points):

In my experience, writing correct SQL queries (which often tend to be quite non-trivial because of the internal complexity of the projects) typically involves a lot of knowledge about the whole system that my colleagues and I work on. Even if I could copy-paste this information, written down once, into the AI chat window:

- I seriously doubt that any of these AI chat bots would be able to generate a remotely decent SQL query based on this information, if only because these SQL queries look really different from what you would see in typical CRUD web applications. For a very instructive example, think along the lines of ETL for unifying historically separated lines of business, where you often have lots of discussions with the respective colleagues to clear up very subtle details of what the code is actually supposed to do in some strange boundary cases that exist for historical reasons (and which one wants to get rid of).

- even explaining what the SQL query is supposed to do would in my opinion take more time than simply writing it down. Even ignoring the previous point: it is very typical that explaining in sufficient detail what the code is supposed to do would take far more time than simply writing it. A lot of programming work is not writing some scaffolding of some CRUD app or implementing a textbook algorithm.

Have you tried this at all?

I find that many of the generative AI models (GPT-4, 3.5, even MPT-30B running on my laptop) are really shockingly good at SQL.

Paste in a query and ask it for a detailed explanation. I've genuinely not seen it NOT provide a good result for that yet.

Generating new SQL queries is a bit harder, because of the context you need to provide - but I've had very strong results from that as well.

I've had the best results from providing both the schema and a couple of example rows from each table - which helps it identify things like "the country column contains abbreviations like US and GB".
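Here is a rough sketch of that "schema plus a couple of example rows" approach, assuming a SQLite database. The `build_prompt` helper, the `customers` table and its rows are all invented for illustration, and the prompt wording is just one guess at something reasonable.

```python
import sqlite3


def build_prompt(conn: sqlite3.Connection, question: str, sample_rows: int = 2) -> str:
    """Assemble a prompt from each table's CREATE statement plus a few rows."""
    parts = []
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for name, create_sql in tables:
        parts.append(create_sql + ";")
        # Table names come from sqlite_master rather than user input,
        # so plain double-quoting is good enough for this sketch.
        cursor = conn.execute(f'SELECT * FROM "{name}" LIMIT ?', (sample_rows,))
        columns = [col[0] for col in cursor.description]
        parts.append(f"-- sample rows from {name} ({', '.join(columns)}):")
        for row in cursor:
            parts.append(f"-- {row!r}")
    parts.append(f"Write a SQLite query that answers: {question}")
    return "\n".join(parts)


# Demo with an invented table; the data is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers (name, country) VALUES (?, ?)",
    [("Ada", "GB"), ("Grace", "US"), ("Linus", "FI")],
)
prompt = build_prompt(conn, "How many customers are in each country?")
```

The sample rows are what let the model see that `country` holds two-letter codes rather than full names. Watch out for sensitive data before pasting real rows anywhere.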

If you've found differently I'd love to hear about it.

> Paste in a query and ask it for a detailed explanation. I've genuinely not seen it NOT provide a good result for that yet. [...] If you've found differently I'd love to hear about it.

I have not directly tried it (the employer does not allow AI chatbots for any application intended for production (i.e. more sensitive stuff), but only for doing experiments), but working on the code I very rarely had the problem that I could not understand what some single (SQL) line of code does in the "programming sense".

The central problem that occurs far more often is understanding why this line exists and why things are implemented the way they are.

Just to give an example: to accelerate some queries, I thought some index would make sense (colleagues agreed in principle; it would likely accelerate a particular query that I had in mind). But there is a good reason why there is no index on this table (as the respective colleague explained to me). This in turn implies that for ETL stuff involving particular tables, one should make use of temporary tables where possible instead of JOINs; this is the reason why the code is organized as it is. This is the kind of explanation that I need, which surely no AI can deliver.

Or another example: why does some particular function (1) have a rights check for a "more powerful" role and a related one (2) does not need one? The reason is very interesting: principally having this check (for a "more powerful" role) does not make a lot of sense, but for some very red-tape reasons auditors requested that only a particular group of roles shall be allowed to execute (1), but they were perfectly fine with a much larger group of users being allowed to execute (2). Again something that no AI will be able to answer.

I approach things a bit obliquely. I create a custom made DSL (starting from scratch in each conversation, often) that allows me to model my query the way I want. Then, I write a traditional SQL builder on that DSL (or more like, ask GPT to do it for me). Then, I generate DSL statements that match my current domain, and more importantly, modify existing ones.

So, at each step, I do almost trivial transformations.

One key ingredient is that the DSL should include many "description" fields that incorporate english language, because that helps the model "understand" what the terser DSL fields are for.
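A minimal sketch of what such a throwaway DSL plus SQL builder could look like. Everything here (`Metric`, `Report`, `to_sql`, the `invoices` table) is a hypothetical example, not the actual DSL described above, and the expressions are treated as trusted input.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metric:
    name: str          # becomes the column alias
    expression: str    # SQL expression, trusted input in this sketch
    description: str   # plain-English note for the model and for humans


@dataclass
class Report:
    description: str
    table: str
    group_by: List[str]
    metrics: List[Metric] = field(default_factory=list)


def to_sql(report: Report) -> str:
    """Render a Report as one SELECT; descriptions become SQL comments."""
    lines = [f"-- {report.description}", "SELECT"]
    items = [(col, "") for col in report.group_by]
    items += [(f"{m.expression} AS {m.name}", m.description) for m in report.metrics]
    for i, (sql, note) in enumerate(items):
        comma = "," if i < len(items) - 1 else ""
        comment = f"  -- {note}" if note else ""
        lines.append(f"  {sql}{comma}{comment}")
    lines.append(f"FROM {report.table}")
    if report.group_by:
        lines.append("GROUP BY " + ", ".join(report.group_by))
    return "\n".join(lines)


# An invented DSL statement; this is the part you would ask the model to write or modify.
report = Report(
    description="Yearly revenue per customer segment",
    table="invoices",
    group_by=["segment", "invoice_year"],
    metrics=[
        Metric("total_revenue", "SUM(amount)", "sum of invoiced amounts"),
        Metric(
            "avg_customer_value",
            "SUM(amount) / COUNT(DISTINCT customer_id)",
            "revenue divided by distinct customers",
        ),
    ],
)
sql = to_sql(report)
```

The model only ever edits the small `Report(...)` statement, while the builder that produces the actual SQL stays deterministic and easy to check.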

Straight SQL is a crapshoot, and as you said, more often than not it is either obviously or subtly broken, or written for another database. Which makes sense, considering how many different flavors of SQL it has in its training corpus and how much crappy SQL is out there anyway.

Another thing that helps is using extremely specific "jargon" for the domain you want to write queries for. Asking for "accrual revenue" and "yoy avg customer value" (yes, yoy, not year over year) often tends to bring back much higher quality than just asking for "revenue" or "customer value".

Are you testing ChatGPT's output in any way? I've considered using it for tasks, but after hearing all the talk of how it can write good-looking code that ends up not working as you might expect, I started wondering if the time saved by generating that block is wasted on interpreting and testing.

I have access to ChatGPT Code Interpreter mode, where it can both write Python and then execute it.

I use that to write code all the time, because ChatGPT can write the code, run it, get an error, then re-write the code to address the error.
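The write, run, read the error, rewrite loop can be sketched in a few lines. This is not how Code Interpreter is implemented internally; it is only an illustration of the loop, with a hypothetical `ask_model` callable and a `fake_model` stand-in so the example runs without any API.

```python
import traceback
from typing import Callable, Optional


def run_until_it_works(
    task: str,
    ask_model: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
) -> dict:
    """Ask for code, execute it, and feed any traceback back to the model."""
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        source = ask_model(task, error)
        namespace: dict = {}
        try:
            # exec on generated code is only acceptable inside a sandbox.
            exec(source, namespace)
            return {"attempts": attempt, "namespace": namespace}
        except Exception:
            error = traceback.format_exc()
    raise RuntimeError(f"no working code after {max_attempts} attempts:\n{error}")


def fake_model(task: str, error: Optional[str]) -> str:
    """Stand-in for a real model: the first answer is buggy, the retry is fixed."""
    if error is None:
        return "result = sum([1, 2, 3]) / 0"
    return "result = sum([1, 2, 3]) / 3"


outcome = run_until_it_works("average of 1, 2, 3", fake_model)
```

Note that "runs without an exception" is a much weaker check than "does what I wanted", so the result still needs a look or a test.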

Here are two recent transcripts where I used it in this way:

- https://chat.openai.com/share/b062955d-3601-4051-b6d9-80cef9...

- https://chat.openai.com/share/b9873d04-5978-489f-8c6b-4b948d...

I dunno, computers should be world changing. And they are. Yet the lives of ordinary people have hardly improved and are arguably worse since the 1980s.

> I dunno computers should be world changing. And they are. Yet the lives of ordinary people have hardly improved and are arguably worse since the 1980s

If you replace "lives of ordinary people" by "productivity improvement", you actually have a good (and rarely discussed) point:

> https://en.wikipedia.org/wiki/Productivity_paradox

On that note, my personal hypothesis is even more controversial than the "IT unproductive" hypotheses in the article, which hold that IT was a rounding error relative to earlier major improvements: in many areas, particularly in office work and all kinds of everyday errands, IT is anti-productive, as in it makes people less productive on net.

The core hypothesis behind my belief is that introducing computers to replace a class of tasks, up to and including a whole job type, just shifts the workload, diffusing it across many people where previously it was concentrated in a smaller number of specialists. Think e.g. of the things you use Word, Excel, Powerpoint, Outlook, etc. (or their equivalents from other vendors) for. Before software ate it, a lot of that used to be someone's job. Now, it's just tacked onto everyone's workload, distracting people from doing the actual job they were paid to do.

That would seem like an obviously stupid thing to do, so why would businesses all fall for it? I argue it's because even as shifting the workload makes everyone in the company less productive on net, it looks like an improvement to accounting. Jobs with salaries are legible, clearly visible on the balance sheets. So is money saved by eliminating them. However, the overall productivity drop caused by smearing that same work across the rest of the company? That's incremental, not obviously quantifiable. People and their salaries stay the same. So it all looks like introducing software and obsoleting some jobs saves everyone money, but then somehow everyone is experiencing a "productivity paradox". But it's not a paradox if you ignore the financial metrics with their low resolution. Focusing on what happens to work, it seems that IT improvements are mostly a lie.

If I understand correctly, it would be something like this: you used to get a dedicated secretary, but now most of those roles are handled by computer. So in a sense everyone has had their workload mildly increased. But worse than that, the workload is typically of a different nature, so, for example, excessive meetings are now easy to generate.

I would also add that it may be of a net benefit that fewer roles are needed. But that net benefit overwhelmingly goes to the owners of the company. And that's what we've been seeing the last 30+ years: the very wealthy have become much more wealthy while everyone else is worse off. (*)

(*) Growing wealth inequality is very complex and I'm sure would be happening anyway. I'm not saying computers cause wealth inequality, but they don't seem to be doing much good in fixing it either.

> But worse than that the workload is typically of a different nature so, for example, excessive meetings are now easy to generate.

That, but also:

- Secretaries were better at this work because that was their specialization, and they enjoyed efficiencies coming from focusing on doing a single specific kind of work.

- Those increments of extra work add up.

- Moving that work to everyone else means you now have highly paid specialists doing less and less of the specialized work they're paid for. In many cases (programming among them), context switching is costly, so the extra work disproportionately reduces their capacity at doing the thing they're good at.

This all adds up to rather significant loss of productivity.

> Yet the lives of ordinary people have hardly improved and are arguably worse since the 1980s

Read up on improved crop yields for subsistence farmers who got access to weather reports through smart phones.

A lot of people are not starving now because of that one change alone.

> Read up on improved crop yields for subsistence farmers who got access to weather reports through smart phones.

I think that's a poor example. I was getting weather reports on pre-smartphone Nokias.

Farmers could have gotten weather reports on feature phones.

Worse in what sense?

Computers have done amazing things for humanity. I would not want to go back to a computerless world...

If you work in the US for FAANG or similar, your life is probably much better. I love computers. I love what they do for recreation and keeping in touch long distance. But work hours are longer. Education and housing are no longer affordable for most people. In the US, life expectancy is beginning to drop. You would think people would have easier, less stressful lives with greater financial security. But it's not the case for 99% of us.

Quick example. A doctor in the 90s would have expected less time wasted on paperwork thanks to computers, but administration work is increasing. Likewise, most professors at most universities will tell you they spend more and more time on administration. Shouldn't this be one of the primary things a computer could address?

My point is great gains brought by a technology such as computers may not translate to great end user benefit. Largely due to compensating inefficiencies elsewhere in the system.

So for example. It may be that chatgpt makes people 10x more productive. But if management gives bad direction it doesn't matter. If most software being developed is redundant then it doesn't matter. If most software is just making ads run faster then it doesn't matter. The technology needs to be appropriately directed to be of noticeable social benefit.

I think the catch here is that we need to know what makes admin work increase. While I can’t comment on education, at least for housing in Canada this is all a result of NIMBYism and government intervention. In other words, we could have slouched towards utopia much further than we have if it were not for NIMBYs.

"In other words, we could have slouched towards utopia much further than we have if it were not for NIMBYs."

I think the NIMBYs have done pretty well in creating their own Utopia in terms of Canadian real estate

That has nothing to do with what the poster is saying. And if tools improve but quality of life does not, that is because minds are not improving along with the tools. This has been noted for centuries.

I don't think anyone is claiming that chatgpt has made everyone 100x more efficient. Either they are retarded or you misunderstood them.