Hacker News
by bzbee10 1233 days ago
Here's a concrete scenario: StackOverflow makes money because people go to it for answers to their questions. The answers are provided by people who write them because they get points for providing good answers, that they use for reputation points to get good jobs. It's an ecosystem. Everybody profits. Now Google scrapes the contents, and provides a service where you just ask your question and it gives the best, most accurate answer. Nobody goes to StackOverflow any more. People stop posting answers, because nobody asks the questions, and nobody reads their answers any more. StackOverflow goes under, and the ecosystem dies. Where does Google get its information now? The feeder pipeline has died.
12 comments

> Where does Google get its information now? The feeder pipeline has died.

The bar has been raised now. People will ask the questions on SO whose answers they couldn't find anywhere else, including from AI.

Correction: the AI will ask the questions it couldn't find a high enough confidence answer to. We are all now working for the AI to find niche scraps of knowledge that haven't been indexed.
I had similar thoughts.

The next evolution towards general AIs would be the implementation of curiosity.

30 years old, engineer here. I am starting to get really scared. I don't mean that for my job security.

We are just starting to observe the societal effects of social media. We haven't reached the era where we, as an entire species, recognise and regulate its impacts legally & properly.

I am starting to feel like we are losing it against the machine as a species. I don't fear being replaced, but I feel the culture getting mangled in such a way that we won't be able to recover some things, because it will be too late.

It is not a right-wing, left-wing political thing, but some sort of innovator's dilemma. We are, like, cornering ourselves into an innovator's dilemma as a species.

Yes and: I’d argue that whoever has been distorting social media to further their own goals (for example the Cambridge Analyticas) has already mangled culture enough that the transition to AI might be imperceptible.
^ This.

we will all become servants to the giant AI, feeding it more and more levels of detail and obscurity

maybe people get paid to answer questions that AI can't answer; that could be a new ecosystem
I don’t understand this scenario. If someone already posted the answer then asking the question again on SO has no value. If the LLM can automate giving answers then we don’t need SO as much as before.
> If the LLM can automate giving answers then we don’t need SO as much as before.

This is probably the misunderstanding: the LLM can only automate giving answers because it has been trained on all of SO (or other similar communities). It's a summary of SO, not an alternative to it. When new problems arise, the LLM will need to be re-trained to include the new SO answers, it will not be able to synthesize new knowledge.

So, if SO is dead, the LLM can't get the info anymore to answer questions about new topics - but you won't be able to tell for a long time, long enough to kill SO most likely (assuming this actually gets traction, of course).

I’m not convinced the LLM doesn’t have emergent answers. I often ask ChatGPT questions about data wrangling that are quite esoteric, such as “write some code in R that sorts an array as alphabetical, but puts x and y at the beginning”. Even if it gives the wrong answer at first, it seems to get it right eventually.
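
For reference, the kind of sort described in that prompt is small enough to sketch by hand. This is only a minimal illustration, written in Python rather than R; the function name `sort_with_priority` and its `priority` parameter are made up for the example, and it is not what ChatGPT actually produced.

```python
def sort_with_priority(items, priority=("x", "y")):
    """Sort strings alphabetically, but place the priority items first.

    Hypothetical helper for illustration only: priority items keep the
    order given in `priority`, everything else follows alphabetically.
    """
    # Map each priority item to its position, e.g. {"x": 0, "y": 1}.
    rank = {name: i for i, name in enumerate(priority)}
    # Non-priority items all share the rank len(rank), so they sort after
    # the priority items and fall back to plain alphabetical order.
    return sorted(items, key=lambda s: (rank.get(s, len(rank)), s))


print(sort_with_priority(["b", "y", "a", "x", "c"]))
# → ['x', 'y', 'a', 'b', 'c']
```

The two-part sort key is the whole trick: the first element pushes "x" and "y" to the front, the second handles the alphabetical ordering of everything else.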
But why would SO (or an equivalent) be dead? I know very little about ML so it is entirely possible that I'm way way way off here but it seems like a product with this sort of tool integrated would be capable of determining when a query produced a not-very-useful result and could even aggregate such queries. We'd have a very powerful training system where the AI can communicate back "hey I need training on this sort of stuff" and then iterate. If this is valuable, people can be paid to provide this training input.
If Google launches this tool, Google will make money off the answers this tool provides. The fact that this tool is trained on the content from SO will not mean SO gets any money. Also, if users just get the answers from Google Bard, they will not visit SO, and will not contribute to SO's community or revenue. So, the SO community will eventually die if Google Bard is good enough.

The whole premise of the economics of these LLMs is built on the assumption that the training data is (mostly) free. If you need to pay people to provide the training input, you will quickly find that you're spending more money on creating the training data than you're getting out of the finished model.

Stack Overflow never paid anyone though, so it seems possible for Google to launch a service where people answer questions to feed the AI and get back tokens similar to SO's.

I mean, they could make a game, where people have to try to beat the AI (and other humans) in making the best answers to questions.

Sure, they can try to essentially create and run a new SO - though that's still far more costly than what they do today. Especially when considering similar effects on other content sites.
People will visit SO less, making it less profitable to keep alive.
Is this the case? If the LLM must be re-trained each time a new problem arises, it means that it doesn't "reason"...so what's the point?
Remember LMGTFY? People will be too lazy to ask the AI and will instead ask basic and completely obvious questions on Stack Overflow, forums, and so on. Probably forever.
The most accurate answer might still not be correct, so they'll still go to StackOverflow to ask new questions, thus keeping the pipeline alive?
But if the traffic to SO is reduced significantly because of Google, there will be a death spiral. There wouldn't be a site to go and ask new questions, and even if the site is still there, nobody will bother to answer. The core problem is that the people who create the new knowledge (the answer writers) and the people who aggregate the knowledge (StackOverflow) will not get any concrete return for their efforts, since they are cut out of the consumption loop.
You don't need to be a big business like SO to provide this service. The death spiral would only make sense for a business whose growth demands outpace the market, so they exit, or one whose operation is too expensive relative to its ad revenue.

But that doesn't mean other businesses can't fill that reduced role more efficiently.

You need to be a place that people go to ask and answer these questions. It doesn’t matter how big or small you are if you don’t exist because the people answering the questions don’t go to you in the first place. You’re right that there’s another means in which a business can fill this, but I don’t think it’ll be stack overflow.
Here is where the loot-anything-you-see, wild-west kinda approach will break down.
> because they get points for providing good answers, that they use for reputation points to get good jobs.

Please show your work for the second part. Seems like a general statement but don't see that reflected in reality.

> Where does Google get its information now? The feeder pipeline has died.

If people are asking questions and don't get an answer, then they'll still seek one out? I could see a cycle where Q&A sites get eaten by Google but as long as there is demand for fresh answers there will be services that fulfill it.

People will still seek out sites to post questions requiring context and domain issues that AI won't ever fully address. Plus the human instinct for social interaction and asking the same question already solved 5x before.

It doesn't have to be a StackOverflow-tier business but forums will remain a thing and there are plenty of reputation networks outside of Q&A.

I can get 90% of Yelp's restaurant info via Google SERP (menus, hours, location, reviews) but I still use Yelp all the time and as a business they are doing fine.

This AI stuff will be a similar thin layer on top for quick answers, but niche content sites will still flourish IMO. Just with less traffic for a whole new class of sites.

> can get 90% of Yelp's restaurant info via Google SERP (menus, hours, location, reviews) but I still use Yelp all the time and as a business they are doing fine.

Wasn’t there a lawsuit around this? Sure you can get more info but how many will go that far?

I think the trend we'll see is a generalized engineering category of LLMs, where the people maintaining them just feed the documentation for new frameworks, libraries, languages, etc. into the training sets, and those models will then happily answer questions about "new stuff".
Google already shows snippets from SO and has done so for years, without AI. Why would this be any different?
The difference is that the snippet points to the source. If you are like me, when I see the snippet and find it relevant I immediately go to the source, since I get access to the full context and others' comments. With a technology like ChatGPT, that link is cut.
Are you sure that Google won't provide the link? If these chatbots could provide references for their answers, it would allow them to link back to websites, solving lots of the problems mentioned here.
> Now Google scrapes the contents, and provides a service where you just ask your question and it gives the best, most accurate answer.

You can already "google" for answers nowadays instead of asking on StackOverflow. So what is the difference from the situation we already have?

I use GitHub Copilot to create simple functions by typing the function name, and Copilot does the rest. This works reasonably well for basic stuff. It becomes unhelpful and starts creating incorrect paths to library functions as soon as you add your work context.
This doesn't address the parent's excellent question: how do these models continually get trained and updated if they put their key sources of training data out of business?

Related: I'd be quite worried if I was a Q/A site like StackOverflow or Quora.

> This doesn't address the parent's excellent question

That's a large part of HN in a nutshell.

Imho, Quora users don’t use it for finding answers but rather for reading personal experience. I used to write on Quora fairly often many years ago. But something happened, and in my subjective experience the platform became far less interesting and useful. A significant part of the questions turned into barely hidden shills for businesses or products, the right answers seldom become visible, and the overall quality of content went down drastically. Maybe I’m not representative, but from my point of view the readers and writers on Quora in 2023 won’t have any different experience when there are smart machines that give the right answers.
It's not excellent. It's a red herring. Google is already showing responses on the SERP without you having to go to SO, without AI, and the hypothetical scenario hasn't happened.
That is in fact the elephant in the room: you will most likely never be able to actually use it for complex stuff because of the token limits required for AI.

Imagine analyzing a massive code base: sure, it can tell you how you were solving function X by translating it to natural language, but it still does not understand any of it.

As far as I know, training it on your dataset will not improve this.

Increasing the token limit is a solvable problem
Sure, we just need next-level supercomputers for these large models and the patience to wait multiple days for output.
Not necessarily - you just need hierarchical abstraction memory. I reckon my "token" limit when analysing code is around 7.
Increasing the token limit without needing more resources to run the network is a solvable problem
But surely you do see the problem with a codebase, right?
That's not my experience at all. I find Copilot the best at understanding and making sense of my work context, spanning many files. I find it less useful for creating generic functions.
Sadly, we feed programming knowledge into ChatGPT now.