by diggan 388 days ago
> The most important AI applications being deployed in enterprise today—agents, code generation, and complex reasoning—are bottlenecked by inference latency

Is this really true today? I don't work in enterprise, so I don't know what things look like there, but I'm sure lots of people here do, and it feels unlikely that inference latency is the top bottleneck, even above humans or waiting for human input. Maybe I'm just using LLMs very differently from how they're deployed in an enterprise, but I'm by far the biggest bottleneck in my setup currently.

It is if you want good results. I've been giving Gemini Pro prompts that run for 200+ seconds multiple times per day this week, and for such tasks I really like to make it double/triple check, and sometimes I give the results to Claude for review, too (and vice versa).

Ideally I can just run the prompt 100x and have it pick the best solution later. That’s prohibitively expensive and a waste of time today.
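A minimal sketch of the "run the prompt 100x and pick the best" idea (best-of-N sampling). It assumes a hypothetical `generate(prompt, seed)` model call and a hypothetical `score(answer)` reviewer (e.g. a second model or a test suite); neither is a real Gemini or Claude API. It also shows why this gets expensive: token spend grows linearly with `n`, and even with parallel calls the wall-clock time is still about `n / max_workers` rounds of a single slow call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str, int], str],  # hypothetical: one model call, varied by seed
    score: Callable[[str], float],        # hypothetical: reviewer model or test suite
    n: int = 100,
    max_workers: int = 8,
) -> Tuple[str, float]:
    """Run the same prompt n times and return (best_answer, best_score)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order; each run gets its own seed.
        candidates: List[str] = list(pool.map(lambda i: generate(prompt, i), range(n)))
    # Score every candidate; max() returns the first candidate among ties.
    scored = [(c, score(c)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


# Stand-ins for real model calls, for illustration only.
def fake_generate(prompt: str, seed: int) -> str:
    return f"{prompt}-answer-{seed % 7}"


def fake_score(answer: str) -> float:
    return float(answer.rsplit("-", 1)[1])


if __name__ == "__main__":
    best, best_score = best_of_n("task", fake_generate, fake_score, n=20)
    print(best, best_score)
```

Swapping `fake_generate` for a real client call and `fake_score` for a reviewer prompt keeps the structure the same; only the cost per iteration changes.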

> That’s prohibitively expensive

Assuming your experience is from working within enterprise, are you then saying that cost is currently the biggest bottleneck?

It's also surprising to me that enterprises would use out-of-the-box models like that. I was expecting fine-tuned models to be used most of the time, at least for very specific tasks/contexts, but maybe that's way too optimistic.

Cost is irrelevant when compared to the salaries of the people using them, so they will do basic cost controls but nothing too onerous. And cost is never a reason to prevent solutions from being built and deployed.

And most enterprises aren't even doing anything advanced with AI. They're just doing POCs with chatbots (again), which will likely fail (again). Or trying to build enterprise search engines, which are pointless because most content is isolated per team. Or a few OCR projects, which are pretty boring and underwhelming.

Cost would be the biggest factor if the price per token stayed the same but tokens were arriving 100x faster. (Not particularly unexpected, I'd say.)

How do you create a prompt for Gemini that makes it spend 200 seconds and review its work multiple times?

Is it as simple as stating in the prompt:

  Spend 200+ seconds and review multiple times <question/task>

You give it a task from hell which the devil himself outsources, like ‘figure out how these fifty repositories of YAML blobs, Jinja templates and code generating code generating HCL generating YAML interact to define the infrastructure, then add something to it with the correct IAM permissions, then make a matching blob of YAML pipelines to work with that infrastructure’.

Only an insignificant minority of companies are running their own LLMs.

Everyone else is perfectly fine using whatever Azure, GCP, etc. provide. Enterprise companies don't need to be the fastest or have the best user experience. They need to be secure, trusted and reliable. And you get that by using cloud offerings by default and only going to a third party when there is a serious need.

If you think that cloud offerings are secure and trustworthy by default, you truly must be living under a rock.

I have worked for a dozen companies, all of which earned more than $20B a year in revenue. That includes two banks and a hedge fund. All use the cloud.

You must be living under a rock if you think the cloud isn't secure enough for the enterprise.

I think the key here is twofold. First “the cloud” as commonly understood isn’t what anyone here is talking about. The subject is commercial inference providers.

The “cloud”, i.e. commercial offerings in storage, VMs, etc., is reasonably “secure” in a very general sense these days; that much is generally true.

OTOH “cloud” AI (commercial inference) is going to use your data for training, incorporating your business processes and domain specific competencies into its innate capabilities, which could eventually impact your value proposition. Empirically, this will happen, eventually, regardless of the user agreement that you signed.

Leakage of proprietary competencies is what is meant by being insecure, in this context.

Second, “cloud isn't secure enough for the enterprise” should be replaced with “enterprise doesn't actually care about security except as a cost/benefit analysis”.

Sending your data to someone else’s data center is a really good way for your data to potentially end up on someone else’s computer. In fact, that's pretty much the point. If security were the priority, they wouldn’t do that.

Some quant-heads endorsing the latest fad doesn't prove anything. Also, they don't care if Chinese hackers are vacuuming up data, because ballstreet doesn't care about sustainability. But I grant you that “secure” and “trusted” are just words that don't mean anything anymore anyhow.

LOL, all fintechs are using or entering the "cloud" very heavily. Cloud is here for long enough that claiming it's insecure shows only the immense ignorance.

Any business using commercial inference providers is potentially risking their value proposition. Everything you send to cloud inference will eventually be gleaned for training data.

Empirically we know that the data is the most valuable input to cloud services, and eventually it will be used, regardless of the user agreement. When the stored data becomes worth more than the company, it will be eaten and stripped by vulture capital. Law of the jungle, baby.

https://www.bleepingcomputer.com/news/security/oracle-custom...

Just one of the latest examples in a very long list of cloud data breaches affecting millions of users. But hey, who cares as long as it doesn't affect your own bottom line.

>Cloud is here for long enough that claiming it's insecure shows only the immense ignorance

Such a bizarre interpretation, considering we still use SMS.

I feel a lot of companies do it to reduce liability. It may not be more secure, but it's not their problem.

AWS is in fact extremely secure.

True, the biggest bottleneck is formulating the right task list and ensuring the LLM is directed to find the relevant context it needs. I feel LLMs, in their instruction following, are often too eager to produce output rather than using tools (reading files) in their reasoning step.