by janalsncm 990 days ago
Here is the problem. Every investor is going to ask what your moat is. What differentiates your Whisper -> Llama -> Midjourney Pipeline.ai from the next one? And the answer is, if you’re just making API calls, nothing. Sorry. There’s nothing stopping Jian Yang from creating newpipeline.ai in a weekend. Here are a few things, off the top of my head, which could set you apart.

Customers. Having customers is an advantage over the next guy who doesn’t, because now you can start customizing your product for unique needs rather than having a generic crud app.

Custom models. A custom model means some kid can’t just replicate your app easily.

Unique data. Data which is infeasible for another company to acquire or replicate.

Special people. People who will give your startup an edge in creating all of the above.

2 comments

>There’s nothing stopping Jian Yang from creating newpipeline.ai

As someone leading a product team which "only calls apis" in this context: this is a very premature take. Hear me out :-)

Robust LLM-powered software requires

* very thoughtful design of prompt templates

* understanding of top_p and temperature in the context of said templates and their parameter space

* very thoughtful design of representative test cases for a given combination of prompt template and API params. Without these, you're not even able to reason about the value range of the function you're developing

* execution and evaluation of those tests

* maintenance of all above

...and that's just talking about ensuring the desired output types in one closed context. I won't go into the creativity required to solve more complex problems (content injections, for one). Let me just say this: I won't lose sleep because anyone could just replicate our applications. The opposite is the case: I invite anyone to try and catch up. Good luck with that.

What you wrote might apply to prototypes of zero-shot applications, but not to production-ready software, let alone production-ready software that solves problems which involve more than one isolated LLM call.
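The checklist in the comment above (prompt template, sampling parameters, representative test cases, execution of those tests) could be sketched roughly as follows. This is only an illustration under stated assumptions: `fake_llm`, the prompt wording, the test cases, and the parameter grid are all hypothetical, and the stub ignores `temperature` and `top_p`, which a real API call would not.

```python
from dataclasses import dataclass
from itertools import product
from string import Template
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SamplingParams:
    temperature: float
    top_p: float


@dataclass(frozen=True)
class Case:
    text: str
    expected: str


# Hypothetical prompt template; the wording is illustrative only.
PROMPT = Template(
    "Classify the sentiment of this review as POSITIVE or NEGATIVE.\n"
    "Review: $text\n"
    "Answer:"
)

# Hypothetical representative test cases for this template.
CASES: List[Case] = [
    Case("Great product, works well.", "POSITIVE"),
    Case("Bad build quality.", "NEGATIVE"),
    Case("Not what I ordered.", "NEGATIVE"),
]


def fake_llm(prompt: str, params: SamplingParams) -> str:
    """Offline stand-in for a real API call (assumption, not a real client)."""
    review = prompt.split("Review: ", 1)[1].rsplit("\nAnswer:", 1)[0].lower()
    return "NEGATIVE" if "not" in review or "bad" in review else "POSITIVE"


def pass_rate(llm: Callable[[str, SamplingParams], str],
              params: SamplingParams, cases: List[Case]) -> float:
    """Fraction of cases whose output matches the expected label."""
    hits = sum(
        llm(PROMPT.substitute(text=c.text), params).strip() == c.expected
        for c in cases
    )
    return hits / len(cases)


def sweep(llm: Callable[[str, SamplingParams], str],
          cases: List[Case]) -> Dict[SamplingParams, float]:
    """Evaluate the template over a small grid of sampling parameters."""
    grid = product([0.0, 0.7], [0.9, 1.0])  # (temperature, top_p) pairs
    return {SamplingParams(t, p): pass_rate(llm, SamplingParams(t, p), cases)
            for t, p in grid}


results = sweep(fake_llm, CASES)
```

With a real model in place of the stub, `results` would show which parameter combinations keep the template's output inside the expected value range.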

I don't think either of you are entirely wrong but I think it can be broken down more generally.

There are very few web applications that have any real proprietary implementations that are impossible to replicate. It's a combination of factors that builds the potential moat for the business.

I think the point of the parent was that you can build expertise in all of those in a relatively short time frame (esp. when more developers will start building up experience), compared to acquiring a large customer base or a large dataset.

Any tips on building robust LLM software? And yeah I agree on how important test cases are. Having some sort of objective benchmark for judging how effective prompts are is really useful.

>Any tips on building robust LLM software?

Honestly, it's difficult.

One reason why we are very happy with our results is because the person in charge of experimenting with the LLM-flow, while highly intelligent, is not talented with languages (honestly, he's the opposite).

This forces him to come up with creative solutions where others might get more mileage just with flawless prompts. Thanks to this, we discovered some really interesting tricks which help us solve problems that the available literature does not discuss.

Based on the flows he designs, I revise his prompt templates for more precision and token efficiency.

>And yeah I agree on how important test cases are. Having some sort of objective benchmark for judging how effective prompts are is really useful.

It's not just about effectiveness, but about ensuring that no false-positive inferences make it into production: for a given prompt template, deeply investigate which edge cases of input data would lead to false positives. Then adapt the template and API params until the unit test has 100% success.
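A minimal sketch of such a false-positive gate, assuming a hypothetical refund-request classifier. `detects_refund_request` is a keyword stub standing in for the real prompt template plus API call, and the edge cases are invented for illustration:

```python
from typing import Callable, List


def detects_refund_request(text: str) -> bool:
    """Hypothetical stand-in for an LLM-backed classifier."""
    lowered = text.lower()
    return "refund" in lowered and "no refund" not in lowered


# Edge cases that must NOT trigger the classifier (illustrative only).
NEGATIVE_EDGE_CASES: List[str] = [
    "No refund needed, thanks!",
    "What is your shipping policy?",
    "",
]

# Inputs that must trigger it, so the gate cannot pass by always saying no.
POSITIVE_CASES: List[str] = ["I want a refund for order 1234."]


def false_positives(classify: Callable[[str], bool],
                    negatives: List[str]) -> List[str]:
    """Return every negative input the classifier wrongly accepts."""
    return [text for text in negatives if classify(text)]


def release_gate(classify: Callable[[str], bool]) -> bool:
    """Pass only with zero false positives and all positive cases detected."""
    no_false_positives = not false_positives(classify, NEGATIVE_EDGE_CASES)
    all_detected = all(classify(text) for text in POSITIVE_CASES)
    return no_false_positives and all_detected
```

In practice the loop would be: add a newly discovered edge case to `NEGATIVE_EDGE_CASES`, watch the gate fail, then adjust the template or parameters until it passes again.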

> Thanks to this, we discovered some really interesting tricks which help us solve problems that the available literature does not discuss.

Care to share? ;)

Don't.

If you must, there is a continuum of tasks that range from suitable to risky in production settings.

Most definitely choose things on the suitable side of that scale (e.g., text generation or classification).

More complex tasks like data-to-text or summarization? I personally would always avoid them, unless there are certain very specific workflows for your team/task.

Further, it's not just test cases; it's an entire evaluation and prompt versioning layer. Of the few that I am aware of, most are not even openly available (including Azure Prompt flow).
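To make "evaluation and prompt versioning layer" concrete, here is a toy sketch, assuming templates are versioned by a content hash and each version carries one evaluation score. `PromptRegistry` and its methods are hypothetical names and do not reflect Azure Prompt flow or any other real tool:

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PromptRegistry:
    """Hypothetical in-memory store of prompt versions and their scores."""
    versions: Dict[str, str] = field(default_factory=dict)  # id -> template
    scores: Dict[str, float] = field(default_factory=dict)  # id -> eval score

    def register(self, template: str) -> str:
        """Store a template under an id derived from its content."""
        version_id = hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]
        self.versions[version_id] = template
        return version_id

    def record_score(self, version_id: str, score: float) -> None:
        """Attach an evaluation result to a known version."""
        if version_id not in self.versions:
            raise KeyError(f"unknown prompt version: {version_id}")
        self.scores[version_id] = score

    def best(self) -> str:
        """Return the id of the highest-scoring evaluated version."""
        if not self.scores:
            raise ValueError("no versions have been evaluated yet")
        return max(self.scores, key=lambda vid: self.scores[vid])


registry = PromptRegistry()
v1 = registry.register("Summarize: $text")
v2 = registry.register("Summarize in one sentence: $text")
registry.record_score(v1, 0.80)  # scores here are made-up placeholders
registry.record_score(v2, 0.95)
```

Hashing the template text means an unchanged prompt always maps to the same version id, so evaluation results stay tied to the exact wording that produced them.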

Totally agree on that. But I think this is the natural way for AI business to grow.

A) You build an MVP/PoC on top of an existing model to quickly find the value prop and product/market fit

- That won't build a moat and won't be efficient to tackle the problem at hand (as laid out by the No Free Lunch Theorem), but will help you zero in on the opportunity

B) Once you have this sorted out you build your custom model with your unique data to build your moat and perfect your product

The right time to go for funding, in my opinion, would be closer to moment B, as building that custom model is where you'll need good capital (especially to hire the talent who will help you do that).

Karpathy had specifically said this was an issue in Jun/Jul: PoCs are easy to make, but production is hard.

This is something their CTO reaffirmed last week at another talk.

Would like to read that conversation, do you have a link? I couldn't find it with my amateur google-fu...