Hacker News
by melanieb421 1243 days ago
Yep, yet another post on AI and the future of content creation.

Looking at it from a different angle - what if AI could search your internal docs, and help you problem-solve? Aka help you exploit past knowledge to inform future decisions?

The hope: GPT will democratize creation, not fill the internet with shitty articles.

5 comments

Democratizing creation and filling the internet with shitty articles are synonymous. Not that this is a problem or a particularly scary prospect: bookstores are already filled to the brim with garbage, so it's just a game of numbers at this point.

Searching your internal docs is an interesting one, but it's still unclear what this can do that grep can't. The leap forward would be the ability to reason autonomously, but we're as far from that as we've ever been.

Yep, that’s the first great use case I’ve thought of: looking through all the documents in a group and answering a specific question. I can also imagine multi-step pipelines driven by answers to previous questions.
Let me give a set of references and get a literature review.
Large pharma and life science companies generate huge amounts of documentation around change. They have a huge historical corpus of categorized, structured documents that have been reviewed and approved, so the quality should be good. I can definitely see a "draft this document using AI" option in the future.
It’s going to be both.

But there definitely is an upside for those who separate writing from communicating.

That is quite literally what the article talks about.
The commenter you're replying to submitted the article. I think they were trying to get people to hold off on prejudging it as Yet Another AI Submission by stating that it actually has a slightly different message than most.
It seemed rather light on details to me. For example, do you upload a whole bunch of documents to the site, and then it builds a model from them that it uses to answer queries?
No, the basic model already exists. You could throw together something like this yourself: encode your documents with a Transformer model into a series of vectors. Then you're merely a nearest neighbor search away from finding the most semantically relevant documents. Feed those documents to GPT-3 or some such LLM as context, along with the query, et voilà! You have Q&A on documentation.
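For anyone who wants to see the shape of it, here is a minimal sketch of those steps. Big assumptions: `embed` is a toy bag-of-words stand-in for a real Transformer encoder, the nearest neighbor search is brute force, and the final LLM call is left out entirely (the sketch stops at building the prompt). The function names and sample documents are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a Transformer encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, docs, k=2):
    """Brute-force nearest neighbor search: rank every doc against the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, docs, k=2):
    """Put the k most similar docs in front of the question as context."""
    context = "\n".join(f"- {d}" for d in top_k(query, docs, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Hypothetical internal docs, invented for this example.
DOCS = [
    "Expense reports are due on the fifth day of each month.",
    "The VPN requires two-factor authentication for all staff.",
    "Vacation requests must be approved by your manager.",
]

prompt = build_prompt("When are expense reports due?", DOCS, k=1)
print(prompt)  # this string is what you would send to the LLM
```

In a real setup you would swap `embed` for an actual sentence-embedding model and the sorted scan for a vector index, but the control flow stays about the same.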
Those are the essential steps, yes.

(Spoiler: I worked on this feature @ Slite) But in practice, the effectiveness of your pipeline will depend greatly upon exactly how you implement each piece.

Here are some of the things we've had to consider and develop, in order for the Q&A to be production worthy:

- at which level is it best to encode documents (sentence, paragraph, whole doc) and how can those tiered embeddings be weighted effectively?

- how to use the natural document hierarchy in the workspace?

- where to add NLI, so we can actually compare in-model "does this statement imply that statement" rather than just comparing vectors

- how to prioritize passages using additional fine-grained ranking against the question (cross-encoding, etc)?

- how to compute a "confidence" score so that we actually take the generative (GPT) part out of the equation and bail out early, if the documents do not contain anything which strongly implies an answer to your specific query
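To make two of these points concrete (tiered embeddings and the early bail-out), here is a rough sketch. It is not how Slite does it; it is only an illustration under stated assumptions: the tier weights and the threshold are arbitrary hypothetical values, `embed` is a toy bag-of-words vector rather than a real encoder, and the "confidence" is just the best weighted cosine similarity. NLI and cross-encoder re-ranking are not shown.

```python
import math
import re
from collections import Counter

# Hypothetical weights for each tier; a real system would tune these.
WEIGHTS = {"sentence": 0.6, "paragraph": 0.3, "doc": 0.1}


def embed(text):
    """Toy stand-in for a real encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tiered_score(query, sentence, paragraph, doc):
    """Weighted sum of the query's similarity to each tier of one passage."""
    q = embed(query)
    return (
        WEIGHTS["sentence"] * cosine(q, embed(sentence))
        + WEIGHTS["paragraph"] * cosine(q, embed(paragraph))
        + WEIGHTS["doc"] * cosine(q, embed(doc))
    )


def answer_or_bail(query, passages, threshold=0.35):
    """Return the best sentence, or None if nothing scores above the threshold.

    passages is a list of (sentence, paragraph, doc) tuples. Returning None
    means: skip the generative step, because no passage looks relevant enough.
    """
    if not passages:
        return None
    scored = [(tiered_score(query, *p), p[0]) for p in passages]
    best_score, best_sentence = max(scored)
    return best_sentence if best_score >= threshold else None


# Hypothetical passages, invented for this example.
PASSAGES = [
    (
        "Expense reports are due on the fifth day of each month.",
        "Expense reports are due on the fifth day of each month. "
        "Submit them in the finance portal.",
        "Finance handbook. Expense reports are due on the fifth day of each "
        "month. Submit them in the finance portal.",
    ),
    (
        "The VPN requires two-factor authentication.",
        "The VPN requires two-factor authentication. Ask IT for a token.",
        "IT handbook. The VPN requires two-factor authentication. "
        "Ask IT for a token.",
    ),
]

hit = answer_or_bail("When are expense reports due?", PASSAGES)
miss = answer_or_bail("What is the parental leave policy?", PASSAGES)
```

Here `hit` is the expense-report sentence, while `miss` is `None`: the second question only overlaps with the passages on a stopword, so the pipeline would stop before calling the LLM.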

These are just a few of the pieces. But what we learned quickly is that solving the problem of building a great Q&A means first solving many problems that are deeply intertwined with search algorithms, natural language understanding, and other text problems in general.

Thanks for all this, most of which flew right over my head!!

I wonder, can you recommend a resource where one could get "quickly" up to speed on how this stuff conceptually works, something one could ingest in a weekend or so to get a decent handle on it (but not necessarily be capable of doing anything)?

Like, I think I have a decent understanding of the probabilistic "next word" aspect of LLM text generation, but I'm assuming there's an initial "vector query" that happens prior to this part of it?

Also, could you offer any advice on the notion of how AI/ML is able to extract various meanings from text? For example, let's say I wanted to be able to scrape this thread and then for each comment extract various ideas/topics mentioned in each comment (ideally with influence from a custom ontology), detect emotions, claims of fact, etc etc etc - is that sort of thing possible?

Slite's a knowledge base tool (similar to N*tion), so the idea is that all your internal documentation already lives there.

(and yes, there is an import feature for the aforementioned other tool).