by rpdillon 25 days ago
I paid for Kagi for a bit, but got a weird vibe when I realized they were working pretty hard to paper over the fact that they pay a third party to scrape Google search results for them. The public-facing side of that coin is Kagi's position that Google should make their index available to competitors (see https://blog.kagi.com/waiting-dawn-search).

All that's to say: when I paid for Kagi, I thought I was investing in additional search infrastructure, and didn't realize Kagi had no aspirations to build their own general purpose index, and instead primarily aggregates results from other indexes, either adversarially (Google, Bing) or not (Yandex, Mojeek, Brave, Apple, etc.). I understand they do maintain their own small-web index, but I thought their aspirations were higher when I first jumped on that train.


> didn't realize Kagi had no aspirations to build their own general purpose index

Kagi employee here. We're actively working on building our own indexes beyond the limited ones we have now, not just a general index but also purpose built indexes for things like programming, etc.

I did not intend to spread misinformation here, and would like to hear more about the general-purpose index Kagi is working on. I had based my comment on several Kagi pages, but mostly https://help.kagi.com/kagi/search-details/search-sources.htm..., which mentions Teclis as Kagi's own index, but https://teclis.com/ makes it pretty clear that it's a "small web"-focused tool:

> Teclis is an attempt to surface the less known web, the web of creativity and self expression, the more humane web.

> Teclis includes its own crawl as well as results from Kagi Small Web index and results with permission from Marginalia Search.

> Teclis works best with broad queries such as 'machine learning', 'vegan diet', 'religion' etc..

Is there another crawler doing the general-purpose stuff?

Not in production yet, so will have more to share in time.
Fair enough, thanks for replying!
How broad will they be? Do you aim to ever have large scale indexing of the web?
Hey, do you guys have posts or anything to share about it? It would be awesome to see what you are trying to accomplish. Maybe it's time to post on HN ;)
Hurry. Google might give up the ghost on its search product and maintaining indices on anything not geared for LLMs.

I'm not sure antitrust will help you.

So you will stop buying Yandex data at some point?
I care more about breadth of the dataset than politics in my search engine, thank you very much.

For everybody else there's Google, I guess.

What are the challenges of doing that when so much of the internet has turned itself into SEO slop to fit Google's algorithms?

I imagine there is still a whole load of stuff out there on the internet that Google would never surface because it doesn't have enough AdSense or whatever. Are you finding that?

There is a very easy 90% solution to that: massively downrank everything with ads with some exceptions added as needed.
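The "downrank everything with ads, plus exceptions" heuristic above could be sketched roughly as follows. This is a minimal illustration, not how Kagi or any real engine ranks: it assumes each result already carries a relevance `score` from some base ranker and a `has_ads` flag detected at crawl time, and the penalty factor, the `AD_EXCEPTIONS` set, and all example URLs are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical penalty: pages with ads keep only 10% of their relevance score.
AD_PENALTY = 0.1

# Hypothetical hand-maintained allowlist ("some exceptions added as needed").
AD_EXCEPTIONS = {"example-news.com"}


@dataclass
class Result:
    url: str
    score: float   # relevance score from the base ranker (assumed to exist)
    has_ads: bool  # assumed to be detected at crawl time


def adjusted_score(result: Result) -> float:
    """Apply the ad penalty unless the result's host is on the allowlist."""
    host = urlparse(result.url).hostname or ""
    if result.has_ads and host not in AD_EXCEPTIONS:
        return result.score * AD_PENALTY
    return result.score


def rerank(results: list[Result]) -> list[Result]:
    """Sort results by their penalized score, highest first."""
    return sorted(results, key=adjusted_score, reverse=True)


if __name__ == "__main__":
    results = [
        Result("https://recipes-with-ads.example/pasta", 0.9, True),
        Result("https://personal-blog.example/pasta", 0.5, False),
        Result("https://example-news.com/pasta", 0.7, True),
    ]
    for r in rerank(results):
        print(f"{adjusted_score(r):.2f} {r.url}")
```

With these made-up inputs the allowlisted news site ranks first, the ad-free blog second, and the ad-heavy recipe page drops to last even though its raw score was highest. A multiplicative penalty keeps the relative order among ad-carrying pages intact while pushing them below ad-free pages unless their raw score is roughly ten times higher.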
That's what SlopStop is for :) developing methodologies that scale for detecting slop.

I mean it sounds like that already has a lot of overlap with our Small Web indexing efforts, so that part of our indexing efforts could be an extension of that. A lot of this is still in development though, so I can't speak on specifics just yet.

How do you build a search index in the days of Anubis pages everywhere?
Anubis is easy: just use a whitelisted user agent, or a headless browser if some sites disable that. You need one to index web app abominations anyway. Cloudflare and Google reCAPTCHA are bigger problems.
> We're actively working on building our own indexes

Lip service. You'll have some token index of Wikipedia or something so you can say your results are "a blend of our own index and other sources".

Wikipedia is prob in "other sources", as they actually say they have a direct license for it.

https://blog.kagi.com/waiting-dawn-search#:~:text=Wikipedia,...

Lol everyone has a direct license for Wikipedia.
It's funny you say that as we just switched the Wikipedia widget over to our own internal index. We don't intend to stop there either.
Nobody wants to pay for anything, so the services that figured out how to profit from people not paying will win.

There was this idea born in the late '90s/early '00s that everything digital should be free. The internet was dominated by teenagers with no job and no credit card, so it made sense.

But the result of that has been a whole generation with an allergy to compensation, and the inability for anyone to compete with "free" services, even if everyone hates that service.

Prior to Eternal September, the internet was dominated by college students and staff. Everything was free by virtue of there being no secure payment mechanism. That spirit continued as it opened to the broader public.
> Everything was free by virtue of there being no secure payment mechanism.

That and the fact Universities provided free, fast and unmetered internet access. I doubt they would be running anything if they had to pay $1/hour like regular people had to in their dial-up days.

Most people will gladly throw a large pile of money at anything they feel convinced serves them well, provided they are not living on some ridiculously low wage that turns them into monthly-paycheck serfs.

When a large portion of moneyless teenagers grow up into adults indebted to death, it's no wonder they stick to the lure of free services rather than unaffordable ones.

That's a really curious perspective. There are a few different angles of attack here, but let's start with this: it WAS free because people were making free content. Before the Internet we were hosting free BBSes (look those up), we then hosted websites which we made ourselves when the Internet was commercialized, and we paid for services like games where it made sense. You'd buy software you'd own forever (like Photoshop), you'd buy music you owned (like CDs), and there weren't 30 subscriptions randomly renewing on your credit card.

Google won because it was a single text box. Yahoo lost because it was full of ads and pretended to be a phone book. Linux won in the server world because it was free and superior; Windows lost because it's shite and expensive.

I could go on, but before I do that I'd have to be convinced I'm not replying to a 27 year-old who just graduated business school.

The thing is, even today much content on the internet is still made for free without anything being paid to the author - we just have third-parties who have inserted themselves to profit from it. That's mostly a failure of society to provide the needed infrastructure as a public good.
BBSes were only amateur efforts. Linux would not have gone anywhere if it were not for IBM famously investing $1 billion in 2000.

You can get some development and innovations built purely on "free", but without actual professionals who can make a living by developing these systems, they never take off to reach the masses. The best example is social media and the Fediverse.

I adopted Linux in college in 1993 and, like many peers, brought it to my R&D job and observed this wave of expansion through the mid to late 90s. Linux was already "going somewhere" in 2000 for IBM to even notice it. Lots of federal grant money was directly or indirectly improving Linux due to FOSS folks like me.

It was getting so much commercial and academic engagement that we had the idioms (cliches?) of the "LAMP stack" for basic web servers and "Beowulf clusters" for high performance computing. Even SGI was already revealing a Linux plan, before 2000, when they still seemed like a fixture of the HPC industry rather than an also-ran.

I apologize for the hyperbole, but you are arguing my point: if something took "lots of federal grant money" to become usable in universities and amount to anything more than a research project, then we are no longer talking about something "free", are we?
From that point of view nothing that requires human input is free. Which is true in a sense, people are using free to mean free to use, not free to improve.
That’s disingenuous. Microsoft themselves considered Linux a serious threat as early as 1998, as described in their own confidential memoranda. (AKA the Halloween Documents released by ESR.)
You are right, I abused the hyperbole. The IBM investment was not the first thing that propelled Linux to the mainstream. I remember that in 1999 my university was already installing Red Hat with Gnome 1.0 on the workstations for the computer lab, which of course implies that Red Hat already existed as a mature company trying to make money from support contracts.

But even if that data point doesn't support the argument well, I don't think one could argue that Linux succeeded by "being free". If Linux was a "serious threat" in 1998, it was because there were already companies looking into it and willing to back it up financially to help its development.

The Software Creations BBS was not an amateur effort. (Just an example)

And prior to whatever IBM did in 2000, I already had a job deploying Linux and BSD systems in production at a corporate job.

> The Software Creations BBS was not an amateur effort

Yeah, and it was a BBS run and backed by a software development company that used it as a channel to promote and sell their software. IOW, they were not offering the infrastructure "for free".

> I already had a job deploying Linux and BSD systems in production at a corporate job

Which means that there was someone paying your employer to support it. Again, not doing it "for free".

I think you and the others responding to me are just trying to disprove the specifics of my comment while entirely missing the meat of the argument. I am far from being "a 27 year-old who just graduated business school", but I agree with what GP said: people will not pay for digital services unless they absolutely have to, so companies that try to make a living by offering a quality service in exchange for payment will invariably lose to someone who offers their product "for free" but exploits their customers elsewhere.

>>Google won because it was a single text box.

I remember a colleague around 1998 saying: "How will they ever make money? It's just an empty website?"

LOL

You are arguing GP's (WarmWash) point and not even realizing it.
Counter point: https://kagi.com/stats

About 70k people are paying at least $5 a month. I've been using the $25 a month plan for nearly 3 years now. I imagine Kagi is doing alright.

Google could lose 70k active users and it wouldn’t even register as a blip. They have something like 50,000-60,000 TIMES as many active users.

I’m one of those 70k people and support Kagi, and I also strongly believe in companies succeeding and sustaining themselves on a small scale like this. I think our economy would be healthier if it was made of many, many small companies, not a few massive ones.

But we can’t argue Kagi is anything more than a super niche product, for now. :(

Typically the reason for there being so few smaller companies is paradoxically that small companies exist to be gobbled up by the big ones.
A company can be successful without dominating the world.
> Nobody wants to pay for anything

Congratulations, this might be the single most trivially-disprovable statement I've ever seen on this site

Try to respond with substance here on HN. Their point can’t be summarized by what you quoted, yet you responded to a quote.
I think the main value proposition of Kagi is that you're the customer not the product. As far as I know they are delivering on that.

The search infrastructure you're talking about is a natural part of that, but, like any infrastructure, it scales the organization it's supporting. Kagi is tiny so their "original infrastructure" contributions are tiny.

Put another way, you essentially were investing in infrastructure, but you were hoping for major infrastructure and what is happening is small infrastructure. Kagi would probably need to get much bigger to be able to do the infrastructure you're talking about. (And if they were much bigger, it should be natural -- at a certain scale it will make more sense to do your own than work with someone else's.)

They are building their own search index, and they should be allowed to scrape Google in the exact same way Google scraped everybody else.
If anything, this makes me want to pay them twice. Once for search and once for exploiting google.
"... didn't realize Kagi had no aspirations to ..."

"... I thought their aspirations were higher ..."

It sounds like the decision to send search queries (and money) to Kagi was based at least in part on reasons other than the quality of the search results

This is interesting psychology

What if all (cf. only sum of) the money sent to Kagi was actually invested in an alternative way to search the web without using an index created by a corporation or a non-profit with commercial subsidiaries

Defensive HN replies may focus on the quality of the search results from commercial indexes, e.g., "Google is the best. That's why everyone 'chooses' it."^1 But if the consumer is choosing Kagi based on other reasons, e.g., "investing in additional search infrastructure", then clearly there is more to these decisions

For example, some search engines claim to be planting trees or some such. Nothing to do with the quality of the results

1. Apple is being paid $20+ billion for choosing Google as the default in iOS, but Apple's choice is not based on the money. Yes, that makes sense.

I'm into sustainable, long-term vision. I feel like the plea for Google to make their index open has a lot of good points, but I also think it will go nowhere. So I could view Kagi strictly as a service, like a guy who mows my lawn, but part of the reason I'd pay for search is to build an alternative to all the crap out there. I stopped using Google search years ago and think DDG works Just Fine (I know there are detractors). I get DDG's play and it makes sense to me. I guess I need to reconsider what I think I'm paying for when I give Kagi money. I do think their search results are better, but generally not enough that an extra subscription is worth it.
Qwant and Ecosia try to build their own index: https://techcrunch.com/2025/08/06/qwant-and-ecosia-debut-sta...
I do not believe that Qwant can produce something good; they were always a company that extracted money from the French taxpayer to wrap Bing results.
I use and enjoy Ecosia, it works pretty well for most use cases. Unfortunately it has the same limitation as Duck and basically all of the other non-enormous-players in the search engine market: Location aware search is garbage.
One of the things I hate about Google is being forced to have location-aware search. I love how Kagi actually lets me override the country.
I would enjoy a world where this was much more configurable.
I don't think they papered over this? They've been transparent about paying to scrape other indices while they work on their own.
I am the same, but at the same time I don't want to make assumptions about how viable it is to run a useful index for a small company. I assume they looked into it and deemed it non viable, but would like to know more.
Yes, their argument is essentially that Microsoft spent $100 billion over 20 years trying to compete and still essentially failed.
And they're not exactly wrong.
We really should be investing in a public index infrastructure instead of yet another private search company that puts on a nice face while it is the underdog.
I think your statement is correct in the absence of a clear statement of direction and/or product launch by Kagi. I tried Kagi for a year and came away disillusioned as you were.
They also had a browser called Orion, and to this day it gives me anxiety: YouTube videos won't play the first time you load them, so you need to refresh the page (randomly), plus other weird quirks like that. Its state hasn't changed much over the last year either, so I've switched back to Brave.
I don’t think that’s Orion specific, I have the exact same issue with Safari and Firefox
I'll pitch in that since YouTube was bought by Google it's become pretty anticompetitive too. They've absolutely been caught degrading their product on all browsers except Chrome. I've witnessed this numerous times with Firefox on my Android phone: videos refusing to play, subtitles appearing off the screen, refusing to go fullscreen, and at least 3 more annoying things I can't remember anymore.
On numerous occasions, I had issues with YouTube and Firefox that were fixed by changing the user agent to make it look like Chrome.

I stopped using YouTube 10+ years ago, so no clue if it's still the case.

This is incompetence, not malice. YT devs would skip testing on Chrome if they could get away with it, but are forced to.
Incompetence at scale is malice. We're talking about a mega-corporation gobbling up some of the brightest minds of our generation, not some garage startup that is barely able to keep the lights on.
I'll agree with some of the brightest minds, but some Googlers I worked with were mediocre at best.

Also, I don't really understand what your catchy phrase means. Google's management is being malicious by not forcing their employees to not be lazy?

> they were working pretty hard to paper over the fact that they pay a third party to scrape Google

Not the least bit surprising to me. I had the misfortune of talking to Kagi's CEO several years ago. Every word out of his mouth was a lie.

Kagi's the one search company I trust less than Google.

I have found few CEOs capable of telling nothing but the truth. Based on that, I am nearly certain that lying is part of the job description.
Extraordinary claims require extraordinary evidence
I don't walk around recording every conversation I have.

I'd also argue that calling a tech CEO a liar is far from extraordinary. It'd be extraordinary if I accused him of honesty.

The only problem is that the original claim you commented on was not Kagi lying, as they actually say the same thing here:

https://blog.kagi.com/waiting-dawn-search

You said you were not surprised that Kagi was lying, but they weren't lying on this occasion. When you accuse somebody of lying, it makes sense to provide at least some evidence of that.

At the very least, they are very clear about which indexes they use and how.