Hacker News Clone

ghm2199Jan 21, 2026, 6:34 PM

> Building a comparable one from scratch is like building a parallel national railroad..

Not too be pedantic here but I do have a noob question or two here:

1. One is building the index, which is a lot harder without a google offering its own API to boot. If other tech companies really wanted to break this monopoly, why can't they just do it — like they did with LLM training for base models with the infamous "pile" dataset — because the upshot of offering this index for public good would break not just google's own monopoly but also other monopolies like android, which will introduce a breath of fresh air into a myriad of UX(mobile devices, browsers, maps, security). So, why don't they just do this already?

2. The other question is about "control", which the DoJ has provided guidance for but not yet enforced. IANAL, but why can't a state's attorney general enforce this?

jeromechooJan 21, 2026, 9:16 PM

Building an index is easy. Building a fresh index is extremely hard.

Ranking an index is hard. It's not just BM25 or cosine similarity. How do you prioritize certain domains over others? How do you rank homepages that typically have no real content in them for navigational queries?

Changing the behavior of 90% of the non-Chinese internet is unraveling 25 years and billions of dollars spent on ensuring Google is the default and sometimes only option.

Historically, it takes a significant technological counter position or anti-trust breakup for a behemoth like Google to lose its footing. Unfortunately for us, Google is currently competing well in the only true technological threat to their existence to appear in decades.

AlienRobotJan 21, 2026, 11:48 PM

Good news! Google doesn't know how to rank pages either!

pasJan 22, 2026, 10:15 AM

yet ... it works "ok" most of the time.

not to mention that people mostly need wikipedia, the news, navigating the infuriating world of websites of big service providers (gov sites, or try to find anything on Microsoft's dark corner of the web), porn and brainrot

but it's awfully hard to make traction on a business that provides this.

hsuduebc2Jan 21, 2026, 6:57 PM

I don’t think it’s comparable to today’s AI race.

Google has a monopoly, an entrenched customer base, and stable revenue from a proven business model. Anyone trying to compete would have to pour massive money into infrastructure and then fight Google for users. In that game, Google already won.

The current AI landscape is different. Multiple players are competing in an emerging field with an uncertain business model. We’re still in the phase of building better products, where companies started from more similar footing and aren’t primarily battling for customers yet. In that context, investing heavily in the core technology can still make financial sense. A better comparison might be the early days of car makers, or the web browser wars before the market settled.

ghm2199Jan 21, 2026, 8:12 PM

> ... stable revenue from a proven business mode... In that game, Google already won.

But if they were to pour that money strategically to capture market share one of two things would happen if google was replaced/lost share:

1. it would be the start of the commoditization of search. i.e. search engine/index would become a commodity and more specialized and people could buy what they want and compete.

2. A new large tech company takes rein. In which case it would be as bad as this time.

Like what I don't get is that if other big tech companies actually broke apart monopoly on search, several google dominos in mobile devices, browser tech, location capabilities would fall. It would be a massive injection of new competition into the economy, lots of people would spend more dollars across the space(and ad driven buying too) money would not accrue in an offshore tax haven in ireland

To play the devils advocate, I think the only reason its not happening is because meta, apple, microsoft have very different moats/business models to profit off. They all have been stung one time or another is small or big ways for trying to build something that could compete but failed. MS with bing, Meta with facebook search, Foursquare — not big tech but still — with Maurauder's Map.

citizenpaulJan 22, 2026, 12:22 AM

>why can't they just do it

Money. Google controls 99% of the adverting market. That's why its called a monopoly. No one else can compete because they can never make enough money to make it worth the costs of doing it themselves.

paxysJan 21, 2026, 7:32 PM

Apple had a chance to break Google's search monopoly, but they chose to take billions from them instead.

Microsoft had a chance (well another chance, after they gave up IE's lead) to break up Google's browser monopoly, but they decided to use Chromium for free instead.

Ultimately all these decisions come down to what's more profitable, not what's in the best interests of the public. We have learned this lesson x1000000. Stop relying on corporations to uphold freedoms (software or otherwise), becuase that simply isn't going to happen.

charcircuitJan 21, 2026, 9:26 PM

>but they chose to take billions from them instead.

They chose to use Google with a revenue sharing agreement. Google is very well monetized. It would be very difficult for Apple to monetize their own search as good as Google can.

>they decided to use Chromium

Windows ships with Microsoft Edge as the browser which Microsoft has full control over.

w10-1Jan 22, 2026, 5:50 PM

Other comments mention difficulty, cost, conditions, etc.

Also, competitive agreements: of the big players like Apple, Microsoft, Facebook/Meta, Amazon, etc., only Google is in the ad business. But it has credible threats of digging into their businesses - GCP, Android, (not to mention software licenses and competitive access to e.g., Samsung), etc. So they agree to cede the ad world to Google, to keep Google out of their businesses.

The injunctions cannot be effective. Google ads are essentially a tax at a fine scale that rational people chose when it didn't change site behavior. But then Google ads changed the nature of the web itself, converting every snippet of information into an opportunity to monetize. Neither would change with a public search.org, and injunctions to license ad-free indexes won't change site behavior or publishers' self-interest in selling access to their content to Google alone.

Google knows the injunctions are unworkable and ultimately ineffective. The only question is what price they have to pay to the Trump judiciary to counter them.

xnxJan 21, 2026, 7:24 PM

> If other tech companies really wanted to break this monopoly, why can't they just do it

Companies would rather sue than try and compete by investing their own money.

WhyNotHugoJan 21, 2026, 6:25 PM

The statistics in this article sound like garbage to me.

Google used by 90% or the world?

~20% of the human population lives in countries where Google is blocked.

OTOH, Baidu is the #1 search engine in China, which has over 15% of the world’s population… but doesn’t reach 1%?

These stats are made measuring US-based traffic, rather than “worldwide” as they claim.

weisnobodyJan 21, 2026, 7:51 PM

Yes the stats don't make sense. It appears to be an issue with StatsCounter.

The Search Engine wikipedia article [1] has a section on Russia and East Asia market share, which confirms that the roll up used for world wide counts is off, unless the number of people using the Internet is drastically different in some of the countries.

Russia

  * Yandex: 70.7%
  * Google: 23.3%

China:

  * Baidu: 59.3%
  * Other domestic engines: "smaller shares"
  * Bing: 13.6%

South Korea:

  * Naver: 59.8%
  * Google: 35.4%

Japan: * Google: 76.2% * Yahoo! Japan: 15.8%

[1] https://en.wikipedia.org/wiki/Search_engine#Market_share

dylan604Jan 21, 2026, 10:12 PM

Maybe it's the same logic that says you can lower the prices of things >100%

lolcJan 21, 2026, 6:40 PM

I guess they'd argue that the people in China don't count, because people in China don't get to choose Google. But yeah, the stats they use from "StatCounter" are clearly not representative for what the world uses.

manquerJan 22, 2026, 3:27 AM

Market share is based on factual consumption numbers however subsidized or regulated by a government not free will.

Choice/Free will is an arbitrary line in the sand, one could argue how much choice we have about consuming google search when it is "85-90"% monopolistic business with well documented anti-competitive practices.

Chinese consumers perhaps have more choice than we do, Baidu is only about 60% market share. They do get to choose, it more that Google is not one of the options available to them, it is not like if not Baidu then it is a Phone Book.

elAhmoJan 21, 2026, 7:30 PM

You can argue that people outside of China don't get to choose something other than Google. Sure, there are recent pushes with default search engine choices and similar initiatives, but there is a reason why Google is paying hundreds of millions of dollars to be the default search engine.

tyreJan 22, 2026, 3:26 AM

It’s reasonable to see a distinction between the great firewall and the default browser search engine

ivanjermakovJan 21, 2026, 8:02 PM

To be fair, Kagi won't be used in China either.

faitswulffJan 22, 2026, 2:37 PM

I have used it from China, actually. Not big enough to be blocked.

pfistJan 21, 2026, 8:36 PM

I am rooting for Kagi here, and I applaud their transparency on such matters. It is quite enlightening for someone like me who understands technology but knows little about the inner workings of search.

It remains to be seen how or if the remedies will be enforced, and, of course, how Google will choose to comply with them. I am not optimistic, but at least there is some hope.

As an aside: The 1998 white paper by Brin and Page is remarkable to read knowing what Google has become.

m-schuetzJan 22, 2026, 7:10 AM

I'm rooting for Kagi solely because the block feature. It's amazing to be able to block undeservedly SEO'd garbage sites from future search results.

lostloginJan 22, 2026, 9:34 AM

Blocking, pinning and the general quality.

I’d pay a more if I could opt out of Yandex, and if it integrated properly with iOS (Apples fault).

fuzzy2Jan 22, 2026, 10:08 AM

fyi: DuckDuckGo has blocking now, too. I use it extensively to do away with all the clone sites of Stack Exchange, GitHub etc

All without using an account, saved locally in the browser.

m-schuetzJan 22, 2026, 10:12 AM

Oh nice, that's good to know. Yes, those clones sites are also instantly on my block list, as well as Userbenchmark, sites with AI-generated "info" pages (if I want AI answers, I'll just ask ChatGPT), sites that won't work without third-part cookies, low-quality game guide sites that were evidently made for users to visit, but not actually to help them, etc.

whsJan 21, 2026, 5:36 PM

>Google: Google does not offer a public search API. The only available path is an ad-syndication bundle with no changes to result presentation - the model Startpage uses. Ad syndication is a non-starter for Kagi’s ad-free subscription model.[^1]

>Because direct licensing isn’t available to us on compatible terms, we - like many others - use third-party API providers for SERP-style results (SERP meaning search engine results page). These providers serve major enterprises (according to their websites) including Nvidia, Adobe, Samsung, Stanford, DeepMind, Uber, and the United Nations.

The customer list matches what is listed on SerpAPI's page (interestingly, DeepMind is on Kagi's list while they're a Google company...). I suppose Kagi needs to pen this because if SerpAPI shuts down they may lose access to Google, but they may already have utilize multiple providers. In the past, Kagi employees have said that they have access to Google API, but it seems that it was not the case?

As a customer, the major implication of this is that even if Kagi's privacy policy says they try to not log your queries, it is sent to Google and still subject to Google's consumer privacy policy. Even if it is anonymized, your queries can still end up contributing to Google Trends.

xnxJan 21, 2026, 5:55 PM

> Because direct licensing isn’t available to us on compatible terms, we - like many others - use third-party API providers for SERP-style results

Crazy for a company to admit: "Google won't let us whitelabel their core product so we steal it and resell it."

ajdudeJan 21, 2026, 6:05 PM

Does anyone else use the phrase "I'm going to google XYZ" while referring to actually searching it up on Kagi, DDG, or another search engine?

sabslikesobsJan 21, 2026, 7:26 PM

I like that there's a list of primary sources at the bottom.

Kagi's AI assistant has been satisfying compared to Claude and ChatGPT, both of which insisted on having a personality no matter what my instructions said. Trying to do well-sourced research always pissed me off. With Kagi it gives me a summary of sources it's found and that's it!

1vuio0pswjnm7Jan 21, 2026, 11:17 PM

Google has appealed and moved for a partial stay re: the remedies discussed in this blog post

https://storage.courtlistener.com/recap/gov.uscourts.dcd.223...

Will Kagi file an amicus brief in support of the plaintiffs

Perhaps Google will fund amici in support of their position as they did in the Epic appeal

https://www.law.com/nationallawjournal/2025/01/10/fight-over...

ApolloFortyNineJan 21, 2026, 8:59 PM

With Google's search engine making almost $200 billion a year in revenue, I'm not sure Kagi could afford what market rates would be here. They also spent billions developing the technology to crawl, index, and rank billions of pages, factoring that in, again I don't think a good price can be put on it.

What even is market rate? Kagi themselves admits there's no market, the one competitor quit providing the service.

Obviously Google doesn't want to become an index provider.

dangoorJan 21, 2026, 9:01 PM

According to the article, the judge's memorandum said about index data access:

> Google must provide Web Search Index data (URLs, crawl metadata, spam scores) at marginal cost.

I'm guessing that the "marginal cost" of a search is small and it's not connected to the how much ad revenue that search is worth.

thisislife2Jan 22, 2026, 9:19 AM

> Layer 3: Paid, subscription-based search

Should actually be - Layer 3: Paid, ad-free, subscription-based search. (It's a subtle omission that indicates the direction Kagi search will eventually take).

NevermarkJan 23, 2026, 9:55 AM

> A government-backed, ad-free, intermediary-free, taxpayer-funded search service providing baseline, non-discriminatory access to information. Imagine search.org.

There is no way the government provides a search engine that doesn’t become a political football or weapon.

Maybe in a different age.

I completely agree that monopoly remedies, such as fair open paid licensing, are needed. I prefer that to breakups, when this kind of cooperative/competitive leveling works.

embedding-shapeJan 23, 2026, 10:52 AM

> There is no way the government provides a search engine that doesn’t become a political football or weapon.

Maybe it doesn't have to be based in the US? Maybe we could make this a world effort, run by a coalition instead, across border lines, like a library for the modern age.

cushJan 22, 2026, 5:48 PM

The idea of a search index being a public utility is an interesting idea but I’m not sure what it would do for trust. Governance is the biggest question mark, and with the current administration I’d say let Google run it and have less restrictive access to the index. My Google search usage has dropped probably 99% over the last two years.

My hope is that the powers that be figure out how to monetize these products with dollars instead of attention. Google’s ad-driven business model ruined the internet - we don’t need that in our AI products too.

jeffbeeJan 21, 2026, 6:46 PM

"We will simply access the index" has always struck me as wild hand-waving that would instantly crumble at first contact with technical reality. "At marginal cost" is doing a huge amount of work in this article.

adsharmaJan 22, 2026, 12:59 AM

Why didn't I see anything about common crawl?

Exa, Parallel and a whole bunch of companies doing information retrieval under the "agent memory" category belong to this discussion.

direwolf20Jan 21, 2026, 5:56 PM

I hope they cache search results to further reduce the number of calls to Google.

And Marginalia Search was not mentioned? Marginalia Search says they are licensing their index to Kagi. Perhaps it's counted under "Our own small-web index" which is highly misleading if true.

stacktraceyoJan 22, 2026, 2:17 AM

Is there a crowd indexed style search index? Like instead of relying on the crawling completely you rely on a maybe like an extension in your browser that indexes as people are using their browser. Or maybe indexing your site to this index instead of waiting to be crawled.

gkbrkJan 22, 2026, 9:58 AM

I think Brave Search does something similar with their Web Discovery Project, but I don't think it indexes full web pages from users.

https://support.brave.app/hc/en-us/articles/4409406835469-Wh...

senkoJan 21, 2026, 9:53 PM

A full up-to-date index of the searchable web should be a public commons good.

This would not only allow better competition in search, but fix the "AI scrapers" problem: No need to scrape if the data has already been scraped.

Crawling is technically a solved problem, as witnessed by everyone and their dog seemingly crawling everything. If pooled together, it would be cheaper and less resource intensive.

The secret sauce is in what happens afterwards, anyway.

Here's the idea in more detail: https://senkorasic.com/articles/ai-scraper-tragedy-commons

I'm under no illusion something like that will happen .. but it could.

moebrowneJan 21, 2026, 10:26 PM

Isn't this what CommonCrawl are doing?

https://commoncrawl.org/

senkoJan 22, 2026, 10:02 AM

Yes. But they don't crawl everything (probably due to lack of funding), and, as the article and other commenters here note, people are incentivised to allow Google and only Google to crawl. In practice, the CommonCrawl dataset is too small for a realistic search engine competitor.

I'd love to see Google, Bing and others being incentivized (wink, wink) to contribute (technically, financially, etc) to CommonCrawl or Internet Archive since they already do this.

azornathogronJan 21, 2026, 10:36 PM

Is crawling really solved?

Any naive crawler is going to run into the problem that servers can give different responses to different clients which means you can show the crawler something different to what you show real users. That turns crawling into an antagonistic problem where the crawler developers need to continually be on the lookout for new ways of servers doing malicious things that poison/mislead the index.

Otherwise you'll return junk spam results from spammers that lied to the crawler.

I've never done it so maybe it's easier than I imagine but I wouldn't be quick to assume that crawling is solved.

senkoJan 22, 2026, 10:12 AM

I don't mean to say it's trivial. I'm sure there are many hard problems such as the one you mention - though that particular one is more "cleaning the index" part which might work on top of the open common corpus.

But my impression is that it's more a question of scale and engineering time than having to invent something new.

(disclaimer: I also never worked on a internet-scale search system, maybe I'm very off the bat here as well).

azornathogronJan 22, 2026, 12:26 PM

Oh, ok. I misunderstood - I think we agree.

stephen_cagleJan 21, 2026, 7:02 PM

One interesting point was the original PageRank algorithm greatly benefited from the fact that we kinda only had "text matching" search before Google (my memory was AltaVista at the time).

Because text matching was so difficult to search with, whenever you went to a site, it would often have a "web of trust" at the bottom where an actual human being had curated a list of other sites that you might like if you liked this site.

So you would often search with keywords (often literals), then find the first site, then recursively explore the web of trust links to find the best site.

My suspicion has always been that Google (PageRank) benefited greatly from the human curated "web of trust" at the bottom of pages. But once Google came out, search was much better, and so human beings stopped creating "web of trust" type things on their site.

I am making the point that Google effectively benefited from the large amount of human labor put into connecting sites via WOT, while simultaneously (inadvertently) destroying the benefit of curating a WOT. This means that by succeeding at what they did, they made it much more difficult for a Google#2 to come around and run the exact same game plan with even the exact same algorithm.

tldr; Google harvested the links that were originally curated by human labor, the incentive to create those links are gone now, so the only remaining "links" between things are now in the Google Index.

Addendum: I asked claude to help me think of a metaphor, and I really liked this one as it is so similar.

``` "The railroad and the wagon trails"

Before railroads, collective human use created and maintained wagon trails through difficult terrain. The railroad company could survey these trails to find optimal routes. Once the railroad exists, the wagon trails fall into disuse and the pathfinding knowledge atrophies. A second railroad can't follow trails that are now overgrown. ```

keedaJan 21, 2026, 9:35 PM

> I am making the point that Google effectively benefited from the large amount of human labor...

This is exactly right, but the thing most people miss is that Google has been using human intelligence at massive scale even to this day to improve their search results.

Basically, as people search and navigate the results, Google harvests their clicks, hovers, dwell-time and other browsing behavior to extract critical signals that help it "learn" which pages the users actually found useful for the given query. (Overly simplified: click on a link but click back within a minute to go to the next link -> downrank, but spend more time on that link -> uprank.)

This helps it rank results better and improve search overall, which keeps people coming back and excluding competitors. It's like the web of trust again, except it's clicks of trust, and it's only visible to Google and is a never-ending self-reinforcing flywheel!

And if you look at the infrastructure Google has built to harvest this data, it is so much bigger than the massive index! They harvest data through Chrome, ad tracking, Android, Google Analytics, cookies (for which they built Gmail!), YouTube, Maps and so much more.

So to compete with Google Search, you don't need just a massive index, you also need the extensive web infra footprint to harvest user interactions at massive scale, which means the most popular and widely deployed browser, mobile OS, ad tracking, analytics script, email provider, maps, etc, etc.

This also explains why Google spent so many billions in "traffic acquisition costs" (i.e. payments for being the Search default) every year, because that was a direct driver to both, 1) ad revenue, and 2) maintaining its search quality.

This wasn't really a secret, but it (rightfully) turned out to be a major point in the recent Antitrust trial, which is why the proposed remedies (a TFA mentions) include the sharing of search index and "interaction data."

luk4Jan 22, 2026, 9:37 PM

I think it's worth mentioning the Open Web Search initiative [1] and the Open Web Index [2] specifically.

> 14 renowned European research and computing centers have joined forces to develop an open European infrastructure for web search. The initiative is contributing to Europe’s digital sovereignty as well as promoting an open human-centered search engine market. [1]

> The Open Web Index (OWI) is a European open source web index pilot that is currently in Beta testing phase. The idea: Collaboratively and transparently secure safe, sovereign and open access to the internet for European organisations and civil society. The index stores well structured open web data, making it available for search applications and LLMs. [3]

[1] https://openwebsearch.eu/

[2] https://openwebindex.eu/

[3] https://openwebsearch.eu/open-webindex/

user3939382Jan 21, 2026, 6:51 PM

For anyone not acquainted Kagi is excellent and the people who work there strike me as nice and competent. I’m a harsh critic usually. Highly recommended.

flkiwiJan 21, 2026, 7:23 PM

I've gotten more value out of it than just about any ongoing subscription I have. It's clean, fast, deeply customizable (i.e., excluding "answers" websites or any other domain you never want to see again), and, for what it is, inexpensive. Honestly if Google (or Bing) worked like Kagi does, I'd trade some of the privacy for the utility.

zvqcMMV6ZcrJan 22, 2026, 9:32 AM

Recently I encounter "no results" screen when using Google that I am starting to suspect the problem will solve itself. And by solve I mean open parts of internet will die off completely, and only owners of silos like Facebook will be able to provide data for search indexes.

keedaJan 21, 2026, 10:13 PM

Google's advantage is not just in its index and algorithms, it is that it has built a self-reinforcing flywheel that data mines human attention at massive scale to improve their search results.

This comment (https://news.ycombinator.com/item?id=46709957) points out that Google got its start via PageRank, which essentially ranked sites based on links created by humans. As such, its primary heuristic was what humans thought was good content. Turns out, this is still how they operate.

Basically, as people search and navigate the results, Google harvests their clicks, hovers, dwell-time and other browsing behavior -- i.e. tracking what they pay attention to -- to extract critical signals to "learn" which pages the users actually found useful for the given query. This helps it rank results better and improve search overall, which keeps people coming back, which in turns gives them more queries and data, which improves their results... a never-ending flywheel.

And competitors have no hope of matching this, because if you look at the infrastructure Google has built to harvest this data, it is so much bigger than the massive index! They harvest data through Chrome, ad tracking, Android, Google Analytics, cookies (for which they built Gmail!), YouTube, Maps, and so much more. So to compete with Google Search, you don't need just a massive index, you also need the extensive web infra footprint to harvest user interactions at massive scale, meaning the most popular and widely deployed browser, mobile OS, ad footprint, analytics, email provider, maps...

This also explains why Google spends so many billions in "traffic acquisition costs" (i.e. payments for being the Search default) every year, because that is a direct driver to both, 1) ad revenue, and 2) maintaining its search quality.

This wasn't really a secret, but it turned out to be a major point in the recent Antitrust trial, which is why the proposed remedies (as TFA mentions) include the sharing of search index and "interaction data."

We all knew "if you're not paying for it, you're the product" but the fascinating thing with Google is:

- They charge advertisers to monetize our attention;

- They harvest our attention to better rank results;

- They provide better results, which keeps us coming back, and giving them even more of our attention!

Attention is all you need, indeed.

NextgridJan 22, 2026, 12:42 AM

> "learn" which pages the users actually found useful for the given query

But due to their business model I'm not sure they are ranking "usefulness" as much as you think.

Useful results ultimately don't benefit Google because Google makes no money on them. Google makes money on ads - either ads on the search results page, ads on the destination pages or (indirectly) from steering users to pages which have Google Analytics.

It's likely the actual algorithm balances usefulness to the user with usefulness to Google. You don't want to serve up exclusively spam/slop as users might bounce, but you also don't want to serve up the best result because the user will prefer it over the ad on the SRP page. So it has to be a mix of both - you'll eventually get a good result, after many attempts (during which you've been exposed to ads).

Google does enjoy the myth that they are unable to combat spam/slop while in reality they do profit off it.

keedaJan 22, 2026, 6:07 AM

That is also the thesis of this piece: https://www.wheresyoured.at/the-men-who-killed-google/

It is plausible, but I'd guess Google would not risk that. I'm sure Google has pulled other shenanigans to get more clicks, like stuffing more and more ads, and making ads look like results (something even I personally have fallen for once), but I think they're too smart to mess with their sacred cash cow.

direwolf20Jan 22, 2026, 11:58 AM

> cookies (for which they built Gmail!)

Can you explain this one?

keedaJan 22, 2026, 7:23 PM

There were blogs that explained this in detail (Facebook does something similar), but I can't find them, so here's what Google's AI overview says when I search for "How gmail cookies help google track users across the web":

Gmail cookies, such as SID and HSID, act as unique identifiers for a signed-in Google account, allowing Google to track user activity across its services and millions of third-party websites. These cookies, often lasting 2 years, link browsing behavior—like searches and site visits—to a specific user profile to personalize ads, measure campaign performance, and analyze site usage, even on non-Google sites that use tools like Google Analytics or AdSense.

jxmesthJan 22, 2026, 5:44 AM

Honestly, would be very cool if someone could make a search engine of only human-produced content. I know it's going to be hard and compute intensive but I don't think it's impossible. In fact, Google could do it. A paid service for only human made content. Obviously there would be a margin of error as we can never be 100% sure if something really is AI written.

direwolf20Jan 22, 2026, 12:00 PM

Marginalia Search is a small-web search engine with a curated list of sites and its own index. Sometimes I find it useful to find answers to technical problems because it only searches the kind of site where people write about the technical problems they solved.

HellsMaddyJan 22, 2026, 6:57 AM

Kagi is doing something similar to this, though it's not trying to remove absolutely all AI, just "slop": https://help.kagi.com/kagi/features/slopstop.html

yomismoaquiJan 21, 2026, 6:27 PM

One thing I have discovered after using AI chats that include a websearch tool is that I don't want to delve on diferent blogs, Medium posts, Stack overflow threads with passive-aggresive mod comments, dismissing cookie banners... Sorry I just want the info I'm looking for, I don't care for your personal expression or need to monetize your content.

There are other times (usually not work related) when I want to explore the web and discovering some nice little blog or special corner on the net. This is what my RSS feed reader is for.

kqrJan 21, 2026, 6:49 PM

With Kagi you can opt in to an LLM summary of the search result by appending a question mark to the query. It's a neat mechanism when it works!

zhfanlqeoJan 22, 2026, 9:40 AM

I used kagi for a while but got lazy with updating the subscription when moving and needing to change credit cards so I went back to DDG/Google and having to go back to having to skip the first result or first few results shows you just how obnoxious this practice is. When I have a few moments I'll resubscribe to kagi...

RonsenshiJan 22, 2026, 9:56 AM

I've been trying to use DDG for the past 2-3 years, but way too often I have to add !g at the end to go to google where I can get better results. So I've been considering giving Kagi a try. Can you tell if in your experience Kagi has better results than DDG?

mrweaselJan 22, 2026, 3:29 PM

I've used DDG since, 2012, and switched to Ecosia about three years ago. In my experience if DDG or Ecosia can't find something, then neither can Google. In some cases I still check with !g, but Google is now worse than both DDG and Ecosia (which is funny because Ecosia partially uses Google).

It may be related to which type of content you search for, field of work or even how you search, but you're certainly not the only one I've heard complain that they need !g way to often for alternatives to be viable.

Google is very good is you need to buy something though. Their ad system yields rather good results, most of the time. Lately I've noticed that they are more and more serving ads for questionable drop shippers and foreign webshops, rather than brands I trust, so they might also be declining in that department.

hoooooooooomeJan 22, 2026, 10:48 AM

I switched to DDG from Google some years ago, and then to Kagi around the start of 2025.

I find the Kagi results to everything I need, and often lead me to more niche personal blog posts specific to what I am looking for. Surfacing small blogs posts is not something I remember getting much of in DDG and I'm really enjoying that.

weisnobodyJan 21, 2026, 7:38 PM

I think the crawled data should have to be shared, but I'm not convinced that Google should have to share their index.

It may be impracticable to share the crawled data, but from the stand point of content providers, having a single entity collecting the information (rather than a bunch of people doing) would seem to be better for everyone. Likely need to have some form of robots.txt which would allow the content provider to indicate how their content could be used (i.e research, web search, AI, etc.).

The people accessing the crawled data would end up paying (reasonable) fees to access the level of data they want, and some portion of that fee would go to the content provider (30% to the crawler and 70% to the crawler? :P maybe).

Maybe even go so far as to allow the Paywalled content providers to set a price on accessing their data for the different purposes. Should they be allowed to pick and choose who within those types should be allowed (or have it be based on violations of the terms of access)

It seems in part the content providers have the following complaints:

  * Too many crawlers (see note below re crawlers)
  * Crawlers not being friendly
  * Improper use of the crawled data
  * Not getting compensated for their content

Why not the index? The index, to me, is where a bunch of the "magic" happens and where individual companies could differentiate themselves from everyone else.

Why can't Microsoft retain Bing traffic when it's the default on stock Windows installs?

  * Do they not have enough crawled data?  
  * Their index isn't very good?
  * Their searching their index isn't good
  * The way they present the data is bad?
  * Google is too entrenched?
  * Combination of the above?

There are several entities intending to crawl all / large portions of the Internet: Baidu, Bing, Brave, Google, DuckDuckGo, Gigablast, Mojeek, Sogou and Yandex [1]. That does not include any of the smaller entities, research projects, etc.

[1] https://en.wikipedia.org/wiki/Search_engine#2000s–present:_P... (2019)

hsuduebc2Jan 21, 2026, 6:20 PM

It is even worse that the Google search become shit in last years. So they gate keep only relevant information for themselves and not using them with intent to improve search quality. As always if you have no competition your innovation goes only towards cost reduction. Not product improvement.

warkdarriorJan 21, 2026, 7:31 PM

If Google Search is shit, why does Kagi want access to it?

JaggedJaxJan 21, 2026, 7:38 PM

They want access to the index. They will perform their own sorting to determine the best results to show from that index.

b3kartJan 21, 2026, 8:05 PM

…without having advertiser interests to cater to.

jiehongJan 21, 2026, 10:24 PM

I think one side problem is that part of the web is not even searchable with a search engine.

Here are some examples:

- Discord

- WeChat (is it the web?)

- Rednote

- TikTok (partially)

- X (partially)

- JSTOR (it finds daily, but you find more stuff on the website directly)

- any stuff with a login, obviously.

reddaloJan 21, 2026, 11:20 PM

> Discord

Damn, I can't stand open-source projects that host their "forums" on Discord. It's a nigthmare to use, it's heavy, slow, and it's completely unsearchable from the web.

I wonder what went wrong with our society.

cyberrockJan 22, 2026, 1:15 AM

First of all not everyone wants spectators and gawkers on all of their conversations. As for open solutions, IRC didn't provide chat history for the common folk (no, most users are not able to host their own Pi Zero bouncer, especially back in 2017), and Matrix development was too slow (Elements implemented message pinning in 2022), so the rest was history. There was just no alternative to Slack or Discord.

SpivakJan 22, 2026, 6:10 PM

> I wonder what went wrong with our society.

Predators.

https://maggieappleton.com/cozy-web

sharpshadowJan 21, 2026, 8:08 PM

If Google provides a Search Index it will be the censored version therefore still politically acceptable. The “Layer 1” idea will not happen.

direwolf20Jan 21, 2026, 8:38 PM

That's why Kagi combines results from multiple sources, just as it does with Yandex.

WhereIsTheTruthJan 21, 2026, 6:55 PM

Kagi's "waiting for dawn" is just waiting for Google to legitimize their reseller business

Meanwhile, users pay a premium to pretend they're not using Google

Fascinating delusion

b3kartJan 21, 2026, 7:03 PM

> Meanwhile, users pay a premium to pretend they're not using Google

My searches can’t be tied to me by Google for their ad targeting: this is worth paying a premium for, and I am glad Kagi are providing this service.

You seem to have a very limited understanding of the value Kagi provides.

yuugha1838Jan 21, 2026, 10:04 PM

I have a limited understanding of the value Christianity provides. That neither means that Christianity provides no value, nor does it mean that God exists.

idiotsecantJan 22, 2026, 4:32 PM

Uh oh you're eating your tail again

NextgridJan 22, 2026, 12:48 AM

Users pay a premium to have Google's results cleaned out of spam/trash. It's effectively paying someone to cut out the newspaper ads for you and then give you the resulting ad-free paper.

miloignisJan 21, 2026, 11:08 PM

With Kagi being $55-$110 a year and Google making >$200 a year per US user, it's arguably a discount.

BlackFlyJan 22, 2026, 8:18 AM

In addition to what others are telling you, Kagi also allows you to

- filter out results from specific websites that you can choose, - show more results from specific websites that you can choose, - show fewer results from specific websites that you can choose,

and so forth. When you find your results becoming contaminated by some new slop farm, you can just eliminate them from your results. Google could also do that, but their business model seems to rely more on showing slop results with their ads in those third party pages.

Just like mobile phone providers, third parties can provide lots of value add by reselling infrastructure. Business models can be different, feature sets can differ. This is not a delusion but the reality of reselling.

idiotsecantJan 22, 2026, 4:31 PM

users pay a premium for superior UX and no tracking, actually. Kagi has wildly better filtering and customization.

1970-01-01Jan 22, 2026, 3:19 PM

>The problem: A search monopoly

...

>We tried to do it the right way

This sign-up to retrieve better information idea will never take-off the way they think it will. A white label search will get you nowhere. They are silently failing because they're just too stubborn to do it the hard way. Kagi needs to pivot and succeed on useful and interesting edge cases first. Build us out a subject-relevant search, such as displaying vetted content from forums when searching a product/service, and then tying it into Facebook Marketplace for local items or services and Amazon for new. That is called building a product for yourself that others will use. Now you have your very own cashflow for clicks; use that cashflow to buy more corporate access, thereby proving you can succeed without any other search business propping you up and into relevancy. You don't need to start with the giants either. Start with something that works on local hunting, fishing, shooting, and knitting forums. When grandmothers need high quality green yarn today, make their muscle memory point to Kagi local, not Google.

maelitoJan 22, 2026, 10:20 AM

We need a european Kagi.

ameliusJan 22, 2026, 3:29 PM

https://en.wikipedia.org/wiki/Quaero

grayhatterJan 22, 2026, 5:27 PM

Kagi uses Brave search index? huh, TIL... that's very disappointing. And it's the kinda thing that would prevent me from ever paying for Kagi. Brave's crawler, is agressive, dumb (it doesn't appear to back off if it hits a number of 503s), and critically, it ignores robots.txt. They even admit they choose to ignore it. To top that off their crawler doesn't identify itself, instead masquerading as a real browser. I've had to ban the entire Hetzner ASN from my site to get them to stop.

On one hand, I really want Kagi to succeed. They very often, do seem to care about the parts of the world and internet that I care about. But on the other... to me, willingly associating, and financing a company that willingly brags about ignoring consent, is a non-starter for me.

nige123Jan 21, 2026, 6:49 PM

The user data (anonymised) and analytics also needs to be shared.

echelonJan 22, 2026, 1:06 AM

If there are any Kagi folks here, I've come up with a new angle to attack Google's anti-competitive position that could be incredibly effective:

https://news.ycombinator.com/item?id=46681985

https://news.ycombinator.com/item?id=44546519

I'm going to send this idea to my legislators, the EU, Sam Altman, Tim Sweeny, and Elon Musk, et al., I just haven't had time to put this together yet.

Google is a monopolist scourge and needs to be knocked down a peg or two.

This should also apply to the iPhone and Android app stores.

ares623Jan 21, 2026, 6:51 PM

Kagi should start building an index of sites that are trying to escape the current slop internet. It’s know they have the Small Web thing. But I’d like to see an index of a “neo internet” that blocks Google et al.

z64Jan 21, 2026, 8:10 PM

I've been tossing around the very early idea of seeing what we can do to elevate alcoves of the web such as Gemini[1] through Kagi. I am slightly conscious of that some people might not like us operating in that space, it's been on my TODO to poll people about it and take a quick pulse. I love the tech and think we could give it meaningful exposure.

Is this along the lines of what you have in mind - any other active efforts you're aware of that you think we should look into?

[1] https://en.wikipedia.org/wiki/Gemini_(protocol)

ares623Jan 22, 2026, 9:04 AM

That's cool that you're looking into it. Are you saying that in any "official" manner as a Kagi employee? Or something more personal?

I've been meaning to write an RFC or open-letter of sorts to collect ideas for what a neo or parallel web could look like, but I'm just a nobody so shrug. It'll probably be something very fragmented and very very niche but nowadays I think that can be seen as a good thing.

z64Jan 22, 2026, 6:20 PM

I'm working on making an internal proposal to integrate with Gemini on several fronts, yes. Still hatching the idea, and much else to do - maybe this summer it will come to fruition if it pans out :)

ares623Jan 22, 2026, 9:41 PM

Well, from one dreamer to another, thank you for taking on that effort and I wish you luck. I will keep my eye out for it.

idiotsecantJan 22, 2026, 4:37 PM

How would that work? Like Kagi caches the gemini content and delivers it as web content? I suppose that might annoy the kind of person who runs a Gemini server.

z64Jan 22, 2026, 6:17 PM

I don't think we would go to that end, not as a first step anyways; I'm thinking of some simpler ones. Sparing details as I'm just brainstorming for now and getting to know their communities.

But, there are already plenty of services that proxy Gemini pages so that you can read them in conventional browsers, as well as search engines for Gemini content.

freediverJan 21, 2026, 8:23 PM

Relevant https://github.com/kagisearch/smallweb/pull/425

direwolf20Jan 22, 2026, 12:01 PM

Add your site to Marginalia Search. They accept submissions by email or GitHub PR, and Kagi pulls from Marginalia Search.

OGEnthusiastJan 21, 2026, 5:57 PM

Sounds like we need a nationalized search engine company then?

browningstreetJan 21, 2026, 6:10 PM

I wouldn't trust a nationalized search engine company.

That said, there are projects like Common Crawl and in Europe, Ecosia + Qwant.

I personally would like to see a search enginge PaaS and a music streaming library PaaS that would let others hook up and pay direct usage fees.

NitpickLawyerJan 21, 2026, 6:36 PM

> and in Europe, Ecosia

I tried. It's just not good enough. Quick example: yesterday I set up a workstation with Ubuntu, wanting to try out wayland. One of the things I wanted was to run an app (w/ gui) from another (unprivileged) user under my own user. Ecosia gave me bad old stuff. Tried for a few minutes, nothing useful. Switched to google, one of the first results was about waypipe. Searched waypipe on ecosia. 1 and a half pages of old content. Glaringly, not one of those results was the ubuntu.manpages entry on waypipe. shrug

shadowgovtJan 21, 2026, 6:14 PM

An interoperable search index access standard might work. We've done something similar for peering and the backbone of the IP-layer interconnects themselves.

direwolf20Jan 21, 2026, 8:41 PM

You have to make it economically preferable, and there's No known solution to this. Large networks are still using their positions to bully smaller ones off the IP-layer internet backbone.

g947oJan 23, 2026, 11:16 AM

So the entire search result comes from Truth Social and Grokipedia. No thanks

bilekasJan 22, 2026, 10:04 AM

I've tried Kagi and while it is better than google these days, to be fair that's not hard with the enshitification slop that's out there.

But Kagi funds Yandex which fund the RU government, and I think it should be known to anyone looking to use it.

https://ounapuu.ee/posts/2025/07/17/kagi/

https://kagifeedback.org/d/5445-reconsider-yandex-integratio...

ssoidJan 21, 2026, 6:55 PM

[dead]

the_arunJan 21, 2026, 6:41 PM

If google is serving 90% traffic & others are unable to enter - Doesn't that mean google is doing something right for the customer and others are unable to outcompete it? Isn't this how life works?

Waiting for dawn in search: Search index, Google rulings and impact on Kagi

Comments