> With this latest update, for the first time in sixteen years everything on Reddit is now searchable - users, posts, communities, and now comments - making Reddit one of the first platforms with this capability.
I mean this is a straight up blatant lie, every single forum/social site out there has proper search and has for years, I know all of mine do. Most of them even have image search.
So honest question, how does a multi-million dollar site just not have such basic functionality? Does it not matter?
Better question, and this is something I see time and again, how do sites missing such basic functionality even get any funding to start with? It's just weird. Reddit didn't even have a working user block till recently either.
It's also a lie because "everything" on reddit is still not in fact searchable; buried in the comments on the announcement post, they admit that pre-2020 comments are still not indexed:
Well time to continue using "site:www.reddit.com" on Google, then. I find the old search from old.reddit.com to be way better than the newer search (but I haven't tried this newest search linked in the thread).
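For anyone who wants to script that workaround, here is a minimal sketch of building the Google query. The function name and the optional `subreddit` parameter are my own for illustration; the only things taken from the comment above are the `site:www.reddit.com` operator and the idea of sending the query to Google. Scoping by path (`/r/<name>`) is an assumption about how the `site:` operator behaves and may not always narrow results the way you expect.

```python
from typing import Optional
from urllib.parse import urlencode


def reddit_site_search_url(query: str, subreddit: Optional[str] = None) -> str:
    """Build a Google search URL restricted to reddit.com via the site: operator."""
    scope = "site:www.reddit.com"
    if subreddit:
        # Hypothetical convenience: narrow to one community by adding its path.
        scope += f"/r/{subreddit}"
    # urlencode percent-escapes the colon and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": f"{scope} {query}"})


if __name__ == "__main__":
    print(reddit_site_search_url("mechanical keyboard reviews"))
```

This only builds the URL; it does not fetch or parse results, so whatever indexing gaps Google has for Reddit still apply.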
Interesting - I did not even know that old.reddit used a different search. I rarely even use the search (occasionally I will use it to search a specific subreddit and assume it is only searching post titles). I pretty much exclusively use old.reddit + customized reddit enhancement suite and on mobile use the app Reddit is Fun. Anytime I have to go back to vanilla new reddit it feels like a totally different site. I'm not even really changing the default subscriptions all that much either at all - often I browse without being signed in. The part I find the most odd is that even without signing in the "hot" algo seems to organize top posts occasionally differently between the two (old and new).
LOL This is ridiculous. The majority of Reddit is Pre-2020 !
Or as I like to call it - The Before Covid times. BC. Surely there's no other acronym that uses those two letters. 2018 BC was a good year for me personally
Reddit's engagement model relies on new posts to comment on. Being able to easily find previous content means less reposting, and fewer opportunities to give users a place to engage. A standard, working search function would only hurt reddit's growth metrics.
It's so obvious to the userbase that you have certain users complaining about reposts, or even subreddits that intentionally flag for original content (OC).
Any data to suggest that the past 2 years have generated the majority of reddit's data? I've been a user for a decade and it doesn't feel like this is the case to me.
I don't understand why search is something that Reddit needs to have. Google does the job perfectly fine. Search isn't a core feature for Reddit and it's certainly not "basic" functionality.
Google does an okay job and certainly is the best option for searching Reddit that I'm aware of, but there are many cases where Google breaks (largely due to Reddit's site). It's common to get results that don't match anything in the content itself, but match a title in the "more from this subreddit" section at the time Google scraped it. Dates are often wrong so adding date filters often doesn't work as expected.
Even old Reddit has the "more from this subreddit" type links now so I imagine it breaks even if you scope your query to the old domain. I noticed most of this appeared after they launched the new Reddit UI, I don't remember having this problem in the past. Note that these are problems they could likely fix so I agree, but any platform can likely build a more context aware search as well.
Disclaimer: I work at Google but don't work on search, opinions are my own, blah blah
People have complained about the poor quality of Reddit search for as long as it has existed.
Aside from that, certainly, a site with its own indexing and awareness of its own internal structure can provide a better search experience than a third-party.
> I don't understand why search is something that Reddit needs to have. Google does the job perfectly fine.
No, it absolutely does not. I have used Google to search Reddit for a string and got no results. That string was present in a Reddit comment open in another tab. It was a few years old so there is very little chance it hadn't been indexed yet.
Good search is really a must for any forum though - I remember back in the day when more forums were independent (pre Reddit) and less moderated a common thing to tell a new user that made a very common (repeated) post was to use the search function. Some people were dicks about it but often people just were gently steering the person in the right direction. No need to have the same exact post every week clogging things up when a simple search can lead you to high quality discussion content on the same exact topic.
Increasing impressions/time-on-site increases ad inventory and also juices all the other engagement numbers.
This is a good move by Reddit (whose search has always been abominable). Many HN'ers have pointed out that they add "reddit" to their Google searches, ex: "<product X> reviews reddit". Why let Google get the ad dollars for that SERP?
because reddit is a for-profit business and if it can wrestle search from Google obviously it will. In a broader sense the internet is moving away from its protocol nature to vertically integrated firms and you can expect this fragmentation of search in many places.
Comments are literally the whole point of forums - what would the point be if people did NOT want to read the comments?
I agree there are a lot of crap comments on the bigger default subs (as is the same with any forum) but those are relatively easy to ignore and recognize with experience. Or maybe I just have been using internet forums for too long !
That isn’t a problem faced by the vast majority of Reddit users. What I posted is a workaround that has been available since Reddit was founded and would be useful for the average user to know.
They keep raising money. Might as well hire more and pay better. I'm blown away that after nearly 2 decades, search is effectively unfunded at this company.
Reddit search has always been laughably bad - it is quite amazing that for the biggest and most popular forum on the internet they have not worked harder on this. Reddit should have the BEST search on the internet.
Or hell - can they not just pay Google and embed a customized version into their site?
"Better question, and this is something I see time and again, how do sites missing such basic functionality even get any funding to start with? "
By abandoning ethics and building a moat so that those with ethics can't breach the moat.
"In the early days, reddit's community was built up thanks to hundreds of fake profiles created by the site's co-founders, according to Steve Huffman (coincidentally, a reddit co-founder). To make the site look populated and diverse, Huffman and Alexis Ohanian, the other founder, would submit links of their own choosing, each time under a new username."
Sometime around the night of the DNC primary in 2015 there was a dramatic shift in the content on Reddit. If you were a frequent visitor of the homepage and /all at the time, you will know exactly what I am referring to.
The product has gone downhill ever since.
Comment search is nice because it enables users to find interesting content directly in the good niche subreddits.
As someone who has been using reddit and frequenting r/all for over 10 years or so, I don't know what you are talking about. Either make your point, or just don't make such comments. They add nothing.
I think one could argue that your comment adds much less to the conversation. It's basically just "you're wrong, because I said so."
Maybe - since you have a decade of material to draw from - you could offer a counter-example? I think that would be more conducive to discussion than "don't make such comments."
It's far more than that. The OP said "you know exactly what" - and the comment disproved that, by not "knowing exactly what".
Also, how is the comment supposed to refute a negative? A negative that wasn't even a claim to begin with but rather a vague gesture towards some conspiracy (?).
And I am refuting the refutation by "knowing exactly what". Turtles all the way down.
Do you see how silly this is yet? The comment you are defending is - logically - about as valid as not saying anything at all, which is exactly what they told the OP to do.
With a specific claim demonstrating a contrary phenomenon, one would think. "Vague claims" aren't some rhetorical ace-in-the-hole. They can be argued against, presumably better than "you're wrong, and you should feel bad."
I have been browsing /r/all for years, but I aggressively filter subreddits out that hold no interest for me. It allows /r/all to be semi-curated content for me, while I still will see something that breaks containment from the niche subreddits.
I don't feel that this makes reddit a great news source or anything - but it does make it palatable entertainment.
I do this same thing, it has worked really well. I browse the all subreddit and aggressively block/filter subreddits that are of no interest, then smaller subreddits start bubbling up and all becomes relatively good for finding new content actually. Blocking the main default subreddits and subreddits where they just keep posting memes and all subreddits that have anything to do with gaming or new games gets you like 80% of the way there. It's nice seeing small subreddits on all after that curation.
I've also been on reddit for more than 10 years and it's also never happened to me, but I know what they're talking about. Let's not pretend like it doesn't happen.
I have seen it happen through complaints reaching the all subreddit. However it only seemed to happen to subreddits that were peddling deep hatred against outgroups like people of color, overweight people, or were just vile. Your comment makes it seem like any given subreddit is in danger of quarantining, but anecdotally that doesn't seem to be the case. However, I have an open mind: would you give me an example of a subreddit that was otherwise innocuous yet got quarantined?
One of my favorite ways of discovering niche content has been browsing /r/all/gilded and taking a look at the variety of highlights throughout the site that never make it to the front page or other kinds of publicity.
I dug in there one time and found a fascinating obituary for a user on a subreddit for people living with addiction -- full of some touchingly personal and weirdly public stories. Following the user's profile from there taught me about the wild world of the "opiate roll call" subreddit (a thinly veiled Craigslist for drugs) which was at the time flourishing. Fascinating stuff.
These days, now that awards are free, that's just a list of popular things, not a list of cool things. The major subreddits can have a selfie posted that gets a dozen awards.
This has been a theme on reddit since comments were introduced.
The largest shift in the site was something 12-14 years ago when they redesigned the site to have thumbnails.
This brought people to the site for images instead of content. That was absolutely the original torpedo.
It made the site much more popular, but the quality tanked fast, and has continued since due to having to cater to the lowest common denominator of millions of people.
The biggest change came after the migration from Digg, which would have been about 5 years prior.
the_donald brought a lot of younger users and white supremacists. I haven't seen a DNC-related reddit conspiracy before though. Certainly no noticeable shift that night.
The digg exodus was nothing, just a bunch of group A basement dwellers moving in with group B basement dwellers. At the time the average person still had no idea what reddit or digg was.
The whole thing was just internal internet geek culture drama.
No, it changed reddit from having interesting content and mostly useful constructive commentary to being full of memes and jokes (like Digg). Sure there were pun threads before, but you didn't have to custom tailor the front page to keep it from being complete shit.
I can only recommend aggressively blocking popular subs, ones that are managed by the big mods, and any that have even one political thread that gets a suspicious amount of votes. All actually becomes a source of new interesting content that way.
I tried this but quickly hit a limit on the number of subs you can block. It is fairly low, maybe a couple hundred. There are a lot more big, low quality subs than that.
I wonder if your comment is an instance of the Eternal September effect [0]. The "quality" of a website is such an intangible and immeasurable thing, it's more or less impossible to say, in aggregate, if something has objectively gotten worse, especially considering the fact that Reddit has grown exponentially more popular.
At the very least, there's a distinct lack of self awareness of how small your one opinion is, to the sea of overall opinions. "Has gotten worse" vs "I've grown to like it less" are very different statements; one externalizes the decline, eschews your role and responsibility for your own opinion, and the other owns up to the reality that something may be worse "for you" while getting better for many, many others.
It was really just one trick, which was to immediately sticky select new posts. Those posts would garner large amounts of upvotes immediately by virtue of being stickied at the top of a subreddit where most users did not browse reddit overall but rather just browsed the sub. Reddit's algorithm favors posts that really take off with lots of votes right away, so as a consequence The_Donald's posts were constantly making r/all.
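To make the "takes off right away" part concrete, here is a sketch of the "hot" score as it appeared in the code Reddit used to publish as open source. Whether the production ranking still looks like this is an assumption on my part; the constants below come from that historical code, and the stickying behavior itself is not modeled.

```python
from datetime import datetime, timezone
from math import log10

# Offset used by the historical open-source code (a timestamp in December 2005).
REDDIT_EPOCH_SECONDS = 1134028003


def hot(ups: int, downs: int, posted: datetime) -> float:
    """Historical 'hot' score: log-scaled net votes plus a bonus for newer posts."""
    score = ups - downs
    # Votes count logarithmically: the first 10 net votes are worth as much
    # as the next 90, so an early burst moves a post a long way.
    order = log10(max(abs(score), 1))
    sign = 1 if score > 0 else -1 if score < 0 else 0
    # The time term uses submission time, so a newer post starts out ahead.
    seconds = posted.timestamp() - REDDIT_EPOCH_SECONDS
    return round(sign * order + seconds / 45000, 7)


if __name__ == "__main__":
    noon = datetime(2022, 4, 1, 12, 0, tzinfo=timezone.utc)
    print(hot(100, 0, noon), hot(1000, 0, noon))
```

In this formula a tenfold increase in net votes is worth exactly 45000 seconds (12.5 hours) of recency, which is why a fresh post that collects a big pile of votes in its first minutes, for example from subscribers who see it pinned, can outrank almost everything older.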
> It was really just one trick, which was to immediately sticky select new posts.
and (imo) if they were really wanting to fix this trick they would have just made it so something that was ever stickied couldn't show up on /r/all. Instead they did what (to me) was basically a complete change of their algorithm with very slow changes to placement.
I can’t speak to that timeline but I’ll add that I have a use-hate relationship with Reddit. It’s a source of great information and misinformation, caring and callousness.
I don't think it was related to that at all - I think it was just around that time that Reddit also started to really become more mainstream popular. Anytime a forum or sub forum grows in size you end up with a lot more crap to sift through. And with greater number of users also of course comes greater numbers of bots / fake / shill accounts as well.
>Sometime around the night of the DNC primary in 2015 there was a dramatic shift in the content on Reddit. If you were a frequent visitor of the homepage and /all at the time, you will know exactly what I am referring to.
Indeed.
It's especially visible when the effect temporarily abates. After Trump's 2016 election win, for a day or two, it was possible to post something non-leftist to /r/politics without having it be mercilessly downvoted; as if the shills were awaiting orders.
I don't know if that's exactly the claim they are making, but I can certainly attest to their timeline. It was a day-night shift from "Bernie or Bust" to "Eww Bernie Bros are gross". Putting my own politics aside, I was more shocked by the sudden, very drastic shift in narrative. It's really hard to believe that it was organic. It wasn't even a "maybe we should support Clinton now that Bernie is out" narrative, which would have been more believable. It was a "Bernie supporters are sexist" narrative.
In the months leading up to the DNC primary, reddit was a battleground between Correct the Record[1] and /r/The_Donald. After the primary, it felt like every post on left-wing subs was inline with the narratives that CTR had been pushing for months. Before the primary, there was quite a bit of resistance to CTR, even from the left-wingers. That evaporated over a single night.
> Right now, we internally blacklist the account so that the data is not exposed via any public API. For full disclosure, we currently do not permanently delete any data unless there is a major issue involving PII, etc. While you have the right to request that people cannot search your comments and submissions via the public API, we reserve the right to keep data in our private archive so long as we never allow any data that you requested be removed get exposed through any public API endpoints.
Your deleted comments even after requesting to be removed from pushshift can still be found in the archives e.g. what https://camas.github.io/reddit-search/ uses
> we reserve the right to keep data in our private archive so long as we never allow any data that you requested be removed get exposed through any public API endpoints.
This is why I'm really happy to have GDPR. If this were a European company someone would eventually tear them a new one.
"Reserve the right" is a cheeky choice of words for keeping data that isn't theirs against people's consent.
> "Reserve the right" is a cheeky choice of words for keeping data that isn't theirs against people's consent.
How is it "your data" if you use their property (reddit.com) to post content on their website? IP and other PII I can understand, as reddit gets that no matter what interaction you do with the website, it makes sense to protect that. But as soon as I make this comment on HN, I don't really expect to "own" this comment anymore, it now belongs to news.ycombinator.com.
When you make a post on Reddit, you give that site permission to publicly display and retain it. However, what is being talked about here is a third party scraping that post and keeping it without your permission; the permission you granted was to Reddit, not to them.
If I post to Reddit, I'm giving Reddit permission to display that comment. I haven't read the T's and C's, but my expectation would be that I retain copyright over my own comments.
I could well be wrong (probably, even :), but my point is that giving permission to use my comments on one site shouldn't automatically give carte blanche to anyone and everyone to use them.
> But as soon as I make this comment on HN, I don't really expect to "own" this comment anymore, it now belongs to news.ycombinator.com
In Article 17, the GDPR outlines the specific circumstances under which the right to be forgotten applies. An individual has the right to have their personal data erased if:
The personal data is no longer necessary for the purpose an organization originally collected or processed it.
An organization is relying on an individual’s consent as the lawful basis for processing the data and that individual withdraws their consent.
An organization is relying on legitimate interests as its justification for processing an individual’s data, the individual objects to this processing, and there is no overriding legitimate interest for the organization to continue with the processing.
An organization is processing personal data for direct marketing purposes and the individual objects to this processing.
An organization processed an individual’s personal data unlawfully.
An organization must erase personal data in order to comply with a legal ruling or obligation.
An organization has processed a child’s personal data to offer their information society services.
However, an organization’s right to process someone’s data might override their right to be forgotten. Here are the reasons cited in the GDPR that trump the right to erasure:
The data is being used to exercise the right of freedom of expression and information.
The data is being used to comply with a legal ruling or obligation.
The data is being used to perform a task that is being carried out in the public interest or when exercising an organization’s official authority.
The data being processed is necessary for public health purposes and serves in the public interest.
The data being processed is necessary to perform preventative or occupational medicine. This only applies when the data is being processed by a health professional who is subject to a legal obligation of professional secrecy.
The data represents important information that serves the public interest, scientific research, historical research, or statistical purposes and where erasure of the data would be likely to impair or halt progress towards the achievement that was the goal of the processing.
The data is being used for the establishment of a legal defense or in the exercise of other legal claims.
I wonder how all of this works out when you're taking public comments from someone else's forum and re-hosting them. Someone who specializes in law or even is more familiar with GDPR and related laws might be able to comment in more detail. I'm just linking the source which attempts to clarify these things, hopefully those are useful.
Don't know what to tell you, I tried opting out a year ago [1] and they still have some of my deleted comments today [2]. Pushshift is a massive GDPR violation so I'm surprised they haven't gotten into any trouble yet.
One of the most annoying things is to open a thread and be forced to click all over the place to see comments rolled up under each other. You can't even search comments inside a single post until you have somehow unwound everything. That's why I use Sync for Reddit and will log out for good if they dump 3rd party clients.
Reddit without RES (also uBlock Origin) is reddit that's only 50-75% usable. Though I would also say old.reddit is infinitely superior to the newer reddit interface as well.
I agree - I mostly was using Reddit on mobile for years for this reason (using Reddit is Fun on Android) because I was too lazy to set up RES on desktop. Once I got a proper config set up + constant redirect to old.reddit.com (was already using uBlock), the site is a million times better even without logging in. And of course even if you make a throwaway username and do a little subreddit subscription customization that makes it even better.
Off topic: if anyone reading this is thinking of joining reddit, you can navigate it two ways without getting engaged (also enraged) with the scum, filth, and rage-bait that is r/all or r/popular.
1). Use old.reddit.com and create an account (it will bait you to enter your email, but it's not necessary to proceed). Go to the subreddits that you want to follow; there are hundreds of niche subreddits which are frankly quite awesome. Subscribe to them. Click on r/Home to get the feeds of the subscribed subreddits.
2). Read-only subreddits, privacy focused. Use libredd.it, go to niche hobby subreddits, and subscribe. Come back to the homepage.
There you go: the correct ways to navigate through the piles of garbage. Thank me later for your saved time and sanity.
Related is how both storage and processing is very cheap these days, and yet you don't see many ISPs providing an out of the box experience featuring a mini-NAS, Nextcloud server, e-mail server, website server (including a personal blog and a phpBB forum ?)... on their modem/router boxes !
An important thing is also that reddit still doesn't require a mail address. Just click continue on the signup form without filling in the mail and bypass the dark pattern.
I think a worthy addendum to (1.) is to review the 'default' subreddits that your new account is auto-subscribed to (unsure what the current list is, but probably stuff like /r/news, /r/funny, /r/videos...) and unsubscribe from the ones you don't/won't like.
Reddit of course has its issues but by working through and around them it's still an amazing site - not because of its features or politics but because of the discussions and communities i.e. users of the site. The default subreddits are generally awful though.
I use the RES extension on my PC and the Reddit Sync app on my phone and would recommend both.
For (1), a userscript like this does wonders. There's also a way to replace new-reddit links with the old-version, but it takes a bit of tinkering to get it to work on google.
> Or just log in and tick the box that says "always use old reddit" and then it will always use old reddit.
Not sure if it's just me, but that setting seems to frequently get reset, as I tick that box maybe once every week or two, but I keep getting sent to the new reddit at some point. Maybe I accidentally access new.reddit.com, which forces the new layout and then resets the setting?
In any case, I'm giving one of the new -> old redirection extensions a try now instead.
It's never got reset for me. I've set it once and it's stayed applied across all my desktop browsers. I'm also pretty sure I've visited new.reddit.com and it didn't reset the setting.
It doesn't apply to mobile but IMHO the new site is slightly less terrible on mobile than the old one anyways so that's my preference.
That doesn't work for certain cases - e.g., if you're browsing Reddit in an incognito window.
Say you want to browse for a gift for your SO and you're review-hunting, but you don't want it showing up in your sidebar as a recently viewed thread, or being in your history, or whatever - so you don't sign in. Userscripts/extensions force old.reddit.com regardless.
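For the curious, the rewrite rule those userscripts and extensions apply is tiny. A real one would be JavaScript running in the browser; the sketch below is Python purely to show the logic, and the function name and the set of hostnames treated as "new reddit" are my own assumptions rather than anything taken from a specific extension.

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts assumed to serve the redesigned interface (an illustrative list).
NEW_REDDIT_HOSTS = {"reddit.com", "www.reddit.com", "new.reddit.com"}


def to_old_reddit(url: str) -> str:
    """Point a reddit URL at old.reddit.com, keeping path, query and fragment."""
    parts = urlsplit(url)
    # hostname is lowercased by urlsplit, so the comparison is case-insensitive.
    if parts.hostname in NEW_REDDIT_HOSTS:
        parts = parts._replace(netloc="old.reddit.com")
    # Anything that is not a new-reddit host is returned unchanged.
    return urlunsplit(parts)


if __name__ == "__main__":
    print(to_old_reddit("https://www.reddit.com/r/python/comments/abc/?sort=new"))
```

Because the rule works on the URL alone, it does not care whether you are logged in, which is the whole point for incognito browsing.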
In particular the posts from /r/politics which hit the front page are often Q-anon tier; absurd misinformation and extremist rhetoric. But yes the smaller sub-reddits can be good.
You could use it to easily find any ongoing conversations on the site, about <topic you want to get into discourse about>, anywhere, and thereby be laser-focused on artificially butting into the discussions about that thing. If this became common practice, perhaps we will see people externally agreeing to proxies for certain words so they can discuss certain topics publicly on reddit without getting "jungled". Or maybe the feature isn't going to be used that much. I don't know.
You see this on Twitter sometimes -- people talk about "en eff tees", I think both to avoid people searching for the term and to be seen by people who have "NFT" muted.
There's also certain people with very vocal fanbases who do this. Sometimes I'm rather happy Twitter added the feature to restrict replies to your follows.
I feel horrible whenever I sense a need to block somebody. It's occasionally people I know who are too far gone into desperation and self-loathing/contempt to deal with anymore. The other much more rare case is people who 1) open with 2) insultingly irrational/inconsistent 3) vitriolic hostility. In that latter case, I can live with any two, but not all three. For a lot of people the criteria for 2) is pretty variable so I try to give some benefit of the doubt, which leaves ample opportunity for the disappointment to be real.
I've been pulled into the frontline of a culture war, so I've become pretty liberal with who I block, including using chainblocking on people who encourage harassment of others.
It occasionally hits the wrong people, but it's the only way to make twitter usable for me. It's not my failing.
It's of course all against the twitter rules, but twitter moderation is horrible and these people have also learned to speak in code. It's not "kill yourself" anymore, it's "join the 41%".
"Kill yourself" or some proxy, that's an objectively evil gutless thing to say. I've had people say some very insulting things to me in recent memory, but the last time someone said "kill yourself" to me she was a sad-sack in her mid-teens. Therefore I associate it with being extremely immature, depressed, and severely separated from reality. When I think that one of the people out there who has been mean to me might have already offed themselves, I have to ask myself "Who's out there for these people? Probably nobody". I wish I could gently pull people out of the attitude, but I'd probably come across like some wannabe camp counsellor.
I wonder to what extent this is an early step toward walling off Reddit's content from third-party search engines. They probably recognize that without a functional search they'd take a big hit in such a scenario, but with search under their own control they can better influence how people end up seeing various content.
That and/or building a better relevancy engine for search-based ads. It isn't the worst idea because they don't need to do horribly privacy invasive things to provide query based ads. But obviously they will because there are even better relevancy scores to be had with the other signals in a given user's account.
I have not actively used Reddit in over a year and am better for it. Their design is one of the most hostile I have seen in a website. I wish it was an area that allowed for competition because someone needs to give Reddit's head a shake.
I find that using the Old Reddit interface with RES on desktop, and Apollo on mobile, more or less eliminates the icky parts of the Reddit experience.
The catch is that both of those usage modes are pretty much at the mercy of Reddit Inc., which could decide to "deprecate" them at any time they see fit. And given that Reddit Inc. seems to be planning an IPO, I imagine that those will both be on the chopping block sooner rather than later.
They have been hostile to old reddit for a long time - many new features don't work with it like polls, free awards, inline pics, avatar, and so on. But I bet they won't outright kill it, since the portion of users who use old reddit is still very big and they also tend to be power users who are very vocal.
> They have been hostile to old reddit for a long time - many new features don't work with it like polls, free awards, inline pics, avatar, and so on.
I didn't want or appreciate some of those features anyway. Old Reddit with RES pretty much feels feature-complete to me. In some sense it's also refreshing to use a popular social media UI that isn't going through constant redesigns and tweaks.
I don’t use Reddit in the sense that I have an account, instead I go through teddit.net using a privacy redirect browser extension, this lets me bypass their redesign by using a different front end.
Over time I’m starting to dislike threaded comments. I kind of wish everything was a single thread with numbers and people could reply to specific numbered comments.
Then if you’d like you could just view said thread as a tree but it’s still a single threaded comment thread.
My problem with trees is that unless expanded and then flattened finding the newest relevant comments is more trouble than it’s worth.
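A rough sketch of what I mean, with made-up names (`Comment`, `reply_to`, `render_tree` are all hypothetical): the thread is stored as one flat, numbered list, each comment optionally points at the number it replies to, and the tree is just a derived view.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Comment:
    number: int                     # position in the flat thread, starting at 1
    text: str
    reply_to: Optional[int] = None  # number of the comment being answered, if any


def render_tree(comments: List[Comment]) -> List[str]:
    """Derive an indented tree view from the flat, chronologically ordered thread."""
    children: Dict[Optional[int], List[Comment]] = {}
    for comment in comments:
        children.setdefault(comment.reply_to, []).append(comment)

    lines: List[str] = []

    def walk(parent: Optional[int], depth: int) -> None:
        # Replies keep their flat (chronological) order within each parent.
        for comment in children.get(parent, []):
            lines.append("  " * depth + f"#{comment.number} {comment.text}")
            walk(comment.number, depth + 1)

    walk(None, 0)
    return lines


if __name__ == "__main__":
    thread = [
        Comment(1, "Search is finally here"),
        Comment(2, "Only post-2020 though", reply_to=1),
        Comment(3, "I still use Google"),
        Comment(4, "Same here", reply_to=1),
    ]
    print("\n".join(render_tree(thread)))
    print("newest:", thread[-1].number)
```

In the flat view the newest comment is always the last item, while in the tree view it can be buried anywhere, which is exactly the trade-off being argued about here.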
I have to say I strongly disagree with this take. I hate it every time that I have to go through a thread that’s just comment under comment. It’s so hard to keep up with multiple conversations going around.
I wonder if a threaded discussion where the parent being responded to is always visible as you scroll would be a workable and usable interface? I get turned around sometimes if there are dozens of replies to a comment in a threaded discussion, especially when I get to the grandchild or great grandchild comment level.
It would be analogous to when people quote entire paragraphs in a forum post to add their 1 sentence reply (which always bothered me for reasons I've never come to terms with). Once I found reddit, I really appreciated the threaded discussion.
That’s a fair take. I suppose it depends on what you prioritize. Even in this post it’s difficult to see the latest comment unless you read and expand all of them.
After a few hundred I personally don’t bother. A great example is the Musk Twitter thread. Good luck finding the globally newest comment.
It's probably hard for me to relate to your problem, because I'm literally never interested in the most recent comments. What is your "use case" for reading the newest comments?
I mean, the same reason why you'd want the newest of anything. Stuff on the top gets out of date. Even in the context of a comment thread, I'd want the latest comment in the thread, etc.
Not to mention with threads you waste time reading things over and over again.
This is awesome. While comments are generally lacking in info (but still entertaining), there are always gems in there. It’s a bit like finding a needle in a haystack. I think there’s even a subreddit dedicated to this. I’d love to see ML/DL deployed to make it easier to find such comments.
you mean... like if people could vote on the best comments, and you could sort by the votes? The issue is that figuring out what is actually useful is intractable and subjective.
No, it’s a bit more complicated than that because the comments are threaded. It could take a few passes to find a gem comment. Also, the comments evolve over time, meaning you might not find anything at the time of your browsing.
That's sort of how I feel about comment trees. ~100-200 comments is the safe max for me, after that, sufficient garbage is present to degrade readability. It's subreddit/topic/ToD dependent, since some are difficult to follow from the beginning.
I wish I could set multiple sortation options at the same time. So I might see the top 'best' post, followed by the top 'controversial' post, followed by the 'newest' post. Then the second best of each.. etc..
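A rough sketch of that interleaving, assuming each comment is a plain dict with hypothetical `id`, `ups`, `downs`, and `posted` fields. The "controversial" score here is just `min(ups, downs)`, which is a stand-in I made up and not Reddit's actual formula. The rankings are merged round-robin, and a comment that already appeared is skipped.

```python
from itertools import zip_longest


def interleave(*rankings):
    """Merge ranked lists round-robin, keeping the first appearance of each item."""
    seen = set()
    merged = []
    for tier in zip_longest(*rankings):
        for item in tier:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


def multi_sort(comments):
    """Top 'best', then top 'controversial', then 'newest', then the second of each."""
    best = sorted(comments, key=lambda c: c["ups"] - c["downs"], reverse=True)
    # Assumption: treat a comment as controversial when both vote counts are high.
    controversial = sorted(comments, key=lambda c: min(c["ups"], c["downs"]), reverse=True)
    newest = sorted(comments, key=lambda c: c["posted"], reverse=True)

    def ids(ranking):
        return [c["id"] for c in ranking]

    return interleave(ids(best), ids(controversial), ids(newest))


# Hypothetical sample data.
comments = [
    {"id": "a", "ups": 50, "downs": 2, "posted": 1},
    {"id": "b", "ups": 30, "downs": 28, "posted": 2},
    {"id": "c", "ups": 5, "downs": 0, "posted": 3},
    {"id": "d", "ups": 20, "downs": 10, "posted": 4},
]
```

With this data `multi_sort(comments)` starts with the best comment (`a`), then the most controversial (`b`), then the newest (`d`), and only then moves on to the runners-up.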
As for threaded, I like how StackOverflow handles this. You have separate top level answers, but underneath that is flat comments that can be filtered a tad for quality or totally unwrapped. That system works very well.
This is an excellent move from Reddit. I've been appending Reddit to my Google searches for years. There is nothing better than testimonials from real humans within a focused interest community.
Haven't used Reddit in about 5 years, and my life has improved almost as significantly as it did a few years prior to that when I deleted Facebook. HN is the only place I have social interaction on the internet. Even while interacting with people I may adamantly disagree with, the tone of comments is generally more respectful, informative and less band-wagony than on any social media platform out there. Good for them for fixing their infamously broken search feature, but it isn't remotely enough to entice users back onto their platform.
Interesting. I wonder what they used for the search database. Since that enormous amount of text can't fit in RAM, it would have to be partitioned and sharded in something like ScyllaDB.
At one point, their search used LucidWorks Fusion [1], a commercial product based around Solr (that uses Lucene indexes under the hood) but also integrates a vector database and the like for semantic search. The linked wiki page still has Lucene-style queries.
IIRC they have a pretty standard Postgres setup. I'd bet they just set up another Postgres shard replicated just for search, using an extension for the index. That doesn't require the index or working set to be in RAM.
I doubt that a succinct full text index (like an FM-index) of this data would require more than a modest server to keep in memory. Why aren't these used in this context?
Succinct full text indexes can be substantially smaller than the source text. It depends on the zero order entropy of the text. If things are highly repetitive, a very small index might be feasible. Usually lookup times are linearly proportional to query size, with logarithmic factors in database size.
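For anyone unfamiliar with the idea, here is a toy sketch of FM-index counting via backward search. It is deliberately uncompressed: the suffix array is built by naive sorting and the occurrence table is a plain list per character. A real succinct index would typically replace these with compressed rank structures such as wavelet trees, which is where the space savings and the extra logarithmic factors come from. All function names are my own.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-index: the C array and occurrence table over the BWT."""
    text += "\0"  # sentinel that sorts before every other character
    # Naive suffix array; fine for a demo, far too slow for real corpora.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)

    # C[c] = number of characters in the text strictly smaller than c.
    counts = Counter(text)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]

    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)

    return C, occ, len(bwt)


def count_occurrences(index, pattern):
    """Count matches of pattern with one backward-search step per character."""
    C, occ, n = index
    lo, hi = 0, n  # half-open range of matching rows in the suffix array
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index("abracadabra")
```

The loop in `count_occurrences` runs once per pattern character regardless of the text length, which matches the point above about lookup time being linear in query size.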
Reddit search has been notoriously horrible since its introduction. It couldn't even search for subreddits effectively. It could never search comments for content.
Hardly worth it. I once designed & maintained an archive of certain subreddits. Now nuked for various reasons.
The modding to craft a narrative and ban dissenting right think is so pervasive (and even the admins banning individual wrongthink comments) that what you see on the site doesn't represent what people posted.
How? The site is so slow and unresponsive. As for threads, it hand-picks like 3 comments at best; you have to click to see all and it still won't show all. How and why do people put up with it?
But how many people does that single percentage number represent? With hundreds of thousands of active accounts, losing 5% of users is not an insignificant amount of a user base.
I can't see anything replacing Reddit any time soon, though.
My biggest gripe is purchased aged accounts, awards, and coordinated manipulation of opinions, in r/cryptocurrency and r/wallstreetbets in particular.
There are clearly pump & dump scams happening here, but it's impossible to regulate and prove (VPN).
There will come a point where you just can't straight up trust what somebody says anymore on Reddit, unless there's some sort of consensus... but even that can be manipulated.
If you go to regional subreddits like r/japan or r/korea, you will notice it's not even natives jumping on like in r/singapore. It's mostly expats, and they are surprisingly prejudiced and biased. You would think people exploring other cultures would be open minded, but not in these circles. It's an echo chamber for whatever demographic it represents.
All in all, Reddit as a product and value is as good as 4chan. There's no defense or nuanced, balanced views with active moderation like you will find on HN.
Hackernews, with its current moderation, if it could possibly allow "subreddits", would be far more valuable for me. But lately I also see that HN is not immune to astroturfing and manipulation. It's not hard to see this: you have accounts that aren't active suddenly becoming activated in a particular thread (recent example: https://news.ycombinator.com/item?id=31015328). I suspect these are run by the same person, who is trying to "win" or feels they have something to lose.
tldr: Text based, anonymous discussion platforms are problematic and rife with manipulation and unreliable trust vectors, and I don't see any good solutions.