	by joewrong 2444 days ago
	" ORDER BY tweet.time DESC " sign me up

9 comments

mc3 2444 days ago

Whether it is the lack of an ORM or lack of an "algorithm" that pleases you, this comment has made my day :-).

hombre_fatal 2444 days ago

I believe they're referring to the lack of algorithmic sorting.

Jaxan 2443 days ago

I’m not even sure what to think of the term “algorithmic sorting.” Is quicksort considered algorithmic sorting...?

hombre_fatal 2443 days ago

I'm referring to "algorithmic" in its negative usage, as shorthand for what you can see lately in Twitter, YouTube, and Facebook's suggestion engines, where more complex sorting and filtering is happening than a simple ORDER BY.

Sure, not the best shorthand, but it seems most people understood what I meant.

For example, https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

Like this submission: https://news.ycombinator.com/item?id=20184282 where people will shorten the "evil" to just "the algorithm".

redwall_hp 2443 days ago

That SQL statement is literally algorithmic sorting...

mc3 2441 days ago

Yes, I hate it when they manipulate the masses like that. They're so time-ist! Show me that old stuff!

npo9 2444 days ago

It’s the pure simplicity.

anonytrary 2443 days ago

Pure simplicity works well when you have few or no users. It's reasonable to think that

  ORDER BY tweet.time DESC

would result in an impractical experience for a site that has millions of DAU.

lvturner 2443 days ago

Only if you subscribe to everything - if you keep your follower list small, the individual experience doesn't change regardless of the overall size of the platform.

Nextgrid 2443 days ago

Why would it be? It'll naturally keep your list of followings small (as a large one will be unmanageable) and promote quality over quantity.

dblohm7 2443 days ago

Personally, I’m pleased by both!

jamroom 2444 days ago

I'm not a PG user, but there does not appear to be an index on tweet.time, and even if there were, wouldn't it be better to do ORDER BY tweet.id DESC? I assume ordering by primary key is going to be the fastest, and would (ideally) avoid having to maintain an index on tweet.time.

chaps 2444 days ago

Sure, but you're making some assumptions in doing so. For example, that the time and id columns will remain constant. It might be a decent assumption in the beginning, but once you start doing updates on the table, all bets are off.

WorldMaker 2444 days ago

Yup, as soon as you need to change ID schemes, you risk breaking query logic. Sure, things like v1 UUIDs and Snowflakes (and ULIDs and so forth) try to maintain temporal ordering in their ID formats, but what if you need v4 UUIDs for better clustering in some hash table, or SHA-256 hashes for some sort of content addressing scheme?

Also, it's just a very premature optimization in a world where CLUSTERED INDEXes exist. The database engine doesn't have to cluster by primary key; it can cluster directly by time (or any other index) if you ask it to. The power of doing it that way is that you can flexibly change it based on real performance issues (how do your execution plans look?) and characteristics (are you read heavy or write heavy? which ones are your bottlenecks?), whereas if your application makes assumptions about ID format, it's a lot harder to tune on the fly, because the queries all need to be rewritten.

ErikAugust 2444 days ago

I am not sure IDs are exactly incremental in Twitter at this point. I think they are issued by blocks or something of that nature. Someone here could answer that far better.

WorldMaker 2444 days ago

Twitter calls them "snowflakes". It's a 64-bit ID that consists of a tuple of sorts: a timestamp, a machine code, and a sequence ID. They start with the timestamp to mostly give them monotonicity in the direction of time (i.e., they sort in time order), but they definitely aren't simple sequence IDs.
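To make the layout described above concrete, here is a minimal Python sketch of packing a timestamp, machine code, and sequence number into one 64-bit integer. The bit widths (41/10/12) and the names `make_id` and `split_id` are assumptions for illustration, not something stated in the thread, and this is not Twitter's actual code.

```python
from typing import Tuple

# Assumed layout (not given in the thread): 41-bit millisecond timestamp,
# 10-bit machine code, 12-bit per-machine sequence number.
TIMESTAMP_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 10, 12


def make_id(timestamp_ms: int, machine: int, sequence: int) -> int:
    """Pack the three fields into one integer, timestamp in the high bits."""
    if not 0 <= timestamp_ms < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= machine < (1 << MACHINE_BITS):
        raise ValueError("machine code out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (
        (timestamp_ms << (MACHINE_BITS + SEQUENCE_BITS))
        | (machine << SEQUENCE_BITS)
        | sequence
    )


def split_id(snowflake: int) -> Tuple[int, int, int]:
    """Unpack an ID back into (timestamp_ms, machine, sequence)."""
    sequence = snowflake & ((1 << SEQUENCE_BITS) - 1)
    machine = (snowflake >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    timestamp_ms = snowflake >> (MACHINE_BITS + SEQUENCE_BITS)
    return timestamp_ms, machine, sequence


# IDs from different milliseconds sort in time order, whatever the machine.
earlier = make_id(1_000, machine=7, sequence=0)
later = make_id(1_001, machine=3, sequence=0)
# Within the same millisecond, order is decided by machine code and sequence,
# not by true creation time, which is why the ordering is only "mostly" monotonic.
same_ms = make_id(1_001, machine=1, sequence=5)

print(earlier < later, same_ms < later, split_id(later))
```

Because the timestamp occupies the high bits, ORDER BY on such an ID approximates ORDER BY time, but only for as long as the ID scheme stays this way.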

axaxs 2444 days ago

Assuming that every tweet.time is unique, what benefit would you gain from indexing it?

mc3 2443 days ago

With the query given, the optimiser can immediately figure out to get the latest record from the index and scan back through the index. If the index has included columns, it could read the data straight off the index. Without the index you need to scan the entire table, sort it in memory, and then read off the top rows. If you were doing a top X query, it would be markedly faster because it fetches less data from disk. I think that query is getting all the records, but it will still be quicker by avoiding the in-memory sort.
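The difference described above can be observed in a query plan. The sketch below uses SQLite through Python's `sqlite3` module as a stand-in, since the thread is about PostgreSQL and its planner output differs. The `tweet` schema and the index name `idx_tweet_time` are hypothetical.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
# Hypothetical schema; the project's real table is not shown in the thread.
conn.execute("CREATE TABLE tweet (id INTEGER PRIMARY KEY, time INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO tweet (time, body) VALUES (?, ?)",
    [(t, f"tweet {t}") for t in range(1000)],
)

sql = "SELECT * FROM tweet ORDER BY tweet.time DESC"

# Without an index the plan includes a separate sorting step.
plan_without_index = query_plan(conn, sql)

# With an index on time, rows can be read in index order and no sort is needed.
conn.execute("CREATE INDEX idx_tweet_time ON tweet (time)")
plan_with_index = query_plan(conn, sql)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

In SQLite the first plan reports a temporary B-tree for the ORDER BY, and the second reports a scan using the index instead. On PostgreSQL the equivalent check would be done with EXPLAIN, and the planner may choose differently depending on table statistics.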

axaxs 2443 days ago

Interesting, thanks for the insight. I haven't touched DB setup in many years, and even then I was a novice. The best person I knew told me to index if a column wasn't unique, but also wasn't something with only two or three choices. Sounds like I have more reading to do...

mc3 2441 days ago

I'm certainly not an expert - I did a great DB course about 15 years ago and then used the skills every now and then since. I might not be up on the latest. And I am more of a SQL Server person. BUT... the main thing I learned is to view actual execution plans and see what is actually happening before adding indexes (unless it's an 'obvious' index).

> The best person I knew told me to index if a column wasn't unique, but also wasn't something with only two or three choices.

Yeah, I think this advice is too broad, and you need to understand what you want to achieve. Mentally, choosing indexes is like choosing whether to use a hashtable vs. a for loop vs. a binary tree etc. in an algorithm in code. There is no golden rule or "always use a hashtable if there are 100 entries" type of thing. You just need to figure it out on a case-by-case basis. And usually there are only 2-3 tables in your DB worth a lot of effort in figuring it out!

vntok 2444 days ago

Why would it be unique?

axaxs 2444 days ago

So, in fairness I didn't dig through the code to see the column, but assumed some granularity like milliseconds? It was a genuine question on whether it would speed up the query at all if indexed since it is generally grabbing everything then sorting.

spullara 2444 days ago

Indexes generally give you sorting for free so you don't have to look at everything.

gigatexal 2444 days ago

In general if it is something you will query in a where clause it should be indexed.

WA 2443 days ago

Just create a private list on Twitter, put all people you follow in there. Bam, done. Reverse-chronological order, no ads.

nkozyra 2444 days ago

I'll never understand why the stream cannot just default to logic like this or [my follows]+[non-rt/replies]+[desc by time]

Best I can come up with is it'd make promoted content more apparent?
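The "[my follows]+[non-rt/replies]+[desc by time]" logic above can be sketched as a single query. This is only an illustration using SQLite from Python; the `follow` and `tweet` tables, their columns, and the `timeline` function are all hypothetical and not taken from any real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, invented for this example.
conn.executescript(
    """
    CREATE TABLE follow (follower TEXT, followee TEXT);
    CREATE TABLE tweet (
        id INTEGER PRIMARY KEY,
        author TEXT,
        time INTEGER,
        body TEXT,
        retweet_of INTEGER,  -- NULL unless this row is a retweet
        reply_to INTEGER     -- NULL unless this row is a reply
    );
    """
)
conn.executemany("INSERT INTO follow VALUES (?, ?)", [("me", "alice"), ("me", "bob")])
conn.executemany(
    "INSERT INTO tweet (id, author, time, body, retweet_of, reply_to) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "alice", 100, "original from alice", None, None),
        (2, "bob", 200, "original from bob", None, None),
        (3, "alice", 300, "alice retweets bob", 2, None),
        (4, "bob", 400, "bob replies to alice", None, 1),
        (5, "carol", 500, "author not followed", None, None),
    ],
)


def timeline(conn, user):
    """Original tweets from accounts `user` follows, newest first."""
    rows = conn.execute(
        """
        SELECT tweet.id, tweet.author, tweet.body
        FROM tweet
        JOIN follow ON follow.followee = tweet.author
        WHERE follow.follower = ?
          AND tweet.retweet_of IS NULL
          AND tweet.reply_to IS NULL
        ORDER BY tweet.time DESC
        """,
        (user,),
    )
    return rows.fetchall()


print(timeline(conn, "me"))
```

Only the two original tweets from followed accounts come back, with the newer one first. The retweet, the reply, and the tweet from an unfollowed author are filtered out.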

hombre_fatal 2444 days ago

There are good faith arguments for applying smarter sorting / filtering to the stream.

For example, if someone follows two people and one person simply tweets at 1000x the rate of the other, I would argue that it would be better UX if the rare tweet from the other person gets more weight.

Another thing Twitter does is show you popular things that have been liked by the people you follow. It might be way too much to show you everything that every person you follow has liked every time. But it can introduce some interesting content to your stream to see such second-tier content.

I don't see how promoted content plays into it since you can interpolate it anywhere.
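As a rough illustration of the "rare tweet gets more weight" idea above, the sketch below ranks tweets from less prolific authors ahead of those from prolific ones, then by recency. This is a deliberately crude, hypothetical heuristic (the function name and tuple layout are invented here), not how Twitter ranks anything.

```python
from collections import Counter


def rank_by_author_rarity(tweets):
    """Order (author, time, body) tuples so quiet authors surface first.

    Hypothetical heuristic: sort by how many tweets the author has in this
    batch (fewer first), then by time descending within each group.
    """
    per_author = Counter(author for author, _, _ in tweets)
    return sorted(tweets, key=lambda t: (per_author[t[0]], -t[1]))


tweets = [
    ("chatty", 10, "first of many"),
    ("chatty", 20, "second of many"),
    ("chatty", 30, "third of many"),
    ("quiet", 5, "rare post"),
]

ranked = rank_by_author_rarity(tweets)
for author, time, body in ranked:
    print(author, time, body)
```

With a plain ORDER BY time DESC, the "quiet" tweet would be last. Here it comes first, and the "chatty" tweets keep their reverse-chronological order among themselves.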

MagnumPIG 2444 days ago

That's exactly the reason. Ads can much more easily be camouflaged as real content this way.

WilTimSon 2444 days ago

If only they weren't inane. I think Twitter is probably the worst in terms of 'promoted content' because it always seems to serve up the same ad several days in a row and it's always something surprisingly off-topic. I'd be browsing my feed full of political news, movie trailers, and animal photos... and get an ad for ketchup. How on Earth could I confuse that for a real tweet or something my friends would retweet? A ketchup ad? Mental.

hombre_fatal 2444 days ago

This doesn't make much sense to me. You don't need complicated algorithmic sorting and filtering to inject an ad space after every fifth tweet. Even the client can do that.
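The point above, that ad placement needs no ranking logic at all, can be shown in a few lines. This is a minimal sketch; `with_ad_slots` is a hypothetical helper, not part of any real client.

```python
def with_ad_slots(tweets, ads, every=5):
    """Return a new list with one ad inserted after every `every` tweets.

    The timeline order is left untouched; once `ads` runs out, no further
    slots are filled.
    """
    result = []
    remaining_ads = iter(ads)
    for position, tweet in enumerate(tweets, start=1):
        result.append(tweet)
        if position % every == 0:
            ad = next(remaining_ads, None)
            if ad is not None:
                result.append(ad)
    return result


timeline = [f"tweet {n}" for n in range(1, 11)]
print(with_ad_slots(timeline, ["AD 1", "AD 2"]))
```

The input can be a strictly chronological list, and the ads land after the fifth and tenth tweets regardless of how that list was sorted.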

megous 2444 days ago

Client does that. If you consume twitter via their API, you'll notice there are no ads there.

hirundo 2444 days ago

You see many more outlier tweets with thousands of likes when your feed is interspersed with "most liked by follows and follows of follows", as opposed to just "tweeted by follows". I think that improves the signal-to-noise ratio of the feed.

But yeah, I wish a simple timeline was the default, and twitter stopped switching away from it.

mipmap04 2444 days ago

For anyone interested in this, google “hacker news gravity algorithm” to see an interesting example of how sorting was done here at one time (as I understand it the algorithm has evolved).
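For reference, the commonly cited form of that "gravity" formula divides a story's points by a power of its age. The sketch below assumes that widely circulated version, (points - 1) / (age_hours + 2) ** 1.8. The formula is not given in this thread, and as the comment notes, the ranking actually in use has since evolved.

```python
def gravity_score(points: int, age_hours: float, gravity: float = 1.8) -> float:
    """Score a story so that age steadily pulls it down the page.

    Assumed formula (commonly cited, not confirmed by this thread):
    (points - 1) / (age_hours + 2) ** gravity
    """
    return (points - 1) / (age_hours + 2) ** gravity


# Hypothetical stories as (title, points, age in hours).
stories = [
    ("old but popular", 100, 48.0),
    ("fresh and modest", 10, 1.0),
    ("brand new, no votes yet", 1, 0.0),
]

ranked = sorted(stories, key=lambda s: gravity_score(s[1], s[2]), reverse=True)
for title, points, age in ranked:
    print(f"{gravity_score(points, age):.4f}  {title}")
```

Unlike ORDER BY time, this score depends on the current time for every row, so it has to be recomputed as stories age.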

mipmap04 2444 days ago

I was working on creating an RSS reader with a reddit-like interface that allowed follows, voting, comments, etc. The computational expense of a fancy sort algorithm based on factors other than just one non-computed column is immense once you get to a decent number of users. I ended up removing the algo just to lower my hosting bills. Of course, there were other potential solutions, but for a side project, this was easiest.

kgthegreat 2443 days ago

Feel free to take this for a spin - http://trysensible.com/

It has that and more!

Open source - https://github.com/kgthegreat/sensible

robobro 2443 days ago

Pleroma, part of the fediverse (3,500,000+ users), has exactly this.

singron 2443 days ago

Is "Latest" sorting in the twitter UI not this?

sam_lowry_ 2444 days ago

Fenix seems to have that.
