Am I right in reading this as the parent comment envisioning a “post” table and a “tag” table, and you’re suggest the “post” table just have a “tag” column?
> Every post is tagged with a given category. There could be 100+ categories in the blog system and a blog post could be tagged with any number of these system categories.
My point is, I don’t see SQL query as expensive for this kind of use case. There are easy and native ways to do it.
In case you would like a top notch performance, Redis might be a way to do it. Even a reverse-index would achieve great performance.
That's a standard many-to-many relationship that would normally be implemented by three tables:
+---------+-------+---------+-------+-----------+
| post_id | title | content | other | fields... |
+--------+------+
| tag_id | name |
+------------+---------+--------+
| tagging_id | post_id | tag_id |
But it seems like the core of the request is still something like:
SELECT post_id, count(1) AS count
FROM taggings
WHERE tag_id IN (3, 8, 255)
GROUP BY post_id
ORDER BY count DESC
(off the top of my head; I haven't checked this for any kind of correctness)
And I don't see why that query suffers as you add tags...?
------------
EDIT responding to below [HN believes I am a problem user who should only be allowed to make so many comments per day]:
< that is pretty much what I meant by “I see just one table” as you don’t need any joins
Well, assuming you're doing this because a user is interacting with your site via some kind of web interface, you can set the interface up to deliver you tag_id values directly, but you'll still need to do a join with the posts table so you can present a list of posts back to the user instead of a list of internal post_id values.
So I guess
SELECT t.post_id, count(1) AS count, p.title, p.url
FROM taggings t JOIN posts p ON t.post_id = p.post_id
...
> Every post is tagged with a given category. There could be 100+ categories in the blog system and a blog post could be tagged with any number of these system categories.
My point is, I don’t see SQL query as expensive for this kind of use case. There are easy and native ways to do it.
In case you would like a top notch performance, Redis might be a way to do it. Even a reverse-index would achieve great performance.