Hacker News new | ask | show | jobs
by philh 1333 days ago
SQL queries sometimes put floats in GROUP BY. E.g. if you have a many-to-one relationship you might do a query like

    SELECT foo_id, foo.some_float, SUM(bar.some_thing)
    FROM foo JOIN bar USING (foo_id)
    GROUP BY foo_id, foo.some_float
I feel kinda dirty whenever I do this.

Though, I would guess the optimizer (at least in postgres) is smart enough to ensure no float equality checks actually happen under-the-hood. They could be necessary, if the schema was different than I'm imagining; but maybe in that case, it would almost always be a bad idea.

2 comments

Why would you feel dirty? In this case it is solving for exact equality, ie the same bits, it doesn't matter that the value is a float.

Though I have seen some people using a double as a primary key (no idea why) and some database engine (internal, not major vendor) failing to do equality comparisons in certain statements, I suspect because they must be switching to "close enough" which is not what you expect when you write col1 = col2.

This is also really kind of an artifact of how GROUP BY works in most database engines.

I've always liked the way MySQL/MariaDB let you omit things from the GROUP BY if they're provably unique in each group (here, if foo_id is a primary key of foo, and you're grouping by it, there can only ever be one foo.some_float for each foo_id).

I suspect in practice this would get rid of approximately all occurrences of group-by-float.