by setr 2078 days ago
>Sure, it might make sense in specific cases, but in general it doesn't make sense to transpose a table.

From my intuition, I don't see anything stopping the function from existing -- it can be applied to any arbitrary table. You probably don't want to transpose your table, except when you do, but that's true of any function -- I can't imagine any scenario where transpose(table)->table would fail as an algorithm (unless, I suppose, Julia tables include header rows, in which case there's probably no generally correct definition).


> From my intuition, I don't see anything stopping the function from existing

A datatable is (or is isomorphic to and can be analyzed as) a mapping from row numbers to tuples of a given shape[0].

The transpose of a table will only be a table (a mapping from row numbers to tuples of a common shape) if the tuples of the starting table were homogeneous (every field of the same type).

This works with, say, matrices where all the elements are numbers; but it fails in the general case.

[0] Yes, I know, Julia defines them as columns of arrays, and columnar organization is ideal for all kinds of processing tasks. For me, the problem with transpose is easier to think about and explain in row-oriented form (which is logically equivalent). In the column-oriented description, a datatable is an ordered set of columns, each of which is a homogeneous array; if you try to transpose it, each column of the result would be a heterogeneous array unless the columns were all of the same type to start with. So, again, it fails to be a table->table function except in the case where the starting table consists of columns of identical type.
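
To make the column-oriented argument concrete, here is a minimal Python sketch (not Julia, and not how any real table library is implemented). The helper names `is_homogeneous`, `is_table`, and `transpose` are made up for illustration, and "same type" is assumed to mean "same Python type":

```python
from datetime import date

def is_homogeneous(column):
    """True if every element of the column has the same Python type."""
    return len({type(x) for x in column}) <= 1

def is_table(columns):
    """Column-oriented check: equal-length columns, each one homogeneous."""
    same_length = len({len(c) for c in columns}) <= 1
    return same_length and all(is_homogeneous(c) for c in columns)

def transpose(columns):
    """Swap rows and columns; zip(*columns) groups the i-th elements together."""
    return [list(group) for group in zip(*columns)]

# Three columns of different types: a valid table.
mixed = [
    ["a", "b", "c"],
    [1, 2, 3],
    [date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 3)],
]

# Two columns of the same type: effectively a matrix.
numeric = [[1, 2, 3], [4, 5, 6]]

mixed_ok_after = is_table(transpose(mixed))      # columns are now heterogeneous
numeric_ok_after = is_table(transpose(numeric))  # columns are still all ints
```

Under these assumptions, only the all-numeric input is still a table after transposing.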

No, I'm still not understanding. Here's my thinking:

Scenario #1: If the tuples are the same shape (type, size), it's fine

    [
       (string, int, date)
       (string, int, date)
       (string, int, date)
    ]
transposed:

    [
       (string, string, string)
       (int, int, int)
       (date, date, date)
    ]
Both input and output are tables with consistent shapes (type, size)

Scenario #2: Assuming it's legal, if the tuples are differently shaped (by datatype), it's weird (but that was true of your original table anyway), but you can still do a valid transposition to produce a valid table

    [
       (string, int, date)
       (int, date, string)
    ]
transposed:

    [
       (string, int)
       (int, date)
       (date, string)
    ]
It was weird to begin with, and it's similarly weird to end with. I can't imagine the output not being a legal table by any rule that does not also disallow the input.

Scenario #3: Similarly to Scenario #2, if your tuples are differently shaped (by size), you can still do a transposition

    [
       (string, int)
       (string, int, date)
    ]
transposed:

    [
       (string, string)
       (int, int)
       (date)
    ]
and like Scenario #2, the output is as illegal as the input
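
A rough Python sketch of the ragged transposition in Scenario #3, assuming "transpose" here means "collect the i-th element of every row that has one". The name `ragged_transpose` is hypothetical; note that plain `zip(*rows)` would instead silently truncate to the shortest row:

```python
def ragged_transpose(rows):
    """Transpose rows of unequal length, skipping rows that lack position i."""
    width = max((len(r) for r in rows), default=0)
    return [tuple(r[i] for r in rows if i < len(r)) for i in range(width)]

rows = [
    ("a", 1),
    ("b", 2, "2020-01-01"),
]

ragged = ragged_transpose(rows)   # keeps the lone third field as its own row
truncated = list(zip(*rows))      # drops the third field entirely
```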

In the latter two cases, I don't know what you'd want to do with the transposition (or even its input), but I don't see anything stopping the operation itself from being reasonable/consistent/valid.

Is there another scenario I'm failing to imagine?

> Scenario #1: If the tuples are the same shape (type, size), it's fine

In the row-oriented view: each row of a datatable is a tuple of the same shape (size and order of types as every other row) -- just like a database table. So, if the shape is (string, int, date) for row #1, it's that shape for every row.

In the column-oriented view, each column is a homogeneous array: every element in the column has the same type.

> [ (string, int, date) (string, int, date) (string, int, date) ]

Sure, this is a fine starting table; in row-oriented, its shape (the shape of every row) is (string, int, date). In column-oriented view, the table as a whole can be viewed as a tuple of shape (string[3], int[3], date[3]) because it has three rows. Cool.

> transposed:

> [ (string, string, string) (int, int, int) (date, date, date) ]

Right, this is no longer a datatable. The first row has shape (string, string, string). So, if it's a table, the other two rows must also have shape (string, string, string); but instead, each has a different shape.
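
The same check in row-oriented form, as a small Python sketch. The `shape` helper is made up for illustration, and Python's `str`, `int`, and `date` stand in for the (string, int, date) fields above:

```python
from datetime import date

def shape(row):
    """The shape of a row: the tuple of its field type names, in order."""
    return tuple(type(x).__name__ for x in row)

rows = [
    ("a", 1, date(2020, 1, 1)),
    ("b", 2, date(2020, 1, 2)),
    ("c", 3, date(2020, 1, 3)),
]

transposed = list(zip(*rows))

# A table requires exactly one distinct shape across all rows.
row_shapes = {shape(r) for r in rows}
transposed_shapes = [shape(r) for r in transposed]
```

The original has a single shape shared by every row, while the transposed rows each have a shape of their own.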

Ah!

Ok, that makes sense.