HN Mirror

Hacker News

by somethingsome 330 days ago

I have many functions written by many scientists in a single piece of software over many years. Some expect one data format, others another, and it's not always the same function that gets called. All the functions could have been written against a single data format; however, each author chose the format based on the application at hand and on the speedup the chosen data structure gave their algorithm.

When I tried to refactor using types, this kind of problem became obvious and forced more conversions than intended.

So I'm really curious, because apart from rewriting everything, I don't see how to avoid this problem. For some applications data format 1 is more natural, and for others data format 2. Forcing one over the other would make the application slow.

The problem only arises in 'hybrid' pipelines, when a new scientist needs to use existing functions, some written for the first data format and others for the second.
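A minimal Go sketch of the hybrid-pipeline situation, under made-up assumptions: two hypothetical formats for the same list of 2D points (FormatA as a flat slice, FormatB as pairs), one function written against each, and an explicit conversion at each boundary. It only illustrates that every switch of format costs one conversion; the real formats and conversion costs in the codebase described above are unknown.

```go
package main

import "fmt"

// Two hypothetical formats for the same list of 2D points.
type FormatA struct{ Flat []float64 }     // x0, y0, x1, y1, ...
type FormatB struct{ Pairs [][2]float64 } // (x0, y0), (x1, y1), ...

// conversions counts boundary crossings; in real code each may be expensive.
var conversions int

func toB(a FormatA) FormatB {
	conversions++
	b := FormatB{Pairs: make([][2]float64, len(a.Flat)/2)}
	for i := range b.Pairs {
		b.Pairs[i] = [2]float64{a.Flat[2*i], a.Flat[2*i+1]}
	}
	return b
}

func toA(b FormatB) FormatA {
	conversions++
	a := FormatA{Flat: make([]float64, 0, 2*len(b.Pairs))}
	for _, p := range b.Pairs {
		a.Flat = append(a.Flat, p[0], p[1])
	}
	return a
}

// scaleA was written against FormatA; shiftX was written against FormatB.
func scaleA(a FormatA, k float64) FormatA {
	out := FormatA{Flat: make([]float64, len(a.Flat))}
	for i, v := range a.Flat {
		out.Flat[i] = v * k
	}
	return out
}

func shiftX(b FormatB, dx float64) FormatB {
	out := FormatB{Pairs: make([][2]float64, len(b.Pairs))}
	for i, p := range b.Pairs {
		out.Pairs[i] = [2]float64{p[0] + dx, p[1]}
	}
	return out
}

func main() {
	data := FormatA{Flat: []float64{1, 2, 3, 4}}
	// Hybrid pipeline A -> B -> A: two format switches, two conversions.
	result := scaleA(toA(shiftX(toB(scaleA(data, 2)), 1)), 10)
	fmt.Println(result.Flat, "conversions:", conversions)
}
```

Grouping same-format calls together reduces the number of switches, but as long as the pipeline mixes both groups of functions, some conversions remain.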

As a simple example, rotations can be written in many ways in a piece of software: some code uses matrix multiplication, some Euler angles, some quaternions, some geometric algebra. Which one works best depends on the application at hand, since each maps better onto the mental model of a particular application. For example, geometric algebra is much better for thinking about a problem, but sometimes Euler angles are what a physical sensor outputs. So some scientists will use the first and others the second. (Of course, these kinds of conversions are quite trivial and we don't care that much about them, but suppose each conversion is very expensive for one reason or another.)

I didn't find it a criticism :)


Mawr 330 days ago

If I understood the problem correctly, you should try computing each representation of the data once and reusing it. Something like:

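    // ID stores every representation of the same value, computed once up front.
    type ID struct {
        AsString   string
        AsInt      int
        AsWhatever Whatever // any other representation you need
    }

    // NewID computes all representations eagerly; calculateAsString,
    // calculateAsInt and calculateAsWhatever are placeholders for the
    // real conversion functions.
    func NewID() ID {
        return ID{
            AsString:   calculateAsString(),
            AsInt:      calculateAsInt(),
            AsWhatever: calculateAsWhatever(),
        }
    }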

This does assume every representation will always be used; if that's not the case, it's a matter of using some kind of generic run-once executor, like Go's sync.Once.
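A minimal sketch of the sync.Once variant, assuming Go 1.19+ (for atomic.Int32): each derived representation is built on first use and at most once, even under concurrent access, and representations nobody asks for are never built. The names are hypothetical, and the cheap strconv conversions stand in for expensive ones. Like the eager version, it assumes the primary value never changes after construction.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
	"sync/atomic"
)

// ID keeps one primary value and builds each derived representation
// lazily, at most once. Must be used through a pointer (contains sync.Once).
type ID struct {
	value int

	strOnce sync.Once
	str     string

	hexOnce sync.Once
	hex     string

	builds atomic.Int32 // how many representations were actually built
}

func NewID(v int) *ID { return &ID{value: v} }

// AsString builds the decimal form on first use; later calls, including
// concurrent ones, return the cached value.
func (id *ID) AsString() string {
	id.strOnce.Do(func() {
		id.builds.Add(1)
		id.str = strconv.Itoa(id.value)
	})
	return id.str
}

// AsHex is built only if some caller actually asks for it.
func (id *ID) AsHex() string {
	id.hexOnce.Do(func() {
		id.builds.Add(1)
		id.hex = strconv.FormatInt(int64(id.value), 16)
	})
	return id.hex
}

func main() {
	id := NewID(255)
	a, b := id.AsString(), id.AsString()
	// AsHex was never called, so only one representation was built.
	fmt.Println(a, b, "builds:", id.builds.Load())
}
```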


somethingsome 330 days ago

But the data is changed in place very often by the function calls that operate on it.

I agree that would be a good solution, even though my data is huge, but it assumes the data doesn't change, or doesn't change much.
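One possible extension for mutable data, not proposed in the thread itself: stamp the primary data with a version counter that every in-place mutation bumps, and rebuild a cached representation only when its stamp is stale. The sketch below is a minimal, single-threaded illustration with hypothetical names; it assumes all mutations go through methods that bump the version, and a prefix-sum array stands in for an expensive alternative format.

```go
package main

import "fmt"

// Data holds the primary, mutable representation plus one cached derived
// representation. Not safe for concurrent use.
type Data struct {
	values  []float64
	version uint64 // bumped on every in-place mutation

	prefix        []float64 // cached derived form (nil until first built)
	prefixVersion uint64    // version of values the cache was built from
	rebuilds      int       // how many times the derived form was rebuilt
}

// Set mutates the primary data in place, which invalidates derived forms.
func (d *Data) Set(i int, v float64) {
	d.values[i] = v
	d.version++
}

// Prefix returns the derived form, rebuilding it only if values changed
// since the cache was last built.
func (d *Data) Prefix() []float64 {
	if d.prefix == nil || d.prefixVersion != d.version {
		d.rebuilds++
		d.prefix = make([]float64, len(d.values)+1)
		for i, v := range d.values {
			d.prefix[i+1] = d.prefix[i] + v
		}
		d.prefixVersion = d.version
	}
	return d.prefix
}

func main() {
	d := &Data{values: []float64{1, 2, 3}}
	d.Prefix()
	d.Prefix() // no mutation in between: cache reused
	d.Set(0, 10)
	p := d.Prefix() // mutated: cache rebuilt
	fmt.Println(p, "rebuilds:", d.rebuilds)
}
```

This only pays off when a derived form is read several times between writes. If, as described above, nearly every call mutates the data and the next call needs the other format, each read still triggers a full rebuild.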
