| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by axilmar 1268 days ago

The problem of hidden types surfaces again.

Obviously passing null to a function that does not expect null is a well known problem, tackled by many programming languages but not by C and C++. This is a purely typing issue, because null is a different thing than a valid memory address.

The other issue about vector.pop_back() is also a hidden type issue...an empty vector is not the same as a non-empty vector, and therefore pop_back() applies only to non-empty vector.

What would a possible solution be in the context of C++?

Well, each function should only be applicable when the parameters are in the states the function expects. So parameters (including this/self) shall have states.

For example:

    void pop_back() this : not_empty { 
        ...
    }

But this is not enough. The language should allow the user to specify change of states at certain statements/expressions. For example:

    void clear() {
        ...
        this := empty;
    }

And then the compiler can check the AST if the used vector has the 'empty' state on pop back or not. Doing the following would be illegal:

    vector<int> data;
    data.clear();
    data.pop_back();

And the following would be illegal because one execution path would lead to a state violation:

    vector<int> data;
    if (some_external_condition) {
        data.clear();
    }
    data.pop_back();

And the following would be illegal too, because the result of clear_random_vector could be possible in empty state.

    vector<int>& clear_random_vector(vector<int>& a, vector<int>& b) {
        if (rand() < RAND_MAX / 2) { //c++ has better random number interface
            a.clear();
            return a;
        }
        b.clear();
        return b;
    }

int main() { vector<int> a{1, 2, 3}, b{4, 5, 6}; vector<int>&v = clear_random_vector(a, b); v.pop_back(); }

And in case the function clear_random_vector is in another translation unit, then:

a) if headers are used, the programmer shall provide metadata about the state of the result in the header. Example:

    vector<int>& clear_random_vector(vector<int>& a, vector<int>& b) := empty;

b) if modules are used, the compiler itself should output the metadata.

Actually, this would also solve the null pointer problem, because a pointer would have the states empty, not_empty by default, and then functions could be like this:

    memcpy(void* dst : not_empty, const void* src : not_empty, size_t count);

And then this would be caught at compile time:

    std::vector<char> dst, src;
    memcpy(dst.data(), src.data(), 10);

States could also solve the problem of array indexing. For example, an index can have two states, 'in_range' and 'out_of_range'. Using an integer as an array index would implicitly add the two states in the index variable; comparing the integer with the array range would automatically add 'in_range' or 'out_of_range' to the integer.

Example of valid usage:

    vector<int> data{1, 2, 3};
    int i = 0; //our index
    //'i' gets automatically 'in_range' state.
    if (i < data.size()) { 
        std::cout << data[i] << std::endl;
    }

The following would be invalid, because straightforward array indexing would requre a state named 'in-range':

    vector<int> data{1, 2, 3};
    int i = 0; //our index
    std::cout << data[i] << std::endl;

How would that work at language level? well, here is an idea: a) any operation can affect the compile-time state of objects, b) states themselves would be templates, bound to certain ids at compile time.

Here is a possible implementation:

    template <class T> class vector {
        //returns a 'size_t' variable tagged with state 'size_of_this' and parameter 'this'; in this case, this does not refer to the runtime address of the object, but to the identifier this object is bound to.
        size_t size() const : size_of_this<this> {
            ...
        }
    }

    //index is bound to state bound_index<T>, where T refers to the variable of the vector at compile time.
    template <T> operator < (int& index, size_t size: size_of_this<T>) {
       index := size_of_this<T>;
    }

    template <class T> class vector {
        //the vector operator [i] requires size_of_this<T>
        const T& operator [](int index : size_of_this<this>) const {
            ...
        }

        //adding an element to the vector breaks size_of_this.
        void push_back(const T& elem) {
            this := !size_of_this<this>;
        }
    }

So, each time push_back is called on a vector, the guarantee about indexes breaks, and the index that previously holds the guarrantee can no longer be used for indexing.

The above also requires metadata at header/module level.

By all means, the above is not feature complete, it is some idea on how it might work to have compile time guarantees...and if it looks suspiciously like Rust's lifetimes, I am sorry, I don't know Rust, any similarity is completely random.

And perhaps this mechanism may be used for lifetimes as well, I don't know ;-).

2 comments

aw1621107 1268 days ago

I think what you're describing is basically dependent types [0], where types can depend on values.

[0]: https://en.wikipedia.org/wiki/Dependent_type

link

zabzonk 1268 days ago

by your logic, a vector with 3 elements is a different type to a vector with 4 elements. this is where original pascal famously went off the rails

link

axilmar 1267 days ago

If code depends on a vector having 3 or 4 elements, then yes, a vector with 3 elements is a different type than a vector with 4 elements.

It is actually dependent typing.

Not being able to express types properly is a major issue in c++ and in all other programming languages that do not allow this kind of typing.

The lack of this facility prohibits programmers for declaring their intent properly to the compiler, forcing programmers to keep that information in their head.

To me, it's the no 1 issue that c++ must deal with. It will solve a myriad of problems.

link

ThatGeoGuy 1268 days ago

Which starts to look an awful lot like dependent types. Unfortunately any sane use-case of that in 2023 looks like Idris or Coq and nobody in C++ today is switching to either of those.

link