by ars 4110 days ago
> To be honest, I'm completely in the strict camp

Can you explain why? Are you using PHP in a web context? Because everything from the web is a string.

So wouldn't it make the most sense to let the functions using the web data coerce to integer, and just work?

How does putting (int) before the arguments to function help anything?

I actually liked Ze'ev's proposal because it let you be weak while making sure you did not coerce obviously bad data.

Anyway, as a member of the strong camp, can you explain?


> Because everything from the web is a string.

I would start by saying that everything, regardless of domain, is just a stream of bits. Which is completely useless, just like your assertion.

And I know what you meant, but you're also wrong. A JSON object is only a string before being interpreted. An x-www-form-urlencoded body is actually a map in which values can be arrays instead of primitives. Such forms often correspond to domain models with a clear definition.

There's also no such thing as "obviously bad data". All data is good in the proper context, therefore automatic conversions that try to make this distinction do not make sense. I don't necessarily know how PHP behaves, but in another popular language there's a world of difference between "077" and "77". There's also a world of difference between integer, floating point and fixed point and the details are never irrelevant.
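For what it's worth, PHP shows the same kind of split. A small sketch, assuming PHP 7 or later: the same three characters come out as 77 or 63 depending on which conversion you ask for, so there is no single "obvious" reading.

```php
<?php
// "077" read as decimal vs. as octal: the right answer depends on context.
$asDecimal  = (int) "077";        // plain cast: base 10, leading zero ignored
$asDetected = intval("077", 0);   // base 0: the prefix decides, a leading "0" means octal
$asOctal    = octdec("077");      // explicitly octal

var_dump($asDecimal, $asDetected, $asOctal);   // int(77), int(63), int(63)
```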

> therefore automatic conversions that try to make this distinction do not make sense.

You have to convert it somewhere. I don't see how the caller converting it is any better than the recipient doing it. Having the caller do it seems quite pointless when the recipient is anyway doing it.

Your answer about how everything is bits was quite useless since you completely missed the point. Your input is a string, you have to convert it someplace. Weak mode has the recipient do it. Strict mode you have to do it yourself, and then the recipient double checks.

I see no value in the second option - the actual conversion in both cases is identical.

But Danack disagrees, so I asked him to explain. Your answer was not helpful at all.

@ars, you completely missed my point. Let's go over it again. Tell me how the following things should convert:

     77
     077
     77.0

I GOT your point. I don't care about your point because it is not the question I am asking.

Why are you answering something I did not ask?

I am asking why Danack prefers strict mode. There is absolutely nothing in my question that cares about the specific details of how you convert bits to types, other than that you do.

My question is entirely about WHO does the conversion. NOT about the conversion itself.

(Oh, and the thing about bad-data has a defined meaning that went over your head because you are not familiar with the debate here. In this context bad-data means data loss on conversion. So "1" to 1 is fine, but "1 a" to 1 is not.)
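A rough sketch of that "no data loss" idea in PHP. The helper name `isLosslessInt` is made up for illustration, and the round-trip test is only one possible reading of the definition, not the rule from any of the RFCs.

```php
<?php
// Hypothetical helper: true only if the string survives a round trip
// through int unchanged, i.e. the conversion loses nothing.
function isLosslessInt(string $input): bool
{
    return $input === (string) (int) $input;
}

var_dump(isLosslessInt("1"));     // bool(true)
var_dump(isLosslessInt("1 a"));   // bool(false): the " a" would be dropped
```

Note that this particular check is on the strict side: it also rejects "077" and "1.0", because the leading zero and the ".0" do not survive the round trip either.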

The conversion itself is very relevant because you cannot establish a default conversion that should happen, therefore the conversions should be explicit, answering the WHO. This is why I asked you about what should the conversion produce in those examples.

And also in the conversion from "1.1" to 1.1 there is loss of information, because the two representations are not isomorphic. Care to guess why?
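One way to see that loss, as a sketch. It assumes the usual IEEE 754 doubles behind PHP floats, and the default `precision` ini setting for the string conversion on the last line.

```php
<?php
// Decimal fractions such as 0.1 and 1.1 have no exact binary representation.
$sumMatches = (0.1 + 0.2 === 0.3);

// The textual form is not preserved either: "1.10" and "1.1" become the same float.
$sameFloat = ((float) "1.10" === (float) "1.1");
$roundTrip = (string) (float) "1.10";

var_dump($sumMatches, $sameFloat, $roundTrip);   // bool(false), bool(true), string(3) "1.1"
```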

This is the point that ars is making. He is saying that it has to be a runtime check and/or conversion because it is coming over the wire, as a string, at runtime.

So the question he is asking is, if you have to do the check at runtime anyway, what is the benefit of the type hinting? Isn't it just belt and braces?

It seems like a perfectly legitimate question to me.

And by the way, to those downvoters who don't seem to be able to tell the difference between a comment you disagree with and spam, can you please contribute to the conversation by hitting the reply button or alternatively get lost? Only you're ruining it for the rest of us. Thanks

> Having the caller do it seems quite pointless when the recipient is anyway doing it.

Actually, I disagree. The caller is the only one who has semantic information about what the variable (and hence its value) means. All the callee (recipient) can do is blind cast it. The caller on the other hand can interpret it because it knows the meaning (talking about the developer, not the engine).

> Can you explain why?

Over the years, we have refactored our PHP code base into something which is much more bug proof, and a lot of it is because we demand certain types to be passed to functions. Currently we use doc types (for which the IDE helps us heaps) and type hinting for objects being passed as arguments.

For a given process (e.g. form submission) there will nevertheless be an entry point where strings from the web are passed in. But if you can minimize that area as much as possible, beyond that one place (a function or class) which understands the mapping from incoming string types to PHP types, you end up with a code base which behaves mostly like a statically typed language.

This has reduced a huge subset of bugs which are caused by unexpected input being passed to a function. Having the language itself tell you when you made an error statically while writing is much better than having to wait until runtime.

A similar argument would stand for why we moved from dynamically created strings sent to the database, towards a database abstraction layer where we pick up syntax errors at time of writing.

> Can you explain why?

Hopefully.

> How does putting (int) before the arguments to function help anything?

It wouldn't. Anyone who is casting from an unknown type to an int by using just `(int)` is doing something wrong in my opinion.

Even in web-based applications there are at least two layers of code: i) One where the types of the values are unknown and they are represented as strings. ii) One where the types of the values are known.

At the boundary between these two layers you should have code that inspects the strings that represent the input values, checks that they are acceptable, and converts them to the desired type. If the input values cannot be converted to the desired type, the code needs to give an error that is specific to the type of error, so that a computer can understand it, and that also provides a human understandable explanation of why the conversion was not allowed.

The reason why I want strong types is that I never, ever want to blindly cast from one type to another. The decision about how to convert from one type to another, should always be made at a boundary between areas of the application where types are known, and the areas where the types are unknown. I always want to be forced to make that decision in the right place, using code that gives useful errors and messages, rather than having the value coerced into the desired type.

tl;dr I won't use (int) to cast, I will use something like the code below.

cheers Dan

    function validateOrderAmount($value) : int {
        // Any non-digit character makes the amount invalid.
        $count = preg_match("/[^0-9]/", $value);
        
        if ($count) {
            throw new InvalidOrderAmount("Order amount must contain only digits.");
        }

        $value = intval($value);

        if ($value < 1) {
            throw new InvalidOrderAmount("Order amount must be one or more.");
        }
        
        if ($value > MAX_ORDER_AMOUNT) {
            throw new InvalidOrderAmount("You can only order ".MAX_ORDER_AMOUNT." at a time.");
        }
        
        return $value;
    }

    function processOrderRequest() {
        $orderAmount = validateOrderAmount($_REQUEST['orderAmount']);
    
        //Yay, our IDE/static code analyzer can tell that $orderAmount is an int if the code reached here.
        placeOrder($orderAmount);
    }

Thank you for the reply! (The rest of the replies to me got seriously derailed....)

So from your code it looks like the only benefit of strict mode is in case you forget to do the validation/conversion it will warn you? I guess that's reasonable. Is there any other benefit?
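The "forgot to convert" case can be sketched like this, assuming PHP 7.0 or later with `declare(strict_types=1)`. `placeOrder()` here is only a stand-in for the function of the same name in the post above, and `$raw` simulates a request value.

```php
<?php
declare(strict_types=1);

// Stand-in for the placeOrder() used in the post above.
function placeOrder(int $amount): string
{
    return "ordered {$amount}";
}

// Simulated raw request value; web input arrives as a string.
$raw = "5";

try {
    placeOrder($raw);          // forgot to validate/convert first
    $outcome = "accepted";
} catch (TypeError $e) {
    $outcome = "TypeError";    // strict mode refuses the string
}

// Stand-in for a real validation step at the boundary.
$converted = placeOrder((int) $raw);

var_dump($outcome, $converted);   // string(9) "TypeError", string(9) "ordered 5"
```

Without the `declare` line, the first call would quietly accept "5" and coerce it.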

To me it seems that Ze'ev's proposal would be even better for you - it does the equivalent of the validation and conversion automatically including with an error if it doesn't validate.

You wrote "let's ignore that", but it really seems like the best of all worlds to me. Any idea why it was rejected so badly? Is it because the coercion rules are different from the rest of PHP?

> Is there any other benefit?

Strict types make it easier to reason about code, that's the tl;dr version.

> To me it seems that Ze'ev's proposal would be even better for you - it does the equivalent of the validation and conversion automatically including with an error if it doesn't validate.

Rather than having int 'types' which we can reason about, it has int 'values' which are harder to reason about. Types can be reasoned about just by looking at the code. Values can only be reasoned about when running code. A contrived example:

    function foo(int $bar){...}

    foo(36/$value);

In strict mode, this would be reported as an error by code analysis.

For the coercive scalar type proposal, this code works - except when it doesn't. This code works when $value = 1, 2, 3, 4 and breaks when $value = 5.

This is the fundamental difference; whether conversions between types have to be explicitly done by code, and so any implicit or incorrect conversion can be detected by static code analysis tools, or whether the conversions are done at run time, and so cannot be analyzed fully.

This means most of these errors will be discovered by users on the production servers. Strict mode allows you to eliminate these types of errors.
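A sketch of the runtime side of the `foo(36/$value)` example, with `strict_types` switched on. The `/` operator returns an int when the division is exact and a float otherwise, which is why an analyzer cannot promise an int there. The `intdiv()` line at the end is one assumed way of making the intent explicit, not something from the post.

```php
<?php
declare(strict_types=1);

function foo(int $bar): int
{
    return $bar;
}

$results = [];
foreach ([1, 2, 3, 4, 5] as $value) {
    $quotient = 36 / $value;   // int when it divides evenly, float otherwise
    try {
        foo($quotient);
        $results[$value] = gettype($quotient);
    } catch (TypeError $e) {
        $results[$value] = "TypeError";
    }
}

print_r($results);   // 1 to 4 give "integer", 5 gives "TypeError"

// intdiv() always returns int, so this call can be checked without running it.
$explicit = foo(intdiv(36, 5));
```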

Yes, this means I need to add a bit of code to do the explicit conversion, but I just don't convert between values that much. Once a value is loaded from a user's request, config file or wherever, it is converted once into the type it needs to be. After that, any further change in type is far more likely to be me making a mistake, rather than an actual need to change the type.

> Any idea why it was rejected so badly? Is it because the coercion rules are different from the rest of PHP?

At least in part it was because the RFC was seen as a way to block strict types; about half of the RFC text is shitting on people's desire for strict types, which did not make people who want strict types very receptive. If it had been brought up 6 months ago, there is a good chance it would have passed, or at least would have been closer.

Some parts of the proposal were good - other parts were nuts, and pretty obviously the result of the RFC only being created once the dual mode RFC was announced and about to be put to the vote, with a very high chance of passing.

* Good - "7 dogs" no longer being converted to 7 if someone tries to use it as an int.

* Bad - Different modes for internal functions vs userland functions, e.g. "Unlike user-land scalar type hints, internal functions will accept nulls as valid scalars." and other small differences. This is even more nuts than you might realise, as it means that if you extend an internal class and override some of its methods, those methods will behave differently from the non-overridden ones.

* Bad - Subtle and hard to fix BC breaks in conversion which are probably not right anyway, e.g.

    false -> int      # No more conversion from bool
    true -> string    # No more conversion from bool

It is a shame that the discussion became so contentious. It would have been good if the conversion rules could have been tidied up, but all the time and energy had been used up by the not particularly productive discussion.

> Because everything from the web is a string.

Do you consider JSON a string? Would you manipulate it as a string or use a JSON parser?

> How does putting (int) before the arguments to function help anything?

It throws an error in case you receive bad input, and as we all know you will receive bad input.

You get json from the browser?

Of course I manipulate the data - that's exactly what weak mode does, convert the strings into integers. I just don't see how doing the conversion myself manually helps anything.

> It throws an error in case you receive bad input, and as we all know you will receive bad input.

It does no such thing. (int) will simply turn bad input into a zero.
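A quick sketch of that, next to `filter_var()` with `FILTER_VALIDATE_INT`, which is one built-in that does report failure. Which check you actually want at the boundary is a separate question.

```php
<?php
// A plain cast never fails; it just produces some integer.
$fromGarbage = (int) "abc";      // no leading digits, so 0
$fromPartial = (int) "7 dogs";   // leading digits kept, the rest dropped

// filter_var() with FILTER_VALIDATE_INT returns false on bad input instead.
$checkedGood = filter_var("7", FILTER_VALIDATE_INT);
$checkedBad  = filter_var("7 dogs", FILTER_VALIDATE_INT);

var_dump($fromGarbage, $fromPartial, $checkedGood, $checkedBad);
// int(0), int(7), int(7), bool(false)
```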

> You get json from the browser?

Yes.

Really? All the pages you program with forms, and links and whatever are sending you json?

I have no doubt you CAN do it, but most of the time you don't.

And since most of the time you are dealing with strings my question stands: Why do the conversion manually instead of letting the nice new feature do it for you?

You've heard of JavaScript I presume?

Even AJAX sites do not typically send all data back as JSON. Most of the time they use normal form-urlencoded data.

If you did send everything as JSON you would be bypassing everything PHP does with form/url data to make things easier for you (for example arrays). That doesn't seem like a good engineering tradeoff.
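For reference, a sketch of the two inputs side by side. The field names are made up, and `parse_str()` is used here only to mimic how PHP populates `$_GET`/`$_POST` from form-urlencoded data.

```php
<?php
// Form/url data: bracket syntax becomes nested arrays,
// and every leaf value is a string.
parse_str("orderAmount=5&ids[]=1&ids[]=2&user[name]=bob", $form);

// The same data as JSON keeps the number/string distinction after decoding.
$json = json_decode('{"orderAmount": 5, "ids": [1, 2], "user": {"name": "bob"}}', true);

var_dump($form["orderAmount"]);   // string(1) "5"
var_dump($json["orderAmount"]);   // int(5)
```

Either way the array structure arrives for free; the difference is whether the leaf values still need converting from strings.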