by jfabre 1704 days ago
I never understood this argument. In what kind of shop are you working where passing a string named person to a method expecting an object is tolerated? Or even passing different types that don't share a common interface?

This would never fly in a code review in any of the companies I've worked for.

7 comments

I've seen essentially this code in so many organically grown codebases (when they grew up without types). It's usually close to the UI, because someone had to quickly add an alternate path to support some new user interaction

    function find_user(person) {
        // Accept either a bare name or a user object with a .name field.
        if (typeof person === "string") {
            return query_by_name(person);
        } else {
            return query_by_name(person.name);
        }
    }
and yeah, we all know it's kinda messy, but also that logic has to live somewhere and we need this feature asap so it passes code review. I wrote a test for it, ship it.
I came very close to writing almost this exact code just the other day (except it was username or user id for me), but came to my senses. It's just so tempting in a dynamic language...

In a static language, you either can't do it, have to really go out of your way to do it, or at least do function overloading (which is a bit cleaner).

Calling a function like this “ensureUser” is pretty idiomatic and useful, in lisp-style code bases. I think it’s a pattern related to “parse, don’t validate” in static-type lands: rather than _checking_ what the type is and throwing an error, you define a function that knows how to turn various representations of your type into its canonical shape.
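A minimal Python sketch of that "ensure" idea, under stated assumptions: `User`, `ensure_user`, and the string returned by `find_user` are hypothetical stand-ins, not code from anyone's codebase. The point is that type inspection happens in exactly one place, and everything downstream only ever sees the canonical shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    # Hypothetical canonical shape: just a name, for illustration.
    name: str


def ensure_user(value):
    """Normalize a supported representation into a canonical User."""
    if isinstance(value, User):
        return value             # already canonical
    if isinstance(value, str):
        return User(name=value)  # a bare name is promoted to a User
    # Anything else is rejected here, at the boundary.
    raise TypeError(f"cannot turn {type(value).__name__} into a User")


def find_user(person):
    user = ensure_user(person)   # the only place that inspects the input's type
    # Placeholder for a real query; returns a description of the call instead.
    return f"query_by_name({user.name!r})"
```

Unsupported inputs fail loudly at `ensure_user` rather than somewhere deep inside the query code.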
Sounds like a brilliant case for multiple-dispatch.
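For reference, a rough sketch of dispatch-by-type in Python. Note the swap: the standard library only provides single dispatch (on the type of the first argument) through `functools.singledispatch`, so this is a stand-in for real multiple dispatch, which would consider every argument. The returned tuples are placeholders for an actual query.

```python
from functools import singledispatch


@singledispatch
def find_user(person):
    # Fallback for any type without a registered implementation.
    raise TypeError(f"unsupported argument type: {type(person).__name__}")


@find_user.register
def _(person: str):
    # Chosen when the argument is a str.
    return ("by_name", person)


@find_user.register
def _(person: dict):
    # Chosen when the argument is a dict; assumes it carries a "name" key.
    return ("by_name", person["name"])
```

The branching moves out of the function body and into the registry, but nothing here checks what is inside the dict.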
Right so we have:

    function find_user(person: string)
and also:

    function find_user(person: Object)
how long before someone writes this:

    find_user(person: { name: "dave" })
meanwhile, someone else, not suspecting that they'll be handed a weird half-formed `User` object adds `person.id` somewhere in the body of the Object version of `find_user` and now we have a weird edge-case where very rarely `find_user` panics because the user object we're handed doesn't have an id??? Great, I just lost an hour trying to dig that out of the logs, and the users are starting to think of the product as flakey because the bug has been in prod for over a month before we finally believed them enough to look into it.

Just. Use. Types. Multiple dispatch won't save you on its own. You NEED compile-time types.
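A hedged illustration in Python, where `User` and `find_user` are hypothetical names. Python does not enforce annotations when the code runs, so the "compile-time" part assumes an external static checker is run over the code. What the sketch can show directly is that a declared type with a required `id` makes the half-formed object impossible to construct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: int
    name: str


def find_user(person: User) -> str:
    # Safe to touch person.id: a User cannot be built without one.
    return f"user:{person.id}:{person.name}"


def try_half_formed():
    """Attempt to build a User without an id and report what happens."""
    try:
        User(name="dave")  # missing id; a static checker would flag this line too
    except TypeError as exc:
        return str(exc)
    return None
```

Passing a bare `{"name": "dave"}` dict to `find_user` would still only fail at runtime in Python; catching that before deployment is the static checker's job.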

Somebody downvoted you, I'm guessing because they think this is a silly example and have never actually seen something like this. I have, in a production code base.
Multiple dispatch and compile-time types are not exclusive at all.
I'm saying the problem isn't solved by multiple dispatch alone, but it is solved by compile time types alone. You can use both together, of course.
The issue of whether the language is interpreted or compiled (which would distinguish compile-time types from strong types) is in my opinion completely orthogonal to the issue of how dispatch works. Strong types and multiple dispatch fix the issues I see even in an interpreted language.
This was probably just a silly example for a quick explanation.

But all it takes is a method that expects an integer id receiving a string representation of that id, via some obscure code path that, notwithstanding the 100% line coverage the team is so proud of, was never exercised in tests because nobody has 100% branch coverage.
In C++ you're only ever one missing "explicit" from introducing such problems.

Suppose I call fire(bob). Programmers from other languages might reason that since fire is a function which takes a Person, bob must be a Person. Not in C++. In C++ the compiler is allowed to go, oh, bob is a string and I can see that there's a constructor for Person which takes a string as its only argument, therefore, I can just make a Person from this string bob and use that Person then throw it away.

To "fix" the inevitable cascade of misery caused by this "feature", C++ then introduces more syntax: an "explicit" keyword which means "only use this when I actually ask you to". A sane person might instead have required an "implicit" keyword to flag the places where you actually want this behaviour to happen silently.

This way, hapless, lazy or short-sighted programmers cause the maximum amount of harm, very on-brand for C++. See also const.

If only there was a way to enforce these parameter types automatically
I personally love it, and wish every library worked this way. My argument is why go out of my way to make it not work, when it would be easy to make it work. This is because I think of modules/packages as user facing programs that are easy to tie together, instead of simple building blocks.

What I really wish existed was a built in way to cast and validate, or normalize and validate. I never care if something is a string. I care that if I wrap it in str(), or use it in a fstring, the result matches a regex. Or if I run a handful of functions one of them returns what I need.
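Something along these lines can be hand-rolled today. A minimal sketch, assuming a hypothetical helper named `cast_and_validate` and an illustrative, deliberately loose `HOSTNAME` pattern; neither is a built-in.

```python
import re


def cast_and_validate(value, pattern):
    """Cast value with str() and require the whole result to match pattern."""
    text = str(value)
    if re.fullmatch(pattern, text) is None:
        raise ValueError(f"{text!r} does not match {pattern!r}")
    return text


# Illustrative pattern only; real hostname rules are stricter.
HOSTNAME = r"[A-Za-z0-9.-]+"
```

The caller never asks "is this a str?", only "does the string form look right?", so `cast_and_validate(1234, r"\d+")` is accepted and comes back as a string.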

The only benefit I can see of type hints on their own is it makes it easy to change a callable's signature, but I think that's best avoided to begin with.

> why go out of my way to make it not work, when it would be easy to make it work

The problem with the DWIM approach to APIs is that when you go out of your way to "do something reasonable" with absolutely any kind of argument type, leaving the caller's intent implicit, you will sometimes run into combinations that "work" in unexpected—and often unwanted—ways.

For example, say you have a function which returns either a Person object or, in very rare cases, an error string. Moreover, you fail to check for the error string, and pass the result into another function which expects a Person object but will also take a name and look up the corresponding Person object in a table. Now if the first function fails you're left trying to look up an error string as a name, with no obvious signs (such as a type mismatch error) to show that anything is amiss.
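To make that failure mode concrete, here is a small Python sketch. Everything in it is hypothetical (`PEOPLE`, `load_person`, `lookup`), written only to reproduce the scenario described.

```python
# Hypothetical in-memory table of people, keyed by name.
PEOPLE = {"dave": {"name": "dave", "id": 1}}


def load_person(key):
    """Returns a person record, or an error string on failure."""
    if key in PEOPLE:
        return PEOPLE[key]
    return "database unavailable"  # error reported as a plain string


def lookup(person_or_name):
    """DWIM lookup: accepts either a record or a name."""
    if isinstance(person_or_name, str):
        return PEOPLE.get(person_or_name)  # treats any string as a name
    return person_or_name


# The caller forgets to check for the error string.
result = lookup(load_person("erin"))
```

Here `result` ends up as `None`: the error message was looked up as if it were a name, and no exception points back at the real cause.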

It's important to make the intent explicit, and not just let the function guess. One option compatible with both statically- and dynamically-typed languages is to provide two functions, one requiring a Person object and another taking a name string. This is still perfectly ergonomic for the user and mitigates most of the potential for confusion.
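A sketch of the two-function shape in Python, with hypothetical names (`Person`, `DIRECTORY`, `greet_person`, `greet_by_name`). Each entry point states which representation it expects, so a stray string cannot silently take the lookup path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str


# Hypothetical lookup table, keyed by name.
DIRECTORY = {"dave": Person("dave")}


def greet_person(person):
    """Requires a Person; nothing else is accepted."""
    if not isinstance(person, Person):
        raise TypeError(f"expected Person, got {type(person).__name__}")
    return f"hello, {person.name}"


def greet_by_name(name):
    """Requires a name string and performs the lookup explicitly."""
    if not isinstance(name, str):
        raise TypeError(f"expected str, got {type(name).__name__}")
    return greet_person(DIRECTORY[name])  # unknown names raise KeyError
```

An error string handed to `greet_person` now raises immediately, and the caller who really has a name says so by calling `greet_by_name`.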

> For example, say you have a function which returns either a Person object or, in very rare cases, an error string. Moreover, you fail to check for the error string, and pass the result into another function which expects a Person object but will also take a name and look up the corresponding Person object in a table. Now if the first function fails you're left trying to look up an error string as a name, with no obvious signs (such as a type mismatch error) to show that anything is amiss.

Well I only ever return one type from a function, I'm not a total madman. Sometimes I'll do one type or a None, if I'm trying to replicate the functionality of dict.get(). Any error string would be within an Exception, so that wouldn't be an issue, but even in your example it would show a stack trace to the function looking up the user, and would be much more valuable to troubleshoot than a type mismatch.

> One option compatible with both statically- and dynamically-typed languages is to provide two functions, one requiring a Person object and another taking a name string. This is still perfectly ergonomic for the user and mitigates most of the potential for confusion.

In practice that is usually what I end up doing, but with a 3rd function that takes either and returns a Person object. In this particular case I would probably make the function be a method on the Person object, and have a class method to look up the Person.

Here is the scenario that annoyed me enough to turn me off static typing. I had a class that stored the IP address of a network device as an ipaddr.IPAddress object (now ipaddress in the standard library) and there were various subclasses for specific device types. One of the device types needed an SDK, and the init for the SDK class looked something like this

  def __init__(self, host, port=1234, scheme='https'):

      # Rejects anything that is not exactly a str (or a subclass of it).
      if not isinstance(host, str):
          raise TypeError('invalid host')

      self.url = f"{scheme}://{host}:{port}"

If they didn't check the type it would have worked fine. Just like every other library we were using to connect to devices.

So after a bit of frustration we changed our base class

  def original__init__(self, ip_address):
      self.ip_address = ipaddr.IPAddress(ip_address)
  def new__init__(self, ip_address):
      ipaddr.IPAddress(ip_address) # just to validate
      self.ip_address = ip_address 
and all was well with the world, but there was a dumb mistake waiting for us. A year or two later, after upgrading to 2.7, we started passing around unicode objects instead of strings to get ready for 3.x, as was the style at the time. Again that SDK broke, and only that SDK, because it insisted on checking the type. Sure, it was our mistake this time: the original fix should have been to just cast to str right before passing the value to the SDK. But it was annoying and should have been unnecessary.

I understand that type hints are much better in this regard because it would only show an error in your tooling. But that brings me to another point.

I write my packages/classes/modules to mostly be used in a web app, or as scripts that run on a schedule. However, I also need to be able to write one-offs very quickly. When that happens, my code that was previously a library for different applications now becomes an application itself. Using the REPL, a jupyter notebook, or bpython, I will need to quickly get something done. In these scenarios I don't want to waste time remembering how to normalize the data being given to me, especially if the code that provides such niceties is tucked away at a higher level for end users of the web app.

Like I said, I tend to just make a lookup function, and then have everything else be methods on the object. But that doesn't really help when it's parameters to a function. I really don't know what would make it better. Perhaps some kind of mix between function overloading and interfaces from other languages, and the magic *_validate() methods that Django uses. Maybe instead of type hints for return values we need value hints, that give an idea of what actual objects might look like. Then tooling could take into account if it would still work after validation and normalization. Of course it could be that there is no elegant and reliable way to do what I really want, but I can dream.

> Well I only ever return one type from a function, I'm not a total madman.

I'm sure your APIs are sane (at least to you). It's all the other developers you have to watch out for.

> … even in your example it would show a stack trace to the function looking up the user, and would be much more valuable to troubleshoot than a type mismatch.

A type mismatch would be caught earlier (even in a dynamic language) and the runtime exception should report the specific objects involved, so you still get the string which caused the problem.

> Here is the scenario that annoyed me enough to turn me off static typing.

To begin with, this example has nothing to do with static typing. It involves a runtime type check. In this case I would agree that the type check is too strict. Some languages have an interface or protocol for "string-like" objects (e.g. the to_str method in Ruby), and it would be better to use that rather than checking specifically for an instance of str. Objects which shouldn't be treated as strings just don't implement the protocol. Python has the __str__ magic method, but unfortunately it's not very useful in this regard since all objects implement it, even ones that are nothing like strings. It's more like Ruby's to_s method, used for formatting and debugging rather than as an indication that you have an actual string. The best recommendation I've seen for checking for "string-like" objects in Python is something like `str(x) == x`, though the extra comparison adds some overhead.
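A quick sketch of that `str(x) == x` check, using the standard-library `ipaddress` module as the non-string example. The helper name `is_string_like` and the `Hostname` subclass are hypothetical, added only for illustration.

```python
import ipaddress


def is_string_like(x):
    """True when converting x to str changes nothing, i.e. x already is its own text."""
    return str(x) == x


class Hostname(str):
    """A str subclass still counts as string-like."""


checks = {
    "plain str": is_string_like("example.org"),
    "str subclass": is_string_like(Hostname("example.org")),
    "int": is_string_like(5),
    "ip address": is_string_like(ipaddress.ip_address("192.0.2.1")),
}
```

The first two entries come out `True` and the last two `False`, which is looser than `isinstance(x, str)` in spirit but still excludes objects that merely have a printable form.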

Of course that doesn't really help you since you were trying to pass an arbitrary non-string-like object (IPAddress) to a function expecting a string; the looser `str(x) == x` check would also have failed. The call might have "just worked" without the condition, or it might have failed spectacularly. In assuming that it would work without the type check you're depending on the implementation using string interpolation rather than, say, concatenating the strings with the + operator, which requires actual strings and not IPAddress objects since the + operator doesn't do implicit conversion like f-strings would. Static typing would have helped to limit these dependencies on unstable implementation details, letting you know that you need to fix the issue at the call site by passing `str(self.ip_address)` for the host parameter.
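The dependence on the implementation can be seen in a few lines of Python, using the standard-library `ipaddress` module in place of the old `ipaddr` package. The function `concat_url` and the port number are made up for the example.

```python
import ipaddress

host = ipaddress.ip_address("192.0.2.1")

# Interpolation formats the object, so a non-string host happens to work.
url_interpolated = f"https://{host}:1234"


def concat_url(host):
    # Concatenation needs real strings on both sides.
    return "https://" + host + ":1234"


try:
    concat_url(host)
    concat_failed = False
except TypeError:
    concat_failed = True  # an address object cannot be added to a str

# Converting at the call site works with either implementation.
url_fixed = concat_url(str(host))
```

Whether the unconverted address "just works" depends entirely on which of the two styles the library author happened to pick, while `str(host)` at the call site is safe for both.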

We have tests, and static types, because developers are people and people make mistakes.

You can't say "we simply don't allow bugs!" because it's a lie. Why rely on another person manually checking for silly mistakes when the computer can do it for you?

You'd think. But I've seen many many many examples of this pattern in production JS code.