| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by bad_user 5364 days ago

Scala is not Haskell and in Scala you can only check if a variable contains an Employee or a String. The problem is that such types do not actually contain anything useful about the content within that object, it just says how the object behaves (the messages it responds to).

For instance, say you have a String ... well, a String containing what? A name? A regular expression? A number waiting to be converted to floating point? A bank account number? A phone number? An email?

I don't see compile-time type-checking in Scala as something useful, as most often than not it stands in my way. It does help to alleviate some bugs related to incorrect usage of types, but on the other hand you need tests with good code coverage anyway.

1 comments

noelwelsh 5364 days ago

The whole point of having an expressive type system is to encode useful properties in it. If a type of String isn't any use to you, give the value a more expressive type. Make a Name type, or a RegEx type, or whatever. Use unboxed types:

  http://www.chuusai.com/2011/06/09/scala-union-types-curry-howard/

if you don't want to pay an abstraction tax.

link

trurl 5364 days ago

Indeed. I think many people seem to miss the fact that types are a language for describing the data in your program.

I also think many people do not understand the benefits of type abstraction, perhaps because many languages do not even offer it without requiring boxing. Even if your RegEx type is a String, not revealing that to client code is wise engineering.

link

Periodic 5364 days ago

My favorite example is dealing with user input on the web. Everything from the data passed by the user's browser to the data stored in the database is just a string of bytes. You could go about representing this as a string, but really there are different types of strings, specifically there are sanitized and unsanitized strings. The database should only ever receive sanitized strings and all input from the user is an unsanitized string. Then you have two simple functions that map from one to the other, say, escapeString and unescapeString.

If you don't represent these as different data types it's very easy to accidentally lose track of if you sanitized a string or not. Did you sanitize it before you placed it in your model, or do you need to sanitize it before to display it? Did we unsanitize it when we pulled it from the DB? Should we be displaying it unsanitized in the HTML?

If you encode these differences in opaque types it suddenly becomes impossible to make a mistake. It suddenly becomes very easy to reason about a program. Based on the type you can immediately know what type of string you are dealing with and not have to every ask yourself if it has been sanitized yet.

For me, that reduces the complexity of the application and makes reasoning about the program easier. Wise application of types to semantic differences in data can really make a program easier to build.

link

jeltz 5364 days ago

That is a poor example since you should not normally have to code like this when handling user input on the web.

You must escape all user input you send to the database, and this is normally done by your database driver by using parametrized queries (be they prepared or not). And when the data is retrieved from the database you generally do not have to unescape anything, especially not since the returned data is not SQL strings but in whatever binary protocol is used and is unescaped (if necessary) by your driver.

So what your code should look like is something like:

  db.execute("INSERT INTO foo (a, b) VALUES ($1, $2)", a, b)

So for the database you just have two types. Queries and data. Data should always be treated as unsafe while queries is the special class that could help out if you do a lot of manual query building.

link

Periodic 5364 days ago

Not so. What if something fails database validation so you want to display it in the form with an error? This can lead to attacks where a malicious user gives the victim a specially crafted URL that inserts a script into their page, then when they access the URL the script executes inside their browser with their session.

For this reason any user input needs to be escaped whenever it is rendered in HTML as well. If you got it from the DB it should be safe, but if you got it from the url parameters it isn't.

link

jeltz 5364 days ago

Data from the database is just as unsafe as data from URL parameters which is why escaping of data must be done in the presentation layer. This is basically the same thing as SQL queries. You have a template like this (most of the HTML omitted).

  <li class="<%= user_type %>"><%= username %></li>

Where username and user_type are variables containing data fetched from the database. This template is then compiled to something like the blow. Strings literals of the HTML-safe type start with s", and concatetnation to the output buffer is <<.

  out << s"<li class=\"" << user_type << s"\">" << username << s"</li>"

So to implement your own templating language and and template helper functions this is a useful concept. But query building should be in the database layer and HTML building in the presentation layer. People using the database should not have to know of or care about that the data might at some point be displayed in a HTML page.

link

bad_user 5364 days ago

Maybe my example wasn't clear enough.

How about if you want to do stuff only with Employees with a salary greater than X? Consider this case ...

   a_employees = employees.filter{|x| x.salary > A}
   b_employees = employees.filter{|x| x.salary > B}

   # where A > X, B > X
   # so both should work with:

   def do_stuff(employees) where employees.salary > X
       ...

In other words, creating new subclasses is a poor substitute for code contracts, especially since many contracts could be checked at compile time.

link