by lolinder 600 days ago
Every consumer of its data should be sanitizing its inputs before rendering them wherever they are using it. HTML, SQL, etc. Banning "computer code" as judged by a random bureaucrat from being inserted into the database is not a solution at all, much less a foolproof one.

The absolute best case scenario here is that the bureaucrats successfully block all possible actually-malicious injection attacks but the vulnerable consumers still get broken occasionally by a random apostrophe that gets thrown in.
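To make "sanitize wherever you use it" concrete, here is a minimal sketch of what a consumer might do at each output boundary. The company name, table name, and helper functions are all made up for illustration; only `html.escape` and `sqlite3` parameter binding are standard library behaviour.

```python
import html
import sqlite3

# Hypothetical company name containing markup, an ampersand and an apostrophe.
NAME = "<script src=//example.invalid></script> O'Brien & Sons Ltd"


def render_cell(value: str) -> str:
    """Escape at the HTML boundary, so the name is shown as text, not markup."""
    return "<td>" + html.escape(value) + "</td>"


def store_and_load(value: str) -> str:
    """Round-trip the name through SQL using a bound parameter, not string pasting."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE companies (name TEXT)")
        # The "?" placeholder keeps the apostrophe from terminating the SQL string.
        conn.execute("INSERT INTO companies (name) VALUES (?)", (value,))
        (loaded,) = conn.execute("SELECT name FROM companies").fetchone()
        return loaded
    finally:
        conn.close()
```

The stored value stays byte-for-byte what the registrar published; each consumer only changes how it is encoded at the point of use.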


> Every consumer of its data should be sanitizing its inputs before rendering them wherever they are using it.

This is not how the real world runs though. In the real world (outside the bubble of programmers) things are messy and a lot of stuff barely works, many people are incompetent etc.

Said otherwise, it's defense in depth.

"Should" doesn't factor in. You can't make everyone competent at the wave of a magic wand. But you can control what company names are allowed. You can't control how they will be parsed. There is one law about company names, but a myriad systems that may parse them.

This is a huge blindspot of programmers.

It always barely works as much as you allow it to. Lower the bar even more and it will start barely working at that level again.

This Kool-Aid about protecting the real world only helps perception (“I made it work now with this simple rule”), because moving the bar down relieves the issues a bit and they don’t instantly accumulate at the new level.

It doesn’t matter where the bar is, they will always find enough competence and budget to follow it in a moment. You just have to hard-break what half-works in advance.

> You can't make everyone competent at the wave of a magic wand

You can make their incompetence fail by adding random honeypots like someone suggested above. That would be a smart move. Your “out of bubble” move is just an instant gratification button.

Whenever I see a python-requests user-agent I sometimes keep the connection open indefinitely without responding, to see if the developer was incompetent and forgot to set a timeout. Responding to other certain clients with 'Location: file:///dev/urandom' is also mildly entertaining.
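For anyone wondering what the "forgot to set a timeout" failure looks like from the client side, here is a rough stdlib-only sketch. The `silent_server` and `fetch_with_timeout` helpers are hypothetical names for illustration; with python-requests the equivalent knob is the `timeout=` argument, which has no default limit if you leave it out.

```python
import http.client
import socket


def silent_server() -> socket.socket:
    """Listen on a free local port but never accept or answer (the 'tarpit')."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    return srv


def fetch_with_timeout(host: str, port: int, timeout_s: float) -> str:
    """Issue a GET and give up after timeout_s seconds instead of hanging."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout_s)
    try:
        conn.request("GET", "/")
        return str(conn.getresponse().status)
    except socket.timeout:
        # Without the timeout argument this read would block indefinitely.
        return "timed out"
    finally:
        conn.close()
```

Against the silent server the call returns "timed out" after roughly `timeout_s` seconds rather than hanging forever.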

My point would be, I'm not sure if this wouldn't be too damaging to the mental health of programmers if everyone was doing shit like that.

On balance, blocking such names makes sense. You can secure YOUR systems, and if that were the whole story I would agree, but unless you are going to pay to audit all consumers of the data worldwide, this solution is more pragmatic. I am not sure what we gain by letting company names have code.

That's the thing, you don't have to audit. You put your own harmless company names based on malicious code in and people immediately learn to deal with it.

It's WAY less pragmatic to test every company name for potential malicious actions in other people's code that you don't own.
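A rough sketch of how a consumer could test itself against such seeded names, assuming the registry published a few harmless canaries. The canary names and the two renderers below are invented for illustration and are not anything a real register contains.

```python
import html
from typing import Callable, List

# Hypothetical harmless canaries that only look hostile.
CANARY_NAMES = [
    "<b>CANARY</b> TEST LTD",
    "O'CANARY & SONS LTD",
    '"CANARY" HOLDINGS LTD',
]


def naive_render(name: str) -> str:
    """A vulnerable consumer: pastes the name straight into markup."""
    return "<td>" + name + "</td>"


def safe_render(name: str) -> str:
    """A consumer that escapes at the output boundary."""
    return "<td>" + html.escape(name) + "</td>"


def failing_canaries(render: Callable[[str], str]) -> List[str]:
    """Return the canaries whose raw text shows up unescaped in the output."""
    return [name for name in CANARY_NAMES if name in render(name)]
```

Every canary contains at least one character that HTML escaping rewrites, so a raw match in the output means the renderer did not escape it.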

You are right but best to do that on day 1, which was probably in the 1970s or whenever a database of company names first existed. In the case of HTML script exploits maybe the 1990s.

So you have a transition issue. If you suddenly allow this company name, which loads a script from a domain they control, it is too dangerous.

Test data like you mentioned is a great idea to increase resilience. However, I don't think that raises the overall ecosystem of consumers of this data to the right level to release actual exploits into the dataset.

Downvoters are probably thinking like purists. They are thinking "everyone in the world should make their systems 100% secure against common exploits and let a company name be an arbitrary string".

The problem is that this is not realistic.

It works at a corporate level but not across all actors who interact with this dataset and the global internet. You can "should" at them all you like but no one has control over this.

The government can choose: more exploits in the wild or fewer. Allowing script URLs they don't control in company names is the former.

For the register of companies in England & Wales, day 1 would have been the 5th of September, 1844.

I think we can forgive the young William Gladstone (who was President of the Board of Trade at the time) for not fully anticipating how difficult robust string handling would turn out to be!

So you're right, this could only ever be approached as a transitioning issue.

That doesn't test things in a useful way, and relies on having an official dataset lie. Good ingestion code should ignore those, and then you're not even testing the frontend of those systems.

By disallowing, we normalise deviance (security-wise).

Also, there can be a problem with who decides what is code, and how. There are already a myriad of programming languages, and for trolling or legal attack purposes, one could build an interpreter using arbitrary words as keywords (to make problems for an arbitrary company).

> there can be a problem with who/how decides what is code.

Blocking names that look like code is part of a defence in depth approach, it's not a standalone silver bullet.

I meant abuse scenarios.

Laws eventually are used not as intended, but as written.

“defense[1]”, “if happy begin something end”, “if”. All of these technically are code (somewhere). Also check out some esoteric language like: https://en.m.wikipedia.org/wiki/Whitespace_(programming_lang...
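To illustrate how slippery "is this code?" gets, here is a small sketch that asks only whether a name parses as a Python expression. The `parses_as_python` helper is hypothetical and is not a proposed filter; it just shows how much ordinary text a literal-minded rule would catch.

```python
import ast


def parses_as_python(name: str) -> bool:
    """True if the string is a syntactically valid Python expression."""
    try:
        ast.parse(name, mode="eval")
        return True
    except SyntaxError:
        return False


# A few sample names; the business names are made up for illustration.
SAMPLES = ["defense[1]", "Smith & Sons", "Acme", "if", "Acme Ltd"]
RESULTS = {name: parses_as_python(name) for name in SAMPLES}
```

Here "defense[1]", "Smith & Sons" and even plain "Acme" all count as valid Python, while "if" and "Acme Ltd" do not, which is roughly the opposite of what a human reading "no computer code" would expect.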