by bebrbrhrj 598 days ago
On balance, blocking such names makes sense. You can secure YOUR systems, and if that were the whole story I would agree, but unless you are going to pay to audit all consumers of the data worldwide, this solution is more pragmatic. I am not sure what we gain by letting company names contain code.
2 comments

That's the thing, you don't have to audit. You put your own harmless but malicious-looking, code-based company names in, and people immediately learn to deal with it.

It's WAY less pragmatic to test every company name for potential malicious actions in other people's code that you don't own.
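
For what it's worth, the "seed the dataset with harmless canaries" idea can be sketched roughly like this. Everything here is an assumption for illustration: the canary names, `render_row` (standing in for some consumer's output code), and `leaks_markup` are all made up, and the check only looks for one kind of leak (a raw `<script` surviving into HTML output).

```python
import html

# Hypothetical canary names: harmless, but they trip naive handling.
CANARY_NAMES = [
    '<script>alert("canary")</script> LTD',
    "ROBERT'); DROP TABLE COMPANIES;-- LTD",
]

def render_row(name):
    """Stand-in for a consumer under test: renders one table row, escaping the name."""
    return f"<tr><td>{html.escape(name)}</td></tr>"

def leaks_markup(render, names=CANARY_NAMES):
    """Return the canary names whose script tag survives rendering unescaped."""
    return [n for n in names if "<script" in render(n).lower()]

# A consumer that escapes on output passes; one that interpolates raw text does not.
print(leaks_markup(render_row))
print(leaks_markup(lambda n: f"<tr><td>{n}</td></tr>"))
```

A consumer whose pipeline echoes a canary back unescaped finds out from a harmless name rather than a hostile one, which is the point being argued above.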

You are right, but it would have been best to do that on day 1, which was probably in the 1970s or whenever a database of company names first existed. In the case of HTML script exploits, maybe the 1990s.

So you have a transitioning issue. If you suddenly allow a company name that loads a script from a domain its owners control, that is too dangerous.

Test data like you mentioned is a great idea to increase resilience. However, I don't think that raises the overall ecosystem of consumers of this data to the level where it is safe to release actual exploits into the dataset.

Downvoters are probably thinking in purist terms. They are thinking "everyone in the world should make their systems 100% secure against common exploits and let a company name be an arbitrary string".

The problem is that this is not realistic.

It works at a corporate level but not across all actors who interact with this dataset and the global internet. You can "should" at them all you like but no one has control over this.

The government can choose: more exploits in the wild or fewer. Allowing script URLs they don't control in company names is the former.

For the register of companies in England & Wales, day 1 would have been the 5th of September, 1844.

I think we can forgive the young William Gladstone (who was President of the Board of Trade at the time) for not fully anticipating how difficult robust string handling would turn out to be!

So you're right, this could only ever be approached as a transitioning issue.

That doesn't test things in a useful way, and relies on having an official dataset lie. Good ingestion code should ignore those, and then you're not even testing the frontend of those systems.
By disallowing, we normalise deviance (security-wise).

Also, there can be a problem with who decides what is code, and how. There are already myriad programming languages, and for trolling or legal attack purposes, one could build an interpreter that uses arbitrary words as keywords (to make problems for an arbitrary company).

> there can be a problem with who/how decides what is code.

Blocking names that look like code is part of a defence-in-depth approach; it's not a standalone silver bullet.
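
To make "names that look like code" concrete, here is a rough sketch of what such a first-layer check might look like. The patterns and the `looks_like_markup` name are hypothetical, not any registrar's actual rules, and they only target web markup. A heuristic like this will have false positives and false negatives, so output escaping in the consuming systems is still needed behind it.

```python
import re

# Hypothetical patterns for illustration only; not any registrar's actual rules.
SUSPICIOUS = [
    re.compile(r"<\s*/?\s*[a-z][^>]*>", re.IGNORECASE),  # anything shaped like an HTML tag
    re.compile(r"javascript\s*:", re.IGNORECASE),        # javascript: URLs
    re.compile(r"\bon\w+\s*=", re.IGNORECASE),           # inline handlers such as onerror=
]

def looks_like_markup(name):
    """Return True if the proposed name matches any of the suspicious patterns."""
    return any(p.search(name) for p in SUSPICIOUS)

for candidate in [
    '"><SCRIPT SRC=HTTPS://EXAMPLE.INVALID/X.JS></SCRIPT> LTD',
    "MARKS & SPENCER PLC",
    "IF HAPPY BEGIN SOMETHING END LTD",
]:
    print(candidate, "->", "reject" if looks_like_markup(candidate) else "accept")
```

Note that a name made of ordinary words passes even if those words are keywords in some language, since the check is about markup shapes rather than vocabulary.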

I meant abuse scenarios.

Laws are eventually used not as intended, but as written.

“defense[1]”, “if happy begin something end”, “if”. All of these are technically code (somewhere). Also check out an esoteric language like: https://en.m.wikipedia.org/wiki/Whitespace_(programming_lang...