| HN Mirror

by 0xbadcafebee 1849 days ago

> Where are the flying cars?

Understanding why we don't have flying cars will tell you how the entire world works. It turns out that "progress" has nothing to do with "want", and more to do with need, availability, and timing.

> They lack the structure to make the data they contain readily accessed programmatically, so step one is miserable screen scraping and data cleanup. That’s where a lot of people give up.

> We think we have an interesting way to fix this.

RDF? OWL? SKOS?

1 comments

bena 1849 days ago

All of those address the issue from the other side.

If I'm aware of the Semantic Web, agree with their conclusions, and wish to do so, I can make my site Semantic. That's true. And if I wanted to publish data and make sure others could access it, I would.

It's not required, however. And if I want to make my data only accessible through my portal, I have a vested interest in making my site as anti-Semantic as possible. Like, I don't think Multiple Listing Service (MLS) companies would ever make their sites Semantic. Nor would any company that consumes their data (like Zillow or Realtor). It's the data itself that has value, so they want to put hoops in front of it.

But it's technically publicly available data that they're publishing for free. If you acquire the facts themselves, it doesn't matter how; they can't do anything to you if you redistribute that data. The only thing they can do is make their site as difficult as possible to scrape.

Then there are the sites that don't care either way and making their site Semantic is only additional work. For example: Lego's storefront. All of that information is from their databases. They already have the information and don't care whether or not someone else has access to it. The information provides little to no value. So they have no incentive to invest resources in making that data more easily available to others.

So to get the information from these sites, you have to scrape. It's unfortunate, it's miserable, but those are the facts.


0xbadcafebee 1849 days ago

That's quite a fatalistic point of view for a non-technical problem. But who cares if scraping is involved? You can still use all the technologies I listed.

Looking at Hash.ai's product roadmap, it looks like they're building a "Splunk for simulations". And probably their big value-add, besides just a flashy tool, is going to be collecting and curating data. But they probably realize how much time and money it takes to constantly curate all that data. So probably what they are doing is building the tools so that individuals can curate their own data, by using a scraping tool, a transform tool, a load tool, etc. And if they want that data to be more useful within their own ecosystem, it would make sense to make all that data semantic, so that data from one customer is composable for another customer. The more taxonomies and semantic relationships that are built, the more intelligent the whole dataset would become over time. Enabling customers to build graphs of graphs would turn them into the world's premier crowd-sourced and curated data warehouse, netting them a trillion-dollar valuation in 10 years, without ever charging a single person for access, or paying for anything but infrastructure and a couple devs.

Or maybe it's something else, I dunno.
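
To make the "scrape, transform, load, then make it semantic" idea concrete, here is a minimal sketch of how two customers' scraped records about the same item could compose once both are expressed as triples. Everything in it is an assumption for illustration: the `http://example.org/` namespace, the `sku` key, the helper names, and the sample records are all made up, and it says nothing about how HASH actually works. It uses only the Python standard library and emits N-Triples-style lines.

```python
from urllib.parse import quote

# Hypothetical namespace; any stable URI prefix would do.
EX = "http://example.org/"


def to_triples(record, id_key="sku"):
    """Turn one scraped record (a flat dict) into (subject, predicate, object) triples."""
    subject = f"<{EX}item/{quote(str(record[id_key]))}>"
    triples = []
    for key, value in record.items():
        if key == id_key:
            continue
        predicate = f"<{EX}prop/{quote(key)}>"
        # Escape backslashes, quotes, and newlines so the literal stays on one line.
        literal = (
            str(value)
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )
        triples.append((subject, predicate, f'"{literal}"'))
    return triples


def merge(*triple_lists):
    """Union triples from several sources; identical statements collapse."""
    merged = set()
    for triples in triple_lists:
        merged.update(triples)
    return merged


def to_ntriples(triples):
    """Serialize triples as sorted N-Triples-style lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(triples))


# Made-up sample data: two customers scrape different facts about the same item.
customer_a = to_triples({"sku": "set-001", "name": "Example Set"})
customer_b = to_triples({"sku": "set-001", "pieces": 100})

graph = merge(customer_a, customer_b)
print(to_ntriples(graph))
```

Because both records resolve to the same subject URI, the two scrapes end up describing one node rather than two unrelated rows, which is the composability being speculated about above. Getting the records out of the source site in the first place is still ordinary scraping.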


bena 1848 days ago

I think we're thinking from two different perspectives.

You're considering what happens when the data is in HASH's possession. It'll likely be helpful for them to be Semantic, yes.

But that's not the only problem. The problem lies in the collection of the data as well. I'm thinking from the perspective of getting the data from somewhere to HASH. I'm pre-HASH.

HASH does look like a "what if we could just do the fun bit" solution. You do the grunt work of defining the parser and HASH does the fun bit of making the simulation software.
