by lifeisstillgood 4463 days ago
Surely there are difficulties in expecting data providers to produce their data in standard formats across industries and countries? I am naive as to how much and what data is available, but that seems a stretch.

If you're interested, take a look at my project for unifying bike-sharing network data. Besides a public API, we also provide a Python library that accesses the different sources and abstracts them under the same model [1, 2].

There are a lot of accessible sources (though undocumented), but there are also clear examples of how one should never provide a service! Some examples: [3, 4]

What I was referring to, though, was a way to avoid building an intermediate server that scrapes services that are perfectly usable (JSON, XML), just because we (all) prefer to build clients that understand a single (standard) type of feed.

Maybe it's not about designing a language, but just a new way of doing things. Let's say I provide the client with clear instructions on how to use a service: its format, and where to find the fields that the client understands (in an XPath-like syntax).

That should be enough to avoid periodically scraping well-behaved servers, while still being able to build client apps without implementing all the differences between feeds. Besides, it would avoid getting banned for accessing a service too many times, and would give data providers insight into who is really using their data.

Let's say we want to unify the data in Feed A and Feed B. The model is about foos and bars:

    Feed A:
    {
      "status": "ok",
      "foobars": [
        {
          "name": "Foo",
          "bar": "Baz"
        }, ...
      ]
    }

    Feed B:
    [{"n": "foo","info": {"b": "baz"}},...]

We could provide:

    {
      "feeds": [
        {
          "name": "Feed A",
          "url": "http://feed.a",
          "format": "json",
          "fields": {
            "name": "/foobars//name",
            "bar": "/foobars//bar"
          }
        },
        {
          "name": "Feed B",
          "url": "http://feed.b",
          "format": "json",
          "fields": {
            "name": "//n",
            "bar": "//info/b"
          }
        }
      ]
    }

That way, we would not have to run a service ourselves that accesses Feed A and Feed B every minute just because we want to ease things on the client.

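
To make the idea a bit more concrete, here is a rough Python sketch of how a client might apply such a descriptor. Everything in it is an assumption made for illustration: the helper names (`parse_path`, `descendants`, `select`, `unify`) are hypothetical, and the path syntax is read as "`/key` means a direct child, `//key` means a key anywhere below". Fetching the feed from its `url` is left out, so the sample documents are inlined.

```python
import re


def parse_path(path):
    """Split '/foobars//name' into [('/', 'foobars'), ('//', 'name')]."""
    return re.findall(r"(//|/)([^/]+)", path)


def descendants(node):
    """Yield the node itself and everything nested in it, in document order."""
    yield node
    if isinstance(node, dict):
        for value in node.values():
            yield from descendants(value)
    elif isinstance(node, list):
        for item in node:
            yield from descendants(item)


def select(doc, path):
    """Return every value the XPath-like path points at inside a parsed JSON doc."""
    nodes = [doc]
    for axis, key in parse_path(path):
        if axis == "/":
            candidates = nodes  # direct children only
        else:
            candidates = [d for n in nodes for d in descendants(n)]  # any depth
        nodes = [c[key] for c in candidates if isinstance(c, dict) and key in c]
    return nodes


def unify(doc, fields):
    """Build records in the common model.

    Assumes every record in the feed carries every field, so the selected
    values line up by position.
    """
    columns = {name: select(doc, path) for name, path in fields.items()}
    return [dict(zip(columns, values)) for values in zip(*columns.values())]


# Sample documents shaped like Feed A and Feed B above.
feed_a = {"status": "ok", "foobars": [{"name": "Foo", "bar": "Baz"}]}
feed_b = [{"n": "foo", "info": {"b": "baz"}}]

print(unify(feed_a, {"name": "/foobars//name", "bar": "/foobars//bar"}))
# [{'name': 'Foo', 'bar': 'Baz'}]
print(unify(feed_b, {"name": "//n", "bar": "//info/b"}))
# [{'name': 'foo', 'bar': 'baz'}]
```

The client only needs this one generic reader plus the descriptor; supporting a new feed would mean adding an entry to `feeds` rather than writing new parsing code. A real implementation would have to deal with missing fields, XML feeds and key names that repeat at different depths, none of which this sketch attempts.
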
Not sure if that's what you asked, though.

[1]: http://citybik.es

[2]: http://github.com/eskerda/pybikes

[3]: https://github.com/eskerda/PyBikes/blob/experimental/pybikes...

[4]: https://github.com/eskerda/PyBikes/blob/experimental/pybikes...

OK, different feeds, same domain: unifying the model is feasible, either as an intermediate server or as a client "template thing".

Thank you, that makes sense. I was thinking of different data feeds in different domains.