Hacker News
by tantalor 3071 days ago
Very confusing. You say it uses JSON/XML (for data encoding) and also the "problem" is "how information is encoded". What is the encoding problem? And how could JSON/XML be the problem? These are fairly simple encoding formats and well understood.

Are you referring to the data model as the problem? How exactly?


The data model is a big part of the problem. There are lots of different ways to encode the same information within HL7. People refer to different "flavors" of HL7, different ones for different implementations (and implementations are not just vendor specific, but site specific — EHR software is very heavily customized for each health system). Add to that doctors and nurses entering information in their own ways within a given health system (since the software isn't very clear or usable), and the data that's getting transferred is a huge mess.

Beyond that, a lot of the most important information is encoded in free text fields, and so isn't directly analyzable. And even when information is mapped to codes from standard medical ontologies, there's no guarantee that when that information is transferred in HL7 formats it includes the code from the ontology.
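A rough illustration of the "code may or may not be there" problem, assuming the value arrives in an HL7 v2 CE/CWE-style coded field (identifier^text^coding system). The helper names are hypothetical, the sample values are made up, and `I10` is used as the coding-system name on the assumption that it denotes ICD-10. The point is only that the same slot can carry a full code, bare free text, or a code with no system:

```python
from typing import NamedTuple, Optional


class CodedValue(NamedTuple):
    code: Optional[str]    # identifier, e.g. an ICD-10 code
    text: Optional[str]    # human-readable text, often typed in by hand
    system: Optional[str]  # name of the coding system the code comes from


def parse_coded_element(value: str) -> CodedValue:
    """Split a CE/CWE-style value (identifier^text^coding system).

    Only the first three components are read; empty components become None.
    """
    parts = (value.split("^") + ["", "", ""])[:3]
    code, text, system = (p or None for p in parts)
    return CodedValue(code, text, system)


def is_analyzable(cv: CodedValue) -> bool:
    """Only a code that also names its coding system can be mapped to an ontology."""
    return cv.code is not None and cv.system is not None


for raw in ("E11.9^Type 2 diabetes mellitus without complications^I10",
            "^sugar diabetes, poorly controlled^",
            "E11.9^^"):
    cv = parse_coded_element(raw)
    print(raw, "->", "coded" if is_analyzable(cv) else "free text / incomplete")
```

Only the first value is usable for analysis; the second is pure free text and the third has a code but no way to tell which ontology it belongs to.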

It's not at all clear what the optimal structure for medical information is, so it's no surprise that there's a huge amount of variance out in the world. HL7 is quite old (v2 was made in 1989), and every new variant has to support the existing variants. EHRs were originally designed around billing and administrative workflows, so it's also not surprising that the data structures aren't great for analyzing data or treating patients.

Not the poster you were replying to, but I do healthcare data integration for my day job.

The problem is that HL7 is ostensibly a standardized interchange format, but there's enough ambiguity in the spec that literally every vendor implements things differently, which leads to... my job existing.

Vendors implement the spec selectively. They may or may not support any given trigger event. They may have a different idea of what exactly constitutes something as basic as a patient account number and choose to send it in an unexpected field, or send a piece of data you weren't expecting there at all. There may also be a business case for capturing data that wasn't in the spec for the version of HL7 being used (email addresses are a common one today), which leads to user-defined fields being added ad hoc.
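In practice this usually ends up as per-feed configuration. Here is a minimal sketch, assuming default delimiters: the sender names, the `ZPI` segment, and the helper functions are all hypothetical (HL7 v2 reserves segment names starting with Z for locally defined segments, which is where ad hoc fields tend to land). PID-18 is the field the spec assigns to the patient account number:

```python
from typing import Optional

# Hypothetical per-sender configuration: where each feed actually puts the
# patient account number. PID-18 is the spec's field for it; "ZPI" is an
# invented site-defined Z-segment used only for illustration.
ACCOUNT_NUMBER_LOCATION = {
    "VENDOR_A": ("PID", 18),
    "VENDOR_B": ("ZPI", 2),
}


def segment_fields(message: str, segment_id: str) -> Optional[list[str]]:
    """Return the '|'-split fields of the first matching segment, or None.

    Assumes default delimiters and non-MSH segments (MSH numbering is offset
    by one because the field separator itself counts as MSH-1).
    """
    for raw in message.split("\r"):
        fields = raw.split("|")
        if fields[0] == segment_id:
            return fields
    return None


def account_number(message: str, sender: str) -> Optional[str]:
    """Look up the account number wherever this sender is configured to put it."""
    segment_id, n = ACCOUNT_NUMBER_LOCATION[sender]
    fields = segment_fields(message, segment_id)
    if fields is None or n >= len(fields):
        return None
    return fields[n] or None


# The same fact in two different places, depending on who sent it.
msg_a = "|".join(["PID", "1"] + [""] * 16 + ["ACCT-778"])  # PID-18 populated
msg_b = "PID|1||12345^^^HOSP^MR\rZPI|1|ACCT-778"
print(account_number(msg_a, "VENDOR_A"), account_number(msg_b, "VENDOR_B"))
```

Note that reading `msg_b` with VENDOR_A's rule quietly returns `None`, which is exactly the kind of mismatch that surfaces during an integration.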

Honestly, working with HL7 v2 messages like the one posted above isn't substantially harder than working with CSVs. The real headache comes from actually integrating the underlying data.
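To make the CSV comparison concrete, here is a minimal Python sketch of splitting an HL7 v2 message into segments and fields. The sample is a made-up ADT^A01, not the message posted upthread, and `parse_segments`/`field` are hypothetical helpers. It assumes the default delimiters and ignores repetitions (`~`), subcomponents (`&`), and escape sequences, all of which a real parser has to handle:

```python
# Made-up ADT^A01 message; HL7 v2 separates segments with carriage returns.
SAMPLE = (
    "MSH|^~\\&|SENDING_APP|SENDING_FAC|RECV_APP|RECV_FAC|202001011200||ADT^A01|MSG0001|P|2.3\r"
    "PID|1||12345^^^HOSP^MR||DOE^JANE||19800101|F\r"
)


def parse_segments(message: str) -> dict[str, list[list[str]]]:
    """Map segment ID -> occurrences, each a list of raw '|'-separated fields."""
    segments: dict[str, list[list[str]]] = {}
    for raw in message.replace("\n", "\r").split("\r"):
        if raw:  # skip blank lines left by trailing or doubled separators
            fields = raw.split("|")
            segments.setdefault(fields[0], []).append(fields)
    return segments


def field(fields: list[str], n: int) -> str:
    """Return field n (1-based, as in 'PID-5'), or '' if the field is absent."""
    if fields[0] == "MSH":
        if n == 1:
            return "|"  # MSH-1 is the field separator character itself
        index = n - 1   # so every later MSH field sits one slot earlier
    else:
        index = n
    return fields[index] if index < len(fields) else ""


segs = parse_segments(SAMPLE)
msh, pid = segs["MSH"][0], segs["PID"][0]
print(field(msh, 9))              # message type: ADT^A01
print(field(pid, 5).split("^"))   # name components: ['DOE', 'JANE']
```

The MSH-1 quirk is about the only structural surprise; everything else really is split-on-a-character, which is why the parsing is rarely the hard part.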

Poster of the V2 message here. You're describing the integration problem, and that is exactly the problem we face. As far as I understand it, isn't integration exactly what the format is for? It doesn't do it well.
The standard will generally get you 80-90% of the way there.

There are a lot of factors that go into why the standard fails to be plug-and-play. The fact that v2 is essentially a glorified, somewhat standardized CSV instead of a prettier JSON has next to nothing to do with it.

troyastorino's sibling comment nails a lot of it. There's no standard model for the underlying data, which makes it incredibly difficult, if not impossible, to have a standard transmission format for the data. Literally every individual facility you'll look at is unique and will have their own registration workflows, code sets, etc.

The old V2 spec isn't what I'd call good, but it works. It's ugly to look at, but it's not difficult to work with, either.

The problems you're addressing, however, are far more fundamental to the industry itself and aren't going to be solved by an interchange format.

Maybe so, but I would argue a lot of the key information isn't that different for each type of medical event. (I'm leaving aside scheduling and insurance claims for now because I'm less familiar with them, but there are probably still some commonalities.)

Each medical event should have:

- Patient it relates to

- Date it happened (possibly date it started and date it ended instead)

- Who did/prescribed/ordered it

- List of medical codes+coding system tuples that happened on that event

There's tons of other information, of course, but these very basic things are universal and should always be in the same place (I'm thinking of FHIR here, but the format is somewhat irrelevant if the API is good). I understand they're not, for historical reasons, and that some might complain because it doesn't exactly fit how they think about things, but a consistent API provides more value, and I think it will lead to better processes down the line.
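As a sketch of what "always in the same place" could look like, here are the four universal fields as one small record, plus a method that emits a dict shaped like a FHIR R4 Procedure. Choosing Procedure is an assumption (an order or medication event would map to a different resource type), `status` is hard-coded for brevity, and the class, the local code, and the `urn:example:` system are hypothetical:

```python
import json
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class MedicalEvent:
    patient_id: str              # patient it relates to
    start: date                  # date it happened / started
    end: Optional[date]          # None for point-in-time events
    performer_id: str            # who did / prescribed / ordered it
    codes: list[tuple[str, str]] = field(default_factory=list)  # (code, system)

    def to_fhir_procedure(self) -> dict:
        """Render as a FHIR R4 Procedure-shaped dict (status assumed 'completed')."""
        period = {"start": self.start.isoformat()}
        if self.end is not None:
            period["end"] = self.end.isoformat()
        return {
            "resourceType": "Procedure",
            "status": "completed",
            "subject": {"reference": f"Patient/{self.patient_id}"},
            "performedPeriod": period,
            "performer": [{"actor": {"reference": f"Practitioner/{self.performer_id}"}}],
            "code": {"coding": [{"system": s, "code": c} for c, s in self.codes]},
        }


event = MedicalEvent("123", date(2020, 1, 1), None, "dr-9",
                     [("CHEST-XRAY", "urn:example:local-procedure-codes")])
print(json.dumps(event.to_fhir_procedure(), indent=2))
```

Keeping the code and its system together as a pair is the important part: a bare code without its system is not portable between sites.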

HL7 is moving forward with FHIR. I think it's a good start, and I look forward to seeing where it ends up.

>Maybe so, but I would argue a lot of the key information isn't that different for each type of medical event.

Well, I mean, that's pretty much the entire basis of the HL7 segment paradigm.

>- Patient it relates to

This is a much, much, much harder problem than you'd think.

Patients are going to have multiple identifiers attached to them and resolving them cleanly is literally an industry of its own within healthcare.
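A deliberately naive sketch of the linkage part: treat each record as a node and merge any two records that share an exact (assigning authority, identifier) pair, using union-find. The record IDs and identifiers are hypothetical. Production master patient index systems generally go much further (probabilistic matching on demographics, handling typos and reused numbers), and this version also shows the risk: one wrongly shared identifier merges two different people.

```python
class DisjointSet:
    """Minimal union-find with path halving."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def link_records(records: dict[str, list[tuple[str, str]]]) -> list[set[str]]:
    """Group record IDs that share any (assigning authority, identifier) pair."""
    ds = DisjointSet()
    first_owner: dict[tuple[str, str], str] = {}
    for record_id, identifiers in records.items():
        ds.find(record_id)  # register records that share nothing
        for ident in identifiers:
            if ident in first_owner:
                ds.union(first_owner[ident], record_id)
            else:
                first_owner[ident] = record_id
    groups: dict[str, set[str]] = {}
    for record_id in records:
        groups.setdefault(ds.find(record_id), set()).add(record_id)
    return list(groups.values())


records = {
    "r1": [("HOSP_A", "MRN1")],
    "r2": [("HOSP_A", "MRN1"), ("LAB", "L77")],  # bridges r1 and r3
    "r3": [("LAB", "L77")],
    "r4": [("HOSP_B", "MRN9")],
}
print(link_records(records))
```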

And that's precisely why it's a common problem during integrations: which identifier gets used, and how, is generally a workflow and design decision made for a specific site-level implementation.
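Concretely, PID-3 can carry several repeating CX identifiers (ID^check digit^check digit scheme^assigning authority^identifier type code), and each site decides which one counts as the patient ID. A minimal sketch, assuming default delimiters; the site names, authorities, and rule table are hypothetical, and `MR`/`PI` stand for medical record number and patient internal identifier:

```python
from typing import Optional

# Hypothetical per-site rule: which (assigning authority, identifier type)
# in PID-3 is treated as the primary patient identifier.
PRIMARY_ID_RULE = {
    "SITE_NORTH": ("HOSP", "MR"),
    "SITE_SOUTH": ("EMPI", "PI"),
}


def primary_identifier(pid3: str, site: str) -> Optional[str]:
    """Pick the PID-3 repetition that matches the site's configured rule."""
    authority, id_type = PRIMARY_ID_RULE[site]
    for repetition in pid3.split("~"):            # PID-3 may repeat
        comps = (repetition.split("^") + [""] * 5)[:5]
        # CX.4 is an HD value; compare only its first subcomponent (namespace).
        if comps[3].split("&")[0] == authority and comps[4] == id_type:
            return comps[0] or None               # CX.1, the ID itself
    return None


pid3 = "12345^^^HOSP^MR~E-998^^^EMPI^PI"
print(primary_identifier(pid3, "SITE_NORTH"), primary_identifier(pid3, "SITE_SOUTH"))
```

The same PID-3 yields a different "patient ID" depending on which site's rule is applied, which is the site-level decision described above.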

>- Date it happened (possibly date it started and date it ended instead)

>- Who did/prescribed/ordered it

These usually aren't sticking points for integration because they're the easy ones to get people to agree on.

>- List of medical codes+coding system tuples that happened on that event

These aren't really standardized at the industry level beyond ICD-10 diagnosis codes. Things like insurance provider codes, procedure codes, order codes, etc. are specific to individual sites; even things like ethnicity and gender codes vary by location.
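The usual workaround is a translation table per site. A minimal sketch for sex codes, assuming a target set loosely modeled on HL7 v2 administrative sex values (`M`, `F`, `U`); the site names and their local codes are invented for illustration:

```python
# Hypothetical per-site translation tables into one target code set.
SEX_CODE_MAPS = {
    "SITE_NORTH": {"M": "M", "F": "F", "U": "U"},
    "SITE_SOUTH": {"1": "M", "2": "F", "9": "U"},
    "SITE_EAST": {"MALE": "M", "FEMALE": "F", "UNK": "U"},
}


class UnmappedCodeError(ValueError):
    """Raised when a site sends a code its table does not cover."""


def translate_sex_code(site: str, raw: str) -> str:
    mapping = SEX_CODE_MAPS[site]
    key = raw.strip().upper()  # tolerate case and whitespace differences
    try:
        return mapping[key]
    except KeyError:
        # Fail loudly: silently passing an unknown local code through is
        # how bad data ends up downstream.
        raise UnmappedCodeError(f"{site}: no mapping for sex code {raw!r}") from None


print(translate_sex_code("SITE_SOUTH", "2"), translate_sex_code("SITE_EAST", " female "))
```

Raising on an unmapped value rather than guessing is a design choice: it turns a new local code into an interface ticket instead of a silent data-quality problem.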

I don't want it to sound like I'm down on FHIR or that I think HL7v2 is the greatest thing since sliced bread, because I don't think either is the case.

The point I'm getting at is that there are huge problems with healthcare data interchange that just plain aren't going to be solved by a better interchange format.