by code_biologist
957 days ago
I don't do security, but I have been a data engineer for the better part of a decade and I don't understand what void and unstructured are. Am I the fool? I don't get it. The primitives of many of these ETL systems are structured tables (Snowflake, Parquet, pandas DataFrames, whatever) and I don't think I'd ever choose bytes over structured tables. The unstructured parts of data systems I've worked on have always chewed up an outsize portion of labor, with difficult-to-diagnose failure modes. The biggest cognitive-effort win of reverse ETL solutions has been to make external systems and applications "speak table".
In security, binary artifacts are common. For example, you run YARA rules against malware samples and produce a structured report (“table”). Turning packet traces into structured logs is another example. Typically you have to switch between a lot of tools for that, which makes the process complex.
(The “void” type is only there for symmetry, so that every operator has both an input and an output type. An operator with void as its input is a source, and one with void as its output is a sink. A “closed” pipeline is one that starts with a source and ends with a sink, and only closed pipelines can execute in our mental model.)
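To make the void/bytes/table idea concrete, here is a minimal sketch of that mental model in Python. It is not the actual implementation of any tool: the `ElemType`, `Operator`, `type_check`, and `is_closed` names are made up for illustration, the example operators (`load_file`, `yara_scan`, `write_table`) are hypothetical, and rejecting void in the middle of a pipeline is my own assumption.

```python
from dataclasses import dataclass
from enum import Enum


class ElemType(Enum):
    VOID = "void"    # nothing flows on this side of the operator
    BYTES = "bytes"  # unstructured data (malware sample, packet trace)
    TABLE = "table"  # structured data (report, logs)


@dataclass(frozen=True)
class Operator:
    name: str
    input: ElemType
    output: ElemType


def type_check(ops):
    """Return the (input, output) types of the whole pipeline.

    Raises TypeError if two adjacent operators do not fit together.
    """
    if not ops:
        raise ValueError("empty pipeline")
    for left, right in zip(ops, ops[1:]):
        # Assumption: void may only appear at the two ends of a pipeline.
        if left.output is ElemType.VOID or right.input is ElemType.VOID:
            raise TypeError(f"{left.name} | {right.name}: void in the middle")
        if left.output is not right.input:
            raise TypeError(
                f"{left.name} produces {left.output.value}, "
                f"but {right.name} expects {right.input.value}"
            )
    return ops[0].input, ops[-1].output


def is_closed(ops):
    """A pipeline is closed if it starts with a source and ends with a sink."""
    return type_check(ops) == (ElemType.VOID, ElemType.VOID)


# Hypothetical operators mirroring the YARA example above.
load_file = Operator("load_file", ElemType.VOID, ElemType.BYTES)     # source
yara_scan = Operator("yara_scan", ElemType.BYTES, ElemType.TABLE)    # bytes in, table out
write_table = Operator("write_table", ElemType.TABLE, ElemType.VOID) # sink

print(is_closed([load_file, yara_scan, write_table]))  # closed: can execute
print(is_closed([yara_scan, write_table]))             # open: still needs a source
```

In this sketch, `[load_file, write_table]` fails the type check because bytes do not match table. That is the point of keeping bytes as a first-class type: the step that turns bytes into a table is an ordinary operator inside the pipeline instead of a separate tool.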