by afpx
2911 days ago
Could you talk a bit about the challenges that you faced in developing this? And the before and after benchmarks? I tried something kinda similar to help with tuning data engineering jobs and pipelines for performance and costs. But it turned out to be a fruitless activity because there were too many variables that affected performance. I’d produce some models that seemed to be marginally effective. But after code changes, configuration changes, and changes to data input sizes, the models quickly became stale and ineffective.
(Isn’t all ML about good labels and features? :-) )
Structured ML systems require you to provide (ideally) unambiguous examples of expected behavior. In the case of Spiral (or any other online learning system), such examples need to be generated automatically. In our experience this part took a good amount of effort: distributed systems issues (aka race conditions and transient bugs in remote systems) made automatic generation of “clean” examples difficult. Once the bulk of these problems was addressed, the system began to operate very smoothly. Specifically, it adapted to changing conditions very well.
Spiral is designed to be a drop-in replacement for hand-coded heuristics. In other words, if you had a somewhat working tree of if-else statements that specified your image caching policy (if size<100k and type==jpeg..), you should already have an idea of what features to use. There is a bit of work involved in translating these features into a form suitable for the classifiers in Spiral. For example, if a classifier is using binary features, the file size feature would need to be quantized (123kb -> “100-200kb bucket”). While this type of work requires forethought and effort, the runtime cost of running this classifier is very low.
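
To make the quantization step concrete, here is a rough sketch in Python of turning a file size and type into binary features. This is not Spiral's actual API: the bucket edges, the `SIZE_EDGES`, `size_bucket` and `to_binary_features` names, and the `"image/jpeg"` type string are all made up for illustration.

```python
from bisect import bisect_right

# Hypothetical bucket edges in bytes (assumption, not taken from Spiral).
# They define the buckets: <100k, 100-200k, 200-500k, 500k-1M, >=1M.
SIZE_EDGES = [100_000, 200_000, 500_000, 1_000_000]


def size_bucket(size_bytes: int) -> int:
    """Return the index of the size bucket that size_bytes falls into."""
    return bisect_right(SIZE_EDGES, size_bytes)


def to_binary_features(size_bytes: int, mime_type: str) -> list[int]:
    """One-hot encode the size bucket, then append an is-JPEG flag."""
    features = [0] * (len(SIZE_EDGES) + 1)
    features[size_bucket(size_bytes)] = 1
    features.append(1 if mime_type == "image/jpeg" else 0)
    return features


# A 123 KB JPEG lands in the "100-200k" bucket (index 1).
example = to_binary_features(123_000, "image/jpeg")
```

The lookup is one binary search over a handful of edges, which is the kind of cheap per-request work the low runtime cost refers to; picking sensible edges is where the forethought goes.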