
by sethhochberg 237 days ago

I'm leading a team that is doing an incremental migration to Gleam by delegating specific parts - we're basically pushing the "functional core, imperative shell" pattern to its macro extreme and having Gleam pick up background jobs from our existing Ruby on Rails codebases where we have heavy calculation tasks.

This approach notably makes little to no use of OTP features, but it was an extremely easy way to adopt a safe, fast, functional language for number-crunching workflows while continuing to lean on Rails for everything else it's great at: web interfaces, HTTP APIs, etc.

Rails is basically the configuration tool for the various inputs of a job, and each job is passed to Gleam via Redis as an atomic set of config inputs to be used when processing a dataset (usually big CSV files streamed from object storage). We use a very thin Elixir wrapper to do all network and file IO; the Gleam modules are pure business logic called from Elixir.
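As a rough illustration of that split (not the team's actual code, which is Elixir and Gleam), here is a minimal Python sketch of "functional core, imperative shell". The `JobConfig` fields, the `amount` column, and the `open_dataset` callable are all hypothetical; the thread does not describe the real payload or dataset layout, and the Redis hop is stood in for by a plain JSON string.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class JobConfig:
    # Hypothetical fields: the real job payload is not described in the thread.
    dataset_url: str
    multiplier: float


def parse_job(payload: str) -> JobConfig:
    """Decode one self-contained job payload (e.g. a string popped from a queue)."""
    data = json.loads(payload)
    return JobConfig(
        dataset_url=data["dataset_url"],
        multiplier=float(data["multiplier"]),
    )


def compute_total(config: JobConfig, amounts) -> float:
    """Pure core: no IO; the same inputs always produce the same output."""
    return sum(amount * config.multiplier for amount in amounts)


def run_job(payload: str, open_dataset) -> float:
    """Imperative shell: performs the IO, then hands plain values to the core.

    `open_dataset` is an assumed callable that takes a URL and returns a
    text stream usable as a context manager.
    """
    config = parse_job(payload)
    with open_dataset(config.dataset_url) as stream:
        reader = csv.DictReader(stream)
        amounts = (float(row["amount"]) for row in reader)
        return compute_total(config, amounts)


# Usage with an in-memory stand-in for object storage.
payload = json.dumps({"dataset_url": "memory://example", "multiplier": 2})
print(run_job(payload, lambda url: io.StringIO("amount\n1.5\n2.5\n")))  # prints 8.0
```

Because `compute_total` never touches the network or the filesystem, it can be tested with plain lists, which is the main appeal of keeping the business logic in a separate pure module.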

Some day soon, I'm going to try and write up a longer technical article about this approach... it comes up surprisingly often in HN conversations.

1 comments

cedws 237 days ago

Did I understand correctly that you're using it for file processing? If so, does it yield reliability benefits? We have an assortment of jobs written in Go that process files of various types (CSV, Parquet, TXT) in S3 too. The issue we have is that our Kubernetes jobs crash all the time when they encounter something unexpected. Obviously we should invest in making them more robust, but what we really want is some way for the jobs to continue processing whatever they can instead of crashing and starting over.

sethhochberg 237 days ago

In our case, the files being processed are datasets that have already been normalized through another ETL tool. Since we're doing the preprocessing ourselves elsewhere, our Gleam parsers are set up to expect a pretty rigid set of inputs. We do all of the file IO / streaming in Elixir and pass the raw data into Gleam as Elixir maps: so Gleam just takes maps, parses them into types pretty rigidly, and our entire Gleam module ecosystem assumes "perfect enough" data.
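To make "takes maps, parses them into types pretty rigidly" concrete, here is a small Python sketch of strict map-to-type parsing. The `Reading` record, its fields, and the `RowError` exception are hypothetical names chosen for illustration; the real Gleam types are not shown in the thread, and Gleam would typically return a `Result` value rather than raise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    # Hypothetical record shape, used only for illustration.
    sensor_id: str
    value: float


class RowError(ValueError):
    """Raised when a raw map does not match the expected shape."""


def parse_reading(raw: dict) -> Reading:
    """Convert a raw map into a typed record, rejecting anything unexpected."""
    expected = {"sensor_id", "value"}
    if set(raw) != expected:
        raise RowError(f"expected keys {sorted(expected)}, got {sorted(raw)}")

    sensor_id = raw["sensor_id"]
    if not isinstance(sensor_id, str) or not sensor_id:
        raise RowError(f"sensor_id must be a non-empty string: {sensor_id!r}")

    try:
        value = float(raw["value"])
    except (TypeError, ValueError):
        raise RowError(f"value is not numeric: {raw['value']!r}") from None

    return Reading(sensor_id=sensor_id, value=value)
```

Once a row has passed through a parser like this, downstream code can assume well-formed data and skip defensive checks, which is roughly what "perfect enough" means here.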

If we encounter row-level errors in a batch, we log those alongside the outputs. There's nothing particularly intrinsic about our usage of Gleam that prevents the workers from crashing during processing; it's all about having error handling set up within the job itself to avoid killing the process or pod running it.
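The "log row-level errors alongside the outputs" idea is language-independent. A minimal Python sketch, assuming a hypothetical per-row calculation over `quantity` and `unit_price` columns (neither appears in the thread), could look like this:

```python
from dataclasses import dataclass, field


@dataclass
class BatchResult:
    outputs: list = field(default_factory=list)
    errors: list = field(default_factory=list)  # (row_number, message) pairs


def process_row(row: dict) -> float:
    # Hypothetical calculation; stands in for the real business logic.
    return float(row["quantity"]) * float(row["unit_price"])


def process_batch(rows) -> BatchResult:
    """Process every row, recording failures instead of aborting the batch."""
    result = BatchResult()
    for row_number, row in enumerate(rows, start=1):
        try:
            result.outputs.append(process_row(row))
        except (KeyError, TypeError, ValueError) as exc:
            # A bad row is recorded and skipped; the loop keeps going.
            result.errors.append((row_number, f"{type(exc).__name__}: {exc}"))
    return result
```

Only the expected data errors are caught, so a genuine bug or an out-of-memory condition still surfaces rather than being silently swallowed. The job can then write `errors` next to `outputs` and exit normally, so the pod is not restarted over a single malformed row.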
