by 3pt14159 1360 days ago
Don't use this. Marshal has too many issues. If you really need persistence and can't use something like Postgres, use the Ox gem instead. It's more reliable between versions of Ruby and easier to parse from other languages if you ever have to.
3 comments

> use the Ox gem

The main thing is that it's part of the standard library. If you import a gem anyway, often you'd be well off with sqlite.

As for storage format, there's also:

https://ruby-doc.org/stdlib-3.1.2/libdoc/yaml/rdoc/YAML/Stor...

I love the simplicity of YAML::Store. It was introduced in Ruby 1.8, almost 20 years ago (https://github.com/ruby/ruby/commit/55f4dc4c9a5345c28d0da750...).
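
For anyone who hasn't used it, here is a minimal sketch of what YAML::Store looks like in practice. The file name and the "posts" key are made up for illustration, and I'm assuming a Ruby where yaml/store (and the pstore library it builds on) is available:

```ruby
require "yaml/store"
require "tmpdir"

# Hypothetical file name; YAML::Store creates the file on the first write.
path = File.join(Dir.mktmpdir, "posts.yml")
store = YAML::Store.new(path)

# Writes happen inside a transaction and are saved when the block ends.
store.transaction do
  store["posts"] ||= []
  store["posts"] << { "id" => 1, "title" => "Great post!", "body" => "Lorem ipsum..." }
end

# Passing true opens a read-only transaction; the block's value is returned.
titles = store.transaction(true) do
  store["posts"].map { |post| post["title"] }
end

puts titles.inspect
puts File.read(path) # the stored data is plain, human-readable YAML
```

Sticking to strings, numbers, arrays and hashes keeps the file easy to read from other tools.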

I even created a little gem when I was starting with Ruby, 10 years ago, that was a very thin wrapper around it so that I could play around using an ActiveRecord-like syntax (https://github.com/brunnogomes/active_yaml). I used it in some pet projects so I could do stuff like:

  p = Post.new
  p.title = "Great post!"
  p.body = "Lorem ipsum..."
  p.save

  Post.all # => [#<Post:0x895bb38 @title="Great post!", @body="Lorem ipsum...", @id=1>]

  Post.find(1) # => #<Post:0x954bc69 @title="Great post!", @body="Lorem ipsum...", @id=1>

  Post.where(author: 'Brunno', visibility: 'public')
  # => [#<Post:0x895bb38 @author="Brunno", @visibility="public", @id=1>, #<Post:0x457pa36 @author="Brunno", @visibility="public", @id=2>]

And have access to the data directly in the YAML files.

Good times!

The problem with YAML is that meaningful whitespace means that the size grows quickly for highly nested documents. I don't love XML, but there is a reason I recommended Ox. I've used it for real projects and it never fell over like so many of the alternatives I've tried where databases were not in the cards.
The problem with XML is that angle bracket expressions take up too much space because you need to duplicate element names. I don't love JSON, but there is a reason I recommend Oj.

...

The problem with JSON is that the keys take up too much space because they are duplicated. I don't love BSON, but there's a reason why I recommend bson-ruby.

And I could keep going... ;)

The benefit of using YAML is precisely that there's meaningful whitespace. Different strokes for different folks.

I don't get the value of "it's in the standard library". Ruby has the amazing (for scripts) require "bundler/inline", which lets you keep code and Gemfile in a single file and auto-installs the dependencies, so going for the standard library doesn't seem to provide any practical value except offline support.
I used pstore for an ad-hoc monitoring service on an outdated Windows server running an outdated Ruby version. It was easy to set up to run from Task Scheduler every five minutes and check the resident memory of an old Ruby service: logging the RAM and killing/restarting the service if it was over 1 GB (all of this on 32-bit Ruby, with its limit of 4 GB of address space per process).
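
Roughly what that kind of logging looks like with PStore, as a sketch only. The file name, the record_sample helper, the :samples key and the sample values are all hypothetical, and the actual memory measurement and process restart are left out:

```ruby
require "pstore"
require "tmpdir"

# Hypothetical log location for this illustration.
store = PStore.new(File.join(Dir.mktmpdir, "memory_log.pstore"))

# Append one reading and return true if the service is over the limit.
# The 1 GB default mirrors the threshold described above.
def record_sample(store, rss_bytes, limit: 1024**3, at: Time.now)
  store.transaction do
    store[:samples] ||= []
    store[:samples] << { at: at, rss: rss_bytes }
    store[:samples] = store[:samples].last(1000) # keep the log bounded
    rss_bytes > limit
  end
end

# A scheduled job would pass a real measurement; these values are made up.
restart_needed = [300 * 1024**2, 1200 * 1024**2].map { |rss| record_sample(store, rss) }
puts restart_needed.inspect

sample_count = store.transaction(true) { store[:samples].size }
puts sample_count
```

Each scheduled run opens the store, appends, and exits, so no long-lived process is needed.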

Sure there are many things that "should" have been fixed above - but just having any old ruby version on hand was enough to help check for a memory leak and mitigate it - while taking the time to figure out if the leak could be plugged.

And offline support (a server in dmz/locked down wrt new software) is big too!

Is Marshal still tied to Ruby version? Boy was this fun about ten years ago for a system I inherited that Marshaled huge complex objects into TokyoTyrant and back. You try migrating or upgrading a system where the runtime version is tied to EVERY object in a database.
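
For reference, a small sketch of how to inspect the version a Marshal stream carries. Every dump starts with a major and a minor format version byte, and Marshal.load checks them. This says nothing about whether the classes inside the dump still look the same, which is a separate compatibility problem:

```ruby
# The first two bytes of a Marshal dump are the format's major and minor version.
dump = Marshal.dump({ "name" => "example", "tags" => [1, 2, 3] })
major, minor = dump.unpack("C2")

puts "stream format: #{major}.#{minor}"
puts "this Ruby writes: #{Marshal::MAJOR_VERSION}.#{Marshal::MINOR_VERSION}"

# Simulate data written with a newer major version by bumping the first byte.
tampered = dump.dup
tampered.setbyte(0, major + 1)

error = begin
  Marshal.load(tampered)
  nil
rescue TypeError => e
  e
end

puts error.class
```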
> too many issues

Such as?

Marshal is Ruby's version of pickle in Python: it serializes arbitrary objects, which means that correct deserialization requires arbitrary code execution.

This is bad enough on its own, but it also makes pivoting a file read/write primitive into code execution much easier.
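
A harmless sketch of the mechanism, assuming a made-up Tracker class: the byte stream names the class, and Marshal.load runs that class's marshal_load hook while deserializing. Real attacks rely on hooks in classes the application has already loaded; this only shows that loading data runs code chosen by the data.

```ruby
# Hypothetical class: it only records that its hook ran during deserialization.
class Tracker
  @loaded = []

  class << self
    attr_reader :loaded
  end

  def initialize(name)
    @name = name
  end

  # Marshal.dump stores whatever this method returns.
  def marshal_dump
    @name
  end

  # Marshal.load calls this on a freshly allocated object (initialize is skipped).
  def marshal_load(name)
    @name = name
    Tracker.loaded << name # stand-in for any side effect
  end
end

bytes = Marshal.dump(Tracker.new("from the file"))
Marshal.load(bytes)

puts Tracker.loaded.inspect
```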

Why the "don't use it"? Just say "use it with caution" or, since we are being rude telling people what to do whenever pickle or marshal comes up, just don't say anything and assume people know what they are doing.
I don't think I phrased that in a particularly rude way, but I'm sorry if it came across as rude.

The answer is that we have serialization techniques that are as good on all the dimensions that matter (speed, serialized size, etc.) and better in terms of security. Pickle and Marshal are, at best, footguns in otherwise very safe language ecosystems.

> The answer is that we have serialization techniques that are as good on all the dimensions that matter

I'd look at that sentence with great skepticism. What could possibly surpass a conversion to raw object representation? Do you mean libraries which require you to use protocol languages like protobuf or inheritance?

https://github.com/ruby/psych defaults to only loading permitted classes since 4.0 so that seems less of a concern now?
`psych`, used for YAML, is a different thing than Marshal. pstore uses Marshal (https://ruby-doc.org/core-2.6.3/Marshal.html), so I don't believe psych is involved with pstore.
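
For the YAML side of this, a small sketch of what "permitted classes" means in psych. The Point class is hypothetical, and I'm assuming Psych 3.1 or newer, where safe_load accepts the permitted_classes keyword:

```ruby
require "yaml"

# Hypothetical class used only for this illustration.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

yaml = YAML.dump(Point.new(1, 2)) # emits a "!ruby/object:Point" tag

# Without an allow-list, safe_load refuses to build arbitrary objects.
rejected = begin
  YAML.safe_load(yaml)
  false
rescue Psych::DisallowedClass
  true
end

# Explicitly permitting the class lets the object be revived.
point = YAML.safe_load(yaml, permitted_classes: [Point])

puts rejected
puts point.x + point.y
```

Marshal has no equivalent allow-list in this sketch, which is why the two aren't interchangeable from a safety point of view.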

I'm honestly not sure, though, how much I should be worried about the fact that someone who has write access to my database can maybe escalate that to an arbitrary code execution if I use pstore. Literally not sure. Write access to my DB seems pretty disastrous already...

Pickle is fine (in a pinch). It's not meant for untrusted data.
Anything is fine when the data is trusted. The problem is that the data is almost never actually trusted :-)