Hacker News
Show HN: Monty, Mongo tinified. MongoDB implemented in Python (github.com)
65 points by davidlatwe 1850 days ago
7 comments

Isn't the creator of MySQL named Monty?
Yeah, but I didn’t know that at the time I was naming the project. Just a coincidence. :)
NP, last time I read something from him was around 2000, when he postulated one should just solve referential integrity in your application.

Purely coincidental :-)

... back then, lots of projects were saying referential integrity was an operations problem. That was rubbish, and they knew it. Same for multimaster replication. Fortunately we're past that now. But I like MongoDB for its document model, not its service architecture (although it definitely gets the job done). This Monty is an interesting project, and I'm looking forward to future developments.
Its document model is vanilla BSON on disk.
Is it possible to update the BSON without loading the entire document into memory? I am curious how Mongo does updates.
Thanks for the kind words!
hahaha XD
It's nice but it's no Mongita ;) https://github.com/scottrogowski/mongita

In all seriousness, it seems like there's a lot of demand for embedded Mongo-like databases. Congrats on a successful Show HN post!

Thanks! Yeah, if the job can be done by installing one Python package, why use Docker :P
Hah, I love the name. Very clever.
In Node.js I use mongodb-memory-server [0] for unit testing.

Would this be able to fill that gap in Python?

[0] https://www.npmjs.com/package/mongodb-memory-server

Hmmm, better not to use it for unit tests, I think.

Had a quick look into that mongodb-memory-server, and it pulls a real MongoDB instance for testing, so it’s solid. MontyDB, on the other hand, is only a tiny, incomplete replica (only basic CRUD is implemented, no aggregation or other fancy stuff).

It could be used for a quick demo of an application that requires MongoDB, which was part of my motivation back then.

> Could be used for a quick demo for application that requires MongoDB

Only if the demo or production use requires just those basic operations, of course. The implemented ops are tested against a real MongoDB, so they should be good enough for basic usage.

IMO unit tests can be done against any implementation, even a mock one, so using Monty would be beneficial, if it supports all you need.

However, you need integration tests to ensure that such an implementation behaves the same as the MongoDB version you run in production. Such tests only need to be run when you change the implementation or upgrade the production MongoDB.
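To make the idea concrete, here is a minimal sketch of one shared check that can be run against any implementation. Everything in it is hypothetical: `InMemoryCollection` and `check_collection_contract` are illustrative names, and the fake's `insert_one` returns the id directly, which is a simplification and not how a real driver behaves.

```python
class InMemoryCollection:
    """Hypothetical fake covering a tiny slice of a Mongo-like collection."""

    def __init__(self):
        self._docs = []
        self._next_id = 1

    def insert_one(self, doc):
        # Copy the document so callers cannot mutate stored state.
        stored = dict(doc)
        stored.setdefault("_id", self._next_id)
        self._next_id += 1
        self._docs.append(stored)
        return stored["_id"]  # simplification: a real driver returns a result object

    def find_one(self, query):
        # Only plain equality matching is supported in this sketch.
        for doc in self._docs:
            if all(doc.get(key) == value for key, value in query.items()):
                return dict(doc)
        return None


def check_collection_contract(make_collection):
    """Run the same checks against any implementation.

    Returns a list of failure messages; an empty list means the checks passed.
    """
    failures = []
    coll = make_collection()
    coll.insert_one({"name": "parrot", "alive": False})

    found = coll.find_one({"name": "parrot"})
    if found is None or found.get("alive") is not False:
        failures.append("inserted document was not found by equality query")

    if coll.find_one({"name": "lumberjack"}) is not None:
        failures.append("query for a missing document should return None")
    return failures


if __name__ == "__main__":
    print(check_collection_contract(InMemoryCollection))
```

In unit tests you would pass the fake's factory; in the integration run you would pass a factory that returns a collection from the real backend, so both are held to the same expectations.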

Thanks, that's a shame. I started a new project in Python (FastAPI and MongoDB) and was looking for something to fill that gap.

This still looks awesome.

Thanks!
That is not a unit test anymore. No matter where you are writing your database. A unit test should test the unit and not rely on any dependency.
That means adapters and drivers are not unit testable.

This differentiation is quite pointless though. If you can spawn the dependency in memory at no cost, then why not?

Also not using dependencies is not the point of a unit test. Almost everything has dependencies.

All of the logic in the driver can be unit tested. Actually talking to the back end is functional testing.

It's a common mistake that people think that HTTP implementations can't be unit tested too, but the truth is a bit more complicated. These libraries are essentially a wire protocol and people think of them as 'wire-protocol' instead of 'wire, protocol'. If you think of it instead as a codec with IO, then you write the codec separately and you can fully test the codec without mocking any IO at all.
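A minimal sketch of the "codec separate from IO" idea, assuming a made-up length-prefixed framing format (4-byte big-endian length, then payload). `encode_frame` and `FrameDecoder` are hypothetical names, not part of any real driver or HTTP library. Note that nothing here touches a socket, so it can be tested with plain bytes.

```python
import struct


def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(payload)) + payload


class FrameDecoder:
    """Incremental decoder: feed it whatever bytes arrived, get whole frames back."""

    def __init__(self):
        self._buffer = b""

    def feed(self, data: bytes) -> list:
        self._buffer += data
        frames = []
        # Keep extracting frames while a full header and payload are buffered.
        while len(self._buffer) >= 4:
            (length,) = struct.unpack(">I", self._buffer[:4])
            if len(self._buffer) < 4 + length:
                break  # payload incomplete; wait for more bytes
            frames.append(self._buffer[4:4 + length])
            self._buffer = self._buffer[4 + length:]
        return frames


if __name__ == "__main__":
    decoder = FrameDecoder()
    wire = encode_frame(b"hello") + encode_frame(b"world")
    # Simulate a read that splits the stream mid-header.
    print(decoder.feed(wire[:3]))
    print(decoder.feed(wire[3:]))
```

The IO layer then shrinks to a loop that reads from the socket and calls `feed`, which is the only part that needs functional testing against a real back end.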

It's nearly tautological that code that is written to be unit tested can be unit tested, and code that isn't cannot be. And because people want to get their code 'working' and then prove that it works, they are battling uphill to get good tests written, because good tests can't be written. And then some people use this as confirmation bias for why they shouldn't have to write tests.

You can’t test the logic without the parts that interpret the logic. Your mock will fail if I change the name of an alias in my query. A database won’t.
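A small sketch of that alias point, using SQLite from the standard library as a stand-in for "a database" (an assumption for illustration; the thread is about MongoDB). `CANNED`, `mock_execute`, and `real_execute` are hypothetical names.

```python
import sqlite3

# A naive mock that only knows the exact query text it was recorded with.
CANNED = {"SELECT name AS n FROM pets": [("parrot",)]}


def mock_execute(sql):
    # Raises KeyError for any query text that differs from the recorded one.
    return CANNED[sql]


def real_execute(sql):
    # An in-memory SQLite database actually interprets the query.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE pets (name TEXT)")
        conn.execute("INSERT INTO pets VALUES ('parrot')")
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    renamed = "SELECT name AS pet_name FROM pets"
    print(real_execute(renamed))  # same rows, the alias rename is harmless
    try:
        mock_execute(renamed)
    except KeyError:
        print("mock broke on a harmless alias rename")
```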

Don’t confuse how it’s done with what is done.

Don’t confuse functional tests with unit tests. Both are necessary for systems with side effects (i.e., most systems).
If a unit test fails under those circumstances you have little assurance before investigating that changes to your unit caused the failure.

If you have to test scenarios in which other dependencies are needed, then you must move up the testing pyramid. That serves to put in check the expectations of what each type of test should accomplish, and when to execute those tests.

Not a DB expert.

Why write a DB program in Python? Wouldn't it be slow?

You get the legendary data integrity of MongoDB with the refinement of brand new software running at the speed of Python. What's not to like?
People write Raft papers with pseudocode or some hard-to-comprehend systems language. I've always wondered why there isn't a Jepsen-tested Python implementation with lots of GitHub stars.

Here is mine:

https://github.com/adsharma/raft

Waiting for a Python correctness prover and a transpiler.

Yep, it's slow. :)

It's not meant to be used in production that has scaled up. This project was mostly for fun.

I would use it in production. It just depends on what I'm producing. "Production" can range from a banking website serving millions (nope) to a kid's toy (yup).

There are a ton of uses where performance doesn't matter, and some where data is even ephemeral or non-critical. These sorts of simple tools are also really nice for test cases, development environments, and a ton of other uses.

I'm developing a tool designed for large-scale data processing, and I have a dummy back-end very similar to this which I use for development.

My computer runs at close to 4 GHz and has multiple cores. Essentially anything which ran on an 80486 back in the day will be fast enough in interpreted Python. That's actually a lot of stuff.

And yes, I'm not disagreeing, but agreeing and expanding on your easily-missed disclaimer ("production *that has scaled up*").

That’s exactly what I meant. :) And thanks for sharing!
I did the same to learn Redis. I wrote a simple graph database using Gremlin's syntax https://github.com/emehrkay/rgp

It was very slow, but interesting.

Interested to know if it can run on PyPy and what kind of difference that would make.
Thanks. That makes sense.
If I had the chance to write a piece of software whose description used the words Monty and Python, I would do that even if the result were as slow as a dead parrot.
Depends on what you're doing, I guess. If you're just working with Python dictionaries with occasional background flushes to disk then it would be very fast. Probably close to as fast as anything else. Of course, there's a lot more to a DBMS than just reading/writing in-memory data structures and occasionally saving them on your hard drive.
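For what it's worth, the "dictionary plus occasional flush" idea can be sketched in a few lines. This is only an illustration under stated assumptions: `DictStore` is a hypothetical name, keys must be strings and values JSON-serializable, and the flush happens synchronously every `flush_every` writes instead of in a background thread.

```python
import json
import os
import tempfile


class DictStore:
    """Hypothetical dict-backed store that persists to a JSON file now and then."""

    def __init__(self, path, flush_every=100):
        self.path = path
        self.flush_every = flush_every
        self._data = {}
        self._dirty = 0
        # Load a previous snapshot if one exists.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self._data = json.load(f)

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value
        self._dirty += 1
        if self._dirty >= self.flush_every:
            self.flush()

    def flush(self):
        # Write to a temp file in the same directory, then atomically swap it
        # in, so a crash mid-write never leaves a half-written snapshot.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self._data, f)
        os.replace(tmp_path, self.path)
        self._dirty = 0


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "store.json")
    store = DictStore(path, flush_every=2)
    store.set("a", 1)
    store.set("b", 2)  # second write triggers a flush
    print(DictStore(path).get("b"))
```

Reads and writes between flushes are plain dict operations, which is where the speed comes from; anything written since the last flush is lost on a crash, which is one of the many things a real DBMS has to handle.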
I wrote a Python-to-Rust transpiler (py2many), also as a fun project. I won't be surprised if writing a DB in Python actually becomes viable some years down the road due to the awesome tooling and the uninterrupted idea -> code flow that's possible.
You may want to write that, but why would someone want to use a database that could be 50x faster if it was written in a native language? If you write software for yourself in a slow scripting language, that is one thing, but it leaves a huge amount of performance on the table.

This is the same as the problem with Electron. People who only know JavaScript might think it is great to embed a full web browser, but it is selfish toward users to push something that they think will run at a normal speed, only to have it use 100x the memory and lag during simple operations.

The transpiler could in theory generate code in the native language. You can see for yourself:

https://news.ycombinator.com/item?id=27032399

Please file bugs/issues if something isn't working.

Look up Nubank and Datomic. You'll have an aneurysm.

(Edit: to be clear, I'm agreeing with you)

> I won't be surprised if writing a db in python actually becomes viable some years down the road due to the awesome tooling and the idea

Long-term, complicated, high-performance projects worked on by many developers are the Achilles heel of Python. The lack of type safety really bites over a large code base. There are also issues with automatic refactoring tools due to the very dynamic nature of Python. Deployment and dependency management are also a big issue in Python. Not to mention performance and multithreading.

Yeah, I had an API server to write. I looked at FastAPI and checked out the example project. So much tooling for formatting, type hinting, linting, deployment, etc. And while the project claims to be "comparable in speed to Go", the benchmarks they linked to showed that meant "significantly slower than". In the end I just went with Go instead. Python has its place, but you can avoid a lot of work by using something else sometimes.
Benchmarks like that are completely artificial anyway, because the real speed difference comes when the code becomes more complex and the dynamic language can no longer be reduced to something like those simple benchmarks, because it's not provable.

And God forbid someone mention the L1 cache and how "benchmarks" are completely different to the cache interactions in real-world dynamic programs.

Python has a PR problem:

* That it's a dynamically typed language
* That it's not a serious language like C or C++, and is suitable only for writing a 50-line throwaway script

I'd like to convince people that both statements are false. But it's probably best to use the GitHub issue tracker rather than HN comments.

Is MongoDB still a recommended production database? The answer seems to change based on what year the question was asked.
https://jepsen.io/analyses/mongodb-4.2.6

> MongoDB is a distributed document database which claims to offer “among the strongest data consistency, correctness, and safety guarantees of any database available today”, with “full ACID transactions”. Jepsen evaluated MongoDB version 4.2.6, and found that even at the strongest levels of read and write concern, it failed to preserve snapshot isolation. Instead, Jepsen observed read skew, cyclic information flow, duplicate writes, and internal consistency violations. Weak defaults meant that transactions could lose writes and allow dirty reads, even downgrading requested safety levels at the database and collection level. Moreover, the snapshot read concern did not guarantee snapshot unless paired with write concern majority—even for read-only transactions. These design choices complicate the safe use of MongoDB transactions.

Not just yet, stay tuned for the /dev/null-based storage engine!
It's already implemented, actually :') Trust me, I'm an engineer :D https://github.com/mongodb/mongo/blob/72ed8227aa029afd554aa5...