Show HN: Monty, Mongo tinified. MongoDB implemented in Python

	Show HN: Monty, Mongo tinified. MongoDB implemented in Python (github.com)
	65 points by davidlatwe 1850 days ago

7 comments

the-dude 1850 days ago

Isn't the creator of MySQL named Monty?

davidlatwe 1850 days ago

Yeah, but I didn’t know that at the time I was naming the project. Just a coincidence. :)

the-dude 1850 days ago

NP. The last time I read something from him was around 2000, when he argued that you should just solve referential integrity in your application.

Purely coincidental :-)

ncphil 1850 days ago

... back then, lots of projects were saying referential integrity was an operations problem. That was rubbish, and they knew it. Same for multimaster replication. Fortunately we're past that now. But I like MongoDB for its document model, not its service architecture (although it definitely gets the job done). This Monty is an interesting project, and I'm looking forward to future developments.

slver 1850 days ago

Its document model is vanilla BSON on disk.

avinassh 1850 days ago

Is it possible to update the BSON without loading the entire document into memory? I am curious how Mongo does updates.

davidlatwe 1850 days ago

Thanks for the kind words !

davidlatwe 1850 days ago

hahaha XD

scottrogowski 1850 days ago

It's nice but it's no Mongita ;) https://github.com/scottrogowski/mongita

In all seriousness, it seems like there's a lot of demand for embedded Mongo-like databases. Congrats on a successful Show HN post!

davidlatwe 1850 days ago

Thanks ! Yeah, if the job can be done by installing one Python package, why use docker :P

deergomoo 1850 days ago

Hah, I love the name. Very clever.

vorticalbox 1850 days ago

In nodejs I use mongo memory server[0] for unit testing.

Would this be able to fill this gap in python?

[0] https://www.npmjs.com/package/mongodb-memory-server

davidlatwe 1850 days ago

Hmmm, better not to use it for unit tests, I think.

I had a quick look at that mongo memory server, and it pulls a real MongoDB instance for testing, so it’s solid. MontyDB, on the other hand, is only a tiny, incomplete replica (only basic CRUD is implemented, no aggregation or other fancy stuff).

Could be used for a quick demo for application that requires MongoDB, which was part of my initiative back then.

davidlatwe 1850 days ago

> Could be used for a quick demo for application that requires MongoDB

That is, if the demo or production use only requires those basic operations, of course. The implemented ops are tested against real MongoDB, so they should be good enough for basic usage.

watermelon0 1850 days ago

IMO unit tests can be done against any implementation, even a mock one, so using Monty would be beneficial, if it supports all you need.

However, you need to have integration tests to ensure that such an implementation behaves the same as the MongoDB version you have in production. Such tests only need to be run when you change the implementation or upgrade production MongoDB.
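
A minimal sketch of that split, assuming the application only needs `insert_one` and `find_one` with exact-match queries. `FakeCollection` and `run_contract` are hypothetical names made up for illustration; they are not part of Monty or of any MongoDB driver.

```python
import copy


class FakeCollection:
    """Hypothetical in-memory stand-in for the slice of a collection API the app uses."""

    def __init__(self):
        self._docs = []

    def insert_one(self, doc):
        # Store a copy so later mutation by the caller cannot leak into the store.
        self._docs.append(copy.deepcopy(doc))

    def find_one(self, query):
        # Only exact equality on top-level fields is supported in this sketch.
        for doc in self._docs:
            if all(doc.get(key) == value for key, value in query.items()):
                return copy.deepcopy(doc)
        return None


def run_contract(make_collection):
    """Behaviour every implementation must share; make_collection returns an empty collection."""
    coll = make_collection()
    assert coll.find_one({"name": "ada"}) is None
    coll.insert_one({"name": "ada", "lang": "python"})
    found = coll.find_one({"name": "ada"})
    assert found is not None and found["lang"] == "python"
    return True


# Unit-test run: fast, no server needed.
unit_ok = run_contract(FakeCollection)
```

The integration run would call `run_contract` again with a factory that returns an empty collection from the real database, and would only need to run when the fake changes or production MongoDB is upgraded.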

vorticalbox 1850 days ago

Thanks. That's a shame; I started a new project in Python (FastAPI and MongoDB) and was looking for something to fill that gap.

This still looks awesome.

davidlatwe 1850 days ago

Thanks !

gchamonlive 1850 days ago

That is not a unit test anymore. No matter where you are writing your database. A unit test should test the unit and not rely on any dependency.

slver 1850 days ago

That means adapters and drivers are not unit testable.

This differentiation is quite pointless though. If you can spawn the dependency in memory at no cost, then why not?

Also not using dependencies is not the point of a unit test. Almost everything has dependencies.

hinkley 1850 days ago

All of the logic in the driver can be unit tested. Actually talking to the back end is functional testing.

It's a common mistake that people think that HTTP implementations can't be unit tested too, but the truth is a bit more complicated. These libraries are essentially a wire protocol and people think of them as 'wire-protocol' instead of 'wire, protocol'. If you think of it instead as a codec with IO, then you write the codec separately and you can fully test the codec without mocking any IO at all.
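
A rough sketch of the "codec with IO" split, assuming a made-up framing of a 4-byte big-endian length prefix followed by the payload. The framing and the names `encode`, `decode`, and `read_messages` are illustrative assumptions, not any real driver's protocol.

```python
import struct

HEADER = struct.Struct(">I")  # assumed framing: 4-byte big-endian payload length


def encode(payload: bytes) -> bytes:
    """Pure function: frame one payload. No IO involved."""
    return HEADER.pack(len(payload)) + payload


def decode(buffer: bytes):
    """Pure function: return (complete_messages, leftover_bytes)."""
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # frame is incomplete; wait for more bytes
        messages.append(buffer[offset + HEADER.size:end])
        offset = end
    return messages, buffer[offset:]


def read_messages(stream, chunk_size=4096):
    """Thin IO shell: the only part that needs a socket, file, or functional test."""
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        messages, pending = decode(pending + chunk)
        yield from messages


# The codec can be exercised with plain bytes, no mocks.
wire = encode(b"ping") + encode(b"pong")
messages, rest = decode(wire[:-2])  # the second frame is cut short
```

Here `decode` returns the one complete message and hands back the partial frame untouched, so the framing logic is fully testable on its own and only `read_messages` ever touches a stream.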

It's nearly tautological that code that is written to be unit tested can be unit tested, and code that isn't cannot be. And because people want to get their code 'working' and then prove that it works, they are battling uphill to get good tests written, because good tests can't be written. And then some people use this as confirmation bias for why they shouldn't have to write tests.

slver 1849 days ago

You can’t test the logic without the parts that interpret the logic. Your mock will fail if I change the name of an alias in my query. A database won’t.

Don’t confuse how it’s done with what is done.
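
To make the alias point concrete, here is a small sketch that uses SQLite from the Python standard library as a stand-in for a dependency spawned in memory. The `CannedQueryMock` class, the table, and both queries are invented for illustration; this is SQL, not a Mongo query.

```python
import sqlite3


class CannedQueryMock:
    """Hypothetical mock that only recognises the exact query text it was given."""

    def __init__(self, canned):
        self._canned = canned

    def execute(self, sql):
        return self._canned[sql]  # raises KeyError on any textual change


ORIGINAL = "SELECT COUNT(*) AS n FROM users"
RENAMED = "SELECT COUNT(*) AS total FROM users"  # same meaning, different alias


def run(executor, sql):
    """Run a query against either the mock or a real connection."""
    return list(executor.execute(sql))


mock = CannedQueryMock({ORIGINAL: [(2,)]})

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT)")
db.executemany("INSERT INTO users VALUES (?)", [("ada",), ("bob",)])

try:
    run(mock, RENAMED)
    mock_breaks = False
except KeyError:
    mock_breaks = True  # the mock knows the text of the query, not its meaning

db_results_match = run(db, ORIGINAL) == run(db, RENAMED)
```

The mock is coupled to how the query is written, while the in-memory database answers both spellings identically because it actually interprets them.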

hinkley 1849 days ago

Don’t confuse functional tests with unit tests. Both are necessary for systems with side effects (i.e., most systems).

gchamonlive 1850 days ago

If a unit test fails under those circumstances you have little assurance before investigating that changes to your unit caused the failure.

If you have to test scenarios in which other dependencies are needed, then you must move up the testing pyramid. That serves to put in check the expectations of what each type of test should accomplish, and when to execute those tests.

truth_ 1850 days ago

Not a DB expert.

Why write a DB program in Python? Wouldn't it be slow?

CyberDildonics 1850 days ago

You get the legendary data integrity of mongoDB with the refinement of brand new software running at the speed of python. What's not to like?

adsharma 1850 days ago

People write Raft papers with pseudocode or some hard-to-comprehend systems language. I've always wondered why there isn't a Jepsen-tested Python implementation with lots of GitHub stars.

Here is mine:

https://github.com/adsharma/raft

Waiting for a python correctness prover and a transpiler.

davidlatwe 1850 days ago

Yep, it's slow. :)

It's not meant to be used in production that has scaled up. This project was mostly for fun.

blagie 1850 days ago

I would use it in production. It just depends on what I'm producing. "Production" can range from a banking website serving millions (nope) to a kid's toy (yup).

There are a ton of uses where performance doesn't matter, and some where data is even ephemeral or non-critical. These sorts of simple tools are also really nice for test cases, development environments, and a ton of other uses.

I'm developing a tool designed for large-scale data processing, and I have a dummy back-end very similar to this which I use for development.

My computer runs at close to 4GHz and has multiple cores. Essentially anything which ran on an 80486 back in the day will be fast enough in interpreted Python. That's actually a lot of stuff.

And yes, I'm not disagreeing, but agreeing and expanding on your easily-missed disclaimer ("production *that has scaled up*").

davidlatwe 1849 days ago

That’s exactly what I meant. :) And thanks for sharing !

emehrkay 1850 days ago

I did the same to learn Redis. I wrote a simple graph database using Gremlin's syntax https://github.com/emehrkay/rgp

It was very slow, but interesting.

divs1210 1850 days ago

Interested to know if it can run on PyPy and what kind of difference that would make.

truth_ 1850 days ago

Thanks. That makes sense.

squarefoot 1850 days ago

If I had the chance to write software whose description used the words Monty and Python, I would do that even if the result were as slow as a dead parrot.

VWWHFSfQ 1850 days ago

Depends on what you're doing, I guess. If you're just working with Python dictionaries with occasional background flushes to disk then it would be very fast. Probably close to as fast as anything else. Of course, there's a lot more to a DBMS than just reading/writing in-memory data structures and occasionally saving them on your hard drive.
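
A minimal sketch of the "dictionary plus occasional flush" idea, assuming JSON-serialisable values and a single process. `DictStore` is a hypothetical name; `flush()` is called explicitly here, and a timer or background thread calling it periodically is left as an assumption outside the sketch.

```python
import json
import os
import tempfile


class DictStore:
    """Hypothetical store: reads and writes hit a dict, disk is touched only on flush."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        self.dirty = False
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self.data = json.load(fh)

    def put(self, key, value):
        self.data[key] = value  # in-memory only, hence fast
        self.dirty = True

    def get(self, key, default=None):
        return self.data.get(key, default)

    def flush(self):
        """Write a snapshot to disk; returns False when nothing changed."""
        if not self.dirty:
            return False
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self.data, fh)
            fh.flush()
            os.fsync(fh.fileno())
        # Rename over the old file so a crash never leaves a half-written snapshot.
        os.replace(tmp_path, self.path)
        self.dirty = False
        return True
```

Anything written between two flushes is lost on a crash, which is part of the "a lot more to a DBMS" caveat: there is no write-ahead log, no indexing, and no concurrency control in this sketch.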

adsharma 1850 days ago

I wrote a Python-to-Rust transpiler (py2many), also as a fun project. I won't be surprised if writing a DB in Python actually becomes viable some years down the road due to the awesome tooling and the uninterrupted idea -> code flow that's possible.

CyberDildonics 1850 days ago

You may want to write that, but why would someone want to use a database that could be 50x faster if it was written in a native language? If you write software for yourself in a slow scripting language, that is one thing, but it leaves a huge amount of performance on the table.

This is the same as the problem with electron. People that only know javascript might think it is great to embed a full web browser, but it is selfish to users to push something that they think will run at a normal speed only to have it use 100x the memory and lag during simple operations.

adsharma 1849 days ago

The transpiler could in theory generate code in the native language. You can see for yourself:

https://news.ycombinator.com/item?id=27032399

Please file bugs/issues if something isn't working.

MillenialMan 1847 days ago

Look up Nubank and Datomic. You'll have an aneurysm.

(Edit: to be clear, I'm agreeing with you)

RcouF1uZ4gsC 1850 days ago

> I won't be surprised if writing a db in python actually becomes viable some years down the road due to the awesome tooling and the idea

Long-term, complicated, high-performance projects worked on by many developers are the Achilles heel of Python. The lack of type safety really bites over a large code base. There are also issues with automatic refactoring tools due to the very dynamic nature of Python. Deployment and dependency management are also a big issue in Python. Not to mention performance and multithreading.

nprateem 1850 days ago

Yeah, I had an API server to write. I looked at FastAPI and checked out the example project. So much tooling for formatting, type hinting, linting, deployment, etc. And while the project claims to be "comparable in speed to Go", the benchmarks they linked to showed that meant "significantly slower than". In the end I just went with Go instead. Python has its place, but you can avoid a lot of work by using something else sometimes.

MillenialMan 1847 days ago

Benchmarks like that are completely artificial anyway, because the real speed difference comes when the code becomes more complex and the dynamic language can no longer be reduced to something like those simple benchmarks, because it's not provable.

And God forbid someone mention the L1 cache and how "benchmarks" are completely different to the cache interactions in real-world dynamic programs.

adsharma 1849 days ago

Python has a PR problem:

* That it's a dynamically typed language
* That it's not a serious language like C or C++, suitable for writing a 50 line throwaway script

I'd like to convince people that both statements are false. But it's probably better to use the GitHub issue tracker than HN comments.

mysterydip 1850 days ago

Is MongoDB still a recommended production database? The answer seems to change based on what year the question was asked.

shoo 1850 days ago

https://jepsen.io/analyses/mongodb-4.2.6

> MongoDB is a distributed document database which claims to offer “among the strongest data consistency, correctness, and safety guarantees of any database available today”, with “full ACID transactions”. Jepsen evaluated MongoDB version 4.2.6, and found that even at the strongest levels of read and write concern, it failed to preserve snapshot isolation. Instead, Jepsen observed read skew, cyclic information flow, duplicate writes, and internal consistency violations. Weak defaults meant that transactions could lose writes and allow dirty reads, even downgrading requested safety levels at the database and collection level. Moreover, the snapshot read concern did not guarantee snapshot unless paired with write concern majority—even for read-only transactions. These design choices complicate the safe use of MongoDB transactions.

loloquwowndueo 1850 days ago

But is it webscale? :) https://www.youtube.com/watch?v=b2F-DItXtZs

davidlatwe 1850 days ago

Not just yet, stay tuned for a /dev/null-based storage engine!

MaBeuLux88 1839 days ago

It's already implemented actually :') ! Trust me. I'm an engineer :D. https://github.com/mongodb/mongo/blob/72ed8227aa029afd554aa5...
