Hacker News
by wudangmonk 1105 days ago
The current Reddit seems to be written in JS/Python. I can see the logic behind JS, since it's a lot easier to find devs and, thanks to all the hard work of the engineers behind V8, it's not a slow language. I do not understand why Python was chosen, though. If you need more speed you would go with C++, not Python, which is the slowest popular language.

Anyone got any insight into why Python was chosen?

8 comments

Because Spez can't read it, he can only parse shapes.

https://www.youtube.com/watch?v=oq7DEUhr7o0

That’s not what he said. He’s actually making an interesting point about readable code. I’m not defending his current behavior, just these comments about readable code.
I can’t tell if this is an AI fake or not, and that scares me.
It was uploaded 9 years ago. Definitely real.
It was chosen because at the time Python was much more mature and productive. They didn't use Django, but as an example, Django is still more productive than any Node framework I've tried thus far, too, and we have types now. :)
There's a post explaining exactly why they moved from CL to Python, and it was basically because of library availability.

https://web.archive.org/web/20060206185841/http://reddit.com...

Why they moved away from CL:

"If Lisp is so great, why did we stop using it? One of the biggest issues was the lack of widely used and tested libraries. Sure, there is a CL library for basically any task, but there is rarely more than one, and often the libraries are not widely used or well documented. Since we're building a site largely by standing on the shoulders of others, this made things a little tougher. There just aren't as many shoulders on which to stand."

And why Python was chosen:

"So why Python?

We were already familiar with Python. It's fast, development in Python is fast, and the code is clear. In most cases, the Lisp code translated very easily into Python. Lots of people have written web applications in Python, and there's plenty of code from which to learn. It's been fun so far, so we'll see where it takes us."

So, basically, they were happy with CL except for lack of libraries, and because they also knew Python already, and they knew there were more libraries for Python, they picked it. Simple as that.

NodeJS didn't even exist when Reddit started using Python, as far as I can tell.

Initial release of NodeJS was May 27, 2009; 14 years ago.

Swartz was involved in the development of Reddit until he departed from the company in 2007.

And we know that parts of the Python code for Reddit were written by Swartz.

Wow, that sounds so early to adopt Python!
Python was released in 1991.
And Ruby in 1995, but it wasn't until 2005 (and Rails) that it gained traction. In 2007, albeit 16 years old, Python was still far from mainstream, especially outside of its scripting niche.
That doesn’t match my recollection. I’m not sure how to prove anything, but I’ll note that Python 3.0 was released in 2008. If Python 2 wasn’t already extremely popular and well-established at that point, I doubt we’d have seen the brutal 10+ish year migration from it.
That's not true. We were using Python 1.5.2 at a well-funded startup I was at from 2000-2007. We were an early cloud provider and maybe 1/3rd of our customers were using Python. This was in the 3-tier stack days and Zope was fairly popular for the application server.

The internal tooling we built was mostly in Python.

I'd used Python on 3 commercial apps by 2007: one for Boeing, one for the Navy, one for an insurance company. Ruby was a little different. Ruby without Rails had no real standards for making big apps. And I'm not sure when gems was made at rails con by Seattle ruby. Rails was for a long time a mega library for Ruby... Ruby was missing a ton of features without Rails. Similar to, but worse than, a Node web app versus working in a React app. No hot reload, no lib directory, many other features.
Aaron Swartz replaced the Lisp with web.py
Yes, web.py was his baby. He also had some markdown parser stuff.

Baby boomers always tell me about seeing The Doors live in a dive bar in NYC. I can say I used Reddit when it was written in Lisp. :-)

Back then, to us smaller-forum goers, Reddit was the new kid in the playground that we didn't like because somehow he was viewed as "cooler" than the other kids.

Even years after, when I had stopped posting on those PHP powered smaller forums I resisted creating a reddit account just because it was "reddit".

I would say the smaller forum sites were a more tightly knit group that I attended several meetups with. Reddit just never felt the same.

I'm not sure why python was chosen.

But I contributed a tiny bit of code to reddit back when it was open source and chatted with reddit engineers on IRC, so I do know the official justification (as of 2010ish) for reddit sticking with python:

Websites are mostly IO bound anyway. The hot parts like markdown formatting and templating are already C modules. So the performance gains from rewriting in a compiled language aren't large enough to justify it.
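
To make the "IO bound" claim concrete, here is a rough sketch rather than a measurement of Reddit. The 50 ms wait and the 10,000-iteration loop are made-up stand-ins for a database round trip and for interpreter-side glue code, and `handle_request` is a hypothetical name.

```python
import time


def handle_request(io_wait_s, cpu_loops):
    """Simulate one request; return (seconds waiting on I/O, seconds of Python CPU work)."""
    t0 = time.perf_counter()
    time.sleep(io_wait_s)  # assumption: stands in for a database or cache round trip
    t1 = time.perf_counter()
    total = 0
    for i in range(cpu_loops):  # assumption: stands in for glue logic run by the interpreter
        total += i * i
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1


io_s, cpu_s = handle_request(io_wait_s=0.05, cpu_loops=10_000)
cpu_share = cpu_s / (io_s + cpu_s)
print(f"I/O wait: {io_s * 1000:.1f} ms, Python CPU: {cpu_s * 1000:.1f} ms, CPU share: {cpu_share:.1%}")
```

With numbers like these the interpreter's share of the request is small, so a faster language would shave little off the total. If the real ratio is different, the conclusion changes with it.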

That seems reasonable to me, even today. Am I off-base?
Incredibly reasonable. People love to trash on the performance of Python and JS saying that they're totally unsuitable for backend services with non-trivial amounts of traffic when they're usually the most cost effective solution for a business. These higher level languages are easier to hire for, much easier to prototype, allow for faster iteration, allow for substantially faster on-ramp time, and are fast enough to run these io bound workloads.
I have had the opposite experience with every python team I have interacted with.

On top of it being slow and brittle, dependencies break, needlessly destabilizing stuff.

Worse, the python team ends up having the hardest job, due to self-inflicted problems, so it either ends up with junior devs that don’t know better, or bitter senior devs that could be 10x more productive doing something else.

As always, it varies from company to company, but this is what I saw on four teams out of four at multiple companies.

> dependencies break

You can abuse dependencies in every language. This doesn't sound like a Python problem but a bad tech management problem (i.e. who signed off on allowing 'randomguy69/left-pad' as a dependency).

With Python it doesn't take abuse. You blink and the bloody thing rots away.
We have account servers for a game in Java that end up being CPU bound (we have to scale up based on CPU, not network), and Java is much, much faster than Python.

So I would be surprised if a server written in Python could saturate the network for a Reddit-style workload, which I imagine would be similar.

Anyone have relevant experiences to share?

Reddit never cared much about performance. They blame it on IO, but they chose to use Cassandra, used it as a key-value store, and then put Python on top of it, with the result that most pages would take seconds to generate and the website would go down almost every day.

With the "new" reddit they replaced most of the frontend with javascript and it really shows. That's not to say I like sites that use too much javascript but python is slow enough just parsing and filling html templates, seemingly.

I have a theory (which nobody I ever met shares) that the visually most beautiful language is the one that long-term results in the fastest development of a project:

Visual noise is like inflation. A constant tax on what you do. But you barely notice it. But it accumulates because it is constantly leading to slightly worse and more complex code. Since it is harder to focus on a noisy something in front of you, you make slightly worse decisions. Over time this accumulates. And similar to how Warren Buffett noticed that a Dollar of his youth is not even worth a penny these days, over time, exponentially more effort is needed to develop the more and more complex system.

So over a longer timeframe, a visually more elegant language wins by a long shot.

Python is the visually most elegant language.

Those hypothetical tiny gains are instantly and completely eclipsed by the lack of static checking support.
__visuallyelegant__
That should trigger the good sense of any “Zen of Python” die-hard. Not saying I haven’t used/got access to double-underscore thingies in Python in my 15+ years of writing Python code, but I always felt dirty in doing that.
But isn’t that the point? The naming convention is for protocols that have syntax support
Scheme has entered the chat.
> Python is the visually most elegant language.

Having program logic buried in a nauseating amount of boilerplate doesn’t feel very “visually elegant” :-P

What you’re saying is only partially correct - Yes, for more exploratory programming for a new product with full US/EU/CANZUK based dev workforce it makes sense to prioritize maintainability.

For a company complaining about infrastructure costs for a product that has been cloned many times over, and which is large enough to hire offshore devs, the cost equation favours rewriting in something like Go, Rust or even Java. These are quite tailored towards Reddit’s current use case and many companies including Twitter did that rewrite.

Using a language with a faster runtime makes the system run faster. But it does not compound. So you are limited to a ceiling at about an order of magnitude faster execution.

Making better decisions on a daily basis compounds. You evolve towards a better architecture. Which can make the code multiple orders of magnitude faster.

One of many examples: In a complex system, developers often have a blurry vision of what can be cached, in which situations it can be cached, and how cache invalidation should be done. While in a more elegant system, the borders between different approaches to caching are clearer. Leading to the more elegant system being multiple orders of magnitude faster as it can leverage the different caching layers (in memory cache, on disk cache, the CDN ...) more effectively.
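
As a hedged illustration of "clear borders" between caching layers: the sketch below uses plain dicts as stand-ins for an in-memory cache and an on-disk cache, and `LayeredCache`, `loader`, and the `post:1` key are all hypothetical names, not anything from Reddit's code. The point is only that lookup order and invalidation each live in one obvious place.

```python
from typing import Callable, Dict, List


class LayeredCache:
    """Look up keys through ordered layers (fastest first), falling back to a loader."""

    def __init__(self, layers: List[Dict[str, str]], loader: Callable[[str], str]) -> None:
        self.layers = layers
        self.loader = loader
        self.loads = 0  # counts how often the slow path (the loader) was taken

    def get(self, key: str) -> str:
        for depth, layer in enumerate(self.layers):
            if key in layer:
                value = layer[key]
                # Promote the hit into every faster layer above this one.
                for faster in self.layers[:depth]:
                    faster[key] = value
                return value
        # Miss in every layer: take the slow path and fill all layers.
        self.loads += 1
        value = self.loader(key)
        for layer in self.layers:
            layer[key] = value
        return value

    def invalidate(self, key: str) -> None:
        # The single place where stale entries are dropped from all layers.
        for layer in self.layers:
            layer.pop(key, None)


# Usage: dicts stand in for the real layers, `store` for the database.
memory: Dict[str, str] = {}
disk: Dict[str, str] = {}
store = {"post:1": "v1"}
cache = LayeredCache([memory, disk], loader=lambda k: store[k])

first = cache.get("post:1")
second = cache.get("post:1")   # served from memory, loader not called again
store["post:1"] = "v2"
cache.invalidate("post:1")     # without this, the stale "v1" would be returned
third = cache.get("post:1")
print(first, second, third, cache.loads)
```

A CDN layer would not fit this dict-based toy, since invalidating it is a network call, but the same split between "where do I look" and "who drops stale data" applies.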

This doesn't sound like an argument in favor of Python, though. In my experience, Python code quickly reaches the level of an impenetrable word salad, at which point all the compounding stops.
This depends on your editor color settings though. C++ can look very handsome
C++ looks very handsome.

Usually someone will pop up showing crazy template metaprogramming code and use it as a straw man to declare cpp=bad.

Which is why the elegance argument would've favored staying with Lisp: not only does it look elegant, but you can also do all the "crazy metaprogramming" without the crazy, in straightforward and elegant code.

And I say that as someone doing C++ for a living, and occasionally engaging in said crazy template metaprogramming stuff. Occasionally, because people tend to frown at it during code review; apparently, metaprogramming is reserved only to library authors...

I've been using C++ since the late 90s, and namespaces are still annoying.
Just use `using namespace std;` inside your function and C++ suddenly becomes nice.
Using boost or std in a function is much better than doing it in the header for obvious reasons, but still annoying.
How so?
Although now that people use types with Python, is it still?
What other people use does not change how you can use the language. You don't have to use types.

I just took a look at the 4 latest Show HNs which are written in Python. None of them uses types.

I agree with you. C++ and Python are great. Haskell looks the best but it’s way too tiring to write. Go is the worst offender.
Do you actually think CPU usage is a concern here? Database performance and optimization, horizontal scaling, cache optimization (and cache invalidation) are the big problems. Security is also a big problem.

The performance critical part is the database, and that at least used to be written in Java (cassandra db).

Writing the bit of glue code between your high-performance database and your front end in C++ would introduce all kinds of really bad potential bugs, scary remote code execution bugs.

CPU usage of that glue code might take up 10% of your hardware budget worst case, even with a "slow" language. It's just not a concern.
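
A quick way to check the arithmetic here is Amdahl's law. This is a sketch under one assumption not stated above: that "share of hardware budget" can be read as the fraction of total time spent in the slow language. `overall_speedup` is a hypothetical helper.

```python
def overall_speedup(fraction_in_python: float, language_speedup: float) -> float:
    """Amdahl's law: only the fraction of time spent in Python gets faster."""
    remaining = (1.0 - fraction_in_python) + fraction_in_python / language_speedup
    return 1.0 / remaining


for fraction in (0.1, 0.5, 0.9):
    gain = overall_speedup(fraction, 10.0)
    print(f"{fraction:.0%} of time in Python, 10x faster language -> {gain:.2f}x overall")
# prints:
# 10% of time in Python, 10x faster language -> 1.10x overall
# 50% of time in Python, 10x faster language -> 1.82x overall
# 90% of time in Python, 10x faster language -> 5.26x overall
```

So the disagreement comes down to the fraction: at 10% a rewrite buys about 1.1x, while a service that is mostly CPU bound in Python would see most of the 10x.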

We're not talking about 10% factors, but 10x factors between slow and fast languages. People say CPU usage doesn't matter when you're horizontally scaling, but horizontal scaling increases your ongoing operational costs, and you often are horizontally scaling precisely because your code is too slow.
What percentage of cpu time do you think would be spent in python land?
In my experience one of the limiting factors of a language as you scale is library support. You want to do more complex things (for better or worse) or integrations (monitoring, etc.) but the development cost is very high since you have to write the whole part yourself or make tradeoffs on the limited libraries available.
Ummm...same reason as JS? Why do you think they need more speed to the point they need c++? Do you think finding C++ devs is easier than finding Python devs?
> Ummm...same reason as JS? Why do you think they need more speed to the point they need c++?

Well, given their complaints about how much 3rd party apps cost them, they’d be better off optimizing their crap first.

It's not really about cost, it's about convincing investors CEO dude is in control and can make reddit profitable for their planned IPO. "We spent a shitton of money migrating to C++ and hiring expensive and harder to replace C++ talent to optimize shit" is not a good selling point.
I know what their plan is, just pointing out their hypocrisy.
I imagine because it's obviously "fast enough" performance-wise, and an order of magnitude faster dev-wise and hiring-wise.
Not fast enough to develop good moderation features with 2000 employees
I would argue that's because they went the microservices + react route, so now everything is probably more difficult to implement, especially since the monolith is still in the center.
Not fast enough to keep their operational costs at bay though.