by ralferoo 721 days ago
I remember reading an article on Source Code In Database back in the early 2000s, and it's been knocking around my brain ever since as something I ponder every couple of years. I just can't shake the feeling that there's the gem of a future paradigm where everyone wonders "why we didn't always do it that way?", but then every time I try to follow those thoughts through to a conclusion, it always feels like it'd just be re-implementing Smalltalk, and then the question is "why isn't Smalltalk more popular?"

That said, there's a lot to be said for revisiting old ideas. There was so much interesting research done in the 60s and 70s in all sorts of random directions, maybe because at that time there were no precedents or expectations for how things should be done. There are so many untapped resources here, it's crazy. Every now and then I re-watch "The Mother of All Demos" [1] from 1968, where Douglas Engelbart demonstrates some of the research at Stanford, or the Sketchpad Demo [2] from 1963, where Ivan Sutherland presents a GUI-based CAD system.

Fortunately, these ideas have now been picked up again, but to me it's interesting to note just how much time elapsed between these ideas appearing and their becoming mainstream. Some of it is obviously the cost, as the state-of-the-art research machines were massively more powerful than home computers even two decades later, but I'm sure there were a lot of great ideas that have just been forgotten.

Part of the problem, I think, is that we have found solutions to some of the easy problems and optimised them to such a degree that it's then hard to ever go back and revisit the alternative approaches, because you'd need to regress so far from current levels of expectation.

[1] https://www.youtube.com/watch?v=yJDv-zdhzMY
[2] https://www.youtube.com/watch?v=6orsmFndx_o


InterSystems Caché has some MUMPS-like qualities.

It is easily the worst developer experience conceivable. Easily tied for last place in the pantheon of turrible ideas realized thru turrible implementations.

Sure, you can bork a Smalltalk env with some ill-advised changes to the runtime.

Caché is so brittle, you can bork your env with a compiler error. And there's no feedback. And since the env is a blob, there's no version control.

https://en.wikipedia.org/wiki/InterSystems_Caché

> > Source Code In Database
> ?

Rather than think about specific closed environments (which may be an unavoidable consequence of SCID), I was thinking more generically about the issues. At the time, I was firmly in the Java large webapp space, so I was mostly thinking about how you could target a JVM.

In terms of the actual article, there's a bunch of links on Wikipedia [1], but I was specifically referring to an older version of this article [2] (I think; I don't remember it having such garish colours), and I think I found it via c2 [3].

None of these quite match up with what I thought I remembered, which informed a lot of my thoughts back then. I was mostly thinking about what the UI for such a system might look like, because a nice friendly GUI isn't necessarily optimal for an experienced programmer who's probably happiest writing and seeing their code as big chunks of text, and also about how sometimes you want code that the linter would hate because you've deliberately formatted something to make it easier for humans to understand.

I was then thinking about how you could abstract and generalise statements and sub-expressions into small mini-functions that weren't complete functions per se, but more like templates. I spent a long time thinking about how one might do code de-duplication by copying a graph of code and then changing some of the nodes in a copy-on-write style, but decided there was no easy way of programmatically deciding which parts of the tree were being changed to fix a bug and needed to be shared with all copies, and which were just modified inputs or local changes. In terms of code, it's not that hard to do, but presenting it in an intuitive way in a UI is much harder, especially if one of the goals is to make things easier for a novice programmer.
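To make the copy-on-write idea concrete, here is a minimal sketch in Python. The `Node` type and `replace_at` helper are hypothetical names invented for this illustration, not taken from any of the linked articles. It assumes immutable nodes, so a "copy" only duplicates the nodes on the path to the change and shares everything else with the original.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Immutable expression node (hypothetical); subtrees are shared, never mutated."""
    op: str
    children: Tuple["Node", ...] = ()


def replace_at(root: Node, path: Tuple[int, ...], new: Node) -> Node:
    """Return a copy of `root` with the node at `path` replaced by `new`.

    Only the nodes along `path` are rebuilt; every other subtree is the
    same object as in the original tree.
    """
    if not path:
        return new
    index = path[0]
    kids = list(root.children)
    kids[index] = replace_at(kids[index], path[1:], new)
    return Node(root.op, tuple(kids))


# sin(angle) * radius + offset
original = Node("+", (
    Node("*", (Node("sin", (Node("angle"),)), Node("radius"))),
    Node("offset"),
))

# A "copy" that swaps sin for cos; the original tree is left untouched.
variant = replace_at(original, (0, 0), Node("cos", (Node("angle"),)))

# Untouched subtrees are shared between the two trees.
print(variant.children[1] is original.children[1])
```

The sharing itself is the easy part. Nothing in the structure records whether the `sin` to `cos` change was a bug fix that every copy should pick up or a local variation, which is exactly the decision that can't be made programmatically.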

[1] https://en.wikipedia.org/wiki/Source_Code_in_Database
[2] https://www.mindprod.com/project/scid.html
[3] https://wiki.c2.com/?SourceCodeInDatabase

> experienced programmer who's probably happiest writing and seeing their code as big chunks of text

Not if you're a Smalltalk programmer.

"ENVY/Manager augments this model by providing configuration management and version control facilities. All code is stored in a central database rather than in files associated with a particular image. Developers are continuously connected to this database; therefore changes are immediately visible to all developers."

https://www.google.com/books/edition/Mastering_ENVY_Develope...

I kind of feel that we're talking at cross-purposes here. To me, whether the code is in a database, in memory, or serialised to a file isn't all that important, as they're all just representations of the same AST. For me, the label "source code in database" is in comparison to "source code in linear files that need not have any inherent structure".

Also, perhaps my use of the phrase "experienced programmer" is being interpreted negatively. I'm not trying to imply that someone who has a lot of experience using a graphical programming language is less experienced, I'm using it as a shorthand for "experienced programmer of a traditional text-based language". I'll continue to do so for this reply too, because adding that caveat every time I use that shorthand makes the actual meaning I was trying to convey much harder to see.

As regards the comment about writing and seeing code as text, I meant that experienced programmers will probably find writing something like

sin(angle) * radius + offset

in a textual form the quickest way of expressing that idea, especially if their IDE supports auto-completion of variables. Most programmers generally also prefer to visualise their code the same way they wrote it, so presenting it back to them as text makes sense.

A novice programmer might prefer to see that as a graph of operator nodes because it guides them through the process. Even better if they can organise the nodes in the layout they want, as some people remember visually and can use the distinctive look of each area of "code" to navigate when zoomed out.
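As a rough illustration of what "a graph of operator nodes" means for that expression, the sketch below uses Python's standard `ast` module (it needs Python 3.9+ for `ast.unparse`). The `expression_graph` helper and its label format are made up for this example, and it only handles binary operators, calls and plain names.

```python
import ast


def expression_graph(source: str):
    """Return (nodes, edges) describing a Python expression as an operator graph.

    `nodes` is a list of labels; each edge (parent, child) indexes into it.
    """
    tree = ast.parse(source, mode="eval").body
    nodes, edges = [], []

    def label(n):
        if isinstance(n, ast.BinOp):
            return type(n.op).__name__       # e.g. "Add", "Mult"
        if isinstance(n, ast.Call):
            return ast.unparse(n.func) + "()"
        return ast.unparse(n)                # names and constants as written

    def inputs(n):
        if isinstance(n, ast.BinOp):
            return [n.left, n.right]
        if isinstance(n, ast.Call):
            return list(n.args)
        return []

    def visit(n):
        index = len(nodes)
        nodes.append(label(n))
        for child in inputs(n):
            edges.append((index, visit(child)))
        return index

    visit(tree)
    return nodes, edges


nodes, edges = expression_graph("sin(angle) * radius + offset")
for parent, child in sorted(edges):
    print(f"{nodes[parent]} <- {nodes[child]}")
```

The text and the graph carry the same information; a visual editor would only need to add a position for each node, which is where the personal layout comes in.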

Certainly, in game development, I've seen fairly non-techy people create massive Blueprint graphs for Unreal this way, but they wouldn't consider themselves programmers and would be scared by the prospect of a screenful of code that does the same thing. On the other hand, as someone who's used text to code for decades, I find Blueprints horrendously slow to understand when presented with someone else's "code", because there are far fewer "social code norms" being obeyed and people just do whatever makes sense to them.

I actually think in the above example of a short expression, the best solution for a novice programmer probably isn't even a graph, but actually closer to the text for an experienced programmer. Most people will have had exposure to formulas at school, so they might well prefer to see something closer to the traditional text form of the source code, but with tools to guide them to entering that like the "Insert Equation" that's been in Word for decades - so for instance, you might insert a divide symbol and then fill in the two boxes, etc.

The point is that code as text and code as a graph can both be used to express the same AST at the end of the day, and people should be able to use whichever makes most sense to them or makes them most productive. The tricky part then comes when people have chosen to format their code or lay out their graph in a particular way that adds meaning and aids understanding of the graph / source code without actually affecting the AST. I'm specifically talking about formatting here rather than comments, which pose a similar problem: how would you attach a code comment to a specific part of the tree in a graphical view? Likewise, graphical views might have "visual sections" of the graph that don't necessarily map neatly to a linear sequence of source lines.
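A small sketch of that tricky part, again using Python's standard `ast` module purely as a stand-in for "the AST" (the `same_ast` helper is a name invented for this example): two layouts of the same statement parse to identical trees, so the deliberate alignment and the comments are exactly what gets lost if only the AST is stored.

```python
import ast


def same_ast(a: str, b: str) -> bool:
    """True when two source strings parse to structurally identical ASTs.

    ast.dump() omits line and column attributes by default, so layout,
    redundant parentheses and comments play no part in the comparison.
    """
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


compact = "y = sin(angle) * radius + offset"

# Deliberately laid out for a human reader.
aligned = """
y = (sin(angle) * radius   # scale the unit sine
     + offset)             # then shift it
"""

# Parentheses that change the grouping do change the tree.
different = "y = sin(angle) * (radius + offset)"

print(same_ast(compact, aligned))
print(same_ast(compact, different))
```

Any system that keeps only the tree would need a side channel for the layout and comments in `aligned`, and a graphical view would need an equivalent for node positions.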

> I'm using it as a shorthand for "experienced programmer of a traditional text-based language".

And I was telling you that experienced Smalltalk programmers don't work with "big chunks of text".

Small snippets of text, presented in context.

Sorry I'm not seeing enough clarity to wish to continue.

fwiw

> big chunks of text

"Lines of Code" / "Total Methods" = 7

https://dl.acm.org/doi/pdf/10.1145/74878.74904

Decided to split the off-topic part off into a reply so that it didn't distract from the answer!

In terms of over-optimisation forcing certain technologies to be developed and others to be ignored, one example I'm very familiar with is computer graphics. I'd written a TON of stuff here, but decided to simplify it as it was labouring too specific a point.

But our computer graphics state-of-the-art progressed roughly along these lines: drawing all edges of polygons, hidden-line removal (Sutherland), clipping intersecting polygons (Hodgman), filling polygons with a single colour, *, Gouraud shading, Gouraud shading with smaller triangles, Phong shading with bigger triangles, texturing, fixed texture and lighting pipelines, pixel shaders, vertex shaders. I'll also add compute shaders, but that was more of a generalisation of what people were starting to do with pixel shaders operating on data that wasn't really pixel data.

Now, you'll notice my * around the time of single colour filled polygons... this might not be the correct place to put the *, but around this point some people started experimenting with ray-tracing and got amazing results, just incredibly slowly. These were seen as the "gold standard", but because drawing triangles was much faster, this is where the money continued to be poured: optimising and optimising this special case, discovering more techniques to "approximate" the right image while trying to avoid the hard work of actually rendering it. Over time, things have got closer and closer to ray tracing, except that transparent and shiny objects have always been the Achilles heel.

Fortunately, the industry's interest in ray-tracing has resumed, and now compute shaders are general enough that they can be used, but they're still orders of magnitude slower because the renderer needs to consider the entire scene, not just a triangle at a time. So you need to store the scene in some kind of tree that's paged in on demand, and different latencies for different pixels cause problems for the SIMD architectures. We're starting to see more and more consumer-level hardware with decent ray-tracing performance now, but it's been a decade of lost time in terms of optimisation from where it could have been if the entire market hadn't been competing only in making triangles rasterise more quickly.

In the ray-tracing space, we still see that it's too slow to create perfect images (for very complicated scenes with lots of shiny surfaces and few lights, you might need thousands of rays per pixel to just get a handful that actually reach a light source), so we've invented all sorts of approaches to cover it up - whether it's training an ML model to guess the real colour for black pixels from neighbouring ones, or re-projecting pixels from a previous frame to fill in the gaps, etc.
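For a feel of the numbers, here is a toy sketch under some loud assumptions: a single surface point, one small disc light directly overhead, no occlusion, and bounce directions picked uniformly over the hemisphere. The function names and the chosen height and radius are invented for this example; a real renderer would use importance sampling and much more besides.

```python
import math
import random


def hit_probability(height: float, radius: float) -> float:
    """Exact chance that a uniformly random upward ray hits a disc light of
    the given radius centred directly overhead at the given height."""
    cos_max = height / math.hypot(height, radius)
    return 1.0 - cos_max  # cone solid angle 2*pi*(1 - cos) over hemisphere 2*pi


def estimate_hits(height: float, radius: float, rays: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability by shooting random rays."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(rays):
        z = rng.random()                 # uniform z gives uniform solid angle
        if z <= 0.0:
            continue                     # grazing ray never reaches the light plane
        phi = 2.0 * math.pi * rng.random()
        s = math.sqrt(1.0 - z * z)
        t = height / z                   # distance along the ray to the light plane
        x, y = s * math.cos(phi) * t, s * math.sin(phi) * t
        if x * x + y * y <= radius * radius:
            hits += 1
    return hits / rays


exact = hit_probability(height=10.0, radius=0.5)
estimate = estimate_hits(height=10.0, radius=0.5, rays=200_000)
print(f"exact {exact:.6f}, estimated {estimate:.6f}")
```

With these made-up numbers only about 1 ray in 800 finds the light from a single bounce, which is why a naive tracer leaves most pixels black or noisy until something fills them in.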

Personally, I can't help but think the real breakthrough in performant raytracing will come from tracing light from the light sources instead. This wasn't done traditionally because potentially it's even more expensive than tracing backwards from the pixel, but should be more accurate when there are multiple light sources.

But even the latest batch of hardware is all focused on raytracing, which I think is missing the biggest trick of all - they could be using cone-tracing as a first approximation and then subdividing the cone into smaller and smaller chunks until they're approximately pixel sized. None of this is new, it's just not what the larger industry is doing right now, because it's cheaper and easier for them to do rays instead.