HTMLy: Databaseless Blogging Platform (Flat-File Blog) | HN Mirror


	HTMLy: Databaseless Blogging Platform (Flat-File Blog) (github.com)
	41 points by danpros 4488 days ago

14 comments

spookylukey 4488 days ago

You should really use SQLite!

You will either end up with an ad hoc, informally-specified, bug-ridden, slow implementation of half of SQLite ... or, you will fail to even attempt the features that SQLite gives you - such as locking and dealing with concurrency - and you will have bugs.

For example, you use file_put_contents. See this comment: http://www.php.net/manual/en/function.file-put-contents.php#...

Please don't go and add locking now! (It's hard to get right). SQLite was invented to be a better fopen() - use it! There is no reason not to, if you are requiring PHP 5.3. If you want a simple plain text dump of your SQLite DB, that's not hard to add.
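
To illustrate the "better fopen()" point, here is a minimal sketch in Python (not PHP, and not HTMLy's code). The `posts` schema and the function names are assumptions made up for this example. SQLite handles file locking itself, and a plain-text dump is a few lines.

```python
import sqlite3


def open_blog(path=":memory:"):
    """Open (or create) the blog database. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "slug TEXT PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def save_post(conn, slug, title, body):
    """Insert or overwrite a post inside a transaction."""
    # 'with conn' commits on success and rolls back if an exception is raised,
    # so a failed write does not leave a half-written post behind.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO posts (slug, title, body) VALUES (?, ?, ?)",
            (slug, title, body),
        )


def dump_plain_text(conn):
    """Return a simple tab-separated listing of all posts, one per line."""
    rows = conn.execute("SELECT slug, title FROM posts ORDER BY slug")
    return "\n".join(f"{slug}\t{title}" for slug, title in rows)
```

Passing a file path instead of `":memory:"` keeps the whole blog in a single file that can still be copied or backed up like any other file.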

vidarh 4488 days ago

A blogging platform for personal use is not magic. Avoiding things like concurrency problems in small platforms like this is often trivial.

Personally I also use a flat file blogging platform, and I explicitly rejected putting stuff in a database because I want to be able to edit the articles with emacs and check them into a git repository. Concurrency in my case is a non-issue because, well, there's only one of me.

There are plenty of scenarios where you really should not use SQLite nor any other RDBMS because it overcomplicates things that are really exceedingly simple.

yen223 4487 days ago

If you don't want to use a database, go for one of the many static site generators. That way, you won't even need to muck around with PHP.

This project as it stands, quite frankly, looks like the worst of both worlds.

danpros 4487 days ago

> This project as it stands, quite frankly, looks like the worst of both worlds.

It's okay if you think that, but you should try it first and then compare it with other similar solutions (e.g. Ghost).

kijin 4488 days ago

    <!--t Here is the post title t-->

I hate non-standard stuff like this, even though I myself do similar things all over the place. It's the kind of thing that's OK if you do it to yourself, but immediately rings an alarm when it is distributed to the general public. When you're already doing something non-standard by using a flat file instead of a well-known DB format, you might as well use standards in other places so that people have fewer reasons to complain.

For example, Markdown has a special syntax for <h1> tags. It looks like this:

    This is a title.
    ================

Since it is extremely unlikely that an <h1> tag will be used for anything other than the title of a post, why not use it to mark a line as the title?

Or maybe use the MultiMarkdown convention of colon-separated header fields at the top of the file, like:

    Title: This is the title.
    Tags: foo, bar
    Date: March 3, 2014
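
A rough sketch of how such colon-separated header fields could be read, written in Python for illustration. The function name `parse_post` is an assumption, and this handles only simple single-line `Key: value` fields followed by a blank line, not the full MultiMarkdown metadata rules.

```python
def parse_post(text):
    """Split a post into (metadata dict, body text).

    Header lines look like 'Key: value' and end at the first blank line.
    """
    meta = {}
    lines = text.splitlines()
    body_start = len(lines)  # default: the whole file was header lines
    for i, line in enumerate(lines):
        if not line.strip():
            # A blank line ends the header block.
            body_start = i + 1
            break
        key, sep, value = line.partition(":")
        if not sep:
            # No colon: not a header line, so the body starts here.
            body_start = i
            break
        meta[key.strip().lower()] = value.strip()
    return meta, "\n".join(lines[body_start:])
```

Calling `parse_post` on a file that starts with the three header lines above would give a dict with the keys `title`, `tags` and `date`, plus the remaining text as the body.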

danpros 4488 days ago

The initial version used standard markup for the title, but I changed it to avoid ending up with a double h1 :)

As for the next point (tags, dates), I chose to put them in the filename for the sake of speed.

kijin 4487 days ago

In HTML5, it's OK to have multiple <h1> tags in a page, provided that they belong to separate sections, articles, etc.

danpros 4487 days ago

If you try it, you will realize that to write an article using the admin panel is quite user friendly :)

hayksaakian 4488 days ago

Don't take this the wrong way, but why PHP?

noonespecial 4488 days ago

Outside of "the bubble" the web is basically made of php and perl. It's like the dark matter of the internet. If you want to reach all the way down to the weekend dabbler, you really need to choose one of those 2, and of those 2, php seems more immediately accessible.

codygman 4488 days ago

Sadly the web is made up of a lot of php code. Luckily this graph lets me sleep better at night:

http://www.indeed.com/jobtrends?q=php&l=&relative=1

vidarh 4488 days ago

I think you'll find that graphs like that are meaningless. I work for an agency that does mostly PHP work, and we never advertise, because for common skills like PHP, recruiters and potential hires inundate us with calls every single day. I expect the very common skills like PHP are severely under-represented in ads for this reason. For less common skills, you need to advertise, as people won't know where to go.

deckiedan 4488 days ago

Presumably because of how many webhosts have it available automatically.

I'd pondered a flat file based php 'blog' engine once - with all the benefits of markdown or whatever - no database complexity, easily rsyncable for deployment, git can keep revisions, etc...

hayksaakian 4488 days ago

Sure, makes sense. I asked because I always see posts about Jekyll, Octopress, and such, which are basically server-less.

deckiedan 4488 days ago

Yep - and there's also some benefits performance and security wise to not having any serverside code at all (other than nginx, or whatever).

The problems you have with jekyll, pelican, et al is that you lose site-side search, 'related posts' (without some reasonably complex compile-side clobber), etc. etc.

Using extremely minimal PHP lets you deploy just as easily, you don't get too much of a performance hit (a hell of a lot better than wordpress, etc), you can still do search, related posts, forms, embedding, and all that.

PHP as a server-side 'clever templating' language really isn't that bad. It's only awful when used to build anything massively complex (such as joomla! or drupal...), and in that it encourages messy project design.

mercurial 4488 days ago

> PHP as a server-side 'clever templating' language really isn't that bad.

I only touch PHP as little as possible, and work with a legacy PHP codebase, but AFAIK, it hasn't evolved a tag to automatically HTML-escape/JSON-escape content. So, it's as good a templating language as it is a programming language: pretty terrible. I'll trade PHP for something as barebones as Python with WSGI + Jinja2 any day.

pestaa 4488 days ago

It's not fair to compare pure PHP and Jinja2. Twig is the PHP equivalent of Jinja2, and it is an extremely well designed templating engine.

vesinisa 4488 days ago

> It's only awful when used to build anything massively complex (such as joomla! or drupal...)

Or a Bitcoin exchange

meritt 4488 days ago

If only Mark Karpeles had simply written mtgox in another language, there's no way he would have lost 750,000 bitcoins.

uniclaude 4488 days ago

To be honest, not implying you're wrong, but you could have "related posts" with pre-computation, and "site-side search" with a mix of pre-computed data structures and front-end Javascript (which could be quite efficient, provided the blog is not too large).

danpros 4488 days ago

HTMLy has a built-in search feature and related posts. I use a file-naming convention, so it stays fast even if, let's say, the blog has about 3k posts with hundreds of tags. Why? HTMLy doesn't read the content first; it filters on the filenames first. I already tested it on a mini VPS (128 MB RAM) and there was no speed penalty.
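
A sketch of the idea of filtering on filenames before reading any content, in Python for illustration. The `date_tags_slug.md` filename layout and the function names are assumptions for this example, not necessarily HTMLy's actual convention.

```python
from pathlib import Path


def parse_name(filename):
    """Extract date, tags and slug from a name like '2014-03-03_foo,bar_my-post.md'.

    The 'date_tags_slug' layout is a hypothetical convention.
    """
    stem = Path(filename).stem
    date, tags, slug = stem.split("_", 2)
    return {"date": date, "tags": set(tags.split(",")), "slug": slug, "file": filename}


def posts_with_tag(filenames, tag):
    """Return entries carrying the tag, newest first, without opening any file."""
    entries = (parse_name(name) for name in filenames)
    matching = [entry for entry in entries if tag in entry["tags"]]
    # ISO-style dates sort correctly as plain strings.
    return sorted(matching, key=lambda entry: entry["date"], reverse=True)
```

Only the files that survive the filter would then need to be opened and rendered, which is why the number of posts matters less than it would if every file were read on each request.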

imdsm 4488 days ago

Interesting choice. A lot of places still use it, but it definitely feels dirty. Not the kind of thing you usually see around here. While it is still used a lot, I have to say, when I saw PHP I immediately closed GitHub.

72deluxe 4488 days ago

Do you close your browser when you find out it is written in C++? Or shut down your OS when you find out it is written in C? Or angrily close the browser tab when you find out something on the page uses JavaScript? Do you then go and wash your hands and wipe the nervous sweat from your brow?

If a tool gets a job done, does the language matter?

No language is perfect. I know PHP feels "dirty" but the misuse of it is likely the cause of widespread disparaging. Even the English language is not perfect. Does this stop you using it?

Just a thought. I tend to close tabs when I come across stuff written in languages I can't write in... :-)

meritt 4488 days ago

Why not? It's a static cache generator. Does the language remotely matter?

est 4488 days ago

Next step: on-demand flat files which are dynamically generated and have an in-memory cache !!!

bananas 4488 days ago

Already got that. It's called the OS buffer cache.

Dynamic generation - make!

sankethkatta 4488 days ago

we can add it to the list: http://staticsitegenerators.net/ :)

bgwhn 4488 days ago

I don't think it's a static generator as there's no compilation step. Rather, they just replaced the database with file lookup. The server still needs a php interpreter.

fahmi182 4487 days ago

Should be added in http://staticsitegenerators.net/

https://github.com/jaspervdj/static-site-generator-compariso...

Currently, this listing is for only projects that are either or both a Flat File CMS and Static Site Generator, but not for projects which are only Dynamic Servers (such as WordPress and Ghost).

HTMLy is flat file cms :)

ncang 4487 days ago

Instead of a headache, I prefer something simple, like copying a site to have it offline. :)

stevekemp 4488 days ago

The biggest problem with static blogs is the lack of comments. I see this project uses either facebook or disqus which is a solution a lot of people like.

I prefer not to rely upon external comment-providers though, which is why I wrote my own self-hosted comment-server:

https://github.com/skx/e-comments/

wereHamster 4488 days ago

I use jekyll for my blog and looked for ways to integrate comments. I ended up with a simple solution: Provide a per-post unique email address where people can send comments to. The idea was then to manually process the comments and only put the most useful ones on the website.

I wanted to think that the hurdle is very low. It's a bit higher than just entering text in a text box and clicking a button. But sending an email isn't that hard either.

I received maybe half a dozen emails since I introduced my form of static comments. Most in the form of 'Does this work?'. For obvious reasons I didn't publish them. I redesigned my blog a few weeks ago and didn't add the comment feature back in.

https://caurea.org/2012/03/31/this-blog-has-comments-again.h...

stevekemp 4488 days ago

I find that a very interesting solution, but I suspect I'd not use it in practice (to submit comments).

Moving away from the browser to my mail-client to compose things would distract me and cause me to be a little confused.

icebraining 4488 days ago

It's interesting that we're still writing such applications by hand. One thing that interested me when I learned about CouchDB was the possibility of skipping that and just exposing the database to the browser, with a few schemas and a couple of data validation functions configured. After all, that system is almost a dumb HTTP storage mechanism.

stevekemp 4488 days ago

Presumably that means that users can mass-scrape the submitted comments though? (Potentially allowing the email addresses users submitted to be harvested.)

Other than that it doesn't seem like an unreasonable approach.

icebraining 4487 days ago

Why can't they mass scrape your service, though? After all, what you built is essentially a very specialized REST database.

As for harvesting email addresses, I think you could solve that by using a CouchDB view, which is essentially a function that processes and returns JSON documents. In this case, it could just delete the "email" key and return the rest.

You would probably still need to block direct access to the document via a frontend proxy, since I don't think Couch allows you to specify fine-grained per-user permissions, which is definitely a drawback.

Alternatively, since you're already willing to send hashed versions of the emails (as Gravatars), you could just store only the hashes in the first place, and never commit the plaintext to disk.
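
The two ideas above (dropping the "email" key before returning a document, or keeping only a hash) can be sketched as a plain transformation. This is Python for illustration only, not CouchDB view code, and the field names and function names are assumptions. The hash follows the Gravatar convention of an MD5 hex digest of the trimmed, lowercased address.

```python
import hashlib


def email_hash(email):
    """Gravatar-style hash: MD5 hex digest of the trimmed, lowercased address."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def public_view(comment):
    """Return a copy of the comment that is safe to expose.

    The plaintext 'email' key is dropped; a hash is added so an avatar
    can still be looked up. The field names here are hypothetical.
    """
    public = {key: value for key, value in comment.items() if key != "email"}
    if "email" in comment:
        public["email_hash"] = email_hash(comment["email"])
    return public
```

Applying `public_view` before writing to disk, rather than when reading, would correspond to never storing the plaintext address at all.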

stevekemp 4487 days ago

I might have been making assumptions on CouchDB which aren't valid - that remote users could query all documents (== pages) to get the comments.

With my thing yes it can be crawled, since requests to /comments/ID will return the JSON comment-data. However there is no enumeration of the valid IDs possible, short of a dictionary attack. (This is where I was thinking that exposing CouchDB might expose more data.)

I did consider not storing emails, and for my use-case that's fine, but I figured sooner or later somebody will want to access them so ruling it out unduly would eventually result in a bug report.

icebraining 4487 days ago

> I might have been making assumptions on CouchDB which aren't valid - that remote users could query all documents (== pages) to get the comments.

Yes, you'd probably need to block that URL with a proxy, and only allow single page views to be requested. I think this is definitely a shortcoming of the DB; it should allow finer grained permissions.

> However there is no enumeration of the valid IDs possible, short of a dictionary attack.

Well, by default CouchDB uses UUIDs, so enumeration shouldn't be possible either. Of course, both are subject to simple scraping of the HTML pages; a simple wget + grep can probably list them all, so you don't gain much, except for private pages you might have.

> I did consider not storing emails, and for my use-case that's fine, but I figured sooner or later somebody will want to access them so ruling it out unduly would eventually result in a bug report.

Fair enough. I actually don't think CouchDB, as it is now, would necessarily be a better solution than yours. But the question is, why not? I believe the direction is correct, but the current implementation falls short, and that's a shame.

themichael 4488 days ago

We use BitNami and Wordpress to generate static files. Works great. Paired with cloudflare, the site is faster than 99 % of the web (according to http://tools.pingdom.com/fpt/).

w4rh4wk5 4488 days ago

I created a similar project, you might want to have a look at it https://github.com/W4RH4WK/Phoenix-Tpl

MrBlue 4485 days ago

How does this compare to Dropplets? (https://github.com/Circa75/dropplets)

danpros 4485 days ago

Read this section: https://github.com/Circa75/dropplets#writing-posts HTMLy doesn't need complicated stuff like this :)

evacchi 4488 days ago

Well, let me add my old project, http://flatpress.org. It does feel dated, as it uses BBCode, but there are plugins for Markdown. If I had the time, I would make MD the default nowadays.

danpros 4482 days ago

That is a great project; before making this I had already tried yours :)

ivanhoe 4488 days ago

Never really understood why people try to avoid DBs? It's OK for a very basic pile-of-online-texts blog, but the moment you try to show whatever relation between blogposts (e.g. the related articles or the latest articles) you end up reinventing the wheel

vidarh 4488 days ago

Most blogs are small enough that caching all the content in memory and sorting/selecting is so cheap and simple it really makes no difference these days, and you gain simplicity.

My blog is flat file because I like to work on a version on my home server, editing stuff in emacs, commit to git and push an updated version atomically. Even if I continue writing at my current pace for the next 100 years, my current server would hardly notice having to re-read every single article.

As for reinventing the wheel, the code for pulling in the articles from flat files and slicing and dicing them simply by iterating over an in-memory collection is so small and simple that there's hardly any wheel to re-invent.
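
A sketch of what that slicing and dicing might look like once the posts are in memory, in Python for illustration. The post dictionaries with `slug`, `date` and `tags` keys, and the function names, are assumptions for this example.

```python
def latest(posts, n=5):
    """Return the n most recent posts (dates are ISO strings, so they sort as text)."""
    return sorted(posts, key=lambda post: post["date"], reverse=True)[:n]


def related(posts, post, n=3):
    """Return up to n other posts sharing tags with 'post'.

    Posts with more shared tags come first; ties go to the newer post.
    """
    candidates = [
        other
        for other in posts
        if other["slug"] != post["slug"] and other["tags"] & post["tags"]
    ]
    candidates.sort(
        key=lambda other: (len(other["tags"] & post["tags"]), other["date"]),
        reverse=True,
    )
    return candidates[:n]
```

Both functions scan the whole list on every call, which matches the argument above: for a collection the size of a personal blog, that cost is negligible.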

bowerbird 4487 days ago

i'm glad to see there is a pleasant range of disagreement on some of the crucial dimensions running through this...

still lots of room for experimentation, i would say.

-bowerbird

yen223 4488 days ago

Replacing the database with flat files? We have gone full circle, haven't we.

I can think of a lot of disadvantages with not using an actual database. What's the benefit of going back to flat files?

vidarh 4488 days ago

> What's the benefit of going back to flat files?

Edit in your editor of choice. Use your source control for edit history. Trivial to keep a separate instance to edit/test on, and push with rsync. Makes it trivial to treat the code and the content as one unit (so that e.g. if you change parsing of the articles, update the articles and need to revert, you don't need to mess with reversing database updates separately).

Fewer moving parts. For a typical modern blog with comments farmed out to Disqus or similar, the data is likely to be tiny and very static. My blog's data, for example, is about 8.3MB of text that changes maybe a couple of times a month, so the 1-2 second cost of reading every article I've ever written in from individual files on disk is hardly an issue.

sheraz 4488 days ago

I wonder what simple blogging / CMS engines are out there that use sqlite as the backend. This also has the benefits of simple backups without the complexity of another process running. Most (if not all) webhosts will have this baked into their PHP installs.

biturd 4487 days ago

Perhaps using sqlite as the "backend", in that you build it like any other blog, so you get the simplicity of building it out, using one good sql framework etc.

But then, when you click "post" it rips through the database using a library of markdown -> HTML or whatever the case may be. There can't be much overhead to a single include to pull in the HTML, but you could render the entire page. I just can't see php falling down too much with a few includes and functions being called to generate a header, the included rendered HTML, and a footer with some design elements.

It's not much fun building your own database out of flat files. I cut my teeth on a Mac only system that more or less only talked to Filemaker, which was only as fast as the actual screen could redraw and search out the data. It could be painfully slow.

In a way I am glad, as I learned how to do things and think differently when most were just running a "select * from foo where bar = 'x'" which was a 15 minute luxury I didn't have. There were no joins, no tables, you actually ran applescripts on the database and it returned the data somehow back to a web server on the Mac.

So we used the database on the backend, where admins could be more patient, or do more intricate things, but almost always generated out some HTML, so in the end, the site was semi-dynamic. I think I was doing "caching" of data as a result almost 15 years ago.

The rule was, no more than 2 database calls per page, ever. And you couldn't do things like update foo set name = 'me' where id = 1, because there was no exposed notion of a record id; it was internal. So you had to select name from foo where name = 'whatever', which would return the name but also a RecID value, which you then used in another query: update foo set name = 'me' where id = <special RecID token>

Made me think a bit different.

bananas 4488 days ago

cp/tar/rsync/git/svn to name a few.

HN is apparently flat file based.

I actually have a 100% static site deployed in production. It is served off nginx, built with make, shell and sed (does some include processing and index generation) and is deployed with rsync from make and uses git for version control.

chc 4488 days ago

The design of news.arc is notoriously quirky. Given the fact that paging is entirely based on continuations, I don't think "flat-file-based" covers it.

bananas 4488 days ago

That explains a lot. I get the principle of why this might be done but boy is it ugly. You'd have to maintain state between pages. State is bad. Which probably explains all those link expired errors...

gararapa 4488 days ago

Why are you versioning the dependencies?

danpros 4488 days ago

Did you mean the PHP 5.3? There is a library that requires PHP 5.3 or greater.

gararapa 4487 days ago

No. I meant the dependencies contained in vendor/ directory. And the composer.phar file itself. Is there a reason why you're versioning these things?

danpros 4487 days ago

Basically I just want all dependencies to work exactly as they did when I tested them, in accordance with the getcomposer.org guidelines :)

This platform prioritizes writing through the admin panel and convenience for users, particularly non-programmers or those who are not familiar with coding at all.

If you have other views, you can contribute to the project, so that we can discuss it further.