Hacker News
by wenc 2850 days ago
This is what I learned from designing data architectures:

If your data is variegated in format and form, don't look for a single tool/method solution to organize them. There is no single all-purpose organization method.

Don't start with figuring out a method to organize your information. You will end up overengineering your org method.

Instead, organize your information per "use-case". Start with a specific use-case/project (e.g. writing a blog or paper), and work backwards to figure out how to organize your data to meet the requirements of that project. Do a couple of iterations. After a couple of projects, you will naturally discover your own data use patterns.

If you go with a super organization system on day 1, it will likely be too general and require too much effort (tagging, keywords, hierarchies, version control, branches, etc.). You will end up expending resources on metadata management for data that you may never need to retrieve, and very soon you will abandon the effort.

My PKM system is very simple: a single unorganized Google Doc for quick thoughts and ideas (just bullet points), separate Google Docs files for specific projects, and Dropbox for files. It's simple, searchable, and multi-device.

I also occasionally use some specialized tools like Jabref (BibTeX) for specific types of data like references, but I hardly ever write papers anymore, so these have fallen by the wayside.

I've tried wikis, but due to their multipage nature they segment knowledge too finely (often there are wiki pages hidden deep in the link hierarchy that I forgot existed). Wikis don't fit the PKM use case that well, so these too have fallen by the wayside for me.

For me, PKMs need to in some way feel like a single broadsheet where I can easily see and touch my information without having to drill-down hierarchies and follow too many links.

p.s. I've heard good things about Evernote. It's a little too heavy for me, but many people seem to find it useful.

5 comments

> Don't start with figuring out a method to organize your information. You will end up overengineering your org method.

I think this is an important point. I like to start with a few simple rules:

- To retrieve information, I should know where to start: a Schelling point.[0] For me, this is the home page of my wiki. For wenc, it's a Google Doc.

- It shouldn't take me more than three clicks to get from my starting point to the information I'm looking for.

- Links/URLs will tie everything together. They are the edges in my knowledge graph. But as wenc notes, keep the graph shallow.

Then I need to be rigorous, reorganizing things when they don't work intuitively and adding new nodes when something I need has not yet been recorded. As wenc puts it, "discover your own data use patterns."
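One way to check the "three clicks" rule is to treat the wiki as a graph and measure how far each page is from the starting point. The following is only a sketch: the page names and the `links` dictionary are made up for illustration, and a real wiki would need its links extracted first.

```python
from collections import deque


def click_depths(links, start):
    """Minimum number of clicks from `start` to each reachable page (breadth-first search)."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(links, start, max_clicks=3):
    """Pages that are unreachable from `start` or more than `max_clicks` away."""
    depths = click_depths(links, start)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    return sorted(p for p in all_pages if depths.get(p, max_clicks + 1) > max_clicks)


# Hypothetical wiki: each page maps to the pages it links to.
links = {
    "Home": ["Projects", "Reading"],
    "Projects": ["Blog"],
    "Blog": ["Drafts"],
    "Drafts": ["Old ideas"],
    "Orphan": [],
}

print(too_deep(links, "Home"))  # pages to relink closer to the home page
```

Pages that show up in the result are candidates for a direct link from the home page or from a page one click away from it.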

Wikis do work for me, provided they're organized around Schelling points. I've used and refined these principles in setting up wikis at my last 3 companies, and it's worked pretty well for organizing a collective knowledge base as well.

[0] https://en.wikipedia.org/wiki/Focal_point_(game_theory)

What software do you use for your wiki?

I use TiddlyWiki (https://tiddlywiki.com). It's brilliantly simple to set up (it doesn't require a database, for example) and has a small but nice plugin ecosystem with things like Markdown support.

I'm not currently using a personal wiki, but both MediaWiki and Confluence are pretty easy to set up. Confluence/Jira licenses are cheap for self-hosted personal use. MediaWiki is open source and probably not going anywhere anytime soon.

I second MediaWiki. I prefer that for professional use:

https://www.mediawiki.org/wiki/MediaWiki

Personally, I've been using WikkaWiki for years:

https://github.com/bakoontz/WikkaWiki

I would prefer something that supports Markdown. Neither of these really does.

DokuWiki is what I use. Easy to set up, easy to use. And with DokuWiki on a Stick there is a zero-installation, portable local option.

> It shouldn't take me more than three clicks to get from my starting point to the information I'm looking for.

I see this sort of criterion and really don't agree. There are many operations that take more than three clicks. Navigating a UI is a graph in itself, and we can handle way more than 3 nodes, in the same way that I can remember how to get to work or how to get a book from one of my bookshelves.

Anyway, small point.

Trivial inconveniences.

https://www.lesswrong.com/posts/reitXJgJXFzKpdKyd/beware-tri...

There was a joke in one of Stephen Hawking's books that each equation included in the text cuts the number of readers by half. In a similar fashion, you can imagine that each extra step you need to get to the information will cut the number of occasions you do so by half. If you want the system to benefit you daily, in many areas, it has to be as simple and seamless as possible.
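To make the halving rule of thumb concrete, here is a tiny sketch. The 50% drop per step is the comment's own guess rather than a measured value, and the baseline of 32 lookups a month is invented for the example.

```python
def expected_uses(baseline_uses, extra_steps, drop_per_step=0.5):
    """Estimated uses when each extra step keeps only `drop_per_step` of the previous uses."""
    return baseline_uses * drop_per_step ** extra_steps


# Hypothetical baseline: 32 lookups a month when the information is zero steps away.
for steps in range(4):
    print(steps, expected_uses(32, steps))
```

Under that assumption, three extra steps take the 32 monthly lookups down to 4.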

Fascinating observation, and it could be a corollary to nudge theory [1], which won Richard Thaler his Nobel Prize this year and which says that making the desired option the default (opt-out rather than opt-in) is more effective for changing behavior.

Underlying idea: friction disincentivizes.

[1] https://en.m.wikipedia.org/wiki/Nudge_theory

Yeah, sounds like the same thing.

I would not expect you could get a Nobel Prize for that, though. Maybe Scott Alexander should start submitting his articles wherever it is you need to submit them to get a Nobel in economics.

> In 2017, economist Richard Thaler was awarded the Nobel Memorial Prize in Economic Sciences for "his contributions to behavioral economics and his pioneering work in establishing that people are predictably irrational in ways that defy economic theory."

Proving that all the other economists are modeling things wrong is more impressive than you make it sound.

Having a simple system that works efficiently for each use-case is a good idea; just be careful that organizing your notes doesn't become a goal [0] in itself. It is much better to just start using some system and see where you end up. Even if it turns out you made a bad choice for your initial system, you can always adapt midway (e.g. tools like `pandoc` can convert between a variety of formats). Focus on the content and linking, not on the tools themselves.

[0] http://blog.dilbert.com/2013/11/18/goals-vs-systems/

I think there actually is a way to organize everything (arbitrary knowledge) with a single tool/method, and am excited about it and want to make it work better for us all to organize knowledge individually and together. I have written about it extensively at http://onemodel.org (where the current take on the idea is available for free download, AGPL), though not all the features are there yet. There is some more info in a comment farther down on this page if you search for "onemodel". It is extremely efficient and effective (physically and mentally, once you learn it -- just about everything needed is always on the screen): you can get around really fast.

I attest to the per use-case notion for organizing anything.

I'm fairly settled on a homebrew version of Noguchi's system, including pen, folded paper in my back pocket, and a few paper notebooks, as well as Markdown, Google Docs, Simplenote (via Notational Velocity) and Google Keep.

Here's the post that inspired me towards this system - http://www.literatureandlatte.com/forum/viewtopic.php?p=1592...

However, my system is stunningly poor at filtering and modelling information into memory. I tend to frequently write, bookmark, and capture things and leave it at that. That does not help. I have recently begun exploring spaced repetition to address this, but it's too early to comment on it.

I have learnt to differentiate between archive and active areas. Screens and the internet, while great for archival, just don't work well for active information that I want to recall at will.

Just wanted to point out that important difference between active and archive information. The link I posted above came from my archival system, but remembering Noguchi's system, and that it applies in this case, came from repetition over active information.

I endorse the idea that over-engineering is counterproductive. Focusing on principles over tools is good; tools come and go. My list of lessons learned (which has some overlap with yours) is:

1. Make sure your files and their organizational scheme can be transferred between tools. Your files need to be searchable, and at some point you will need portability. Text-based notes are easiest, but formats that have ubiquitous support (like docx or html) are ok too. Similarly, a collection of files is more reliable and flexible than a database tied to a subscription-based service.

2. Don't bother with elaborate tag or folder based taxonomies. Search is more efficient. I, at least, cannot accurately predict what taxonomy I'll need 5 years in the future, nor can I consistently tag every file correctly. There is some empirical support in favor of search over tagging [1].

3. Keep your notes organized; store your other files all over the place, wherever it is convenient. Use your notes to record context and thoughts relevant to a project/task/other activity, and use hyperlinks to connect those notes to the relevant files (wherever they are). Being able to embed or preview images in your notes is also very useful.

4. Organization is worthwhile for files that you have read/watched and thought about. The purpose of organization is to help you revisit and develop your thoughts on a topic. Organizing files you haven't read is not progress and does not improve your capabilities; you have to actually read and think about them to get any benefit.

5. Organization is also worthwhile for files that you know you will need for a specific project/task. Add a link to the relevant item in your notes for that project/task. Include a note why you thought it would be useful to future-you.

6. Files that you have not thought about / critiqued, and which do not fit a specific project/task, are not worth organizing. Dump them in an unorganized "to read later" list or just forget about them. The idea that you actually will get back to these files is mostly fictitious, and you can easily re-find them, or a superior equivalent, via search engines, browser history, or library logs.

7. Switch tools as needed based on circumstance. For example, my personal work notes are in org-mode, academic papers are in Zotero, personal work data is in a git + git-annex repo, and shared work notes and data are spread between Google Drive, Dropbox, and a private file server, depending on the requirements of the other people involved. I also have personal life notes in org-mode, Simplenote, and some ancient notes still to be migrated from Evernote. Hyperlinks tie files together as needed.

[1] https://people.ucsc.edu/~swhittak/papers/chi2011_refinding_e...
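Points 1 and 2 combine well: if notes are plain text files, "search instead of taxonomy" needs very little tooling. Below is a minimal sketch of a full-text search over a notes folder. The function name, the folder path in the usage note, and the list of extensions are assumptions for illustration, not anyone's actual setup.

```python
from pathlib import Path


def search_notes(root, query, extensions=(".txt", ".md", ".org")):
    """Case-insensitive substring search; returns (path, line number, line) tuples."""
    query = query.lower()
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Only look at regular files with a text-note extension.
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if query in line.lower():
                hits.append((path, number, line.strip()))
    return hits
```

Calling something like `search_notes("notes", "pandoc")` on a hypothetical `notes` folder would list every matching line with its file and line number, with no tags or folders required.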

I agree that spending time on tagging/categorizing everything is not worthwhile. I tried doing that for my Firefox bookmarks and Zotero, and ended up abandoning it in favour of search. Having said that, it is useful to have a limited number of temporary categories:

- things I want to (re)read next

- "best" / "mind-blowing" technologies, usually built around a simple, minimalistic, but powerful concept (e.g. reagents for composable lock-free data structures, parallel prefix sum, Futamura projections, etc)

For everything else full-text search in Zotero has served me well.

For portability between tools I would go with Pandoc. It supports conversion between a number of wiki formats and Markdown, which should suffice for most purposes. Definitely avoid tools that lock you into their proprietary formats.
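As a rough sketch of what such a migration step could look like, the snippet below builds a Pandoc command line from Python. It uses Pandoc's `--from`, `--to` and `--output` options and its `mediawiki` and `markdown` format names; the helper names and file names are hypothetical, and it assumes `pandoc` is installed and on the PATH.

```python
import shutil
import subprocess


def pandoc_command(source, target, from_format="mediawiki", to_format="markdown"):
    """Build the argument list for converting one file with pandoc."""
    return ["pandoc", "--from", from_format, "--to", to_format, "--output", target, source]


def convert(source, target, **formats):
    """Run pandoc on one file; raises if pandoc is missing or the conversion fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(pandoc_command(source, target, **formats), check=True)


# Hypothetical file names, shown only to illustrate the resulting command.
print(" ".join(pandoc_command("page.wiki", "page.md")))
```

Wiki-specific markup rarely converts perfectly, so it is worth spot-checking a few converted pages before migrating everything.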