Hacker News
by dharmaturtle 1816 days ago
I'm building an open source Anki clone, and lemme tell you, shit's surprisingly hard. Getting syncing working without resorting to uploading entire SQLite databases is nontrivial. You're basically doing multi-master database replication (https://en.wikipedia.org/wiki/Multi-master_replication)... but you gotta build it yourself, since there's no easy way to sync SQLite with a server-side database (ignoring Firebase/CouchDB/PouchDB for various technical reasons).
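To make the "don't upload the whole database" point concrete, here is a minimal sketch using Python's built-in sqlite3 module: each device sends only the rows changed since the last sync, and the receiver merges them with last-writer-wins. The `cards` table, its `modified` column, and the last-writer-wins policy are assumptions for illustration, not how Anki or the clone above actually syncs. (The upsert syntax needs SQLite 3.24 or newer.)

```python
import sqlite3

# Hypothetical schema: `modified` is a per-row timestamp of the last edit.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cards (
    id       TEXT PRIMARY KEY,
    front    TEXT NOT NULL,
    back     TEXT NOT NULL,
    modified INTEGER NOT NULL
)
"""


def open_db() -> sqlite3.Connection:
    """Open an in-memory stand-in for one device's local database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def changes_since(conn: sqlite3.Connection, since: int) -> list:
    """Return only the rows edited after `since`, instead of the whole file."""
    return conn.execute(
        "SELECT id, front, back, modified FROM cards WHERE modified > ? ORDER BY modified",
        (since,),
    ).fetchall()


def apply_changes(conn: sqlite3.Connection, rows: list) -> None:
    """Merge a peer's rows; on conflict, the row with the newer `modified` wins."""
    conn.executemany(
        """
        INSERT INTO cards (id, front, back, modified) VALUES (?, ?, ?, ?)
        ON CONFLICT(id) DO UPDATE SET
            front = excluded.front,
            back = excluded.back,
            modified = excluded.modified
        WHERE excluded.modified > cards.modified
        """,
        rows,
    )
    conn.commit()
```

Last-writer-wins quietly discards one side of a concurrent edit, and it trusts whatever clocks produced `modified`; handling those cases properly is where the multi-master replication headaches start.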

The lack of "worth-it" alternatives is because there's no money in this market. Students are very unwilling to buy software, and if you use ads you end up like Quizlet. YC funded a startup in this area (Hickory) and they're... not doing well.

Everyone still uses Anki because, despite all the bugs and clunkiness, it's still hands-down the best solution out there. There are a million and one spaced repetition systems out there, but Anki's plugin system and shared decks make for a very strong network effect.

If you're a med student, I don't recommend moving off Anki. The ecosystem there is too strong. If you're losing data, post in the /r/anki subreddit - Anki automatically generates backups and they'll walk you through restoring them.


I've built my own flashcard app, and I ended up just syncing a journal of all your actions on a device (creating/editing cards, evaluating them) and having all your other devices play those back to get to the "shared state". I store data using SQLite as well. It works pretty well for me.
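A minimal sketch of the journal-and-replay idea, in Python: devices never sync the card table itself, they sync an ordered list of events, and every device rebuilds the same state by folding those events in order. The event kinds (`create`, `edit`, `review`) and their fields are hypothetical; the commenter's actual format isn't described.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str  # hypothetical event types: "create", "edit", "review"
    card_id: str
    data: dict = field(default_factory=dict)


def apply_event(state: dict, event: Event) -> None:
    """Fold one event into the in-memory state (card_id -> card fields)."""
    if event.kind == "create":
        state[event.card_id] = {"front": event.data["front"], "back": event.data["back"], "reviews": []}
    elif event.kind == "edit":
        card = state[event.card_id]
        for key in ("front", "back"):
            if key in event.data:
                card[key] = event.data[key]
    elif event.kind == "review":
        state[event.card_id]["reviews"].append(event.data["grade"])
    else:
        raise ValueError(f"unknown event kind: {event.kind!r}")


def replay(journal: list) -> dict:
    """Rebuild the shared state from scratch by playing back every event in order."""
    state: dict = {}
    for event in journal:
        apply_event(state, event)
    return state


journal = [
    Event("create", "c1", {"front": "chat", "back": "dog"}),
    Event("edit", "c1", {"back": "cat"}),  # fix a wrong answer
    Event("review", "c1", {"grade": 3}),
]
print(replay(journal))  # {'c1': {'front': 'chat', 'back': 'cat', 'reviews': [3]}}
```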

You're definitely right about students being unwilling to buy software. They're already cash-strapped, so they're very unlikely to pay for an app, let alone a monthly subscription.

Hah - you're describing "event sourcing" and it's the technique I'm using: https://flpvsk.com/blog/2019-07-20-offline-first-apps-event-...

Unfortunately event sourcing means distributed systems... and I'm learning this on the fly on nights & weekends. Martin Kleppmann's "Designing Data-Intensive Applications" has put the fear of god in me.
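One concrete example of the distributed-systems part: once two devices append events offline, every device must replay the merged journals in the same order, or they end up with different state. Below is a minimal sketch of one textbook approach, Lamport timestamps with the device name as a tiebreaker; the `Stamp`, `LamportClock`, and `merge` names are hypothetical, and this is not necessarily what cardoverflow does.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Stamp:
    counter: int  # Lamport counter, compared first
    device: str  # tiebreaker, so concurrent events still get one agreed order


class LamportClock:
    """Hypothetical per-device logical clock for stamping journal events."""

    def __init__(self, device: str):
        self.device = device
        self.counter = 0

    def tick(self) -> Stamp:
        """Stamp a new local event."""
        self.counter += 1
        return Stamp(self.counter, self.device)

    def observe(self, stamp: Stamp) -> None:
        """After playing back a peer's event, never stamp anything lower than it."""
        self.counter = max(self.counter, stamp.counter)


def merge(*journals: list) -> list:
    """Merge per-device journals of (Stamp, event) pairs into one total order."""
    return sorted((entry for journal in journals for entry in journal), key=lambda entry: entry[0])


phone, laptop = LamportClock("phone"), LamportClock("laptop")
edit_on_phone = (phone.tick(), "set c1.back = 'hi'")  # Stamp(1, 'phone')
edit_on_laptop = (laptop.tick(), "set c1.back = 'hello'")  # Stamp(1, 'laptop')
print([event for _, event in merge([edit_on_phone], [edit_on_laptop])])
# ["set c1.back = 'hello'", "set c1.back = 'hi'"]
```

The phone's edit ends up last only because "phone" sorts after "laptop": the order is consistent across devices, not meaningful. Deciding which concurrent edit should win is a separate, harder problem.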

Yup, I'm doing event sourcing.

For my approach, I really just have each device append to its own journal file. I use iCloud's file storage, so I don't even have a service that I run. Each peer device just uploads the latest version of its journal to a shared folder and downloads its peers' journals and plays back the delta as needed.
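The per-device journal scheme above can be sketched roughly as follows, assuming one JSON-lines file per device in a shared folder (a temporary directory stands in for the iCloud folder here). The class name, the file layout, and the per-peer counter used to find the delta are assumptions, not the commenter's actual code.

```python
import json
import tempfile
from pathlib import Path


class JournalSync:
    """Hypothetical per-device journal sync over a shared (e.g. cloud-synced) folder."""

    def __init__(self, folder: Path, device: str):
        self.folder = folder
        self.device = device
        self.applied = {}  # peer name -> how many of that peer's events we've already played back

    def append(self, event: dict) -> None:
        """Record a local action; each device only ever writes to its own file."""
        with open(self.folder / f"{self.device}.jsonl", "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def pull_deltas(self) -> list:
        """Read peers' journals and return only the events not yet played back."""
        new_events = []
        for path in sorted(self.folder.glob("*.jsonl")):
            peer = path.stem
            if peer == self.device:
                continue  # our own actions are already applied locally
            lines = path.read_text(encoding="utf-8").splitlines()
            start = self.applied.get(peer, 0)
            new_events.extend(json.loads(line) for line in lines[start:])
            self.applied[peer] = len(lines)
        return new_events


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp)  # stand-in for the shared iCloud folder
        phone, ipad = JournalSync(shared, "phone"), JournalSync(shared, "ipad")
        phone.append({"kind": "create", "card_id": "c1"})
        print(ipad.pull_deltas())  # [{'kind': 'create', 'card_id': 'c1'}]
        print(ipad.pull_deltas())  # [] (nothing new since the last pull)
```

Because each device only ever writes its own file, the cloud storage never has to merge concurrent writes. A real version would still need to cope with a half-synced file (a truncated last line) and would pair this with a cross-device ordering rule before replaying the events.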

I intentionally chose this architecture since I didn't want to run my own sync service. It keeps the sync system free, but iCloud can be slow. Unfortunately for me, if iCloud is slow, the app gets blamed for it.

Nice. Just checking, did you not go with Firebase/CouchDB/PouchDB due to pricing?

Yes. Anything that required me to pay would mean I'd have to charge users a subscription, which I didn't want to do.

I'm sure it's nontrivial, especially if you're building the clone by yourself, but over the time Anki has been around, you'd think they would've ironed out the kinks. Losing data itself isn't the problem; I can get the decks back myself. It's the fact the app errors so much that it's almost common knowledge to hit up r/anki and get a procedure to crawl through backups manually.

I think their mistake is assuming memorization tools are only for students rather than learners in general. I use Anki for memorizing chess positions, learning languages, etc. I'm also a full-time worker and have plenty of money to spend on tools that will help my productivity/hobbies. I'd be more than happy to fork over cash for the Anki system with better client apps. If they were concerned that the cost would put it out of reach for students (their iPhone app is $20, so I doubt that's a concern), they should just offer free student-tier memberships while people who can pay will pay.

> It's the fact the app errors so much that it's almost common knowledge to hit up r/anki and get a procedure to crawl through backups manually.

I'm curious what causes this, because my experience is definitely different from yours. Very few errors at all, and I have decks with thousands of cards and hundreds of megabytes of media (audio clips, images, etc.).

Is there any reason why these apps have to keep the database state locally? Why not just create a web service? The problem with this model would be keeping the server infrastructure running and getting some money to pay the bills, but it looks much simpler.

I also hate that the Anki shared decks website does not encourage collaboration, or at least the decks I see on it don't. There are a lot of shitty, outdated decks, and instead of collaborating to fix them, people just upload their own shitty deck. Perhaps the people studying medicine who create their own decks don't have this problem, but it's something I see on the website. It would be great to have a site integrated with git so people can collaborate on GitHub.

Also, I miss classification of decks by language. When you search for a language, e.g. Russian, you get decks for English->Russian but also for Russian->English. It's hard to find the deck I want.

(Just some suggestions, in case someone is working on Anki clones.)

> I also hate that the anki shared decks web site does not encourage collaboration...

Dude, I'm building exactly this. I'm not basing it on git for various reasons, but I am using event sourcing, and git is basically event sourcing for code. My system will (eventually) allow pull requests, comments, upvotes/downvotes, and all kinds of community shenanigans on flash cards. It's months away from release... but here's the repo if you wanna have a look: https://github.com/dharmaturtle/cardoverflow

> Is there any reason why these apps have to keep the database state locally? Why not just create a web service? The problem with this model would be keeping the server infrastructure running and getting some money to pay the bills, but it looks much simpler.

The idea of not having my personal decks stored on my own disk(s) is honestly much worse than the annoyance of sometimes (rarely) running into sync issues. I've been working on various decks for literally years and have made several thousand cards. Stored server-side only? Please no.

You can use Anki in your web browser without storing anything locally. It has some limitations and is not really meant to be your sole use of Anki but you can create and study text based cards.

https://ankiweb.net/about

Are you asking why does an app that is locally installed need to work without internet connection?

If so, consider commuting via subway.

I do get your point but... are you saying you have no reception in the subway?

In many cases, yes. But additionally, even with mostly reliable reception, people don't want to depend on that reception. It's poor form and poor design to make an intermittent service (even one that's reliable for a large portion of your users) a mandatory component when it is technically unnecessary.

Offline-first is simply a more reliable way to build systems than online-only. I've long forgotten the specifics, but a few years ago (perhaps 10+ now) a lot of people were, understandably, upset when they found out their single-player games couldn't be played without an internet connection for authentication. Oops, the auth server went down, and players can't play the thing they dropped $50+ on.

I've barely used Anki, but I feel like it shouldn't be more than an open spec/format that allows people to make their own compatible clients and services.

It's open source, so you can just copy the spec. The community's built a syncing server here: https://github.com/ankicommunity/anki-sync-server

There's also a really interesting templating library that generates cards for Anki: https://closetengine.com/