|
|
|
|
|
by brianshaler
3922 days ago
|
|
I appreciate the detailed response! My use case is yet another indie web, run-your-own-server kind of thing that processes content my friends post online (from tweets to shared articles) and can predict whether or not it is relevant to me based on extracted topics, source, my context/location, and whatnot. Both the source data and training data comes in at a trickle (though I'm pondering ways to propagate training data throughout friend-of-friend networks) and is processed in the background rather than on-demand, so my performance characteristics are very different than most of the use cases I've seen. I'm trying to keep the system frugal with memory but liberal with persistent storage, since you can run a commodity instance 24/7 and mount a pretty large volume for fairly cheap. It'll be slow, for sure, if there's only one user for each installation it won't need to worry about handling many queries per second. |
|
MLDB can memory-map some kinds of datasets which would also help with the low memory-to-datasest size ratio.
Please feel free to reach out (jeremy at datacratic) if you'd like to discuss further.