by jasode 3187 days ago
>Anyway, so far we've built Shelf primarily on a NodeJs stack [...] And we built a web clipper as browser extension for Chrome and Firefox [...]

Thanks for providing extra technical detail. However, I'm more curious about what's happening on the backend.

As far as I can tell, your differentiation from something like MS SharePoint or Dropbox is integrated OCR to extract keywords, and Artificial Intelligence to help filter (or "screen" as your landing pages call it). Is there more to your special sauce that I have overlooked?

Also, where are you storing customers' data? Amazon S3? In house servers? Are you using something like ElasticSearch or did you build your own search engine?


Dropbox is really great for storing and syncing files across devices. But it doesn't have a rich set of pre-built filters, and you can only store files, not mixed content such as links, contacts, etc. So Shelf is really a complementary solution to Dropbox and can sit on top of it, combining the Dropbox content with other content in a single interface.

SharePoint is of course very powerful, and that means you typically need a project to make it work. Shelf works out of the box, is opinionated, and lets you get started with minimal to no configuration.

Yes, we use Amazon S3 for content uploaded to Shelf (encrypted of course) and we utilize ElasticSearch as well.

You needn't be all that specific, but how do you securely search the encrypted data?

Data is secured at rest and in transmission. We take every measure available in Elastic to secure the search indexes themselves.

Ah, so the indexes are secured - but are they encrypted?

You don't need to answer. It's okay. I'm largely asking because I've been thinking about writing a series of essays on the subject of security, and one of the topics I have taken some notes about is searching encrypted data.

If the index isn't secure, it kind of defeats the idea of encryption - someone need only make off with the index to be able to draw some conclusions. More so if it's relational.

There are different ways that some go about this. One is to hash individual words with a unique salt and search for the hashes, but that has its own set of problems, like the ability to eliminate words like 'the' and 'it' from search queries. Well, at least computationally easy.
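
To make the salted-hash idea concrete, here is a minimal sketch of what I have in mind. It is not anything Shelf has described; the function names (`tokenize`, `blind_token`, `build_index`, `search`) are made up for illustration, and I'm assuming the "unique salt" is a per-customer secret key fed to an HMAC rather than a bare hash.

```python
import hashlib
import hmac
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def blind_token(key, word):
    """Keyed hash of one word; the key plays the role of the secret salt (assumption)."""
    return hmac.new(key, word.encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(key, docs):
    """Map each hashed word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(tokenize(text)):
            index.setdefault(blind_token(key, word), set()).add(doc_id)
    return index


def search(key, index, query):
    """Return ids of documents containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = None
    for word in words:
        hits = index.get(blind_token(key, word), set())
        results = set(hits) if results is None else results & hits
    return results


if __name__ == "__main__":
    key = b"per-customer secret (hypothetical)"
    docs = {1: "The quarterly report", 2: "the invoice"}
    index = build_index(key, docs)
    print(search(key, index, "quarterly report"))
```

The index holds no plaintext words, and a query hashed with the wrong key matches nothing. The weakness is that the same word always maps to the same token, so anyone holding the index can still count how often each token appears and which documents share tokens, which is the "draw some conclusions" problem again.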

So, it's purely for my own curiosity that I ask. I imagine it might be doable to load the whole DB into RAM, if it's small enough or you have enough RAM, and then do the searches there in an encrypted environment?

I am not so much concerned with exfiltration by 'hackers' as with exfiltration by employees. Should I get to writing the essays, that's going to be a central theme - protecting data from rogue employees with the increased use of cloud services in today's business environment.

Again, I'd not want you to feel obligated to release anything proprietary or anything that would compromise your security.