Apply HN: Vaultedge – a private Google for your private data
9 points by sajeevaravind 3724 days ago
Vaultedge demo: https://www.youtube.com/watch?v=ufnMZjg5344

Problem: Consumers keep data in many different places, such as cloud storage, email, laptops and offline devices. Finding a document that you created or downloaded a few months, or even a few days, ago means answering questions like: Was I at home or at the office when I downloaded it? Did I receive it by email, or did I store it in Dropbox? You quickly make an assumption and search for the filename or keywords there; if you don't find it, you search other places until you find it or give up.

Vaultedge is an attempt to help you find your documents quickly, in just a few clicks. I personally faced this problem in my previous job as a Software Architect and asked people around me what tools they use. Apart from a few who were well organized, everyone faced the same issue. They would love a tool that lets them search across their online and offline content contextually, for example "last three months of cell phone bills" or "the document on networking sent by x". They were also interested in a grouping capability that puts similar documents into categories, so that they can:
- see all their bills in one place
- highlight the difference between fluctuating cable or cell phone bills month over month
- download the bills for a month in bulk
- easily find all tax records for a year, etc.

With those two functionalities (Search & Grouping) in mind we are building Vaultedge. Contextual search is something we are still building.

About us: I have spent my whole career (~16 years) building enterprise products for Storage, Data and Analytics; my last job was Architect for Backup products at NetApp. I'm a sole founder and backend developer now, but I'm in talks with another tech guy to join as co-founder. I also have an awesome frontend engineer on my team as an employee.

I'd love to answer your questions and comments and to hear your feedback.

10 comments

I have been a beta user of this product. Initially I didn't realise the power of this platform. The moment I saw my Form 16 going into a separate folder, I started realising its potential. More than the search, I found the categorisation useful. It helped me find documents that I thought were long gone.

If Sajeev can crack the categorisation better and improve its accuracy, there will come a day when people say "Life before and after Vaultedge".

Good luck.

Thanks Visakh for the kind words. We are fortunate to have beta users like you. Between the private beta and now, we have put most of our effort into improving the categorization. Hopefully you will like the public beta even better.
Hey, just watched the demo, looks very useful! Do you have mobile (iOS/Android) versions of the product too? Because I usually struggle to find relevant docs in my email on my phone when running errands, and Vaultedge would be perfect for that. :)
Thanks Vaibhaw, glad you find it useful. Our public beta launch in a couple of weeks will only have the web version, but the following release will have an Android app. We are already working on the UX for the app. Let us know if there is any particular use case you need solved in the mobile version. Thanks for the question. Happy to clarify if you need more details.
Thanks for replying so quickly! That's good to know, hope to see the Android version soon! :) My use case is exactly what the product does right now, just on a phone, so that I can pull up the relevant docs I need from my email/cloud on the move without doing a lot of searching around on a small screen.
Thank you.
Hi Sajeev, good that you are finally releasing the product. I've been waiting for a long time to try it.

Which cloud storage services will be supported at launch?

Thanks Ashwin for your support. We are launching with Gmail, Outlook, Dropbox and Google Drive support. We will be adding desktop/laptop support immediately after that. Let me know if you have any follow-up questions.
Hi Sajeev.

This seems an interesting concept. I am curious about the competition though. Are there startups who are working on a similar concept?

In addition, how do you plan to migrate people from well-established storage services such as Google Drive and Dropbox?

Hi Shanu,

That's a good question. Part of the problem is addressed by a few startups like meta.sc and mohiomap, all in their infancy like us. They both allow you to search for files across all your storage. But they don't attempt to categorize the data, nor do they allow data from offline devices to be kept within them.

We are complementary to Dropbox and Google Drive. Users can continue to keep their data wherever they are storing it now. We just make discovery easier.

Thanks, those are some great questions. Happy to answer any further questions you may have.

This sounds interesting. I've always had the problem of sorting through and finding my old documents. Would love to try it out. Can you tell me what the accuracy of your categorization is? Also, are you targeting any specific market to start with?
Glad you liked it. Vaultedge will categorize a document into "some" category 80% of the time. The remaining 20% of documents are put into a General category. Within the 80% of documents that are classified, we have seen that the classification is 60-80% accurate; the accuracy varies within that range based on the type of documents the user has. If the user has a large number of documents of a kind Vaultedge has not seen before, the accuracy is lower. But we have a continuous feedback system that learns from user actions. If the user changes a document's category, we learn from that and apply that categorization to new documents for that user.
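
As a quick sanity check on those figures, here is a back-of-the-envelope sketch (not Vaultedge code; the function name is made up for illustration). It assumes the 60-80% accuracy applies uniformly to the 80% of documents that receive a specific category, which would put the share of all documents landing in the correct specific category at roughly half to two-thirds:

```python
def overall_correct_rate(categorized_share: float, accuracy: float) -> float:
    """Share of ALL documents that end up in the correct specific category."""
    return categorized_share * accuracy


# Figures quoted in the comment above: 80% categorized, 60-80% accurate.
low = overall_correct_rate(0.80, 0.60)
high = overall_correct_rate(0.80, 0.80)
print(f"{low:.0%} to {high:.0%}")  # prints "48% to 64%"
```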

The initial target market is IT/Finance professionals who use multiple email accounts, cloud drives etc.

Thanks for the question. Happy to answer any follow-up questions you may have.

Hey, sounds like an interesting product! From the YouTube video I see that the files are separated into different categories. Is this a manual process, or does your product segregate them automatically for me?
Vaultedge uses a combination of machine learning and parsing to categorise documents. First-level ML identifies the file as belonging to a certain category, and then an ML training set focussed on that category gives us the sub-category. Sometimes we also use plain parsing to arrive at the category. But all of this is automated; there is no manual intervention involved. Thanks for your question. Please feel free to ask any follow-up questions.
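
To illustrate the two-stage idea described above, here is a minimal sketch. It is not Vaultedge's implementation: simple keyword counting stands in for the trained models, and the category names, keyword lists and function names are all assumptions made up for the example.

```python
# Hypothetical keyword lists standing in for trained ML models.
TOP_LEVEL = {
    "Bill": {"invoice", "amount due", "billing period"},
    "Tax": {"assessment year", "form 16"},
}
SUB_LEVEL = {
    "Bill": {
        "Cell phone": {"mobile", "data usage", "roaming"},
        "Cable": {"cable", "channel", "set-top"},
    },
}


def best_match(text, candidates):
    """Return the label with the most keyword hits, or None if nothing matches."""
    lowered = text.lower()
    scores = {
        label: sum(keyword in lowered for keyword in keywords)
        for label, keywords in candidates.items()
    }
    label = max(scores, key=scores.get, default=None)
    return label if label is not None and scores[label] > 0 else None


def categorize(text):
    """Stage 1 picks the category; stage 2 picks a sub-category within it."""
    category = best_match(text, TOP_LEVEL)
    if category is None:
        return ("General", None)  # fallback bucket for unrecognised documents
    sub_category = best_match(text, SUB_LEVEL.get(category, {}))
    return (category, sub_category)


print(categorize("Invoice: amount due for mobile data usage and roaming"))
```

The second stage only ever looks at sub-categories of the category chosen in the first stage, which is the point of the layered design: each stage has a smaller, more focussed decision to make.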
How do you anticipate users inputting the metadata necessary to make this work?

For example, how will the service know how to access my phone bill /\ my taxes /\ the brown cat playing Chopin on the piano video?

I'm not so much interested in the technical details as in the degree of effort an ordinary person with an iPhone would need to invest to make use of the service.

Good luck.

The user has to connect their services to Vaultedge. Services can be cloud storage like Dropbox and Google Drive, email like Gmail and Outlook, and offline storage like laptops and USB disks. Vaultedge periodically checks for new files (documents only, for now) in these services, classifies them using machine learning and builds a search index for them. Other than granting OAuth read access to these services, the user need not do anything.
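
The "periodically check for new files" step could look roughly like the sketch below. This is an illustration only: the extension list, the function name and the idea of tracking seen paths in a set are assumptions, and a real connector would fetch the file listing through each service's own API rather than take a plain list.

```python
# Assumed set of "document" extensions; the thread only says "docs only now".
DOC_EXTENSIONS = (".pdf", ".doc", ".docx", ".txt", ".html")


def find_new_documents(listing, seen):
    """Return document paths not seen in earlier runs, and remember them."""
    new_docs = []
    for path in listing:
        if path.lower().endswith(DOC_EXTENSIONS) and path not in seen:
            seen.add(path)
            new_docs.append(path)
    return new_docs


seen = set()
first_run = find_new_documents(["bill-march.pdf", "cat-piano.mp4"], seen)
second_run = find_new_documents(["bill-march.pdf", "bill-april.pdf"], seen)
print(first_run, second_run)
```

Each run hands only the newly discovered documents to classification and indexing, so files that were already processed are not touched again.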

Thanks for asking. Happy to answer any follow-up questions you may have.

Sorry for not being clear, I am curious regarding how my phone bill becomes searchable.
The short answer is that Vaultedge creates a search index based on the content of the bill and also knows, from its classification algorithm, that it is a cell phone bill. Put together, this allows the user to search for "last month's cell phone bill".

Long answer: Let's say your bill is in PDF or HTML format. To Vaultedge it is initially nothing but a document; it has no idea that it is a bill. Vaultedge extracts the contents of the document and tests them against our training data set using machine learning (ML). In this process, the document is identified as a "Bill". It is then run against a different training set, which identifies it as a "cell phone bill" or "cable bill". Then we do further analysis and extract info such as that this bill is "for the month of March 2016". All of this info is used to construct the search index, so you can now search for "March bill" or "last month's bill" to get to that bill. Please don't hesitate to ask if anything is unclear.
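
The last two steps (extracting the billing month and making "last month's bill" searchable) can be sketched as follows. This is a simplified illustration, not Vaultedge's code: the label is assumed to come from the classifier, the regular expression only handles the "Month YYYY" form, and all function names are hypothetical.

```python
import re
from datetime import date

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
PERIOD_RE = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{4})\b", re.IGNORECASE)


def extract_period(text):
    """Return (year, month) for the first 'Month YYYY' in the text, or None."""
    match = PERIOD_RE.search(text)
    if not match:
        return None
    return int(match.group(2)), MONTHS.index(match.group(1).lower()) + 1


def last_month(today):
    """Return (year, month) of the calendar month before `today`."""
    if today.month == 1:
        return today.year - 1, 12
    return today.year, today.month - 1


def index_document(index, doc_id, label, text):
    """Store the classifier's label and the extracted period for one document."""
    index[doc_id] = {"label": label.lower(), "period": extract_period(text)}


def find(index, label, period):
    """Return ids of documents matching both the label and the (year, month)."""
    return sorted(
        doc_id for doc_id, meta in index.items()
        if label.lower() in meta["label"] and meta["period"] == period
    )


index = {}
index_document(index, "a.pdf", "Cell phone bill", "Statement for the month of March 2016")
index_document(index, "b.pdf", "Cable bill", "Billing period: February 2016")
print(find(index, "cell phone bill", last_month(date(2016, 4, 10))))  # ['a.pdf']
```

A relative phrase like "last month" is resolved against the current date at query time, while the bill's period is extracted once at indexing time; a real index would also include the full text of the document.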

Hi Sajeev,

Seems very interesting. How do you compare with services like Spark from Readdle? Also, how long does the categorization take the first time?

Hi Vikas,

That's a good question. Spark is an inbox assistant that helps improve productivity on email tasks. Even though its categorization of emails makes it look like it is in the same space as Vaultedge, I think it is not. Vaultedge is a personal document organizer that helps you find your documents quickly, wherever they are. So we work across email, cloud storage, offline devices, etc. In that sense, the purposes of these two tools are different, at least as of now.

Thanks for the great question. Happy to answer any follow up questions.

I think it would be very risky to put so much information in a single place that can sift through all these details and summarize this information about you. What are the security measures that you will offer?
Vaultedge will not store the actual files, only metadata about them. We plan to encrypt the metadata and optionally let the user control the keys. We will also have two-factor authentication. Please let us know if you have any suggestions. Thanks for asking.
This is better. My suggestion is that the metadata not be linked to a complete profile.
I would like to understand your comment that "metadata is not linked to a profile" a little better. By "not linked to a profile", do you mean that there shouldn't be a way for a hacker to figure out a user's metadata? Or do you mean that metadata should be associated with individual accounts (say Dropbox), so that in case of a breach only the metadata for that account is compromised, and not for all of a user's accounts? Thanks for your time.
Can Vaultedge work offline? And if it is a fully online cloud platform, how do you update data from the user's device?
Though Vaultedge is hosted in the cloud, it can support offline data sitting on laptops or USB drives. When you connect an offline device to Vaultedge, Vaultedge asks you to install a client program. This client program handles categorization and search-index creation for the offline data. After the initial categorization and indexing, the client program periodically redoes these operations when the data changes. When the user clicks on such an offline file, Vaultedge can pull the file from the offline storage if the device is connected. Users also have the option of keeping a copy of such offline data in the Vaultedge cloud for faster access. Thanks for the question. Please let me know if you have further questions.
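
One way a client program might notice "changes to the data" between runs is to compare content hashes, as in the sketch below. This is an assumption for illustration, not a description of the actual Vaultedge client; the function names are hypothetical, and deleted files are not handled here.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file under root (by relative path) to a SHA-256 digest of its contents."""
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in Path(root).rglob("*")
        if path.is_file()
    }


def changed_files(previous, current):
    """Return paths that are new, or whose contents differ from the previous snapshot."""
    return sorted(
        path for path, digest in current.items()
        if previous.get(path) != digest
    )
```

The client would keep the snapshot from its last run, take a new one on each periodic pass, and re-categorize and re-index only the paths returned by `changed_files`.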