by liamca 4313 days ago
Hi ChuckMcM,

I'm a Program Manager on the Azure Search team. I am going to correct your numbers a bit. Even though you can have a maximum of 36 search units, the number of partitions you can create is (currently) 12. Partitions, by the way, are what you increase to raise the number of documents an index can hold. With this limit of 12 partitions, the maximum size of an index is actually 180M documents or 300 GB (not 900 GB as you stated). So far, we have found that the vast majority of customers we have been working with fit well below these limits, and in fact most of them fit into the 1-partition (15M documents / 25 GB) range.
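To make the arithmetic above concrete, here is a minimal sketch that derives the index limits from the per-partition figures quoted in this comment. The function and constant names are hypothetical and are not part of any Azure Search API; the numbers are the public-preview values stated above and may have changed since.

```python
# Per-partition figures as quoted in the comment above (public preview).
DOCS_PER_PARTITION = 15_000_000
GB_PER_PARTITION = 25
MAX_PARTITIONS = 12


def index_capacity(partitions: int) -> tuple[int, int]:
    """Return (max documents, max storage in GB) for a partition count.

    Hypothetical helper for illustration only.
    """
    if not 1 <= partitions <= MAX_PARTITIONS:
        raise ValueError(f"partitions must be between 1 and {MAX_PARTITIONS}")
    return partitions * DOCS_PER_PARTITION, partitions * GB_PER_PARTITION


if __name__ == "__main__":
    docs, gb = index_capacity(MAX_PARTITIONS)
    print(f"{MAX_PARTITIONS} partitions: {docs:,} documents, {gb} GB")
```

With 12 partitions this gives 180M documents and 300 GB, matching the limits stated above.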

For the very few customers we have talked to who need more than this, we can actually allocate a much larger system with much higher limits. We have an azuresearch_contact email address on the pricing page (http://azure.microsoft.com/en-us/pricing/details/search/) with more details if you need this.

To your other question about racks and search units: you can think of a search unit as a dedicated Azure VM for your usage. Each additional search unit you create is an additional VM for your use, and each VM has a certain amount of capacity that it can handle. If your needs grow beyond what you can get with a single search unit, you can move the dial up, either by increasing the replica count to add more QPS / high availability or by increasing partitions to add more documents / faster data ingestion. The number of search units you have is replicas x partitions, where each search unit (during public preview) is $125 US / month. By the way, a single replica can handle about 15 QPS, which for most customers is more than enough. But even with this, the ability to scale up and down is pretty important to a lot of people. Imagine Black Friday in the US, where a retailer gets hammered with searches yet only wants to allocate additional replicas for that day to handle the increased query load. There is a bit more information on this here: http://azure.microsoft.com/en-us/documentation/articles/sear...
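As a rough sketch of the sizing described above, the snippet below estimates replicas from a target query rate and then derives search units and monthly cost. The helper names are hypothetical, and it assumes QPS scales linearly with replica count, which is a simplification; the ~15 QPS figure is only an average and real throughput depends on the index and queries. The price is the public-preview figure quoted above.

```python
import math

# Figures quoted in the comment above (public preview).
PRICE_PER_UNIT_USD = 125   # per search unit, per month
QPS_PER_REPLICA = 15       # rough average, varies by workload
MAX_SEARCH_UNITS = 36
MAX_PARTITIONS = 12


def replicas_for_qps(target_qps: float) -> int:
    """Estimate replicas for a target QPS (assumes linear scaling)."""
    return max(1, math.ceil(target_qps / QPS_PER_REPLICA))


def search_units(replicas: int, partitions: int) -> int:
    """Search units = replicas x partitions, within the stated limits."""
    units = replicas * partitions
    if partitions > MAX_PARTITIONS or units > MAX_SEARCH_UNITS:
        raise ValueError("configuration exceeds the stated preview limits")
    return units


def monthly_cost_usd(replicas: int, partitions: int) -> int:
    """Monthly cost at the preview price for the given configuration."""
    return search_units(replicas, partitions) * PRICE_PER_UNIT_USD


if __name__ == "__main__":
    # Example: a peak day at ~60 QPS on a 2-partition index.
    peak_replicas = replicas_for_qps(60)
    print(peak_replicas, search_units(peak_replicas, 2),
          monthly_cost_usd(peak_replicas, 2))
```

Note that the cost here is a monthly rate; how a single day of extra replicas is billed is not stated in the thread.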

Hope that helps, Liam


It does help, Liam, thanks. I'm coming at this from a web search perspective. Checking our crawler, we have about 16M documents from Wikipedia indexed, which would presumably fit inside your single partition. The 'hot' crawl (things that change with a frequency <= 7 days) is a lot bigger than that though :-)

I'm guessing your target market is folks that want to corral their documents? (sort of like the Google appliance but in the cloud?) What is your privacy policy on that? (lawyers, for example, have a lot of documents but rarely put them in the cloud) And when you say 15 QPS, what is the SLA? Is that at the 50th percentile? 95th? 99th? I've noticed it seems to be hard to pin down in Elasticsearch.

ChuckMcM, you are absolutely right. Nailing down QPS rates is an incredibly tough thing, not just for Azure Search but for most search engines that I am aware of. Things like the number of facets and the complexity of queries all play a part in the QPS rate a search engine can serve. When we say ~15 QPS, we try to point out that this is based on an average of the indexes we have seen from our typical customers. Certainly some customers may see way more QPS on a single search unit and others will see less.

The main markets (or scenarios) that we target with Azure Search are eCommerce retail, user-generated content sites (such as a recipe site or Hacker News) and internal organizational apps. The interesting thing about internal organizational apps is that more and more users are finding that search is a natural way to navigate and explore their data. Thanks to engines like Google and Bing, users are typically far more comfortable using search to explore their data than they are with, say, SQL.

We actually don't have an official SLA yet for this preview. One of the goals of this public preview is to determine what we can realistically promise for our v1 release.

Yes, privacy is a thing for sure. It is interesting that you mention lawyers, because a number of companies in the legal field have wanted to use Azure Search. Indexing of case documents is quite popular from what I have seen. In many of these cases (and especially in healthcare), privacy, or more specifically encryption at rest, as well as compliance (such as HIPAA), often becomes critical. As of today we have neither: we don't have encryption at rest and we do not have HIPAA compliance for Azure Search. Of course, this will be a goal, and I guess we need to start somewhere. Encryption as it relates to search is actually going to be really hard to do properly, so that will be an interesting thing for the future.

By the way, Wikipedia is one of the datasets we often test our service with. Feel free to ping me, as we have a loader for the Wikipedia dataset that I could look into sharing with you if you would like to play with it and Azure Search. My email address is my YCombinator username + microsoft.com.

Liam