| I've been tasked with creating an API to look up a customer based on a variety of fields. A non-exhaustive list of fields could look something like: - customer_id
- customer_first_name
- customer_last_name
- customer_loan_number
- customer_phone_number
- customer_email
- customer_address
- customer_date_of_birth
- customer_zip_code What I ultimately want is customer_id. But what I have is any combination of the subsequent fields. They might be all lowercase, they might not. There might be type-os. The phone number might have parentheses and dashes, it might not. The address and zip are also wonky - for example customer_zip_code may look like XXXXX-YYYY, or it may just be XXXXX. And example query would be: given customer_first_name and customer_last_name and customer_address, find the customer_id. The customer_last_name might be lowercase in the query and uppercase in the main database. The customer_address in the query might be "123 Main St" but in the database it might be "123 MAIN STREET". And the data is in the 100m-1000m rows size, so this has to run reasonably quickly with that in mind. Can't just do a "select * where [x = y] and [p = q] and [t = s]". Would like to pre-load the big table in RAM, and then somehow look up the matches. Because this almost definitely a common problem in our industry, I was wondering, is there some kind of good database already pre-optimized for this kind of fuzzy lookup? |
Deep dive on how it works: https://podcasts.apple.com/us/podcast/world-of-daas/id157064...