I have built something very similar to this in the past; here are a couple of observations:
Scraping:
- Dealer web sites are run by a handful of different data brokers. For the most part, if you find a good way to scrape one (say, a dealer who uses http://www.dealer.com/), then you can extend your scraper to get the others.
- Dealer web sites, in general, are horrible to view.
- Normalize the exterior color: create a distinct list of all the crazy color names car makers give their cars and all the shorthand dealers use for colors, build a map from those to standard colors, and apply it when scraping.
- Scraping is like farming: a lot of initial work, and then constant upkeep as sites change.
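A minimal sketch of that color map in Python, assuming scraped colors arrive as free-text strings. The alias table is a hypothetical, deliberately tiny sample (a few Ford paint names plus common dealer abbreviations); a real map would be built from the distinct values you actually see while scraping.

```python
import re

# Hypothetical alias table: raw manufacturer paint names and dealer
# shorthand -> canonical exterior color. Far from complete.
COLOR_MAP = {
    "tuxedo black": "Black", "blk": "Black", "black": "Black",
    "oxford white": "White", "wht": "White", "white": "White",
    "ingot silver": "Silver", "slv": "Silver", "silver": "Silver",
    "magnetic": "Gray", "gry": "Gray", "gray": "Gray", "grey": "Gray",
    "race red": "Red", "red": "Red",
    "blu": "Blue", "blue": "Blue",
}

# Try the most specific (longest) aliases first, so "tuxedo black"
# is matched before a bare "black".
_ALIASES = sorted(COLOR_MAP, key=len, reverse=True)


def normalize_color(raw: str) -> str:
    """Map a scraped exterior color string to a canonical color, or 'Other'."""
    # Lowercase, turn punctuation/digits into spaces, collapse whitespace.
    words = re.sub(r"[^a-z]+", " ", raw.lower()).split()
    padded = f" {' '.join(words)} "
    for alias in _ALIASES:
        # Whole-word match only, so "blu" does not fire inside "blue".
        if f" {alias} " in padded:
            return COLOR_MAP[alias]
    return "Other"
```

Unknown colors fall through to "Other", which doubles as a to-do list of strings to add to the map.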
Display:
- Users don't want to search by city so much as by what is close to them, so you should geocode the dealerships and display distance. For instance, if I search West Des Moines, I would expect inventory in Des Moines to come up too.
- Add searching by zip code; you can easily find a database of ZIP code centroids, which can also be a cheap way of geocoding the dealerships.
- Switch mileage to miles instead of km; most of the inventory appears to be in the US, and that is what users will expect.
- Use an IP-to-geo lookup to set the initial location of the search; right now it seems to be all over the place. Also check whether the browser supports geolocation and, optionally, set the initial search from that.
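Distance search from a ZIP centroid could look roughly like this: a haversine great-circle distance in miles plus a radius filter. The ZIP_CENTROIDS table is a hypothetical three-entry stand-in with rounded coordinates, and the dealers are "geocoded" by their ZIP centroid, the cheap approach suggested above.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

# Hypothetical ZIP-centroid table; coordinates are rounded approximations.
ZIP_CENTROIDS = {
    "50265": (41.57, -93.74),  # West Des Moines, IA
    "50309": (41.59, -93.62),  # Des Moines, IA
    "52240": (41.66, -91.53),  # Iowa City, IA
}


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def dealers_within(zip_code, radius_miles, dealers):
    """Return (dealer, distance) pairs within the radius of a ZIP centroid, nearest first."""
    lat, lon = ZIP_CENTROIDS[zip_code]
    hits = []
    for name, (dlat, dlon) in dealers.items():
        dist = haversine_miles(lat, lon, dlat, dlon)
        if dist <= radius_miles:
            hits.append((name, round(dist, 1)))
    return sorted(hits, key=lambda hit: hit[1])
```

With a Des Moines dealer and an Iowa City dealer, a 25-mile search from 50265 returns only the Des Moines one, about 6 miles away, which is the West Des Moines behavior described above. For a national site you would put a spatial index in front of this rather than scanning every dealer.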
Changing sites is not a problem, as our algorithms are generic and can target any website! We do not develop algorithms for each provider out there; that would not be feasible and, quite frankly, would be a waste of time on our part.
For geolocation searches, we auto-detect a user's location. I cannot guarantee its accuracy, as our geo data source is a free version. You can, however, easily increase the search radius in miles.
I built my own by scraping some of the free sites. You can get basic make and year from the plain VIN; then I scraped some of the free sites that have breakdowns of the various VINs to build a map.
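The "make and year off plain VIN" part can be sketched in Python as below. The check-digit weights and the 30-year model-year code cycle follow the standard North American 17-character VIN scheme as I understand it; the position-7 rule for picking the 2010+ cycle is a heuristic that applies to cars and light trucks, and WMI_MAKES is a tiny illustrative sample, not a real manufacturer table. Trim and options are not recoverable this way.

```python
# Letter values used by the VIN check-digit calculation (I, O, Q never appear).
TRANSLIT = {
    **dict(zip("ABCDEFGH", range(1, 9))),
    **dict(zip("JKLMN", range(1, 6))),
    "P": 7,
    "R": 9,
    **dict(zip("STUVWXYZ", range(2, 10))),
}
# Positional weights; position 9 (the check digit itself) has weight 0.
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]
# Model-year codes for position 10, starting at 1980 and repeating every 30 years.
YEAR_CODES = "ABCDEFGHJKLMNPRSTVWXY123456789"
# A few illustrative WMI prefixes (first three characters); a real table is far larger.
WMI_MAKES = {"1G1": "Chevrolet", "1FT": "Ford", "WAU": "Audi"}


def check_digit_ok(vin: str) -> bool:
    """Validate position 9 of a 17-character VIN."""
    vin = vin.upper()
    if len(vin) != 17:
        return False
    try:
        total = sum((int(c) if c.isdigit() else TRANSLIT[c]) * w for c, w in zip(vin, WEIGHTS))
    except (KeyError, ValueError):
        return False  # contains I, O, Q or a non-alphanumeric character
    remainder = total % 11
    return vin[8] == ("X" if remainder == 10 else str(remainder))


def model_year(vin: str) -> int:
    """Decode position 10. Heuristic: a letter in position 7 selects the
    2010-2039 cycle (cars and light trucks only, not every vehicle class)."""
    vin = vin.upper()
    year = 1980 + YEAR_CODES.index(vin[9])
    return year + 30 if vin[6].isalpha() else year


def make_from_wmi(vin: str):
    """Look up the make from the first three characters; None if unknown."""
    return WMI_MAKES.get(vin.upper()[:3])
```

Running the check digit first is a cheap way to throw out mistyped VINs scraped from dealer pages before trusting anything decoded from them.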
I think this is nice, and I already found a deal I haven't come across yet on a vehicle I'm interested in, so thanks!
A few initial thoughts:
- I can't click the radio buttons themselves within each options group; only the link/label itself is clickable
- I would rather the mileage filters be < 10k, < 20k, < 36k, etc. If I can't have this, I want to be able to select multiple mileage filters at once
- I want to select multiples of other filters too. Year for example. eBay allows me to enter "2009-2011" or "2009-". I can only view one year at a time on this site.
- It took me a minute to realize I had to select a make before a model (yes, I feel stupid for this); showing an empty stub where the options will appear once I have selected a make is not intuitive - how about not showing the not-yet-ready filters until they are relevant?
- Seeing the "Contact Dealer" red button directly below the phone number made me initially think I was going to be calling the dealer. I finally clicked it after seeing no other place to view the dealer's own listing. I'd make the link to the dealer's site a bit more prominent.
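An eBay-style year filter ("2009-2011", "2009-", or a single year) is only a little parsing; here is a hypothetical Python sketch where None means an open end. Allowing a list of such ranges also covers the "select multiple years" request.

```python
from typing import Optional, Tuple

YearRange = Tuple[Optional[int], Optional[int]]


def parse_year_range(text: str) -> YearRange:
    """Parse '2010', '2009-2011', '2009-' or '-2011' into (low, high); None = open end."""
    text = text.strip()
    if "-" not in text:
        year = int(text)
        return year, year
    low_s, high_s = (part.strip() for part in text.split("-", 1))
    low = int(low_s) if low_s else None
    high = int(high_s) if high_s else None
    if low is not None and high is not None and low > high:
        low, high = high, low  # tolerate "2011-2009"
    return low, high


def year_matches(year: int, rng: YearRange) -> bool:
    low, high = rng
    return (low is None or year >= low) and (high is None or year <= high)
```

Multiple selections then become `any(year_matches(car_year, r) for r in selected_ranges)`.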
A few thoughts:
- I think CarFax has an affiliate program that might let you link through to their reports, keyed by VIN. These reports are a pretty valuable tool for shoppers, as the condition of a used car drives 50% of the buying decision.
- One data point that is really important to dealers is how long the car has been on the lot. Hypothetically you could just track how long the same car has appeared at the same dealership. The longer the car has been sitting there, the more motivated they'll be to sell it. I remember hearing 30 days is a long time for dealerships to sit on a car. For buyers this could be good information to have.
- Getting the mobile experience right would be a huge win. So often the cars a dealer lists on its site aren't actually on the lot (I'm not sure how much of this is intentional and how much is a matter of how fast inventory turns over). So when you get to the dealership, you want to be able to comparison shop quickly and easily; get that right and you cover people in that critical, uncomfortable moment.
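Tracking time on the lot, as suggested above, only needs the first and last date each (dealer, VIN) pair shows up in a scrape. A minimal Python sketch, assuming regular scrapes and that the VIN is a stable key; the 30-day threshold is just the rule of thumb mentioned above, and a car that disappears and is relisted later is not handled.

```python
from datetime import date


class LotTracker:
    """Track when each (dealer, VIN) pair was first and last seen in a scrape."""

    def __init__(self):
        self.first_seen = {}  # (dealer_id, vin) -> date of first sighting
        self.last_seen = {}   # (dealer_id, vin) -> date of latest sighting

    def record_scrape(self, dealer_id, vins, scraped_on: date):
        for vin in vins:
            key = (dealer_id, vin)
            self.first_seen.setdefault(key, scraped_on)
            self.last_seen[key] = scraped_on

    def days_on_lot(self, dealer_id, vin) -> int:
        key = (dealer_id, vin)
        return (self.last_seen[key] - self.first_seen[key]).days

    def long_sitters(self, threshold_days=30):
        """(dealer, VIN) pairs seen at the same dealer for at least threshold_days."""
        return sorted(k for k in self.first_seen if self.days_on_lot(*k) >= threshold_days)
```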
What about trim levels? Trim levels are crucial in the buying decision. For example, if you search for a Ford F-150 on your site, there is no way to see if it's an XLT, King Ranch, Raptor, etc. The price and equipment difference in those vehicles is huge.
What about options and equipment? Does the car have navigation? Sunroof? Most consumers have specific option packages in mind when searching for a car online.
This is why VIN explosion is necessary for any serious automotive shopping site. If consumers can't narrow the vehicles down to a trim and option-package level, the site won't get wide adoption.
Trucks are certainly the most difficult to decode. But if you are using something like ChromeData for the VIN data, and combine that with the info from the dealer's site then you can usually narrow the vehicle down to a specific trim level.
Not always, however. Dealers frequently have incorrect or missing information on their websites, so it's garbage in, garbage out.
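Combining the two sources might look like this rough Python sketch: take the trims the VIN decode could not tell apart (the candidate list is an assumed input; the decoder itself is not shown) and keep the ones the dealer's listing text mentions. Whole-word matching matters here, since "XL" is a substring of "XLT"; when the dealer text is missing or wrong, the sketch simply leaves the candidates un-narrowed.

```python
import re


def narrow_trims(candidates, dealer_text):
    """Keep VIN-decoded trim candidates that the dealer's listing text mentions.

    candidates: trims the VIN decode could not distinguish (assumed input).
    If the dealer text names none of them, nothing is narrowed.
    """
    text = dealer_text.lower()
    hits = [t for t in candidates if re.search(rf"\b{re.escape(t.lower())}\b", text)]
    return hits or list(candidates)
```

For an F-150 with candidates XL, XLT, Lariat, King Ranch and Raptor, a listing titled "F-150 XLT SuperCrew" narrows to XLT, while "Clean truck, runs great!" leaves all five.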
This is why scraping dealer websites for data is always going to be problematic. Far better to work with the providers to have them send you the data. It's faster, easier and you get far better data.
I used to work for a competitor in the same space as AutoRevo =) Chrome was OK, but there was another provider that had exact VIN matches in their catalog. It was a bit more expensive, but it made the trim field a non-issue. I don't remember the name; it's been too long.
It might have been AutoData, but they have merged with Chrome. Edmunds has a decoder, but it's pretty laughable. I'm not aware of any other major players in that field outside of those.
Chrome offers 1-to-1 matches from VIN to style IDs for most OEMs, but at an additional cost.
I'd suggest a map that shows which dealer a car is at and where it is located. There's a subset of car buyers for whom the car itself is less important than the garage they bought it from, in case they need to bring it back for repairs, tuning, etc. This is especially true of used cars, and very important on mobile.
For folks who lease cars (a poor financial decision long-term, but lower monthly payments), you need to take the car back to the dealership you leased it from at regular intervals (every 3-10K miles).
Overall, impressive data, but UI/UX could be improved.
'Refine By' menu shows an arrow on mouseover, but the user can
only click on the words, not the arrow or the highlighted area.
'Refine By' pop-out menus show a square, value as hyperlink, count.
Square looks like a checkbox (with rounded corners)
but clicking on square produces no result (this user
expected a checkbox response, & ability to select multiple
models, colors, etc.)
Change of other filters does not require a 'go' button,
but change of search radius does.
Results appear to have a wide radius, but pull-down
for location does not show what default/pre-selected radius is
(by experimentation, 10 miles).
Generally, the user should be able to tell what filters are currently active,
and what their values are.
Clearing the Price filter appeared to clear all filters.
US states and placenames with multi-word names should
have all elements capitalized:
s/New mexico/New Mexico/
s/San luis obispo/San Luis Obispo/
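A small Python helper for the capitalization fix. It deliberately avoids `str.title()`, which capitalizes after apostrophes ("Martha'S Vineyard"); it also handles hyphenated names. It still lowercases internal capitals, so names like "McAllen" would need an exception list.

```python
def title_place(name: str) -> str:
    """Capitalize every word, and each hyphenated part, of a place name."""

    def cap(part: str) -> str:
        return part[:1].upper() + part[1:].lower()

    return " ".join("-".join(cap(p) for p in word.split("-")) for word in name.split())
```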
Price filter should allow either end-point to be absent.
Mileage should allow arbitrary range end-points, like price.
Year should allow a range, or checkboxes.
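One way to get open-ended price and mileage filters is a range type where either end may be absent. This Python sketch is hypothetical (the listing field names `price` and `miles` are assumed); an empty range filters nothing, so a missing endpoint can never zero out the results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Range:
    """Inclusive numeric range; None on either end means unbounded."""
    low: Optional[float] = None
    high: Optional[float] = None

    def contains(self, value: float) -> bool:
        return (self.low is None or value >= self.low) and (
            self.high is None or value <= self.high
        )


def filter_listings(listings, price=Range(), mileage=Range()):
    """Apply price and mileage ranges; an empty Range() lets everything through."""
    return [
        car for car in listings
        if price.contains(car["price"]) and mileage.contains(car["miles"])
    ]
```

Making `Range` frozen keeps the shared default arguments safe, since they can never be mutated.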
Consider having Refine by Make hide the less popular makes behind a 'more'
button. So, by default display the top N makes & 'more'; 'more' displays the
top N*2 (or all) makes & 'more', then top N*4 or all, top N*8 or all, etc.
(otherwise the menu may grow to many dozens of obscure makes).
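The doubling 'more' behavior is simple to express; a hypothetical Python sketch, assuming makes are ranked by listing count (ties broken alphabetically):

```python
def visible_makes(make_counts, n, more_clicks):
    """Return (makes_to_show, show_more_button) for the Refine by Make menu.

    make_counts: {make: listing_count}. Each 'more' click doubles the
    number shown: N, N*2, N*4, N*8, ... capped at the full list.
    """
    ranked = sorted(make_counts, key=lambda m: (-make_counts[m], m))
    limit = n * 2 ** more_clicks
    return ranked[:limit], limit < len(ranked)
```

With ten makes and N = 3, the menu shows 3, then 6, then all 10, and the 'more' button disappears once everything is visible.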
Support 'open in a new tab' on the hyperlinks shown below the first search box.
Searching <color> <make> <letter> displayed 3 links with counts, but then displayed
nothing - sigh. (perhaps the back-end function is not yet implemented?)
A suggestion, also a source of competitive advantage: Allow selecting more than one Make per search. Seems nobody does this. I would like to see all SUVs except those by the big North American manufacturers. To do this I need to execute many separate searches.
I really wish people would test their sites before doing this. I can understand if someone else submits your stuff and you had no idea it was going to happen (but even then...), but if you create an account with no history for the explicit purpose of submitting to HN as the OP did, you should at least test it under some semblance of load.
I really wish people would test their sites before doing this.
I'm conflicted; whilst ultimately you're right, it's very easy to type those words, and not as easy to test for a realistic load.
It's not as simple as throwing ab at your website, you need to use a proper tool (e.g. JMeter), make sure you're testing realistic user behaviour (even basics such as whether images and CSS have an impact), and ensure that you're not getting a false sense of security (e.g. how many connections are actually hitting the server at the same time?).
So yeah, whilst I kinda agree with you, I think it's a lot easier to say than to carry out.
I totally agree, and if I tried to do it I'd probably botch it myself. I wouldn't even know where to start, as I've never built anything that needs to be under this kind of load where the deployment isn't handled by people better at it than I am.
Amplified log playback works best in my experience.
Of course, this requires at least some public traffic to play back. I've only used proprietary tools for this personally, but I believe JMeter has this functionality (log sampling).
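To make "amplified log playback" concrete, here is a hypothetical Python sketch of the scheduling half only: parse Common Log Format lines, compress the timeline by a speedup factor, and multiply each request. Actually firing the HTTP requests at those offsets (or using JMeter's log sampling instead) is left out; replaying only GETs by default is a design choice so form posts are not re-submitted.

```python
import re
from datetime import datetime

# Common Log Format, e.g.
# 1.2.3.4 - - [10/Oct/2013:13:55:36 +0000] "GET /cars HTTP/1.1" 200 512
CLF = re.compile(r'\S+ \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*"')


def parse_log(lines):
    """Yield (timestamp, method, path) for each parsable access-log line."""
    for line in lines:
        m = CLF.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")
            yield ts, m.group(2), m.group(3)


def amplified_schedule(entries, speedup=1.0, copies=1, methods=("GET",)):
    """Turn logged requests into (offset_seconds, method, path) replay events.

    speedup > 1 compresses the original timeline; copies > 1 sends each
    request several times at the same offset.
    """
    kept = [e for e in entries if e[1] in methods]
    if not kept:
        return []
    start = min(ts for ts, _, _ in kept)
    events = [
        ((ts - start).total_seconds() / speedup, method, path)
        for ts, method, path in kept
        for _ in range(copies)
    ]
    return sorted(events, key=lambda ev: ev[0])
```

Replaying at, say, speedup=2 and copies=3 approximates six times the recorded request rate while keeping the real mix of URLs, which is what makes this more realistic than hammering one page with ab.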
I understand what you mean, but we actually tested the site several times before posting it on HN. We are working hard at this and will be back shortly. Thanks.
Funny, I made something very similar when I was looking to change my car two years ago or so, but didn't open-source it. The UI was awful, but it sent email notifications when a search query matched.
Seems very useful (I'm looking for a car right now), but the price-range filter seems to return zero results regardless of the values entered.
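Search-match notifications like that can be sketched in Python as saved searches checked against each batch of newly scraped listings. The criteria fields are hypothetical, and actually sending the email is out of scope; the point is remembering which VINs each user has already been told about so they are not emailed twice.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SavedSearch:
    """A stored query plus the VINs this user has already been emailed about."""
    email: str
    make: Optional[str] = None
    max_price: Optional[int] = None
    notified_vins: set = field(default_factory=set)

    def matches(self, listing: dict) -> bool:
        if self.make and listing["make"].lower() != self.make.lower():
            return False
        if self.max_price is not None and listing["price"] > self.max_price:
            return False
        return True


def pending_notifications(searches, new_listings):
    """Return (email, vin) pairs for first-time matches; sending mail is out of scope."""
    to_send = []
    for search in searches:
        for listing in new_listings:
            if listing["vin"] not in search.notified_vins and search.matches(listing):
                search.notified_vins.add(listing["vin"])
                to_send.append((search.email, listing["vin"]))
    return to_send
```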
I would guess they did so in the same way that google asks every website operator before crawling and caching. (I.e., I suspect they didn't come to any explicit agreement. If google doesn't, why should they need to?)
This is probably the wrong attitude towards founding startups. In general, you shouldn't unnecessarily risk the business -- but if people are throwing roadblocks in your way, lots of startups seem to generally do pretty well when they play fast-and-loose with rules. The logic is that nobody's going to bother to sue you until you get big and can defend yourself. Obviously taking a big risk like this isn't ideal, but you shouldn't let it stop you from moving forward with a business.
“Listings are currently sourced from several dealership websites by means of crawling and extracting relevant content available on the host application. If you are a dealer wishing to list and/or promote your inventory on Demanjo, we can help you drive qualified, local shoppers to your dealership.”
and “The selection and placement of listings on this page, except featured listings, were determined automatically by a computer program. For premium placement, please contact us.”
Yep, and having worked on sites for a big Audi dealer in the past, I know they would not be happy giving their inventory to scrapers as opposed to getting the lead directly.
BTW, for reference, in the UK a lead for our Audi B2C site was worth around £60.
Republishing other people's content sounds dodgy from a Google perspective, though I know that Google looked at doing a niche car product (like they have with hotels, etc.), so this might not be a viable long-term business.
Do you know of any good articles demonstrating the repercussions of violating a ToS vs. violating a copyright?
I'm guessing violating a copyright is more likely to result in aggressive legal action whereas a ToS violation would just get you banned from the service or sent some sort of cease and desist.
Yeah, it's a bit of a grey area. I suspect that the big players don't want to be the first to start legal proceedings; they want someone else to pull the trigger.
I used to work for Reed Elsevier, and there was rampant scraping and plagiarizing going on, usually to create crappy MFA sites or to insert middlemen (offering no social value) into the job board market.
I could possibly see EU-based recruitment companies going after Indeed, maybe if StepStone are up for a fight.
Matt Cutts, what's the deal with allowing Indeed's search results into your index? I thought you did not like other search engines' results in Google's index.
Noticed! Some dealerships operate under several distinct domain names, which creates these duplicates. We are currently working on a solution for this.
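One possible approach, assuming each scraped listing carries a VIN: collapse listings that share a VIN and remember every domain the car appeared on. This is a hypothetical Python sketch, not the site's actual fix; listings without a VIN are passed through, since they can't be matched this way.

```python
def dedupe_by_vin(listings):
    """Collapse listings that share a VIN, remembering every domain each appeared on."""
    merged = {}
    no_vin = []
    for listing in listings:
        vin = (listing.get("vin") or "").strip().upper()
        if not vin:
            no_vin.append(listing)  # nothing to match on; keep as-is
        elif vin in merged:
            merged[vin]["domains"].add(listing["domain"])
        else:
            # First sighting wins; later copies only contribute their domain.
            merged[vin] = {**listing, "vin": vin, "domains": {listing["domain"]}}
    return list(merged.values()) + no_vin
```

The collected `domains` set is also a hint that two domains belong to the same dealership, which could feed back into grouping dealers.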
Scraping:
- Learn to love VIN explosion/decoding (http://www.researchmaniacs.com/VIN/VIN-Decoder.html). Dealers enter features in so many different ways that it is your best chance to normalize the data.