Show HN: Which LLM Finds Obscure Knife-Brand URLs Cheapest? (8-Model Benchmark)
(new.knife.day)
2 points by p-s-v 375 days ago
Hi HN, I’m building *new.knife.day* (https://new.knife.day), a crowdsourced
database of every cutlery maker, from Al Mar to brands so small they barely
show up on Google. That means I need an automated way to fetch each brand’s
official website, even for fringe names like “Actilam” or “Aiorosu Knives”.
So I threw the task at eight web-enabled LLMs via OpenRouter:

• gpt-4o and gpt-4o-mini
• claude-sonnet-4
• gemini-2.5-pro and gemini-2.0-flash
• llama-3.1-70b
• qwen-2.5-72b
• perplexity sonar-deep-research
Prompt: Return *only* JSON { brand, official_url, confidence }
Data set: 10 obscure knife brands
Scoring: exact domain = correct; “no official site” (with reason) = correct
Costs: OpenRouter prices on 31 May 2025 (Perplexity billed separately)

Highlights
----------
• Perplexity hit 10/10 but cost $9.42 (860k tokens!).
• GPT-4o-mini & Llama-3.1-70B got 9/10 for ~2¢ per correct URL.
• Gemini Flash managed 7/10 for $0.001 total—great if you can QA the misses.
• Half of Gemini 2.5 Pro’s replies were HTML tables my parser rejected.
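
The scoring rule from the setup ("exact domain = correct; 'no official site' (with reason) = correct") and the per-correct-URL cost figures above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the harness from the post. The `reason` key, the stripping of a leading `www.`, and the helper names `normalize_domain`, `is_correct` and `cost_per_correct` are all hypothetical. It also assumes each reply has already been parsed into a dict.

```python
from typing import Optional
from urllib.parse import urlparse


def normalize_domain(url: str) -> str:
    """Lowercase host of a URL with a leading 'www.' removed (assumed normalization)."""
    # urlparse only fills in .hostname when a scheme is present, so add one if missing.
    if "://" not in url:
        url = "https://" + url
    host = urlparse(url).hostname or ""
    return host.removeprefix("www.")


def is_correct(answer: dict, expected_domain: Optional[str]) -> bool:
    """Score one parsed model answer against the hand-checked domain.

    expected_domain is None when the brand has no official site.
    """
    url = answer.get("official_url")
    if expected_domain is None:
        # A "no official site" answer counts only if it comes with a reason.
        # 'reason' is a hypothetical key; the original prompt lists only
        # brand, official_url and confidence.
        return not url and bool(answer.get("reason"))
    if not url:
        return False
    return normalize_domain(url) == normalize_domain(expected_domain)


def cost_per_correct(total_cost_usd: float, n_correct: int) -> float:
    """Average spend per correct URL; infinite if the model got nothing right."""
    return total_cost_usd / n_correct if n_correct else float("inf")


# Perplexity's reported run: $9.42 for 10/10 correct.
print(round(cost_per_correct(9.42, 10), 3))  # 0.942
```

Stripping `www.` is a judgment call. A stricter reading of "exact domain" would compare hosts as-is. Either way, subdomains such as `shop.` still count as misses here.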
Full table, code, and raw logs are in the post (and on GitHub).

Takeaways
----------
1. 90% accuracy + quick human review often beats 100% accuracy that costs 45× more.
2. Structured output is part of model quality—validate JSON on arrival.
3. Promo pricing moves fast; always ping the price API before large runs.
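
Takeaway 2 in practice means parsing and checking each reply before it reaches the database. That way, a model that answers with an HTML table or prose is counted as a miss instead of corrupting the results. Below is a minimal standard-library sketch. It assumes `confidence` is a number between 0 and 1, since the post does not say what scale the prompt asked for. `parse_reply` is a hypothetical helper name.

```python
import json
from typing import Any, Optional

# Keys from the prompt: Return *only* JSON { brand, official_url, confidence }
REQUIRED_KEYS = {"brand", "official_url", "confidence"}


def parse_reply(raw: str) -> Optional[dict[str, Any]]:
    """Return the parsed answer, or None if the reply breaks the JSON contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # HTML tables, prose, truncated output, ...
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    if not isinstance(data["brand"], str):
        return None
    # official_url may be null when the model reports that no official site exists.
    if data["official_url"] is not None and not isinstance(data["official_url"], str):
        return None
    conf = data["confidence"]
    # bool is a subclass of int in Python, so reject true/false explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        return None
    if not 0 <= conf <= 1:  # assumed 0..1 scale; also rejects NaN
        return None
    return data


print(parse_reply("<table><tr><td>Example Knives</td></tr></table>"))  # None
```

Returning `None` rather than raising keeps the benchmark loop simple: a malformed reply is just an incorrect answer for that model.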
Next step: wire GPT-4o-mini into *new.knife.day* so visitors get verified
manufacturer links. Crawling ~250 brands now costs under $5.

Curious what you’d improve, and which model you’d bet on for similar
“find the canonical URL” tasks. AMA on the setup, prompts, or results!