Show HN: Got tired of LLMs refusing to visit URLs, so I built an open source analyzer CLI (github.com)
3 points by thetall0ne 178 days ago
Built this because I got tired of ChatGPT/Claude refusing to visit websites when doing research.

Crawls sitemaps, parses metadata files (robots.txt, humans.txt, llms.txt), detects tech stack with Wappalyzer, then generates summaries using either AWS Bedrock or a local Llama model via Ollama.
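
To make the sitemap and robots.txt step concrete, here is a minimal sketch using only the Python standard library. It is not the tool's actual code: the function names `sitemaps_from_robots` and `urls_from_sitemap` are hypothetical, and fetching the files over HTTP is left out so the parsing logic stands on its own.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol for <urlset> and <sitemapindex>.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs listed in 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "Key: value" on the first colon.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def urls_from_sitemap(sitemap_xml: str) -> list[str]:
    """Return every <loc> value found in a sitemap (or sitemap index) document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]
```

A crawler along these lines would fetch `/robots.txt` first, follow each sitemap URL it lists, and queue the resulting page URLs for analysis.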

Batch mode processes CSVs with checkpoint/resume - hit Ctrl+C anytime and pick up where you left off.
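
One way checkpoint/resume like this could work is sketched below. Everything here is an assumption for illustration: the CSV is assumed to have a `url` column, the checkpoint is a JSON list of finished URLs, and `process_batch`, `load_done`, and `save_done` are hypothetical names, not the CLI's real internals.

```python
import csv
import json
import os


def load_done(checkpoint_path):
    """Load the set of already-processed URLs, or an empty set on first run."""
    if not os.path.exists(checkpoint_path):
        return set()
    with open(checkpoint_path, encoding="utf-8") as f:
        return set(json.load(f))


def save_done(checkpoint_path, done):
    """Write the checkpoint via a temp file so an interrupt cannot truncate it."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)
    os.replace(tmp_path, checkpoint_path)


def process_batch(csv_path, checkpoint_path, handle_url):
    """Run handle_url on each unprocessed row; return the URLs finished this run."""
    done = load_done(checkpoint_path)
    finished = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            url = row["url"]  # assumed column name
            if url in done:
                continue  # completed in an earlier run
            try:
                handle_url(url)
            except KeyboardInterrupt:
                # Ctrl+C during a URL: stop here; it will be retried next run.
                break
            done.add(url)
            save_done(checkpoint_path, done)
            finished.append(url)
    return finished
```

Saving after every row costs a small write per URL, but it means an interrupt loses at most the one URL that was in progress.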

The local LLM option means zero API costs and your data never leaves your machine. Llama 3.2 3B works surprisingly well for this task.
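
For anyone curious what the local path looks like, here is a rough sketch of calling Ollama's `/api/generate` endpoint on its default port (11434) with the standard library. The model tag `llama3.2:3b`, the prompt wording, and the helper names `build_request` and `summarize` are assumptions for illustration, not taken from the project.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(page_text, model="llama3.2:3b"):
    """Build the POST request for a single, non-streaming completion."""
    payload = {
        "model": model,
        "prompt": "Summarize this website content in three sentences:\n\n" + page_text,
        "stream": False,  # ask for one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(page_text, model="llama3.2:3b", timeout=120):
    """Send the text to the local model and return the generated summary."""
    request = build_request(page_text, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `summarize(text)` requires Ollama to be running with the model already pulled; nothing leaves localhost.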

2 comments

Nice - right? How annoying is that. If a human member of the public can read the content, why can't an LLM? ChatGPT/Claude also (at least for me) don't consistently review everything I upload. Sometimes the review is complete, but most of the time (especially with a larger document, say a 100-page PDF or 15 Python scripts), I have to keep pushing them to go through everything.

Really annoying - thank you for this! Now, am I too lazy to apply it? That's the question.

lol! No problem. It's such an annoying problem. One time an LLM said "I can't directly access URLs" and I replied with "yes you can" and then it did it! WTF!?
A cool feature would be the ability to read WARC files from a previous crawl of the target site.

https://github.com/webrecorder/warcio

https://github.com/ArchiveTeam/grab-site
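
As a rough idea of what WARC input could look like, the sketch below uses warcio's `ArchiveIterator` to pull response records out of an existing archive. The function name `iter_warc_pages` is hypothetical, and warcio is imported lazily on the assumption that it would be an optional dependency (`pip install warcio`).

```python
def iter_warc_pages(warc_path):
    """Yield (url, body_bytes) for each HTTP response record in a WARC file."""
    # Imported here so the rest of a tool still loads without warcio installed.
    from warcio.archiveiterator import ArchiveIterator

    with open(warc_path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response":
                continue  # skip request, metadata, and warcinfo records
            url = record.rec_headers.get_header("WARC-Target-URI")
            # content_stream() gives the payload with HTTP encodings removed.
            body = record.content_stream().read()
            yield url, body
```

The yielded pairs could then go through the same summarization path as freshly crawled pages, without touching the live site again.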