| HN Mirror



	by its_down_again 592 days ago
	Screenshots aren't as accurate or context-rich as HTML, but they let you bypass the hassle of building logic for permissions and authentication across different apps to pull in text content for the LLM.


EGreg 592 days ago

Can’t you just make a browser extension to have access to the HTML and CSS, and use LLMs from that?

maggreenWAI 592 days ago

Context length + API cost are right now the main bottleneck for huge HTML + CSS files. The extraction here is already quite efficient, but still: with past messages + system prompt + sometimes extracted text + extracted interactive elements, you are quickly at around 2500 tokens (about $0.01 for gpt-4o).

If you extract the entire HTML and CSS, your cost + inference time quickly go up 10x.
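
To make the arithmetic concrete, here is a rough sketch. The rate of $4 per million input tokens is not a quoted price; it is simply what the figures above imply (2500 tokens for about $0.01), and `prompt_cost` is a hypothetical helper name.

```python
# Assumed rate, back-derived from the comment: $0.01 / 2500 tokens = $4 per 1M tokens.
ASSUMED_USD_PER_MILLION_TOKENS = 4.0

def prompt_cost(tokens, usd_per_million=ASSUMED_USD_PER_MILLION_TOKENS):
    """Estimate the input cost in USD for a prompt of `tokens` tokens."""
    return tokens * usd_per_million / 1_000_000

compact_prompt = 2500              # messages + system prompt + extracted elements
full_page = compact_prompt * 10    # the "10x" case: whole HTML + CSS

print(f"compact: ${prompt_cost(compact_prompt):.3f}")
print(f"full page: ${prompt_cost(full_page):.3f}")
```

Inference time is not modeled here; the comment only says it grows along with the token count.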

EGreg 592 days ago

Aren't screenshots far larger than this?

gregpr07 592 days ago

Nope: a 1280x1024 screenshot at low resolution with gpt-4o is 85 tokens, so approx. $0.0002 (so 100x cheaper). For high resolution it's approx. $0.002. https://openai.com/api/pricing/
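
For anyone who wants to check these numbers, here is a minimal sketch of the tile-based token count as I understand it from OpenAI's vision documentation for gpt-4o at the time: a flat 85 tokens for low detail, and for high detail 85 base tokens plus 170 per 512px tile after downscaling. The $2.50 per million input tokens is an assumption chosen because it roughly matches the figures above, and `image_tokens` is a hypothetical helper, not an official API.

```python
import math

ASSUMED_USD_PER_MILLION_TOKENS = 2.50  # assumption, roughly matches 85 tokens ~ $0.0002

def image_tokens(width, height, detail="high"):
    """Approximate gpt-4o image token count (rule as recalled from OpenAI's docs)."""
    if detail == "low":
        return 85  # flat cost regardless of image size
    # Step 1: downscale to fit inside a 2048x2048 square (never upscale).
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # Step 2: downscale so the shortest side is at most 768px.
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # Step 3: count 512px tiles; 170 tokens per tile plus 85 base tokens.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles

for detail in ("low", "high"):
    tokens = image_tokens(1280, 1024, detail)
    cost = tokens * ASSUMED_USD_PER_MILLION_TOKENS / 1_000_000
    print(f"{detail}: {tokens} tokens, about ${cost:.4f}")
```

Under these assumptions a 1280x1024 screenshot scales to 960x768, which is 4 tiles, so 765 tokens at high detail versus 85 at low detail.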

stuckkeys 591 days ago

Yeah. I noticed a very low cost when I ran it in a VM with a predefined resolution. Good tip.

e-clinton 592 days ago

I do this for my extension [0], but the HTML is often too large for context window sizes. I end up scraping the relevant pieces before sending them to the LLM.
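
As an illustration of that kind of pre-filtering (not the extension's actual code, which isn't shown here), a minimal sketch using only Python's standard library could drop scripts and styles and keep the visible text, capped at a character budget. The class name, the skip list, and the `max_chars` default are all assumptions.

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping content that is never shown to the user."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)

def extract_text(html, max_chars=4000):
    """Return visible text from `html`, truncated to `max_chars` characters."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)[:max_chars]

page = ("<html><head><style>p{color:red}</style><script>var x=1;</script></head>"
        "<body><h1>Title</h1><p>Hello <b>world</b></p></body></html>")
print(extract_text(page))
```

A real extension would likely be more selective, for example targeting specific elements on the page, but the idea of shrinking the input before it reaches the LLM is the same.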

[0] https://chromewebstore.google.com/detail/namebrand-check-for...