by jnnnthnn
775 days ago
I'll write up a little blog post once the traffic dies down a bit! In the meantime, one thing that comes to mind is that simply embedding the whole contents of each webpage after scraping it didn't yield very good search results. For example, an article about Python might mention Python by name only once. I found that trimming extraneous strings (e.g. menus, share links), then extracting key themes and embedding those directly, yielded much, much better results.
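A minimal sketch of that trim-then-extract idea in Python, using only the standard library. Everything here is an assumption for illustration: the boilerplate tag list, the share phrases, the stopwords, and the function names (`clean_text`, `extract_themes`, `text_to_embed`) are hypothetical. `extract_themes` is a crude word-frequency stand-in, not the author's method, and the actual embedding model call is left out.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Assumption: these tags hold page chrome (menus, headers, footers), not article text.
BOILERPLATE_TAGS = {"nav", "header", "footer", "aside", "menu", "script", "style"}
# Assumption: text nodes containing these phrases are share widgets.
SHARE_PHRASES = ("share on", "share this", "tweet this")
# Small illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "and", "for", "with", "that", "this", "are", "was", "from", "into"}


class ContentExtractor(HTMLParser):
    """Collects visible text, skipping anything nested inside boilerplate tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a boilerplate element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth > 0:
            return
        if any(phrase in text.lower() for phrase in SHARE_PHRASES):
            return  # drop share-link text
        self.chunks.append(text)


def clean_text(html):
    """Return the page text with menus, share links, etc. trimmed away."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def extract_themes(text, n=5):
    """Hypothetical stand-in for theme extraction: the n most frequent non-stopword terms."""
    words = re.findall(r"[a-z][a-z0-9+#]*", text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]


def text_to_embed(html, n=5):
    """Build the short string that would be passed to an embedding model."""
    return ", ".join(extract_themes(clean_text(html), n))


# Example page: the nav, share link, and footer should all be trimmed.
SAMPLE_PAGE = """<html><body><nav>Home | Blog | Menu</nav>
<article><h1>Decorators</h1>
<p>Decorators wrap functions. A decorator takes a function and returns a function.</p>
<p>Share on Twitter</p></article>
<footer>Copyright 2023</footer></body></html>"""

if __name__ == "__main__":
    print(text_to_embed(SAMPLE_PAGE, n=3))
```

Word frequency would not rescue the "mentions Python only once" case, so in practice `extract_themes` would likely be swapped for something smarter (a summarizing model, for instance); the point of the sketch is the shape of the pipeline, where only the cleaned, condensed themes get embedded rather than the raw page.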