by gsharma 1017 days ago
This is very helpful! I see that a GitHub gist or file isn't summarized; the summary just mentions that it is code. What is the input to the summarization? Do you use the entire HTML (header, footer, sidebar, etc.) for summarization, or do you do any processing between crawling and summarization?

Thank you! Currently, I pull the body of the page. Looks like this does not handle GitHub repos and Gists correctly. Will investigate further.
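For anyone curious what "pull the body of the page" can look like, here is a minimal sketch using only Python's standard-library `html.parser`. It is not the author's code. The names `BodyTextExtractor`, `extract_body_text`, and `SKIP_TAGS` are hypothetical, and skipping `header`/`footer`/`nav`/`aside` is an assumption that speaks to the question above rather than something the reply says it does.

```python
from html.parser import HTMLParser

# Hypothetical list of tags treated as page chrome or non-content (an assumption,
# not something the thread says the summarizer does).
SKIP_TAGS = {"script", "style", "header", "footer", "nav", "aside"}


class BodyTextExtractor(HTMLParser):
    """Collects visible text inside <body>, skipping any SKIP_TAGS subtree."""

    def __init__(self):
        super().__init__()
        self.in_body = False   # True while we are between <body> and </body>
        self.skip_depth = 0    # how many SKIP_TAGS elements we are nested inside
        self.chunks = []       # non-empty text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body and tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif self.in_body and tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when inside <body> and outside every skipped element.
        if self.in_body and self.skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def extract_body_text(html: str) -> str:
    """Return the body's visible text joined by single spaces (hypothetical helper)."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = (
        "<html><head><title>T</title></head><body>"
        "<header>Site nav</header><p>Hello <b>world</b></p>"
        "<script>var x = 1;</script><footer>Copyright</footer>"
        "</body></html>"
    )
    print(extract_body_text(page))  # Hello world
```

Counting nesting depth rather than using a boolean keeps the skip correct when one chrome element sits inside another (for example a `nav` inside a `header`). Whether dropping these tags helps in practice depends on how each site marks up its layout.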