| HN Mirror



	by robjk 4715 days ago
	Our success rate for extracting reference information from papers in some categories, notably math and CS, is still pretty low. This means we can't place these papers very well, and their radius is not a good representation of their true number of citations. As a further consequence, these categories appear to lack structure when viewed from a distance. If we don't have references for a paper, we fall back to its keywords to place it, in which case it can certainly be misplaced. Improving the reference extraction means building a better database of the journals used in those fields, coupled with more robust regexes.
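
	To make the "journal database plus regex" idea concrete, here is a minimal sketch in Python. It is not the actual extraction code: the journal list, the function names, and the assumed citation layout `Journal Volume (Year) Page` are all hypothetical, chosen only to illustrate how a list of known journals can anchor a pattern, and how placement could fall back to keywords when no references are found.

```python
import re

# Hypothetical journal abbreviations; a real database would be far larger
# and would need per-field coverage (e.g. math and CS venues).
KNOWN_JOURNALS = [
    "Phys. Rev. D",
    "Phys. Rev. Lett.",
    "J. Math. Phys.",
    "Commun. Math. Phys.",
]


def build_reference_pattern(journals):
    """Compile a regex matching 'Journal Volume (Year) Page' for known journals."""
    # Try longer names first so a name that is a prefix of another cannot shadow it.
    ordered = sorted(journals, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    return re.compile(
        r"(?P<journal>" + alternatives + r")\s*"
        r"(?P<volume>\d+)\s*"
        r"\((?P<year>\d{4})\)\s*"
        r"(?P<page>\d+)"
    )


def extract_references(text, journals=KNOWN_JOURNALS):
    """Return one dict per citation found in text (assumed layout only)."""
    pattern = build_reference_pattern(journals)
    return [match.groupdict() for match in pattern.finditer(text)]


def placement_source(references, keywords):
    """Pick what to place a paper by: references if any, otherwise keywords."""
    if references:
        return "references"
    if keywords:
        return "keywords"
    return None
```

	With this sketch, a citation to a journal missing from the list is silently skipped, which is the failure mode described above: the paper ends up with fewer extracted references than it really has, and may be placed by keywords instead.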