Hacker News
by josefcullhed 1558 days ago
Hi,

Yes, our documentation is probably pretty confusing. It works like this: the base score for every URL on a given domain is that domain's harmonic centrality (HC). We have two indexes, one of URLs and one of links (we index the link text). When searching, we first query the links index, then the URLs index. We then update the scores of the URLs based on the matching links with this formula:

domain_score = expm1(5 * link.m_score) + 0.1;
url_score = expm1(10 * link.m_score) + 0.1;

Then we add the domain and URL scores to url.m_score, where link.m_score is the HC of the link's source domain.
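The score update described above can be sketched roughly as follows. This is an illustration, not alexandria's code: the `Url` and `Link` structs, their `domain`/`url`/`target_*` fields, and the functions `domain_boost`, `url_boost` and `apply_link_scores` are hypothetical names. The rule that the domain boost goes to every URL on the linked domain while the URL boost goes only to the exact linked URL is also an assumption, since the comment only says both are added to `url.m_score`.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical record types. Only m_score appears in the comment; the
// other fields are assumptions made for this sketch.
struct Url {
    std::string domain;  // domain the URL belongs to
    std::string url;     // full URL
    double m_score;      // base score: harmonic centrality (HC) of its domain
};

struct Link {
    std::string target_domain;  // domain the link points to
    std::string target_url;     // exact URL the link points to
    double m_score;             // HC of the domain the link comes from
};

// Boost formulas as given in the comment. expm1(x) = e^x - 1, so a link
// from a domain with HC 0 still contributes exactly 0.1.
double domain_boost(double source_hc) { return std::expm1(5.0 * source_hc) + 0.1; }
double url_boost(double source_hc) { return std::expm1(10.0 * source_hc) + 0.1; }

// Assumption: the domain boost is added to every URL on the linked domain,
// the URL boost only to the exact linked URL.
void apply_link_scores(std::vector<Url> &urls, const std::vector<Link> &links) {
    for (Url &u : urls) {
        for (const Link &l : links) {
            if (l.target_domain == u.domain) u.m_score += domain_boost(l.m_score);
            if (l.target_url == u.url) u.m_score += url_boost(l.m_score);
        }
    }
}
```

Because the boosts grow exponentially in the source domain's HC, links from central domains dominate: with a source HC of 0.5 the URL boost is about 147.5, while the domain boost is about 11.3. This presumes HC values are normalized to a small range such as [0, 1]; the comment does not say so, but unnormalized values would make expm1 overflow quickly.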


The main scoring function seems to be index_builder<data_record>::calculate_score_for_record() at line 296 of https://github.com/alexandria-org/alexandria/blob/main/src/i..., and it mentions support for BM25 (Spärck Jones, Walker and Robertson, 1976) and TF-IDF (Spärck Jones, 1972) term weighting, linking to the respective Wikipedia pages.
This isn't actually used yet; we're working on implementing it as a factor.
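For reference, here is a textbook version of the two term-weighting schemes mentioned, to show what such a factor would compute. It is a generic sketch, not alexandria's implementation: the function names are hypothetical, and the defaults k1 = 1.2 and b = 0.75, the "+1 inside the log" IDF variant (the one Lucene uses), and the raw-count TF-IDF variant are all assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <unordered_map>
#include <vector>

// TF-IDF in its simplest form: raw term count times log(N / df).
double tfidf_weight(double tf, double num_docs, double doc_freq) {
    return tf * std::log(num_docs / doc_freq);
}

// BM25 IDF, with +1 inside the log so the value stays positive
// even for terms that occur in most documents.
double bm25_idf(double num_docs, double doc_freq) {
    return std::log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0);
}

// Okapi BM25 score of one document for a bag-of-words query.
double bm25_score(const std::vector<std::string> &query,
                  const std::unordered_map<std::string, int> &term_freqs,  // term counts in this document
                  const std::unordered_map<std::string, int> &doc_freqs,   // documents containing each term
                  double doc_len, double avg_doc_len, double num_docs,
                  double k1 = 1.2, double b = 0.75) {
    assert(avg_doc_len > 0.0);
    double score = 0.0;
    for (const std::string &term : query) {
        auto tf_it = term_freqs.find(term);
        auto df_it = doc_freqs.find(term);
        if (tf_it == term_freqs.end() || df_it == doc_freqs.end()) continue;  // term absent
        double tf = tf_it->second;
        double idf = bm25_idf(num_docs, df_it->second);
        // Length normalization: longer-than-average documents are penalized.
        double norm = k1 * (1.0 - b + b * doc_len / avg_doc_len);
        // Saturating term frequency: extra occurrences add less and less.
        score += idf * tf * (k1 + 1.0) / (tf + norm);
    }
    return score;
}
```

The main difference from plain TF-IDF is that BM25's term-frequency contribution saturates and is normalized by document length, so repeating a keyword many times in a long page helps much less.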