by ngshiheng 921 days ago
this is awesome. i think it'd be super cool if it can read/summarize the comments too

Not sure if it can, but Kagi can:

• GitHub's previous code search was slow, limited, and did not support searching forks due to indexing challenges. A new system called Blackbird was built from scratch to address these issues.

• Indexing code poses unique challenges compared to natural language documents, such as handling file changes in version control systems and deduplicating shared code across repositories.

• The talk discussed techniques used in Blackbird like trigram tokenization, delta compression, caching, and dynamic shard assignment to improve indexing speed and efficiency at scale.

• Architectural decisions like separating indexing from querying and using message queues helped Blackbird scale independently without competing for resources.

• Data structures like geometric XOR filters were developed to efficiently estimate differences between codebases and enable features like delta compression.

• Iteration speed was improved by making the system easier to change through frequent index version increments without migrations.

• Resource usage was optimized through techniques such as document deduplication, caching, and compaction to reduce indexing costs.

• Blackbird's design allowed it to efficiently support over 100 million code repositories while the previous system struggled at millions.

• Building custom solutions from scratch can be worthwhile when domain-specific data structures let them outperform generic tools.

• Anticipating and addressing scaling challenges at each magnitude is important to ensure a system remains performant as it grows over time.
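
For anyone curious what "trigram tokenization" means in practice, here is a rough sketch of a trigram index for substring search. This is only an illustration of the general technique, not Blackbird's actual implementation; the function names (`trigrams`, `build_index`, `search`) and the sample documents are made up for the example.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each trigram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, content in docs.items():
        for gram in trigrams(content):
            index[gram].add(doc_id)
    return index


def search(index, docs, query):
    """Return ids of docs containing query as a substring (query >= 3 chars)."""
    grams = trigrams(query)
    if not grams:
        raise ValueError("query must be at least 3 characters")
    # A matching document must contain every trigram of the query.
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Sharing all trigrams is necessary but not sufficient, so verify each one.
    return sorted(d for d in candidates if query in docs[d])


# Hypothetical sample documents.
docs = {
    "a.py": "def limit(x): return x",
    "b.js": "let limits = 3",
    "c.txt": "lim imi mit",  # has every trigram of "limit" but not the word
}
index = build_index(docs)
print(search(index, docs, "limit"))
```

The index only narrows the search down to candidate files; the final `query in docs[d]` check is what weeds out false positives like `c.txt`.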

Those don't look like video comments, which are usually: fantastic/well done/wonderful/great talk, etc.