

by troyethaniel 312 days ago

Thanks for checking this out — happy to answer any technical or product questions.

A bit more context on how we built it:

The Technical Solution

The architecture has 4 major components:

1. Data Extraction Engine (DXE): We aggregate information from multiple sources across the web, news, social media, etc., using a combination of scraping and APIs.
2. Data Inference Engine (DIE): We use a combination of methods, prompts, and context engineering to get the LLM to "deduce" context from indirect and unstructured sources, e.g., "a company's tech stack can be inferred from job postings."
3. Update Tracking Engine (UTE): Essentially a combination of DXE and DIE that tracks updates across the web, news, and social media, and uses the LLM to filter and summarize raw data into insights, tags, and sentiment analysis.
4. Personalized Conversation Engine (PCE): We ask for a user's company, solution (products/services), unique value proposition, ICP, and competitors, so that responses to any user question are relevant to the solution the user offers.
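
To make the DIE example concrete, here is a minimal Python sketch of inferring a tech stack from job postings. Everything in it is an assumption for illustration: the prompt wording, the JSON response shape, the `infer_tech_stack` name, and the `llm` callable (any function that takes a prompt string and returns text). It is not the actual implementation.

```python
import json
from typing import Callable, List

# Hypothetical prompt template; the doubled braces are literal braces for str.format.
PROMPT_TEMPLATE = (
    "You are given job postings from one company.\n"
    "List the technologies the company most likely uses.\n"
    'Answer with JSON only: {{"tech_stack": ["..."]}}\n\n'
    "Job postings:\n{postings}"
)


def build_prompt(postings: List[str]) -> str:
    """Join the postings into one prompt, separated by a divider line."""
    joined = "\n---\n".join(p.strip() for p in postings)
    return PROMPT_TEMPLATE.format(postings=joined)


def infer_tech_stack(postings: List[str], llm: Callable[[str], str]) -> List[str]:
    """Ask the LLM for a tech stack and parse its JSON answer defensively."""
    raw = llm(build_prompt(postings))
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # malformed model output is treated as "nothing inferred"
    stack = data.get("tech_stack", []) if isinstance(data, dict) else []
    if not isinstance(stack, list):
        return []
    seen, result = set(), []
    for item in stack:
        # De-duplicate case-insensitively while keeping first-seen order.
        if isinstance(item, str) and item.lower() not in seen:
            seen.add(item.lower())
            result.append(item)
    return result


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, so the sketch runs offline."""
    return '{"tech_stack": ["Kubernetes", "PostgreSQL", "kubernetes"]}'


if __name__ == "__main__":
    print(infer_tech_stack(["Senior SRE: run Kubernetes clusters ..."], fake_llm))
```

Passing the model in as a plain callable keeps the parsing logic testable without network access.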

The hardest problems:

Signal vs. noise — Many updates from public sources aren’t actionable. We had to train heuristics + LLM filters to prioritize events like funding rounds, leadership changes, product launches, and regulatory filings over irrelevant press.
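
A rough sketch of how a heuristics-plus-LLM filter of this kind could look in Python. The regex patterns, the event labels, and the `llm_is_actionable` callable are all hypothetical; the idea shown is only that cheap keyword rules run first and the LLM is consulted for whatever the rules do not recognize.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical keyword heuristics for the event types mentioned above.
EVENT_PATTERNS = {
    "funding": re.compile(r"\b(raises|raised|series [a-e]|funding round|seed round)\b", re.I),
    "leadership": re.compile(
        r"\b(appoints|appointed|hires|hired|names|named)\b.*\b(ceo|cfo|cto|coo)\b", re.I
    ),
    "launch": re.compile(r"\b(launches|launched|unveils|unveiled)\b", re.I),
    "regulatory": re.compile(r"\b(10-k|10-q|8-k|s-1|sec filing)\b", re.I),
}


def classify(headline: str) -> List[str]:
    """Return the labels of every heuristic pattern that matches the headline."""
    return [label for label, pattern in EVENT_PATTERNS.items() if pattern.search(headline)]


def filter_updates(
    headlines: List[str], llm_is_actionable: Callable[[str], bool]
) -> List[Tuple[str, List[str]]]:
    """Keep headlines matched by a heuristic; ask the LLM only about the rest."""
    kept = []
    for headline in headlines:
        labels = classify(headline)
        if labels:
            kept.append((headline, labels))
        elif llm_is_actionable(headline):
            kept.append((headline, ["llm"]))
    return kept


if __name__ == "__main__":
    sample = [
        "Acme raises $20M Series B",
        "Acme hires new CFO from rival",
        "Acme team enjoys annual picnic",
    ]
    # The lambda stands in for a real LLM filter and rejects everything.
    for item in filter_updates(sample, lambda h: False):
        print(item)
```

Running the rules first also means the LLM only sees headlines the heuristics could not place, which limits the number of model calls.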

Context merging — Linking a single prospect’s identity across fragmented sources (different domains, naming variations, subsidiaries) without false positives was tricky — we built a lightweight entity resolution system for this.
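
For illustration, a very small entity resolution sketch in Python. The `Record` type, the suffix list, and the matching rule are assumptions, not a description of the real system: records are merged when their website hosts agree, and name matching is used only when a host is missing. Subsidiaries on separate domains would need extra logic that is not shown here.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Hypothetical list of legal suffixes to ignore when comparing company names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation", "co", "gmbh", "plc", "limited"}


@dataclass
class Record:
    name: str
    url: Optional[str] = None


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def normalize_domain(url: Optional[str]) -> Optional[str]:
    """Extract the host from a URL or bare domain and drop a leading 'www.'."""
    if not url:
        return None
    host = urlparse(url if "//" in url else "//" + url).hostname
    if not host:
        return None
    return host[4:] if host.startswith("www.") else host


def same_entity(a: Record, b: Record) -> bool:
    """Conservative match: hosts decide when both are known, names otherwise."""
    domain_a, domain_b = normalize_domain(a.url), normalize_domain(b.url)
    if domain_a and domain_b:
        return domain_a == domain_b
    return normalize_name(a.name) == normalize_name(b.name)


if __name__ == "__main__":
    print(same_entity(Record("Acme, Inc.", "https://www.acme.com"), Record("ACME Inc", "acme.com/about")))
    print(same_entity(Record("Acme", "acme.com"), Record("Acme", "acme.io")))
```

Letting a known host override the name is one way to keep false positives down when two unrelated companies share a name.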

Inference — Instead of just surfacing raw events, we needed the AI to connect the dots (e.g., “They just hired a new CFO from a competitor — may signal strategic shift”). This required multiple chained LLM calls and domain-specific prompt templates.
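
A minimal sketch of what chaining LLM calls with prompt templates can look like in Python. The two templates, the `run_chain` function, and the `llm` callable are hypothetical and chosen only to show the pattern: the output of the first call is fed into the prompt of the second.

```python
from typing import Callable, Dict

LLM = Callable[[str], str]  # any function mapping a prompt to model text

# Hypothetical templates: step 1 extracts a fact, step 2 interprets it.
EXTRACT_TEMPLATE = (
    "Extract the key fact from this event as one sentence.\n"
    "Event: {event}"
)
IMPLICATION_TEMPLATE = (
    "Seller context: {seller}\n"
    "Fact about the prospect: {fact}\n"
    "In one hedged sentence, what might this signal for the seller?"
)


def run_chain(event: str, seller: str, llm: LLM) -> Dict[str, str]:
    """Two chained calls: raw event -> fact -> seller-specific insight."""
    fact = llm(EXTRACT_TEMPLATE.format(event=event)).strip()
    insight = llm(IMPLICATION_TEMPLATE.format(seller=seller, fact=fact)).strip()
    return {"fact": fact, "insight": insight}


if __name__ == "__main__":
    def echo_llm(prompt: str) -> str:
        """Stand-in model that returns the last line of the prompt."""
        return prompt.splitlines()[-1]

    print(run_chain("Acme names a new CFO", "We sell finance software", echo_llm))
```

Keeping each step small makes it easier to swap in a domain-specific template for one stage without touching the others.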

Speed — Research and plan generation had to happen in seconds, so we implemented caching and pre-fetching strategies for monitored accounts.
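
One possible shape for caching plus pre-fetching, sketched in Python. The `ResearchCache` class, the 15-minute TTL, and the `fetch` callable are assumptions for illustration; the point is that monitored accounts are warmed ahead of time so a later lookup is a dictionary read instead of a full research run.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


class ResearchCache:
    """In-memory TTL cache with an explicit prefetch step (illustrative only)."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 900.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # slow call that builds research for an account
        self._ttl = ttl_seconds      # how long a cached entry stays fresh
        self._clock = clock          # injectable so expiry can be tested
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, account: str) -> str:
        """Return a fresh cached value, or fetch and store a new one."""
        now = self._clock()
        entry = self._store.get(account)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = self._fetch(account)
        self._store[account] = (now, value)
        return value

    def prefetch(self, accounts: Iterable[str]) -> None:
        """Warm the cache for monitored accounts before anyone asks."""
        for account in accounts:
            self.get(account)


if __name__ == "__main__":
    cache = ResearchCache(lambda account: "research for " + account)
    cache.prefetch(["acme", "globex"])
    print(cache.get("acme"))  # served from the warmed cache
```

In a real deployment the prefetch would likely run on a schedule and the store would be shared across processes; both are left out to keep the sketch short.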

What we’re curious about from HN folks:

- Better ways to extract information from unstructured data sources
- Ideas for reducing LLM token usage without losing context quality
- Feedback on UX for chat-based research vs. dashboard-style layouts

Happy to dive into any of these or share more if people are interested.