
Comparing AI systems to "PhD-level" intelligence doesn't work as a meaningful benchmark, and I agree with your assessment of what a PhD actually represents.

A PhD isn't simply a marker of raw intelligence or knowledge retention; it represents years of specialised training, critical thinking development, original research, and the creation of new knowledge. The process involves learning to identify meaningful questions, design appropriate methodologies, interpret results with nuance, and contribute novel insights to a field.

I've been thinking about this from another angle though: what if we considered "PhD-level" more narrowly in terms of context window capacity? Since an average PhD dissertation is around 70K words (which translates to roughly 90-100K tokens in most LLM tokenization schemes), one benchmark could be whether an AI system can hold the equivalent amount of text in context. By this definition, several current models would technically qualify:

- Claude 3.5 Sonnet: 200K tokens
- GPT-4o: 128K tokens
- Claude 3 Opus: 200K tokens
- Anthropic's experimental systems: ~1M tokens
- Google Gemini Ultra: 1M tokens
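For anyone who wants to check the arithmetic, here is a rough sketch. The tokens-per-word ratios (1.3 to 1.4) are my assumption, chosen only because they reproduce the "90-100K tokens" figure above; real tokenizers vary by model and by text. The window sizes are simply the numbers from the list, not independently verified, and the function names are made up for illustration.

```python
# Back-of-the-envelope check: does a 70K-word dissertation fit in each window?
# All figures are the ones quoted in the comment, not verified measurements.

DISSERTATION_WORDS = 70_000

# Assumed tokens-per-word ratios; actual values depend on tokenizer and text.
LOW_RATIO, HIGH_RATIO = 1.3, 1.4

# Context windows as listed above (in tokens).
CONTEXT_WINDOWS = {
    "Claude 3.5 Sonnet": 200_000,
    "GPT-4o": 128_000,
    "Claude 3 Opus": 200_000,
    "Anthropic experimental": 1_000_000,
    "Google Gemini Ultra": 1_000_000,
}


def estimate_tokens(words: int, tokens_per_word: float) -> int:
    """Estimate the token count for a text of `words` words."""
    return round(words * tokens_per_word)


def headroom(window: int, words: int, tokens_per_word: float = HIGH_RATIO) -> int:
    """Tokens left in the window after loading the text (negative if it doesn't fit)."""
    return window - estimate_tokens(words, tokens_per_word)


if __name__ == "__main__":
    low = estimate_tokens(DISSERTATION_WORDS, LOW_RATIO)
    high = estimate_tokens(DISSERTATION_WORDS, HIGH_RATIO)
    print(f"Estimated dissertation size: {low:,} to {high:,} tokens")
    for model, window in CONTEXT_WINDOWS.items():
        spare = headroom(window, DISSERTATION_WORDS)
        verdict = "fits" if spare >= 0 else "does not fit"
        print(f"{model}: {verdict} ({spare:,} tokens to spare)")
```

Under these assumptions every listed model clears the bar, with the 128K window having the least slack. That only confirms the capacity claim; it says nothing about what a model can do with the text once it's loaded.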

But this framing has significant limitations. Context window is just raw capacity; calling a model "PhD-level" on that basis is like saying a hard drive with enough storage for a dissertation is "PhD-level." Retaining 70,000 words of context doesn't mean a system can identify significant gaps in existing knowledge, formulate original research questions, or synthesize new insights that advance human understanding.

Current AI systems, regardless of context window size, don't truly "understand" information the way humans do. They recognise patterns in data and generate outputs based on statistical relationships, but lack the deeper conceptual understanding, intentionality, and other hallmarks of human intelligence.

A more meaningful comparison might focus on specific capabilities rather than vague intelligence comparisons or simple context metrics. The goal shouldn't be to declare AI systems "PhD-level smart" but to develop tools that complement human intelligence and extend our collective problem-solving capabilities.