by sunrunner 400 days ago
Do comparisons to "PhD-level" as a measure of how 'smart' a system is (by which I mean the current colloquial usage of 'AI') really make sense?

I thought a PhD was ultimately a representation of a body of work that a likely-smart person produced in order to a) learn the state of the art in a narrow domain, b) practice research methodology, and c) hopefully push the boundary of human knowledge in that area. Which is not to say that PhD candidates and holders are not 'smart', just that a PhD represents the work, not just how the person would score on an IQ test.

Or is the comparison valid and the goal is for AI (again, current colloquial usage) to be able to push the boundaries of human knowledge? Or perhaps it would be human-machine-knowledge at that point?

1 comments

The comparisons between AI systems and "PhD-level" intelligence don't really make sense as a meaningful benchmark, and I agree with your assessment of what a PhD actually represents.

A PhD isn't simply a marker of raw intelligence or knowledge retention; it represents years of specialised training, critical thinking development, original research, and the creation of new knowledge. The process involves learning to identify meaningful questions, design appropriate methodologies, interpret results with nuance, and contribute novel insights to a field.

I've been thinking about this from another angle though: what if we considered "PhD-level" more narrowly in terms of context window capacity? Since an average PhD dissertation is around 70K words (translates to roughly 90-100K tokens in most LLM tokenization schemes), perhaps one benchmark could be whether an AI system can maintain the equivalent context. By this definition, several current models would technically qualify:

- Claude 3.5 Sonnet: 200K tokens
- GPT-4o: 128K tokens
- Claude 3 Opus: 200K tokens
- Anthropic's experimental systems: ~1M tokens
- Google Gemini Ultra: 1M tokens
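To make the arithmetic concrete, here is a minimal Python sketch. It only uses the figures quoted in this comment. The tokens-per-word ratios are an assumption, back-derived from the "70K words is roughly 90-100K tokens" estimate rather than measured with any real tokenizer, and the helper names are made up for illustration.

```python
# Rough check of the "dissertation fits in the context window" arithmetic.
# All figures are the ones quoted in this comment, not measured values.

DISSERTATION_WORDS = 70_000

# Assumption: ratios back-derived from the 90-100K token estimate above.
TOKENS_PER_WORD_LOW = 90_000 / DISSERTATION_WORDS    # about 1.29
TOKENS_PER_WORD_HIGH = 100_000 / DISSERTATION_WORDS  # about 1.43

# Context window sizes exactly as listed in the comment.
CONTEXT_WINDOWS = {
    "Claude 3.5 Sonnet": 200_000,
    "GPT-4o": 128_000,
    "Claude 3 Opus": 200_000,
    "Anthropic's experimental systems": 1_000_000,
    "Google Gemini Ultra": 1_000_000,
}


def estimate_tokens(words: int, tokens_per_word: float) -> int:
    """Estimate the token count for a text of the given word count."""
    return round(words * tokens_per_word)


def dissertations_that_fit(
    window_tokens: int,
    words: int = DISSERTATION_WORDS,
    tokens_per_word: float = TOKENS_PER_WORD_HIGH,
) -> int:
    """How many whole dissertations fit in a window (pessimistic ratio by default)."""
    return window_tokens // estimate_tokens(words, tokens_per_word)


if __name__ == "__main__":
    for model, window in CONTEXT_WINDOWS.items():
        print(f"{model}: {dissertations_that_fit(window)} dissertation(s)")
```

Even with the pessimistic ratio, every window in the list holds at least one whole dissertation, which is all this definition asks for.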

But this framing has significant limitations. A context window is just raw capacity; calling that "PhD-level" is like saying a hard drive with enough storage for a dissertation is "PhD-level." The ability to simply retain 70,000 words of context doesn't mean a system can identify significant gaps in existing knowledge, formulate original research questions, or synthesize new insights that advance human understanding.

Current AI systems, regardless of context window size, don't truly "understand" information the way humans do. They recognise patterns in data and generate outputs based on statistical relationships, but lack the deeper conceptual understanding, intentionality, and the other characteristics of human intelligence.

A more meaningful comparison might focus on specific capabilities rather than vague intelligence comparisons or simple context metrics. The goal shouldn't be simply to declare AI systems as "PhD-level smart" but to develop tools that complement human intelligence and extend our collective problem-solving capabilities.

> Since an average PhD dissertation is around 70K words (translates to roughly 90-100K tokens in most LLM tokenization schemes), perhaps one benchmark could be whether an AI system can maintain the equivalent context.

This is a really interesting idea, and my immediate question about the average dissertation size is how many tokens are needed to represent all of the implicit/unstated knowledge that forms the basis for the dissertation itself. If the dissertation really is the tiny bump in the boundary of human knowledge that Matt Might's 'The illustrated guide to a Ph.D.' [1] shows, then what's the token size for everything up to the bump created by the dissertation?

> Current AI systems, regardless of context window size, don't truly "understand" information the way humans do. They recognise patterns in data and generate outputs based on statistical relationships, but lack the deeper conceptual understanding, intentionality, and the other characteristics of human intelligence.

Whether or not I'm an AI believer, I'm not sure I could genuinely answer the question 'Do _you_ truly understand information?' if someone posed that to me, as I have no real understanding of how to measure that. I want to say it's meta-cognition, my ability to think about thinking and reason about my own knowledge, but that starts to feel pretty fuzzy and I wonder how much of that is anthropocentric thinking.

[1] https://matt.might.net/articles/phd-school-in-pictures/