by sunrunner
400 days ago
Do comparisons to 'PhD-level' really make sense as a measure of how 'smart' a system is (by 'system' I mean 'AI' in the current colloquial usage)? I thought a PhD was ultimately a representation of a body of work that a likely-smart person produced in order to a) learn the state of the art in a narrow domain, b) practice research methodology, and c) hopefully push the boundary of human knowledge in that area. Which is not to say that PhD candidates and holders are not 'smart', just that a PhD represents the work, not just how the person would score on an IQ test. Or is the comparison valid, and the goal is for AI (again, current colloquial usage) to be able to push the boundaries of human knowledge? Or perhaps it would be human-machine knowledge at that point?
|
A PhD isn't simply a marker of raw intelligence or knowledge retention; it represents years of specialised training, critical thinking development, original research, and the creation of new knowledge. The process involves learning to identify meaningful questions, design appropriate methodologies, interpret results with nuance, and contribute novel insights to a field.
I've been thinking about this from another angle though: what if we considered "PhD-level" more narrowly in terms of context window capacity? Since an average PhD dissertation is around 70K words (roughly 90-100K tokens under most LLM tokenization schemes), one benchmark could be whether an AI system can hold that much text in context at once. By this definition, several current models would technically qualify:
- Claude 3.5 Sonnet: 200K tokens
- GPT-4o: 128K tokens
- Claude 3 Opus: 200K tokens
- Anthropic's experimental systems: ~1M tokens
- Google Gemini 1.5 Pro: 1M tokens
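To make the arithmetic behind that estimate concrete, here is a rough sketch. The 1.3 to 1.4 tokens-per-word ratio is an assumed rule of thumb for English text (real counts depend on the tokenizer and the writing), the function names are made up for illustration, and the window sizes are just a few of the figures quoted above.

```python
# Rough check: does a dissertation-sized text fit in a model's context window?
# ASSUMPTION: ~1.3-1.4 tokens per English word, a rule of thumb that varies
# by tokenizer and by text. Nothing here calls a real tokenizer.

DISSERTATION_WORDS = 70_000
TOKENS_PER_WORD_LOW = 1.3   # assumed lower bound
TOKENS_PER_WORD_HIGH = 1.4  # assumed upper bound

# Context window sizes (in tokens) as quoted in the comment above.
CONTEXT_WINDOWS = {
    "Claude 3.5 Sonnet": 200_000,
    "GPT-4o": 128_000,
    "Claude 3 Opus": 200_000,
}


def estimate_tokens(words: int, tokens_per_word: float) -> int:
    """Estimate a token count from a word count using a fixed ratio."""
    return round(words * tokens_per_word)


def fits_in_window(words: int, window_tokens: int,
                   tokens_per_word: float = TOKENS_PER_WORD_HIGH) -> bool:
    """True if the estimated token count is within the context window."""
    return estimate_tokens(words, tokens_per_word) <= window_tokens


if __name__ == "__main__":
    low = estimate_tokens(DISSERTATION_WORDS, TOKENS_PER_WORD_LOW)
    high = estimate_tokens(DISSERTATION_WORDS, TOKENS_PER_WORD_HIGH)
    print(f"Estimated dissertation size: {low:,} to {high:,} tokens")
    for model, window in CONTEXT_WINDOWS.items():
        verdict = "fits" if fits_in_window(DISSERTATION_WORDS, window) else "does not fit"
        print(f"{model}: {window:,} token window, dissertation {verdict}")
```

Using the upper end of the ratio by default keeps the check conservative; even so, this only says whether the text fits, nothing about what the model can do with it.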
But this framing has significant limitations. Context window is just raw capacity; calling a model "PhD-level" on that basis is like saying a hard drive with enough storage for a dissertation is "PhD-level." The ability to retain 70,000 words of context doesn't mean a system can identify significant gaps in existing knowledge, formulate original research questions, or synthesize new insights that advance human understanding.
Current AI systems, regardless of context window size, don't truly "understand" information the way humans do. They recognise patterns in data and generate outputs based on statistical relationships, but lack the deeper conceptual understanding, intentionality, and other characteristics of human intelligence.
A more meaningful comparison might focus on specific capabilities rather than vague intelligence comparisons or simple context metrics. The goal shouldn't be simply to declare AI systems as "PhD-level smart" but to develop tools that complement human intelligence and extend our collective problem-solving capabilities.