Hacker News
by cornel_io 1588 days ago
Lexical homogenization is not the same as idea homogenization, and it does not surprise me that as time goes on the word choice in a given category of writing (especially an insular one like grant applications) would constrict.

It also wouldn't surprise me much if the idea space were constricting, but looking only at the cosine distance is not enough to establish that. There are probably meaningful ways to exploit and analyze the internal representations of transformer language models to better capture the geometry of idea space, but that's a big research project.
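The cosine-distance point can be made concrete with a toy example. This is a minimal sketch, assuming documents are represented as raw term-count vectors with a naive tokenizer; the analysis under discussion may have used a different representation, and the example sentences are invented. Two sentences expressing the same idea in different words come out maximally distant, while two sentences about unrelated topics that share phrasing come out close.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased whitespace tokens with surrounding punctuation stripped (deliberately naive)."""
    tokens = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return Counter(t for t in tokens if t)


def cosine_distance(a: str, b: str) -> float:
    """1 - cosine similarity of raw term-count vectors; returns 1.0 if either text is empty."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return 1.0 - dot / norm if norm else 1.0


# Same idea, different wording: no shared tokens, so the distance is maximal.
same_idea = cosine_distance(
    "We will measure protein folding rates in vivo.",
    "Our project quantifies how quickly proteins fold inside living cells.",
)
# Different ideas, shared wording: many shared tokens, so the distance is small.
different_ideas = cosine_distance(
    "We will measure protein folding rates in vivo.",
    "We will measure glacier melting rates in situ.",
)
print(f"same idea, different words:  {same_idea:.2f}")
print(f"different ideas, same words: {different_ideas:.2f}")
```

A purely lexical metric rewards shared vocabulary, not shared meaning, which is why a drop in cosine distance over time says more about word choice than about ideas.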


The internet has made it very easy for people to homogenize both words and ideas very rapidly. This is the cost of making knowledge universal, I guess. Maybe it’s a good thing if it lets us all progress on the same page rather than taking the time to understand our differences? If progress is good. Whatever progress is. Who knows. In a lot of cases the ideas aren’t really homogenizing, though; the internet is often creating dichotomies instead. But really those are just a few factions, each reinforcing its own homogenized view. Ok, I guess it’s bad… I am rambling as much as that essay. You have increased your follow count to 6 now, good show.
Just as an observation about this… I’ve increasingly begun to see the term "knowledge" expand to encompass information, as if the two were equivalent. In my field, that tracks with changes in US K-12 education and the problems we now see in college learning behaviors.

I agree with your general ramble, but found this interesting in context.

Yeah, good point, I should have said information. Knowledge cannot be transferred directly to people, only information. The person needs to understand the information to convert it into knowledge. An analogy I have heard before is that data is the primitive, the integral of data is information, and the integral of information is knowledge.
The key point for me was that the Crimson analysis showed no change in the rate of decline since 1900. So perhaps we need more examples over longer timelines before we start modelling.

The analysis also seems sensitive to how words are mapped to categories. A robustness analysis of that sensitivity would also be interesting to see.
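One way such a robustness check could look is a perturbation test: randomly reassign a fraction of words to other categories, recompute the trend, and see how often its direction survives. This is a hedged sketch only. The metric (share of tokens in one category), the perturbation scheme, the category labels, and the toy "early" and "late" corpora are all hypothetical choices for illustration, not what the analysis actually did.

```python
import random
from statistics import pstdev


def category_share(tokens, mapping, category):
    """Fraction of mapped tokens that fall in `category`; tokens missing from `mapping` are ignored."""
    mapped = [mapping[t] for t in tokens if t in mapping]
    return mapped.count(category) / len(mapped) if mapped else 0.0


def perturb(mapping, categories, flip_rate, rng):
    """Copy of `mapping` where each word moves to a random *other* category with probability flip_rate."""
    perturbed = {}
    for word, cat in mapping.items():
        if rng.random() < flip_rate:
            perturbed[word] = rng.choice([c for c in categories if c != cat])
        else:
            perturbed[word] = cat
    return perturbed


def trend_robustness(early, late, mapping, category, flip_rate=0.1, trials=1000, seed=0):
    """Return (baseline early-to-late change in category share,
    spread of that change under random remappings,
    fraction of remappings that preserve the change's sign).
    Assumes at least two categories and a nonzero baseline change."""
    rng = random.Random(seed)
    categories = sorted(set(mapping.values()))
    baseline = category_share(late, mapping, category) - category_share(early, mapping, category)
    changes = []
    for _ in range(trials):
        m = perturb(mapping, categories, flip_rate, rng)
        changes.append(category_share(late, m, category) - category_share(early, m, category))
    sign_kept = sum(1 for c in changes if c * baseline > 0) / trials
    return baseline, pstdev(changes), sign_kept


# Hypothetical toy data: three made-up categories and two tiny "corpora".
mapping = {"novel": "A", "innovative": "A", "robust": "B",
           "scalable": "B", "cell": "C", "protein": "C"}
early = "novel cell protein robust protein cell novel".split()
late = "innovative novel scalable robust novel innovative cell".split()

baseline, spread, sign_kept = trend_robustness(early, late, mapping, "A", flip_rate=0.2)
print(f"baseline change {baseline:+.3f}, spread {spread:.3f}, "
      f"sign kept in {sign_kept:.0%} of remappings")
```

If the sign of the trend flips in a large fraction of plausible remappings, the conclusion depends more on the category dictionary than on the corpus.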

Yeah, exactly. This is survivorship bias: today's prospective applicants look at yesterday's winners to guide how they communicate, which makes them slightly more likely to get funded, etc.

It’s reinforcement learning over decades.

Emulating the past for success in the future is a form of idea homogenization.
Survivorship bias is when excessively high-risk behaviors are made to look good by showing only the winners. I think what you're referring to is evolution.
> Lexical homogenization is not the same as idea homogenization

Actually, it IS evidence for idea homogenization. Words represent ideas, unless you are claiming that the same word is used to represent different ideas, which would be even more confusing than having different words represent the same idea. Therefore fewer unique words => fewer unique ideas.

I would not call it a consequence of idea homogenisation, but it might certainly lead to it. The basic scientific idea, IMHO, is compatibility of research, which is important at least in a competitive and comparative setting. If every adjacent field uses a different name for a measure like 'sensitivity', that does not help reviewers. If the structural elements of proposals are similar, it helps reviewers quickly see the key differences that remain, and that difference should be the actual idea, which is often the smallest part. We actually teach students to emulate style and to do the same to write successful grant applications. This surely creates a bubble and in effect hinders outsiders from entering. IMHO, the gradual assimilation we see is more an effect of the available 'training material' and of interdisciplinarity. Sure, there are dangers that this might be an indicator of. However, do not conclude that a grant proposal is more novel just because it uses totally different language...
When, for instance, a science is young, the same concept is often explained using many different terms. Over time, canonical naming conventions emerge that standardize the settled parts of the field, leaving the less settled frontiers to be discussed productively with the aid of shorthand.

All named theorems do this in math, continually compressing the lexical space as a way of enabling ideas further out to be expressed at all. Granted, math may be the exception that proves the rule, but it is an important one.

For instance, adding boilerplate to every document would raise the similarity metric between them. This has certainly happened over that time period with required compliance statements.
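A toy calculation shows how much shared boilerplate can move a lexical similarity score. This is a minimal sketch, assuming cosine similarity over raw term counts with naive whitespace tokenization; the two proposal snippets and the compliance sentence are invented for illustration. The proposals share no words, so their similarity starts at zero, and appending the same compliance statement to both raises it substantially without changing either research idea.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of raw term-count vectors (naive lowercase whitespace tokenization)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


# Hypothetical proposal texts and compliance statement, invented for illustration.
proposal_a = "mapping coral reef decline with satellite imagery"
proposal_b = "a new proof technique for sparse graph colouring"
boilerplate = ("the applicant certifies compliance with all applicable "
               "data management and ethics requirements")

before = cosine_similarity(proposal_a, proposal_b)
after = cosine_similarity(proposal_a + " " + boilerplate, proposal_b + " " + boilerplate)
print(f"before boilerplate: {before:.2f}, after: {after:.2f}")
```

With real proposals the boilerplate is a smaller fraction of each document, so the effect is diluted but not zero; stripping known required statements before computing similarity would be one way to control for it.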
> Lexical homogenization is not the same as idea homogenization,

It is for the word thinkers.