by hn_throwaway_99 2429 days ago
I didn't know anything about SuperGLUE before (turns out it's a benchmark for language understanding tasks), so I clicked around their site where they show different examples of the tasks.

One "word in context" task is to look at 2 different sentences that have a common word and decide if that word means the same thing in both sentences or different things (more details here: https://pilehvar.github.io/wic/)

One of their examples, though, didn't make any sense to me:

1. The pilot managed to land the airplane safely

2. The enemy landed several of our aircrafts

It says that the word "land" does NOT mean the same thing in those sentences. I am a native English speaker, and I honestly don't understand what they are thinking the second sentence means. Shot them down? If so, I have never heard "landed" used in that context, and it appears neither has Merriam-Webster. Also, the plural of aircraft is just "aircraft", without the s.
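To make the task format concrete, here is a minimal Python sketch of how a WiC item and its scoring could be represented. The field names (`word`, `sentence1`, `sentence2`, `same_sense`) and the `accuracy` helper are illustrative assumptions, not the dataset's actual schema. The only data used is the example quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WicItem:
    # Hypothetical field names; the real dataset's schema may differ.
    word: str
    sentence1: str
    sentence2: str
    same_sense: bool  # gold label: does `word` mean the same thing in both?


def accuracy(items, predictions):
    """Fraction of items whose predicted label matches the gold label."""
    if len(items) != len(predictions):
        raise ValueError("need exactly one prediction per item")
    if not items:
        raise ValueError("no items to score")
    correct = sum(item.same_sense == pred for item, pred in zip(items, predictions))
    return correct / len(items)


# The example discussed above, labelled "different meaning" by the benchmark.
land = WicItem(
    word="land",
    sentence1="The pilot managed to land the airplane safely",
    sentence2="The enemy landed several of our aircrafts",
    same_sense=False,
)
```

A reader who answers "same meaning" for this item would simply be scored as wrong on it, whatever their reasoning.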

41 comments

My mother got a perfect 800 score on the GRE English test many years ago when she wanted to go back to graduate school after her children were grown up enough (high school/college age).

She told me that the way she got her perfect score was by realizing when the questions were wrong and thinking of what answer the test creators believed to be correct.

She had to outguess the test creators and answer the questions wrong -- in the "right" way.

This seems like a similar situation.

I've had the 'pleasure' of taking some 'Microsoft certifications' at various companies I worked at in the past and this sounds extremely familiar.

"I probably won't ever do it like that and/or there's a syntax error in all four of the answers... but this is the answer you want to hear. It's wrong, mind you, but it's what you want to hear."

Reminds me of the 1 question I got "wrong" on a DOS test (years ago) at TAFE.

The question was "How do you delete all files in the current directory?". Using DOS 6.22 (I think, it's from memory).

My answer "del." was marked incorrect. Because the teacher didn't know enough about DOS to understand that's the standard shortcut for "del .". And the teacher refused to even try out the command, let alone fix the incorrect mark. sigh

TAFE anecdote time!

In my TAFE class, I was asked to list two examples of operating systems.

I listed Linux and eComStation. The teacher had never heard of eComStation and marked me wrong.

Refused to correct my mark even when I proved him wrong. I'm still bitter about it a decade later.

Swinburne TAFE as well? ;)
Yep! You have to do away with conventional logic and ask yourself "What insanity would Microsoft recommend I do?"
It's not always insanity, sometimes just sub-optimal / way over-engineered in my opinion.

They're getting better at it though. More recently I've done their devops certification and it looks like they're recommending somewhat more sane practices now...

There were still questions where even after three or four tries at certification / reading up on whatever Microsoft thinks is 'good' we didn't find 'the correct answer' according to Microsoft though... ¯\_(ツ)_/¯

Yeah, that's true. It's still a good idea to get an idea of what a desired answer would be, which is why those answer dumps are so popular.
I'm a spatial thinker, and I've got a similar problem: I see all answers as correct. E.g. which one follows this sequence, and I can find a pattern to all alternatives. And I have to figure out which option the test author thinks is correct.
Sometimes the questions are also just broken. I.e. asking you to select the things that do not apply, but the answer would have been the opposite.
Back when I took the 'C# certification' (70-483 I think?) there were multiple questions in the style 'which of the following answers will make the program compile', where all four answers had a syntax error, or the program had a syntax error at a different line that would cause an issue regardless of your answer.

I tried the dispute process but it's basically impossible to dispute / report broken questions unless you have a photographic memory.

I have achieved similar results by similar means in both English and certain other subjects wherein one would assume a “true academic” would “know better” (picking out Sin[x]=2 as being “evidence of error in prior working” when x could merely be Complex, or marking “f[f[n]]=-n” as “unsolvable” when it just requires a bit of lateral thinking). This always depresses me, like when (as a Brit) I hear Americans say “I could care less” as an indicator of disregard, when actually that indicates they are somewhere above the point of minimal regard.
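Both claims are easy to check numerically. The Python sketch below is one possible construction, not necessarily the one any exam had in mind: `f` pairs up integers so that applying it twice negates the input, and `cmath.asin` gives a complex `x` with sin(x) = 2. The name `f` and the choice of pairing are illustrative assumptions.

```python
import cmath


def f(n: int) -> int:
    """One integer function with f(f(n)) == -n (an illustrative construction)."""
    if n == 0:
        return 0
    if n > 0:
        # positive odd -> the even number above it; positive even -> minus the odd below it
        return n + 1 if n % 2 == 1 else 1 - n
    # negative odd -> the even number below it; negative even -> minus the odd above it
    return n - 1 if n % 2 == 1 else -n - 1


# A complex solution of sin(x) = 2; no real x satisfies it.
x = cmath.asin(2)
```

The lateral step for `f` is to stop looking for a single formula and instead cycle each pair of numbers through four values: 1 → 2 → -1 → -2 → 1, and so on.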
“I could care less” is sarcastic.
No, it’s lax.
This does seem like the meta solution to most tests, particularly standardised tests :)
Paraphrasing Simonyi: “Any test you can pass, I can pass meta”.
That's how I got through the SAT…
I think this is really interesting, because "the enemy landed several of our aircraft(s)" is the sort of sentence I'd have hauled a student up for using as a teacher, because 1) it's a none standard, arguably incorrect usage they've used either because they're a none native speaker or because they're trying to be clever and failing, and 2) because the plural of aircraft is aircraft. Nevertheless the author of this sentence almost certainly meant land to mean something different (shot down) than the author of the first, and we can infer the author's intended meaning despite the none standard usage. This poorly written sentence is the sort of thing you see all the time in the real world, especially from none native speakers, children, and people writing about a topic outside their expertise. If a program can spot the difference in the usage of the word land between these two sentences and infer what the intended meaning in the second sentence is, then it's doing pretty well. Just inferring that land is used to mean something different in the two sentences is less impressive but still pretty cool and I'm not sure which claim is being made.
If you teach others English, please learn the difference between "none" and "non". You mean "non-standard" in all your examples here (if British) or perhaps "nonstandard" (if American).
Sssh, pumpkin. We live in a world of autocorract.
Sssh dysgraphic. It's not an excuse.
Yup. They made the same mistake in "none native" (sic).

I'll admit that, as a non-native speaker, this fills me with glee.

(And non-native)
As someone who spends a lot of time puzzling out intent, I would infer they are using "landed" to mean "grounded" in that context.
I would have assumed the second used the term landed to mean acquired. But only after being told that its meaning is supposed to be different from the first. With no other context from those two sentences, I’d have guessed #2 meant land the same way as #1.

One other point: I’ve never heard the term “landed” to mean “grounded”, which is maybe the actual intent of #2, but maybe the AI sentence generation is off.....

The example directly below that: "Justify the margins" and "The end justifies the means" is the one I find dubious. Obviously the former could mean to format a document, but those exact words in that structure could be a demand for someone to justify a financial margin for example. It is both true and false depending on the context.
One of my favorite examples that I heard in a David Rock talk which I can no longer find on youtube: "Time flies like an arrow":

Time moves swiftly and in one direction.

Record the speed of flies in the same way you would an arrow.

Time flies, which are a kind of fly, are fond of an arrow. (e.g. Time flies like an arrow, fruit flies like a banana).

It sounds like you're talking about garden-path sentences [0], and in particular: "time flies like an arrow; fruit flies like a banana" [1]. These are sentences whose structure tricks the reader into making an incorrect parse. My favourite of these has always been: "The horse raced past the barn fell".

[0] https://en.wikipedia.org/wiki/Garden-path_sentence

[1] https://en.wikipedia.org/wiki/Time_flies_like_an_arrow;_frui...

I've always enjoyed the multiple valid parses of "Time flies like an arrow". I can't wait for AI to generate more Escher sentences like "More people have been to Russia than I have" ( https://en.m.wikipedia.org/wiki/Comparative_illusion )
You know, I only just now got the second interpretation of that sentence. I always thought of it like "Time flies like an arrow (straight and in one direction), Fruit flies like a banana (when thrown)"

Obvious in hindsight...

Same here, except that comparing a fly's flight trajectory to that of a banana is new to me.
"The horse raced past the barn fell, which has been haunted since all those teenagers were murdered there."

(Noun-adjective is a rare formation, but amusingly more common in the same situations where the author uses rare and archaic definitions like the adjective "fell".)

"I eat my rice with butter." could mean that you use butter as a utensil to eat your rice with. There is often an unlikely way of parsing the sentence that gives an alternate meaning. The point is to test the computer to see if it can distinguish the likely parse from an unlikely one.
These aren't really alternate _parses_ though (in the sense that they don't give different parse trees). They do highlight the different possible meanings of "with" though.

I think "I eat my rice with chicken" vs "I eat my rice with children" vs "I eat my rice with chopsticks" is the canonical example here.

There's a whole field in NLP involved in showing what changes happen to entities mentioned in a sentence as a side effect of the sentence, and this example shows it pretty well.

Wouldn't those be different parse trees? Like the "with X" could be attached either to the verb or the noun.
I think it's more clear if you say "I usually eat X with Y", i.e. Y is either the company, the tool or the condiment that you eat with (contrasted with "I'm eating my X", where X is a dish like "rice with chicken")
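To illustrate the attachment question, here is a small Python sketch with two hand-written bracketings of the same sentence as nested tuples. The node labels (`S`, `VP`, `NP`, `PP`) and the `leaves` helper are informal assumptions for illustration, not the output of any particular parser.

```python
# Attachment to the verb phrase: "with chopsticks" modifies how I eat.
verb_attached = (
    "S", "I",
    ("VP", "eat", ("NP", "my", "rice"), ("PP", "with", "chopsticks")),
)
# Attachment to the noun phrase: "with chopsticks" is part of what I eat.
noun_attached = (
    "S", "I",
    ("VP", "eat", ("NP", "my", "rice", ("PP", "with", "chopsticks"))),
)


def leaves(tree):
    """Return the words of a tree, left to right, dropping the node labels."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:  # tree[0] is the node label
        words.extend(leaves(child))
    return words
```

Both trees yield the same word sequence, so the surface sentence alone does not tell you which structure was intended.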
Yes, possibly.
A good demonstration that context (and cultural conditioning) is everything to understand what a text actually means.
Not to mention something that almost all NLP systems are resoundingly terrible at - short-term memory. If we've been talking about corporate financials for an hour and I say 'Justify the margins', it should be crystal clear what I mean. But most automated systems try to operate without a hint of memory or 'state' being tracked.
I'm guessing this is intentional. To a human, although this could be somebody being asked to justify their financial margins that's not a very likely answer. The human can easily see that, while it's possible they're the same meaning, given the lack of any other context the answer is that they're not.

The enemy could have landed several of our aircraft on one of their runways. Agassi may have beaten Becker over the head with his tennis racket. I suspect part of the test is that there can be other meanings that do technically work.

> The enemy could have landed several of our aircraft on one of their runways.

This is something that actually does happen. Less than 10 or 20 years ago, China did it to a US Air Force reconnaissance aircraft.

This is a good point I hadn't thought of. Honestly, I'm really not surprised anymore that the humans only scored 89%.
The ends justify the means.
The second one means "the enemy successfully got several of our aircrafts".

Specifically, definition 3a or 3b for the verb form here: https://www.merriam-webster.com/dictionary/land

So potentially the enemy captured the aircraft (3a) or destroyed them (3b).

Would a native English speaker use the word "landed" in this way? In the context of aircraft? "Landed" is badly ambiguous here and several distinct meanings are plausible. Captured is the most natural word given your interpretation.

Honestly that sentence -- the use of landed and that awful plural -- approaches engrish. Is that deliberate or is the use of English here just badly flawed? I can't see any other possibilities.

There are a lot of native English speakers in the world and not all of them use the same idioms that you do. This seems like perfectly valid English to me; some other words that could be used instead of “landed” in the aircraft sentence include “bagged”, “nabbed”, “poached”, “got” and “did in”. One of the entertaining aspects of English is the multitude of ways it can be used.
Those are all good synonyms for "got" in the context of shooting at things. But none of the others already has a strong meaning in the context of aircraft, and this other meaning does create some confusion, which is why many speakers would avoid it (if thinking clearly).
On the other hand they might go out of their way to use it to take advantage of the word play.
I wouldn't use it that way myself, but at the same time the intended meaning is clear as day to me from the context. I'm surprised by the reactions. "Enemy" should give it away immediately.
I'm surprised too. This algorithm is about understanding language, and surely that includes understanding the intended usage. This is something humans have to do all the time. So what if there isn't a formally archived consensus on the definition of "landed" as used in the example. The intended meaning is clear, and so hats off to the algorithm for rolling with it, that is in my mind the fundamental goal of understanding language.
It's more or less impressive depending on whether the algorithm already ate a dictionary; then it's the difference between inferring from context, as people do, and simply knowing all of the known unconventional usages in a very inhuman way.
I don't know. I guess I understood the sentence with 'landed' the same as I would have if someone told me that they'd 'landed a big job'. I wouldn't really say this myself though, although I hear people say 'landed a big catch' when they're talking about fishing.
Landed, with this meaning, is used in the context of successfully enticing someone to give you something. Like hooking a fish with bait.
FWIW, Landing a fish is not the same as hooking it. Landing a fish literally means pulling it to land (or boat).

So landing=catching=scoring.

Depending on the type of fishing, you can still be an underdog to land the fish after hooking it.

I don't think anyone would use that particular construction, unless it's some weird dialect of pilot-speak or argot among anti-aircraft folk that I'm not aware of. It's just really awkward and unnatural. Possibly correct, but not the way that anybody actually talks.
Possibly, you could say the planes were landed, as in forced to stay on the ground (because of damage, fear of enemy fire, or damage to the runway). But grounded would be better.
I think it's archaic; in the past a fishing reference would have been more common and widely understood.
I guess it annoys me because I suspect that if this is the sort of borderline incoherence one must wade through I would probably score below average.
Or just average. There's contextual dependencies in most speech, and (as displayed in this subthread) not every speaker of a language has the same context. It's a fallacy to think that if you lack context for one of the examples, you will automatically score less than average -- other people may miss context for things obvious to you.
For me, this context sounds like "damaged, but in a minor way which forced them to leave the battle/exercise/war/whatever and go land"
If taking the "captured" interpretation, I think it could be reasonably inferred that they successfully landed the aircraft at an airfield afterwards (same meaning). This was my initial read of it and it does not seem strange to me on reflection.

I would like also to point out that even if we do interpret the second as meaning "destroyed", the first could then be interpreted as a combat aviator shooting down an opposing aircraft, bringing us back to the same meaning. Or perhaps both of my interpretations are correct and the meanings are different...

What this tells me is that the benchmark is not very useful.

Landed in the sense of a fisherman landing a marlin.
So at the end of the process they were in possession of the enemy aircraft. Maybe they jumped across in mid-air and wrestled it off the other pilot.
The benchmark is useful primarily because it puts humans and computers on a level playing field. Human readers will misinterpret written language, and human writers will poorly represent concepts.

The propensity to make mistakes in comprehension is unavoidable, humans only approach 90% accuracy, and computers are getting close to the same level of accuracy on the same base materials as humans.

The other way of testing would be to devise a test where there is only a single interpretation, where the context is clear, and there is no ambiguity in meaning. In that case a competent human and computer algorithm could be expected to answer all questions perfectly.

The purpose of this benchmark on the other hand is to test comprehension when meaning is not explicit and context clues are implied, something humans have had the advantage at over computers until quite recently. The computer won't be 100% accurate, but that's not the purpose of this test.

My immediate thought was captured ie. "Iran successfully landed our UAV by transmitting false GPS data".

This language is used on the Wikipedia page about that incident.

https://en.m.wikipedia.org/wiki/Iran–U.S._RQ-170_incident

Aircraft typically get captured on the ground, or get forced to land by threat of being shot down. “Landed”, for me, would require the enemy to actively land the plane, just as “landing a fish” requires both the fisherman’s action and moving the fish from water to land.

I also wouldn’t use “landed” for destroying an enemy plane (neither by shooting it down nor by destroying it on the ground)

That, realistically, leaves hacking the plane’s electronics and then directing it to one’s own airfield.

Yes -- if the sentence had been "grounded the aircraft", then the meaning is obvious. But even though "land" is a synonym for "ground" I don't think there's an equivalence of meaning here. I'm struggling to find a sense in which "landing an enemy aircraft" is a meaningful concept short of jumping out of one plane to land on another one, removing the pilot, and landing the plane, which is a bit much for the single word "landed" to carry.
So many options for sentence number two.

- The enemy stole the aircrafts, and after some drama in flight managed to land several of them.

- The enemy used remote control to force them to land.

- The enemy used coercive force to force our pilots to land them.

- The enemy captured them.

- The enemy shot them down.

- During a friendly event while we set our differences with our enemy aside and agreed to fly each other's aircraft at an airshow for some reason, we landed several of theirs, and they landed several of ours.

- There was a hearing mistake and "energy" (as in energy beam beamed by a UFO) was accidentally transcribed as "enemy."

- The writer is just screwing with us.

- The writer is not a native speaker of English, and they made a mistake and actually meant that the enemy boarded several of our (parked) aircrafts.

- The writer is creative with language and believes that it would be cute to say that when an enemy projectile struck one of our aircrafts, then the enemy has "landed" that aircraft as one would land men on the moon or land rovers (no pun intended) on Mars.

- An ML algorithm from the future traveled back in time, writing specific SuperGLUE examples to poison AI research, thereby preventing the emergence of a competitive AI which would also master the secrets of closed timelike curves
Actually the algo was able to determine we exist in a simulation and perform meta programming by hacking the sim infrastructure (higher order dimensions of spacetime) and rewriting the future which to us appears that it traveled to the past.
> then the enemy has "landed" that aircraft as one would land men on the moon or land rovers (no pun intended) on Mars

Or perhaps as one would land a punch.

I think it's landed in the same sense as "landed a deal": got, or achieved, in this case achieving shooting them down.

For me, my first read of the sentence would definitely be that it means shot down.

Ahh, just found an example where that's taken from https://glosbe.com/en/en/land. If you find on that page you'll see the exact sentence "the enemy landed several of our aircraft" (without the s after aircraft) which it says means "shoot down".

I have still never heard landed used in that way, and again in other dictionaries I searched I couldn't find that definition either. Thus, this is a case where the "AI" may get it "right", and me, the human would get it "wrong", but that still feels like it's missing a huge point. It feels you could get a number of errors by the human which the AI gets "right", but in fact the human is better able to detect what is rare, uncommon or at least ambiguous.

I've worked in aviation for 8 years and also didn't understand this use of "landed". I've heard "grounded" used like this: "The maintenance issues grounded the jet," but not "landed".
Working in aviation probably puts you in a mindset that makes it harder to parse. It's not being used in a way that is related to flight or aircraft.

It's like if people were discussing where to have a conference, and one of them proposed a hotel. Then another person suggested a resort. Then a third person floated a cruise ship. Cruise ships do float, but it has nothing to do with anything. They are floating the idea of the ship as a venue.

Plenty of other HNers, myself included, don't work in aviation and still find this use of "landed" nonsensical.
Do you normally "float" a cruise ship though? A more apt analogy might be "dock". Maybe a news report says that a vacation company has broken some regulation so the government docked a cruise ship, meaning they took away a cruise ship like you would dock someone points. It's ambiguous at best.
You could float the idea of it, and you might also think that to float a ship means the process by which it is landed in the water when coming out of a dock?
I think the sentence is referring to aircraft that have been forced to land by the enemy, in contrast to "grounded" aircraft that had not taken flight.

I haven't worked in aviation so my understanding of terminology could be wrong, but either way it is definitely an unusual example.

"The enemy landed 4 of our aircraft" without context wouldn't generally mean "forced to land" imo (as a native speaker). It would mean that they either destroyed them or managed to acquire them.

For example I might say that "they landed 4 aircraft with their daring" if they forced us to abandon an aircraft carrier (e.g. by sinking it) and then managed to steal 4 of the planes (before it sank). Or I might say "they landed 4 aircraft with that bomb" if they dropped a bomb on an airfield and it destroyed 4 aircraft.

Right, I think you understand the word as I do: 'verb' + ed. "The enemy landed the jet" as in they forced the jet to land either directly or indirectly. This would mean that the two sentences use "landed" the same way. But my understanding is SuperGLUE's official answer is that these use "landed" differently with the rationale that "landed" is idiomatic and just means to procure or bring about (e.g. "I landed the job") and it happens to be used with planes.
A fishing boat can land a big catch - and a sales executive might have landed a big deal, perhaps after reeling them in or having them on the hook.

So this would be particularly apt wording if the enemy had thrown a net over the plane as it sank in the ocean.

But I prefer to think the enemy gifted British country estates to the planes.

I think if we really looked at it, it likely comes from fishing where "to land" a fish means to succeed in quite literally getting it onto land from the water. But we use it as "to successfully get" (something typically uncertain) in many other contexts.
though you can "land" a fish while still on your boat.

disclaimer: beyond pedantic, but 100% appropriate given the topic is NLP and idioms

Sure, although what is a boat but an island to a fish?
I agree, AI should realistically be able to detect the rare/uncommon/ambiguous usage as well, and rated for that.

I suppose in some cases it could score better than humans on the SuperGLUE benchmark, but eventually it will have to come back down to a near-human score as it gets more accurate.

Why? In many of those benchmarks the average human score is not 100, but the AI progression doesn't really have a ceiling or a slowdown at the human number. It should go through it and settle somewhere above. Plus we create these tests with our own limitations. There may be a world of more complexity or subtlety that we all fail to grasp but the AI will.

I think humans are already behind at the face recognition task for example.

>If you find on that page you'll see the exact sentence "the enemy landed several of our aircraft" (without the s after aircraft) which it says means "shoot down".

They're not shy about illustrating a military application up front!

so this is why the human score is 89.8 :)
> I think it's landed in the same sense as "landed a deal": got, or achieved, in this case achieving shooting them down.

My buddy is a pilot and they always say "I landed the takeoff pretty good. PRETTY GOOD!"

I've never seen "landed" used as in the second sentence, but I was definitely able to understand from context that it was not being used to mean the same thing as in the first sentence.
Have you ever "landed" a deal? Or "landed" first strike in a game?
You've never landed a fish?
I haven't, though I'm familiar with that use of "landed" for fish.

As a lifelong native speaker (PNW English), I've also never heard "landed" used to refer to shooting down or capturing enemy airplanes. I could understand it from context, which is what I suppose the software is also going for, but I'd mark it with a red pen if someone showed me that sentence, just for clarity's sake (i.e. understandable from context but should be replaced).

'Landing' an aircraft does not imply shooting it down. 'Downing' an aircraft does imply that.

These uses of 'land' and 'down' are military euphemisms for the use of force to compel a reluctant pilot to land. The difference is the degree of violence used.

Involuntary 'landing' implies the aircraft is forced to land by a party other than the pilot because if the pilot did not comply the plane would be shot down or collide or crash. It usually implies survival of the pilot. 'Downing' also means involuntary removal of the aircraft from the sky, but does not denote that a violent landing did occur, only that the likelihood of violence is much greater because a (more abrupt) landing was forced upon the pilot. From what I've read, 'downing' usually implies the plane crashed.

Is "landing a fish" the same thing as "watering a plant"?
Have caught a fish though
Is land an acceptable habitat for a fish?
I think the difference in these sentences is about the way to land. In sentence 1, the pilot of the aircraft is in control. In sentence 2, the pilots are not in control, the enemy forced them to land (whatever the means).

If I read these two sentences in context of some news, they would evoke very different "landing" scenes in my head.

#2 is the same as landing a fish. i.e. to place on land what doesn't belong on land.
The only possible explanation I can think of is this.

3. a : to catch and bring in

// land a fish

b : gain, secure

// land a job // landed the leading role

imagine enemy soldiers capturing a base or hangar ship including the aircraft.

It's kind of a stretch though.

This is definitely where the 10.2% of human failures are.
Ever tried taking SAT or GRE tests?
In looking through many of the replies to this downstream, it appears that the system is actually correct in that there's an obscure use of 'land' at play in the second sentence.

It makes me think that there's going to be many adversarial examples of text that humans parse one way because of common usage while machines parse another way because of details like this.

Colorless green ideas sleep furiously!

Search for it if you’re interested in its origin.

For #2, my immediate read was that the planes had been shot down. If the context were to suggest that the enemy had somehow hijacked the planes, then of course the word land would mean the same in both sentences.

I have never used or heard 'land a plane' in this context, but the sentence didn't immediately strike me as unnatural, incorrect or unclear.

> I have never used or heard 'land a plane' in this context, but the sentence didn't immediately strike me as unnatural, incorrect or unclear.

It struck me as pretty awkward and very ambiguous. It probably means 'obtained' but 'captured' would be a far better word in that case. The suggestions that it means 'hit/shot' don't work because in that case it's not the aircraft that is landed but the shot, which is landed on the aircraft.

Also the use of the incorrect plural "aircrafts" when 'aircraft' is both singular and plural makes me think it's just a poor question.

The very fact that there's so much discussion about it is evidence that it's not straightforward even among native English speaking humans.

Time matters too. Current tech would hint to us that the planes had been shot down.

But in the future that sentence might mean hacking and theft of the actual planes, an actual landing.

It means 'succeeded in shooting down' right? Seems pretty contextual, but understandable.
Seems like a really odd way of saying it but that’s what I’d think too, as in “landed their shots”.

This is either a poor question, or a really great question, if the goal of the test is to confuse computers where a human would normally say “huh, weird way of saying that but I guess they mean...”.

From the abstract of the associated paper: "performance on the benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research."

It occurred to me that hn_throwaway_99's question, and the responses to it, are the sort of dialog in which one could find additional headroom for further research into natural language understanding. We can understand, for example, that while the two uses of 'landed' are different, they are not completely unrelated, and we can explain how they are related, for example by introducing a third construct, 'landed a fish', as a couple of replies have done.

Limited headroom? Seems like they're assuming greater-than-human language ability is just impossible and will never be surpassed.
I'd argue that greater-than-human language ability is by definition useless.

Language is specifically a human communication tool, there's no value in surpassing the language skill that humans have, if indeed such a thing is even meaningful (what does it mean to be better than the best* French person at French?)

* By whatever language-related metric

I disagree, greater-than-human-average is not useless. There's a lot of room for misinterpretation in human language. We compensate for that by non-verbal communication (posture, expression) or by asking for clarification. On top of that, most places have local expressions or idioms that are not necessarily globally recognized.

So there's two ways in which a language automaton must be better than human: it cannot rely on non-verbal hints nor can it easily ask for clarification, and it must be able to interpret many different dialects and idioms correctly -- many more than an average human would need to.

I do not think this result is that close to a greater-than-human language ability in general, and I do not think they are claiming it. I think the point is that, with scores on this test closely approaching average human scores, there is not much headroom for this particular test to drive, or measure, further progress.
It's a reasonable assumption if only for the simple reason that humans said the sentences being tested, so how would you surpass that?
You create a new test designed by your newly better-than-human language experts.
So, here is the thing. ML shouldn't just be about learning rules. It should be about actually learning, and understanding.

Even though you'd never heard the word used that way, you were able to infer it meant something different. Even with the use of "aircrafts".

We all make mistakes when writing or speaking. We don't let that get in the way of interpreting the information being passed. Even if we post comments that contain errors.

Yes, the second should be, "The enemy downed several of our aircraft." Landed can be used to mean "bagged," as in, "We finally landed the Smith account," (it's a fishing term), but it should not be used in this figurative sense when referring to aircraft, because of the obvious confusion with the common, concrete sense of the word. And, yes, it should be aircraft.
The fact this comment sparked so much discussion with some agreeing and some disagreeing says to me that Google did about as well as a human.
The plural of aircraft is aircraft. Not aircrafts. https://www.grammar-monster.com/plurals/plural_of_aircraft.h...

Maybe the _examples_ for a language test should be grammatically correct?

It depends on what your goal is. But in most cases, I'd say no. If the goal has anything to do with understanding real language written by real humans, it's better for the system to be able to handle texts with errors.
True, but having some noise in the label is actually good for generalization. If it's only trained on perfectly correct sentences then its tolerance for mistakes will be very low.
Maybe the examples for a language test should use language that people actually use every day.
It's weird, because I understood the second one as meaning shoot down, yet to me that's the same definition of landed. You just assume the enemy didn't land them gracefully without a scratch, because they are, well, enemies.

So I would have answered that the word meant the same thing.

> One "word in context" task is to look at 2 different sentences that have a common word and decide if that word means the same thing in both sentences or different things (more details here: https://pilehvar.github.io/wic/)

Can anyone explain what makes this difficult for a machine? What existing knowledge does the machine start with? At a glance, it doesn't feel like it should be difficult if the machine had a large corpus to train on that showed many examples of each word in different contexts.
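To make the question concrete, here's a toy sketch of the most naive approach I can think of: compare the words surrounding the target word in each sentence. This is only an illustration, not how any SuperGLUE system works; the function names are made up, and the third sentence is my own invented example of a same-sense pair.

```python
def context_words(sentence, target_forms):
    """Lowercased words of the sentence, minus the forms of the target word."""
    words = {w.strip(".,!?\"'").lower() for w in sentence.split()}
    return words - set(target_forms) - {""}


def context_overlap(s1, s2, target_forms):
    """Jaccard overlap of the two contexts: 0.0 is disjoint, 1.0 is identical."""
    a = context_words(s1, target_forms)
    b = context_words(s2, target_forms)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


forms = {"land", "landed"}
s1 = "The pilot managed to land the airplane safely"
s2 = "The enemy landed several of our aircrafts"   # labeled "different sense"
s3 = "She landed the helicopter on the roof"       # invented same-sense pair

different_sense_score = context_overlap(s1, s2, forms)
same_sense_score = context_overlap(s1, s3, forms)

print(f"different sense: {different_sense_score:.2f}")
print(f"same sense:      {same_sense_score:.2f}")
```

Both pairs come out with almost no overlap (the only shared context word is "the"), so surface context alone can't separate them. Whatever does the job has to know that "airplane" and "helicopter" are similar things to land, and that "enemy" changes who is doing the landing, which is pretty much the debate in this thread.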

1) The pilot [voluntarily] brought down his aircraft.

2) The pilots [involuntarily] brought down their aircraft [because some authority figure(s) forced them down.]

The active verb 'land' can be performed by different actors: pilot vs a more powerful agent (usually who flies an armed aircraft). The voluntary/involuntary agency is a subtle difference that only those familiar with this military practice are likely to grok.

> I am a native English speaker, and I honestly don't understand what they are thinking the second sentence means

Clearly the enemy conferred lesser nobility and commensurate landownership unto said aircrafts. https://en.wikipedia.org/wiki/Landed_gentry

I believe it’s “land” in the sense of “land a fish” (or a prize in general) which is a less common but legitimate usage.
Perhaps the enemy obtained several of our aircraft. In the same sense as one might land a new car in a contest.
Possible, but also the worst context to use land in. You land a car, but if the game host would say you landed a small airplane, there’d be a laugh from the crowd.
The examples look like they weren't written by a native English speaker. It's funny reading English tests from countries that are not English speaking, because a lot of them focus on pedantic points that are long lost, while following conventions that to us would just feel _different_.
In option 1, the aircraft met the ground gently and safely.

In option 2, the aircraft met the ground violently and lethally.

Not necessarily. Maybe they captured it.
Yeah I don’t get the s at the end of aircraft. Landed, it would seem, would be land as in acquire, although that’s a bit odd of a construction. It seems rather forced. It may possibly mean that the aircraft were forced to land by the enemy. So it’s a tortured construction.
2. Reminds me of a theory that Iran landed an American stealth drone by sending spoofed signals.
Still ambiguous. Landed as in make it contact the ground or landed as in obtain, like in landing a job?

For me taking an airborne object and making it touch the ground is pretty much the same meaning whether it's from the inside or remotely or shooting it down.

Yes. I think "ambiguous" is the best word to describe all of this.
I'm not a native English speaker and it is pretty obvious to me what both sentences mean.
That might help. I don't think a native speaker would ever say it this way.
Yeah, I guess you can use landed in this way but you would never use it with "planes" because it would make the whole thing awkward and ambiguous.
I think they might be going for 'landed' as in 'landed a deal'. Maybe?
Well "he landed the deal" implies a score or a hit. So to say they "landed" the planes could vaguely make sense but it is hardly good English. They might have been thinking of "grounded"?
'Grounded' means the plane could not take off. It was on the ground and must remain there.

Landing a deal (or a fish) is like landing a plane. A human acts to cause a desired outcome. Unlike forcing a pilot to involuntarily land a plane, the perspective of the fish as involuntarily being forced to land is not a necessary inference for this use of 'land'.

Geez, language can be subtle.

I understood 'landed' as a euphemism for 'shot down'.
I think people are digging too deep for an answer here... it seems to me to be a simple mistake, which on the scale at which they're evaluating those models is not statistically significant.
It's being used by analogy with "landing a fish". I've never heard it either, but I could believe it's in the argot of military airmen in some English-speaking country.
It's conceptually the same - having an entity go from water or air to the ground. The hard part would be to associate the fact that there's no way for an 'enemy' to land the aircraft other than to do so forcibly, which implies shooting it down.
That's probably why humans have 89,9% and not 90,1% :p
It sounds like the “landed” in 2. is similar in usage to the “landed” in the turn of phrase “he landed the deal.”
Same as "to land a punch". To successfully hit a target.
The second implies that the aircrafts were shot down; the first states that the aircraft landed safely. It looks like this reduces to the machine being able to figure out whether or not something is good or bad for the speaker.
Good point, and most of the replies ignore what is, to me, the key point: you are right about the plural of aircraft and the benchmark is horribly wrong, so why should we take any notice of this benchmark?
Probably: The enemy grounded several of our aircrafts
I landed this job