dependency relation acl in French (and Spanish and Portuguese) #4
Comments
I have checked the first ~ 25 occurrences in UD_Spanish-GSD and agree that they are either GSD annotation errors or errors in conversion from GSD to UD. Many of them should be |
Same situation in UD_Portuguese-GSD: 791 |
Hi Dan and everyone, Best, |
@dseddah : I think it should. Sometimes it is difficult to establish that a rule must hold strictly in all languages but I think |
@jheinecke: I didn't manage to find the same numbers.
How do you find the 1542 |
@bguil, @jheinecke : I also have 447 cases in UD_French-GSD 2.1: http://hdl.handle.net/11346/PMLTQ-XYAO |
Maybe the next validator version could also include a threshold below which non-compliance with a rule is classed as residual error, which is almost always unavoidable? |
If it is unavoidable because the grammar/guidelines allow it, then it is not a matter for the validator. If it is "unavoidable" because annotators make errors, then it is actually avoidable :-) |
After all this time treebanking, I don't believe in error-free annotations for anything longer than a toy dataset (I'm sure you don't either btw.) |
@bguil , @dan-zeman You are right, I used version 2.0 by mistake. Much has been corrected since. After I |
@dseddah : You're right, I don't believe in error-free annotation. But I do believe that a subset of the errors can be detected automatically and eliminated (although the elimination sometimes has to be done manually). |
"I do believe that a subset of the errors can be detected automatically and eliminated although the elimination sometimes has to be done manually" |
Well, the beauty of releasing corpora on GitHub is that if we find those POS errors, instead of getting to know and learn them by heart for all those years, we can just fix them, commit, and they're gone in the next release :) But on the current topic: I don't think we should forbid VERB --acl:relcl--> __ since sometimes this is a legitimate construction. For example in English: http://match.grew.fr/?custom=5b2d3cd1a63d5&[email protected] |
@amir-zeldes : I think there has been this discussion somewhere already but I cannot remember where. Anyways, acl “stands for finite and non-finite clauses that modify a nominal”, which is not the case of the example you gave. (BTW, some of the examples I saw in other languages resembled this English one. But it does not mean that their annotation is not a violation of the guidelines.) |
advcl:relcl, then? |
pattern { GOV [upos = "VERB"]; GOV -[acl:relcl]-> DEP; }
@dan-zeman I didn't know that, so what is the current recommendation? I don't know whether a special subtype of relative clauses is needed for when they modify verbs (it would be quite rare), and if so, I think advcl:relcl is maybe the wrong way around: I think it's a relative clause first, as indicated by the relative pronoun, and happens to modify a verb second (so maybe acl:advcl?) But if you think about it, relative clauses can modify all sorts of things and I'm not sure this is reason enough to give different labels:
NOUN (normal): the day which we agreed on
The class of token being modified is already marked on that token's part of speech, so I don't see what is gained by more subtyping. |
I do not know what the current recommendation is (@sebschu, @manning?) But if anything with a relative pronoun qualifies as a relative clause, then it was probably wrong to assume that relative clauses are always a special case of
@amir-zeldes : note that the part of speech of the parent token will not tell you everything. If it is a
|
It is yet another case where the correlation between structural form (relative clause) and syntactic function (modifying a nominal) is less than perfect. We need to review all these cases systematically for v3, but for the time being I think the best compromise is to extend the use of acl:relcl to cover 2 as well (better than introducing advcl:relcl, since it is not really an adverbial clause either). |
@dan-zeman yes, that's a good point, it's like nmod modifying a nominal predicate vs obl modifying the whole predication. I'm with @jnivre in preferring acl:relcl for these and not modifying the current inventory if possible. |
Relative clauses seem like such a common strategy cross-linguistically that it feels odd to have them as a subtype anyway. Maybe we should make |
All the examples given by @amir-zeldes are translated in French with "ce que": |
For me parataxis, notwithstanding its use for parentheticals, implies a certain degree of independence - it's like a coordination without an explicit conjunction. These relative cases seem clearly subordinate for me, at least in English, so I wouldn't want to use that label. In French I guess you could say 'ce qui ...' is an independent NP, so it's more natural to use parataxis (there's no formal subordination, although both sides of the parataxis are not of the same category, which is maybe unusual). |
Hello
The UD annotation guidelines define the acl relation as standing for “finite and non-finite clauses that modify a nominal”. This implies the head of an acl deprel is only in rare cases a verb. However, in UD_French-GSD (and UD_Spanish) I found 1542 acl (without acl:relcl) relations with a verb as head (24% of all acl; 2336, or 49%, for UD_Spanish). All other treebanks (including French-Sequoia and Spanish-AnCora) have less than 1% of all acl relations linked to a verbal head. I wonder whether this is an error, since I expected ccomp (or xcomp) in these cases.
regards
Johannes
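For anyone who wants to reproduce counts like these without a query tool, here is a minimal Python sketch. It is not the method used for the numbers in this thread. It assumes standard 10-column CoNLL-U input; the helper names (row, count_verbal_heads) and the toy sentences are made up for illustration, and the toy annotation is deliberately questionable so that there is something to count.

```python
def row(tid, form, upos, head, deprel):
    """Build one 10-column CoNLL-U token line (unused columns set to '_')."""
    return "\t".join([str(tid), form, "_", upos, "_", "_", str(head), deprel, "_", "_"])


def count_verbal_heads(conllu_text, deprel="acl"):
    """Return (with_verb_head, total) for relations labelled exactly `deprel`.

    Subtypes are not included: deprel="acl" does not match "acl:relcl".
    """
    with_verb_head = total = 0
    for block in conllu_text.strip().split("\n\n"):
        upos_by_id = {}
        relations = []
        for line in block.splitlines():
            if not line or line.startswith("#"):
                continue  # skip sentence-level comments
            cols = line.split("\t")
            if len(cols) != 10 or not cols[0].isdigit():
                continue  # skip multiword ranges (1-2) and empty nodes (1.1)
            upos_by_id[cols[0]] = cols[3]
            relations.append((cols[6], cols[7]))
        for head, rel in relations:
            if rel == deprel:
                total += 1
                if upos_by_id.get(head) == "VERB":
                    with_verb_head += 1
    return with_verb_head, total


# Hypothetical toy data: one acl under a NOUN, one acl under a VERB,
# and one acl:relcl under a VERB.
SAMPLE = "\n\n".join([
    "\n".join([
        row(1, "way", "NOUN", 0, "root"),
        row(2, "to", "PART", 3, "mark"),
        row(3, "go", "VERB", 1, "acl"),
    ]),
    "\n".join([
        row(1, "said", "VERB", 0, "root"),
        row(2, "leaving", "VERB", 1, "acl"),
        row(3, "which", "PRON", 4, "nsubj"),
        row(4, "surprised", "VERB", 1, "acl:relcl"),
    ]),
])

print(count_verbal_heads(SAMPLE))               # plain acl only
print(count_verbal_heads(SAMPLE, "acl:relcl"))  # the subtype on its own
```

Matching the label exactly keeps acl and acl:relcl apart, which is the distinction made above. To run it on a real treebank, read the .conllu file into a string and pass it in place of SAMPLE.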