Contribute: how your language identifies its arguments
Earlier today I blogged on three different strategies languages use to mark the roles of different arguments: word order, marking on the arguments, and marking on the verbs.
I gathered some data from the fantastic World Atlas of Language Structures to put together a survey of many of the languages on the Internet. For each language, I got the canonical word order and whether the language marks the roles of its arguments on the verb and/or on the arguments themselves.
As you can see, there are a number of data points that are still missing. Please contribute information on the languages you speak! You can edit the spreadsheet on Google Docs. Thanks!
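For anyone who would rather think about the survey as data than as a spreadsheet, here is a minimal sketch of one row and of how a missing data point could be detected. The class name, field names, and placeholder languages are all hypothetical; they are not the actual spreadsheet columns or values.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class LanguageEntry:
    """One survey row; None means the data point is still missing."""
    name: str
    word_order: Optional[str] = None           # e.g. "SVO", "SOV", "VSO"
    marks_on_verb: Optional[bool] = None       # roles marked on the verb?
    marks_on_arguments: Optional[bool] = None  # roles marked on the arguments?


def missing_fields(entry):
    """Return the names of the fields that still need a contribution."""
    return [f.name for f in fields(entry) if getattr(entry, f.name) is None]


# Placeholder rows, not real survey data.
survey = [
    LanguageEntry("Language A", "SVO", False, False),
    LanguageEntry("Language B", "SOV", None, True),
    LanguageEntry("Language C"),
]

for entry in survey:
    gaps = missing_fields(entry)
    if gaps:
        print(entry.name, "still needs:", ", ".join(gaps))
```

Using `None` for "unknown" keeps a missing answer distinct from a real "no marking" answer, which is the distinction the survey cares about.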
Tags: arguments, coding properties, contribute, data, grammatical relations, language, linguistics, Mozilla Planet, ubiquity
February 18th, 2009 at 10:46 am
Italian filled.
February 18th, 2009 at 1:59 pm
You have English listed as not marking on verbs, but that seems to fail to take into account the third-person singular, which generally differs from every other form (e.g. "likes" vs. "like", "sings" vs. "sing", etc.).
February 18th, 2009 at 3:09 pm
Argh, I wanted to contribute but someone beat me to the punch. However, I can confirm the data added for Polish. I will add my highly redundant two cents. Order is highly flexible for most utterances.
Mitcho pisze artykuł. {Mitcho} {is writing} {an article}.
Mitcho artykuł pisze. (Emphasizes what Mitcho is writing)
Pisze Mitcho artykuł. (Emphasizes present tense, VSO often favored if used as an introductory clause)
Pisze artykuł Mitcho. (Awkward, but possible)
Artykuł pisze Mitcho. (Also somewhat awkward, but possible, esp. in a subordinate clause of some kind, or as an answer to the question, "/What/ is Mitcho writing?")
Artykuł Mitcho pisze. (Same as above)
So an order may emphasize something, but it depends on tone and context. Note also that the subject may frequently be dropped, at which point it forms kind of an electron cloud around the sentence or clause. It could be reinserted in a variety of different places, but there are usually clearly ungrammatical options.
Marking occurs on objects. To my knowledge not on the subject, perhaps related to the fact that it is so frequently dropped.
On kupił jej/niej kwiatka. {He} {bought} {her (dative, unmarked form ona)} {a flower (accusative, unmarked form kwiatek)}.
Verbs are marked for person and tense, and a good thing, too, since the subject of a sentence is frequently dropped.
Jedliśmy śniadanie. [We] {were eating (unmarked form jeść)} {breakfast}.
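To make the point about dropped subjects concrete, here is a rough sketch of how the person and number of a missing subject could be read off a Polish past-tense ending. The function name and table are hypothetical, the table covers only the regular past-tense endings, and it assumes the word is already known to be a verb (a noun ending in -ł would fool it).

```python
# Regular Polish past-tense endings mapped to (person, number).
# Illustrative only: present-tense forms are not covered.
PAST_ENDINGS = {
    "liśmy": (1, "plural"), "łyśmy": (1, "plural"),
    "liście": (2, "plural"), "łyście": (2, "plural"),
    "łem": (1, "singular"), "łam": (1, "singular"),
    "łeś": (2, "singular"), "łaś": (2, "singular"),
    "li": (3, "plural"), "ły": (3, "plural"),
    "ła": (3, "singular"), "ło": (3, "singular"),
    "ł": (3, "singular"),
}


def person_number(verb):
    """Guess (person, number) from a past-tense verb form, or None."""
    verb = verb.lower()
    # Try the longest endings first so "łem" is never mistaken for "ł".
    for ending in sorted(PAST_ENDINGS, key=len, reverse=True):
        if verb.endswith(ending):
            return PAST_ENDINGS[ending]
    return None


print(person_number("Jedliśmy"))  # the verb from the example above
print(person_number("kupił"))
print(person_number("pisze"))     # present tense, so no match here
```

For "Jedliśmy" this recovers first person plural, which is exactly the "[We]" that the sentence leaves unsaid.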
Alright, that was fun. Keep the posts coming, and good luck with work!
February 18th, 2009 at 5:26 pm
I'm not 100% sure about modern spoken Icelandic - I'll look around a bit about it later - but Old Icelandic has rather flexible word order.
February 19th, 2009 at 1:39 am
Gordon, thanks for pointing that out. Indeed, you're right, although it sometimes isn't a very useful marker. Speakers often vary on whether the agreement is syntactic, semantic, or "just whatever sounds right," and there are different prescriptivist rules too. For example, would you say "The Beatles were a great band" or "The Beatles was a great band"?
An even better question: "Coldplay is a great band" or "Coldplay are a great band"? Here the Brits disagree with my American intuitions.
February 19th, 2009 at 6:49 am
Well, personally, I'd say that the Beatles are a great band, but maybe that's just me.
No, but to seriously answer your question, I would say "The Beatles were a great band" and "Coldplay is a great band". I've witnessed this argument before, but I've also seen worse. You should have seen the quibbles about whether the "T" in "The Beatles" should be capitalized when in the middle of a sentence (i.e. is "the" part of the name of the band?).
February 19th, 2009 at 6:03 pm
I'm glad to see a linguist is on this project!
Modern Hebrew is actually SVO (unlike Biblical Hebrew, which was VSO). I corrected the entry in the spreadsheet.
Regarding agreement and case-marking morphology, Modern Hebrew has:
* Verb agreement: gender and number in the present tense, and additionally person in the past and future tenses. Ordinarily the second person future forms are used as imperatives, which would seem to be most relevant to Ubiquity. Pronominal subjects can be dropped in the first and second person past and future, as they are redundant given the verb agreement.
* There is no present tense inflection of 'be'; instead, the pronouns 'he'/'she'/'they' often serve as a copula, though they are optional: e.g. 'The boy is young.' could be said as {the-boy he young} or {the-boy young}. (I don't know how relevant this is to Ubiquity, though.)
* Case marking: Nominative case is unmarked. Accusative case is marked in pronouns and with a particle preceding definite nouns. A few single-letter prefixes can serve as case markers/prepositions: l- 'to; for', b- 'in; with (instrumental)', k- 'as'. These prefixes, as well as standalone prepositions, have forms with pronominal enclitics (e.g. li 'to/for me', lo 'to/for him', etc.).
* Definiteness: The prefix ha- (written h-) marks common nouns as definite.
* Gender system: Two genders, masculine/feminine. Adjectives must agree with the nouns they modify in gender and number.
* Number: Singular and plural. There are regular plural suffixes for each gender, though some nouns have irregular plurals.
* The first word in a subordinate clause is typically prefixed with the complementizer še- (written š-).
Arabic is similar in spirit (and history) to Hebrew. Some important differences:
* Dialect variation (would we expect Ubiquity users to type in Modern Standard Arabic?)
* Freer word order and more case marking
* No present tense per se: just a perfect and an imperfect
* Many nouns have "broken plurals" which follow no simple regularity
The main difficulties for a Ubiquity parser of Hebrew/Arabic, it would seem, are orthographical: ambiguities from the lack of representation of vowels and from the use of prefixes as markers; and the right-to-left order, which can make things difficult when Hebrew or Arabic is used in combination with English or another LTR language.
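To illustrate the prefix ambiguity, here is a small sketch that lists every way an unvowelled word could be split into single-letter prefixes plus a stem. It works on the Latin transliteration used above, the function name and the `min_stem` cutoff are my own assumptions, and it makes no attempt to check which prefix orders or stems are actually valid Hebrew.

```python
# Single-letter prefixes mentioned above, in transliteration.
PREFIXES = {
    "l": "to/for",
    "b": "in/with",
    "k": "as",
    "h": "definite",
    "š": "complementizer",
}


def candidate_splits(word, min_stem=2):
    """Return every (prefixes, stem) reading of an unvowelled word.

    A leading prefix letter may be a real prefix or simply the first
    letter of the stem, so both readings are kept.
    """
    results = [([], word)]  # reading 1: no prefix at all
    if len(word) > min_stem and word[0] in PREFIXES:
        # reading 2: peel off one prefix and analyze the remainder
        for prefixes, stem in candidate_splits(word[1:], min_stem):
            results.append(([word[0]] + prefixes, stem))
    return results


# "hbyt" is an illustrative transliteration (h- plus byt 'house').
for prefixes, stem in candidate_splits("hbyt"):
    print("+".join(prefixes) or "(none)", stem)
```

Even this four-letter input yields three candidate readings, so a parser would need a lexicon or context to choose among them.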
February 19th, 2009 at 11:55 pm
Don't Spanish verbs work the same way as French verbs?
February 20th, 2009 at 8:24 pm
Okay, from some quick glancing around, modern Icelandic seems much less prone to shifting its verbs around to peculiar places than its predecessor. However, as far as I can tell, case marking and verb inflection are still much more important than word order for determining what roles nouns are playing in a sentence.
March 4th, 2009 at 7:53 pm
I just filled in Esperanto (arguments marked). The direct object always has the ending -n.
But I notice that Esperanto, as well as Slovak and Slovenian, are listed with word order as "SVO *". What's the asterisk for? If it means "not strict", that's accurate for Esperanto, but the notation should be defined on the spreadsheet.
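As a rough sketch of why the -n ending makes word order optional, the snippet below picks out direct-object nouns purely by their ending. The function name is hypothetical and the heuristic is deliberately naive: it only looks at nouns (-on, plural -ojn), ignores adjectives and pronouns, and does not handle other uses of -n.

```python
def find_direct_objects(sentence):
    """Return words that look like accusative nouns (-on or -ojn)."""
    words = sentence.rstrip(".!?").split()
    return [w for w in words if w.lower().endswith(("on", "ojn"))]


# "The dog bites the cat", in two different word orders.
print(find_direct_objects("La hundo mordas la katon."))
print(find_direct_objects("La katon mordas la hundo."))
```

Both orders give the same answer, `katon`, because the role is carried by the ending rather than by position.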
October 14th, 2009 at 1:40 pm
Great tips. Thanks for sharing with us.