回収 vs. 収集 and Better Word Meanings Through Usage

Bailey just asked me what the difference is between 回収 (kaishū) and 収集 (shūshū), two words that both map to the English verb “collect.” My intuition suggested a two-part hypothesis to explain the distinction:

  • 回収 can imply taking things away from others, while 収集 carries no such implication.
  • Things that you 回収 may have been distributed beforehand by the collector themselves, while 収集 carries no such implication.1

Not content with armchair theorizing, however, I decided to take advantage of one of the largest corpora in the world: [[Google]].2 To test my hypothesis, I chose two “objects of collection”: one that can be taken away (and is often distributed first) and one that cannot. These were アンケート (ankēto “survey,” from the French enquête) and 意見 (iken “opinion”). I then searched Google for each of the four resulting collocations3 as an exact phrase in quotes and recorded how many hits there were.

“意見を収集”          218000
“意見を回収”            6200
“アンケートを収集”       784
“アンケートを回収”    169000

A better way to organize this data is as follows:

“↓を→”        回収      収集
アンケート    169000       784
意見            6200    218000

This data clearly supports the hypothesis I laid out above: アンケート, which can be taken away from people and is often distributed first, occurs far more often with 回収 than with 収集. 意見, on the other hand, which crucially cannot be taken away when collected, occurs far more often with 収集 than with 回収.
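To put rough numbers on “far more often,” here is a minimal Python sketch that turns the hit counts recorded above into per-noun shares and a single odds ratio. The names `counts`, `share`, and `odds_ratio` are my own for illustration, and the figure for アンケートを回収 is assumed to be 169000, as in the raw listing of hits. Google hit counts are only estimates, so treat the results as orders of magnitude rather than precise statistics.

```python
# Hit counts as recorded in this post (Google hit counts are rough estimates).
# Assumption: the アンケートを回収 figure is 169000, as in the raw listing of hits.
counts = {
    ("アンケート", "回収"): 169000,
    ("アンケート", "収集"): 784,
    ("意見", "回収"): 6200,
    ("意見", "収集"): 218000,
}


def share(noun: str, verb: str) -> float:
    """Fraction of a noun's hits (summed over both verbs) that use the given verb."""
    total = sum(n for (obj, _), n in counts.items() if obj == noun)
    return counts[(noun, verb)] / total


def odds_ratio() -> float:
    """Strength of the noun-verb pairing; 1.0 would mean no preference at all."""
    return (counts[("アンケート", "回収")] * counts[("意見", "収集")]) / (
        counts[("アンケート", "収集")] * counts[("意見", "回収")]
    )


if __name__ == "__main__":
    for noun in ("アンケート", "意見"):
        for verb in ("回収", "収集"):
            print(f"{noun}を{verb}: {share(noun, verb):.1%}")
    print(f"odds ratio: {odds_ratio():.0f}")
```

With these counts, each noun takes its preferred verb in well over nine out of ten hits, and the odds ratio comes out in the thousands, which is what a strong pairing looks like in a two-by-two table.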

While this one example doesn’t prove anything in and of itself, it does use data to help clarify a nuance between two near synonyms. While my hypothesis was borne out here, native speaker intuitions on word nuances and distinctions can be unreliable.4 This type of quick test can be very helpful for language learners and instructors alike.

Languages very often have words that differ in subtle ways. Just this Tuesday I went to a Tokyo Language Exchange Meetup, a great [[meetup.com meetup]] which brought together various language learners and enthusiasts. A hot topic that night was near synonyms, words with very similar meanings. A few English learners were lamenting sets of words like {see, view, watch} and how difficult they are to learn. I myself have had the same experience studying Mandarin.

I noted that these difficulties in offering contrasting definitions are often due to the fact that word meanings are not just “what the word points to” but also the implication of “what it relates to”.5 For example, “unborn baby” and “fetus” may point to the same thing, but are used in different contexts, in contrast to different other terms, for differing effect. The same goes for “Death Tax” and “Estate Tax,” or “kneel” and “genuflect.”6

The concept of word meanings being “what it points to” and “what it relates to” also helps explain why certain words are difficult to translate. Fillmore uses the Japanese example of ぬるい (nurui), the de facto translation of “lukewarm.” However, some Japanese speakers will only use ぬるい in contrast with “hot,” i.e., hot tea can become ぬるい over time but ice water does not become ぬるい. In contrast, English “lukewarm” can be used to describe things that are initially or prototypically hot or cold. “What the words point to” in this case is the same, but “what they relate to” or, here, “what they contrast with” is different, making ぬるい an imperfect (though very close) translation.

Every language has near synonyms that differ slightly in nuance, but this nuance or “feeling” can be borne out objectively in data. Looking at what words certain terms relate to in real usage is often the key to getting a richer understanding of vocabulary.

  1. This second point could also be hypothesized based on the component meaning of 回, which in the verb 回る (mawa=ru) can mean “circle back.” 

  2. Google is of course a huge corpus, but its search is very limited and it can easily be misused and misunderstood, which makes it an unreliable (unprofessional) source for statistical data. One alternative from Google for some different statistics is the [[n-gram]] data they offer for research.

  3. [[collocation ”Collocation” on Wikipedia]] says: “Within the area of corpus linguistics, collocation is defined as a sequence of words or terms which co-occur more often than would be expected by chance.”

  4. Hm… I just made a claim… looking for a citation. 

  5. “Relates to” here is not meant in an etymological sense. In [[frame semantics (linguistics) frame semantics]], a part of [[cognitive linguistics]], the “what the word points to” may be called a profile while the “what it relates to” is called the (semantic) frame. These distinctions are due to the work of [[Charles J. Fillmore Fillmore]] 1976.

  6. The great examples in this section come from Bill Croft and D. Alan Cruse’s Cognitive Linguistics, 2004.