blog

Posts Tagged ‘data’

回収 vs. 収集 and Better Word Meanings Through Usage

木曜日, 9 月 18th, 2008

Bailey just asked me what the difference between 回収 (kaishū) and 収集(shūshū) is—two words that would both map to the English verb “collect.” I intuitively came up with a hypothesis to explain the distinction:

  • 回収 may take things away from others when collecting while 収集 does not have that implication.
  • Things that you 回収 may have been previously distributed by the actor themself while 収集 does not have that implication.1

Not content with armchair theorizing, however, I decided to take advantage of one of the largest corpora in the world: Google.2 To test my hypothesis, I chose two “objects of collection”, one you can take away (and often is distributed first) and one you can’t take away: アンケート (ankēto “survey,” from the French enquête) and 意見 (iken “opinion”). I then took the four resulting collocations3 on Google in quotes (“•”) and recorded how many hits there were.

(続きを読む…)


  1. This second point could also be hypothesized based on the component meaning of 回, which in the verb 回る (mawa=ru) can mean “circle back.” 

  2. Google is of course a huge corpus but it has very limited search and can easily be misused and misunderstood, thus making Google an unreliable (unprofessional) source for statistical data. One Google alternative for some different statistics is the n-gram data they offer for research. 

  3. ”Collocation” on Wikipedia says: “Within the area of corpus linguistics, collocation is defined as a sequence of words or terms which co-occur more often than would be expected by chance.” 


© 2006-2008 mitcho (Michael 芳貴 Erlewine).
Proudly powered by WordPress.
Entries (RSS) and Comments (RSS).
The views expressed on these pages are mine alone and do not
reflect those of my employers and clients, past and present.