“Hmm… that makes sense,” said Ben.
“Uh, you sound surprised,” was my response.
“Well, usually when people start out by saying ‘Here’s my take on that,’ it tends to not be that helpful.” Well, Ben’s a smart guy, and if he hints that something might be helpful, that is enough to get me to write this post, which has been rattling around in my head probably since about the time blogs were invented.
My “take” had been prompted by another iteration of a typically tedious discussion that you may also have been involved in periodically: whether “data” is singular or plural. For those of you too young to know, or too smart to care, “data” is the Latin plural of “datum,” meaning “a piece of information.” So when a hapless person would say “There isn’t enough data,” grammar snoots would correct them and try to get them to say “There aren’t enough data.” It’s been a losing battle. (David Foster Wallace fans will know that he actually called these people SNOOTS, all caps; an executive summary is here.)
So why do all of us troglodytes (we troglodytes? no, I guess it is “us troglodytes”) insist on treating “data” as singular? But wait, we don’t! “Car” is singular, but we don’t say “There isn’t enough car,” at least not routinely. The answer is that we are treating the word “data” as a “mass noun,” like “rice,” “water,” or “sand,” as opposed to a “count noun” like “car.” We say “there isn’t enough rice” without any problem. (Incidentally, I had originally thought that the word for this is “collective noun,” but that refers to words like “baggage” [a number of bags], “library” [a number of books], or phrases like “a pride of lions” and other terms of venery.)
The conclusion I had come to is that before the “information age,” there just weren’t “that many data,” and you could count them one by one. Since the arrival of the computer, we are, to paraphrase Torricelli, “swimming at the bottom of a sea of data,” and it is no more practical to enumerate data one-by-one than it is to count grains of sand on a beach. So it seemed to me that this was a shift in usage prompted by a technological change, not simply by ignorance.
I was very satisfied with this explanation until, in the process of writing this post, I went to check out what I thought might be an early use of the word, recalling Sherlock Holmes saying “Not enough data” as he mulled over a difficult case. So I fired up the OED online (so awesome not to need the magnifying glass!) and eventually found, after clicking on “full entry,” the following usage note under “data”:
The use of data as a mass noun became increasingly common from the middle of the 20th cent., probably partly popularized by its use in computing contexts, in which it is now generally considered standard (compare sense 2b and the recent uses cited at datum n. 1b, some of which are ambiguous as to grammatical number). However, in general and scientific contexts it is still sometimes regarded as objectionable. Compare the plural uses cited at datum n. and the following:
Snoots on parade! And pretty much what I expected. But when you read the entry on data, you find some shockers:
This is “data,” used as a singular count noun, with “datas” as plural. WTF!?
Perhaps more interesting, the use of “data” as a mass noun has a long history:
In fact, to find unambiguously plural usages of the word “data” you need to look at the OED entry on “datum.” The earliest one is:
Merriam-Webster online has what seems to me a very sane usage note:
Of course this sent me scurrying to the Chicago Manual of Style; in 5.220 of the 16th edition (“Good usage versus common usage”), we see:
Uh, so I guess “computing contexts” are not considered “science.” Oh well.
The usage examples from the OED suggest to me that if you are pining for the days when an educated person would know beyond question that “data” is the plural of “datum,” your nostalgia is for first-century Latin, not eighteenth-century English. Condemning “data is” has a bit more historical basis than some other snoot shibboleths, such as the proscription on ending a sentence with a preposition, but that historical basis is not from English. In fact, many of these “rules” may arise from the same group of Latin-obsessed 17th-century introverts. There seems to be a rabbit-hole the size of the internet to slide down here, in which even cherished stories about Winston Churchill are demolished.
But back up and out the rabbit-hole: “data is”-ers, take heart! Not only does it make more sense, for the last half-century or so, to use data as a mass noun—and hence say “data is”—but you have over three centuries of English usage to back you up!
PS With apologies to Duke Ellington for the title, and to you for having to put up with an ad on that youtube video.
PPS Of course, my own inner snoot was interested to find this in the OED under datum:
Wait, don’t you mean “mass noun”? I felt better about not having previously understood the difference.
PPPS I was delighted to find our old friend Cal Mooers in the OED with the first “mass noun” reference under “Computing” (2b).
PPPPS The dialog with Ben at the start was reconstructed from (my poor) memory, and might get edited if I can manage to communicate with him.