3rd Conference
The Evolution of Language
April 3rd - 6th, 2000




From sensorimotor praxis and pantomime
to symbolic propositions

Stevan Harnad

Department of Electronics and Computer Science
University of Southampton
Highfield, Southampton

What lies on the two sides of the linguistic divide is fairly clear: On one side, you have organisms buffeted about to varying degrees, depending on their degree of autonomy and plasticity, by the states of affairs in the world they live in. On the other side, you have organisms capable of describing and explaining the states of affairs in the world they live in. Language is what distinguishes one side from the other. How did we get here from there? In principle, one can tell a seamless story about how inborn, involuntary communicative signals and voluntary instrumental praxis could have been shaped gradually, through feedback from their consequences, first into analog pantomime with communicative intent, and then into arbitrary category names combined into all-powerful, truth-value-bearing propositions, freed from the iconic "shape" of their referents and able to tell all.

The attendant increase in speed and scope in acquiring and sharing information can be demonstrated in simple artificial life simulations that place the old and new means into direct competition: Symbolic theft always beats sensorimotor toil, and the strategy is evolutionarily stable, as long as the bottom-level categories are grounded in sensorimotor toil.

If you have a preference for "hard evidence" you approach the problem of language origins at your own peril. It has been suggested that language is a kind of organ (Chomsky 1972; Pinker & Bloom 1990), but even if it is, it must share the fate of that other organ of which any language organ is surely only a part, the brain, namely, that it leaves no trace after its demise, or at least no trace that anyone has yet given a convincing functional (let alone cognitive) interpretation (Holloway 1970, Wilkins & Wakefield 1995).

So in pondering the origins of the language organ, we cannot expect much help from the fossil record. But if soft organs deny us hard evidence, surely their behavioral functions are even more evanescent: "Verba volant, Scripta manent," to be sure, but writing surely arrived on the scene too late in the day to help us (Harnad 1991).

Some have turned, in desperation, to other traces: tools, weapons, drawings (Isaac 1987). They have hypothesised that either language was necessary to make and/or use these artifacts or that there is some formal or functional commonality or co-dependency between the capacity to make or use them and the capacity to speak. But hypotheses are hardly evidence, let alone hard evidence, and in and of themselves, artifacts are just artifacts.

Others have looked to contemporary species, rather than ancient ones, for a clue (Greenfield 1992), but hunting and tool-making and -use can hardly be said to be garrulous activities today, so they are not very compelling evidence for loquacity, let alone its origins, long ago. Moreover, contemporary evidence for rudimentary tool-making and use in animals that have neither language nor the ability to acquire it goes against the idea that these functions have much in common.

The other prominent functional commonality that has been proposed is between language and consciousness (Rolls 1997, 2000). Some have thought language was a prerequisite for consciousness; but if this were true, it would deny pain to all creatures who were incapable of expressing it in words, and surely that's wrong, although, again, there is no "hard" evidence for it, short of BEING the mute animal feeling the pain (Harnad 2000).

The impossibility of determining empirically whether or not a non-speaking creature is conscious – otherwise known as the "other minds" problem – does suggest one genuine point of commonality with the problem of language origins, however: Consider the problem of "demoting" explanations, when you try to attribute consciousness to an animal or a machine: You can always interpret a system "as if" it were conscious, even an inert, nonbehaving system like a book on a table. You can say the book knows it's on the table, wants to be on the table, etc. Who's to say otherwise? The data are perfectly compatible with that interpretation, it's just that the interpretation seems both unnecessary and wrong. We can understand the book's being on the table without having to infer that it is conscious. If anyone claims the consciousness is necessary, we can ask why? There is no physical reason why a book needs to know it's on a table in order to be on a table.

Demoting conscious interpretations of inert systems is trivial. It's only a bit harder to demote dynamic, performing systems: Physics and engineering do not require, nor can they use, a conscious explanation of the functioning of a thermostat ("it feels hot," "it wants to turn off the furnace") or a car, an airplane or a computer. The performance of each of these systems is fully explained by mindless mechanisms.

The trick is to show where the need for a conscious explanation kicks in: What functional capacity CANNOT be explained without recourse to consciousness? Behaviour-analytic psychology gave us operant and Pavlovian conditioning, but do these require consciousness? It is easy to build devices, not very different from thermostats, whose performance is shaped by their history of associations or their history of reinforcements, especially in this era of neural nets and other computational models. So any behavior that can be reduced to an operant or a Pavlovian explanation has been demoted to a mindless explanation.

And no behavior seems to be immune to this sort of demotion: Show me a creature, human or nonhuman, that you think is managing to do something only with the aid of consciousness and it will invariably be easy to show that the same thing could be accomplished by a mindless mechanism merely shaped by the consequences of its behavior, just as Skinner would have said: Gorillas rubbing off from their foreheads yellow spots that they have seen in their mirror image (Gallup 1970)? All the data are there for a mindless learning mechanism to learn that correlation from the sensorimotor interactions with its mirror image: Happens too fast? Well then the same correlation, or a generalization of it, could have been "prepared" by evolution, likewise a mindless process.

The point is not to deny that gorillas are conscious and do recognize themselves in the mirror. It is the causal role of the consciousness in the explanation that keeps on turning out to be unnecessary, hence the demotion: The rule seems to be, whatever we happen to do mindfully, could just as well have been done mindlessly.

Now this is not a conference on the origin of consciousness. I have only introduced the ever-ready, mindless demoting explanation by way of analogy, for it has an exact counterpart in the case of language. Here is the heart of the analogy: Just as it is impossible to show that THIS is where what you can do with a mindless mechanism ends, and beyond this you can only go with a mind – there is no such point, and hence no functional explanation for why consciousness should ever kick in (though it certainly does) – there is likewise no point at which nonlinguistic praxis and pantomime end and linguistic propositions take over.

Let's define terms. "Praxis" just refers to our sensorimotor skills, the things we and other species need to be able to do in order to live and act out our lives: Finding food, eliminating wastes, avoiding predators, finding mates, etc., each according to the demands of our own species-specific niche. Nonlinguistic creatures share this nonverbal portion of our praxic repertoires (including the capacity to learn by operant and Pavlovian conditioning), and that covers a lot.

"Pantomime" is a special subset of praxis: It is social. Behavioral imitation (as opposed to anatomical mimicry) largely occurs between living creatures, not between living creatures and inert objects (Byrne & Russon 1998). But pantomime includes both automatic and deliberate imitation. A songbird may imitate the tune of its conspecifics mindlessly (who knows), but people at least, and perhaps other species as well, can "act out" in ways that are intended to get you to do and even think something. We know this in the case of people (I might cover my mouth and point when I mean for you to know that someone else is present), but we know from the demoting explanation of gorilla mirror-recognition that even pantomime can be explained as mindlessly as any other form of praxis.

So far, this is not the analogy I was promising; it is still the problem of consciousness. To appreciate the analogy, we first have to pass from pantomime to propositions. A few critical differences have to be borne in mind: A pantomime, like a picture, cannot be true or false. It can only be more or less like whatever it is a picture or pantomime of. Praxic gestures, whether pantomimic or just plain instrumental acts (Catania & Harnad 1988; Harnad 1996b), do not have truth values. They can of course be CONSTRUED as having truth values, but then they are construed as propositions. Propositions propose something. That's why they have truth values: What they claim or propose to be the case may or may not be the case. If not, then they are false.

Now when I put my hand on my mouth and pointed, you could have construed that as the proposition "There is someone else in the room," and, if upon inspection, it turned out there was no one else in the room, you might want to say I had lied (Whiten & Byrne 1988). But in a court of law, so accused, I could claim that all I had done was put my hand on my mouth and pointed. I had never said – never uttered a verbal contract – to the effect that there was someone in the room. That was just your interpretation. I was merely performing a pantomime. I never intended you to construe it as a proposition.

By way of contrast, if I were in a crowded theater and I yelled "Fire," I could be held liable for causing a stampede and causing injury if it turned out I had been crying wolf. It would not do me much good to claim (though it might be true) that I had merely been saying out loud words that rhymed with "dire," and that I had not intended it to be construed as a proposition. If this distinction sounds legalistic rather than objective and empirical, then you are beginning to catch my drift, but there is nevertheless a way, if not to draw the line, then to make the nonpropositional construal more and more improbable:

"Fire" uttered in isolation is, by accepted social convention, a shorthand way of uttering the proposition (say) "There is a fire in here." It is true that one might have been enumerating the words that rhyme with "dire," but that is unlikely, and if it is unlikely for a monosyllabic proposition like "fire," it is still more unlikely for the longhand version "There is a fire in here." As the utterance becomes more complex, it becomes more far-fetched to construe it in any other way than propositionally.

So with complex propositions, we are in a performance domain that is radically different from praxis and pantomime, for propositions DO have truth value. Moreover, they seem to have the power to express any truth: This hypothesis, in the form of the "effability" hypothesis, to the effect that anything that is the case can be described in words, was put forward by Jerry Katz (1976) in the NY Academy of Sciences Conference that helped re-open the language origins question in our century. In the same volume, an independent variant of the effability hypothesis was proposed by Steklis and Harnad (1976) in the form of the "translatability" criterion, to the effect that all natural languages are fully intertranslatable. That turns out to be logically equivalent to the effability hypothesis, but it may be the more suggestive version for our purposes here, because it highlights the "cryptographic" aspects of language's expressive power: There are of course a limitless number of propositions, but every language has at least one way of expressing them all: What are the chances of coming out with just the right string of symbols, but without intending that proposition? They quickly approach the chances of chimpanzees typing a passage from Shakespeare (Harnad 1996a).

So here we are faced with a profound divide: On one side is the world of praxis, with its objects, events and states of affairs. It has some limited resources for aping itself: I can try to draw or imitate a tree in the wind or perhaps even a rainbow over a horizon. But my praxic repertoire quickly runs out of resources in relation to all the objects and events and states there are in the world: How to mime that "all men are mortal," or that "a continuous function is everywhere differentiable"? Those are states of affairs that can only be described in words. Moreover, as I suggested before, even pictures and mime are not DESCRIPTIONS: they are merely other states of affairs that happen to have some similarity to whatever they are pictures of. So even to construe pantomime as anything other than "being there" – like a poem, that should not "mean" but "be" – is already to construe it propositionally.

And here is where the analogy with the demotion of consciousness comes in: For case by case, practically, or rather praxically speaking, every instance of praxis and pantomime could be acted on instrumentally: The person who covers his mouth and points could be a correlate and hence a predictor, like the yellow spot on the gorilla's forehead, of the presence of someone else in the room. I need not have recourse to a proposition for that; hence a propositional construal need not be posited for the pantomime itself. Where does the proposition need to kick in? And in what does its kicking in consist? What can we do with language that we can't do with praxis and pantomime?

Candidates immediately come to mind: It's hard to mime things that one does not have the equipment to depict. It's hard to mime in the dark. In their presence, we can point to objects we'd have trouble mimicking, but in their absence? It's hard to mime either/or relations, or conditional relations – hard to mime relations themselves, or features or properties. The more abstract something is, the harder it is to mime it, because miming is concrete and particular.

Can we even mime KINDS, as opposed to specific instances? We've conceded that the proposition "All men are mortal" might defy miming, but could we even mime "mortal"? Sure, we can show someone or something dying; and then maybe show another, entirely different thing dying, and hope that in providing this panorama of concrete instances, the abstract category will somehow be picked out. But how would we MARK that abstract category that we had laboriously acted out? And how would we carry it into the more complex proposition "I am mortal," much less "All men are mortal"? Be careful to distinguish "I am going to die," which is relatively easy to mime, from "I am mortal," which is not.

So we need to be able to MARK abstract categories, such as "mortal." At the very least, one of the concrete depictions of some dying creature would have to do double duty for THAT dying creature, and for mortality in general. Now notice that for its concrete role of depicting a particular dying creature, the resemblance between the depiction and the object depicted is the kind of nonarbitrary, analog relation that psychologists have called "iconic." Saussure stressed the arbitrariness of linguistic signs, in contrast to this. Why? I'm not sure whether Saussure intuited the property that Jorge-Luis Borges (1969) singled out in his "Funes the Memorious."

Funes was a man who once fell off a horse and after that time he could never forget anything. He had an infinite rote memory for every concrete particular he ever encountered. His memory was so good that he gave all the integers unique proper names – Fred, Jeff, Charlie – all the way up into the hundreds of thousands till he got bored. Yet he had the greatest difficulty understanding why the rest of us, ordinary mortals with frail memories, insisted on calling (what we referred to as) that dog "Fido" in that particular position at that particular instant by the same name as what we insisted was the same dog, "Fido" at another instant, in another position. For to Funes, these were all infinitely unique and different experiences. His memory faithfully mimed and saved them all. What it couldn't do was forget or ignore any of it. Hence it could not abstract. Hence he couldn't mark all those instances of "Fido" with the arbitrary sign "Fido." They were all infinitely different and unique to him. So of course if Borges had been completely consistent, he could not have portrayed Funes as speaking at all, for to speak he would have had to have gained a command of those arbitrary names for abstract categories that would have required forgetting or ignoring all the differences that are preserved in a faithful copy. Instead, all he had was the nonarbitrary icons, each unique to its specific instance (Harnad 1987).

Now of course it's not just a speaking Funes that is impossible; even a nonverbal Funes could not survive for a day if he could not abstract. The abstraction would not have to be marked by an arbitrary sign; it could be marked by a nonarbitrary praxic response such as sitting only on those things that afforded sitability-upon, and so on. A repertoire of evolutionarily prepared as well as learned feature detectors that subserved praxis would serve creatures nearly as complex and capable as ourselves quite adequately. Where does the added power of the arbitrary sign and the proposition kick in? What can species NOT do by praxis and pantomime alone?

My main objective here was to suggest that this is the real question at the heart of the problem of the origin of language. The origin of language is the origin of marking categories with arbitrary "signs" (symbols) and stringing those symbols together into descriptive propositions that far outstrip the possibilities of praxis and pantomime. What would be the survival value, the adaptive advantage, of propositions over mere praxis and pantomime?

In a series of artificial life simulations (Cangelosi & Harnad 2000; Cangelosi, Greco & Harnad 2000) we have tried to show that this advantage can be thought of as the advantage of (symbolic/propositional) "theft" over honest (sensorimotor) "toil." I will close with a sketch of how this would work. "Honest toil" is good old trial-and-error operant learning guided by feedback from the consequences of one's behaviour, as in learning to distinguish edible mushrooms from toadstools: An organism samples mushrooms, tastes them, sees whether it gets sick or gets nourished by them, and eventually, if the category is learnable, learns to tell apart the ones that afford nourishment from the ones that are toxic. Those are categories an organism has earned by honest toil.

Note that the foregoing is just a description, not an explanation. An explanation requires a causal mechanism for HOW the organism managed to learn to tell apart edible and toxic mushrooms by honest toil: how its brain managed to find the critical features that reliably distinguish the shadows cast on its senses by mushrooms from those cast by toadstools.

Neural nets are one natural candidate for such a feature-detecting, category-learning mechanism (Tijsseling & Harnad 1997). Mushroom-sorting (Cangelosi & Parisi 1998) is of course not a realistic paradigm for category learning; it is just a "toy" problem. (For one thing, the timing is unrealistic: If telling apart mushrooms and toadstools were hard, then how could a creature in a mushroom world afford to sample them by trial and error long enough to learn which kinds are which without starving itself to death?)
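As a concrete illustration of "honest toil," here is a minimal sketch of a category learner shaped only by feedback from its own errors. It is not the mechanism used in the cited simulations: a single-layer perceptron is swapped in for a full neural net to keep the example short, and the three mushroom features, the hidden edibility rule, and the names `sample_mushroom` and `Toiler` are all assumptions invented for illustration.

```python
import random

def sample_mushroom(rng):
    """Draw one toy mushroom as (features, is_edible).

    Assumption (not from the source): three features in [0, 1], with
    edibility fixed by a hidden linear rule on the first two. Samples
    very close to the category boundary are redrawn, so the category
    is learnable.
    """
    while True:
        features = [rng.random() for _ in range(3)]
        signal = features[0] + features[1] - 1.0
        if abs(signal) > 0.1:
            return features, signal > 0

class Toiler:
    """A perceptron that learns only from the consequences of its guesses."""

    def __init__(self, n_features, rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.rate = rate

    def judges_edible(self, features):
        score = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return score > 0

    def taste(self, features, is_edible):
        """One trial: guess, then adjust the detector if the guess was wrong."""
        guess = self.judges_edible(features)
        if guess != is_edible:
            direction = 1.0 if is_edible else -1.0
            self.weights = [w + self.rate * direction * x
                            for w, x in zip(self.weights, features)]
            self.bias += self.rate * direction
        return guess == is_edible

def accuracy(toiler, rng, trials=1000):
    """Fraction of fresh mushrooms sorted correctly, without further learning."""
    hits = sum(toiler.judges_edible(f) == e
               for f, e in (sample_mushroom(rng) for _ in range(trials)))
    return hits / trials

rng = random.Random(0)
toiler = Toiler(n_features=3)
before = accuracy(toiler, rng)
for _ in range(5000):  # the toil: thousands of tasting trials
    toiler.taste(*sample_mushroom(rng))
after = accuracy(toiler, rng)
print(f"accuracy before toil: {before:.2f}, after toil: {after:.2f}")
```

The point of the sketch is the cost: the detector improves only trial by trial, and every wrong guess stands for a mushroom actually eaten.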

But contrast this with the inborn internal feature detectors of the frog, who already "knows" what kind of thing to flick his tongue out at from birth, or rather, from the time of metamorphosing from a tadpole into an air-breathing frog. Let us say that the frog has come by his bug-detectors not by honest toil, as in the case of the hypothetical mushroom-detectors, but by Darwinian theft: He was born with already prepared detectors; he got them "for free." Of course this too leaves out the critical part, for nothing comes for free. If the frog did not perform the honest toil, involving variation and selection on the basis of the consequences of trial and error, then the "Blind Watchmaker" (evolution by natural selection) must have done it for him.

But it is not Darwinian theft that I meant when I spoke of the virtues of theft over honest toil. To understand what form of theft I had in mind, we have to go back to the mushroom world: Suppose the bleaker scenario I mentioned were the actual one: Suppose there was not enough time in the day to sample toadstools and mushrooms until you had them safely sorted out: If you had to rely on honest toil alone, you stood a good chance of starving to death or perhaps getting poisoned. But suppose there were others of your kind who already had the detectors (by some means or other): If they could just DESCRIBE to you in words the features of the safe and unsafe mushrooms, perhaps supplemented with some pointing to examples, they could save you an awful lot of honest toil. (Biederman & Shiffrar [1987] have provided evidence that a verbal description of the winning features, together with some good examples, can fast-forward a novice to 90% of grandmaster performance level in newborn chicken-sexing, a level that normally requires months of honest toil at the feet of black-belted masters.)

Now notice that in a realistic scenario "theft" is a misnomer here, for, all else being equal, symbolic theft (hearsay) is a victimless crime. If you know something I don't, you are not in general any the poorer for telling me about it and saving me the time and trouble of learning it the hard way. Of course, we have managed to put a price tag on everything, and perhaps it is only in our contemporary information society that this kind of "gift" (as opposed to theft) is becoming the COMMERCE it was always destined to be, but gift, barter, theft or commerce, it is clear that it is language that has conferred on us the power of bypassing countless hours of honest toil.

Cangelosi & Parisi (1998) tried to show the adaptive advantages of symbolic theft in artificial life simulations. They put the theft/toil strategies into competition: A population of virtual mushroom-foragers (back-propagation nets) learned to distinguish edible from inedible the hard way, through trial-and-error sensorimotor toil, supervised by feedback from errors; another population learned it the easy way, by overhearing the toilers vocalize "edible" and "inedible." The thieves did not have to learn the features, because the toilers had already done it for them. And there were plenty of mushrooms, so vocalizing freely did not deprive anyone of anything. Within a few generations the thieves were out-surviving and out-reproducing the toilers. The theft strategy defeated the toil strategy, and demonstrated the adaptive advantage of language.

Or did it? Would such a strategy be "evolutionarily stable," or would it, like certain forms of cheating and parasitism, eventually play itself out? For consider that the theft strategy only works while there are toilers in the know within earshot; without the guidance of hearsay, the thief is lost, not having learned the critical features. So if anything, a competition of this sort, continued across generations, could at best induce an oscillation, with thieves at an advantage over toilers while there are plenty of toilers about, vocalizing their hard-won knowledge, but as the toilers' numbers shrink in favour of the thieves, the thieves become increasingly clueless and their own numbers accordingly shrink in favour of the toilers.
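The frequency dependence in this argument can be made explicit with a toy payoff model. The sketch below is not taken from the cited simulations; the payoff values, the `baseline` term, and the function names are assumptions chosen only to show the reversal: a toiler always pays the cost of learning, while a thief profits only to the extent that a toiler is within earshot.

```python
def payoffs(thief_fraction, benefit=1.0, toil_cost=0.3, baseline=0.1):
    """Return (toiler_payoff, thief_payoff) for a given share of thieves.

    Assumptions (hypothetical numbers): a toiler always gets the benefit
    minus the cost of toil; a thief gets the benefit only with the
    probability that a vocalizing toiler is nearby, taken here to be the
    toilers' share of the population.
    """
    toiler = baseline + benefit - toil_cost
    thief = baseline + benefit * (1.0 - thief_fraction)
    return toiler, thief

def next_thief_fraction(thief_fraction):
    """One generation of reproduction in proportion to payoff."""
    toiler, thief = payoffs(thief_fraction)
    mean = thief_fraction * thief + (1.0 - thief_fraction) * toiler
    return thief_fraction * thief / mean

for start in (0.05, 0.95):
    toiler, thief = payoffs(start)
    print(f"thieves at {start:.2f}: toiler payoff {toiler:.2f}, "
          f"thief payoff {thief:.2f}, "
          f"next generation {next_thief_fraction(start):.2f}")
```

With few thieves about, theft pays better than toil and the thieves' share grows; with few toilers about, the ordering reverses and it shrinks. Whether a population run this way oscillates or settles into a mix depends on details (such as how quickly knowledge is lost) that this sketch does not model.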

Such an oscillation is evolutionarily possible, but there is certainly no evidence of it today: We are all thieves. How did this come to pass? Theft is parasitic on toil: it is "ungrounded" (Harnad 1987, 1990) unless there are toilers about too. So perhaps we all do the groundwork, acquiring certain "basic" categories the old way, by direct sensorimotor toil, and then the rest can be acquired the new way, by the Biederman/Shiffrar strategy of symbolic description, stringing together the grounded names of the basic categories into propositions describing higher-order categories by Boolean combinations (Harnad 1996a). This is indeed what our mushroom-simulations have shown (Cangelosi & Harnad 2000): Everyone learned the ground-level categories by toil, but for higher-order categories, the theft strategy beats the toil strategy, and it is evolutionarily stable.
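The division of labour between grounded names and Boolean description can be sketched in a few lines. Everything specific here is hypothetical: the categories "spotted" and "red," their thresholds, the new category "toadstool," and the helper `describe` are invented for illustration, and the hand-written detectors merely stand in for detectors that would have been earned by toil.

```python
from typing import Callable, Dict

Detector = Callable[[dict], bool]

# Ground-level categories. Assumption: these hand-written rules stand in
# for feature detectors acquired by sensorimotor toil.
lexicon: Dict[str, Detector] = {
    "spotted": lambda thing: thing["spots"] > 3,
    "red": lambda thing: thing["hue"] < 0.1,
}

def describe(proposition, lexicon):
    """Turn a Boolean proposition over known names into a new detector.

    A proposition is either a category name or a tuple such as
    ("and", "spotted", ("not", "red")). A name that is not already in
    the lexicon raises KeyError: an ungrounded name cannot be stolen from.
    """
    if isinstance(proposition, str):
        return lexicon[proposition]
    operator, *parts = proposition
    detectors = [describe(part, lexicon) for part in parts]
    if operator == "not":
        (only,) = detectors
        return lambda thing: not only(thing)
    if operator == "and":
        return lambda thing: all(d(thing) for d in detectors)
    if operator == "or":
        return lambda thing: any(d(thing) for d in detectors)
    raise ValueError(f"unknown operator: {operator!r}")

# Theft: one heard proposition yields a higher-order category, no tasting needed.
lexicon["toadstool"] = describe(("and", "spotted", "red"), lexicon)

print(lexicon["toadstool"]({"spots": 5, "hue": 0.05}))
print(lexicon["toadstool"]({"spots": 0, "hue": 0.05}))
```

The new category costs a single description rather than thousands of trials, but only because every name in the description bottoms out in a detector that already exists.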

This is still a toy simulation, however; it remains to be seen whether this model for the adaptive advantage of language will scale up to lifesize ecological settings.


Andrews, J., Livingston, K. & Harnad, S. (1998). "Categorical Perception Effects Induced by Category Learning". Journal of Experimental Psychology: Learning, Memory, and Cognition 24(3) 732-753.

Biederman, I. & Shiffrar, M. M. (1987) "Sexing day-old chicks: A case study and expert systems analysis of a difficult perceptual-learning task". Journal of Experimental Psychology: Learning, Memory, & Cognition 13: 640 - 645.

Borges, J. L. (1969) "Funes the memorious". In Yates, D. & Irby, J. E. (eds.) Labyrinths. New York: New Directions.

Byrne, R.W. & Russon, A.E. (1998). "Learning by imitation: A hierarchical approach". Behavioral and Brain Sciences, 21 (5): 667-684.

Cangelosi, A. & Harnad, S. (2000) "The Adaptive Advantage of Symbolic Theft Over Sensorimotor Toil: Grounding Language in Perceptual Categories". Evolution of Communication (Special Issue on Grounding)


Cangelosi, A., Greco, A. & Harnad, S. (2000). "From robotic toil to symbolic theft: Grounding transfer from entry-level to higher-level categories". Connection Science (in press)


Cangelosi, A. & Parisi, D. (1998) "The emergence of a 'language' in an evolving population of neural networks". Connection Science 10: 83-97.

Catania, A.C. & Harnad, S. (eds.) (1988) The Selection of Behavior. The Operant Behaviorism of BF Skinner: Comments and Consequences. New York: Cambridge University Press.

Chomsky, N. (1972) Language and mind. Harcourt, Brace, and World.

Gallup, G. G. (1970) "Chimpanzees: self-recognition". Science 167:86-87

Greenfield, Patricia M. (1992) "Language, tools, and brain: The ontogeny and phylogeny of hierarchically organized sequential behavior". Behavioral and Brain Sciences 14: 531-95.

Harnad, S. (1987) "The induction and representation of categories". In: Harnad, S. (ed.) (1987) Categorical Perception: The Groundwork of Cognition. New York: Cambridge University Press. http://www.cogsci.soton.ac.uk/~harnad/Papers/Harnad/harnad87.categorization.html

Harnad, S. (1990) "The Symbol Grounding Problem". Physica D 42: 335-346.

Harnad, S. (1991) "Post-Gutenberg Galaxy: The Fourth Revolution in the Means of Production of Knowledge". Public-Access Computer Systems Review 2 (1): 39 - 53


Harnad, S. (1996a) "The Origin of Words: A Psychophysical Hypothesis." In Velichkovsky B & Rumbaugh, D. (Eds.) Communicating Meaning: Evolution and Development of Language. NJ: Erlbaum: pp 27-44.

Harnad, S. (1996b) "Experimental Analysis of Naming Behavior Cannot Explain Naming Capacity." Journal of the Experimental Analysis of Behavior 65: 262-264.


Harnad, S. (2000) "Turing Indistinguishability and the Blind Watchmaker". In: Fetzer, J. & Mulhauser, G. (eds.) Evolving Consciousness. Amsterdam: John Benjamins (in press)

Holloway, Ralph L. (1970) "Neural parameters, hunting and the evolution of the human brain." In: Advances in primatology, Volume I: The primate brain, ed. Charles R. Noback and William Montagna. Appleton-Century-Crofts.

Isaac, Barbara (1987) "Throwing and human evolution". The African Archaeological Review 5:3-17.

Katz, J.J. (1976) "The Effability Hypothesis". In: Harnad, S., Steklis, H. D. & Lancaster, J. B. (eds.) (1976) "Origins and Evolution of Language and Speech". Annals of the New York Academy of Sciences 280.

Mesulam, M.M. (1998) "From sensation to cognition". Brain 121: 1013-1052

Pinker, S. & Bloom, P. (1990). "Natural language and natural selection". Behavioral and Brain Sciences 13 (4): 707-784.

Rolls, E.T. (1997) "Consciousness in neural networks?" Neural Networks 10: 1227-1240

Rolls, E.T. (2000) "Precis of Brain and Emotion." Behavioral and Brain Sciences 23 (2): 177-234

Steklis, H.D. & Harnad, S. (1976) "From hand to mouth: Some critical stages in the evolution of language". In: Harnad, S., Steklis, H. D. & Lancaster, J. B. (eds.) (1976) "Origins and Evolution of Language and Speech." Annals of the New York Academy of Sciences 280, 445-455. http://www.cogsci.soton.ac.uk/~harnad/Papers/Harnad/harnad76.hand.html

Tijsseling, A. & Harnad, S. (1997) "Warping Similarity Space in Category Learning by Backprop Nets". In: Ramscar, M., Hahn, U., Cambouropolos, E. & Pain, H. (Eds.) Proceedings of SimCat 1997: Interdisciplinary Workshop on Similarity and Categorization. Department of Artificial Intelligence, Edinburgh University: 263 - 269.


Whiten, A. & Byrne, R. W. (1988) "Tactical deception in primates", Behavioral and Brain Sciences 11:233-273.

Wilkins, W.K. & Wakefield, J. (1995). "Brain evolution and neurolinguistic preconditions". Behavioral and Brain Sciences 18 (1): 161-226.



 Conference site: http://www.infres.enst.fr/confs/evolang/