3rd Conference
The Evolution of Language
April 3rd - 6th, 2000




From ‘nursing poke’ to syntactical speech

Chris Knight

Department of Sociology and Anthropology
University of East London
Longbridge Road, Barking
Dagenham RM8 2AS, Essex, UK

Syntax is the hierarchical, recursive structuring of words or morphemes to form higher-order units such as phrases and sentences. While it is possible to conceive of words without syntax, the reverse possibility – syntax without words – is logically excluded. From this, it follows that in the evolution of language, words or ‘protowords’ must have arisen before syntax or at least concurrently with it.

Syntax cannot have preceded words. Yet it is not inconceivable that syntactical competence evolved prior to the emergence of vocal speech as we know it. One theoretical possibility is that it evolved to facilitate hierarchical, recursive structuring in some other medium such as manual gesture (Armstrong et al. 1994) or ‘mimesis’ (Donald 1991). In both primates and humans, gesture is a natural, intuitive medium for expressing thought (Tomasello and Call 1997; McNeill 1992); when iconic gestures become conventionalised, a repertoire of discrete, arbitrary signs is the result. Against this background, it is tempting to infer a similar dynamic at work in the early evolution of words (Steklis and Harnad 1978; Kendon 1991).

Recent authoritative accounts of language origins have denied any role for manual gesture in the early evolution of language. MacNeilage (1998: 238) adopts an extreme position, asserting that ‘the vocal-auditory modality of spoken language was the first and only output mechanism for language’. This coincides with Dunbar’s (1996: 141) view that gesture was never necessary – ‘it can all be done by voice’. MacNeilage’s (1998) central argument is that if the vocal-auditory modality was adaptive during later stages of speech evolution, it must have been equally adaptive from the outset.

MacNeilage’s argument would have force if it could be confirmed that the social contexts of language use remained invariant throughout the course of human evolution. But if changing social strategies are built into our models, there is no reason to suppose that a modality which is adaptive during an early period must remain equally adaptive at a later stage. Where social contexts are ‘Machiavellian’, as is the case among primates (Byrne and Whiten 1988), we know that constraints operate to obstruct the emergence of low-cost, conventional – in other words fakeable – signalling (Zahavi 1993; Zahavi and Zahavi 1997). The problem is that intentional signals are by definition cognitively controllable – yet any intentional manipulability in signals automatically undermines their intrinsic reliability. Because of the fundamental requirement for reliability, primate vocalisations are resistant to manipulation, remaining emotionally expressive, limbically governed and hence hard-to-fake. They must remain beyond intentional control as a condition of their efficacy in communication.

Precisely because the primate vocal modality has been designed by natural selection to serve a reliable, hard-to-fake system of communication, it is ill-suited for exaptation to serve novel linguistic functions. Indeed, of all aspects of primate expressive behaviour, vocalisations appear among the most constrained and hence the least qualified. They are not autonomous with respect to emotion, largely escaping cortical control. For the same reason – being unavailable for intentional ‘picture-making’ – they lack the iconic potential required of elementary conceptual communication (McNeill 1992). Being resistant to re-shaping or social learning, primate vocalisations lack the requisite plasticity to take on linguistic functions (Steklis and Harnad 1978: 447).

These features are not maladaptive: they reflect selection pressures intrinsic to communication in the animal world. Theorists unaware of Darwinian constraints might imagine that the evolution of cognitive complexity in primates will be matched by corresponding manipulability and complexity in communication. But this would be to overlook the requirement for reliability in signals (Ulbaek 1998; Zahavi 1993). As receivers seek to avoid costs of deception, they respond preferentially to signals which are ‘hard-to-fake’ (Burling 1993). Such pressures prevent primate vocal communication from coming under cognitive control.

Primate vocal communication, then, is a poor candidate for exaptation to serve linguistic functions. Much better qualified are the forelimbs, with their specialisation for prehensile and manipulative functions (Napier 1960). Manual gestures are manipulable precisely because the hands have evolved to perform technical, not communicative, functions. Complex bimanual activities such as food-extraction and tool manufacture intensify selection pressures for fine-tuned intentional co-ordination and control. Since such serial motor sequences are cognitively governed, it follows that on those rare occasions when primates need to convey details of cognition, they should rely more on gesture than on voice.

Among chimpanzees, conventionalised manual gestures occur only in highly restrictive social contexts. A well-documented case is the infant ‘nursing poke’ (Tomasello et al. 1994). This begins as a functional action: the infant pushes aside its mother’s arm to reach the nipple. As mother and infant interact with one another over time, the poke becomes abbreviated and conventionalised. The end result is a learned, intentional, discrete shorthand, falling outside the normal species-specific repertoire of emotionally expressive, hard-to-fake gesture-calls (cf. Burling 1993). The ‘nursing poke’ suggests a plausible step in the direction of language.

What are the social conditions for such a step? At issue is the question of trust. A sign can become conventionalised only to the extent that the requisite trust can be assumed. Note that in acquiring its conventional form, a ‘nursing poke’ has no prospect of escaping the confines of the particular maternal relationship in which it has originated. Conventionalisation occurs because in this particular social context of communication, interests on both sides coincide. Infants have an interest in cutting the costs of requesting a feed, while mothers have a corresponding motive to satisfy their offspring and at the same time reduce the amount of poking endured.

The question now arises: where else within ape society might we find a comparable convergence of interests? Perhaps comparable social contexts do exist, but certainly they are few and far between. Ape social life and the corresponding intelligence are ‘Machiavellian’ (Byrne and Whiten 1988). In such a social setting, volitional signalling between adults is likely to be manipulative. Whereas a nursing mother may not mind being ‘manipulated’ by her offspring, there are few contexts in which an adult chimpanzee has an interest in being manipulated by neighbouring conspecifics, even when kin-related. In ape society, community-wide relationships do not constitute extensions of the mother-infant bond. Where mother-infant-like trust is not extended throughout a coalition or community, associated conventions cannot be extended either. In short, apes in the wild have no use for low-cost conventional signs. This is not because they are cognitively deficient – Kanzi has disproved that idea. It is because arbitrary signs are low-cost signals and so lack intrinsic credibility (Zahavi 1993). Any successful use of such signals, whether vocal or gestural, will depend on exceptional levels of mutual trust and co-operation (Knight 1998).

If this is accepted, then the gap between nursing pokes and human language was not bridged thanks to the sudden evolution of special cognitive capacities. Rather, it was bridged thanks to the elaboration of low-cost, high-speed, convention-based strategies of communication – initially based on existing capacities for controlling temporal-sequential outputs – made possible by intensifying levels of ingroup social trust. Where sufficient trust existed, individuals would have been under pressure to develop corresponding expressive and communicative capacities. Kanzi and other trained chimps have demonstrated their ability to deploy signs and to act upon them provided there is some function to be served by doing so – in other words, provided human trainers (differing in this respect from chimp conspecifics) can be trusted to consistently reward such behaviour. In the case of evolving humans, there were of course no trainers to reward use of conventional signs. Rather, speakers and listeners had to place trust in one another to an extent which would be maladaptive under conditions of ‘Machiavellian’ primate politics.

Language is a low-cost system of conventional signs wholly dependent on social trust and bound up with the evolutionary development of stable kin-based coalitions (Dunbar 1996; Dessalles 1998; Nettle 1999; Power 1998; Knight 1991, 1996, 1998, 1999). Within these coalitions, status was determined by linguistic relevance (Dessalles 1998) rather than coercion, emotional manipulation or violence. Progressive elimination of emotional conflict from the sphere of ingroup relations allowed gestures to become increasingly dispassionate, low-cost, abbreviated and contrastive. Unlike costly displays, signals of this kind lend themselves naturally to sequential ordering and hierarchical, recursive structuring. There are no grounds for assuming that humans at any stage lacked the neural control capacities necessary to achieve such ‘syntactical’ output. Initially manual-gestural, and initially bound up with emotionally expressive ‘mimesis’ (Donald 1991, 1998), conventional signing dedicated to ingroup communication would have come under pressure to exploit and integrate whatever additional modalities best facilitated cost-cutting and efficiency. As ritually structured, trust-based coalitions became increasingly stable and emotionally homogeneous, interlocutors became less interested in emotions or performance-based distinctions, more interested in underlying intentions and thought-processes. In such contexts, signallers came under pressure to develop shorthands, speed up transmission and string together increasingly elaborate sequences. For reasons of efficiency, this evolutionary dynamic drove progressive adoption of the vocal-auditory modality as the default medium for human language. Licensed by changed social circumstances, vocal speech exapted and developed for its own purposes the sophisticated neural control machinery originally developed to serve an earlier, more gesturally based system of communication.


