
Could AI allow us to speak to animals like Doctor Dolittle?

Discussion of artificial intelligence (AI) is currently inescapable, whether it’s being positioned as a messianic salve for all of the world’s problems… or as an existential threat. What’s inarguable is that it’s here to stay – for better or for worse – so it’s perhaps more profitable to consider the concrete benefits it could bring. Fanciful as it may sound, one such benefit could be an increased ability to communicate with the animal world.

Read on as we explore how this Doctor Dolittle effect could change our world.

Why do AI-assisted attempts to speak with animals matter?

In the 1960s, around 25,000 sperm whales were killed each year, primarily to obtain spermaceti, the lucrative (and, yes, semen-like) substance found in the whales’ heads – which, since the 17th century, has been used in cosmetics, textiles, and candles. This was the peak of commercial sperm whale hunting, from which the species has yet to recover: its population remains a fraction of pre-hunting numbers, which may have totaled as many as two million.

Photo by Todd Cravens on Unsplash

That these whales are no longer hunted on an industrial scale is in part thanks to biologist and bioacoustician Roger Payne. (Some exceptions take advantage of legal loopholes.) Seeing a dead, beached dolphin as a student was such a formative experience that it prompted Payne to switch his focus from bats’ echolocation to whales: “Some bozo had chopped off its flukes [the lobes of the tail] and somebody else had stuffed a cigar butt in its blowhole […]. I thought to myself, ‘Is this the only interaction that can occur between people and the wild world?’”

In 1967, along with fellow researcher Scott McVay, he discovered what he described as the “exuberant, uninterrupted rivers of sound” sung by male humpback whales during their breeding season. Doing so not only indirectly laid the groundwork for potential future animal-human communication, it had an immeasurable effect on the fate of whales in the wild.

Three years later, Songs of the Humpback Whale, an LP of captivatingly unearthly recordings, was released, going on to sell over 100,000 copies. At the time of writing, it continues to hold the record of best-selling environmental album in history. It also “galvanized a global movement to end the practice of commercial whale hunting and save the cetaceans [whales, dolphins, and porpoises] from extinction”. “Soon, the slogan ‘Save the Whales!’ became omnipresent” – until, in 1986, campaigning about this cause célèbre led to the International Whaling Commission passing a global moratorium on commercial whale hunting which remains in effect to this day.

The effect that a cultural artifact like an LP can have on the public consciousness is clearly enormous – as great as, or perhaps even greater than, that of statistics. Payne recalled that, when he first heard the humpbacks’ vocalizations, he realized “that he potentially had the secret to stopping the massacre of these creatures. […] ‘[A way] to get the world to think: “Hey, folks, we’re killing off the largest animals that have ever lived in the history of the planet. This is nuts!”’”

Before his death in 2023, Payne had become an “ardent member” of the Cetacean Translation Initiative (as reported by The New Yorker in ‘Can We Talk to Whales?’). The brainchild of marine biologist David Gruber, Project CETI seeks to determine whether “machine learning could be used to discover the meaning of […] [sperm] whales’ exchanges”. (Unlike humpbacks’ ‘songs’, sperm whales “produce quick bursts of clicks, known as codas, which they exchange with one another”. These sounds – variously compared to bacon frying, popcorn popping, horses on cobblestones, or “[a mechanical] clatter […], as if somewhere deep beneath the waves someone was pecking out a memo on a manual typewriter” – appear to be structured in a way akin to conversations.)

The prospect of human-whale communication may seem unlikely; even CETI’s lead field biologist, Shane Gero, admits as much. When initially contacted about the project – on the basis that it might be possible to “find some funding to translate whale” – his first thought was simply, “Oh, boy.”

Yet, as Payne noted, “Inspiration is the key […]. If we could communicate with animals, ask them questions and receive answers […] the world might soon be moved enough to at least start the process of halting our runaway destruction of life.” The repercussions of inter-species communication would be enormously far-reaching and significant. So much so that it is the hope of fellow CETI team member and computer scientist Michael Bronstein that the project could alter “how we see life on land and in the oceans […]. If we understand […] that intelligent creatures are living there and that we are destroying them, that could change the way that we approach our Earth.”

What previous attempts have been made to communicate with animals?

Though CETI “represents the most ambitious, the most technologically sophisticated, and the most well-funded effort ever made to communicate with another species”, it is by no means the first to do so.

In the 1950s, behaviorist BF Skinner theorized that, if children could pick up language through positive reinforcement, the same might be true of animals. Twenty years later, one of his former students, Herbert Terrace – by then a professor of psychology at Columbia University, New York – set out to prove as much, by raising an adopted chimpanzee which was taught American Sign Language from the age of two weeks old. 

Nim Chimpsky, as the ape was named (after linguist Noam Chomsky, a critic of Skinner’s), initially appeared to have “crossed the language barrier” by apparently using a repertoire of 80 signs. It was only when Terrace reviewed the video footage taken to document Nim’s progress that he realized how “vastly […] his linguistic competence” had been overestimated: instead, the chimp simply imitated the signs last made by his caregivers. 

Photo by Francesco Ungaro on Unsplash

Perhaps Chomsky’s dismissal of “the possibility that language was available to other species” was correct after all? Subsequent attempts to prove otherwise, involving the other great apes (gorillas, bonobos, and orangutans), other mammals (like dolphins), and even birds (such as Alex the gray parrot), have failed to come to an uncontested conclusion. According to Shane Gero, whether sperm whale codas constitute language is still an open question, which hinges, “Ironically, […] [upon] a semantic debate about the meaning of language”. 

The first “semi-scientific” study of sperm whales, a pamphlet from 1835, did not credit these mammals with the ability to vocalize; this assertion was not overturned until 1957, when researchers encountered “sharp clicks”, which they correctly supposed to relate to echolocation (used by the whales to detect their squid prey in the darkness of deep water). Communicative codas were not identified until the 1970s, but, “Since then, cetologists have spent thousands of hours listening to codas, trying to figure out what [their] function might be.”

Dialect differences between the codas of sperm whales in the eastern Pacific, eastern Caribbean, and the South Atlantic could speak to their status as language (even one with, potentially, a limited expressive capacity). Admittedly, Gero goes on, “They won’t have a word for ‘tree.’ And there’s some part of the sperm-whale experience that our primate brain just won’t understand. But those things that we share must be fundamentally important to why we’re here.”

How is the CETI AI project attempting to communicate with sperm whales?

Nomadic sperm whales swim around 20,000 miles (32,000 km) per year. However, they are attracted to particular locations, presumably due to the presence of their primary prey, medium-sized squid. One such location, near the Caribbean island of Dominica, led CETI to establish its “unofficial headquarters” in the vicinity of the country’s capital.

Photo by Venti Views on Unsplash

A repository of codas, which will “be used to ‘train’ machine-learning algorithms”, will be gathered via a combination of recording devices temporarily planted on individual whales (attached by suction cups) and tethered listening stations (clusters of hydrophones able to record codas from up to 12 miles [19 km] away; sperm whales, the loudest species on Earth, can generate 230 decibel [dB] sounds – 80 dB louder than a jet engine). This equipment is crucial in part because sperm whales frequently dive to depths of 2,000 feet (610 m), and sometimes further than a mile (1.6 km).
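To put those loudness figures in perspective: decibels are logarithmic, so every 10 dB represents a tenfold increase in sound power. A quick back-of-the-envelope sketch in Python (the 150 dB jet-engine figure is simply what the text’s “80 dB louder” comparison implies, not a measured value):

```python
# Decibels are logarithmic: a difference of d dB corresponds to a
# power ratio of 10 ** (d / 10).
def power_ratio(db_difference: float) -> float:
    return 10 ** (db_difference / 10)

sperm_whale_db = 230  # loudest sound generated by any species (from the text)
jet_engine_db = 150   # implied by the text's "80 dB louder" comparison

ratio = power_ratio(sperm_whale_db - jet_engine_db)
print(f"A sperm whale click carries about {ratio:,.0f}x the power of a jet engine")
```

In other words, an 80 dB gap is not “about twice as loud” – it is a hundred-million-fold difference in sound power.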

The chatbot ChatGPT uses unsupervised machine learning – the cannibalization of internet content – to credibly (though not necessarily accurately) create text according to particular parameters. (The author of the New Yorker article had it generate an extract of a version of Moby-Dick rewritten from the perspective of the eponymous white whale; Melville, it is not.) This extends to gaining fluency in other languages, “without ever understanding English”. CETI is banking on the assumption that what goes for Chinese or Spanish will be equally true for “sperm whale-ese”, via coda prediction based upon sufficient amounts of data.
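ChatGPT-scale models are vastly more sophisticated, but the core idea – predicting the next symbol in a sequence from statistics gathered over many recorded sequences – can be sketched in miniature. In the toy model below, the coda labels and ‘exchanges’ are entirely invented for illustration:

```python
from collections import Counter, defaultdict

# Toy "next-coda" predictor: a bigram model over hypothetical coda labels.
# Real language models predict from far longer contexts, but the principle
# -- learn which symbols tend to follow which -- is the same.
training_exchanges = [
    ["1+3", "1+3", "5R", "4R"],  # invented coda labels for illustration
    ["1+3", "5R", "5R", "4R"],
    ["1+3", "5R", "4R"],
]

# Count how often each coda follows each other coda.
counts = defaultdict(Counter)
for exchange in training_exchanges:
    for current, following in zip(exchange, exchange[1:]):
        counts[current][following] += 1

def predict_next(coda: str) -> str:
    """Return the coda that most often followed `coda` in the training data."""
    return counts[coda].most_common(1)[0][0]

print(predict_next("1+3"))  # "5R": it follows "1+3" three times, "1+3" only once
```

With enough recorded exchanges, a model like this (scaled up enormously) could generate plausible coda sequences without any notion of what, if anything, they mean – which is precisely CETI’s bet.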

However, it may ultimately be possible to go beyond coda prediction. If behavior observed with the use of underwater cameras and other sensors (mounted on robots designed in the form of turtles or fish) can be interpreted in concert with vocalizations, it may be possible to venture into the realms of coda comprehension. (This technique has been used by researchers at the Tel Aviv and Ohio State universities to classify fruit bats’ vocalizations and determine the previously unappreciated complexity of their language.) 

Sperm whales “chatter” to their podmates prior to diving. If AI could indeed learn to interpret these ‘conversations’, it could crack open the mysteries they contain: “What are they chattering about? How deep to go, or who should mind the calves, or something that has no analogue in human experience?”

The nature of machine learning also means that a successful outcome for the CETI project – communication with whales – could be ambiguous. According to Shafi Goldwasser, a computer scientist who initially suggested the utility of machine learning in this field, even if AI learns to predict ‘sperm whale’ so well that it could “generate a conversation that would be understood by a whale”, we ourselves would not be able to understand it.

Nevertheless, progress is already being made. The “holy grail” for CETI revolves around ‘double articulation’ (AKA duality of patterning): the linguistic construction of meaning from meaningless elements, seen in human language in the way that syllables which carry no meaning make up words which may carry multiple meanings. Whales in the vicinity of Dominica have been observed to use a collection of approximately 25 different codas. This may not seem substantial, if interpreted as a vocabulary, but it is not yet known what elements may have significance for the whales: “It may be that there are nuances in, say, pacing or pitch that have so far escaped human detection.” 
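To give a sense of how a repertoire of ‘approximately 25 different codas’ might be counted: researchers typically type codas by their rhythm – the relative spacing of clicks – so that the same pattern at different tempos counts as one type. A sketch of that idea, with invented click timings:

```python
# Sketch: grouping codas by rhythm. A coda is a short series of clicks;
# its "type" is characterized by the pattern of inter-click intervals.
# All click times below are invented for illustration.

def rhythm_signature(click_times):
    """Normalize inter-click intervals so the coda's total duration is 1,
    making the same rhythm at different tempos produce the same signature."""
    intervals = [b - a for a, b in zip(click_times, click_times[1:])]
    total = sum(intervals)
    return tuple(round(i / total, 2) for i in intervals)

codas = [
    [0.0, 0.2, 0.4, 0.6, 0.8],  # five evenly spaced clicks
    [0.0, 0.1, 0.2, 0.3, 0.4],  # same rhythm, faster tempo -> same type
    [0.0, 0.3, 0.6, 0.7, 0.8],  # different rhythm -> different type
]

repertoire = {rhythm_signature(c) for c in codas}
print(len(repertoire))  # 2 distinct rhythm types
```

Note what this representation throws away: absolute tempo, pitch, and loudness all vanish in the signature – exactly the kind of nuance the article suggests “may have so far escaped human detection”.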

Though the meaning potentially embedded in sperm whale codas has not yet been deciphered, CETI has discovered a new signal within the codas – a single click which may act as a form of punctuation – emphasizing the possibility that further layers of meaning exist which have yet to be understood.

What might be the repercussions of AI-facilitated communication with animals?

Recent research has revealed that many more organisms can “detect, respond, and make sound” than was previously imagined: this is now known to be true of species as diverse as coral, moles, manatees, starfish – and marine seagrass. Even without being able to communicate with these species, a deepened understanding of their behaviors enriches our relationship with the natural world. Imagine what could be possible then, if – in the words of scientist and author Karen Bakker – “digital technology [could] function […] like a planetary-scale hearing aid, enabling humans to record and decode nature’s sounds, from the Arctic to the Amazon”. An effect on conservation exceeding even that of Songs of the Humpback Whale on the Save the Whales campaign could be incalculably valuable: “What could a Google Translate for the animal kingdom spawn?”

In fact, research in this field has already started to increase our understanding of animal populations – for example, the social groups which a particular population may be divided into, which could have “important conservation implications”, or the assessment of reintroduction projects according to how certain modes of communication are passed on among animal groups.

CETI’s stated goal is to use “technology to amplify the magic of our natural world […] [and] bring us closer to nature”. AI presents this opportunity by allowing researchers to move beyond anthropocentric attempts to teach animals human methods of communication (which may not be compatible with their own worldviews and ways of thinking) and instead begin “using improved [data-gathering] sensors and artificial-intelligence technology to observe and decode how a broad range of species, including plants, already share information with their own methods”.

For example, the language of honeybees is “vibrational and positional” and “sensitive to nuances such as the polarization of sunlight”; they will never speak in the way that we do, but by using what Bakker describes as digital bioacoustics, we may be able to become attuned to the way in which they communicate. 

A ‘RoboBee’ has already been used to communicate instructions to honeybees in a hive. Other projects using AI to translate or communicate with animals include algorithms which analyze the emotional states of pigs and rodents on the basis of their vocalizations, while the more ambitious Earth Species Project aims (eventually) to decode the communication of all non-human species. 

While modern sensors can gather data uninterruptedly in situations humans would be unable to monitor over long periods (on the wing; in the deep ocean) – meaning AI can be provided with the quantity of information required to satisfactorily decode it – there is still no guarantee that translating inherently alien modes of communication will be possible. “Even if the power of AI increases ‘a million fold’, some of the obstacles that currently stop us from talking to animals will remain”: an algorithm may one day be able to interpret what our pets (for example) are trying to tell us – yet it may never be possible for us to ask them how they feel.

Different species’ umwelt (the way they experience the world) may simply preclude inter-species understanding. A dog may be taught the meanings of a couple of hundred words, but will never be able to “learn [the] words, lexigrams or gestures for ‘bacteria,’ ‘economy’ or ‘atom.’ […] The concepts they represent are beyond their conceptual capacity. You can’t learn words for things you can’t understand.”

It’s important to understand the ways in which our capacity for “understanding complex concepts and generating grammatically complex utterances” is grounded in our morphology and genetics.

The encephalization quotient (EQ), a measure of relative brain size, defines whether a species’ brain is considered large in relation to body size (particularly in reference to mammals). The EQ of chimpanzees is 2.5 – that is, their brains are two and a half times as large as expected for their body size. Dolphins’ EQ is 5.3, while ours sits at around 7.5. In addition, given that individuals with low IQs still acquire the full intricacy of human language, there is clearly more at play than brain size alone. In particular, a mutation of the FOXP2 gene in hominins has left us “genetically wired for communication” in a way that is not true of species which possess earlier versions of the gene.
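The EQ arithmetic can be made concrete. One classic formulation – Jerison’s fit for mammals, which is an assumption here, since the article doesn’t specify which reference curve its figures use – estimates expected brain mass as 0.12 × (body mass in grams)^(2/3), with EQ being actual brain mass divided by that expectation. The masses below are rough textbook averages, so treat the outputs as illustrative:

```python
def expected_brain_mass(body_mass_g: float) -> float:
    """Jerison's mammalian fit (an assumption): ~0.12 * body_mass ** (2/3), in grams."""
    return 0.12 * body_mass_g ** (2 / 3)

def eq(brain_mass_g: float, body_mass_g: float) -> float:
    """Encephalization quotient: actual brain mass / expected brain mass."""
    return brain_mass_g / expected_brain_mass(body_mass_g)

# Rough average masses, for illustration only.
print(f"chimpanzee: {eq(400, 45_000):.1f}")   # ~2.6
print(f"human:      {eq(1_350, 65_000):.1f}")  # ~7.0
```

Different reference curves (different coefficients and exponents) yield somewhat different quotients, which is one reason published EQ figures for the same species vary.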

Photo by Alexander Sinn on Unsplash

These caveats aside – and even taking into account difficulties such as the validation of AI-generated animal communication samples through the measurement of animals’ responses, the tendency of AI algorithms to “find […] spurious correlations”, and the likelihood of their being biased by human umwelt – Karen Bakker equates “the invention of digital bioacoustics […] to the invention of the microscope”. By allowing Antonie van Leeuwenhoek to view the microbial world, the microscope “laid the foundation for countless future breakthroughs”; AI-enabled communication with animals could do the same, in ways that are largely impossible to conceive of at present.

Yet this is to say nothing of the philosophical and ethical questions this research is beginning to raise. While even limited communication with other species could have undeniably attractive implications for their conservation – for example, by directing them away from injurious environments – it may also lead to increased abuses. It could be that “a better understanding of animal communication will help bad agents to better exploit nature”; already, recordings are used by poachers to entrap songbirds. 

Hopefully, increased mutual understanding between humans and animals will have dramatically positive effects, not least in prompting empathy for our more-than-human cousins. But, as we proceed down this path, questions of animal rights and autonomy will only become more vital, and we must ask ourselves what it “would take to expand the democratic imagination” to take their voices into account.

Featured photo by Thomas Lipke on Unsplash