Language is a fundamental aspect of human existence. It is what separates us from animals and enables us to communicate complex ideas and emotions. But can language be abstracted? In other words, can we remove the human touch and fluidity of language and reduce it to a set of patterns and symbols? Abstraction is a powerful and proven tool used in science and mathematics to identify patterns and simplify complex systems. By stripping away unnecessary details, we can gain a deeper understanding of the underlying structures and relationships of a complex phenomenon or object. It can even be said that abstraction is the key to unlocking new discoveries and technologies, from the laws of gravity to computer programming languages. However, at first glance, one would not associate language with these fields. Language is a fluid and dynamic system that constantly evolves and adapts to new situations. It is deeply rooted in human culture, history, and emotions.

Humans learn language from birth through a process of immersion, observation, and trial and error. From a very young age, infants are exposed to the sounds, words, and grammar of their native language through interactions with caregivers, family members, and other individuals in their environment. Through repeated exposure and reinforcement, they begin to associate certain sounds and gestures with specific meanings and to build a mental lexicon of words and phrases. Babies are experts at statistical learning, observing patterns in the linguistic input they receive and making accurate generalizations. Ultimately, we learn a language by internalising its rules and applying them, so that we can communicate with other people.
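To see what this kind of statistical learning might look like, here is a minimal sketch in Python. It computes transitional probabilities between syllables in an invented stream, loosely modelled on classic infant statistical-learning experiments: the probability of one syllable following another is high inside a ‘word’ and lower at word boundaries, which is exactly the kind of pattern infants appear to track. The syllable words are made up for illustration.

```python
from collections import Counter

# An invented stream built from three made-up 'words':
# bidaku, padoti, golabu (illustrative, not real experimental data).
stream = "bi da ku pa do ti go la bu bi da ku go la bu pa do ti bi da ku".split()

pair_counts = Counter(zip(stream, stream[1:]))   # how often each syllable pair occurs
first_counts = Counter(stream[:-1])              # how often each syllable starts a pair

# Transitional probability P(next | current): close to 1.0 inside
# a word, lower wherever one word ends and another begins.
for (a, b), n in sorted(pair_counts.items()):
    print(f"P({b} | {a}) = {n / first_counts[a]:.2f}")
```

Running this shows within-word transitions such as P(da | bi) at 1.00, while boundary transitions such as P(pa | ku) sit at 0.50, so word boundaries fall out of the statistics alone.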

More than just rules, language is a true marvel of the human experience. From the clicks and pops of Xhosa in South Africa to the tonal variations of Mandarin in China, every language boasts its own set of intricate sounds and grammatical quirks that make it one-of-a-kind. With over 7,000 languages spoken around the world, it's no surprise that there's such an abundance of diversity in the structures of language. But despite this incredible diversity, all languages share some fundamental similarities that make them intelligible and useful for communication. For example, syntax is an essential aspect of language that allows us to form meaningful sentences. Without syntax, our language would be reduced to a series of unconnected words, lacking structure and meaning. While the specific rules of syntax can differ greatly between languages, the fundamental principles are universal. Another important aspect of language is morphology, which refers to the way words are constructed and how they change to convey meaning. For example, in English, we use suffixes like ‘-ly’ to indicate adverbs, as in ‘quickly’ or ‘happily’. This type of morphological structure is found in many languages, allowing speakers to modify words in a flexible way. Finally, semantics is the study of meaning in language and is a crucial aspect of how we communicate. Every language has a unique set of words and phrases that carry specific connotations and associations. Understanding the nuances of these meanings allows us to fully express ourselves and to convey our thoughts and emotions to others. With an understanding of the rules of language, machines just might be able to imitate it.
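As a toy illustration of syntax-as-rules, the following Python sketch generates grammatical English sentences from a handful of rewrite rules, a simplified context-free grammar. The grammar and vocabulary are invented for illustration and are far cruder than any real language.

```python
import random

# A toy context-free grammar: each symbol rewrites to one of several
# sequences of symbols or words. Rules and vocabulary are invented.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["the", "N"], ["a", "N"]],
    "VP":  [["V", "NP"], ["V", "ADV"]],
    "N":   [["linguist"], ["machine"], ["sentence"]],
    "V":   [["parses"], ["generates"]],
    "ADV": [["quickly"], ["happily"]],  # adverbs built with the '-ly' suffix
}

def expand(symbol):
    """Recursively rewrite a symbol until only words remain."""
    if symbol not in GRAMMAR:
        return [symbol]  # a terminal word
    production = random.choice(GRAMMAR[symbol])
    return [word for part in production for word in expand(part)]

print(" ".join(expand("S")))  # e.g. 'the machine parses a sentence'
```

Every sentence the grammar produces is syntactically well-formed, yet the program knows nothing about meaning: that is precisely the gap semantics has to fill.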

Our best approach for getting machines to understand language is natural language processing (NLP). NLP is a field of artificial intelligence that enables computers to understand and generate human language. It works by taking in vast amounts of language data and deploying complex algorithms to identify patterns and represent words as abstract mathematical objects called word vectors or, more technically, word embeddings. By parsing hundreds of millions of sentences from books and articles, an NLP system can establish relationships between words in a complex, high-dimensional vector space and build a probability distribution over words. Through this, the machine begins to ‘understand’ language. NLP techniques have wide applications in diverse fields. Machine translation algorithms use them to translate text from one language to another. Sentiment analysis algorithms use them to analyse the emotional content of text. Speech recognition algorithms use them to convert spoken language into text. The last example is an interesting one, as it bridges spoken and written language, which adds another layer of complexity: it involves sound processing and breaking words down into phonemes, just as we do when learning to pronounce words.
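To make word embeddings concrete, here is a minimal sketch with made-up vectors. In real systems such as word2vec or GloVe the vectors have hundreds of dimensions and are learned from text, but the geometry works the same way: words used in similar contexts end up pointing in similar directions.

```python
import numpy as np

# Toy word vectors in a four-dimensional space. The values are
# invented for illustration; learned embeddings are much larger.
embeddings = {
    "king":  np.array([0.8, 0.6, 0.1, 0.2]),
    "queen": np.array([0.7, 0.7, 0.1, 0.3]),
    "apple": np.array([0.1, 0.0, 0.9, 0.3]),
}

def cosine_similarity(a, b):
    """Similarity of two word vectors: 1.0 means identical direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high, ~0.99
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # much lower
```

Distances and directions in this vector space stand in for relationships between meanings, which is what lets an algorithm treat ‘king’ and ‘queen’ as related without ever being told so.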

Nonetheless, human language is often highly ambiguous and contextual, which makes it difficult for machines to understand. For example, the sentence ‘I saw her duck’ could mean that you saw a woman physically duck, or that you saw a woman's pet duck. NLP techniques must be able to understand the context in which a sentence is used to accurately interpret its meaning. Advancements in machine learning have fuelled a craving for more advanced NLP techniques. For instance, we don’t start our thinking from scratch every second. As you read this article, you understand each word in light of the previous ones. You don’t throw everything away; your thoughts have persistence. This, however, does not come easily to machines: a simple neural network processes each input in isolation, with no memory of what came before. To build more powerful NLP models, Long Short-Term Memory (LSTM) artificial neural networks have been implemented in many NLP models to allow machines to remember long-term dependencies. Then, in the seminal Google paper ‘Attention Is All You Need’, an even more powerful architecture called the Transformer was introduced, built around a mechanism known as attention. It is the backbone of ChatGPT, short for Chat Generative Pre-Trained Transformer. Keeping track of context in this way seems like something only humans can do, but impressively, a compact set of mathematics makes it possible.
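At the heart of the Transformer is scaled dot-product attention, which the paper defines as Attention(Q, K, V) = softmax(QKᵀ/√dₖ)V: every position in a sentence looks at every other position and takes a weighted average of what it finds. Below is a minimal sketch of that formula in Python; in a real Transformer, Q, K, and V are learned projections of the input, whereas here random vectors stand in for them.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q Kᵀ / √d_k) V,
    as defined in ‘Attention Is All You Need’."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # how well each query matches each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # weighted average of the values

# Three token positions, each a four-dimensional vector (random stand-ins).
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4): one context-aware vector per token
```

Because every token attends to every other token at once, the model needs no step-by-step memory at all, which is what the paper's title means by attention being all you need.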

The explosion in popularity of ChatGPT is a tell-tale sign that we have reached a new milestone in NLP. Long gone are the boring and predictable replies of chatbots and virtual assistants; we have officially entered an age where the boundary between humans and machines is blurring. This raises the question: is language just a bunch of mathematical rules ready to be solved and manipulated? Some researchers have argued that mathematics is the fundamental language of nature, and that the patterns and structures we observe in the natural world can be described using mathematical formulas. I agree that abstraction can help us understand language on a deeper level, but it can never fully capture the true essence of human interaction. Sometimes a smile or a stare says much more than a thousand words ever will.

Article by Shikang Ni. Artwork by Marida Ianni-Ravn.