By Clinton Ikechukwu
In my home country, Nigeria, after we reached SS1 in secondary school, we were expected to choose between the arts and science disciplines.
There was a clear-cut divide, with English Language and Mathematics being the only overlapping subjects. That was heartbreaking for me, because I enjoyed arts subjects like Literature as much as the core sciences like Physics and Mathematics. My younger self resented having to choose between the things I loved. In the end, I leaned towards the science track, but literature forever resided in my heart.
Years later, on a winter evening in Saint-Étienne, I took a tram back toward Châteaucreux, and I decided to experiment with a new app my friend had recommended to me.
“Summarise the texts I uploaded & bold the main points for ease of reading,” I wrote. By the time the tram reached the next stop, ChatGPT had finished the work almost before I realised it: a detailed, structured summary of my class notes.
By that time, I was already experimenting with classical machine learning models, but not Large Language Models (LLMs). I was enthralled and would converse with it for hours. As days slipped into months, an epiphany occurred: the two disciplines I had long been forced to choose between could actually coexist. Perhaps literature and the pure sciences can function together after all.
Later on, as we began experimenting with LLMs, I realised that most of my peers in AI believed LLMs would produce better results simply through larger model sizes and more training data. They treated LLMs as if they were classical machine learning models or physics problems: mathematical or computational puzzles where, once the computation grows large enough, better results must follow. They debated only from a technical point of view.
And maybe they were right in many ways, but not entirely. Because, as someone who doubles as a storyteller, I know that language is more than mere structure. Even if grammar and syntax make a sentence correct, language is ultimately about intention: how something was said, the tone it carries, the feeling underneath it. All of these make language! Annie Ernaux put this better in her 2022 Nobel Lecture, when she wrote that her task is “finding the words that contain both reality and the sensation provided by reality.”
I think of how, when we apologise for a wrong, persuade someone, or even flirt, our language becomes more about intention than content. This is called subtext. Because subtext forms a huge part of our communication, even when it is implied rather than stated, it must be accounted for. People express themselves by hinting or using sarcasm, so if LLMs are, in practice, systems that produce language, then creative writing becomes a part of engineering, not just decoration. As Christopher Sullivan, an adjunct instructor in creative writing and English at Southern New Hampshire University, puts it, creative writing is the kind of writing that expresses an author’s “unique voice, writing style, thoughts and ideas” in an engaging and imaginative manner. This is exactly the layer most technical discussions of LLMs still underestimate.
In simple words, LLMs are intelligent, but they can give correct answers in the wrong tone. They often sound supremely confident even when they are guessing, and they often miss what a user really means, delivering coherently structured answers that feel artificial and cold. So the challenge is not just building intelligent systems, but creating LLM systems that are trustworthy in human communication.
I am persuaded that to achieve this, creative writers, and even psychologists, should enter the technical stack. The days when developers and UI/UX designers dominated it exclusively should be over. Building LLMs requires more than engineering alone: we need creative writers to shape communication — tone, subtext, and clarity — while psychologists can help design for trust, interpretation, and human behaviour.
I understand that the industry already recognises the importance of human judgement, which is why RLHF (Reinforcement Learning from Human Feedback) is widely used. But more can still be done. OpenAI’s InstructGPT paper makes this clear: “Making language models bigger does not inherently make them better at following a user’s intent.” To improve the system, the authors used labeller demonstrations and rankings of model outputs during post-training.
Even though organisations acknowledge that the human role is important, they still use humans in a limited way.
For instance, humans are often made merely to rank responses or label outputs, when they could do far more: rewriting responses, explaining tone problems, and designing better communication behaviour.
What Scale AI is doing, for instance — asking humans to choose which AI response is better — is a good step. But I believe companies should start employing thousands of creative writers, or people whose profiles fit both tech and storytelling. Simply ranking an AI’s responses is limited; creative writers should edit them and teach the model how good communication should sound.
So, if we are serious about treating language as communication, then we cannot stop at training. We also have to change how we evaluate these systems. It is not enough to ask whether a model can produce the correct answer; we must also ask whether it can communicate that answer responsibly, in a way that is clear and humanly trustworthy.
Consider MMLU (Massive Multitask Language Understanding): it is useful, but it cannot tell us whether a model communicates well as a human would. At best, MMLU evaluates knowledge and reasoning. The better way to judge an LLM is to consider how an editor evaluates a book. That eye that questions clarity, flow, and persuasion is the missing answer. Editors should treat an LLM’s output as a soon-to-be-published book or essay.
To this point, I believe that the next phase of language modelling isn’t just bigger models but better communication design. Engineers will be needed for capability, writers for human voice, and psychologists for human trust. This is not an inclusivity-for-all campaign; it is a practical requirement for LLMs that must operate in human contexts. I don’t think the most advanced LLM will be the one that sounds the smartest; I am persuaded it will be the one that is most reliable and inspires trust.
Clinton Ikechukwu
Clinton Ikechukwu is an ML Engineer and Manufacturing/Materials Engineer at the frontier of Industry 4.0, computer vision, and data-driven systems. A Global Innovation Prize winner, EU Erasmus Mundus meta 4.0 Scholar, and storyteller, he writes at the intersection of technology and human experience, questioning what we build with technology, and what we risk losing to it.
www.delreport.com
