On the Endless Infrastructural Reach of a Phoneme

A telephone is a familiar device. From the past to the present, the phone has been associated with home and with the possibility of communicating at a distance, allowing voices to travel, thoughts to be shared, and stories to be told. What happens, though, when on the other side of the line there is not a person but a piece of software? What changes when a phone becomes an affective infrastructure which, based on accent recognition, can grant or deny asylum to someone? These are the questions that Pedro J S Vieira Oliveira addresses in the following article, discussing how the trivial and the familiar can be weaponized in order to trigger emotions, evoke dialects, and ultimately decide the course of lives.


Let us begin by posing a question: what are the words, the scenes, the shapes that may elicit you to pick up the phone and call home?

A telephone might evoke a certain sense of familiarity. In fact, we are so familiar with telephones that they have become an intrinsic part of us, lightly touching skin through the fabrics of pockets, or suspended in bags, purses, fanny packs hanging from shoulders, backs, waists. Depending on how old you are and where you come from, you might also recall the clunky device of the landline as being one of the epicenters of home. Static, immobile. Connected to the wall, wired to the table. In a corner, above the fridge, on a desk. Surrounded by sparse annotations and scribbled pieces of paper smudged blue or black or red or green or pencil-grey with barely recognizable, abstract drawings, numbers, names, dates, family recipes, cheesy poems. Connected to the telephone, which is connected to the wall. This is the beauty of the telephone and its familiarity—it brings us home, but also, to some extent, used to (and perhaps continues to) shape it. Stories and histories become interlinked, woven in an affective network of voice patterns travelling in cables.

A telephone might be thought of as something to evoke a sense of familiarity. Let us imagine a telephone again, but this time sitting on a desk in an empty room. How large is this room? How well-lit is it? Does it feel like an abandoned office room, or does it have that aseptic quality of white, buzzing lights? Perhaps we may change it a little—there is so much we do not know that we can take some liberties with it. We can maybe think of this telephone in a narrow booth instead, sitting atop a small, taller-than-usual table, with enough room for a single chair that is perhaps too short for the table in front of it. Maybe the chair wiggles a bit, one leg slightly shorter than the others. We can be a little creative, and picture this specific telephone in a dark shade of red, but most likely it would be of that quirky color between an off-white and a light grey, though yellowed by age and maybe a good dose of cigarette smoke. Its dialing pad feels a bit greasy to the touch, for each number might be smudged with longing, hope, dread, love, disappointment, grief. Most likely a mixture of them all, combined as if in a bittersweet potion. But today, this specific telephone, we can guess, is smudged with anxiety. The cold sweat that runs through the cable to the dialing pad to the table to the chair, to the body sitting in front of it.

In this telephone, this specific device of a weird color and of greasy touch, we can dial whatever number we want. It will only respond, however, to this sequence:

                        7 2 0 9 9,
                                   then #

If we look closely enough, we might even notice these numbers are fading out from the dialing pad. They feel a bit lighter to the touch, they might even give a tiny wobble when the fingertip rests on them, hesitating. Yet they work. They should, at least. This telephone, this specific device of a weird color and of a somewhat greasy touch, only exists insofar as this number can be dialed in. The room only exists as long as this telephone sits there. Connected to the wall, connected to the network. Seven, two, zero, nine, nine, hash. Silence. A short beep.

Now let us tell it a story. We have two minutes.

Wait, not just any story. There are protocols to it, and most likely we now see something in front of us, next to the device. A picture, probably flipped over so as to entice some curiosity. How come we missed it before? Who placed that piece of paper there, lost amongst all the bureaucracy that subsumes a life into a collection of papers showcasing incomplete stories about oneself? Forget about them, focus on the picture. We have two minutes to describe it.

Gloria Anzaldúa, Latina Chicana poet and writer, extensively discussed the evoking power of objects and the abilities conjured by the storyteller engaging with ritualistic artifacts. The “presentness” of a given object, when invoked through rite and story, she remarks, evinces that an object is “both a physical thing and the power that infuses it.”1 Objects are not static, and Anzaldúa’s writings serve as a way to remind us that there is nothing “new” in “new materialisms.”2 In non-Western cosmologies, objects are imbued with the power relationships that they contain, that they help construct, and that are evoked in them by the stories told within and around them. The end result is not a fixed thing in time and space (as Western aesthetics would presume), but instead “an assemblage, a montage, a beaded work with several leitmotifs and with a central core […] appearing [and] disappearing in a crazy dance.”3 The world changes whenever a story is told. The storyteller is a shapeshifter, and stories of and with objects are transformative.4 The nature of storytelling is repetition. The nature of repetition is unequal.

The ritualistic power of objects and their ability to transform both storyteller and listener is not exclusively confined to the evocative power of artworks. For Anzaldúa, that would be the essential distinction between the art of the colonizer and that of the colonized, because for the latter the art evoked into objects is not separated from everyday life. Power is embedded in the affective relationships between object and body, inasmuch as we inhabit different versions of our bodies whenever objects touch us. Put differently, affective infrastructures inhabit us, but at the same time they change the world around us, and this change shapes us back. Affective infrastructures are haptic, timbral, malleable, textured, archipelagic even.5

I am thinking of the power of stories, but I am also thinking of the power structures that weave an entire world in which affect is supposed to be removed from the act of telling a story.

If we think of these two minutes in which a story must be told, we find out that they do provoke change in the life of the storyteller, but not because of the intrinsic power contained in that specific story. Rather, because the story depends on this telephone, this specific device of a weird color and uneasy, perhaps greasy touch, on the other side of which sits not a listener desiring to be moved by it, but instead, a piece of software. An algorithm that will process the story, break it down into tiny snippets of audio, convert it from the time to the frequency domain, measure the power coefficients distributed in thirteen selected vectors, and match that to an existing dataset.6 It will eventually spit back a list of probabilities, distributed over a specific set of languages the listener is x or y percent confident the storyteller speaks. 63 percent Turkish or 22 percent Hebrew.7
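The chain of operations described above reads like a standard speech-processing pipeline. What follows is a minimal sketch of that generic technique, not the software used in any asylum procedure, whose internals are not described here. It assumes that the "thirteen selected vectors" correspond to thirteen mel-frequency cepstral coefficients per snippet of audio, and that "matching" means scoring those coefficients against one simple Gaussian model per language variety; both are assumptions. Every function name is hypothetical, the two "varieties" are synthetic tones rather than voices, and the percentages it prints say nothing about any real speaker.

```python
import numpy as np

def frame_signal(signal, frame_len=400, hop=160):
    # Break the recording into short, overlapping, windowed snippets.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters spaced evenly on the mel scale.
    to_mel = lambda hz: 2595.0 * np.log10(1.0 + hz / 700.0)
    to_hz = lambda mel: 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
    points = np.linspace(to_mel(0.0), to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * to_hz(points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            bank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            bank[i - 1, k] = (right - k) / max(right - centre, 1)
    return bank

def dct_matrix(n_out, n_in):
    # Unnormalised DCT-II basis, used to decorrelate the log filter energies.
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))

def mfcc(signal, sample_rate=16000, n_fft=512, n_filters=26, n_coeffs=13):
    frames = frame_signal(signal)
    # Time domain to frequency domain: power spectrum of every snippet.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # Keep thirteen coefficients per snippet (assumed reading of "thirteen").
    return log_energies @ dct_matrix(n_coeffs, n_filters).T

def fit_model(features):
    # Assumption: one diagonal Gaussian per variety stands in for "a dataset".
    return features.mean(axis=0), features.var(axis=0) + 1e-3

def avg_log_likelihood(features, model):
    mean, var = model
    ll = -0.5 * (np.log(2 * np.pi * var) + (features - mean) ** 2 / var)
    return ll.sum(axis=1).mean()

def language_posteriors(features, models):
    # Turn per-variety scores into a list of probabilities that sums to one.
    names = list(models)
    scores = np.array([avg_log_likelihood(features, models[n]) for n in names])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return dict(zip(names, probs))

def toy_recording(freq_hz, seed, sample_rate=16000):
    # Synthetic stand-in for a voice: one second of a noisy tone.
    rng = np.random.default_rng(seed)
    t = np.arange(sample_rate) / sample_rate
    return np.sin(2 * np.pi * freq_hz * t) + 0.05 * rng.standard_normal(sample_rate)

models = {
    "variety A": fit_model(mfcc(toy_recording(300, seed=0))),
    "variety B": fit_model(mfcc(toy_recording(2000, seed=1))),
}
story = mfcc(toy_recording(300, seed=2))
posteriors = language_posteriors(story, models)
for name, p in posteriors.items():
    print(f"{name}: {100 * p:.1f} percent")
```

Nothing in this sketch attends to what is said. The two minutes are reduced to a grid of numbers, thirteen per snippet, and the difference between two of the per-variety scores is what is usually called a log-likelihood ratio.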

Log Likelihood Ratio.

That the processing of the story implies some form of listening, human or machinic, does not necessarily mean that the affective power yielded by the contents of the story matters. On the contrary, the power lies in the act of connecting the picture with the speaker with the telephone.

This is not any ordinary telephone, we now begin to realize—not least because of its weird color and greasy touch. This is a device that is supposed not to connect the speaker with home, but instead evoke a sense thereof. This is a device that creates a sitting body longing for home. The telephone that connects this speaker home does not sit on a desk like this one, but, much like themselves, survives the journey that leads there. The speaker, we learn, is a very specific person, defined as such by the infrastructures of borders, geopolitics, and war. Connected to the wall, connected to the network connected to the system. The autonomy over the (sounding) self is temporarily hijacked by this bond. Speaker and telephone, speaker and infrastructure, they co-constitute one another, defined and bonded by the power relationships that inhabit body, voice, telephone, algorithm, policies, trade agreements, governments, colonialism, history.

Their name, a number. Their identity, a placeholder: “asylum seeker.”

In sociolinguistics, the analysis of a dialect raises a series of ethical problems that bear on how, why, and when speakers are elicited to speak more “naturally,” i.e. as they would in their everyday lives. For a piece of software specifically calibrated to attend to dialect elicitation, it is also expected that the speaker should provide the most “natural” account of their prosody, pronunciation, rhythm, and vocabulary. Miriam Meyerhoff et al. argue that speakers tend to adapt their speech when they know they are being recorded, and the omission of such awareness exacerbates the power relationships at play between researcher and researched.8 When the software omits the figure of the “researcher” (replacing it, say, with a telephone), the presence of a recording device might be, at least visually, obscured. The authors further discuss one technique employed in sociolinguistic research, which consists of recording conversations between two or more speakers who are familiar with one another, without the physical presence of the researcher in the same environment.9 Furthermore, whereas sociolinguistic research implies consent (written or otherwise), in the German asylum system the accent recognition test, whether conducted via interview or via software, is compulsory when no other “material proof” of the applicant’s origin can be presented.10 The power relationships, in this case, are neither broken nor obscured. Rather, they become hypervisible.

The German Federal Office for Migration and Refugees (BAMF) began experimenting with accent recognition software by the end of 2017, allegedly after the disastrous case of neo-Nazi German soldier Franco A., who managed to trick the asylum system and was granted temporary asylum as David Benjamin, a Jewish Syrian. His plan was to commit a series of terrorist attacks on German public figures and attach the violence to his temporary identity.11 The software was implemented in 61 of the reception centers (Ankunftszentrum) and external offices (Außenstellen) in Germany. The databases for speech and accent recognition (speech corpora) were bought from the University of Pennsylvania in the United States.12 For the first couple of months, its capabilities were specially focused (or, one may say, tuned) on speakers of Levantine, Egyptian, and “Gulf” variations of Arabic. By April 2018, this software had been deployed 9,883 times in asylum cases, and the BAMF, in response to an inquiry into the software’s effectiveness, reported the error margin to be 20 percent, with promises of improvement.13 Browsing the University of Pennsylvania’s long list of speech corpora, we find that the most popular databases amongst the Arabic dialect variants are named CALL FRIEND and CALL HOME.14

Franco A. never spoke a word of Arabic in his hearing. He answered his questions in French, and apparently also in German.15

Now the picture might be getting a bit clearer: we are talking here about form over content. Power is embedded in the imbalance between these two correlated aspects, present in the contingent link formed between the importance of content for the teller and the relevance of form for the listener. The story now is that of the endless infrastructural reach opened up by the articulation of a phoneme. Lives hang by this articulation, fates predicated on this unbalanced relationship. The error margin, we know, is 20 percent.

The list of ethical problems that may arise from the attempt to make people “speak naturally” is endless. From the moment the human component is removed from this already very uneven power relationship, it is implied that relying on software-automated decisions yields a certain degree of “neutrality” and “objectivity,” and the ethical problems might temporarily be left aside. Put differently, the error margin is 20 percent, but at least, so they say, there is no danger of human emotion getting entangled with an objective account from the listener’s side, which might pave the way, allegedly, for emotions to be elicited from the speaker’s side.

Tell me: what are the words, the scenes, the shapes that elicit you to pick up the phone and call home?

A telephone might be used as something to evoke an uneasy feeling of familiarity. According to Meyerhoff et al., “topics where speakers can get emotionally involved” tend to be extremely successful in evoking dialects, and they emphasize the use of techniques such as “the danger of death”:

“‘Have you ever been in a situation where you nearly lost your life? When you thought this is it?’ Answers to this question usually require some emotional engagement, and it may trigger stories with an abundance of vernacular features (Labov 1972a; 1984). However, it does not necessarily work in all speech communities and for all individuals […] One of the speakers who contributed to the Bequia corpus (Meyerhoff and Walker 2007) started to cry after answering this question […]”16

It seems this method might no longer be used in sociolinguistic research.17 The “danger of death” question is not only highly problematic from the point of view of research ethics, but also extremely inconsiderate of the subjects’ emotions and of the psychological consequences of eliciting trauma. For survivors of violence, abuse, and war, elicitations like these can have long-term consequences.

As we turn over the picture placed on the table in front of us, and take a good look at it, we find out it depicts a family having a meal together at home.

Anzaldúa reminds us that an image “is a bridge between evoked emotion and conscious knowledge.”18 Can you imagine the most trivial of situations being weaponized as an encapsulation of the unattainable?

A telephone might be instrumentalized to remind an asylum seeker that “home” is always elsewhere. Home, it seems, is forever out of reach.

We now have two minutes to describe a picture into this telephone that sits in front of us. Connected to the wall, connected to the network, connected to the system. Connected to the border.



