Why is listening in a new language so hard?

Gianfranco Conti and I are embarking on a second edition of our best-selling handbook Breaking the Sound Barrier: Teaching Language Learners How to Listen (2019). We wrote the book to show how teachers can go beyond teaching listening through traditional comprehension exercises. In so doing, we were strongly influenced by the work of John Field, whose 2008 book Listening in the Language Classroom questioned the so-called comprehension model (a product-based model, sometimes called teaching by testing) and proposed a process model, which Gianfranco Conti calls listening-as-modelling.

Catching up with studies over the last six years is a reminder to me that research on how to teach listening is out there, but it has remained a relatively neglected area. What research there has been has tended to focus on metacognition - strategies for helping learners be better listeners. Field's process model, focused more on decoding skills, has not been followed up to a huge degree, which is a shame. Researchers are missing out on some interesting work happening in schools in England and elsewhere, partly driven by our book and Gianfranco Conti's blogs and professional development days.

Using material from the first edition of the book, this post summarises why second language listening is so difficult in the first place. In the next post I'll start to suggest ways teachers can help learners overcome the challenges.

Introduction

Listening in a second language (L2) is widely recognised as one of the most cognitively demanding skills for learners to master. It involves real-time processing of fast, fleeting speech, which leaves little room for error or delay. The challenges are generally split into cognitive and affective domains, both of which significantly affect learners’ ability to develop effective listening skills. Socio-affective factors also come into play. Let us look at these areas.

Cognitive Challenges

1. The transient nature of aural input

One of the most immediate cognitive barriers to listening is the transient nature of spoken language. Sounds disappear almost instantly — within two seconds — and are replaced by new input, making the retention and processing of information difficult. In first language (L1) contexts, this rarely presents a problem because native speakers handle such processing automatically. However, for L2 learners, especially those at lower levels, this is a major obstacle. L2 listeners must consciously process incoming language, which taxes their working memory far more than it does for native speakers. This heightened cognitive effort increases the risk of overload and comprehension failure.

2. Limited working memory

Human working memory (short-term memory) is limited, typically to about three to five pieces of information at a time. (Think how quickly you forget the steps in a recipe when cooking.) This constraint becomes significant when students attempt to understand language in real time. If learners process each word separately (e.g. “dog,” “have,” “nice”), their capacity is quickly maxed out. But if they can process chunks of language (e.g. “I have a nice dog”), they can handle more input at once. This is why frequent and familiar word combinations (multi-word units or chunks) are easier to remember and understand. Unfortunately, novice learners often lack sufficient automatic chunking ability and instead rely on conscious, word-by-word decoding, which consumes limited cognitive space and reduces their ability to use broader listening strategies.

3. Assimilation in connected speech

Another difficulty arises from assimilation, a natural feature of spoken language. Words often sound different in connected speech than they do in isolation due to the influence of neighbouring sounds. For instance, the Spanish phrase “un barco” may sound like “um barco” because of the assimilation of the /n/ into an /m/ sound before the bilabial /b/. In English, similar phonological changes occur frequently, altering the surface form of words and making them hard to recognise for L2 listeners. Recognising such forms requires targeted practice and training.

4. Simultaneous processing of form and meaning

A major limitation of working memory is that it cannot easily process both form and meaning at the same time, especially when learners are under cognitive strain. In a typical classroom, comprehension questions test students’ understanding of meaning but do little to teach them how to recognise and process sound, grammar, and syntax. This undermines listening as a learning activity and reduces its potential to help students internalise language structures. The shift, following the Field model, should be from merely testing comprehension to actively modelling how listening works: teaching learners HOW to listen.

5. Comprehensibility threshold

Research by Nation and Newton (2009) highlights another critical point: for L2 input to be truly useful, it must contain at least 95–98% comprehensible words. If input falls below this threshold, learners rely too heavily on top-down strategies (guessing based on context or prior knowledge), which may help them survive the task but doesn’t promote deeper language learning. Comprehensibility, therefore, must be a guiding principle when designing listening tasks.
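For teachers who like to check a transcript before using it, the threshold idea can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: it counts running words, treats a word as "known" if it appears in a list the teacher supplies, and ignores inflections and multi-word chunks. The function names, the sample sentence and the 0.95 default are illustrative choices, not something taken from Nation and Newton.

```python
import re


def lexical_coverage(transcript: str, known_words: set) -> float:
    """Return the share of running words in the transcript that are in known_words.

    Assumption: a "word" is any run of letters, optionally joined by apostrophes,
    compared in lower case. Inflected forms are NOT mapped to a base form.
    """
    tokens = re.findall(r"[^\W\d_]+(?:['’][^\W\d_]+)*", transcript.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)


def meets_threshold(transcript: str, known_words: set, threshold: float = 0.95) -> bool:
    """Check the transcript against a coverage threshold (0.95 is an assumed default)."""
    return lexical_coverage(transcript, known_words) >= threshold


# Hypothetical example: a short Spanish text and a class's known-word list.
text = "Tengo un perro. Es pequeño y negro."
known = {"tengo", "un", "perro", "es", "pequeño", "y"}

coverage = lexical_coverage(text, known)
print(f"Coverage: {coverage:.0%}")
print("Meets threshold:", meets_threshold(text, known))
```

Here six of the seven running words are on the list, so coverage is about 86% and the text falls short of the 95% mark. A real check would need a more careful definition of a "known" word, for example whether "perros" counts once "perro" has been taught.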

6. Noticing

According to Schmidt’s (1990) Noticing Hypothesis, language acquisition begins when a learner notices a new linguistic feature. However, not all language features are equally noticeable. While salient items like key nouns or verbs may be relatively easy to pick up, less prominent features such as grammatical endings or function words (like the, this, that, but) are more likely to be missed — especially during listening, where cognitive load is higher than in reading. This uneven salience means that certain crucial elements of language may never be noticed or acquired without targeted attention and repetition.

Affective Challenges

1. Listening anxiety

Research suggests that listening is often the most anxiety-inducing skill for language learners. This is partly because it is usually treated like a test, where success is judged by how many questions a student gets right. The stress of being evaluated increases cognitive pressure and reduces performance. Furthermore, in many listening activities, the input comes from disembodied voices in recordings, which adds to the feeling of disconnection and discomfort.

2. Self-efficacy

Closely tied to anxiety is the notion of self-efficacy, or the belief in one’s ability to succeed at a task (Bandura, 1997). In language learning, self-efficacy is a strong predictor of future success. Learners who feel competent and confident are more likely to persevere and improve. Graham (2007) stresses the importance of building self-efficacy from the very beginning, which requires careful design of learning experiences.

Importance of Visual and Social Context

Human communication is inherently visual and interactive, yet many classroom listening activities rely solely on audio recordings. This can create a sense of unnatural distance for learners. Seeing a speaker — their gestures, facial expressions, and body language — provides important comprehension cues and reduces anxiety. Teachers who provide the input themselves can control delivery speed, emphasise important words or structures, and support understanding with gesture and expression.

Research (e.g. Huang, Kim, & Christianson, 2019) confirms that gesture enhances comprehension, especially for learners at early stages. Teachers can play a critical role in making language input more accessible and reassuring through their presence and performance.

Conclusion

Listening poses a unique set of challenges in second language learning, combining both cognitive limitations and affective pressures. Learners must process fleeting, assimilated speech using limited working memory, all while attempting to extract both form and meaning. Many features of spoken language, especially subtle grammatical elements, are easily missed under this pressure.

At the same time, the emotional burden of listening tasks — often perceived as high-stakes tests — undermines learner confidence and motivation. Supporting self-efficacy, ensuring comprehensible input, and reducing anxiety through appropriate task design and delivery methods are all essential steps in helping learners succeed.

Ultimately, the goal should be to move away from treating listening as merely a test of understanding and instead to model how listening works, teaching learners how to notice, process, and internalise the sounds and patterns of their new language. By addressing both the cognitive and emotional sides of the challenge, teachers can make listening a more effective and rewarding skill for all learners.

The next post will start to address the issues above.

References

Bandura, A. (1997). Self-efficacy: The Exercise of Control. New York: W. H. Freeman/Times Books/Henry Holt & Co.

Conti, G. and Smith, S. P. (2019). Breaking the Sound Barrier: Teaching Language Learners How to Listen. Independently published.

Field, J. (2008). Listening in the Language Classroom. Cambridge: Cambridge University Press.

Graham, S. (2007). Learner strategies and self-efficacy: making the connection. Language Learning Journal, 35(1).

Huang, X., Kim, N. and Christianson, K. (2019). Gesture and vocabulary learning in a second language. Language Learning, 69(1), 177-197.

Nation, I. S. P. and Newton, J. (2009). Teaching ESL/EFL Listening and Speaking. New York: Routledge.

Schmidt, R. (1990). The role of consciousness in second language learning. Applied Linguistics, 11(2), 129-158.


Language Teacher Toolkit: Steve Smith's blog
