Multimedia learning


Multimedia learning assumes:

We process visual and auditory information (pictures and words) in separate channels, linking with Paivio's Dual Coding Theory. We have a limited capacity for processing the words and pictures we receive, linking with Sweller's Cognitive Load Theory. We process information by selecting and organising it into a mental representation and integrating that representation with our existing knowledge, linking with Fiorella and Mayer's Generative Learning Theory.

Image taken from Mayer's Multimedia Learning 3rd edition, p40.


The principle that began the development of Multimedia Learning (the multimedia principle) is that we learn better from words and graphics (pictures) than from words alone. This principle has a very large effect size of 1.35, though this depends on the type of pictures used and how they are used. The impact of using words and pictures together is potentially greater for learners with less existing knowledge of the content area, as those learners may need support in creating a mental image of what is being learnt.


Multimedia learning employs a knowledge construction view, holding that for something to be truly learnt, it must not only be remembered but also understood and transferable to new situations.

In an information acquisition view, learners are passive receivers of information: empty vessels that receive whatever information the teacher gives them, and the teacher's job is to present that information. When learners receive the information, it is added to their long-term memory. Multimedia learning suggests that this view casts the learner in too passive a role.

The response strengthening view is the original view of learning that psychologists developed. In this view, there is an association between a stimulus (a question about the intended learning) and a response (the learner's answer, demonstrating that they remember the material). The learner is passive and, depending on their answer, receives either a reward or a punishment, which strengthens correct responses. It is the job of the teacher to give rewards and punishments based on the learner's responses. This is too simplistic a view for multimedia learning.

With a knowledge construction view, learners are active: they have to make sense of the information to be learnt, then build a mental representation of that material and integrate it with what they already know. This process will be different for each learner, so their knowledge is personally constructed. In a knowledge construction view, the job of the teacher is to guide learners in how to process the materials (by deciding how to present them and how they hope learners will make sense of them) and to help them do so throughout the learning process. This view is consistent with multimedia design that focuses on aiding learners' cognition rather than on how to use the features of the available technology.


The coherence principle: Remove extraneous information and learning will be improved. The extraneous information to exclude can be split into 3 principles. Firstly, interesting but irrelevant words and pictures. Excluding these has a very large effect size of 1.27 and is most effective for learners with low working memory capacity. Secondly, interesting but unneeded words and symbols. Excluding these has a medium effect size of 0.70 and is most effective for learners with low prior knowledge. Thirdly, interesting but irrelevant background music. Excluding this has a large effect size of 0.96 and, as with the first principle, is most effective for learners with low working memory capacity.

One way to understand what to exclude is to think of everything covered by the 3 coherence principles as seductive detail (detail added to make material that may seem dull more interesting). It is seductive because the learner is likely to remember this detail rather than the essential learning. Less is more. Indeed, the impact is greater if learners first construct a mental image from a summary of what is to be learnt, before it is elaborated on with further essential detail.


The signalling principle: Add cues to signal the organisation of the essential material and learning will be improved. By using these signals, extraneous processing (which hinders learners' creation of a mental image) can be minimised.

Verbal signalling is adding cues to spoken or written words. It has a medium effect size of 0.69 and is stronger for learners with lower reading skill. If the teacher overuses signalling, its impact can be diluted. There are 3 types of verbal signalling.

Firstly, classic signalling, which in turn has 3 types: outlines, headings and pointer words. An outline cue is a summary sentence of the essential material given at the start of a lesson. Heading cues are phrases or sentences that summarise the essential material in the section about to be introduced. Pointer words, such as ordinal numbers (first, second, third), structure the essential material. Secondly, a spatial outline is a graphic organiser (such as those on this page), which arranges key words in physical space, showing how ideas nest within one another or the paths they follow. Thirdly, highlighting means either saying key words louder and slower to give them vocal emphasis, or making key words bold or coloured (usually red). Key words are therefore 'highlighted'; this does not refer to the teacher using a physical or onscreen highlighter (though colouring words is similar). However, there is currently insufficient evidence on the impact of highlighting, although vocal emphasis appears to aid recall without helping transfer to new contexts.

Visual signalling is adding cues to visual content, usually onscreen. It has a medium effect size of 0.71, provided that any gestures used are specific rather than general. There are 2 types of visual signalling.

Firstly, colour coding, in which key components of what is being viewed change colour. This needs to happen at the same time as they are being described by the teacher or by an onscreen representation of a teacher (an onscreen agent). Secondly, specific pointing gestures by an onscreen agent (or by a teacher in the room), such as pointing with a finger or an onscreen arrow. These gestures signal the organisation of the essential material, for example by guiding learners through the stages of a diagram. A general pointing gesture (e.g. pointing at the diagram as a whole) makes the cue ineffective.


The redundancy principle: When graphics are accompanied by narration (onscreen or by the teacher), avoid adding printed text that is identical to, and therefore redundant with, that narration. Learners carry out additional, unnecessary processing in the visual channel when they compare the spoken (narrated) text with the identical written text.

The redundancy principle has a medium effect size when the narration is fast-paced; otherwise the effect size is small to negligible. However, there are 3 boundary conditions under which the printed text creates less extraneous processing, making the redundancy principle less impactful in the following cases.

Firstly, if the printed text is cut down to just key words and placed next to the graphics they relate to. Secondly, if there is identical printed text but no graphics at all, and the narration is chunked into short parts. Thirdly, if the words in the printed text are unfamiliar or in a second language.


The spatial contiguity principle: Integrate words with pictures on the page or screen rather than keeping them separate. This principle suggests that processing words and pictures that are physically separated adds to extraneous processing, linking with cognitive load theory's split-attention effect. It has a stronger effect on transfer than on recall of knowledge.

There are 3 main boundary conditions:

Firstly, the effect is stronger when the materials being presented are more complex.

Secondly, the effect is stronger when the diagram cannot be understood without the addition of words.

Thirdly, the effect is stronger in learners with low knowledge of the area being studied - importantly, the effect is reversed in learners with high knowledge.


The temporal contiguity principle: Present corresponding words and pictures at the same time rather than one after the other. This principle suggests that processing words and pictures at different times adds to extraneous processing. It has a very large effect size of 1.31.

There are 3 main boundary conditions:

Firstly, the effect is stronger when the materials being presented are more complex.

Secondly, the effect is weaker when the materials are split into short, manageable segments. For example, for material with 8 segments, rather than narrating all 8 segments and then showing the pictures/animation for all 8, the narration for segment 1 is followed by the pictures/animation for segment 1, then the narration for segment 2 is followed by the pictures/animation for segment 2, and so on.

Thirdly, the effect is weaker when learners can control the pace of the lesson, for example, where the lesson is recorded and watched independently on a digital device.


The segmenting principle: Split instruction into segments rather than presenting it as one continuous sequence. The more complex the materials, the more impactful segmenting becomes, particularly when they are presented at a fast pace. Importantly, the effect size depends on whether learners control when pauses occur and when instruction restarts; when they do, the effect size is a medium 0.67.


The pre-training principle: Pre-train (pre-teach) learners in the names and characteristics of the main concepts. This avoids overloading essential processing (linking with cognitive load theory). The effect size is a medium-to-large 0.78 when the materials being learnt are complex, the instruction is fast-paced and learners have low background knowledge.


The modality principle: Learning is more effective when essential material is presented as pictures and spoken words rather than as pictures and written words.

The modality principle has a large effect size on transfer tests when the material is complex and fast-paced and learners are already familiar with the words. However, written words may be more appropriate if the lesson contains technical words, if learners are working in a second language, or if the lesson is learner-paced.


The personalisation principle: Use a personal, conversational style during instruction rather than a formal, third-person style.

Using personal pronouns such as 'you' and 'we', in a polite rather than direct style, enables generative processing to occur. It is most effective when lessons are short and learners have little existing knowledge of what is being learnt. However, the impact is lessened when personalisation is taken too far.


The voice principle: When using a digital platform for instruction, such as pre-recorded videos or an adaptive package with a machine voice, ensure that the voice matches any character or person on screen. Broken social cues, such as a voice that does not seem to 'fit' the character, remove the benefit of the voice principle.


The image principle: Static images of a teacher or instructor do not help learners learn better. Only a small effect size of 0.20 was found for static images of the instructor. However, if the image has human-like features such as eye movement, gestures or specific pointing, the impact may be greater.


The embodiment principle: When an onscreen instructor shows high embodiment, learning can be deeper. Examples of high embodiment are specific pointing, drawing while explaining, and a first-person rather than third-person perspective. This yields an effect size of 0.58. However, if social cues are broken (i.e. the learner does not see the onscreen instructor as having human-like qualities), the impact is reduced or lost entirely.

The immersion principle: 3D immersive virtual reality environments do not aid learning more than 2D presentations. The effect size of using 3D virtual reality was -0.10, a negligible negative effect. Part of this limited effect may be related to learners' experience of using the environment: it is hypothesised that learners may focus more on understanding the environment than on the learning until they become sufficiently experienced.



The generative activity principle: Generative activities promote generative learning (learning that needs to be generated by the learner). Across the 8 activities studied, the average effect size is 0.71. The impact was greatest when learners were supported or scaffolded in carrying out the activity and the activity's cognitive load was kept to a minimum.
