So I shall summarise some points referred to in Chapter 7 of Nation and Macalister (2010), adding a few observations of my own.
What is good assessment?
Assessment needs to be reliable, valid and practical. Let's look at these three aspects:
A reliable test gives results which are not greatly affected by conditions the test was not intended to measure. If the same person sat the test twice, you would expect them to get more or less the same result. A test is more reliable if it is always given in the same conditions, is marked consistently, has as many questions as is practical, and has questions and instructions that are as clear as possible. An unreliable test cannot be valid.
(I would add a couple of other issues here. One very obvious one is that pupils should not have the opportunity to copy each other. If classroom layout means pupils are seated close to one another, the temptation to cheat is too great for a good number of students, so steps need to be taken to ensure it is virtually impossible to see a neighbour's work. My solution was to have pupils place a bag in the middle of their shared table so that it was extremely difficult to see their partner's work.
A second point is that the type of question has a significant influence on reliability. So-called objective questions such as true/false or multiple-choice are inherently more reliable than essay-style questions or oral tests where a level-based mark scheme has to be applied. If you set free writing, for example, you need to decide precisely what you are assessing and apply any mark scheme as consistently as possible. Research shows that this is not easy, as teachers bring their own biases to the marking of pupils' work. As far as speaking and writing are concerned, what exactly are you assessing? Accuracy? Relevance to the question title? Pronunciation? Range and complexity of language?)
A valid test measures what it is supposed to measure. A valid achievement test measures what has been learned on the course, not language to which students have not been exposed. A valid listening test measures skill at listening. To ensure this, teachers need to consider face validity: if it's a test of reading, does it look like one? If it's a vocabulary test, does it actually test the spelling and meaning of words and phrases? But as well as face validity, we have content validity: when you analyse a test, does it actually test what it's supposed to? For example, if you wish to test reading or listening alone, it would be unwise to include questions which require written target language answers, since these involve knowledge of writing as well as reading.
(I would add that this presents a challenge. If you decide to separate out the four skills and test each individually, this may end up having an effect on the way you teach, owing to what's called the backwash effect (aka washback). Suppose you want to test listening by asking questions in English (so as to avoid using target language writing); the danger is that, in the run-up to the test and in other prior lessons, you use English questions when doing listening lessons. This, in turn, reduces the amount of target language (comprehensible input) you use in lessons. In this case the backwash effect is negative, since we know that maximising comprehensible input is likely to lead to more acquisition.
On the other hand, if you design assessments to reflect the way you would like to teach, then the backwash effect of tests can be positive. If you believe that teaching is best when skills are integrated in lessons, with listening reinforcing reading, reading reinforcing speaking, speaking reinforcing writing, and so on, then a good test would reflect this practice by including mixed-skill assessment, e.g. a TL text with questions in the TL.
This has important implications when we look at GCSE and A-level in England, Wales and Northern Ireland. At GCSE, Ofqual/DfE decided long ago that, in general, we need to test the four skills separately. Yet they also reduce the validity of the tests by including TL answers in listening and reading tests. At A-level, mixed-skill testing has long been taken for granted.
The backwash effect is very powerful and encourages schools, for example, to use GCSE-style assessment even as early as Y7, which I find very unwise. These issues need to be kept in mind when designing unit tests and end-of-year exams.
In sum, I would argue for designing tests to resemble well-conceived lesson activities. You then end up with positive backwash and fairness, as students are asked to do well-designed tasks with which they are familiar. The test thus becomes a natural extension of the teaching. Remember too that research clearly shows students do better on tests whose form resembles what they have previously done in class.)
When you consider practicality you need to look at factors such as cost, time taken to sit the test, time needed to mark it, the number of people needed to mark it and the ease of interpreting the results.
Practicality is aided when you can reuse tests year on year, or when a course book provides well-designed unit tests. Tests are more practical when they can be quickly marked, e.g. true/false or multiple-choice, as opposed to composition or question-and-answer. But keep in mind that reliability and validity should take precedence over practicality. A multiple-choice test is quick to mark and may be reliable to a limited extent, but it certainly lacks validity in important areas.
(I would add that one way in which practicality comes in is to do with speaking tests. If you spend too much time on these during the year you risk sacrificing other areas of teaching. One formal oral assessment a year is probably enough, as long as you are assessing oral performance informally the rest of the time. Some might argue that a formal oral assessment is not worth doing at all in the early years since it is stressful and you already have enough information from lessons to assess spoken ability. But my experience is that you need at least one formal spoken assessment a year for the benefit of quieter, less confident learners who are reluctant to perform in normal classroom tasks.)
In conclusion, as Nation and Macalister point out: "Assessment also contributes significantly to the teacher's and learner's sense of achievement in a course and thus is important for motivation. It is often neglected in curriculum design and courses are less effective as a result" (p.120). So I conclude with these few general pointers:
- There's nothing wrong with testing - it provides opportunities for review and retrieval.
- Design a regular programme of low-stakes assessments covering all the skills in an integrated way, as far as possible.
- If you decide to test skills discretely (separately), don't let this have negative backwash on lesson design.
- Only test what pupils have learned, keeping input comprehensible.
- Don't spend too long on tests.
- Make sure tests can be marked quickly.
- Record results carefully to enable you to have an idea of a student's progress.
- Make sure you use test results and analysis to help guide your future teaching.
- Change tests if they are found not to work, e.g. they are too easy, too hard, or unclear.
- All classroom activities, including tests, involve retrieval practice - if a test is just like a regular classroom task it will be less stressful and produce positive backwash.
- If the course book tests need adapting, do so, e.g. by adding more repetitions of audio tracks or reading them aloud.
- When setting and marking end of year exams, spread the workload fairly across the department.
- If you decide to do regular vocabulary testing, remember what research says about vocabulary acquisition - learning isolated words in written form has severe limitations. "Knowing words" is much more than knowing what they look like and what they mean.
- Raising the status of tests can have a motivating effect (many pupils revise hard for them), but keep things in proportion!