Key concepts in assessment - video transcript

My name is Greg Benfield. In this presentation I'm going to talk you through some key theoretical ideas about assessment and feedback relevant to university teaching. I leave it to you to think about how to interpret these ideas in relation to your context, by which I mean the subject you teach, the department and institution you are in, and whether your course is delivered primarily by traditional face-to-face methods or substantially online.

These are the intended learning outcomes for this session. This session is really a foundation for a second session that examines some topical issues in assessment and feedback and focuses on some practical issues for teaching.

There is an extensive bibliography on assessment and feedback in the course VLE site.

These quotes express a fundamental understanding about the role of assessment in student learning. We know from our experience and from extensive research on the issue that strategic students will pay careful attention to those aspects of the course that they know will be assessed and are likely to ignore those aspects they believe will not be assessed. One of the conclusions that we can draw from this is the one in the third quote: assessment can be an incredibly powerful lever for changing not only what, but how, students learn.

There are several important theoretical concepts in the domain of assessment. One of these is the idea of assessment validity. Put simply, an assessment tool or task can be considered valid if it measures what it purports to measure. However, achieving assessment validity can be problematic. Certain tools or techniques may be inappropriate for measuring some learning outcomes. For example, a multiple-choice test, which only allows students to select responses from a set of predetermined choices, cannot assess things like creativity. Sometimes the issue is less clear-cut. For example, although in theory it is possible to test problem-solving ability in an examination, frequently it is not problem-solving ability that is tested in examinations but students' capacity to correctly identify the features of a problem they have solved before and correctly apply the relevant procedures to this version of it. Not so much problem-solving, then, as recall and comprehension. These quotes by educational researchers express these ideas as a problem inherent in the system of higher education, a long-term problem that we are still grappling with. Constructing valid assessment is both important and tricky, and for the assessment designer it requires absolute clarity about the intended learning outcomes that are to be assessed.

To follow through on this idea I want to remind you of the principle of constructive alignment in course design. Constructive alignment was formulated by John Biggs. It is founded on constructivist notions of how students learn, and the basic idea is very simple: the intended learning outcomes, the activities that students do in order to learn those outcomes, and the assessments that help to judge whether they have learned those outcomes should all be mutually supportive, or well aligned.

In short, constructive alignment is about three-stage course design. First, be clear and explicit about the desired learning outcomes. Second, design learning activities for students to do that inculcate the student behaviours you are trying to encourage and will be sufficient for them to learn the intended outcomes. And third, design assessment tasks that are both valid for assessing the desired outcomes and likely to encourage students to engage with the desired learning activities.

Another important theoretical concept in assessment is that of reliability. Reliability concerns the extent to which assessments are consistent. At a very simple level this should mean that if you design a quiz to measure students' ability to solve, say, trigonometric equations, things like the time at which the test is administered, its location, and who marks it should make no difference to students' performance. Furthermore, the test should be internally consistent, which is to say that if a student gets a particular item correct then they should also get other equivalent items correct as well.
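As a brief aside for the quantitatively minded, internal consistency can actually be measured. The short Python sketch below is my own illustration rather than anything from the talk; it computes Cronbach's alpha, a standard internal-consistency statistic, assuming quiz results are held as a students-by-items table of 1s and 0s (the function name and data are invented for the example).

    import numpy as np

    def cronbach_alpha(scores):
        # scores: rows = students, columns = quiz items (1 = correct, 0 = incorrect).
        # Alpha approaches 1 when items vary together, i.e. when a student
        # who gets one item right tends to get the equivalent items right too.
        scores = np.asarray(scores, dtype=float)
        k = scores.shape[1]                          # number of items
        item_var = scores.var(axis=0, ddof=1)        # variance of each item
        total_var = scores.sum(axis=1).var(ddof=1)   # variance of students' totals
        return (k / (k - 1)) * (1 - item_var.sum() / total_var)

    # Five students' results on a four-item trigonometry quiz.
    marks = [[1, 1, 1, 0],
             [1, 1, 0, 0],
             [0, 0, 0, 0],
             [1, 1, 1, 1],
             [0, 1, 0, 0]]
    print(round(cronbach_alpha(marks), 2))  # 0.8, reasonably consistent

A low alpha on such a quiz would suggest the items are not really measuring the same underlying ability, which is exactly the kind of inconsistency the reliability concept is trying to capture.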

A big problem in higher education concerns the notion of inter-marker reliability. Much of our marking involves expert judgments that are, to a greater or lesser extent, subjective. Frequently, student work in the same course is graded by multiple assessors. As you can see from these quotes (and I should say that, despite their age, the problem continues to this day), the measured variations between marks given by individual assessors on the same piece of work can be extreme. In fact, the Newstead and Dennis study showed greater variation in marks by external markers than by internal members of the department.

One of the mechanisms we use to manage this problem is internal moderation of marks. Usually a sizeable sample of student work, often somewhere between 10 and 20%, is taken and shared around for others to mark. If the two markers agree, fine; if not, they discuss the reasons for their differences and try to reach agreement, sometimes appealing to a third marker.
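As another brief aside, here is one way the agreement between two markers on a moderation sample can be quantified. The Python sketch below is again my own illustration, not part of the talk; it computes raw agreement and Cohen's kappa, a chance-corrected agreement statistic, assuming grades are recorded on a simple letter scale (the function name and sample grades are invented for the example).

    from collections import Counter

    def marker_agreement(marks_a, marks_b):
        # Raw agreement and Cohen's kappa (chance-corrected agreement)
        # between two markers grading the same scripts.
        n = len(marks_a)
        observed = sum(a == b for a, b in zip(marks_a, marks_b)) / n
        freq_a, freq_b = Counter(marks_a), Counter(marks_b)
        # Agreement expected if each marker graded independently at random,
        # keeping their own observed grade frequencies.
        expected = sum(freq_a[g] * freq_b[g] for g in freq_a) / (n * n)
        kappa = (observed - expected) / (1 - expected)
        return observed, kappa

    # A moderation sample of ten scripts graded by two assessors.
    first  = ["A", "B", "B", "C", "A", "C", "B", "A", "C", "B"]
    second = ["A", "B", "C", "C", "A", "B", "B", "A", "C", "B"]
    observed, kappa = marker_agreement(first, second)
    print(f"raw agreement {observed:.0%}, kappa {kappa:.2f}")  # 80%, 0.70

The point of the chance correction is that two markers using only a handful of grade bands will agree some of the time purely by accident, so raw agreement alone overstates how consistent they really are.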

Now, this moderation process is a formal one that seeks to provide students with a fair and transparent assessment process. But it also contributes to the development of what we might call an assessment community of practice within a department, in which regular discussion about student work and about standards contributes to a shared understanding of standards within that community. So, although we are unlikely ever to entirely remove subjectivity and variation between individual markers, there are a number of studies that show surprisingly high levels of inter-marker reliability in departments that support high levels of informal discussion about student work. Or to put it another way, departments whose members talk to each other, especially about learning and teaching, are far more able to develop shared assessment standards than those with relatively impoverished interactions between their members.

By the way, there is an important implication here for student learning, which is that students are only likely to come close to the understandings of assessment standards that their teachers hold through much the same mechanisms as their teachers use to achieve shared understandings of standards. They need to be actively involved in assessment processes, especially as markers: for example, marking exemplar pieces of work using assessment criteria, or giving feedback on the work of their peers. I discuss this a bit more in the second talk.

Finally, let me make a brief comment about one last important concept in assessment. This is the notion of fairness. Reliability is related to the idea of fairness, because an assessment would clearly be unfair if two students with the same level of achievement obtained different marks. More generally, assessments need to be fair in the sense that they should discriminate only on the basis of students' ability in the tasks in which they are being assessed. It is unfair if things like cultural background, language background, disability, economic and social background and so on can substantially affect results on a test or assignment. That said, it's really not that simple or black and white. For example, it may be perfectly appropriate to treat grammatical errors more leniently if they are made by a non-native English speaker than by a native speaker.

Okay, that's the end of this session. As I mentioned at the start, a second session on assessment deals with some current issues in assessment and feedback.


About the course: Teaching Online Open Course (TOOC)