Published 23 October 2025
[Originally published in Needed Now in Learning and Teaching, 20 October 2025]
No matter what your perspective, assessment in higher education (HE) is in crisis in one way or another. Possibly it always is. What we should be assessing, and the appropriate way to go about doing so, is always contestable. In HE, theories of learning and assessment, rationales for studying and learning, what success looks like, and what society and employers expect all change over time. Sometimes technologies or pandemics drive changes in learning and assessment faster than we feel equipped for.
Approaches to assessment that might have seemed valid and acceptable in the past may no longer be fit for purpose. There may be a tendency to lurch away from apparently innovative but compromised assessment types towards older, formal types. Conversely, new technologies may mean that some approaches which might previously have been dismissed could now be appropriate.
Trying to describe why assessment is, or is not, acceptable has proven difficult. Words like ‘secure’, ‘reliable’, ‘trustworthy’, ‘valid’, ‘integrity’ (and more besides) often have technical meanings in different disciplines, and can be misunderstood in common conversation.
There is a tendency to see assessment in binary terms – ‘it’s a secure assessment’ or ‘it has no integrity’. Or to misquote George Orwell: “Exam good, essay bad”.
One term that has recently gained popularity in some papers discussing the issue is ‘validity’.
Measurement science and validity
Measurement science has long debated the concept of validity. In the 1990s, Messick and Kane developed the idea that validity is based on an interpretive argument, where the assessment outcome is the premise and the meaning given to it is the conclusion. In other words, we need to establish reasons why an assessment tells us anything about student learning. Kane noted that:
“Because it is not possible to prove all of the assumptions in the interpretive argument, it is not possible to verify this interpretive argument in any absolute sense. The best that can be done is to show that the interpretive argument is highly plausible, given all available evidence.” (Kane, 1992, p. 527)
How we identify and support our assumptions about assessment as part of this interpretive argument varies depending on the specific interpretation being proposed. Evidence can include: observations of how the assessment was designed and delivered; the use of theories of learning, or other data and experience, to make inferences and extrapolations about what particular behaviours or answers mean; feedback from students; and statistical analysis across cohorts, among other sources.
Later writers have expanded the scope of the validity argument to include the way in which the assessment outcome will impact students, or how it may be relied upon by others (including, for example, the misapplication of a specific assessment as evidence of general competence). Outside measurement science, others have developed frameworks to demonstrate the trustworthiness of teacher observations as alternatives to psychometric approaches.
How does this help?
This core idea of needing to make an argument for each valid use of an assessment seems very useful for HE, even if the full rigours of measurement science are not realistic for highly dynamic HE assessments.
The argument approach to validity asks us to consider both what aspects of learning we are trying to assess, and the basis on which we can claim that a particular outcome of the assessment could be plausible evidence of that particular learning.
It encourages us to examine and test assumptions, and to recognise that an assessment that might be highly valid in one setting might be less so in another. Some arguments for validity will be easier to make than others; some assessments may be less valid than is often assumed.
Thus, a closed-book, invigilated exam supports a highly plausible (though not absolutely certain) argument that the person writing the answer is the enrolled student, and that they are writing the answer from their own memory (or guesses) of the course content. It is less plausible that this answer is the best answer they could give, that it accurately reflects their broader level of understanding, or that it demonstrates their capability to apply the learning. But at some high-stakes points in a degree that might be sufficiently valid, and could increase the validity of other assessments.
Conversely, research essays seem vulnerable to misuse of AI. But the validity argument approach suggests that, in some settings and with specific design elements, a plausible argument could still be made that they provide evidence of student learning. What those elements might be will vary by circumstance. How the assessment is structured, the way it relates to other assessments, the relationship of the educator to the students, and the purposes of the assessment will all affect the strength of the argument for plausible validity. Honours theses might still be valid, but online reflective notes might not be.
Beyond these circumstances, which relate to assessment format, a separate analysis of the actual questions set is also needed for all assessment types. How strong is the argument that a response to the questions can assure us that the student has attained the required learning outcomes? Is validity increased by seeing the response in light of other assessments?
So, to merely say an assessment is ‘secure’ is not a convincing argument that there is evidence of learning. To say an assessment is ‘well designed to encourage reflection’ is not a convincing argument that the submission is by the enrolled student. More nuanced arguments are needed.
The question of plausibility
The question we should be asking is:
“What plausible arguments do we have that this assessment is evidence of learning?”
Plausibility implies that not everyone will agree the assessment is fully valid, and that the proof won't be watertight. But it requires that the argument be good enough for us to be 'going on with'. It implies that we keep an eye on how our assumptions about validity play out in practice, and adjust and revise where necessary. Most importantly, it emphasises that ongoing assessment validity is very largely in the hands of the expert educators who design nuanced assessments for complex learning environments.