In the coverage of the Principal Assessor’s report on the new Higher Maths paper, the key word was “challenging”, applied particularly to the infamous crocodile question. Needless to say, the various online stories don’t provide a link to the report from which this rather loaded word was taken; as the documentation is a little tricky for outsiders to locate on the SQA website, here are some direct links.
- The new Higher papers: Paper 1 and Paper 2, and the marking scheme.
- The Principal Assessor’s report on the new Higher.
- The old Higher papers: Paper 1 and Paper 2, and the marking scheme.
- The Principal Assessor’s report on the old Higher.
I spent a happy couple of hours hacking my way through these, and though I was slightly more reassured than I expected, I fear that whitewash is being slapped across some serious cracks in the brickwork.
I’ll start by getting something off my chest. Section 5 of the PA’s report is a forthright collection of advice to those preparing future candidates, which summarises comments scattered throughout the report along the lines of “numeracy skills prevented progress”, “algebraic manipulation continues to challenge a large number of candidates” and “some candidates could not find a correct expression for the area of a rectangle”. Good: it’s nice that somebody’s noticed, and if they do something about this kind of basic ineptness then my job will be an awful lot easier. May I suggest, though, that if you’re going to ask, quite reasonably, that candidates should “communicate… clearly” and “make use of correct mathematical terminology and vocabulary” then it might be advisable to write your report in decent English? Committing minor solecisms such as “comprised of” and “Areas in which candidates found demanding” — or confusing the active agent by writing “this question performed well” instead of “candidates performed well on this question” — probably doesn’t affect the meaning much, but it does tend to tarnish your credibility. So does using antiquated terminology such as “the derived function” for “the derivative” and, in the instructions for markers, describing the fractional power notation for a square root as “differential notation”. OK: with that dealt with, let’s get on and look at the numbers.
The old Higher, taken by 10854 candidates in 2015, consisted of Paper 1 (40 marks of multiple-choice questions and 30 marks of short written-answer questions) and Paper 2 (60 marks of written-answer questions). The new Higher, taken by 10220 candidates in 2015, consisted of Paper 1 (60 marks of short written-answer questions) and Paper 2 (70 marks of written-answer questions). The two versions of Paper 2 were almost identical, with the exception of the “crocodile” question, which appeared only as Q8 of the new Higher and was worth 10 marks.
It’s interesting to compare the performance on the two papers, always bearing in mind that the differences may reflect systematic factors other than the questions themselves. In the old Higher, the average marks on Paper 1 were 25.4/40 (63.5%) on the MCQs and 14.88/30 (49.6%) on the written-answer questions, giving roughly 40.3/70 (57.6%) in total; the average mark on Paper 2 was 30.48/60 (50.8%); so the average mark overall was about 54.4%. In the new Higher, the average mark on Paper 1 was 24.8/60 (41.3%), and on Paper 2 it was 32.1/70 (45.9%), so the average mark overall was about 43.8%.
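Figures like these are easy to mis-transcribe, so here is a quick sanity check on the arithmetic (a minimal Python sketch; the raw marks are those quoted above, and the totalling is my own):

```python
def pct(mark, out_of):
    """Express a mark as a percentage of the paper total, to 1 d.p."""
    return round(100 * mark / out_of, 1)

# Old Higher: Paper 1 = 40 MCQ marks + 30 written marks; Paper 2 = 60 marks.
old_p1_total = 25.4 + 14.88              # average Paper 1 mark, out of 70
old_overall = pct(old_p1_total + 30.48, 130)

# New Higher: Paper 1 = 60 marks; Paper 2 = 70 marks.
new_overall = pct(24.8 + 32.1, 130)

print(old_overall, new_overall)          # 54.4 43.8
```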
According to the Principal Assessor’s report, some minor changes had to be made to the grade boundaries for the old Higher to allow for “more demanding” questions than expected on Paper 1: the boundary for an A was set at 89 (from a target of 93) and that for a C was set at 56 (from a target of 59). All this is plausibly well within the usual bounds of adjustment.
In contrast, on the new Higher the boundary for an A was set at 78 (from a target of 93) and that for a C was set at 44 (from a target of 59). Of this 15-mark adjustment, 7 marks were attributed solely to the “crocodile” question; although the report notes that “progress was made” by some candidates, I think we can assume that the typical performance on this question was not more than three out of ten. The other 8 marks were “as a result of the unintended increase in challenge of Paper 1”, which in turn was because “there are no objective test questions [i.e. MCQs]… and the overall impact was that this made Paper 1 more challenging”. The result of these adjustments was that 19.7% of candidates got an A on the new Higher (against 25% on the old Higher), while 70.8% passed (against 73.1% on the old Higher). The resulting grade A rate of 22.4% across the two versions of the Higher should be compared with 25.3%, 25.1%, 24.8% and 25.3% in 2011 to 2014 respectively.
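The quoted combined grade-A rate of 22.4% can be cross-checked by weighting each version’s rate by its candidate numbers (all figures are those quoted above; the weighted average is my own calculation):

```python
# Candidates sitting each version of the 2015 Higher, and the grade-A rates.
old_n, new_n = 10854, 10220
old_a_rate, new_a_rate = 0.25, 0.197     # 25% and 19.7% awarded an A

# Weighted average of the two rates, weighted by candidate numbers.
combined = (old_n * old_a_rate + new_n * new_a_rate) / (old_n + new_n)
print(round(100 * combined, 1))          # 22.4
```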
We knew the crocodile question was a fiasco: no news here. It’s interesting that the other sting in this exam was the removal of the MCQs. For my part, I think that reducing the number of MCQs can only improve assessment — but hands up if you’d not expect written-answer questions to be tougher than MCQs, especially given the near-14-percentage-point gap between performance on the MCQ and written-answer parts of the old Paper 1 (63.5% against 49.6%). I’m unsure why the SQA seem to have been surprised by this discovery.
And back to that word “challenging”, and the sins it covers. The nice thing about a word like “challenging” is that it suggests no actual failure on the part of examiners or examinees. The former didn’t screw up the questions and the latter weren’t underprepared: it’s just that everyone — like Icarus — had loftier aspirations than they could attain. This is good language for politicians to use, but lousy language for examiners who want to know what went wrong.
I don’t know about the SQA, but in discussions of assessment in HE [e.g. this useful briefing by Challis et al.] it’s often useful to distinguish between the validity of an assessment (does it assess what we actually want to know about?), its reliability (are the marks repeatable, sensitive to differences between students, and free from bias?), and its usability (do the marks mean what students, teachers and employers wish or expect them to mean?). Usability is often the issue that causes tension and claims of “unfairness” because of its link to students’ and institutions’ expectations, and to call an assessment “challenging” is usually to make a comment about usability: for example, commenting that we need to rescale the marks or shift the grade boundaries so that a candidate with a qualitatively “grade A” performance receives an A. Occasionally, we might intend to comment on reliability, as when all candidates score so poorly on an assessment that it fails to discriminate between strong and weak performances. To use the word “challenging” is, though, to imply that the validity of the assessment is not under question — in an extreme case, if I were to set a maths exam that contained a question on Swahili etymology then I don’t think that “challenging” would be the appropriate word.
I can well believe that the new Paper 1 is perfectly valid — and indeed, given my scepticism about MCQs, that it is a more valid assessment instrument than the old Paper 1. “Challenging” here seems a reasonable word to use. As I’ve argued elsewhere, though, the problem with Paper 2 lies deeper than the fact that Q8 was tough. The question was in fact so badly written that it rewarded a mechanical treatment and penalised critical thought; in other words, it wasn’t even valid. Often, in the post-exam post mortem there is nothing to be done to rescue a situation except rescaling; fair enough. But if, as their language implies, the SQA really think that the answer is to set a less challenging paper next time round — especially given the gaping weaknesses in basic skills noted in the Principal Assessor’s reports — then they are conning themselves and trying to con the rest of us.
Let’s be clear: in education, aspirations are fine and challenges are fine. Competence and validity come first.