I Do Not Get Assessment At All Sometimes

I let Chuck, Shelley, and Robert skip the final exam. We logged fifteen concepts in the first semester of Algebra 1 and those students studied them, practiced them, and demonstrated mastery on all of them. Take a break, kids.

But what if I had given them all fifteen of those concepts again? How accurate is my ranking, not just of those three kids, but of all of my kids? I have ranked everyone on a four-point scale on each of those concepts. Will a student ranked at 2 (“major conceptual errors”) score a 2 again?

In lieu of a 50-question Scantron final, I re-assessed every student on every concept, entered the current ranking into Excel alongside the student’s old ranking, and took the difference.
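In spreadsheet terms that’s just a column subtraction and a tally; a minimal Python sketch of the same bookkeeping (the sample data and variable names are hypothetical, not the post’s actual numbers):

```python
from collections import Counter

# Hypothetical (old_rank, new_rank) pairs on the 1-4 concept scale,
# one per student-concept re-assessment.
rankings = [(3, 3), (4, 2), (2, 2), (3, 1), (4, 4), (1, 2)]

# Difference: negative means the old ranking overestimated current knowledge.
deltas = [new - old for old, new in rankings]
dist = Counter(deltas)

accurate = dist[0]                   # old ranking matched exactly
accuracy = accurate / len(deltas)    # the post reports ~47% here (313 matches)
way_off = dist[-3]                   # ranked three levels too high

print(dist, f"{accuracy:.0%} accurate, {way_off} off by three")
```

With real data the `rankings` list would come straight from the two Excel columns; everything else is a one-pass count.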

Should’ve left well enough alone, right?

How Accurate Were The Old Rankings?

  • Okay, so big sigh of relief that, in 313 instances, my old ranking was an accurate assessment of a student’s current knowledge. Could’ve been worse.
  • Could’ve been a lot better. That’s only 47% accuracy. And in 43 instances, my old ranking was three levels too high. That would be putting a student at a 3 (“minor mechanical errors”) and watching the student stare totally blankly at the question on the final forty-three times.

What Does Mastery Mean?

If I have a student ranked at mastery, would she master the same concept on the final exam?

  • This isn’t awful. This isn’t great. I don’t know at what point I should be unhappy.

Enduring Questions

  • What do we mean when we say “mastery”? Does that mean a student will score perfectly on the same concept every time? Should I be unhappy that the correct/incorrect balance wasn’t 100/0?
  • What do we mean when we say “retention”? This is a common question of my assessment strategies. “Don’t kids forget?” Obviously, I can now answer that question, “yes, sometimes.”
  • What do we mean when we say “grades”? I don’t know what kind of results here would prompt me to pack up the shop and dole out monthly, summative unit exams (“Chapter 6 Test”) with the rest of my department. The fact is that this kind of precision analysis isn’t even possible under a unit exam model, which puts other teachers in an enviable position; the question “do these assessment scores represent my students’ current knowledge?” cannot be answered so it goes unasked. The answer, I’m afraid, is that their assessment scores underestimate student knowledge since Chapter 7 clarified many of Chapter 6’s concepts but these teachers have no mechanism for class-wide re-assessment. So they lower assessment’s grade weight beneath that of homework, instead, and inflate their grades with a few extra credit assignments. Look, I’m open to absolutely anything. I just want my grades to mean something. And I need to respect what few guiding principles for assessment make sense to me.
I'm Dan and this is my blog. I'm a former high school math teacher and current head of teaching at Desmos. He/him.


  1. For myself, I consider a concept mastered when, in a small amount of time, I can reteach it to myself to the point that I would score “mastered” on an exam with decent probability. (“Small” is relative: seconds for basic multiplication, minutes for looking up the law of cosines on Wikipedia, hours for re-reading Polya’s Method of Enumeration, days for techniques for solving with generating functions, to pick real examples.)

  2. Mastery requires maintenance. No big deal; just make sure to touch, once in a while, on the topics they have determined to be at mastery level. As long as you use what’s often referred to in the literature as maintenance rehearsal, you’ll find these numbers change in your favor.

    Chris Craft

  3. The cop-out answer is that there is no such thing as a perfect assessment system because there is no such thing as a perfect student.

    All you need to do is watch a 30-second clip of Jaywalking to confirm this.

    I’m facing a difficult conference with a parent this week. She is passionately opposed to my standards-based grading system. She’d be more comfortable if I did what last year’s teacher did.

    I commend you for opening yourself to this kind of self-reflection and data analysis. It’s not the safe path, but it’s the right one.

  4. I once had a student score a 51% on a midterm and then come to my office where, without knowing I was going to ask her to do so, she proceeded to correctly explain the material, in detail, from memory, at a mastery level. That’s when I realized I was testing her mental status/level of anxiety, not her knowledge, and that was the year I stopped giving exams, except as a diagnostic tool (concept tests).

    The distribution you present is really something to be proud of – roughly 500 out of roughly 600 scored within 1 point of your original evaluation. The normative condition is for students to return to their pre-instruction level of understanding within a short period of time so this represents significant gains over that condition. The skewness toward negative shifts could reflect retention issues, but it could also reflect anxiety unless earlier measures were in the same high-stakes environment. The positive side might be showing that a good number of students continued to consolidate their understanding even after moving on from the unit.

    There’s another issue at work here as well. Students with a 4 have nowhere to go but down. That’s definitely going to skew your curve to the negative side. I’d be curious to see what that curve looked like, broken down for each of the starting grades.

    I don’t know the answers to most of what you are asking, but I am convinced you are asking the right questions.
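The breakdown suggested above, shift curves separated by starting grade, is a quick grouping operation. A sketch in Python, again with hypothetical data:

```python
from collections import defaultdict

# Hypothetical (old_rank, new_rank) pairs on the 1-4 scale.
rankings = [(4, 4), (4, 2), (3, 3), (3, 1), (2, 2), (2, 3), (1, 2), (1, 1)]

# Group the score shifts by starting grade: note that students starting
# at 4 can only hold or drop, and students starting at 1 can only hold or rise.
by_start = defaultdict(list)
for old, new in rankings:
    by_start[old].append(new - old)

for old in sorted(by_start):
    print(old, by_start[old])
```

Plotting each group’s distribution separately would show how much of the negative skew is just the ceiling effect on the 4s.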

  5. Dan, I really appreciate this analysis. Sometimes I can’t stand the way the numbers break down from my tests, because I think the data I spit out at kids in class is a lot harsher than the data was when I remember taking a math class as a student. Back in the day we’d basically get A’s and B’s overall on tests, and never hear more about specific concepts. Nowadays there are so many data points and little benchmarks that teachers have to make a choice, or simply fall into the habit of reporting big stuff like overall test scores or little bitty skill-based scores.

    I worry, at least in my class, that the little-bitty stuff misses multi-step and critical-thinking problem-solving skills and is also only based on five questions, but I can’t stand doing the large tests because all the data that could make “laser-guided remediation” possible is lost inside one single final score.

    And, just to pass along a suggestion, I spent a class this week on the chapter 5 problem of the week. Kids loved the problem, and they felt smart solving it. I just walked around and asked questions; every once in a while I told a kid what someone had thought of on the other side of the room, but I never offered my own ideas. The class was uplifting because I got to sit beside kids as they were applying x’s and y’s – I wish I could make Algebra like that more often.

    Anyway, back to the post. You say that you had only 47% accuracy as if that’s something to be ashamed of and I wonder if there’s anybody at any school getting like 80 or 90 percent accuracy with assessment? Is this the right question to be asking?

    I think the bigger picture is maybe a little something like this. Your kids are getting tons of specific feedback. Technically, they’re refining lots of fundamental skills and ideas about algebra – they are constantly working on specific, clearly defined nuts and bolts, but then at the end they take an exam on all of it. I’d be curious how things would look if you had a chance to ask them what they were focusing on prior to the test and how they performed. In cases where their scores dropped, do they have an explanation? Did they improve scores on skills they were focusing on? Pick a random kid in your class: do you know what they’d say about their performance on the final exam, question by question? What happened, why did they mess up this problem, or how did they get this one right?

    Is the nature of the final exam sufficiently different from the skills tests so that the two won’t necessarily correlate despite some severe legwork on your part to align each question with a skill?

  6. Good feedback here. To answer Nick’s question: to serve the purposes of this investigation, I didn’t just align multiple choice questions to our concept list. I gave them concept questions. The old questions and the final exam questions came from the same source.

    David explains away some of the negatively skewed distribution on these scores, but the issue of retention remains outstanding. If I give those kids the same test in a year, the scores will skew even further south. At what point do we admit that students forget, and then figure out what we’re really trying to say when we say “we value retention”?

  7. Dan, I’m only hesitant to chalk things up to forgetfulness because of the change in test structure – 1 concept vs. 15 concepts. I think the big effect of that change is that it tests grayness more – memorization or procedural knowledge may get you through a concept test, but without context you may not know exactly how or when to apply your tools.

    A one-concept, six-question test probably does a thorough job of assessing a kid’s ability to apply the ins and outs of a procedure.

    A six-concept, six-question test probably does a better job of assessing whether a student knows when to apply a certain procedure.

    Kind of an applications test vs. a procedural-competency test. I’m pondering how this would get applied, and in what order. Maybe throwing in two or three cumulative-review (multi-concept) applications tests throughout the semester would help predict final exam scores.

    Lastly, personally, what would you have predicted to happen for each kid? You’ve seen so much of their work – were you expecting the mastery scores to determine the final, or did you have other intuitions?

  8. Is “mastery” the ability to quickly “refresh” the skills when they’re needed? For example: pulling out the skill of factoring when working with rational expressions?

    This is my second year of working with seniors. “We’ve learned this before” is something I hear a lot. I think they mean “I’ve seen this, I’ve done this, but I don’t really remember it well.”

  9. Fair question.

    I assigned a portfolio notebook. They had to take their five lowest ranked concepts, complete two examples, and write a verbal explanation for each. Let’s say that half my students turned this in.

  10. Gotta say, I’m less than convinced that mastery means retention. I aced the traditional chapter tests throughout high school, but needed serious review later. Sure, skills are spiraled through, but by finals time the material isn’t fresh anymore. Review is critical wherever you are on the spectrum.

    Maybe part of the process is teaching students about the curve of forgetting, or whatever you want to call it. Remind them that once you know something, once you have that mastery, it’s important to keep using it so you don’t forget.

  11. Dan, too bad you don’t have a control group. I’d like to see how a group who took only “Chapter” exams would perform on the same final exam. If the results are the same, then is it worth all the blood, sweat, and tears to assess the mastery way?

    I’ve been assessing the mastery way as well, and am rather disappointed in my midterm exam grades. They performed no better than in the years before mastery. (But, perhaps they were weaker to begin with, and so mastery helped? Alas, the lack of a control group!)

    Also, if it’s possible, I’m curious as to how their ORIGINAL concept scores (before re-takes) match up with the ones on the final exam.

  12. My word, yes. My kingdom for a control group. Ethically, though, I would have a hard time subjecting any of my kids to chapter-based assessment. Because, even if their grades were equivalent to the control group, there are a host of other reasons why I find standards-based grading to be better for teachers, kids, and other living things.

  13. Sounds like your understanding of mastery learning needs some work.

    I suggest taking a look at Engelmann’s “Student-Program Alignment and Teaching to Mastery” and following the guidelines he lays out.

    He lays out four criteria:

    Criterion 1. Students should be at least 70 percent correct on anything that is being introduced for the first time.

    Criterion 2. Students should be at least 90 percent correct on the parts of the lesson that deal with skills and information introduced earlier in the program sequence.

    Criterion 3. At the end of the lesson, all students should be virtually 100 percent firm on all tasks and activities.

    Criterion 4. The rate of student errors should be low enough that the teacher is able to complete the lesson in the allotted time.

    Often the reason students don’t retain what they’ve learned is because the teacher did not provide them with enough distributed practice after the initial teaching.

    Engelmann estimates that almost all commercially available curricula need to increase the distributed practice they provide by about 4x for mastery learning to occur and for students to retain what they’ve learned.