What Students Do (And Don’t Do) In Khan Academy

tl;dr — Khan Academy claims alignment with the Common Core State Standards (CCSS), but an analysis of their eighth-grade year indicates that the alignment is loose. 40% of Khan Academy exercises assessed the acts of calculating and solving, whereas the Smarter Balanced Assessment Consortium’s assessment of the CCSS emphasized those acts in only 25% of its released items. 74% of Khan Academy’s exercises resulted in the production of either a number or a multiple-choice response, whereas those outputs accounted for only 25% of the SBAC assessment.


My dissertation will examine the opportunities students have to learn math online. In order to say something about the current state of the art, I decided to complete Khan Academy’s eighth-grade year and ask myself two specific questions about every exercise:

  • What am I asked to do? What are my verbs? Am I asked to solve, evaluate, calculate, analyze, or something else?
  • What do I produce? What is the end result of my work? Is my work summarized by a number, a multiple-choice response, a graph that I create, or something else?

I examined Khan Academy for several reasons. First, they’re well-capitalized and they employ some of the best computer engineers in the world, so they have the human resources to create novel opportunities for students to learn math online. If they struggle, it is likely that other companies with equal or lesser human resources struggle also. Second, their exercise sets are publicly available online, without a login. This will energize our discussion here and make it easier for you to spot-check my analysis.

My data collection took me three days and spanned 88 practice sets. You’re welcome to examine my data and critique my coding. In general, Khan Academy practice sets ask that you complete a certain number of exercises in a row before you’re allowed to move on. (Five, in most cases.) These exercises are randomly selected from a pool of item types. Different item types ask for different student work. Some item types ask for multiple kinds of student work. All of this is to say, you might conduct this exact same analysis and walk away with slightly different findings. I’ll present only the findings that I suspect will generalize.

After completing my analysis of Khan Academy’s exercises, I performed the same analysis on a set of 24 released questions from the Smarter Balanced Assessment Consortium’s test that will be administered this school year in 17 states.

Findings & Discussion

Khan Academy’s Verbs


The largest casualty is argumentation. Out of the 402 exercises I completed, I could code only three of their prompts as “argue.” (You can find all of them in “Pythagorean Theorem Proofs.”) This is far out of alignment with the Common Core State Standards, which prioritize constructing and critiquing arguments in one of their eight practice standards that cross all of K-12 mathematics.


Notably, 40% of Khan Academy’s eighth-grade exercises ask students to “calculate” or “solve.” These are important mathematical actions, certainly. But as with “argumentation,” I’ll demonstrate later that this emphasis is out of alignment with current national expectations for student math learning.

The most technologically advanced items were the 20% of Khan Academy’s exercises that asked students to “construct” an object. In these items, students were asked to create lines, tables, scatterplots, polygons, angles, and other mathematical structures using novel digital tools. Subjectively, these items were a welcome reprieve from the frequent calculating and solving, nearly all of which I performed with either my computer’s calculator or with Wolfram Alpha. (Also subjective: my favorite exercise asked me to construct a line.) These items also appeared frequently in the Geometry strand where students were asked to transform polygons.


I was interested to find that the most common student action in Khan Academy’s eighth-grade year is “analyze.” Several examples follow.


Khan Academy’s Productions

These questions of analysis are welcome but the end result of analysis can take many forms. If you think about instances in your life when you were asked to analyze, you might recall reports you’ve written or verbal summaries you’ve delivered. In Khan Academy, 92% of the analysis questions ended in a multiple-choice response. These multiple-choice items took different forms. In some cases, you could make only one choice. In others, you could make multiple choices. Regardless, we should ask ourselves if such structured responses are the most appropriate assessment of a student’s power of analysis.

Broadening our focus from the “analysis” items to the entire set of exercises reveals that 74% of the work students do in the eighth grade of Khan Academy results in either a number or a multiple-choice response. No other pair of outcomes comes close.


Perhaps the biggest loss here is that I constructed an equation exactly three times throughout my eighth-grade year in Khan Academy. Here is one:


This is troubling. In the sixth grade, students studying the Common Core State Standards make the transition from “Number and Operations” to “Expressions and Equations.” By ninth grade, the CCSS will ask those students to use equations in earnest, particularly in the Algebra, Functions, and Modeling domains. Students need preparation solving equations, of course, but if they haven’t spent ample time constructing equations also, those advanced domains will be inaccessible.

Smarter Balanced Verbs

The Smarter Balanced released items include comparatively fewer “calculate” and “solve” prompts (they’re the least common verbs, in fact) and comparatively more “construct,” “analyze,” and “argue” prompts.


This lack of alignment is troubling. If one of Khan Academy’s goals is to prepare students for success in Common Core mathematics, they’re emphasizing the wrong set of skills.

Smarter Balanced Productions

Multiple-choice responses are also common in the Smarter Balanced assessment but the distribution of item types is broader. Students are asked to produce lots of different mathematical outputs including number lines, non-linear function graphs, probability spinners, corrections of student work, and other productions students won’t have seen in their work in Khan Academy.


SBAC also allows for the production of free-response text while Khan Academy doesn’t. When SBAC asks students to “argue,” in a majority of cases, students express their answer by just writing an argument.


This is quite unlike Khan Academy’s three “argue” prompts which produced either a) a multiple-choice response or b) the re-arrangement of the statements and reasons in a pre-filled two-column proof.

Limitations & Future Directions & Conclusion

This brief analysis has revealed that Khan Academy students are doing two primary kinds of work (analysis and calculating) and they’re expressing that work in two primary ways (as multiple-choice responses and as numbers). Meanwhile, the SBAC assessment of the CCSS emphasizes a different set of work and asks for more diverse expression of that work.

This is an important finding, if somewhat blunt. A much more comprehensive item analysis would be necessary to determine the nuanced and important differences between two problems that this analysis codes identically. Two separate “solving” problems that result in “a number,” for example, might be of very different value to a student depending on the equations being solved and whether or not a context was involved. This analysis is blind to those differences.

We should wonder why Khan Academy emphasizes this particular work. I have no inside knowledge of Khan Academy’s operations or vision. It’s possible this kind of work is a perfect realization of their vision for math education. Perhaps they are doing exactly what they set out to do.

I find it more likely that Khan Academy’s exercise set draws an accurate map of the strengths and weaknesses of education technology in 2014. Khan Academy asks students to solve and calculate so frequently, not because those are the mathematical actions mathematicians and math teachers value most, but because those problems are easy to assign with a computer in 2014. Khan Academy asks students to submit their work as a number or a multiple-choice response, not because those are the mathematical outputs mathematicians and math teachers value most, but because numbers and multiple-choice responses are easy for computers to grade in 2014.

This makes the limitations of Khan Academy’s exercises understandable but not excusable. Khan Academy is falling short of the goal of preparing students for success on assessments of the CCSS, but that’s setting the bar low. There are arguably other, more important goals than success on a standardized test. We’d like students to enjoy math class, to become flexible thinkers and capable future workers, to develop healthy conceptions of themselves as learners, and to look ahead to their next year of math class with something other than dread. Will instruction composed principally of selecting multiple-choice responses and filling numbers into blanks achieve those goals? If your answer is no, as mine is, and if that narrative sounds exceedingly grim to you too, then it is up to you and me to pose a compelling counter-narrative for online math education, and then re-pose it over and over again.

A Response To The Founder Of Mathspace On The Costs And Benefits Of Adaptive Math Software

Mo Jebara, the founder of Mathspace, has responded to my concerns about adaptive math software in general and his in particular. Feel free to read his entire comment. I believe he has articulated several misconceptions about math education and about feedback that are prevalent in his field. I’ll excerpt those misconceptions and respond below.

Computer & Mouse v. Paper & Pencil


Just like learning Math requires persistence and struggle, so too is learning a new interface.

I think Mathspace has made a poor business decision to blame their user (the daughter of an earlier commenter) for misunderstanding their user interface. Business isn’t my business, though. I’ll note instead that adaptive math software here again requires students to learn a new language (computers) before they find out if they’re able to speak the language they’re trying to learn (math).

For example, here is a tutorial screen from software developed by Kenneth Tilton, a frequent commenter here who has requested feedback on his designs:


Writing that same expression with paper and pencil instead is more intuitive by an order of magnitude. Paper and pencil is an interface that is omnipresent and easily learned, one that costs a bare fraction of the computer Mathspace’s interface requires, one that never needs to be plugged into a wall.

None of this means we should reject adaptive math software, especially not Mathspace, the interface of which allows handwriting. But these user interface issues pile high in the “cost” column, which means the software cannot skimp on the benefits.

Misunderstanding the Status Quo


Does a teacher have time to sit side by side with 30 students in a classroom for every math question they attempt?


But teachers can’t watch while every student completes 10,000 lines of Math on their way to failing Algebra.


I talk to teachers every single day and they are crying out for [instant feedback software].

Existing classroom practice has its own cost and benefit columns and Jebara makes the case that classroom costs are exorbitant.

Without adaptive feedback software, to hear Jebara tell it, students are wandering in the dark from problem to problem, completely uncertain whether they’re doing anything right. Teachers are beleaguered and unsure how they’ll manage to review every student’s work on every assigned problem. Thirty different students will reveal thirty unique misconceptions on each one of thirty problems. That’s 27,000 unique responses teachers have to make in a 45-minute period. That’s ten responses per second! No wonder all these teachers are crying.
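For what it’s worth, the worst-case arithmetic in that scenario does check out. A quick sanity check (the numbers are the scenario’s hypotheticals, not real data):

```python
# Sanity check of the worst-case feedback workload described above.
students, problems, misconceptions = 30, 30, 30    # hypothetical worst case
responses = students * problems * misconceptions   # 27,000 unique responses
rate = responses / (45 * 60)                       # spread over a 45-minute period
print(responses, rate)                             # 27000 10.0
```

Ten responses per second, as advertised. The absurdity is in the premise, not the multiplication.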

This is all Dickens-level bleak and misunderstands, I believe, the possible sources of feedback in a classroom.

There is the textbook’s answer key, of course. Some teachers also make a regular practice of posting all the answers in advance of an exercise set, so students know whether they’re heading in the right direction and can focus on process, not product.

Commenter Matt Bury also notes that a student’s classmates are a useful source of feedback. Since I recommended Classkick last week, several readers have tried it out in their classes. Amy Roediger writes about the feature that allows students to help other students:

… the best part was how my students embraced collaborating with each other. As the problems got progressively more challenging, they became more and more willing to pitch in and help each other.

All of these forms of feedback exist within their own webs of costs and benefits too, but the idea that without adaptive math software the teacher is the only source of feedback just isn’t accurate.

Immediate v. Delayed Feedback

Most companies in this space make the same set of assumptions:

  1. Any feedback is better than no feedback.
  2. Immediate feedback is better than delayed feedback.

Tilton has written here, “Feedback a day later is not feedback. Feedback is immediate.”

In fact, Kluger & DeNisi found in their meta-analysis of feedback interventions that feedback reduced performance in more than one third of studies. What evidence do we have that adaptive math software vendors offer students the right kind of feedback?

The immediate kind of feedback isn’t without complication either. With immediate feedback, we may find students trying answer after answer, watching for the red x to change to a green check mark, learning little more than systematic guessing.

Immediate feedback risks underdeveloping a student’s own answer-checking capabilities also. If I get 37 as my answer to 14 + 22, immediate feedback doesn’t give me any time to reflect on my knowledge that the sum of two even numbers is always even and make the correction myself. Along those lines, Cope and Simmons found that restricting feedback in a Logo-style environment led to better discussions and higher-level problem-solving strategies.
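That parity observation is a genuinely mechanical self-check, the kind of thing a student can internalize and run without any software. As a sketch only (not a claim about how any product works):

```python
def parity_check(a, b, claimed_sum):
    """A sum must share the parity of a + b: even + even must be even, etc."""
    return (a + b) % 2 == claimed_sum % 2

# 14 + 22 must be even, so a claimed answer of 37 fails the check immediately.
print(parity_check(14, 22, 37))  # False
print(parity_check(14, 22, 36))  # True
```

The point isn’t that software should run this check. It’s that a student given a moment before the red x appears has the chance to run it herself.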

What Computers Do To Interesting Exercises


Can you imagine a teacher trying to provide feedback on 30 hand-drawn probability trees on their iPad in Classkick?


Can you imagine a teacher trying to provide feedback on 30 responses for a Geometric reasoning problem letting students know where they haven’t shown enough of a proof?

I can’t imagine it, but not because that’s too much grading. I can’t imagine assigning those problems because I don’t think they’re worth a class’ limited time and I don’t think they do justice to the interesting concepts they represent.

Bluntly, they’re boring. They’re boring, but that isn’t because the team at Mathspace is unimaginative or hates fun or anything. They’re boring because a) computers have a difficult time assessing interesting problems, and b) interesting problems are expensive to create.

Please don’t think I mean “interesting” week-long project-based units or something. (The costs there are enormous also.) I mean interesting exercises:

Pick any candy that has multiple colors. Now pick two candies from its bag. Create a probability tree for the candies you see in front of you. Now trade your tree with five students. Guess what candy their tree represents and then compute their probabilities.

The students are working five exercises there. But you won’t find that exercise or exercises like it on Mathspace or any other adaptive math platform for a very long time because a) they’re very hard to assess algorithmically and b) they’re more expensive to create than the kind of problem Jebara has shown us above.

I’m thinking Classkick’s student-sharing feature could be very helpful here, though.



So why don’t we try and automate the parts that can be automated and build great tools like Classkick to deal with the parts that can’t be automated?

My answer is pretty boring:

Because the costs outweigh the benefits.

In 2014, the benefits of that automation (students can find out instantly if they’re right or wrong) are dwarfed by the costs (see above).

That said, I can envision a future in which I use Mathspace, or some other adaptive math software. Better technology will resolve some of the problems I have outlined here. Judicious teacher use will resolve others. Math practice is important.

My concerns are with the 2014 implementations of the idea of adaptive math software and not with the idea itself. So I’m glad that Jebara and his team are tinkering at the edges of what’s possible with those ideas and willing, also, to debate them with this community of math educators.

Featured Comment

Mercy – all of them. Just read the thread if you want to be smarter.

The Scary Side Of Immediate Feedback

Mathspace is a startup that offers both handwriting recognition and immediate feedback on math exercises. Their handwriting recognition is extremely impressive but their immediate feedback just scares me.

My fear isn’t restricted to Mathspace, of course, which is only one website offering immediate feedback out of many. But Mathspace hosts a demo video on their homepage and I think you should watch it. Then you can come back and tell me my fears are unfounded or tell me how we’re going to fix this.

Here’s the problem in three frames.

First, the student solves the equation and finds x = -48. Mathspace gives the student immediate feedback that her answer is wrong.


The student then changes the sign with Mathspace’s scribble move.


Mathspace then gives the student immediate feedback that her answer is now right.


The student thinks she knows how to solve equations. The teacher’s dashboard says the student knows how to solve equations. But quiz the student just a little bit – as Erlwanger did with a student named Benny under similar circumstances forty years ago – and you see just how superficial her knowledge of solving equations really is. She might just be swapping signs because wrong signs are why her answers have been marked wrong in the past.

Everyone walks away feeling like a winner but everyone is losing and no one knows it. That’s the scary side of immediate feedback.

One possible solution.

When a student pulls a scribble move like that, throw a quick text input that asks, “Why did you change your answer?” The student who is just guessing will say something like, “Because it told me I was right.” Send that text along to the teacher to review. The solution is data that can’t be autograded, data that can’t receive immediate feedback, but better data just the same.
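In pseudocode-ish Python – every name here is hypothetical, not Mathspace’s actual API – that check might look like this:

```python
def on_answer_revised(step, previous_feedback, ask_student, notify_teacher):
    """If a student edits a step right after 'wrong' feedback, ask why.

    ask_student and notify_teacher are stand-ins for whatever UI prompt
    and teacher-dashboard hooks the platform actually provides.
    """
    if previous_feedback == "wrong":
        reason = ask_student("Why did you change your answer?")
        # Free text like this can't be autograded; route it to the teacher.
        notify_teacher({"step": step, "reason": reason})
        return reason
    return None

# A guessing student's reason gets routed straight to the teacher:
flagged = []
on_answer_revised("x = 48", "wrong",
                  lambda prompt: "Because it told me I was right.",
                  flagged.append)
```

The revealing answers ("Because it told me I was right.") are exactly the ones no autograder will catch.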

Related Awesome Quote

If you can both listen to children and accept their answers not as things to just be judged right or wrong but as pieces of information which may reveal what the child is thinking you will have taken a giant step towards becoming a master teacher rather than merely a disseminator of information.

JA Easley, Jr. & RE Zwoyer

Featured Comment

Justin Lanier:

I would want to emphasize that the issue is that Mathspace (and tech folks generally) tries to give immediate, “personalized” feedback in a fast, slick, cheap, low/no-labor kind of way. And, not surprising, ends up giving crappy feedback.

Daniel Tu-Hoa, a senior vice president at Mathspace, responds:

[T]eachers can see every step a student writes, so they can, as you suggest, then go and ask the student: “why did you change your answer here?” For us, technology isn’t intended to replace the teacher, but to empower teachers by giving them access to better information to inform their teaching.

2014 Sep 4. I’ve illustrated here a false positive – the adaptive system incorrectly thinks the student understands mathematics. Fawn Nguyen illustrates another side of bad feedback: false negatives.

A Better Definition Of “Personalization”

David Wiley:

For me, personalization comes down to being interesting. You have successfully personalized learning when a learner finds it genuinely interesting. Providing me with an adaptive, customized pathway through educational materials that bore me out of my mind is not personalized learning. It may be better than forcing me through the same pathway that everyone else takes, but I wouldn’t call it personalized.

Held to that standard, most groups that are attempting to personalize learning through software are pretty screwed.

Jal Mehta:

But what I can tell you from visits to blended classrooms and schools, in both traditional public and charter schools, is that students tend to find what exists thus far as fairly dull, lacking both the community and the accountability that comes with good face to face learning. A number of students told us at one highly celebrated blended school that they liked everything about the school except for the online learning!

That last link via Justin Reich, who confidently predicts the results from the 2017 Khan Academy study.

Featured Comment

Jane Taylor:

Another aspect of personalization is the relationship between student and teacher, and I found that as blended learning decreased the amount of face to face whole class instruction in my class last year, I didn’t get to know my students as well and as quickly as I had in the past. When I know my students and find out what “works”, what engages, each particular group of students, as well what works for individual students, then my classroom can better meet individual needs, not just in the way I teach math, but in the way I encourage students to manage their time, to grow in their work ethic and study habits, to overcome math anxiety, and many other things. Whole class interaction is a lot of fun for me and, I believe, for students. Resources, such as videos, are great for motivated students to review or move ahead, and I will continue to provide them, but I am returning to primarily whole class instruction this year.

Personalized Learning Software: Fun Like Choosing Your Own Ad Experience


After last week’s post knocking around “personalized learning”, Michael Feldstein argued that the term is too ambiguous to be useful:

All learning is personalized in virtue of the fact that it is accomplished by a person for him or herself. This may seem like a pedantic point, but if the whole point of creating the term is to focus on fitting the education to the student rather than the other way around, then it’s important to be clear about agency. What we really want to talk about, I think, is “personalized education” or, more specifically, “personalized instruction.”

Mike Caulfield described the value of structured discussion and how current personalized learning technologies undermine it:

… if there is one thing that almost all disciplines benefit from, it’s structured discussion. It gets us out of our own head, pushes us to understand ideas better. It teaches us to talk like geologists, or mathematicians, or philosophers; over time that leads to us thinking like geologists, mathematicians, and philosophers. Structured discussion is how we externalize thought so that we can tinker with it, refactor it, and re-absorb it better than it was before.

Is personalization orthogonal to structured discussion? That’s debatable, I suppose.

In practice, do the current forms of personalization in vogue (see, for instance, Rocketship) undermine the ability of a skilled teacher to run productive structured discussions?

Absolutely. Not a doubt in my mind.

Alex Hernandez claimed I set up a false choice between personalized learning paths and structured discussion:

Students can engage in personalized learning for a portion of the day and spend the rest of their time in rich learning activities that only teachers can provide. The bet here is that if students can drive their development of background knowledge, teachers can “trade up” and focus their energies on challenging tasks and compelling experiences.

Kevin Hall, one of the most useful foils I have at this blog, described a particular form of personalization:

Different groups could do the task with the same or isomorphic data sets in different contexts: sports, movies, etc. [..] My guess is ed tech will have us to this point relatively soon, don’t you think?

I just finished reading Daniel Willingham’s Why Students Don’t Like School, a challenging and affirming read at different times, and he takes a very dim view of this kind of personalization:

Trying to make the material relevant to students’ interests doesn’t work. As I noted in Chapter One, content is seldom the decisive factor in whether or not our interest is maintained.

I left comments in response to Michael Feldstein, Alex Hernandez, and Kevin Hall, in which I elaborate on the title of this post.

And Benjamin Riley, after starting this whole fire, tossed on another can of kerosene.