What We Can Learn About Learning From Khan Academy’s Source Code, Ctd.

I’m used to seeing pedagogy manifest itself in lesson plans, classroom observations, curriculum, and videos. It’s interesting, now, to see pedagogical decisions manifest themselves in web design and code as well. For example, here’s some JavaScript from Khan Academy’s box-and-whisker plot exercises.

Head over to the exercise. Complete a couple. What pedagogical mistake has Khan Academy made in the highlighted lines? How would you fix it?

Don’t get put off by the code. If you’ve taught box-and-whisker plots, you can sort out the issue here.

[via Travis Olson]

Brian lands it:

This code will always generate 15 data points, and these points will not have any outliers (outside 1.5 * (Q3 – Q1)), so students can just pattern match and drag the lines to the 1st, 4th, 8th, 12th, and 15th places once they’ve sorted the data. It’s kind of fun the first time.

Dan Anderson piles on:

Agree with Brian. Always 15 data points? Never have to deal with “having two medians”? Ever? The data is between 0 and 15 (never -40 to -30, never 100 to 1000, never 0.80 to 1.15)? No outliers? Always starting with the data and making a box-and-whisker, never using the box-and-whisker to make conclusions?

Peter Franza picks on a different issue:

I think the largest error is the reliance on random numbers to provide a set of assessments that test an actual set of knowledge.

Random number generators are great for creating a large set of problems that are all basically the same, but in my experience you can provide better assessments/examples with a much smaller set of questions that are designed to illustrate the concept.

Others have danced around it, but the fundamental flaw (as in some, but not all, Khan exercises) is that you get

THE SAME QUESTION

seven straight times, without any change in structure or difficulty, even though the underlying task has a huge variation in structure and difficulty.

Ben Alpert responds from Khan Academy:

I’ve updated the exercise so that it now includes anywhere from 8 to 15 points, so students are forced to deal with two middle numbers, both in finding the median and in finding the quartiles.

I'm Dan and this is my blog. I'm a former high school math teacher and current head of teaching at Desmos. He / him.

1. Mr K

July 16, 2013 - 4:19 pm -

But the standardized tests all have sample sizes that are 3 mod(4), don’t they?

(The real WTF is below, where the calculation of the quartiles is dependent on the magic number 15. You could change this line, and everything after it would go to hell.)

(Or I could be dense, and missing some other, more obvious problem)

2. Dave R

July 16, 2013 - 5:12 pm -

The code only generates data sets that are approximately normal. I would include some examples of skewed distributions. (Better yet, I would omit the topic entirely. What’s so important about box-and-whisker plots?)

3. Brian

July 16, 2013 - 9:11 pm -

I agree with Dave R and danny. I grew up in Wisconsin, so drawing box-and-whisker plots is one of those oddball California standards whose importance escapes me. Histograms, pie charts–I mean, circle graphs–and box-and-whisker plots should be left to the computer to draw. (Or have the kids write the program to do it.) CCSS says merely that students have to “display” these graphs, so I hope we can start interpreting that as using technology and working with data sets of more than 15 points.

This code will always generate 15 data points, and these points will not have any outliers (outside 1.5 * (Q3 – Q1)), so students can just pattern match and drag the lines to the 1st, 4th, 8th, 12th, and 15th places once they’ve sorted the data. It’s kind of fun the first time.
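The pattern Brian describes can be checked with a minimal Python sketch (not the exercise's own JavaScript). It assumes the convention that leaves the median out of both halves when finding quartiles; `five_number_summary`, `outliers`, and the sample data are hypothetical names and values made up for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), leaving the median out of each half."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + n % 2:]    # values above it (skip the middle one when n is odd)
    return (data[0], median(lower), median(data), median(upper), data[-1])

def outliers(values):
    """Values more than 1.5 * (Q3 - Q1) below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    fence = 1.5 * (q3 - q1)
    return [v for v in values if v < q1 - fence or v > q3 + fence]

# Hypothetical 15-point data set in the 0-to-15 range (not from the exercise).
sample = [3, 7, 8, 5, 12, 14, 9, 6, 10, 7, 11, 4, 8, 13, 9]
ordered = sorted(sample)

# With 15 points, the summary is just the 1st, 4th, 8th, 12th, and 15th sorted values.
by_position = tuple(ordered[i - 1] for i in (1, 4, 8, 12, 15))
assert five_number_summary(sample) == by_position
```

Once a student notices that the positions never change, sorting is the only real work left in the task.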

4. Dan Anderson

July 17, 2013 - 2:26 am -

Agree with Brian. Always 15 data points? Never have to deal with “having two medians”? Ever? The data is between 0 and 15 (never -40 to -30, never 100 to 1000, never 0.80 to 1.15)? No outliers? Always starting with the data and making a box-and-whisker, never using the box-and-whisker to make conclusions?

Some of these are certainly much more difficult to code for a genuine assessment. Does that mean that we don’t try?

This activity is sort of like teaching a kid how to service the chain of a fixie bike and then calling them an expert car mechanic.

5. Nathan Kraft

July 17, 2013 - 4:23 am -

The only thing I can add (which has not already been stated) is that there is no option to include the median when finding the lower and upper quartiles. If you do this, Khan tells you that you’re wrong. (Although, I’m not sure there is a right way to do this. Always seemed to be a matter of personal preference.)
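To make the two conventions concrete, here is a minimal Python sketch. The function name `quartiles`, its `include_median` flag, and the sample data are all hypothetical; which convention the exercise accepts is only known from the comment above.

```python
from statistics import median

def quartiles(values, include_median=False):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    For an odd-sized data set, include_median decides whether the middle
    value belongs to both halves or to neither. For an even-sized data set
    the two conventions give the same halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1 and include_median:
        lower, upper = data[:half + 1], data[half:]
    else:
        lower, upper = data[:half], data[half + n % 2:]
    return median(lower), median(upper)

# Hypothetical 15-point data set, not taken from the exercise.
sample = [1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17]
print(quartiles(sample))                       # median left out of both halves
print(quartiles(sample, include_median=True))  # median counted in both halves
```

On this sample the two conventions disagree on both quartiles, so a checker that accepts only one of them will mark a student following the other as wrong.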

6. Mark James

July 17, 2013 - 5:14 am -

I agree with all that has been said. I must admit that I did three problems and didn’t even pick up on the fact that each data set contained exactly 15 elements. What I DID notice, as someone who has taught box-and-whisker plots for 13 years (primarily to struggling learners) is that the process is much less challenging with an odd number of elements – and especially with an odd number like 15, 19, 23, etc., since there will always be a single “middle number” for each quartile. The standardized test that my students need to pass (the Pennsylvania Algebra I “Keystone” Exam, scores of which will be tied to my evaluation, starting next year) would never present my students with such a “softball” question.
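A small Python sketch shows why sizes like 15, 19, and 23 are the "softball" case. It assumes the convention that leaves the median out of both halves; `needs_averaging` is a hypothetical helper written for illustration.

```python
def needs_averaging(n):
    """Report which summary values fall between two data points for n sorted values.

    Assumes the convention that leaves the median out of both halves.
    """
    half = n // 2  # size of each half once the median is set aside
    return {
        "median": n % 2 == 0,        # even count: two middle numbers
        "quartiles": half % 2 == 0,  # even-sized half: two middle numbers
    }

# Which sizes in the 8-to-15 range force a student to average two values?
for n in range(8, 16):
    print(n, needs_averaging(n))

# Sizes where every summary value is a single data point.
easy_sizes = [n for n in range(8, 24)
              if not any(needs_averaging(n).values())]
```

The easy sizes are exactly those that leave a remainder of 3 when divided by 4, which matches Mr K's remark earlier in the thread.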

7. Peter Franza

July 17, 2013 - 6:05 am -

I think the largest error is the reliance on random numbers to provide a set of assessments that test an actual set of knowledge.

Random number generators are great for creating a large set of problems that are all basically the same, but in my experience you can provide better assessments/examples with a much smaller set of questions that are designed to illustrate the concept.

For example, in the dataset above, where are the outliers? Gaussian distributions are normal, so where would the student see examples of non-normal distributions? What about data sets containing an even number of data points?

We’ve experimented with randomly generated problems but you have a hard time guaranteeing that each assessment contains the breadth of knowledge and difficulty to ascertain mastery level.
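One way to picture a small, designed problem set is the Python sketch below. The data sets, the tag names, and the `describe` helper are all hypothetical; the quartiles again leave the median out of both halves, and "outlier" means more than 1.5 * (Q3 - Q1) beyond a quartile.

```python
from statistics import median

def describe(values):
    """Tag a data set with the structural features a student must handle."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    q3 = median(data[half + n % 2:])
    fence = 1.5 * (q3 - q1)
    tags = set()
    if n % 2 == 0:
        tags.add("two middle numbers")
    if data[0] < q1 - fence or data[-1] > q3 + fence:
        tags.add("outlier")
    if data[0] < 0:
        tags.add("negative values")
    if any(v != int(v) for v in data):
        tags.add("decimals")
    return tags

# Hypothetical hand-picked problems, each chosen to add one new wrinkle.
problem_set = [
    [2, 4, 5, 7, 8, 9, 11, 12, 13, 14, 15],            # baseline: odd count, whole numbers
    [3, 5, 6, 8, 9, 10, 12, 14],                       # even count
    [4, 5, 5, 6, 7, 7, 8, 9, 30],                      # one high outlier
    [-40, -38, -37, -35, -34, -33, -31],               # negative range
    [0.80, 0.85, 0.90, 0.95, 1.00, 1.05, 1.10, 1.15],  # decimals, even count
]

# Every feature we care about should appear somewhere in the set.
covered = set().union(*(describe(p) for p in problem_set))
```

Five fixed problems guarantee the coverage; a random generator only makes it likely at best.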

8. Dean Ober

July 17, 2013 - 7:06 am -

I have no idea how to solve this.
I am not a teacher of math and have zero experience with JavaScript code.
I follow your blog for the very same reason you posted this post, and frankly, I appreciate your brilliance in the matter…in your pedagogical-questioning you are uncovering a pedagogy that is deep inside most of us, and subsequently embodying that deeper pedagogy in this very example of teaching pedagogy! Thank you for your exploration, and I truly look forward to how this journey twists and turns.

9. Mr K

July 17, 2013 - 7:17 am -

Grrr.

After a night’s sleep on this, I have a larger bone to pick:

The way we teach statistics at the middle school level, in general, is all messed up. We teach one specific algorithm for finding quartiles, when in fact it is a fuzzy measure, with a variety of methods of determining it. For crappy data sets, the different methods may yield wildly different answers. For well distributed data sets, those differences become moot.

We spend all of our time on the mechanics of one of those methods (and even then, it’s not the one that gives proper hinges) rather than discussions of when data sets are appropriate or not.

At least it’s not as messed up as our teaching of mode.
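The claim that quartile methods disagree on small, gappy data and converge on well-spread data can be sketched with Python's standard `statistics.quantiles`, which offers two interpolation methods. These are not necessarily the methods taught in middle school or the hinges mentioned above, and both data sets are hypothetical.

```python
from statistics import quantiles

small = [0, 40, 50, 100]        # hypothetical tiny, gappy data set
large = list(range(1, 1001))    # hypothetical large, evenly spread data set

def q1_spread(data):
    """Gap between the first quartiles given by two interpolation methods."""
    q1_exclusive = quantiles(data, n=4, method="exclusive")[0]
    q1_inclusive = quantiles(data, n=4, method="inclusive")[0]
    return abs(q1_exclusive - q1_inclusive)

# Express each gap as a fraction of the data's range so the sets are comparable.
small_gap = q1_spread(small) / (max(small) - min(small))
large_gap = q1_spread(large) / (max(large) - min(large))
print(small_gap, large_gap)
```

On the four-point set the two first quartiles sit a fifth of the range apart; on the thousand-point set the gap is a tiny fraction of the range.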

10. JamesN

July 17, 2013 - 7:32 am -

Sure, always using 15 numbers, using a normal-esque distribution, and making sure all the answers are whole numbers all undermine the effectiveness of this exercise in creating mastery.

However, I don’t think the answer is changing the source code for this exercise – all of those features also make it great for starting off. A better answer would be to create a second “level 2” exercise that breaks those molds. Or, better yet, a whole series of levels incrementally more difficult than the last. And I think, over time, multiple levels of difficulty will be developed. Khan is in its infancy, and I think that the fact that this code is openly available is the greatest possible invitation to improve it and build upon it.

11. Michael

July 17, 2013 - 7:47 am -

Others have danced around it, but the fundamental flaw (as in some, but not all, Khan exercises) is that you get

THE SAME QUESTION

seven straight times, without any change in structure or difficulty, even though the underlying task has a huge variation in structure and difficulty.

Also, pedagogically – if the first thing I decide to plot is the median, I’m out of luck – I have to move the endpoints and the quartiles first, to get them out of the way.

But that’s not a problem in the highlighted code.

Aside, to Brian: I agree that a single box-and-whisker plot is rarely helpful. But when you use several to compare a data set across several populations (boys vs. girls, students by grade, etc.) they are a marvelous way of very quickly seeing differences between the populations. Unfortunately, until they’re in an official Statistics class, most students don’t get to compare boxplots to each other, so their impression is that it’s just another dumb have-to-do-this-for-math-class task.

12. Lucy

July 17, 2013 - 9:51 am -

Now, I don’t know code, but when I was completing the exercises, I kept re-counting the amount because I couldn’t believe they would keep it at 15. Surely, it must get more difficult, I thought. Surely, they must present a challenge somehow. I guess not.

To Michael – I agree the comparative nature of box plots is actually pretty helpful! Actually, after looking at this exercise and reading comments, I’m feeling pretty good about how I teach this. We focus almost completely on comparing data sets and using “messy” data.

To Brian – I used to think that these were silly as well. I never saw them until this year, when I found a Forbes article on box and whisker plots! Now, one article doesn’t mean they are super important, but it at least gave me some hope that there are people out there who use them, and so many in fact that an article teaching adults how to use them was written.

13. Dan Meyer

July 17, 2013 - 10:40 am -

You folks don’t disappoint. I’ve promoted some ideas from Brian, Peter Franza, Dan Anderson, and Michael to the main post.

JamesN:

And I think, over time, multiple levels of difficulty will be developed. Khan is in its infancy, and I think that the fact that this code is openly available is the greatest possible invitation to improve it and build upon it.

Khan is kind of a greybeard in the startup world at this point. They aren’t new. They have loads of funding. And kids are mandated to use their curriculum in massive numbers. I don’t think this is the time to pull punches or soften our critiques or wait for them to figure pedagogy out on their own.

14. Brian

July 17, 2013 - 10:47 am -

Box and whisker plots are useful and appear all over scientific literature. It’s the drawing them by hand from tiny isolated fake data sets that doesn’t make sense to me. The exercise that Khan Academy presents here is as far as middle school math goes, standards-wise.

Michael is right that their main value is as a quick visual comparison of multiple data sets. But that use is not what California middle school texts do and it is not what the Common Core seems to call for, either. In 6th grade, students are to “display” box plots but not interpret or compare them. So any test of this standard (6.SP.4) is going to be something lame like this about constructing a single plot instead of reading multiple ones side by side. (Or a trivia question about how to use a nationally adopted calculator brand to do it.) Constructing a box plot is a tedious clerical exercise with a bunch of steps and rules that have to be memorized, but then to have to do that when you’re sitting in front of a computer seems really cruel and twisted.

Maybe we should be happy that Khan Academy has found a way of letting students and teachers check off this standard through an easily imitated pattern…

15. Matt Leiss

July 18, 2013 - 8:56 am -

The more I am finding my way into this online math educator community the more I am loving it! The critiques of the Khan stuff and textbook activity are REALLY valuable, both as a way to show what can and should be improved by teachers (and publishers and online education organizations) and as a way of showing what these things can do.

I think Brian has a good point about finding a simple, relatively painless way to fulfill some aspects of the curriculum that are not required or set up to be in depth “good” math. I could be wrong here (and there is no need to take an entirely charitable view of these tasks or shallow curriculum requirements) but looking at it as something that should be introduced to students a year or two before looking at it in more depth is a way of doing that spiraling, non-linear education thing. It’s also a way of putting different kinds and contexts of math in view of students. Good teaching would be to provide students with some opportunity to explore these concepts more if they are excited about them and to make sure they know this is the shallow end of a large pool attached to an ocean.

I don’t know about the US curriculum or the CC standards (apologies for my new teacher ignorance; I’m only a year out of my B.Ed and am in Ontario and am trying to get myself totally up to speed on some of the HS requirements prior to upgrading my certificate this fall), but where I am teachers have a fair bit of flexibility in what is assessed for grades. All the curriculum requirements have to be assessed somehow, but not all are graded. I’m under the impression that it’s a bad idea to mix quantitative and qualitative feedback too.

Perhaps for a teacher, especially at a younger level, some of these topics would present a good opportunity to show students that sites like Khan exist (so they can learn to use them if and when they need/want/have to) and have them do one of these activities with some kind of assessment not only of how they did it but also of how they liked learning something through a website, whether they thought it was effective, etc. I guess what I’m saying is maybe we can turn some possible “flaws” in both the curriculum and free online education stuff into an opportunity to have students give feedback to us! This kind of thing isn’t as possible if we need to get students to a test ready mastery of a deep concept or help them create a project to show how well they can apply math concepts.

16. Rebecca Phillips

July 18, 2013 - 10:07 am -

FWIW, Dan, Michael’s issue with THE SAME QUESTION is exactly my (biggest) problem with Pearson.

17. Kate Nowak

July 20, 2013 - 6:26 am -

To push back on the question in the title of the post a bit, there’s not much we can learn about learning from how KA codes its exercise. They do such a crappy, ham-fisted, simplistic job, about all we can learn is how to colossally waste students’ time.

18. Bowen Kerins

July 22, 2013 - 12:02 pm -

The mistake is that box and whisker plots were created for, and are intended to be used for, comparison between multiple data sets. A box-and-whisker plot of a single data set tells you jack.

19. Ben Alpert

July 23, 2013 - 4:10 pm -

Hi, Ben from Khan Academy here.

Dan, thanks for pointing out that this exercise wasn’t what it could be — your feedback here is valuable.

I’ve updated the exercise so that it now includes anywhere from 8 to 15 points, so students are forced to deal with two middle numbers, both in finding the median and in finding the quartiles.

Michael, thanks for your feedback about the user interface. I’ve improved it so that when dragging a point, any other points that are “in the way” will shift over such that it isn’t necessary to drag the points in any particular order.

Both changes are now visible when you visit the exercise:

Brian’s comment about the Common Core standards mentioning only the construction of box plots is astute — if we go a step further and look at the 6-8 Statistics and Probability CC progression doc (available at http://commoncoretools.me/wp-content/uploads/2011/12/ccss_progression_sp_68_2011_12_26_bis.pdf) then we find that the authors of the standards in fact also recommend that in 6th grade, students are able to compare two distributions by using box plots. I agree that comparing box plots is much more valuable, so in the future, we’ll likely add an exercise for comparing plots.

We haven’t included outliers in the exercise because there exists a variety of different ways to draw box plots. Indeed, the progressions doc linked above says, “Because of the different methods for computing quartiles and other different conventions, there are different kinds of box plots in use. Box plots created from the five-number summary do not show points detached from the remainder of the diagram.” With this exercise, we’ve intentionally chosen to provide random numbers generated from a normal distribution so that students can focus on the important parts of a box plot and don’t need to worry about the differing ways to plot outliers.

In the last few months, we’ve hired many seasoned content creators to work with us on covering all of the Common Core standards accurately. So far, they’ve created thousands of new handwritten questions that focus more on conceptual understanding to complement the machine-generated exercises we already have.

Finally, I just want to mention again that our exercise framework is open-source and we’re open to contributions; please submit a pull request at https://github.com/Khan/khan-exercises if you find more possible improvements to our exercises — I’ll be happy to take a look at your change.

Ben

20. Dan Meyer

July 25, 2013 - 11:03 am -

In the last few months, we’ve hired many seasoned content creators to work with us on covering all of the Common Core standards accurately. So far, they’ve created thousands of new handwritten questions that focus more on conceptual understanding to complement the machine-generated exercises we already have.

Can you tell us how this team of content experts helps improve the videos and exercises on Khan Academy, given that those comprise a student’s primary experience with Khan Academy?

21. Dan Meyer

July 25, 2013 - 5:39 pm -

A response via email from Ben Alpert:

All these new problems are going either into new exercises that we’re creating or into existing exercises that we’re improving. Here are a few examples of the new exercises:

So far we’ve added around 50 new exercises like this and are adding more every week. We’re working through each of the CC standards to make sure that we cover them accurately — so far we’ve created exercises for the major clusters (defined here) in the 6th grade standards. We’re quickly working to cover the major clusters for other grades and then the remaining standards after that.

As I mentioned on Twitter, we feel that the CC standards are excellent, primarily due to their focus on conceptual understanding over rote memorization. We’re in talks with Student Achievement Partners (creators of CC) to have them review our content to make sure that it matches their original intent.

Finally, Sal is currently making new videos which complement all of our new exercises, so that we can ensure that Sal is explaining the same concepts that the standards cover.

Hope that gives you an idea.

22. Brian

July 25, 2013 - 7:29 pm -

It’s great to see Khan Academy’s responsiveness.

However, Ben explains that they “haven’t included outliers in the exercise because there exists a variety of different ways to draw box plots.”

I am not directing this to Ben particularly, but isn’t the real point of teaching box plots so that students can *read* them? And aren’t they going to have to read all the varieties in use? Drawing box plots by hand is not a CC standard, and drawing them using the Khan Academy widget is not something they will ever see again. But someone came up with a way to simulate drawing box plots with no outliers, so now it’s part of the program and teachers are going to have to teach students how to follow the steps to draw a box plot.

This is at the expense of spending time on what the consensus of math teachers on this page seems to think is important: interpreting and comparing box plots of real-world data. Presumably, from real sources with slightly different formatting standards. Because the standard doesn’t mention outliers specifically or comparing two box plots, does that mean that Khan Academy or anybody else aiming for CCSS alignment is going to skip those things? Isn’t this the kind of shallow, broad “coverage” approach with lots of rules that we are supposedly leaving behind? Why are the mechanics of sorting numbers and calculating medians being taught, drilled and tested under the name of box plots instead of the real thing? Since box plots are the only part of the 6th grade standards that prompt a discussion of what an outlier is and a precise definition, it’s not something to gloss over lightly.

Again, I don’t mean to pick on Ben and Khan too specifically–I worked with a 7th grade textbook that took the same approach, and I know they’re working from what the current texts do. I think the box plot page is going to be the page I check out first from now on when I’m trying to get a feel for a new math text!

23. Mike

July 26, 2013 - 6:55 am -

As a 6th grade teacher, I’m actually happy that Dan linked to this activity, even if it mostly incurred derision. (I’m also happy that feedback from teachers responding to this blog already helped improve it.)

Every concept has to have a beginning. This coming year, that activity won’t be the first exposure to box-and-whisker plots for my students. It also won’t be the last. It does, however, provide a very good structured practice for creating box-and-whisker plots which doesn’t rely on a student being able to hold a ruler and draw perfect lines. It also provides the instant feedback of “I got this correct” or “I need to try something else” which I cannot provide immediately to 20+ students at the same time.

Will I go on to teach students how to compare different data sets? Of course I will. Will I find real world examples of these graphs in scientific or medical literature and have students analyze them? Yes. Will I rely on Khan Academy to teach my students everything? Of course not.

I’m not a KA fanboy. I don’t sign my students up each year and require them to complete X activities each week, etc. I do, however, direct students to specific videos occasionally or ask them to try out certain activities. KA can be a useful tool. In this case, while there is room for improvement, the KA box-and-whisker tool is far more versatile and clean-looking than the one I’ve been using on NLVM. It is excellent for BEGINNER practice on identifying landmarks in a set of data and locating how they fit onto a Box Plot. I intend to use it as a small group station activity while I run a more engaging small group activity that approaches this from a different angle. For that, there is absolutely nothing wrong with it. As with everything, it’s all in how it’s used.

24. l hodge

July 26, 2013 - 2:13 pm -

The inequality problems still seem to be the same question asked a bunch of times. Maybe involve a number line in some of the questions?

The expression problems are ok. Especially like: __ ( __ x + 7q) is equivalent to 10x + 14q.

The function questions need a lot of editing. What is the end behavior for i) energy stored in a spring as a function of displacement ii) water left in a draining bucket as a function of time iii) force needed to keep an object on a carousel as a function of rotation speed (correct answer – as carousel rotates faster & faster there is no LIMIT to the amount of force needed). Yikes!

The “fix” to the box plot problem makes the problem even more procedural. Students now have to deal with a technicality for defining quartiles that really is not very important.

25. Christopher

August 2, 2013 - 9:08 am -

Now can we do something about the decimals up in there?

See, Ben writes that the new problems are “going either into new exercises that we’re creating or into existing exercises that we’re improving” and that “Sal is currently making new videos which complement all of our new exercises, so that we can ensure that Sal is explaining the same concepts that the standards cover.”

In some content areas, the foundation of this process is broken. Dan got one of them fixed through the conversation here. Would Khan Academy consider working on decimals next? By the video Sal Khan made the other day, it seems you have been working on them quite recently. Let’s get it right while you’re there.

26. Kevin Hall

August 2, 2013 - 10:27 am -

I only know what I read on their blog, etc, but my sense is that their strategy is just to seed their page with enough content to get lots of users, and then open content creation up to the masses so ANYONE can create materials (for example, Christopher could make them himself). Then the system will use data mining to figure out which ones are most effective. You can see KA’s lead dev describe this a little bit in this talk, starting at the 40:50 mark:

http://player

So maybe KA is actually ASSUMING that they don’t have the ability to create the best learning experiences, and therefore that they should focus now on building the infrastructure to allow everyone else to make content and upload it so it can be tested.

I say this as a teacher who stopped using KA last year because it was too hard for students to earn proficiencies. But I think KA is working on lots of these problems, and those infrastructure problems are frankly bigger than the issues with decimals or other content areas, because without better infrastructure, you still wouldn’t be able to do much even with better content.

27. Kevin Hall

August 2, 2013 - 5:43 pm -

Guess I should add, in fairness to KA, that from what I’ve read online, they are making some major improvements to the system this summer, including the problem I mentioned above (it being too hard to earn a proficiency). So I would still recommend others keeping their eyes peeled for whatever changes they roll out at the end of this summer.