Contest: Do You Know Blue?


a/k/a A Netflix Prize for K-12 Math Students
a/k/a Let Dave Major, Evan Weinberg, and Me Buy Your Class A Pizza Party

Can you teach a computer to recognize the color “blue”? Head to Do You Know Blue? and find out. If you do the best job teaching the computer, we’ll send your class a pizza party in appreciation.

Enter the contest as many times as you want. Come back and check out your standing at this page.

You have until Monday 5/27 at 7:00AM Pacific Time.


  • Anybody can participate but the winning entrant will need to be a K-12 student in the US.
  • $100 maximum on the pizza party.
  • You’ll have to include an e-mail address, school name, and teacher name if you want to compete for the pizza party.
  • If multiple people take the top spot, we’ll draw the winner randomly.
I'm Dan and this is my blog. I'm a former high school teacher, former graduate student, and current head of teaching at Desmos. More here.


  1. Tessa Sprunken

    May 20, 2013 - 6:55 am

    But how is it possible that my rule worked with “120% of the 153 colours everyone has seen”?
    Also, it seems I can only use AND once?

  2. I don’t think that I’m spoiling with my question here, but the rule that I entered doesn’t seem to process correctly. (b>g) and (b>r) doesn’t select only colors for which blue is the highest intensity, though it should. Cool problem!

  3. Tessa: I may have made a mistake (nothing like a bunch of people to expose problems carefully planned testing didn’t). It should be fine now.

    jg: We’re concerned with not just matching what is blue, but also correctly matching what people say is not blue. Your rule has to work both ways, and so it is judged on both by the system.

  4. Timothy Russell

    May 20, 2013 - 8:43 am

    How do the percentages work? I have a rule that says a “60% not blue” is not blue, but I am not getting credit for it. Just curious how the percentages factor into it.

  5. I cheated when I had to mix to a target color, I’m a backend developer, design is not my thing :-) Web inspector to the rescue! Then I noticed that you have not filled out the Site ID of the Google Analytics code snippet yet, probably an omission.

  6. @Timothy, the percent agreements don’t affect the validity of your rule at all. Feel free to e-mail me a screenshot of your rule and the color you aren’t getting credit for:

    @Jan, strong work. I’d be a liar if I said I never used a color picker during the testing of that part. Thanks for the heads up on the GA ID also.

  7. Brilliant!

    Love it Dan, this is great on many many levels. Might use this with some of my computing classes. Get them thinking about AND/OR logic, RGB values and some programming-esque Maths all rolled into one. Top work!

    86%. Quite pleased with that :)

  8. Awesome – I’ll be turning my students loose on this one.

    And had this been next year I would be requiring it in both my 8th grade intro to programming and 9-12 Foundations of Computer Science classes. Very, very cool idea.

  9. Dan – amazing…

    I would really like to see some increased functionality on the site that allows students to try an inequality, and for them to see the range of colours which satisfy such a rule (or maybe just 10 random solutions).

    I know this will give a massive head start to ‘winning’ the competition, but it would certainly help the weaker students see the fruits of their effort, and better and better their efforts.

  10. Timothy Russell

    May 20, 2013 - 1:07 pm

    I think what is most frustrating to me is that I work to get a high percentage on my 30 colors but for some reason it does not translate well to all the colors.

    I have got multiple 90% and higher but the highest I have seen in the total results was 87%. (I can’t remember which was which but I think that was even my first attempt, which showed a local percentage of 84%.) It seems my equations suffer from being tailored to the few colors I have seen rather than being a general expression of ‘what is blue.’

  11. I’m having way too much fun setting up my logical statement/inequality.

    A really great related article is XKCD’s color survey from a few years back, where they asked people to name colors. Warning: there are a fair number of not-safe-for-math-class words, mainly in quotes that show what happens when you ask anonymous people to fill out online forms.

  12. Very nice. This type of problem reminds me of the types of questions that kids saw on the PISA test. I did some reviewing for the PISA test here in Canada and some of the questions were actually done on a computer with the kids interacting with mini applets to solve problems.
    I am curious, however, on how this went with actual students? I had my son try it (grade 8) and he quickly lost interest when he had trouble matching the shown colour by sliding the colour values. The promise of a pizza party for his class was clearly not incentive enough for him.

  13. A couple of minor thoughts — this is, in general, pretty great.

    – Why is everything on the leaderboard rounded to the nearest percent? With the data you have, you could give a little more accuracy (two decimal places) and spread out those “ties”.
    – The convention on those tied places is to mark the next place according to the number of ties: 1, 2, 2, 2, 5, 5, 5, 5, 5, 10, 10, 10… rather than 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4… Otherwise you’ll have everyone above average… which I suppose is good for your VAM.
    – Math question: is there a way to combine a statement like (b > 200 and g > 200) into a single inequality? b * g > 40000 doesn’t cut it, and neither does (b-200)(g-200) > 0.
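The tie convention described in the comment above (1, 2, 2, 2, 5, ...) is usually called standard competition ranking. A minimal Python sketch follows; the scores are made up rather than taken from the contest leaderboard, and the function name `competition_ranks` is hypothetical.

```python
def competition_ranks(scores):
    """Pair each score with its place; tied scores share a place and the next place skips ahead."""
    ordered = sorted(scores, reverse=True)
    ranks = []
    for position, score in enumerate(ordered):
        if position > 0 and score == ordered[position - 1]:
            ranks.append(ranks[-1])      # tie: reuse the previous place
        else:
            ranks.append(position + 1)   # new score: place is the 1-based position
    return list(zip(ordered, ranks))


# Hypothetical leaderboard percentages, not real contest data.
example_scores = [90, 91, 93, 91, 90, 91]
for score, place in competition_ranks(example_scores):
    print(place, score)
```

With this scheme, the place number always says how many entries scored strictly higher, plus one.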

  14. I feel kinda dumb but I don’t get how the rule works. I’m just not sure what I’m supposed to type in. Is it just RGB combinations?

  15. Hi Jim, just give something a shot like “r>g” and see what happens. That’s saying, “with every blue color, the red value will be greater than the green value.” Perhaps that’ll get you started.


  16. This sure is a good ironic ‘dig’ at the datamining industry, asking people to make a classifier ;-)
    Anyway, as a techie, great stuff, but pedagogically I’m not sure. The run-up encourages students to ‘just use the sliders’ (Is a slider the new button? Zorn-esquely spoken) and I’m not sure if logical expressions are really learned without much context. It encourages trial and error. I would put them in a more scaffolded build-up, I think.

  17. Hi,
    Doesn’t work using firefox 21.0, can’t drag the numbers, just so you know that’s all.

  18. Is it just me, or is there a cap on the number of AND and OR statements you can use in an expression? It seems like we can use at most 1 AND and 1 OR, but no more. We can’t do 2 ANDs and 0 ORs, for example.

  19. I wonder how many classes are going to find themselves slightly messed up by this due to poor color calibration, or colors being skewed on a projector. Probably won’t make a huge difference when generating a formula, but I tried it out here briefly and had something very-pale-pink on my screen show up as a flat, almost blueish grey on the projector.

  20. I input this rule, bg, and the program still gives me a high (87%) rating. How/why is this possible?

    Thanks, Scott

  21. I can’t seem to use parentheses in the formula. What is the order of precedence for AND/OR? Or is that part of what I need to get the kids to figure out, hmmm. Think I will let the kids puzzle over this. Thanks, I have some kids really engaged.

    Great problem.

  22. hey, so I thought I had the formula just right or at least good enough to get a 80 or something, but ended up at the bottom with like a 20% on the score board…

    this leaves me with 2 major questions.

    #1. why does the % you get on the formula screen show something completely different than what’s on the ranking screen?

    and #2 would you be so kind as to explain what exactly you’re using to get these percentages? – as in – what factors go into finding the percent, cause I have a feeling like there is not a “defined answer,” but having it more or less depend on the opinions of the people (like when they do all the questions at the beginning) and if that is what you’re basing the “blue” on.

  23. @Scott, that rule will return false 100% of the time, so it’ll correctly identify none of the blues and all of the “not blues.”

    @Steve, let me check on this.

    @Ely, the percentage on the formula screen only tests your formula out on a small sample of colors. Then we test your formula on a much larger pool. It’s possible to create a formula that works for every single color on the sample set, but which isn’t all that useful more generally. That’s why we test it again on the larger pool.

    To your #2, we base blue on the opinions of the people taking the initial survey.
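The sample-versus-pool effect described above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the colours and labels are invented, and `accuracy`, `narrow_rule`, and `general_rule` are hypothetical names, not the site's code. The narrow rule is tuned to the four sample colours and scores perfectly on them, yet does worse than a simpler rule once more colours are added.

```python
def accuracy(rule, labeled_colours):
    """Fraction of colours where rule(r, g, b) agrees with the blue / not-blue label."""
    hits = sum(1 for (r, g, b), is_blue in labeled_colours if rule(r, g, b) == is_blue)
    return hits / len(labeled_colours)


# Invented labels standing in for survey answers (True means "blue").
sample = [
    ((20, 30, 200), True),
    ((40, 90, 220), True),
    ((200, 40, 30), False),
    ((30, 180, 60), False),
]
pool = sample + [
    ((90, 110, 190), True),     # a softer blue
    ((70, 120, 230), True),     # a blue with more red in it
    ((230, 230, 240), False),   # near white
    ((100, 100, 100), False),   # grey
]


def narrow_rule(r, g, b):
    # Tailored to the sample: very high blue and very low red.
    return b >= 200 and r <= 40


def general_rule(r, g, b):
    # Blue is the strongest channel.
    return b > r and b > g


print(accuracy(narrow_rule, sample), accuracy(narrow_rule, pool))
print(accuracy(general_rule, sample), accuracy(general_rule, pool))
```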

  24. but if that’s the case then there isn’t a way to get 100%… because my friend would just click through the first questions cause he said it wouldn’t matter. and I’m just guessing here, but I think a few people are probably doing the same, thus skewing the accuracy a bit and not giving us a defined answer of what blue really is…

  25. and is the formula trying to work with all of the ones that say 100-50% blue, and none of the ones that are 50-100% not blue? or every single one?

  26. @ely, True story, but the inaccuracy will ding everybody equally.

    Your goal is to create a rule that’s true for all the blue matches and false for all the non-blue matches.

  27. @Dan, thought I responded earlier. The kids will be together for the homeschoolpizzaparty (if they win) we meet in person usually and occasionally online. Online never works as well.

    Also found a bug, I entered “r=1” and got a 57% match!!!

    If you send me an email with the code for your parser I can try and help (time permitting)

  28. Clarification on typos :)

    r < 255 and (b 240) <== Does Not evaluate

    r < 10 and (b 200) <== Should not evaluate

    r < 255 and (b 240) <== Does evaluate

  29. @Steve: r=1 will return false for practically every colour…or not blue. This means that it will work for all the non-blue colours. Statistically, this will probably get you a decent percentage across the entire dataset, but not as good as one that returns true (or blue) for blue colours as well.

    Re. order of precedence: the ‘computer’ runs on logical AND before logical OR. Parentheses currently won’t work. I’m going to try and put some time in on this today. If a pair of parentheses is around a single clause, it will evaluate, but won’t actually change the result.

    I can’t seem to reproduce any of those problems. Is the blog stripping some characters here? (b 240) wouldn’t evaluate in any context. Can you drop me an email or something and I’ll take a closer look?

    @Jim: Sorry, I think you may have caught bluemalator at one of its busy recalculating periods and iOS timed out. I’ve made some changes to hopefully reduce that risk. Hope it didn’t cost you a top score!

  30. Question: Do you periodically re-run the rules against the “Human Responses?”

    I saw one kid just randomly hitting yes/no to get to the answers (which may explain some of the seemingly strange, to my eye, versions of “blueness”). Yes, it’s the same issue for everyone, so fair in that sense, and once you get a large enough sample I would assume the variance would be low.

    Thanks again this is great challenge.

    Regarding the order of precedence issue, I think I will ask my kids how can they design some tests to determine the order of precedence, or even if there is one?

    Then I will ask, if ANDs are evaluated first, how can you express b < 190 AND (r < 180 or g < 180) without using parentheses?

    Now I need to fig
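One way to explore the precedence question above is to test candidate rewrites by brute force. Python's `and` also binds tighter than `or`, so it can stand in for the behaviour described for the site's parser (that correspondence is an assumption; the parser itself is not shown here). The sketch compares the intended rule, the rule with the parentheses simply dropped, and a version where the AND is distributed over the OR. The function names are hypothetical.

```python
from itertools import product


def intended(r, g, b):
    return b < 190 and (r < 180 or g < 180)


def parentheses_dropped(r, g, b):
    # AND groups first, so this reads as (b < 190 and r < 180) or g < 180.
    return b < 190 and r < 180 or g < 180


def distributed(r, g, b):
    # AND distributed over OR: no parentheses needed.
    return b < 190 and r < 180 or b < 190 and g < 180


# A coarse grid over the RGB cube (18 values per channel).
grid = list(product(range(0, 256, 15), repeat=3))

distributed_matches = all(intended(*c) == distributed(*c) for c in grid)
dropped_mismatches = sum(1 for c in grid if intended(*c) != parentheses_dropped(*c))

print("distributed form matches everywhere:", distributed_matches)
print("colours where dropping parentheses changes the answer:", dropped_mismatches)
```

A colour such as (0, 0, 255) shows the difference: the intended rule rejects it because b is not below 190, while the version with the parentheses dropped accepts it because g < 180 alone is enough.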

  31. This is great. Such a nice idea.
    I am far from a school kid, but I got into this. The reason for this is that this is a classical Machine Learning problem. As I completed Caltech’s online course “Learning From Data” a couple of months ago, this is the perfect exercise to brush up my knowledge and put it to the test!
    Of course you know that this is Machine Learning, you are referencing the Netflix Prize ;)

    It is great reading the comments and seeing people discover basic ideas of machine learning, such as a solution working well in the small sample size but not working well in the (hidden) overall data.

    Initially I was a bit puzzled by the percentage of blue you report, since I thought we had to match this too. Then I realised that we only care about the binary decision. Maybe you can make this more clear (?)

    Now I do not want to hijack your effort and its focus on school students, but would it be possible to release more samples of your set? You see, with only 30 samples it’s hard to do any fancy machine learning (although a quadratic plane does not perform too badly). If we could have 200 or 300 samples I bet we could do much better.
    I could try using the data from xkcd’s survey, that would be interesting :)

  32. Scores are recalculated every 10-15 minutes or so, based on pure binary (we are a computer here) match: If you are true on a blue colour, or false on a not blue colour etc. Colours are retired from public vote after 10 (realistically due to concurrency) to 15 votes, and enter a pool that is drawn upon during the task but not voted on. All colours—much to the chagrin of my server (sponsor me Amazon!)—are included in final calculations.

    A cool new thing is which shows the current colour dataset in terms of blue or not blue.

    I will happily offer for download all data after the end of the challenge. I’ll probably make the github repo public so people can see how they were judged. There is some cool maths early in the sequence.

    We haven’t decided the ‘afterlife’ of this, but I’d be up for keeping it up with a blank database every week or so. Part of me is annoyed (in the most polite sense) that the top percentile is maths teachers rather than maths students.

  33. @Dan – well if it makes you feel better two of my students beat me!!! It wasn’t until I asked them to share answers, that I was able to improve my score and beat them. I expect they will return the favor. No worries I have no intention of telling them what I am learning. Just trying to better understand the problem space so I can ask them better questions and try and guide them.

  34. I just tried it out myself and this seems really fun for students! The scale at the end is also kind of fun to mess around with. I’m kind of confused by the percentages, but I’m sure the more I tinker with it, the more I will figure it out.

  35. @Mr Steve Why do you think this is an “issue”?
    As you noticed yourself, looking at the all the colour tested, about 80% are not blue. So if you have a rule that always returns false (aka ‘not blue’) then it will match correctly with 80% of the total test colours.

    What I find puzzling, and I believe is a bug, are the results on the bottom of the standings list. They are way too small. Smaller than 1%. The last one is 0.09%
    This implies that by just negating this rule you get 99.9% accuracy!! I find this highly unlikely.
    Maybe it is just a bug, or maybe they are results from early tests in the database, when it had very few colours. Hmm but then again it would not report such a low percentage. 0.1% means that there are at least 1000 test colours.

    Dave if you have any insight into this, I’d be interested to find out.
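The arithmetic in the comment above can be checked with a small Python sketch. With a binary blue / not-blue label, a rule and its negation disagree on every colour, so their accuracies add up to 1. The pool below is invented (8 not-blue colours, 2 blue) to mimic the roughly 80:20 split mentioned in the thread, and `accuracy` and `negate` are hypothetical helper names.

```python
def accuracy(rule, labeled_colours):
    """Fraction of colours where rule(r, g, b) agrees with the blue / not-blue label."""
    hits = sum(1 for (r, g, b), is_blue in labeled_colours if rule(r, g, b) == is_blue)
    return hits / len(labeled_colours)


def negate(rule):
    """Return a rule that answers the opposite of the given rule."""
    return lambda r, g, b: not rule(r, g, b)


def always_false(r, g, b):
    # Stands in for a rule like r=300 that no colour can satisfy.
    return False


# Invented pool: 8 not-blue colours and 2 blue ones.
pool = [((200, 30, 30), False)] * 8 + [((20, 30, 200), True)] * 2

base = accuracy(always_false, pool)
flipped = accuracy(negate(always_false), pool)
print(base, flipped, base + flipped)
```

By the same reasoning, a rule that genuinely scored 0.09% would score 99.91% when negated, which is why such low entries look more like a parsing problem than a real result.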

  36. @Thanassis; Good point and poor choice of words. I would use “1=0 gives 80%” to ask kids why? Then ask is it fair or a good judge of the “blueness” formula to have so many of the colors “not blue”? How would you judge the formula for blueness?

    Steve Thomas (aka Mr. Steve)

  37. @Thanassis: I’ve been staring at those outliers for a while. I think they were people who got in early with a rule that was tripping the parser before I closed that bug. I just fixed the syntax error in their rule and they’ve shot up the rankings.

  38. @Steve: I see what you mean. This is an interesting point. The skewed frequency of classes is a well-known issue in machine learning. Generally you would have problems if, for example, only 1% of your data points were of one class and 99% were of the other class. There are ways to account for this, but with 20% I think that the problem is not pronounced.

    @Dave. Thanks. BTW I have a technical problem now, and maybe you can help me. I tried to submit a few rules earlier today, and the webpage seemed to freeze. I tried another rule few minutes ago, and again it took quite long, but it finally showed me the standings with my new rule in them. It also showed me the results from the earlier entries that I thought they never came through. There is one that did pretty well (3rd overall) but I am not sure which one was it :) Can you please send me the name of the teacher for my first 3-4 rules (just look for the name boulis in the standings)

  39. OR is not working and I have kids working on it this weekend sending me emails. Any way we can get a slight extension, at least for a short time after the OR is fixed? I sent an email to Dave with an example of the problem. But just try to enter any expression with an OR.

  40. When the site opened again after the first contest I decided to play a bit more with it. I saw that the original points/colours were kept. So I entered my best rule (a cubic plane) and it scored slightly better than before. Then I thought I’d try something major. You have a page (/blueis) that you give all the colors in the database in little boxes, separated in two regions blue and not blue. I saved these regions as images and then wrote a program to parse them and get r,g,b, values for all the different colours. So now I have the entire database (4090 points at the time). Now I could run the algorithm on the entire database and see what are the best results I could get. This is not really machine learning, it is more like fitting :)
    Curiously enough the cubic regression on the 4090 points works slightly worse than the cubic regression on the 140 points I initially had!
    Then I noticed some things that made me scratch my head even more.
    I tried a 4th degree plane to separate the blue from non blue, as I was expecting it would do better than cubic with 4090 points. It did. With my calculations it was giving me an accuracy of 93.4%. But when I applied the rule with your website I got 80%. I assumed 80% is what you get if your rule always computes false. So maybe my rule was not parsed/calculated correctly (it is a long rule after all). I checked this assumption by entering an always-false rule (r=300) and noticed that although it was close, it was not identical.
    Moreover I started noticing that accuracy scores were changing in the standings *without* the number of colours changing. The number of responses is changing, but I assume that the colours in the database are already fixed, i.e., when a color enters the database, it does not appear in the test and it does not change its blueness value.
    Dave, if you have thoughts on the last points I’d love to hear them.
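For readers wondering what a "cubic plane" or "4th degree plane" rule might look like, here is a hedged Python sketch. It assumes those phrases mean a polynomial in r, g, b of degree 3 or 4 whose sign decides blue versus not blue; the commenter's actual fitting method and coefficients are not given, so the coefficients below are made up and `monomials` and `polynomial_rule` are hypothetical names.

```python
from itertools import product


def monomials(degree):
    """Exponent triples (i, j, k) with i + j + k <= degree, one per term r**i * g**j * b**k."""
    return [(i, j, k)
            for i, j, k in product(range(degree + 1), repeat=3)
            if i + j + k <= degree]


def polynomial_rule(coefficients, degree):
    """Build a rule that calls a colour blue when the polynomial is positive."""
    terms = monomials(degree)
    if len(coefficients) != len(terms):
        raise ValueError("need one coefficient per monomial")

    def rule(r, g, b):
        total = sum(c * r**i * g**j * b**k for c, (i, j, k) in zip(coefficients, terms))
        return total > 0

    return rule


# Number of terms grows quickly with the degree.
print(len(monomials(3)), len(monomials(4)))

# Degree-1 example with made-up coefficients for the terms 1, b, g, r:
# the rule is b - 0.5*g - 0.5*r > 0.
linear = polynomial_rule([0.0, 1.0, -0.5, -0.5], 1)
print(linear(20, 30, 200), linear(200, 40, 30))
```

A cubic has 20 terms and a quartic 35, which gives a sense of how long such a rule becomes when typed into a text box.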

  41. So still a problem when I enter b>0 and b<0 I should get 80%, but instead get 20%. Seems to act more like an OR than an AND. Also OR does not seem to work at all.

    Also Given that the population is ~80% NonBlue and ~20% Blue. Why do I seem to get samples that are at best 50/50 and in the last case were 17 Blue and 13 Non Blue?
    Is this by design? If so, what is the rationale?

    I did something similar to Thanassis (scraping the data set from the Website) so I could try a number of different algorithms locally. May give the data set to some of my students as well and see if they can write some programs to try and optimize certain constants in their formulas.

    Is this code going to be open sourced or made available somewhere?

  42. Steve, I think yes, it is by design. At least in terms of blue, you get almost the same colours each time. It makes sense to give 50:50 samples for the few colours the page shows. When you pick colours randomly then the ratio is 20:80 (that seems to be the case with the xkcd study too).
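If the page really does show a roughly balanced sample drawn from a pool that is mostly not blue (this is the commenters' guess, not something the site's authors confirm here), the effect on scores is easy to sketch in Python. The pool below is invented, and `balanced_sample` and `accuracy` are hypothetical names. An always-false rule looks mediocre on the balanced sample but much better on the full pool.

```python
import random


def balanced_sample(labeled_colours, per_class, seed=0):
    """Draw the same number of blue and not-blue colours from a labelled pool."""
    rng = random.Random(seed)
    blues = [item for item in labeled_colours if item[1]]
    others = [item for item in labeled_colours if not item[1]]
    return rng.sample(blues, per_class) + rng.sample(others, per_class)


def accuracy(rule, labeled_colours):
    """Fraction of colours where rule(r, g, b) agrees with the blue / not-blue label."""
    hits = sum(1 for (r, g, b), is_blue in labeled_colours if rule(r, g, b) == is_blue)
    return hits / len(labeled_colours)


# Invented pool with a 20:80 blue to not-blue split.
pool = ([((10, 20, 150 + i), True) for i in range(20)]
        + [((150 + i, 60, 40), False) for i in range(80)])

sample = balanced_sample(pool, per_class=15)


def always_false(r, g, b):
    return False


print(accuracy(always_false, sample), accuracy(always_false, pool))
```

Showing a balanced sample means a rule has to get the blues right to look good on the formula screen, even though blues are the minority in the full set.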

  43. I found a mistake in my code that explains the poor performance of the 4th-degree plane. After correction, I should have seen 93.75%. Still the webpage gives me 91.36%. My calculations are based on all 4304 colours currently in the database. Maybe there are rounding errors, as many terms in my rule are very small (10^-9) and one is 10^-11.

  44. @Dave I believe there might be inaccuracies with the way you calculate the rules. Possibly due to rounding errors as I wrote above. I tried more tests with the database of 4304 colours. Using either 140 points or the full set of points (4304) I was calculating separating surfaces of various degrees (0th to 6th), and then testing their accuracy (i.e. how well they separated the blues from the non blues) on all 4304 points. The results I got should have matched the results from the standings page. But they do not. I only entered a few rules in the website, but all except for one (140 points, 3rd degree) perform worse compared to my expected results. The greater the degree, the worse they perform. The one exception performed 93.17% instead of 92.77% that I calculated.
    In my rules I enter numbers with up to 14 decimal places of accuracy. Maybe your parsing truncates them to less accurate representations (?) I tried producing rules where the r g b values are divided by 100 first (so that the terms of a rule are larger and do not have to employ the e notation, e.g. 1.092344e-06). The results in the standings page are even worse (where of course my own calculations on accuracy did not change). Anyway, just reporting; probably it is not important for the rules school kids are playing with.
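Whether the site's parser actually truncated coefficients is not known from this thread. The Python sketch below only illustrates how much it would matter if it did, using the e-notation value quoted in the comment and the largest possible r*g*b product. The `truncate` helper is hypothetical.

```python
def truncate(value, places):
    """Drop all digits beyond the given number of decimal places (toward zero)."""
    factor = 10 ** places
    return int(value * factor) / factor


coefficient = 1.092344e-06    # the e-notation example quoted in the comment
r, g, b = 255, 255, 255       # largest value a cubic term r*g*b can take

exact = coefficient * r * g * b
for places in (6, 8, 14):
    approx = truncate(coefficient, places) * r * g * b
    # The difference is how far this one term moves the polynomial's value.
    print(places, exact - approx)
```

At 6 decimal places the single term shifts by more than 1, which is enough to flip the sign of a polynomial near its decision boundary; at 14 places the shift is negligible. Higher-degree rules have larger products such as r**2 * g * b, so they would be more sensitive, consistent with the pattern the commenter reports.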

  45. The site is down. At this point, we have no way of restoring the site, which is kind of a drag for all of us. Please accept our apologies.

  46. Stephen Thomas

    March 4, 2014 - 7:03 pm

    Is there any way we can get the code used for Do You Know Blue so we can try and put up another site? Or would we need to build our own from scratch?

    Mr. Steve

  47. Dave Major is the only one who has access to the code AFAIK and it’s been tough to get ahold of him. Until that happens, the site would have to be rebuilt.