Thursday, September 6, 2012

Color Distribution Methodology

Getting independent colors which are maximally dissimilar from all previous colors is a fairly common problem. The standard suggested algorithm is to subdivide the most significant differences in colors and then pattern them. I thought this was the case too. So much so that I solved the problem because I thought it would be very useful (and it might). Generally this required two minor breakthroughs understanding how to encode maximally significant differences in colors through binary (flip the ends), which I went through a lot of struggle to get to including my anti-gray codes and derivation algorithm and proof (intersperced gray codes with an additional zero anywhere in it and flipped will maximize hamming distance). And secondly understanding the patterning such that I could derive any value without iteration of the previous values.

Well, looking at the results, it quickly seems that they aren't that great.

Largely there are issues with greens being pretty close even when their RGB values are far, and the patterns constantly making what turn out to be slight variations of your standard pure colors RGB and CYM. You start clashing rather quickly.

The better solution is computationally much more difficult. And generally sort of cheating. You do the hard work. You run through every color you can represent in RGB (16,77,216 of them) and run a heavy duty color distance routine, likely CIE-Lab 2000, then you go in there and manually choose the colors. And then use this as a master list and just read from the list. The standard solution starts picking some greens that look pretty much like the previous greens after like 20. And you should be able to get like 50 colors or so which are visually distinct from one another. Perhaps with pruning of the list by hand.

Using 20k guesses (rather than brute forcing 16-million) I produced this after an hour or so.

There's likely not much left to gain brute forcing the algorithm, but I did (for a smaller value 64) and came out with this.

Which manages to out perform the heck out of the dividing algorithm and could just be read by a standard list. There might be some enhancements to be achieved by converting away from finding maximally good results initially which may cost the overall result to become pareto optimal. Or employ something like k-mean clustering in LabDE2000 colorspace (not really defined as a colorspace but pretty much is exactly that) to better maximize the list. Or goodness forbid collect data by humans voting (it's how the color spaces came about, that and math to approximate that data). Although, this would need to be done in somewhat controlled systems because human eyes are excessively sensitive to conditions. Some colors look similar on a black background when they wouldn't on a white background. Which could maybe be randomized to average out or find colors that do this less than others. It wouldn't be too hard to define a list or two which specifically address this question, because programmatically the question of which colors seem very distinct from a set of previously distinct colors has nothing really to do with computers but humans and how they perceive color, and the ability to approximate this has everything to do with humans and not that much to do with programming.

Observations which have been noted before.

There are some good color dictionaries out there. But, I'm not convinced there are much more optimal solutions. And it turns out the algorithms needed are much better at making color dictionaries than they would be generating colors on the fly. And since we're dealing with 64 colors max, we're much better doing that.

No comments: