This lack of consistent colour presentation is actually an advantage for Google's approach over the design-based one.
Their approach is optimising across the population of 'end colour presentations'. Even if 50% of users happened to see it in shades of aqua, the best 'aqua' presentation would still be incorporated into the result with a 50% weighting. A designer would find it very difficult to weight all of the presentations proportionately even if he knew about them, whereas Google's method doesn't even need to know about the variations. Let's face it, you never really know what people's internal representations of colours are.
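To make the weighting argument concrete, here is a minimal sketch in Python. Every name and number in it is hypothetical and made up for illustration; none of it is measured data or Google's actual method. It shows that the overall click rate of a candidate shade is already the share-weighted average of its click rates under each presentation, so picking the best overall rate weights the presentations proportionately.

```python
# Hypothetical shares of users who end up seeing each presentation.
# A real test never needs to know these; they are only here to show the arithmetic.
presentation_share = {"blue": 0.5, "aqua": 0.5}

# Hypothetical click rates for two candidate shades under each presentation.
click_rate = {
    "shade_a": {"blue": 0.040, "aqua": 0.020},
    "shade_b": {"blue": 0.034, "aqua": 0.032},
}


def overall_rate(rates, shares):
    """Click rate observed across all users: the share-weighted average."""
    return sum(shares[p] * rates[p] for p in shares)


overall = {
    shade: overall_rate(rates, presentation_share)
    for shade, rates in click_rate.items()
}

# The experimenter only ever observes `overall`, and picks the best of those.
best = max(overall, key=overall.get)
print(overall, best)
```

With these made-up numbers, shade_a is the clear winner for the 'blue' half of users, but shade_b wins overall because it holds up better for the 'aqua' half.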
Of course, this type of testing only works where you can measure the results and the number of options to be tested is low compared with the number of tests. If you look at the whole layout of a page, there are just too many dimensions for this type of testing to be practical. A designer could perhaps argue that changing the background colour, the border colour and using a bigger red button would be the best option. A designer could also argue that they are optimising for 'overall experience' rather than some measurable metric like clicks.
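A rough back-of-the-envelope sketch of the 'too many dimensions' point, in Python. The option counts and visitor number are hypothetical, chosen only to show how fast the combinations multiply.

```python
import math

# Hypothetical number of options per page element (illustrative only).
options = {"link_shade": 41, "background": 10, "border": 10, "button_size": 5}

# Hypothetical number of visitors available for the test.
visitors = 1_000_000

# Testing one element on its own: visitors are split across its options only.
per_variant_single = visitors // options["link_shade"]

# Testing every combination: the counts multiply.
variants = math.prod(options.values())
per_variant_combined = visitors // variants

print(variants, per_variant_single, per_variant_combined)
```

Under these assumptions, testing the link shade alone leaves tens of thousands of visitors per variant, while testing all four elements together leaves only a few dozen, far too few to separate small differences in click rate.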
I think this all goes to show why both designers and 'measurement geeks' need to work together on these things.
You make good points that I partially agree with; however, I still don't believe this is optimal, for a few reasons.
End-user design is about creating a usable experience and often evoking an emotion that makes users want to do it again, a process that relies on color combinations, layout, typography, etc. -- the complete package. I'm less attached to GOOG webapps because they lack subtle things such as sound feedback. I've found sound to be critical in software usability, far more important than very minor color discrepancies, at least once software goes beyond very basic operations and requires more attention from the end-user.
(Fighting about the shade of blue makes me puke in my mouth at the thought of such a corporate cliche).
Furthermore, this type of color testing would need to be extended to cover cultural differences to be truly effective. Colors will trigger completely different responses in China, Japan, France, etc., often the exact opposite of their American counterparts.
Honestly, I think the engineers have a place AFTER a design has been released and there is data to mine and analyze that takes into account the entire product presentation as a whole. Interjecting them in the design process too soon and giving their opinion overriding power is a mistake. It's akin to slapping memcache on your back-end before you've done an ounce of optimization on your queries.
> I've found sound to be critical in software usability
Very little is worse than web pages that use sound. It's no accident that most sites don't use sound; your tastes are just way outside the mainstream on this.
> Interjecting them in the design process too soon and giving their opinion overriding power is a mistake.
The success of Google clearly shows it's not a mistake, numbers don't lie, but designers are full of bullshit they can't justify.
Desktop apps are used by more people on a more regular basis than any web app to date. Web apps are getting there, but they're not there yet. Sound, when used appropriately, is a very critical component of user experience and, more importantly, usability, which is why most operating systems use sound feedback. Apple's new Nano uses sound to extend its usability tremendously, and other examples are endless. (By no means do I assume sound should be forced; it should always be a user-defined option.)
Agree or Disagree: Twitter clients are an example of sound feedback hooking users' attention (among other things), where the web was failing to do so.
The success of Google and 'design sensibilities' is not really an argument worth correlating; there are companies like Apple that master both engineering and design... tremendously successful. Furthermore, one could argue that Google Calendar is a near-exact rip-off of iCal, with no credit to Google except for putting it on the cloud and syncing it with their own software suite. I think influence is great, but goog engineers do not necessarily deserve the credit for design and usability in this and other products.
You're moving the goalposts. No one's talking about desktop apps; you said web apps. Web apps are a different space and sound is just not appropriate there, and it likely won't ever be.
There are many cases that justify when to use and when not to use sound on the web...
I would say it's the goal of most web-app builders to mimic desktop apps b/c the result is a familiar environment for the end user. Software/UI design is an established practice, with many studied and proven methodologies for handling complex user interaction that have been unfamiliar to the web in former years.
That said, I believe it's inevitable that web-apps will mirror their desktop counterparts in time (many are doing so already)... the exception being highly specialized web apps.