Hacker News
Optimizely (YC W10) Increases Homepage Conversion Rate by 29% (optimizely.com)
146 points by dsiroker on June 8, 2011 | hide | past | favorite | 48 comments


Dear Optimizely: Your statistics do not tell you that B has a 95% chance of being better than A. Your statistics tell you that an excess of B over A this large has less than a 5% probability of arising by chance, assuming B and A are actually equally effective.

A Bayesian would understand this in terms of prior probabilities and likelihood ratios, but to put it into nontechnical terms, suppose that you tried out 15 different alterations and none of them seem to work. Then on the 16th, your detector goes off and says, "Less than 5% probability of these results arising by chance!" Do you conclude that it's 95% likely that this version is genuinely better? No, because the first 15 failed attempts told you that improving this webpage is actually pretty hard (the prior probability of an effective improvement is low), and now when you see that the 16th attempt has a result with a less than 5% probability of arising from chance, you figure "Eh, it's worth testing further, but probably it is just chance."
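To make the base-rate point concrete, here's a back-of-the-envelope Bayes calculation. The prior, power, and significance threshold are all illustrative numbers (roughly matching the "1 success in 16 attempts" story), not anything from Optimizely:

```python
# Base-rate check: how likely is a "significant" result to be a real
# improvement, given a low prior? All numbers are illustrative.
prior = 1 / 16        # rough prior that a given tweak really helps
alpha = 0.05          # false-positive rate of the significance test
power = 0.8           # chance the test fires when the tweak really helps

# P(real improvement | test fired), by Bayes' theorem
p_real = (prior * power) / (prior * power + (1 - prior) * alpha)
print(round(p_real, 2))  # ~0.52, not 0.95
```

Under these assumptions, a "95% significant" result is really only about a coin flip's worth of evidence that the change works.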

Another extremely important point is that the classical statistics you learned to use to decide that something was <5% likely to arise by chance only apply if you decided in advance to do exactly that many trials and then stop. If you keep checking as data comes in, your chance of finding, on some trial, that your running total of results is "statistically significant", when A and B are actually identically effective, is considerably greater than 5%. See http://lesswrong.com/lw/1gc/frequentist_statistics_are_frequ... - in a simulation I ran with 500 fair coin flips, 30% of the runs had at least one step where the cumulative data "rejected the null hypothesis with p < 0.05".
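The peeking problem is easy to reproduce. This is a sketch using a normal-approximation z-test on the running total, not necessarily the exact procedure from the linked post:

```python
import math
import random

random.seed(0)

def peeks_significant(n_flips=500, z_crit=1.96):
    """Flip a fair coin n_flips times, testing the running total after
    every flip (two-sided z-test, normal approximation). Return True if
    the cumulative data ever 'rejects the null' at p < 0.05."""
    heads = 0
    for n in range(1, n_flips + 1):
        heads += random.random() < 0.5
        if n >= 20:  # skip tiny samples where the approximation is poor
            z = abs(heads - n / 2) / math.sqrt(n / 4)
            if z > z_crit:
                return True
    return False

runs = 2000
hits = sum(peeks_significant() for _ in range(runs))
print(hits / runs)  # well above the nominal 5% false-positive rate
```

The exact fraction depends on when you start peeking, but it comes out several times higher than the 5% the test nominally promises.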

You're not really to blame for this mistake, because the horrid non-Bayesian classical statistics taught in college are just about impossible to understand clearly; but it does sound to me like someone at your org needs to study (a) Bayes's Theorem (b) the case for reporting likelihood ratios rather than p-values (likelihood ratios are objective, p-values decidedly not) and (c) the beta distribution conjugate prior (which would make progress toward having priors and likelihood ratios over "These two pages have a single unknown conversion rate" or "These two pages have different unknown conversion rates"). Or in simpler terms, "Someone at your company needs to study Bayesian statistics, stat."


[I'm one of the co-founders of Optimizely] Hi Eliezer, thanks very much for your thoughts--these are great. You're absolutely right that classical p-values are not without their shortcomings, which can be significant when misused! A Bayesian approach would address some of these shortcomings, but would also introduce other hurdles, like generalizing the selection of a prior distribution and, more importantly (IMHO), distilling likelihood ratios in a way that can be quickly grasped by someone with no formal stats/math background. In simpler terms, classical statistics is not perfect, but in most cases it provides a lot of useful information in an easy-to-grasp way.

That said, we've started looking more seriously at incorporating Bayesian techniques into our interface and would love to get your thoughts on great ways to communicate these concepts to our customers. I've reached out to you directly off-thread and would love to chat further!


The p-value represents "Probability this is all luck". So why isn't 1 - p "Probability this isn't all luck"?

>> Another extremely important point is that the classical statistics you learned to use to decide that something was <5% likely to arise by chance, only apply if you decided in advance to do exactly that many trials and then stop.

I agree. Not doing that, I'd say, is "fudging the numbers". I can't find anywhere in their article where they did this, though.


The p-value means "chance of this happening, if the two were equally effective". So 1-p means "chance of this NOT happening, if the two were equally effective".

Note that in frequentist statistics, the sentence "Probability this is all luck" doesn't even make sense. Either the universe is so that it was luck, or it is not. In frequentist statistics you do not assign probabilities to models of the universe. You only assign probabilities to observations, given a fixed model of the universe.


A better interpretation might be "probability this would happen if results were governed purely by chance". Note that the distinction is important if the process is in fact governed by chance!

Let's say I roll a die two times and get a six both times. The probability of this happening is 1/36, or about 3%.

Would you say I have established with 95% confidence that the particular die I'm using always rolls six? No, because you have good reason to believe that the results are in fact random. Or in other words, you have a strong prior belief that the hypothesis you're testing is false.
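Plugging made-up but plausible numbers into Bayes' theorem shows how strongly that prior dominates (the 0.001 prior on a rigged die is purely illustrative):

```python
# Two sixes in a row: strong evidence for "always rolls six"? Plug in a
# prior reflecting how rare rigged dice are (0.001 here is made up).
p_rigged = 0.001                 # prior: almost all dice are fair
p_data_if_rigged = 1.0           # a six-only die always shows two sixes
p_data_if_fair = (1 / 6) ** 2    # a fair die does so 1 time in 36

posterior = (p_rigged * p_data_if_rigged) / (
    p_rigged * p_data_if_rigged + (1 - p_rigged) * p_data_if_fair
)
print(round(posterior, 3))  # ~0.035: still very probably a fair die
```

Even though the data had only a ~3% chance under the null, the posterior probability of a rigged die is still only about 3.5%.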


>> Or in other words, you have a strong prior belief that the hypothesis you're testing is false.

Yes, that is true for the dice roll; you already know most dice aren't rigged.

I don't see how this affects the results of Optimizely's product; you do not have a strong prior belief about whether the hypothesis is true or false.

Also note that the number of observations used in the article was in the thousands.


I think you'll find this is a problem with all A/B testing apps out there. I haven't seen a Bayesian A/B testing app.


They increased the number of people submitting their URL; there is no telling if that actually resulted in more leads for them.

What I have found is a simple landing page, that tells the user exactly what you are providing, and is free of any confusion, works the best over the long term.

I've run hundreds of thousands of website visitors through Google Website Optimizer in multi-variate tests and what I've found is that over time there is little to no difference in conversion rate for minor landing page changes. The biggest jumps come from eliminating content in the design and clarifying the message.

Looking at the small number of users they sent to this landing page, I would call the results inconclusive. You can ramble off statistics to me all day long, but you can't change the fact that humans don't behave with the predictability that coin flips and physics do. (It's really chilling when you see how many drugs the FDA has approved over tiny margins of change/success.)


> Looking at the small number of users they sent to this landing page, I would call the results inconclusive

The statistics say otherwise. A 29% bump with a 1% margin of error is not inconclusive; it's virtually the definition of a conclusive result.

> You can ramble off statistics to me all day long, but you can't change the fact that humans don't behave with the predictability that coin flips and physics do.

And you can rattle off personal anecdotes like this all day long; the statistics, with their stated margins of error and accuracy, are still more correct than your intuition.


> A 29% bump with a 1% margin of error is not inconclusive; it's virtually the definition of a conclusive result.

Well, no, it's a set of numbers with percentage signs after them. Perhaps the documentation for Optimizely specifies how their error margins etc. are derived, but nothing in the linked article does as far as I can see. Without knowing that underlying reasoning, all those pretty graphs and percentages are just a load of gobbledegook, apart from the original data points and the percentage increase figures derived directly from dividing them.


What a lazy ignorant statement. If you want to see how they crunch the numbers, go look.


I did. A Google search for

"error bars" site:optimizely.com

turns up exactly three hits. One of them is the blog post we're talking about. The others are discussions on the Optimizely support pages from December 2010 and January 2011, which are similarly statistically waffly. The older one promises a further clarifying post that never seems to have been written.

If you have found other sources where the Optimizely site publicly describes their statistical methodology, please share them. I think several people following this discussion would be interested.

Otherwise, I stand by my earlier comments.


So ask them; they aren't stupid, they wouldn't be building a business based on A/B testing without using valid methods of testing and displaying results. That you call the results crap because you don't have the perfect details of everything is simply absurd.


I didn't call the results "crap". I am simply pointing out that they are meaningless without knowing the methodology behind them. (And we aren't just missing the "perfect details of everything" here. As far as I can see, we have no rigorous details whatsoever.)

I would remind you that you were the person who was attacking another poster's position based on your interpretation of those currently meaningless numbers. It's up to you to back up your claim, not up to the rest of us to figure out whether your argument has any merit.


You called them gobbledegook, same difference.

> I am simply pointing out that they are meaningless without knowing the methodology behind them.

Only if you assume incompetence or malice on the part of Optimizely, neither of which you have any valid reason to do. It's perfectly reasonable to assume they aren't stupid and the results are valid.

> It's up to you to back up your claim, not up to the rest of us to figure out whether your argument has any merit.

Um, my claim is don't assume they're idiots; that doesn't require me to back anything up.

The poster I replied to wasn't attacking them, he was attacking statistics in general, which is what I was replying to.

Your response was to imply that Optimizely doesn't know what they're doing and therefore their results are invalid until you see how they're crunching the data; that's simply absurd.


Keep in mind that the 1% margin of error applies to the measurements of the conversion rates themselves. The 29% relative increase has a significantly higher margin of error.
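A rough sketch of why. The 8.9% and 11.5% conversion rates are the ones reported elsewhere in this thread; the per-arm sample size of 3000 is a guess, since the article doesn't state it:

```python
import math
import random

random.seed(0)

# Rates from the thread; the per-arm sample size is assumed for illustration.
p_a, p_b, n = 0.089, 0.115, 3000

se_a = math.sqrt(p_a * (1 - p_a) / n)   # ~0.5 points: the "1%-ish" margin
se_b = math.sqrt(p_b * (1 - p_b) / n)

# Propagate both uncertainties into the relative lift b/a - 1 by simulation.
lifts = sorted(
    random.gauss(p_b, se_b) / random.gauss(p_a, se_a) - 1
    for _ in range(20000)
)
low, high = lifts[500], lifts[19500]    # central 95% of the draws
print(round(low, 2), round(high, 2))    # far wider than +/- 1%
```

Small absolute errors on each rate combine into a much wider interval on the ratio, so "29% lift" carries far more uncertainty than "11.5% +/- 1%".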


Great point! The goal we talk about in this blog post is the number of people who click that button.

One nice feature of Optimizely is that we can test as many goals as we like-- you can add goals even after the experiment has started running and we retroactively measure the conversion rate!

After I read your comment I went in and added a goal to see whether there was a change in the number of people who ended up starting a free trial later down in the funnel after seeing each of these variations. Turns out there was a +2.5% increase for the "Give it a try" variation. :)


I'm curious how far you're able to follow them down the funnel (e.g. conversions from that free trial to paying member? or effect on the cancel rate of those members?). I ask because I've been looking at Hubspot, and find their ability to connect the lead gen into the CRM a big selling point (I guess they call this "lead nurturing"). Are you focusing narrowly on the lead gen half of it?


Hi Josh,

Yup, you can track as far down the funnel as you like. Here is an article in our knowledge base that explains how you can do this: http://support.optimizely.com/kb/goal-tracking-and-reporting...

Let me know what you think!


Josh -

I think tools like Optimizely can track to the final sale if you have an online (ecommerce only) transaction. This works well when the user is expected to complete the transaction online and without changing computers or deleting their cookies during the consideration process.

If you have an offline sales process (where a salesperson takes an order in person or on the phone), or customers who pay by invoice, or a longer and more considered sales process (bigger ticket items), you need a closed loop marketing tool that connects your marketing leads database to the CRM that the sales team is using. HubSpot does this, and connects to Salesforce.com, NetSuite, Highrise, Sugar CRM, Microsoft Dynamics, ACT, Goldmine, or pretty much any CRM.

So, Optimizely for small ticket / ecommerce, and HubSpot for large ticket / offline sales.

Thanks, Mike Volpe (CMO @ HubSpot)


As always you need to be wary of how these results are reported.

AJ already pointed out that they're not measuring "conversions" in the sense of converting to paying customers, but rather "entering a URL in a field".

The 29% increase is always misunderstood (by clients at least)

The original page had a conversion rate of 8.9%

The page they ended up with had a conversion rate of 11.5%

The change is that an additional 2.6 percentage points of visitors are now entering their URLs in a field and clicking a button; 2.6 points on a base of 8.9% is where the relative 29% comes from.


This is sorta related but since I concluded the test today and this post is here, I thought I'd share.

I ran a 5-way split test for 9 days on my newsletter's signup page (JavaScript Weekly). My original page was the 2nd best performing, but an identical page just without the subscriber count got an 8% higher conversion rate (or about 20% more signups in all) with 90% confidence at the end of testing.

The worst performer? A signup page with no screenshot preview of the newsletter. It sent conversions from about 37% down to a mere 3% (!!). Lesson learned? Always have visuals or screenshots on pages where you're trying to get people to sign up for things they aren't sure about.


Were you looking at the statistics all the time to see when they passed 90% confidence? That invalidates them, just so you know.

Basically, if you're watching it, it won't science.


I'm not sure how tongue in cheek that was ;-) but 90% confidence was not a goal, at least. I just got bored after a week and wanted to move on. C'est la vie..


Does anyone know of a central repository for all of these little landing page optimization tweaks? I know each site is different and you just have to test to know for certain, but there should be some generally vetted decisions that would make a good boilerplate.



abtests (mentioned above) is awesome but there's also http://whichtestwon.com/


We've been using Optimizely more and more extensively (we have a slightly unusual use case for it), and it's been fantastic at successfully ratcheting up conversions. We use it in concert with mixpanel when we need to push people along a funnel.

The tool's insanely easy to implement, a joy to use, and I get to rely on them to tell me when something is statistically meaningful.


What is your slightly unusual use case?


We actually deploy it across a number of client apps, rather than just our own site. We use it to run a dozen or so tests at any given time across multiple apps, and to quickly iterate on others' products.

It's an interesting combination with mixpanel. If we could get one more thing from Mixpanel, we'd really be set - I'm bugging them, we'll see if they come out with it!


It would be very cool if Optimizely could gather the A/B test results across all customers, run some statistical analysis, and publish which optimizations are significant. Sort of like an OKTrends for websites.

That would be a great resource for initial usability and design decisions, which could then be tweaked and further optimized by their product.


Great idea! We'll do this.


That would rock.


That "Enter Your website URL" field is really annoying. The easiest way to change the default "http://www.example.com" to your website's URL would normally be to double-click the "example" and type your address, leaving the boilerplate "http://www." and ".com" and such.

But you can't do that due to the fancy JavaScript and everything. You have to type the whole thing yourself.

If the field cleared when you gave it focus, it wouldn't really matter and I'd just type in the URL myself. But the text remains, in the background, taunting you. It even appears to highlight the "example" part if you double click it.


Sorry about that. The URL will fade out slightly when you focus on the textbox but I agree it is better to just clear it completely. We'll make that change.


Why clear it? Why not leave http:// in there?

Might be worth A/B testing it. :-p


How does Optimizely compare with Visual Website Optimizer?


VWO is a great product but here are some of the unique benefits of using Optimizely:

First is ease-of-use and simplicity: we hope Optimizely is a product anyone can use, not just IT or engineers. Paras Chopra, the creator of VWO, recently said on Quora: "Optimizely is a great tool and perhaps even bit easier than VWO, so I would recommend you to signup for a free trial of both the products and see which one is suited best to your needs."

Second, we enable power users: all of the changes you make in Optimizely can be transformed into JavaScript statements that correspond to those changes. We enable you to see the JavaScript that gets generated and even edit it or write your own if you want to create sophisticated transformations of your page (click the "Edit Code" button in the bottom-right corner in the editor).

Third, we enable you to track any number of goals, even if you decide you want to measure a goal AFTER you've started your experiment. You can track any pageview, JavaScript event, or custom event and match these goals exactly, by substring, or by regular expression. We keep all of the data you've sent us so you can always add and remove goals even if the experiment is still running. This gives you the flexibility and power to measure any goal you have on your website in real-time.

In addition to these three we are also unique in offering automated cross-browser testing, behavioral targeting, undo/redo, and unlimited simultaneous experiments.


I'm the founder of Visual Website Optimizer.

While Optimizely is strictly an A/B testing tool, VWO is a much more comprehensive testing tool. In addition to A/B testing, we support Multivariate testing, Split URL testing, Heatmap/Clickmap, Targeting/Segmentation and Revenue Tracking. VWO is being used by companies that have sophisticated and comprehensive testing needs (like AMD, Microsoft, and Groupon, plus thousands of SMBs).

That said, Optimizely may be slightly easier to use, but with VWO our focus is on both: comprehensiveness of feature set _AND_ ease of use. Since both companies offer free trials, it's best to sign up for both and see which one is better suited to your needs.


Without diving into the arcane specifics, I can safely say that I wouldn't have my sanity today without Optimizely.

It's mind-numbingly simple to use, but also hugely powerful, and intelligently implemented. Many people seem to assume the power/appeal is in the visual editor, but they've advanced so far beyond just being "visually created experiments", in experiment design, targeting, implementation, and customization; they've entirely eliminated the headaches and fudging that usually come with online testing tools. I'm as happy a customer as they come.


I spent the last month using VWO, and despite the website needing a slight Web 2.0 visual update, it has been a good experience.

I'm now getting started with Optimizely and the biggest thing I miss is video demos. I also liked that VWO had a much smaller JS file with the option to remove jQuery. Fix this, Optimizely; I don't need to load jQuery twice!


Hey Mike -

Good point about the video demos.

If you have your own version of jQuery running on your site, you can have Optimizely exclude jQuery from the project bundle. Check out this for more info:

http://support.optimizely.com/kb/advanced/does-optimizely-co...

If you have any other questions, you can contact us through support.

- Jeff (with Optimizely)


I'd be interested to see some testing around using a '!' in conversion button text as opposed to a period.


I got a lifetime free Optimizely account as part of an AppSumo deal. I can safely say it's one of the simplest yet most powerful and effective web products I've ever used.

The ease with which you can make changes is astonishing, and there are no limits if you know a little jQuery.


Kinda off topic, but I tried it out on top websites that use long-polling (like Quora), and it always fails. I guess the app waits for all the resources to load completely, which for long-polling websites happens long after the DOMready event.


I wish I could apply Optimizely AB Tests in everything I do. Like measuring the best way to word movie choices so that the one you secretly want gets chosen (conversion)!

Anyway, interesting statistics. A larger sample size, I think, would make the results even more convincing.


How do you know an A/B test works if you aren't testing with the same users?


A/B testing works by randomly distributing your traffic. We can't assume that every person is the same, but by splitting the traffic randomly we can assume that the two groups are similar.

For a little bit more information on how our particular flavor of A/B testing works, check out http://optimizely.appspot.com/works
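One common way to make the random split deterministic is to hash a visitor ID, so each visitor keeps seeing the same variation. This is a generic sketch, not necessarily how Optimizely implements it:

```python
import hashlib

def bucket(visitor_id: str, experiment: str = "homepage-cta") -> str:
    """Deterministically assign a visitor to A or B. Hashing the visitor
    ID together with the experiment name gives an even, random-looking
    split that stays stable across page loads for the same visitor."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).digest()
    return "A" if digest[0] < 128 else "B"

# The same visitor always lands in the same bucket...
assert bucket("visitor-42") == bucket("visitor-42")

# ...and across many visitors the split comes out close to 50/50.
counts = {"A": 0, "B": 0}
for i in range(10000):
    counts[bucket(f"visitor-{i}")] += 1
print(counts)
```

Keying the hash on the experiment name means a visitor's bucket in one experiment doesn't correlate with their bucket in another.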



