I know this is mostly orthogonal, but I think that's the wrong takeaway from Simpson's paradox. As an illustration, let's translate into a situation we have better intuitions for:
By the same reasoning as above, 'glass of water' would be the better cure for all headaches, which doesn't seem right. And yet all of the individual numbers seem plausible... Tylenol helps more in general, and the difference is bigger when the headache isn't that serious to begin with.
I think the hand-wavy explanation here is that the studies failed to run over a representative sample of the population, which is what makes the results difficult to compare between them. All other things being equal, Treatment B really is better; but not all other things were equal between the two aggregates, so the overall percentage is misleading. When statisticians 'control' for some variable, it's this kind of weirdness that they're trying to squeeze out.
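To make the 'control' step a bit more concrete, here's a rough sketch in Python. The counts are invented purely for illustration (they are not taken from any study or from the figures discussed above), and the names `trials`, `pooled_rate`, and `standardized_rate` are my own. It assumes the water trial mostly enrolled mild headaches and the Tylenol trial mostly severe ones, then compares the raw pooled cure rate with a rate re-weighted to a common mix of severities. That re-weighting is one simple way of controlling for severity, not the only one.

```python
# Hypothetical (cured, treated) counts per headache severity.
# These numbers are made up for illustration only.
trials = {
    "tylenol": {"mild": (19, 20), "severe": (30, 100)},
    "water": {"mild": (70, 100), "severe": (2, 20)},
}
strata = ["mild", "severe"]


def rate(cured, treated):
    """Cure rate within a single severity group."""
    return cured / treated


def pooled_rate(arm):
    """Naive overall cure rate: lump every patient in the arm together."""
    cured = sum(c for c, _ in arm.values())
    treated = sum(n for _, n in arm.values())
    return cured / treated


def standardized_rate(arm, weights):
    """Cure rate the arm would show if its patients had the given severity mix."""
    return sum(weights[s] * rate(*arm[s]) for s in weights)


# Common severity mix: each group's share of all patients across both arms.
totals = {s: sum(trials[a][s][1] for a in trials) for s in strata}
grand_total = sum(totals.values())
weights = {s: totals[s] / grand_total for s in strata}

for name, arm in trials.items():
    per_group = ", ".join(f"{s} {rate(*arm[s]):.0%}" for s in strata)
    print(
        f"{name}: {per_group}, "
        f"pooled {pooled_rate(arm):.1%}, "
        f"standardized {standardized_rate(arm, weights):.1%}"
    )
```

With these made-up counts, Tylenol wins in both severity groups (95% vs 70% for mild, 30% vs 10% for severe), yet the pooled rate favors water (60% vs about 41%). Once both arms are re-weighted to the same 50/50 severity mix, the ordering flips back to Tylenol (62.5% vs 40%), which matches the "all other things being equal" reading.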