Two-tailed testing is widely misused for testing directional research hypotheses. The fundamental cause of the problem is a pervasive failure to distinguish clearly between the research hypothesis and the statistical hypothesis (1). A sound relationship between the research and statistical hypotheses must be established for a test to be logically consistent and valid.
Non-directional research hypotheses should be tested using two-tailed testing, while one-tailed testing is appropriate for directional research hypotheses.
If a research hypothesis presumes a positive relationship between two constructs, then right-tailed testing is appropriate. This will likely be the case for the vast majority of A/B tests. For example, after visitor analysis of heatmaps, visitor recordings, and analytics data, a researcher might conclude that:
Through a better U/X implementation of a product page, we expect a new design to improve final purchase conversion rates.
This research hypothesis can only be converted into a logically consistent one-tailed statistical hypothesis.
It would make no sense for the research hypothesis after such visitor analyses to be:
Through a better U/X implementation of a product page, we have no idea if the new design will be better or worse for final conversion rates.
The difference is subtle but important.
In the one-tailed hypothesis, the view is that by making U/X improvements we expect, based upon our research, to see higher conversion rates, and we are testing whether this is in fact the case at some level of significance.
In the two-tailed hypothesis, the view is that by making U/X improvements we have no expectations that these improvements will actually improve anything at all!
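To make the contrast concrete, here is a minimal sketch of how the same A/B result is evaluated under each hypothesis. It assumes a pooled two-proportion z-test, which is one common choice and not something this article prescribes; the visitor and conversion counts are hypothetical, and the function names are made up for illustration.

```python
import math

def normal_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Pooled z statistic for the conversion-rate difference (variation minus original)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical data: original converts 200 of 4000, variation converts 235 of 4000.
z = two_proportion_z(200, 4000, 235, 4000)

# Directional research hypothesis: the new design improves conversion.
# H0: p_variation <= p_original, H1: p_variation > p_original
p_right = normal_sf(z)

# Non-directional research hypothesis: the new design changes conversion either way.
# H0: p_variation == p_original, H1: p_variation != p_original
p_two = 2 * normal_sf(abs(z))

print(f"z = {z:.3f}")
print(f"right-tailed p = {p_right:.4f}")
print(f"two-tailed p   = {p_two:.4f}")
```

When the observed difference is in the expected direction, the two-tailed p-value is exactly twice the right-tailed one. With these made-up numbers, the same data clears a 0.05 threshold under the directional hypothesis but not under the non-directional one, which is why the choice has to follow from the research hypothesis and be fixed before the data are seen.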
A two-sided hypothesis and a two-tailed test should be used only when we would act the same way, or draw the same conclusions, if we discover a statistically significant difference in either direction.
But this is clearly not how one would act when making an actual business decision. How often do you define your hypothesis so that, whenever the U/X improvement performs differently from the existing design, you would take the same action regardless of whether it is better or worse?
This is why two-tailed tests are not appropriate for directional hypothesis testing. When researchers do not have enough knowledge to justify a directional research hypothesis, the research hypothesis may be non-directional, and the subsequent use of two-tailed testing is appropriate.
An appropriate two-tailed research hypothesis might be:
By adding functionality to the checkout that allows the site to capture more information, we have no expectation as to whether time spent on the page will be affected positively or negatively.
In this case, we don’t know whether users will spend more or less time on the page with the additional functionality, but we don’t care, because we will make the same decision, not to implement the functionality, if either of the two occurs.
Therefore, when using a one-tailed test for a directional hypothesis, calculating significance for a result in which the original had a higher conversion rate than the variation has no meaning and no possible interpretation from a significance standpoint. The only conclusion that can be drawn when a one-tailed test does not reach its preset level of statistical significance is that the null hypothesis cannot be rejected (i.e., the data are consistent with the variation being the same as or worse than the original). No statistical significance can be attached to the truth of the null hypothesis itself.
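The point about the unexpected direction can be illustrated with a short sketch. As before, it assumes a pooled two-proportion z-test and a 0.05 significance level, both chosen here for illustration; the counts and the helper names `right_tailed_p` and `decide` are hypothetical.

```python
import math

def right_tailed_p(conv_a, n_a, conv_b, n_b):
    """P-value for H1: the variation (B) converts better than the original (A)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z) under the null

def decide(p_value, alpha=0.05):
    """A right-tailed test has only two outcomes; neither one 'proves' the null."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Hypothetical data: the original (240 of 4000) outperforms the variation (200 of 4000).
p = right_tailed_p(240, 4000, 200, 4000)
outcome = decide(p)
print(f"right-tailed p = {p:.3f} -> {outcome}")
```

Here the right-tailed p-value is close to 1. That does not make the result "significant in the other direction": the test was set up only to detect an improvement, so the outcome is simply that the null hypothesis is not rejected.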
June 12, 2018
1. Hyun-Chul Cho and Shuzo Abe (2013), “Is two-tailed testing for directional research hypotheses tests legitimate?”, Journal of Business Research 66:1261-1266