Are Online Surveys Reliable?
July 24th, 2008
Whenever I recommend online surveys to my customers, the number one question I get is whether or not the results are trustworthy. My customers ask this question because they fear that website visitors typically won’t bother to fill out questionnaires online and that the collected data are therefore unlikely to be representative.
In this post I will present a case study which confirms that fear. My study shows that there are in fact significant differences between respondents and other visitors: they tend to be more engaged with the website, they see different content and they even come from different geographical areas.
As such the results from online surveys should not be generalized to your entire website without first correcting for sampling error. The way you should do this, however, depends heavily on the purpose of your survey. In some cases, you can get by without correcting for sampling error at all – for example if your purpose is to analyze the (behavioral) causes of satisfaction / dissatisfaction.
Why use online surveys in web analytics?
According to my own definition, web analytics is primarily concerned with measuring online behavior. Web analytics can tell you exactly what people do on your website as well as how often they do it. However, it cannot tell you who the visitors are, why they act as they do and what they think or feel while browsing your site.
The classical example of this limitation in web analytics is the ambiguous meaning of popular metrics such as “Page Views per Visit” or “Average Visit Duration”. High values for these metrics are often seen as signs of satisfaction. In principle, however, they could just as well mean that it was hard for the visitors to find what they were looking for.
Imagine a stubborn visitor who repeatedly tries to complete a check-out procedure. Such a visitor may produce many page views and spend a lot of time on your site even though he or she never completes the task. When you think about it, any given web metric or KPI based purely on behavioral data can always be interpreted positively or negatively.
Given this weakness of traditional web analytics, more and more web analysts are turning to online surveys as a supplement to behavioral data (see here and here). Such surveys enable you to ask the visitors directly instead of guessing from their clickstreams. If they are conducted continuously, you can even use them to include opinion scores in your dashboard alongside your conversion rate and other traditional web metrics.
If done right, online surveys make up a powerful tool for gaining access to the minds of your visitors. They are a perfect companion to web analytics, because they add a subjective dimension to the otherwise purely objective observation of behavior.
The problem of representativeness
Although online surveys complement web analytics in a powerful way, they have one major drawback: not all visitors want to fill out questionnaires online, so the data always come from a sample rather than from the whole visitor population. I recently calculated the average response rate for all our survey customers and found that it was only around 8%.
The question is, therefore, whether or not survey data are representative. To answer this we first need to know what is required for a sample to be representative. Contrary to common belief, the low response rates of online surveys are not in themselves a problem. As long as the entire visitor population is relatively large, the sample can be proportionally very small and still be representative.
For example, if you have a total of 5,000 visitors on your website in a given period, you only need 357 respondents in order to be 95% confident that your results are accurate to within +/- 5 percentage points. This corresponds to a response rate of only 7% (i.e. less than our average of 8%). If your visitor population increases, the ratio gets even better. Thus, if you have 30,000 visitors on your website, the required sample size is 380, corresponding to a response rate of only about 1%.
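The sample sizes above are consistent with the standard sample-size formula for a proportion, combined with a finite population correction. The following is a minimal sketch under that assumption, using the most conservative case of a 50/50 split in answers (p = 0.5). The function name and parameters are illustrative only and are not taken from any particular survey tool.

```python
import math
from statistics import NormalDist


def required_sample_size(population, margin=0.05, confidence=0.95, p=0.5):
    """Respondents needed to estimate a proportion within +/- margin."""
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an infinitely large population.
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # The finite population correction lowers the requirement
    # when the population itself is small.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


for visitors in (5000, 30000):
    n = required_sample_size(visitors)
    print(f"{visitors} visitors: {n} respondents ({n / visitors:.1%})")
```

For 5,000 and 30,000 visitors this yields 357 and 380 respondents respectively. Because the required sample levels off as the population grows, the required response rate keeps falling for larger sites.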
In most cases the response rate is not the main problem. What is much more problematic is whether or not there is a systematic difference between respondents and other visitors. For a sample to be representative, all members of the population must have an equal chance to be drawn.
This is not the case if certain groups of visitors are more inclined than others to participate in online surveys. In this case, your sample will be biased even if you try to increase the response rate (e.g. by offering an incentive to participate such as a gift or a chance to win a prize). If there is a systematic difference between respondents and non-respondents, an increased response rate will do little more than underscore this difference.
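A small calculation illustrates why a higher response rate does not by itself remove the bias. The numbers below are purely hypothetical: half of the visitors are assumed to be "engaged" and to respond three times as often as the rest. The sketch also assumes that an incentive raises the willingness to respond by the same factor in both groups, which need not hold in practice.

```python
def respondent_share(group_share, p_group, p_rest):
    """Expected share of a group among respondents, given response propensities."""
    responding_group = group_share * p_group
    responding_rest = (1 - group_share) * p_rest
    return responding_group / (responding_group + responding_rest)


# Hypothetical baseline: engaged visitors respond at 12%, others at 4%
# (an overall response rate of 8%).
base = respondent_share(0.5, 0.12, 0.04)

# Hypothetical incentive that doubles the propensity in both groups
# (an overall response rate of 16%).
boosted = respondent_share(0.5, 0.24, 0.08)

print(round(base, 2), round(boosted, 2))
```

In both scenarios the engaged group makes up three quarters of the respondents although it is only half of the visitors. The sample grows, but its composition stays just as skewed.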
A case study
It is normally difficult to compare online survey respondents with other visitors on a website. In most cases, we have no information about visitors who do not respond. At Netminers, however, we have developed an integrated web analytics and online survey tool. This tool enables us not only to see what survey respondents do online, but also how their behavior differs from non-respondents. We are therefore in a unique position to study the sampling bias of online surveys.
The following study compares respondents with all visitors on 12 websites belonging to the same company (a customer of ours who has kindly allowed us to use their data in an anonymous form). The websites are similar in structure and content, but differ in terms of language. A total of 55 surveys were launched across all of these websites. This resulted in 59,957 respondents from a variety of countries, including Denmark, Sweden, Norway, Finland, Holland, Germany, Poland, France, Italy, Spain, United Kingdom and United States.
In order to make the data more comparable, all repeat visitors were filtered out. This brought the base of respondents down to 43,154 and the total population of visitors to 8.6 million. The reason why repeat visitors should be disregarded here is to avoid counting respondents multiple times. Returning respondents are not invited to participate in the survey again and would therefore wrongly be considered non-respondents. This would distort the comparison.
Let us now look at the results. The following charts show that respondents and other visitors do indeed differ in terms of their behavior. The first chart shows the difference in traffic sources. As we can see, respondents tend to enter the site directly, whereas the rest of the visitors more often come from search engines.
This means that respondents are more likely to know the website beforehand and to visit with a particular purpose. They are unlikely to enter “by chance” because a particular search word happened to bring them to the website.
If we look at the next chart, we see another interesting difference, namely that respondents are less likely to “bounce” when they land on the site. A “bounce” is here defined as a single-page visit, whereas “retained” means a visit which views at least two pages. The chart shows a huge difference: whereas the general bounce rate for the website is 52%, it is only 23% for respondents!
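The two bounce rates can be turned into a rough estimate of how much more likely retained visitors are to respond than bounced visitors. The sketch below is a back-of-the-envelope calculation based on Bayes' rule, not part of the original analysis. It treats the site-wide bounce rate as the population figure, which seems reasonable because respondents are only a tiny fraction of all visitors. The function name is illustrative.

```python
def relative_response_propensity(bounce_all, bounce_respondents):
    """Ratio of response propensity: retained visitors vs. bounced visitors."""
    # Odds of retained to bounced among all visitors.
    odds_all = (1 - bounce_all) / bounce_all
    # The same odds among respondents only.
    odds_respondents = (1 - bounce_respondents) / bounce_respondents
    # By Bayes' rule, the ratio of the two odds equals
    # P(respond | retained) / P(respond | bounced).
    return odds_respondents / odds_all


ratio = relative_response_propensity(0.52, 0.23)
print(round(ratio, 1))
```

With a bounce rate of 52% overall and 23% among respondents, retained visitors come out as roughly three and a half times as likely to answer the survey as visitors who bounce.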
Respondents seem to be much more engaged in the website: they both enter directly and delve deeper into the content after arrival. This is underscored by the fact that if we look only at “retained” visitors, respondents view on average 2 pages more per visit. That is to say, retained respondents view 12 pages per visit, whereas all retained visitors view 10 pages per visit.
The level of engagement is certainly higher for respondents than for all visitors. However, respondents also tend to see different content. The next chart shows the difference in exit pages for respondents and all visitors. More specifically, it shows which content section the visitors exited from.
The website in this case study is in the travel business and its two biggest content sections are called “Inspiration” and “Tourist Information”. Both of these sections have an over-representation of respondents. The reverse is true for the rest of the sections, where respondents are under-represented. It is especially noteworthy that the “Online Booking” section which includes the company’s conversion pages has proportionally fewer respondents.
The last chart compares the geography of respondents and all visitors. Again we see considerable differences. In general, the biggest target groups for the website (i.e. the Nordic countries and Germany) tend to have lower response rates than the smaller ones. What is especially interesting is that the UK stands out with an extremely high response rate. For some reason, Britons are much more likely to accept participation in online surveys than visitors from other countries.
Consequences for analyzing engagement
In this post I have shown that online surveys are indeed biased. The most striking difference is that survey respondents tend to be much more engaged than non-respondents: they know the site beforehand, they bounce less often and they see more pages during their visits.
This is perhaps not surprising: the more involved you are in a website, the more of an incentive you have to provide feedback which could lead to improvements. In contrast, if you find the website irrelevant from the beginning, and perhaps bounce as a consequence, you have less of an incentive to answer.
What this means is that online surveys are weak when it comes to measuring or analyzing the causes of engagement (or lack thereof). We cannot simply ask visitors why they do not engage, since these visitors have no intention of answering. We probably cannot even correct this sampling error by weighting the data, since the difference is too big. The bounce rate, for example, is so much lower for respondents that it is doubtful whether bounced respondents and bounced non-respondents are comparable at all.
Consequences for analyzing satisfaction
It could be argued that satisfaction scores are likely to be artificially high among online survey respondents. Given that respondents are more engaged than non-respondents, you might think that they are also more satisfied. This is certainly true if engagement is caused by satisfaction or vice versa.
However, in my view, the relationship between engagement and satisfaction is not that simple. Engagement can be defined as an intensive or sustained focus on something (which is often accompanied by intensive use). This focus is not the same as satisfaction; rather, it is the act of building an experience with the object which eventually leads to an evaluation. If the evaluation turns out positive, the person is likely to continue being engaged, whereas if it turns out negative, he or she is likely to stop. This is why it is sometimes possible to observe a correlation between satisfaction and engagement (measured by use intensity) over longer periods of time.
However, in the short term, such as during a single visit, engagement and satisfaction are not correlated. They are only related in the sense that satisfaction presupposes engagement. As such, it could be argued that respondents who do not engage at all (e.g. those who bounce) should be disregarded entirely when calculating satisfaction scores. Given that such respondents have almost no experience with the website, their “evaluation” of it must be considered unreliable. By the same token, it could be argued that highly engaged respondents who still express dissatisfaction should be given more weight insofar as their evaluations are more reliable.
(See also the discussion of engagement in “Measuring Online Engagement: What Role Does Web Analytics Play?” and “Responding to Geertz, Papadakis and others.”)
If the above argument is true, then online surveys are not weak when it comes to analyzing the causes of satisfaction with a website. By comparing the page views of satisfied and dissatisfied respondents it becomes possible to identify those areas of the website which tend to cause this satisfaction / dissatisfaction. It is less important to correct for sampling error here because those visitors who respond to online surveys are exactly the most reliable ones.
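As a rough illustration of such a comparison, the sketch below computes, for each content section, the share of dissatisfied respondents among those who viewed it. The section names are borrowed from the case study, but the respondent records and the data layout (a list of viewed sections plus a satisfied/dissatisfied flag) are entirely hypothetical.

```python
from collections import defaultdict


def dissatisfaction_by_section(respondents):
    """Share of dissatisfied respondents among those who viewed each section."""
    viewed = defaultdict(int)
    dissatisfied = defaultdict(int)
    for sections, satisfied in respondents:
        # Count each section once per respondent, however often it was viewed.
        for section in set(sections):
            viewed[section] += 1
            if not satisfied:
                dissatisfied[section] += 1
    return {section: dissatisfied[section] / viewed[section] for section in viewed}


# Hypothetical respondents: (sections viewed, satisfied?)
respondents = [
    (["Inspiration", "Online Booking"], False),
    (["Inspiration"], True),
    (["Tourist Information", "Online Booking"], False),
    (["Tourist Information"], True),
    (["Inspiration", "Tourist Information"], True),
    (["Online Booking"], True),
]

rates = dissatisfaction_by_section(respondents)
for section, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{section}: {rate:.0%} dissatisfied")
```

A section with a markedly higher share of dissatisfied viewers is a candidate for closer inspection. It is not proof of a cause, since dissatisfied visitors may simply be drawn to certain pages.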
Still, it might be relevant to weight data under certain circumstances. If your aim is to measure the overall satisfaction as accurately as possible (rather than analyzing the causes of satisfaction), you need to make sure that respondents are exposed to more or less the same content as other visitors. As I have demonstrated in this post, this is far from always the case. If possible, you should therefore apply weights to those respondents who have visited areas where respondents are generally under-represented.
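One common way to apply such weights is post-stratification: each respondent is weighted by the ratio between the section's share among all visitors and its share among respondents. The sketch below assumes this technique and uses made-up shares and satisfaction scores on a 1 to 5 scale. Only the section names come from the case study.

```python
def poststratify(population_share, respondent_share, mean_score):
    """Return per-section weights and the weighted overall mean score."""
    # Under-represented sections get weights above 1, over-represented below 1.
    weights = {s: population_share[s] / respondent_share[s] for s in population_share}
    weighted_total = sum(respondent_share[s] * weights[s] * mean_score[s] for s in weights)
    weight_sum = sum(respondent_share[s] * weights[s] for s in weights)
    return weights, weighted_total / weight_sum


# Hypothetical shares of exit sections, from web analytics and from the survey.
all_visitors = {"Inspiration": 0.40, "Tourist Information": 0.35, "Online Booking": 0.25}
respondents = {"Inspiration": 0.45, "Tourist Information": 0.40, "Online Booking": 0.15}
# Hypothetical mean satisfaction per section among respondents.
satisfaction = {"Inspiration": 4.0, "Tourist Information": 4.0, "Online Booking": 3.0}

unweighted = sum(respondents[s] * satisfaction[s] for s in respondents)
weights, weighted = poststratify(all_visitors, respondents, satisfaction)
print(round(unweighted, 2), round(weighted, 2))
```

In this made-up example the overall score drops slightly after weighting, because the under-represented booking section is also the least satisfied one. Weighting of this kind only corrects for differences in content exposure; it does not make respondents and non-respondents within a section comparable.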
Consequences for analyzing demography
Finally, an important reason to weight your survey data is that respondents tend to differ in terms of demography. In this post I have shown considerable geographical differences between respondents and non-respondents. These differences are likely to skew other, underlying demographic data. It is probably always a good idea to correct for this type of sampling error. However, if your aim is to analyze the demography of your visitors, it becomes imperative.
Did you find this post helpful? Do you have experience with online surveys yourself? Perhaps you have tried to integrate web analytics and online surveys? Please share your thoughts or experience by leaving a comment!






