026: Is bot traffic clouding your analytics and wasting your money?

The nonprofit sector increasingly depends on web analytics to measure impact outputs. However, how do we know that the numbers we see are accurate, even when we pay for them through online advertising?

Dr. Augustine Fou, Group Chief Digital Officer and ad fraud researcher, studies bot activity and believes this issue will continue to plague the industry. He also warns that “Nonprofit advertisers are some of the most vulnerable to ad fraud as they frequently receive remnant ad inventory with higher levels of fraud.” In the coming years, nonprofits need to “beware of vague reporting of where ads are being served, and focus on clear user actions taken on their sites instead of impressions and clicks. Always test and verify traffic, remember that bots don’t donate or volunteer, that’s what you should be measuring against ad spends.” In 2014, bot traffic accounted for about 56% of all online activity, and 29% was from the ‘bad bots’ that Dr. Augustine studies.


Episode 26

Speaker 1: This is Using the Whole Whale, a podcast updating you on data and technology in the nonprofit world. My name’s George Weiner, your host and the Chief Whaler of wholewhale.com. Thanks for joining us.

Something very interesting happened in 2014. For the first time in 65 years, the Turing Test was beaten. This means a machine was able to convince a panel of human judges that it was in fact human. This test was originally designed by Alan Turing to decide whether or not we should consider a bot intelligent.

This is an interesting sign of the times, and it’s fascinating to me to keep an eye on this because we base so much, and definitely the work that I do, around web traffic. I mean, we’re setting impact goals; we’re measuring how well we’re doing based on web traffic.

A report that came out in 2014 by Incapsula showed that 56% of all Internet activity is actually driven by bots. Now only a third of that is by what they consider bad bots, the spammers and, you know, the nefarious folks. So a lot of the bot traffic is in fact good, the Google spiders and Facebook crawling and things that are designed to hopefully help us.

I tracked down an expert in ad fraud and bot traffic to talk to us today. We’re talking with Dr. Augustine Fou, who’s made it his business to understand what exactly is going on with regard to ad fraud, which ties into the impacts of bot traffic. He spends a lot of time working with not-for-profits in this sector and he was nice enough to spend some time with us today.

Let’s jump into this interview and find out exactly what’s going on here.

Speaker 1: Augustine, thank you for taking the time to talk with us today. Can you give us a little bit about your background, who you are, and what issues you care about?

Speaker 2: Sure. I’ve been doing digital work for about the last 20 years or so and I’ve really been focused on the kind of technical aspects and the strategy aspects. So right now my day job is to really help clients do better in digital marketing, and one of the things that I’ve really focused in on recently is the topic of ad fraud. Because the most direct path to increasing ROI is to help them cut out all the wasted ads that are due to fraud.

You can imagine an ad shown to bots or not humans will never convert into a customer sale. So it’s really that basic, it’s how to cut out all the waste due to fraud.

Speaker 1: Interesting. So tell me a little bit more about how you began down the road of ad fraud.

Speaker 2: Well ad fraud, you know, there’s fraud in every imaginable industry. But in digital advertising the last couple of years more and more things have gone programmatic, which means, you know, in the old days we used to have media buyers go through and optimize and pick where to place the media. More and more that is being automated by computer algorithms. And furthermore there are hundreds of thousands, if not millions, of sites that carry advertising, so it’s practically impossible to do by hand anymore.

So basically because of being automated, they’ve made it a lot easier for the bad guys to commit fraud. And just to kind of tie to a couple of types of online advertising, an example would be banner ads or display ads and video ads. Those are sold on a CPM basis, which means the advertiser pays for every thousand times the ad is displayed whether they get any action or not. So in that case all the bad guy has to do is load the web page with the bot and cause the ad to load. And in fact they can load hundreds of ads at a time that way. So now that it’s more automated and now that the bad guys are using algorithms to do it they are basically generating enormous quantities of ad impressions and that’s how they steal the ad dollars. So a lot of those basically if you’re looking at it from the advertiser’s perspective, all that’s wasted ad spend. So the key would be to cut out all of that fraud and waste.
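The CPM arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the bot share is an input you would have to estimate from your own data, not a figure from the interview.

```python
def cpm_cost(impressions: int, cpm_dollars: float) -> float:
    """Cost of a CPM buy: the advertiser pays per thousand ad impressions."""
    return impressions / 1000 * cpm_dollars


def wasted_spend(impressions: int, cpm_dollars: float, bot_share: float) -> float:
    """Portion of the spend that paid for impressions loaded by bots.

    bot_share is an assumed fraction between 0 and 1 (hypothetical input).
    """
    if not 0.0 <= bot_share <= 1.0:
        raise ValueError("bot_share must be between 0 and 1")
    return cpm_cost(impressions, cpm_dollars) * bot_share


if __name__ == "__main__":
    # Hypothetical buy: 200,000 impressions at a $0.50 CPM, a quarter loaded by bots.
    print(cpm_cost(200_000, 0.50))
    print(wasted_spend(200_000, 0.50, 0.25))
```

Because the advertiser pays whether or not anyone acts on the ad, every bot-loaded impression in this model is pure cost.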

Speaker 1: Wow. So you’re telling me these bad guys don’t even have the common decency to rip me off themselves by clicking, they have their robots do their dirty work?

Speaker 2: Absolutely. Previously when they did have click farms and human clicking, that was not nearly scalable enough for them. So now they’re basically using malware-compromised computers where they have programs running in the background to load pages and click on ads and things like that, so it’s all fully automated. And furthermore, beyond just malware-compromised PCs, they’re even spinning up virtual browsers in data centers. And the idea there is that a lot of these browsers are originally meant to test web pages or test mobile apps before the developer launches them, but because they can do most things like scroll up and down the page and click on things, and so on and so forth, they can be used for these malicious purposes as well.

Speaker 1: Boy, that’s pretty evil. So just to make sure I understand: I have a hundred dollars and I’m going to go off into the online advertising world. I spend a hundred of these dollars. What happens when some of my display ads show up in front of these bots? Can’t I just tell it’s a robot that clicked on the ad when it comes back to my website?

Speaker 2: It’s not that easy because the bad guys don’t honestly declare themselves. So if you’re thinking search engine crawlers, those are essentially algorithms that load a page and then look at the content on the page. Those search engine crawlers, like from Google or Yahoo or Bing, they declare themselves, “I’m a search engine crawler,” for example Googlebot 2.1. So you see that in what is called the user agent, which is basically the browser telling you which browser it is.

However the bad guys do everything in their power to not disclose that, right, not tell you honestly they are the bad guy, right, so they’re going to look like the browser that comes from an iPhone, they’re gonna look like a Firefox browser, they’re going to look like an Internet Explorer browser. So when they click through and they arrive on your site there’s no easy way to tell that they’re a bot because they appear in your analytics to be Internet Explorer. So that’s where some of our techniques and our technologies have to come into play to kind of triangulate that they are a bad guy and therefore be able to detect and mitigate that over time.
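To make the user agent point concrete, here is a minimal Python sketch of a declared-bot check. The pattern list is a tiny illustrative sample chosen for this example, not a complete or authoritative list, and the function name is hypothetical.

```python
import re

# Illustrative sample of tokens that honest crawlers put in their user agent.
# Real bot lists are far longer; this short pattern is an assumption.
DECLARED_BOT_PATTERN = re.compile(
    r"googlebot|bingbot|msnbot|facebookexternalhit", re.IGNORECASE
)


def is_declared_bot(user_agent: str) -> bool:
    """Return True only when the user agent openly identifies itself as a bot."""
    return bool(DECLARED_BOT_PATTERN.search(user_agent))


if __name__ == "__main__":
    honest = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    # A fraud bot can send exactly the same string a real browser would send.
    disguised = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"
    print(is_declared_bot(honest))
    print(is_declared_bot(disguised))
```

The check catches the crawler that declares itself and passes the disguised visitor straight through, which is the limitation described above: a user agent match can only filter out the honest bots.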

Speaker 1: Gotcha. So they trick the user agent and they say “Hey trust me, I’m a person clicking around your website.”

Speaker 2: Exactly.

Speaker 1: And we can’t really tell the good from the bad at that point.

Speaker 2: Yeah. It’s harder, it’s not that easy, it’s not like you know they tell you they’re a bad guy, right, so they just hide themselves very well.

Speaker 1: Gotcha. How, I have to ask, you know this sounds pretty terrible to somebody who’s trying to spend money online and especially as we’ll get into the nonprofit world, but I mean are we talking about one or two percent of my you know ads are going to this sort of ad fraud or how big is this issue?

Speaker 2: Well based on our first hand research you know with our own data plus all the research that everyone else has done, you know the anti-fraud vendors that have sprung up in recent years, the ad networks themselves are putting measures in place, so the estimates naturally vary all over the place to the point that they’re almost unbelievable because some say it’s so low like single digit, one, two, three percent, others say it’s so high like it’s 80%, 99% I’ve seen, and so on and so forth. And when you really pick through it you know you see kind of a vested interest being revealed, right.

So the networks have an incentive to say don’t worry it’s all clean on our network, so you tend to see estimates in the one to three percent range. Whereas the security vendors in order to sell their services they have every incentive to say oh, well it’s 60%, 80% and even 99% so that they get business. So that’s why you really can’t trust any numbers that are published out there, you really have to look at it for example at your own data and be able to detect it that way. So again the estimates vary so widely they are almost unbelievable.

Speaker 1: Yeah, it’s hard, I guess, to get your head around. But there were reports in 2014 from some major leaders. I don’t know if you know some of those stats just off the top of your head to give some benchmarks that, you know, people have picked out.

Speaker 2: Now the most widely quoted number is a third and I think Wall Street Journal said it was a third, and based on logic data I can corroborate that you know there is a lot of traffic going around the Internet that is not human. We have so many crawlers and so many bots, and whether it’s good or bad, a third of traffic on the Internet is not human, right. It’s created by website crawlers, search engine crawlers. Some are bad guys, you know, and so on and so forth. So about a third would be a good you know rule of thumb. So for every hundred dollars you spend you’re probably losing about $35 or so.

Speaker 1: Man, that’s crazy. But you know, I think lumping together, thank you for being careful about some of these numbers. You know, the third of all traffic, you know, the Wall Street Journal came out with it, IAB came out with it, but they’re bundling this together and I’m sure there are some, you know, really bad advertising routes to take. I mean, are we saying that $100 spent on Facebook, on Google, on all of these things, each one of those platforms is fake, or are there some areas of advertising that are just, yikes?

Speaker 2: No, that, yeah, that’s an excellent point. So, basically, we’re talking about the super long-tail websites. So the whole point of ad networks is that, you know, when advertisers started moving beyond the mainstream sites like espn.com or something like that and they started looking for smaller and smaller sites on which they could put their ads, there became this enormously long tail of super tiny websites that in and of themselves didn’t have enough inventory to try to sell to these big advertisers. So that gave rise to the ad networks, which would aggregate hundreds of thousands of these super tiny websites and say, “Oh, when we put them all together we can do an ad deal media buy for you.” And basically that’s what led to kind of the opportunity for fraud, because all these super tiny websites are ones that no human would really ever go to, and in fact when you look at some of them they either have no content or they have kind of algorithmically generated content that when you try to read it it’s unreadable.

But the purpose of those sites is actually to carry a whole ton of ads, right. So you have these sites and they get added into the ad network, and then the traffic to those sites is generated by bots, right? So every page load that loads a hundred ad impressions is generated by a bot. And that ad impression inventory is then sold to the ad exchanges and the ad networks, and then that’s what gets bought and sold by the big advertisers.

So in general, if you stay away from, you know, those super long-tail websites, and especially if you’re a small business or a nonprofit, literally stick with AdWords, literally stick with Facebook ads or YouTube for video ads. Those mainstream guys and mainstream sites are not where the fraud is happening. So that’s a great easy way to say, no, if I really want to make sure that my ad dollars are not wasted on fraud, don’t go for the long-tail stuff, don’t buy traffic, don’t do all these partially shady kinds of things. And then, for the most part, you’ll be okay.

Speaker 1: Yeah, I think that’s, you know, a generally good rule to live by there: be careful what you pay for, and if the deal seems too good to be true, it probably is.

Speaker 2: Exactly. When you see thirty-cent CPMs, something’s wrong with that, right. There’s a reason mainstream sites like ESPN have to charge tens of dollars in CPMs, you know, because they actually have good content that humans actually want to go read, you know, so you really get what you pay for. And I guess in recent years it’s kind of the search for more and more and more and more impressions that led big advertisers to say, “Oh okay, let’s go buy some of this lower cost inventory because that might expand our reach.” And a lot of that’s coming out of reach and frequency, and that’s a mindset that’s brought in from TV advertising, right, so we want to show our message to more people, and so that’s when they try to buy more impressions and, “Oh, by the way, if we’re buying these impressions at lower cost we’re going to get more impressions at lower average cost,” and that’s how they kind of justify it: “Oh well, I’m getting a better ROI because I got more impressions at lower average cost.” But what they’re doing is mixing in this really, really dirty inventory.

Speaker 1: Man it sounds like junk bonds all over again.

Speaker 2: Exactly. This is the digital advertising version of that and, you know we’re coming very close to the total collapse of that and if it doesn’t happen then digital advertising is going to lose credibility and all the dollars that big advertisers want to spend in digital are going to go somewhere else. So it’s really important for the industry, the digital advertising industry to get this cleaned up as soon as possible.

Speaker 1: Yeah. So, on that topic, what is being done by the industry? Like who are the good guys, who are the players out there?

Speaker 2: So, you know, long story short, there’s no single solution that’s gonna work, so let me give you an example of what the industry associations are doing, and it’s all parts of the puzzle. So the industry trade groups are really, really great at setting standards and things like, you know, what is the viewability of an ad, like is that ad above the fold and therefore viewable by a human, or is the ad so far down the page that it never actually gets viewed. And so they’re setting standards around things like viewability, and now some companies are moving towards, okay, we want to make sure we buy only viewable ads, that kind of stuff. So standard setting, best practices, education, those are things that industry associations can do for the industry.

Then there are some other good guys like the anti-fraud inspectors, and there are many different types, but I’ll explain two of them. One would be an example where they’re in the ad network and they’re trying to detect in real time, you know, whether it’s a quality bid and whether they should place an ad with it, and those would be companies like DoubleVerify or Integral Ad Science. There are others that work on the website, so those would be like White Ops, Pixalate, Forensiq, and in that case what they’re doing is detecting all the traffic that comes through the website.

So there are different approaches to detecting, and then they have their mitigation methodologies afterward. So there’s a bunch of good guys that are working on it, but the reason it’s still a very big challenge is that the bad guys don’t have to play by any rules, right? They can cheat however they want; there are no rules. And furthermore, they’re probably sitting in their pajamas in Russia somewhere, or somewhere else that’s out of the jurisdiction of the good guys, so they’re almost like committing this fraud with impunity, and that’s really why it’s such a challenging problem for the good guys to solve.

Speaker 1: Yeah. I can see that, and also anytime they publish a solution it’s like whack-a-mole.

Speaker 2: Exactly. So once the bad guys know what you’re looking for, they can adjust their algorithm so that they can better hide. So literally, for instance, I published a piece on how humans sleep at night and fraud bots don’t, and we could start to see literally within weeks some of their traffic coming in at night start to go down and the traffic coming in during kind of waking hours start to go up. So they’re able to adjust their bounce rates, as well as how many pages they view on the websites; all of those things are variables that the bad guys can compute in their algorithms.
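The "humans sleep at night" signal can be sketched as a simple share-of-traffic calculation in Python. This is an illustration under stated assumptions: the function name is hypothetical, the input is assumed to be each visit's hour in the visitor's local time (0 to 23), and the 1 a.m. to 5 a.m. window is an arbitrary choice for this example.

```python
from collections import Counter
from typing import Iterable


def night_share(local_hours: Iterable[int], night_start: int = 1, night_end: int = 5) -> float:
    """Fraction of visits that arrived during the assumed night window.

    local_hours holds one hour-of-day value (0-23) per visit, in the
    visitor's local time. The window covers night_start up to, but not
    including, night_end.
    """
    counts = Counter(local_hours)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    night = sum(n for hour, n in counts.items() if night_start <= hour < night_end)
    return night / total


if __name__ == "__main__":
    # Hypothetical traffic source: half of its visits arrive in the small hours.
    print(night_share([2, 3, 14, 15]))
```

A source whose night share is far above that of your known human traffic deserves a closer look, but as noted above, fraud operators shifted their traffic within weeks of this signal being published, so it cannot be relied on alone.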

Speaker 1: Man, bad guys suck. All right, all right. So let’s change the conversation a bit to not the general industry, but what can nonprofit organizations do to protect the credibility of their data? We already talked about, you know, the Incapsula report, for instance, that says 56% of the internet’s traffic is by bots and about 30%, 29% or thereabouts, are evil bots, like things that are just causing chaos. So how do we protect ourselves?

Speaker 2: So, in general, it’s not about protecting yourself. Like, let’s say sites like ESPN, abc.com, CNN, all those guys: there’s still going to be a certain amount of bot traffic that goes to the website no matter what, and that’s because some of those are search engines that are indexing the content on their pages and that kind of stuff. Others would be, you know, when you post a page on Facebook, it has a little bot come to your site and grab the images so that when you post, it adds an image and a little bit of text to go along with it. So those are not necessarily bad bots, but those are bots nonetheless. So when those things hit the page or grab an asset from your website, those shouldn’t count towards traffic, right? So, you know, instead of thinking about kind of protecting yourself from bots, accept that they’re always going to come.

Nonprofits are a little different from advertisers because they may or may not be spending a lot of ad dollars; some do, but not all of them do, right. So the general advice, kind of the best practice that I tell nonprofits, is to really focus in on human-like actions, right. So I always like to say fraud bots don’t volunteer for public service, okay, or fraud bots don’t make donations. So those are things that typically only a human would do, so they should focus in on those kinds of things, right: the actions, the desirable actions that they expect that humans would do, instead of focusing on how much traffic they’re getting to their website. That’s going to be a much better way of thinking in terms of “How do we optimize these campaigns?” or “How do we optimize for these human actions?”, right. So again it ties a little back to what advertisers face; it’s kind of a mindset shift they have to go through, which is instead of asking for more and more reach and frequency in terms of more ad impressions, let’s now focus on those that actually convert, those that actually turn into some kind of desirable action. So for advertisers it’s a person actually buying something from them; for nonprofits it’ll be someone signing up to volunteer or someone making a donation through the website. So those are the kinds of things that they should be focused more on rather than the inbound traffic or the inbound kind of impressions that they got.
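Measuring against human actions rather than traffic can be sketched as a cost-per-action calculation in Python. The function name and the two campaigns below are invented for illustration and do not come from the interview.

```python
from typing import Optional


def cost_per_action(ad_spend: float, actions: int) -> Optional[float]:
    """Ad spend divided by the number of actions (clicks, donations, signups).

    Returns None when there were no actions, since the cost is undefined.
    """
    if actions <= 0:
        return None
    return ad_spend / actions


if __name__ == "__main__":
    # Two hypothetical $100 campaigns: (name, clicks, donations).
    campaigns = [("cheap long-tail traffic", 2000, 2), ("mainstream placement", 200, 10)]
    for name, clicks, donations in campaigns:
        print(name, cost_per_action(100.0, clicks), cost_per_action(100.0, donations))
```

In this made-up comparison the first campaign looks ten times cheaper per click, but the second is five times cheaper per donation, which is the number that reflects what humans actually did.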

Speaker 1: I love that, the fraud bots don’t donate. And by the way, if they start donating then, you know, I don’t think they’re that evil anymore.

Speaker 2: Oh, okay, yes.

Speaker 1: Good bot, nice job.

Speaker 1: So Google Analytics as you kind of mentioned there, Google Analytics is filtering out the general functional bots, the things that are there to scrape your site from Google, things from Facebook. Google is already cleaning those, but there’s another level that you’re suggesting that we have to do which is set up the tracking for human determined actions as best we can tell on our site.

Speaker 2: Yeah. So the Google Analytics stuff, what they’ve done is obviously they have the list of known bots from IAB, right, that encompasses both good and bad, and when you have that show up, like I mentioned, Googlebot will say, “I’m Googlebot and I’m coming to crawl your site so that we can index pages for the search engine,” right? So Bingbot, MSNBot, Facebook, if you have those known bots, that’s what Google can filter out, and that’s actually the easy part. Those are the honest bots that actually declare themselves in their user agents. So we know the Googlebot when it shows up on our website.

The hard part, like I mentioned before, is that the bad guys don’t declare themselves honestly, and they’re doing everything in their power to disguise themselves. So I’ll give you an example of how we can look for it in the data, right. So normally a human would access a website through a standard browser, and most of those browsers have some kind of plugins like Flash or Silverlight or whatever, so if we see a user go to the site and there’s no plugin, that’s something that’s suspicious. But that one factor alone won’t allow us to declare them to be bots, right. We just know there’s something suspicious because humans usually come to sites with a browser that has plugins. The other would be things like if the window resolution or screen resolution is an odd resolution like 10 x 10 or 100 x 100, right? Normally a laptop screen would be 1366 x 768, right, so again that is something that is suspicious. So these are the kinds of things we look for. Obviously I won’t go into too much more detail, otherwise...

Speaker 1: Don’t give away the secrets man, don’t tell them!

Speaker 2: Ha-ha, no the idea would be to look for the anomalies, and things that are out of pattern, or strange that is kind of unusual for a human visitor. And then you know that’s only one part of it, right, so then you later look at if they end up buying something if you’re an ecommerce site, did that visitor end up signing up and volunteer for public service or something like that. So, again not one thing, so any item or any parameter alone will not be enough to usually determine that they’re a bot, but those are the kinds of things that we look for.
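The idea of combining several weak signals, with no single one being decisive, can be sketched in Python as follows. Everything here is an assumption made for illustration: the `Visit` fields, the 200-pixel cutoff, and the two-signal threshold are hypothetical choices, not Dr. Fou's actual method, which he deliberately does not describe in detail.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Visit:
    """One visit as it might appear in exported analytics data (hypothetical schema)."""
    plugin_count: int    # number of browser plugins reported
    screen_width: int    # reported screen width in pixels
    screen_height: int   # reported screen height in pixels
    converted: bool      # donated, volunteered, or bought something


# Assumed cutoff: a screen smaller than this in either dimension counts as odd.
MIN_PLAUSIBLE_SCREEN_PX = 200


def suspicion_signals(visit: Visit) -> List[str]:
    """List the anomalies found on a visit; each one is only a hint."""
    signals = []
    if visit.plugin_count == 0:
        signals.append("no_plugins")
    if min(visit.screen_width, visit.screen_height) < MIN_PLAUSIBLE_SCREEN_PX:
        signals.append("odd_screen_size")
    return signals


def looks_suspicious(visit: Visit, min_signals: int = 2) -> bool:
    """Flag a visit only when several signals agree and no human action followed."""
    if visit.converted:
        return False
    return len(suspicion_signals(visit)) >= min_signals


if __name__ == "__main__":
    print(looks_suspicious(Visit(0, 10, 10, False)))
    print(looks_suspicious(Visit(0, 1366, 768, False)))
```

Requiring at least two signals mirrors the point made above that a missing plugin alone is not enough to call a visitor a bot, and treating a conversion as exculpatory follows the "bots don't donate" rule of thumb.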

Speaker 1: Yeah, so I guess going back into the broader sector, what makes me nervous here, and you know I joke about the junk bonds, but we have a lot of the same ingredients: you’ve got a market here of certain ad networks that aren’t really incentivized to cut out, you know, a third of their business; you’ve got those folks on the fraud side able to leverage and do a lot of damage thanks to cloud computing and the decreasing cost of that; and you’ve got now an influx of money in the digital world. You know, when are we going to hit this iceberg?

Speaker 2: Yeah. I actually think we’re going to hit the iceberg very, very soon, and it’s because of a couple of deep trends and one of them is the kind of explosion of video advertising.

So in the pre-2014, actually in the full 2013 results (we’re still waiting for the full year 2014), in the 2013 numbers we saw a dramatic increase in the number of video ads served, literally four times the amount of the prior year. So there’s clearly a lot of demand for video advertising, and a lot of that demand is coming from advertisers that have historically put their ads on TV. So now, as they’re looking to shift dollars into digital, that’s a big huge bucket of dollars that are shifting into digital. But the problem is that in terms of video ads, the CPMs are ten to twenty times higher than banner ads, and so it’s also been a prime target for the bad guys, who just have to generate the video ad impressions. So that’s kind of a big trend that is happening, and like I said earlier, if we in the digital advertising industry don’t clean this up sooner rather than later, all of those big huge dollars coming in from TV are basically going to either go on pause and not shift in, or like literally not come in at all, so that’s what’s at stake here.

You know, the other thing is around education. So as long as advertisers and their media agencies are basically using the number of impressions, i.e., the tonnage, as a metric or KPI, these problems will persist, because if they’re out to buy more ad impressions, the bad guys are more than happy to generate as many ad impressions as they could possibly buy, because they’ve got an army of bots that can just generate as many ad impressions as you want. So as long as that mentality continues to hold, then this problem will continue to persist.

Once advertisers and their media agencies start to shift their thinking and say, let’s focus in on the desirable human actions, like we said for nonprofits or ecommerce or whatever, and be less focused on the tonnage of impressions, that’s going to be how we make the change: focusing in on, you know, things that actually drive human buyers to convert into customers rather than these bots generating tons and tons of ad impressions.

Speaker 1: Yeah. We’re going to have to shift away from this ‘just get me the eyeballs’ to let me get the actions.

Speaker 2: Yeah, yeah. And that kind of stuff takes time, so in the meantime we have the good guys like the industry associations and we have the anti-fraud vendors helping out, but again this is a longer term kind of thing where we do need to change our way of thinking to focus more on what we’d generally call performance marketing, right. Even in the case of a branding advertiser, if they focus on what performance means for them, then that’s going to be a lot better way of thinking about it than simply buying more and more ad impressions and increasing reach and frequency.

Speaker 1: Yeah. I love this and if you extend this to the nonprofit side you know the same kind of I’m a foundation and I’m supporting people that get more online awareness I’m going to be more and more focused on the types of behavior rather than raw traffic.

Speaker 2: Exactly. And that comes down to really basic stuff: instead of sending the report where the very first thing in the report is the amount of traffic to your website, what if you actually started to focus in on how many people actually made a donation or, you know, how many people actually signed up for service or something like that? Those would be the more important things, so literally in the report that you send back to the executive committee, or whatever, if you just change that around, that can help us start the process of the shift.

Speaker 1: Well, I think there’s a lot to think about here. Thank you so much for sharing your knowledge and your time with us today. Augustine, how do people find you, how do people, you know help you or get in touch?

Speaker 2: Augustine Fou, you can Google me, so August plus i-n-e, last name F-o-u. I do a lot of writing on LinkedIn and on SlideShare, and look forward to connecting.

Speaker 1: Well again, thank you so much for your time and we will be aware of the bots now.

Speaker 2: All right. Thank you very much.

Speaker 1: There’s a lot to think about here. There are ingredients for destruction we can kind of almost clearly see. You’ve got a proliferation of bots, you’ve got revenue models here that kind of depend on some of this fake traffic, and it’s not hard to imagine a lot of this kind of crashing down. But why I care about this topic for the not for profit sector is I want us to be ahead of the curve. I want us to start to understand and ask the hard questions of are we sure that our internet traffic, the things that we base our impact on, is in fact real, and if we’re spending dollars let’s make sure we’re not wasting on fake clicks that simply make vanity metrics go up.

The key takeaway just to remember is that bots don’t donate and if they do and if you’re making revenue from that well, okay, that’s great, but follow that thinking when you’re doing your analysis, and make sure that you have the proper tracking set up in place.

One of the things that Whole Whale has been seeing with our clients when we turn on the bot filtering that Google Analytics offers is that it’s stopping a small percent, probably less than one percent, of the overall traffic. We recommend creating a back-up account and making sure your main account, where you’re driving insight, is filtered as best you can for bot traffic. This is going to be an ever-changing landscape, but if you’re looking for more resources for this and also Dr. Augustine’s work, we, as always, have our resources on wholewhale.com/podcast, where you can find this podcast with a bunch of resources to help you fight those bots. Good luck out there!