Shahin Pirooz, CTO, DataEndure: Today’s topic is a fun one. We’re talking about how bots are taking over the internet, and how most of the traffic on the internet is no longer human traffic. More than half of it is generated in an automated fashion with bots and other sources. That shift is really the topic of today, and I hope you get excited like we are, and a little scared about what’s coming. But let’s jump right into this, and, David, if you’d start with a little introduction and your background, that’d be great.
David Holmes, CTO of Application Security, Imperva: Sure. As Shahin said, I’m David Holmes, CTO for Imperva, which is an application security company, so think of like protecting web applications and websites from malicious attacks. And I’ve been in that business for a long, long time.
I did a stint as a Forrester Research analyst for just under five years up until last year. And that was a very exciting and interesting job. I spent about half of my time talking with what we called end user clients, which is, you know, people who buy stuff or looking for advice on, “Hey, I’m trying to solve a problem. How do I solve this problem?” And then the other half of time talking to vendors who were selling solutions. So it was really interesting being trusted by both sides, as being like an arbiter. But now that I’m back in the application security world, it’s a frightening place to be, Shahin.
Shahin: It sure is, and it doesn’t stop being one. It keeps getting more and more so as the bad actors are doing the same things we do. They’re starting to leverage AI to accelerate what they can do, do it faster, do it more efficiently. You know, what started us down this conversation was the report that you guys put out, which was around the types of traffic that we’re seeing on the internet, and–
David: We put out a report every year called the Bad Bot Report, and it is full of statistics and narratives from data that we’ve gathered from our anti-bot service. And in this most recent report, we see tens of trillions of internet transactions per day, and we can tell which ones are automated, right? That’s like the whole purpose of the service is to see which ones are automated, so you can then decide what you wanna do about those.
And we noticed that for the first time since we’ve been doing this report, 51% of the traffic that we saw was generated by some kind of scripting or automation, right? It sort of means that the humans are now the minority on the internet.
Shahin: When we started talking about this, I jokingly said that the internet is not for humans anymore.
David: Right. And, maybe we’ll all look back, like remember those glory days when it was actually humans on the internet? Have you ever heard of that theory called the dead internet theory? For anybody who doesn’t know, the dead internet theory is this sort of theory that almost everything that you see on the internet is generated by some bot or something, or some kind of automated… And it’s really just this ghost town inhabited by robots that you might be stumbling through thinking that everyone you’re talking to is human. I don’t personally believe that yet, but I think–
Shahin: We’re halfway there.
David: Yeah. It’s showing that we’re getting there.
Shahin: Yep, exactly. So let’s take a minute and tease apart that notion of good bot versus bad bot. Obviously good bots are search engines, monitoring tools, things that are intended to collect information to help businesses do things better. But the challenge is that they coexist with the bad bots, which are things like credential stuffers and scrapers and scalpers and so on and so forth.
Are you able to tell what percentage are which?
David: Yeah. As a matter of fact, in that same report, we published the overall amount of traffic that we saw that we could attribute to malicious automation. So bad bots. And it was 37% of all the internet traffic. So more than half of the half of the internet traffic was malicious. So, there is a lot of positive automation out there, web crawlers, search engines. Those are like the two main ones. Actually, there’s some new ones coming out.
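As a rough illustration of the arithmetic behind those two figures, the short Python sketch below works out what share of traffic would be good bots and what fraction of automated traffic is malicious. It assumes every automated request falls into exactly one of the two buckets, good or bad, which is a simplification made here and not something the report is quoted as saying. The function and variable names are illustrative.

```python
# Figures quoted in the conversation, as shares of all observed traffic.
AUTOMATED_SHARE = 0.51  # all automated (bot) traffic
BAD_BOT_SHARE = 0.37    # malicious automation


def split_bot_traffic(automated=AUTOMATED_SHARE, bad=BAD_BOT_SHARE):
    """Return (good_bot_share, bad_fraction_of_automated).

    Assumption: automated traffic is either a good bot or a bad bot,
    so the good-bot share is simply the remainder.
    """
    good = automated - bad
    bad_fraction = bad / automated
    return good, bad_fraction


good_share, bad_fraction = split_bot_traffic()
print(f"good bots: {good_share:.0%} of all traffic")
print(f"bad bots: {bad_fraction:.1%} of automated traffic")
```

Under that assumption, good bots come to roughly 14% of all traffic, and a bit under three quarters of automated traffic is malicious, which is consistent with "more than half of the half."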
Somebody was telling me about the Amazon Buy Bot, which apparently you can give it your credit card number and say, “Go find me the lowest priced thing anywhere on the internet and then buy it for me.” Which also seems terrifying because there’s a lot of agency that you’re putting into this automation. So that would be another example of a good bot.
But the reason there’s so much more malicious automation out there is they’re all business cases for bad actors. And there’s some that you’ve heard about. Like, so for example, if Taylor Swift does a concert or whatever and they release tickets, almost all the tickets are bought by bots, right? And one could almost say, “Is that really a bad bot? It’s legal, right?”
They paid for the ticket. They just happen to use automation. But a lot of people end up feeling hurt that they weren’t able to get the tickets. And of course, those tickets end up getting resold on the aftermarket for way more.
There’s things like that, or there’s account takeover, which is any website that has accounts on it, there are attackers out there who will simply try every known username and password they’ve ever seen against your website. Not even necessarily to attack you. They might just be trying to validate the list. Are these passwords still good anywhere, right? But that ends up sucking up a lot of your compute time and causing all kinds of havoc.
Or, any sort of retailer who has any kind of inventory that anybody might be interested in would have competitors, and the competitors might write some scripts to create a bunch of accounts, and with each of those accounts go and add one single item to their cart, and that removes it from the inventory. So they don’t actually buy it, it just blocks the inventory so no one can buy it. So then the people who are trying to buy that one item then have to go to the competitor’s site.
There’s all these different use cases like this. There’s almost more than you can imagine, and behind almost every one of them, somebody is making some money somewhere.
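The account takeover pattern David describes, one source trying many known usernames and passwords against a login page, can be sketched as a simple counting rule. This is a minimal Python illustration and not how any particular anti-bot product works: the event format, the function name `flag_credential_stuffing`, and the threshold are all assumptions made for the example.

```python
from collections import defaultdict


def flag_credential_stuffing(login_events, max_distinct_users=5):
    """Return the set of source IPs that look like credential stuffing.

    login_events: iterable of (source_ip, username, success) tuples.
    A source is flagged when it has failed logins for more distinct
    usernames than max_distinct_users (a hypothetical threshold).
    """
    failed_users = defaultdict(set)
    for source_ip, username, success in login_events:
        if not success:
            failed_users[source_ip].add(username)
    return {
        ip for ip, users in failed_users.items()
        if len(users) > max_distinct_users
    }


# One source walks through a list of usernames; another mistypes once.
events = [("203.0.113.7", f"user{i}", False) for i in range(10)]
events.append(("198.51.100.2", "alice", False))
events.append(("198.51.100.2", "alice", True))
flagged = flag_credential_stuffing(events)
```

A real defense would also look at time windows and distributed sources, since attackers spread attempts across many addresses. The sketch only shows why a list-validation run stands out from a person who forgot a password.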
Shahin: So sadly, human nature has always created scenarios like this and situations like this, even not in a digital world. But as digital has consumed so much of our daily interactions, I can tell you that I have friends that no longer walk into a brick and mortar store for anything, not even groceries. So when you think of that context, it’s that same behavior, that same bad actor behavior that, you know, we used to attribute to the mafia or ruffians or whatever, is now happening on the internet in the same context. It’s just they’re able to move faster because they’ve automated these things.
David: Yeah, and the thing you need to realize about automation is it makes you be able to scale. Back in the Wild West, if the money was in a bank, an attacker would literally physically have to go to the town where the bank was and then rob the bank. And if it was a local, the sheriff might know who this person is, like, “Oh, I knew he was gonna do that at some point. I’ve just been waiting.”
But now, an attacker could rob every bank around the world without even leaving their desk.
Shahin: At the same time.
David: Yeah, the same time. It scales everything.
Shahin: 100%.
I read this very interesting book called, I think it was called The Social Network, and it was about this, not to be confused with the Facebook book, but the idea behind The Social Network was that it started with way back in the day when the first villages came together. Then it became towns, and then the towns got too big, so they created multiple towns. And so now, the network was connecting those towns together and it was roads. And as we got bigger and had cities, it became rivers, and then it became highways, and then trains.
And so today, our social network is literally the internet and the speed, it’s kind of time offset reality, because we can move so much faster than we can process things, or we could even physically cover the space that we’re trying to communicate with somebody across the world. So it’s created scenarios and situations that were unfathomable to us not even 30 years ago.
David: Right. Similar, so I’m in Hong Kong right now. It’s like 1:10 in the morning. And if you go back 300 years, it would take you maybe a half a year to get here, and there was maybe a 40% chance you’d arrive alive. And if you did, you would be in really poor health.
But now you can get here in less than a day. Less than a day. But, you know, on the internet now, everything’s almost instantaneous.
Shahin: Microseconds, yeah. I mean, this is a perfect example. We’re here doing a livestream together on the opposite sides of the world.
David: We’re probably 12 hours off each other.
Shahin: Yeah, exactly. So, all of this creates this notion of an erosion of trust. It creates a trust challenge, but it also creates a resiliency opportunity. Let’s spend a few minutes talking about what that erosion of trust is, and how, you know, in specific, Imperva can help with the resiliency opportunity.
David: That’s a great point.
I do want to point out that there never was trust in the internet. It was originally built, and I used to joke that I was there when Al Gore invented it or whatever. I was in college working on actual real networks that had like security and authentication and stuff, and the network we used on the college was the internet. And you could read the protocols of it and see that there was no security built into it at all. Like, if you were trying to send an email, you didn’t even tell the email server who you were. You just said, here’s a mail from Shahin to staff all, “I love you guys.”
And it would just accept it.
Shahin: And I’m not saying, but we might have taken advantage of that in college.
David: And I think the reason that the internet became the de facto standard was because there was no security in it. It was just easy to set up. And we keep making this mistake as humans, over and over and over again.
Like, do you remember during the pandemic, there was a certain video conferencing software that basically had no security in it, and so everyone used that one, right? We keep doing this over and over and over again, picking the option that has no security.
But to get back to your question, what had happened to the erosion of trust? Yes, we then sort of built a mountain of trust by piling different Band-Aids and workarounds on the internet so we could sort of get it to where it was somewhat trustworthy. What most consumers and citizens don’t know is that behind the scenes, there’s like this constant battle going on between the good guys and the bad guys, just trying to keep the internet going.
And the good guys now are pretty good in that they can sort of threat model, like, what might the next attacks be and then try to get ahead of them a little bit. It’s very much been a cat and mouse game this entire time. And so to a regular user of the internet, you might go to a website and you log in and you buy something and you think, “Oh, that was easy,” without realizing that there’s an insane amount of technology that’s there just to figure out, is that really Shahin we’re talking to? This credit card number… Okay, he just used this nearby. It’s still good, hasn’t been compromised. And his password isn’t one that we’ve seen, you know, 100 times before.
So while we have built this thing that the average consumer can trust, and God bless us all for doing that and let’s continue to do that, it’s not that it’s been effortless. It’s been this battle going back and forth. And anybody with an IT security budget knows that, right, Shahin?
Shahin: Exactly. A big part of what we bring to market is managed security services for enterprises, specifically to fight the battle you’re talking about. I often say that the next world war will not be physical. It will be cyber. And we’ve seen evidence of that across the board. Even in the most recent wars, we’ve seen more drone attacks than human-involved attacks, and that’s all run through technology. So we’re seeing more and more that the cybersecurity teams are actually much like the militaries of the past, where we’re constantly trying to defend this beachhead we call our piece of the internet.
David: Right. In the spirit of talking about what are you defending, I had mentioned before the phrase application security. That is really what it is.
If you’re thinking about a web application, you’re running a website or a web service of some kind, what are the four sets of defenses that you typically need in front of that? And you’ll see a lot of different vendors sell this particular portfolio and it’s mostly just four things.
There’s this web application firewall, which is like the original granddaddy security control of application security, right? And it’s the one that sort of correlates to what they call the OWASP Top 10, which is an old, old list that the OWASP organization updates every five years or so that says, here’s what the top ranked threats are. And that particular tool was the first one and it’s still the most important one.
And then everything else, sort of like a firewall in a regular network, everything sort of revolves around that or is pivoted or attached to that. Same thing with the web application firewall. The things you would bolt onto it are probably, number one, like an anti-bot defense, like I was talking about, for all of that malicious automation that’s trying to come in. You need something that can sit there and look: okay, this is an automated script. Am I categorizing it as a good one or a bad one? And then what am I going to do if it’s a bad one?
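The three questions at the end of that paragraph (is it automated, is it good or bad, and what do I do about a bad one) map onto a small decision function. The Python sketch below is only an illustration of that flow. The `Request` fields stand in for whatever an upstream detector would report, and the names `Action`, `Request`, and `decide` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"
    BLOCK = "block"


@dataclass
class Request:
    is_automated: bool       # verdict from some upstream detector (assumed)
    verified_good_bot: bool  # e.g. a search crawler whose identity checked out


def decide(req, bad_bot_action=Action.BLOCK):
    """Answer the three questions in order and return an Action.

    bad_bot_action is the site's own policy for malicious automation;
    blocking is just the default chosen for this sketch.
    """
    if not req.is_automated:
        return Action.ALLOW      # human traffic passes through
    if req.verified_good_bot:
        return Action.ALLOW      # known-good automation passes through
    return bad_bot_action        # everything else gets the policy action


human = decide(Request(is_automated=False, verified_good_bot=False))
crawler = decide(Request(is_automated=True, verified_good_bot=True))
scraper = decide(Request(is_automated=True, verified_good_bot=False),
                 bad_bot_action=Action.CHALLENGE)
```

Keeping the response to bad bots as a parameter reflects the point made earlier in the conversation: the service identifies the automation, and the site owner decides what to do about it.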
Another one is, a lot of services now are written as what they call APIs, application programming interfaces. And this is basically allowing programmers to take functionality, your business logic, and then express that in such a way that it can purposely be called by automation. Sometimes it’s a browser, but sometimes it’s a set of scripts or some kind of partner connect thing. And that’s a very, very interesting space because it is your business logic translated in a digital form that can do all of the things that you want to do for you and your partners and your customers. And it was largely ignored by hackers until recently. The hackers have sort of figured out, “Oh, you know what? Why are we even attacking the web applications where we have to, like, authenticate and there’s CAPTCHAs and all this stuff? Let’s just attack the APIs. Actually, all the data we want is in the APIs.”
We’re seeing a ton of attacks that are automated that are trying to attack APIs. Now, there hasn’t traditionally been a tool that sits in front of it, like your API gateway, and then blocks things like a web application firewall does. And I’ll tell you a little secret: there can’t be something like that because your business logic is specific to you, so no one can write a signature for it. No one can know that if you create some business logic, a manipulation of one parameter like an address or a secondary address may cause an actual vulnerability or cause somebody’s shipment to get sent to an attacker’s house.
So, it’s a much more difficult space to operate in and the defenses are defined much more by artificial intelligence and machine learning as it tries to spot things that look bad rather than a signature-based thing like a web application firewall. And I’ll tell you something terrifying, good and bad. Most organizations have hundreds of APIs, business logic exposed. Good researchers have figured out perhaps the easiest way to understand these APIs is to take one of these new AIs, these LLMs, go say, “Hey, go connect to the entire API and like search it all and then come back and then show me the structure so that we can understand it.”
And that’s great because there’s been no tool to do that before now. But the terrifying thing is the attackers are doing the same thing.
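To make the shipping-address example from a moment ago concrete: the request that redirects a shipment can be perfectly well-formed, so there is nothing generic for a signature to match. The check has to come from rules only the application knows. The Python sketch below shows one such rule set under invented assumptions. The `Order` type, the `update_shipping_address` function, and the idea of a per-account list of saved addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    owner_id: str
    shipping_address: str


def update_shipping_address(order, requester_id, new_address, saved_addresses):
    """Apply an address change only if it passes app-specific rules.

    Both rules are assumptions about one imaginary business:
    the requester must own the order, and the new address must
    already be on file for that account.
    """
    if requester_id != order.owner_id:
        raise PermissionError("requester does not own this order")
    if new_address not in saved_addresses:
        raise ValueError("address is not on file for this account")
    order.shipping_address = new_address
    return order


order = Order("o-1", owner_id="alice", shipping_address="1 Main St")
try:
    # A syntactically valid request from the wrong account.
    update_shipping_address(order, "mallory", "9 Attacker Ave", {"1 Main St"})
    rejected = False
except PermissionError:
    rejected = True
```

Another business might allow gift shipments to new addresses and need a different rule entirely, which is the reason a one-size-fits-all signature does not exist for this class of problem.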
Shahin: Exactly.
I think part of this shift is, even if we go back 10 years, we were developing web applications that were talking to databases. And what’s shifted is the whole Web 2.0 API-first mantra, which basically, fundamentally, is the shift from the traditional manufacturing-based EDI interactions between companies to API-based interactions between companies. So almost every website today is developed as an API-first platform where that web front end is nothing more than a front end to the actual application, which is the API. And so that’s kind of the challenge that shifted us here into the space we’re in.
And I agree with you, it’s really difficult to understand the business logic. But I kinda liken it to the traditional approach, when antivirus was the thing, when we were looking for signatures and definitions on an endpoint. And today, that defense mechanism does not work because we have malware that is changing its behavior and its signatures every time it runs so that it avoids detection. What has evolved, and I think where you’re going with this API and I hope I’m leading you in the conversation here, is this notion of behavioral-based analysis of that traffic. Just like on the endpoint security where we went from antivirus to EDR, API is now a new beast, and signatures don’t make sense anymore because they can be changed and modified.
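The contrast between signatures and behavior can be shown in a few lines. In the Python sketch below, the signature check is an exact hash lookup, so changing a single byte of the payload evades it, while the behavioral check ignores the payload and looks at what the client does. The sample payloads, the function names, and the threshold are invented for illustration and do not represent any vendor's detection logic.

```python
import hashlib

# A "signature database" holding the hash of one known-bad payload.
KNOWN_BAD_HASHES = {hashlib.sha256(b"malicious-payload-v1").hexdigest()}


def signature_match(payload):
    """Exact-match detection: true only for byte-identical payloads."""
    return hashlib.sha256(payload).hexdigest() in KNOWN_BAD_HASHES


def behavior_suspicious(object_ids, max_distinct=50):
    """Behavioral detection: flag a client that touches an unusually
    large number of distinct objects (threshold is hypothetical)."""
    return len(set(object_ids)) > max_distinct


original_caught = signature_match(b"malicious-payload-v1")
mutated_caught = signature_match(b"malicious-payload-v2")  # one byte changed
enumeration_caught = behavior_suspicious(range(200))       # walks 200 record IDs
normal_user_flagged = behavior_suspicious([17, 17, 42])    # a few repeat lookups
```

The mutated payload slips past the hash lookup, but a client enumerating hundreds of records still trips the behavioral rule no matter how its requests are dressed up.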
David: That’s a really good analogy, Shahin, going from antivirus to EDR. And, that was like a huge moment in the endpoint space. That’s one of the biggest things that probably happened in the last 10 years. And, some great businesses were built around that that are $10 billion businesses today. But one of the reasons that was necessary was partly because I think the network people had finally gotten good enough at network security that it actually made it kinda hard for the hackers to hack in through, say, a switch or a router, or through a firewall.
It became easier to attack the actual people in the organization. Get some malware onto their desktop and then now you’re already in, right? And so then there was a focus around moving to the endpoint. I think the analogy might be of attackers moving from the web application to the API. Very similar.
Shahin: Now one of the challenges, continuing that analogy, is that we used to think that we had these castles, and that’s why antivirus was good enough. Because the moats and the firewalls and everything else were protecting us from the bad actors getting in. The erosion of the edge kinda caused a challenge where we no longer have a firewall protecting us everywhere, which is part of the reason the endpoint space became much more targeted.
But similarly, we’ve done the same thing in this web application space. We’ve literally moved to a model where we’re now exposing the innards, if you will. That part was built with a trusted rather than a zero-trust approach, and now it needs to become a zero-trust approach to security. And the problem we have is most APIs are not very secure. They haven’t been developed with a security-first mindset because of that shift: it was inside the castle, and now it’s not.
David: Yep, yep. And for the listeners, if you haven’t been able to pick up on this, a lot of what we’ve been talking about for the last 15-20 minutes is B2C applications. This is business-to-consumer, and you’re basically accepting traffic from at least your own country but very often the entire world. And that’s literally the least trustworthy traffic that’s out there. And as we mentioned, 37% of that is malicious.
Shahin: So, you mentioned earlier, David, that there’s the four categories. So it was upfront the WAF, critical. You wouldn’t have an office without a firewall. You shouldn’t have a web application without a firewall.
Now we go into the API space. You wouldn’t have an endpoint without EDR software running on it. Similarly, you shouldn’t have an API without some sort of behavioral analysis in front of it. You wouldn’t have identities that you’re not securing. You need to have the ability to identify which identities are real, which are not, and be able to determine the legitimacy of an individual who’s logging into your applications. And I think the point is understanding that web applications, which are now your storefront, are as critical as your physical assets and virtual assets inside your environment. And there needs to be something done with them to protect you because bad actors are targeting them every day. It’s big business.
I think there’s a couple of factors. It’s coming in through the APIs, it’s stealing data and basically doing fraudulent transactions and all these things. But it also is a potential to get into your infrastructure and create problems from a ransomware perspective and everything else and make it a bigger issue. So I think it’s all tied together and one of the reasons we wanted to have this conversation is obviously the fun dialog of the dead internet and the notion that bots are now taking over or have taken over since they’re more than 50% of the traffic is a fun topic. But it’s also a very serious topic and Imperva and DataEndure are here with a lot of capability to help close these gaps for you.
And there’s a couple of takeaways we can give to the users, the listeners here, about how to get started with this and reaching out to us. There’s a couple of very simple things that can be done to just evaluate if Imperva’s technologies are the right thing for you.
Number one, we can help you set up a free trial which will evaluate a non-critical application or if you choose a critical application to see what’s going on, what visibility do you get by just implementing Imperva’s WAF which is well-recognized as one of the top WAFs in the market.
Number two, many of these applications are built in AWS, and you’re using the database as a service functionality within AWS called RDS as the backend to the APIs, and you’re snapshotting that data. Imperva also has an application, or the Thales group has an application, that we can use as an assessment of your snapshots to see what kind of data leak or data loss or PII is in that data, so that you can quickly get a sense of, do we have a risk here or not?
So these are two very simple, low-risk scenarios that you could run as a starting point to help determine, is this the right path for us to take to secure our applications? David, would you like to add anything to those?
David: I would, and you described those two options perfectly. I couldn’t have done a better job myself. The one thing I would like people to walk away from our discussion understanding is that this is really, really hard, and you want partners like us who know what they’re doing to help you.
Shahin: I agree. It’s not easy by any means, and it takes teams of people. And we have a lot of customers who have come to DataEndure for our security offerings specifically because their people are taxed. To day in, day out battle these bad actors takes a full organization’s focus, and that isn’t their core business. It’s that core versus context dialogue. So, it’s hard to develop these technologies, evaluate these technologies, and utilize these technologies to your benefit.
And I think there’s a couple of practical takeaways I would say from today. Part of what we’re talking about is getting some visibility and understanding. How much of your traffic today is bot driven? Certainly the WAF technology can get that visibility for you at a minimum. Then there’s auditing your APIs to understand, what APIs are public? Which ones are shadow or vulnerable, if you will, and have security risks that you need to address, that you can take back to your development team to close some of those security gaps?
And then lastly, I would say really prioritizing adaptive defenses. This is that notion of moving from a signature-based to a behavioral-based so that you can defend against these new types of attacks and the bots and understanding which bots are good so you can let them through and which bots are bad, rather than, no offense, but blindly trusting that all traffic is there because it’s supposed to be.
David: Well said, Shahin.
Shahin: So with that David, thank you so much for being up in the middle of the night in Hong Kong to join us and have this live stream. As always, it’s been great speaking with you.
And to our listeners, please don’t hesitate to reach out. We’re happy to engage and get you some of those trials and bring David and his team to bear to help you close any technology gaps in this space that you need. Thank you very much.