How to Use Your Robots.txt File as a Cyber Honeypot for Counterintelligence

What is Counterintelligence?

Counterintelligence typically describes an activity aimed at protecting an agency’s intelligence program from an opposition’s intelligence service.

More specifically Information collection activities related to preventing espionagesabotageassassinations or other intelligence activities conducted by, for, or on behalf of foreign powers, organizations or persons.

In terms of this blog post, we’ll use a Honeypot based approach to see who is using looking at the Robots.txt file and scanning folders we’ve asked them not to and record information about the HTTP call for later review and analysis.

What is the Robots.txt File?

The Robots.txt is just a simple text file meant to be consumed by search engines and web crawlers containing structured text that explains rules for crawling your website.

In theory, the Search Engines are supposed to honor the Robot.txt rules and not scan any URLs in the Robots.txt file if told not to.

Robots.txt was supposed to help avoid overloading websites with requests. According to Google, it is not a mechanism for keeping a webpage out of Google Search results.

If you really want to keep a web page out of Google you should try adding a noindex tag reference or password-protect the page.

With a Robots.txt file, you can create rules for user agents specifying what directories they can access or disallow them all. It all sounds OK in principal but on the internet, nobody really plays by the rules.

In fact, the Robots.txt file is one of the first places a bad guy might look for information on how your website is structured.

Too many websites make the mistake of using the Robot.txt file without giving thought to the fact they might be rewarding possible OSINT or hacking reconnaissance efforts at the same time.

A Look at Amazon.com’s Robots.txt File

If we take a quick look a big website like Amamzon.com to see what their Robots.txt file looks like all we have to do is load up this URL.

https://www.amazon.com/robots.txt

What files or directories does Amazon tell the Google search engine not to crawl or index?

It looks like account access login and email a friend features are off limits so these are the first places a hacker will be looking.

More Sample Robots.txt files from Google

# Example 1: Block only Googlebot
User-agent: Googlebot
Disallow: /

# Example 2: Block Googlebot and Adsbot
User-agent: Googlebot
User-agent: AdsBot-Google
Disallow: /

# Example 3: Block all but AdsBot crawlers
User-agent: *,
Disallow: /

You can find more detailed information on how to make more complex robots.txt files over on the Google Search Central area for developers.

A Real World Robots.txt Based HoneyPot Example

Using the Robots.txt file as part of a honeypot system, we will broadcast a list of honeypot folders we don’t want search engines to index, but in this case, it will be a list of folders pointing to honeypot pages.

Having a honeypot / data collection service running in these folders allows you to see who is using the Robots.txt file to scan your web server thus tipping you off that OSINT footprinting activity on your webserver or domain names may be taking place.

These folders have a Disallow rule but contain honeypot code to collect information about the HTTP calls made against them and in some cases to redirect the user-agent somewhere else.

A Sample Honeypot Robots.txt File

A Sample Robots.txt file where we are telling all user-agents to stay away from our admin, wordpress and api folders.

# All other directories on the site are allowed by default
User-agent: *
Disallow: /admin/
Disallow: /wordpress/
Disallow: /api/

The Robots.txt file I’m discussing in this post was collected from the online classifieds website, FinditClassifieds.com.

If you try to hit any of the URLs found in the Robots.txt file, you’ll be redirected to a Rick Roll video on YouTube.com. IP address data is collected in a log for a more detailed review.

Each one of these honeypot URLs do a little something different.

The first honeypot URL replies with something naughty, the second logs it as a scan and the third URL is the Rick Roll redirect.

Depending on the honeypot page, you can collect data from the user-agent and log it before you redirect them off to the Land of Oz.

Using a tool that was originally created to be helpful that ended up becoming dangerous can now be a double agent if you set it up correctly.

Hoping this helps someone on their InfoSec journey.

~Cyber Abyss

Online Scams: Puppies for Sale or Are They? Probably not! Buyer Beware & Read This before buying a Puppy Online.

Email Address: Your Internet Driver’s License

First things first. We all need an email address in order to do anything meaningful on the the web. You do and the bad guys do too! I would go as far to say that an email address closest thing we have to a driver’s license on the internet today. Without an email address, you are on a read only version of the internet with no way to interact with with world.

By Federal law, you’re not allowed to have an email address until you’re at least 13 years old. This is specified in the FCC’s Children’s Internet Protection Act (CIPA). I often have to advise my clients on these types of issues when deciding who can legally register on a website.

An email address allows you to register on websites by validating your email address. An email address / IP address combo is the easiest and most cost effective way to provide a first pass at knowing who your customers are online.

At least we are supposed to expect that they are at least 13 years old because Google and Yahoo must check this for every email account, right? LOL! This will be important to the story below.

Blue French Bulldog Puppies For Sale or Are They?

First off, let me start by saying you should read this article which is a case study on the Anatomy of a Puppy Scam by Rae Wondersmith. A great primer on the subject!

Next, I will be hiding identity of the suspected scammer while disclosing enough details to be helpful in the analysis of the individual and the patterns observed.

The data I’m sharing comes from anti-fraud systems I’ve designed that are working in production on what I’m hoping will eventually become a popular website for local classified advertising. Maybe I’ll reveal the name of the site at the end of this article.

Blue French Bulldog  Puppies for Sale

Pic of Puppies uploaded by the Scammer

On 11/30/2020 a suspected Puppy Ad Scammer created four (4) accounts in four (4) different cities in a very short period of time.

Three of the accounts came from one ComCast Cable IP in Salem Oregon which matched one of the advertisements which did not raise a red flag initially.

They kept creating new accounts for various cities and creating a single ad for the same dog breed for each account. They targeted Salem Oregon, Kalamazoo, Boston and Lansing.

Then they posted again but IP switched from Oregon Comcast to Verizon Fios in Virginia but anti-fraud tools I built help me see it is indeed the same person registering again from same browser even though the IP had changed. I’m not sure if they are using some sort of VPN to shift the IP / Location.

I know it says “Email NOT VERIFIED” in the screenshots below but they are. I had added an email ban to the system and it reflects back on this view as NOT VERIFIED but they were. The email verification process data sits in its own database table. I collect the IP addresses from the user at start and finish of the email validation process.

I can also see the email exchange in the email server logs files which I also check daily. All of this data can be verified by looking at several system logs.

Connecting Accounts Created by Same Person on Earlier Sessions

Going back thru recent account creations I see another account matching one of the scammers email.

Observations & Fraud Patterns

Broken English or poor grammar.

Example: Breath Taken Blue French Bulldog Puppies Ready Now To GO

Notice poor grammar of Breath Taking as Breath Taken there are other examples through out the text

Phone numbers used

240 Maryland Area code in the phone number used and same phone number using in most of the ads.

Phone number was not used on all of the ads posted by this scammer

Unique Account details repeated

Same password was used on all of the accounts!

This is proprietary but yes, all accounts used the same password.

More Analysis

So far this is what I think I have and is subject to change if new data overrides this.

  • User is probably not native English speaker but may be located physically inside the US.
  • Has methods to change IP via VPN or access to computers in those cities via nefarious methods (hack) in order to hide their real IP address.
  • Its is very easy to create email accounts. This person has many email addresses and personas ready to use or creates them easily and often.
  • Only targeted one breed so far

Raw Data for Analysts

In order to help analysts and law enforcement, below are the actual ad text used in the scam advertisements.

Scam Ad for Puppies #1

Much love we have for them, we are really proud to find them a good pet loving home where they will be spoiled with much love and care. they are home raised, well fed, vet checked, vaccinated and had their first shots, update on shot and dewormed, all in good health and will come with paper we have 240) 242-7140

Suspected Scammer using email address marksmille56@gmail.com for ad posted in Kalamazoo.

Scam Ad for Puppies #2

Akc registered frenchie puppies ready for x-mas ! all shots are up to date. They have already taken flea and tick dose. They have beautiful coatings, are strong,text me (240) 242-7140 for more info

Suspected scammer using email address louisesteel259@gmail.com from ad posted in Boston Mass.

Scam Ad for Puppies #3

We are proud to find a good pet loving home for our cuties. We have lovely, young, pretty healthy males and females available now for a new home. they are home raised, well fed, vet checked, vaccinated and had their first shots, update on shot and dewormed, all in good health and will come with papers. you can contact now for more details

Suspect scammer using email address randyruy71@gmail.com from ad posted in Oregon City, Oregon.

Conclusion

The internet is still the wild wild west and most people don’t understand how it works or how the bad guys use it to take advantage of us.

The above example shows just how hard it is for anyone trying to validate and vet an online user as they create multiple accounts and post data.

I hope the information I’ve provided on this subject is helpful in any research you may be doing on the subject as I expect those would be the only people reading the article down this far.

~Cheers & Happy Hunting!

~Cyber Abyss