April 16, 2006
AdSense mediapartners bot adding to the Google search index
Since Google AdSense launched, there have been rumors and speculation that the AdSense bot (officially known as "Mediapartners-Google/2.1" and unofficially as the "mediabot") feeds some of its information into the regular Google search index. After all, it would be a nice perk of using AdSense if it gave a publisher an easier way to get pages into the natural search results ;)
But no one has ever seemed to have concrete evidence of this happening. I have had several discussions with Matt Cutts over the past few years about this issue, and I have always been assured that the two are completely separate and that Google is careful never to let them cross-contaminate each other. I have looked hard to try to prove Matt wrong, but in the past it has always been to no avail ;)
But on SEO Rockstars this week, Greg Boser (aka WebGuerrilla) mentioned that he had seen mediabot information showing up in the natural search index, and my ears perked up. And Greg has now followed up with this entry detailing what he is seeing.
During last Tuesday's Rockstar show, I mentioned that I had been working on a project that got a bit messed up due to the fact that Google's Mediapartner bot was being used to index content for Google's database. We had setup some 301's for Googlebot, but had neglected to redirect the AdSense bot. The end result was a whole bunch of duplicate content due to the fact that we were serving the AdSense bot the old url, and Googlebot the new one. Both were getting indexed and added to the cache.
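To make the mix-up Greg describes concrete, here is a minimal Python sketch of user-agent-based redirects that covers both bots. It is not Greg's actual setup, which the post does not show. The function names, the `moved` mapping, and the example paths are all hypothetical; only the "Mediapartners-Google" and "Googlebot" user-agent names come from the bots themselves.

```python
# Hypothetical sketch: decide whether a request for a moved URL gets a 301.
# Matching only "googlebot" here would reproduce the problem in the post:
# the mediabot would keep receiving the old URL.
CRAWLER_TOKENS = ("googlebot", "mediapartners-google")


def is_google_crawler(user_agent):
    """Return True if the user-agent string names either Google bot."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def redirect_target(user_agent, path, moved):
    """Return (301, new_path) for a crawler requesting a moved path, else None."""
    if is_google_crawler(user_agent) and path in moved:
        return 301, moved[path]
    return None


# Illustrative data only; these paths are made up.
moved_pages = {"/old-post": "/new-post"}
print(redirect_target("Mediapartners-Google/2.1", "/old-post", moved_pages))
print(redirect_target("Googlebot/2.1 (+http://www.google.com/bot.html)", "/old-post", moved_pages))
```

Keeping the list of bot names in one place means a redirect rule cannot be applied to one Google bot and forgotten for the other.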
As I am often testing with AdSense, I had a collection of sites on which I had not done any natural search optimization, since I was strictly using specific PPC terms (as a control group) to drive traffic and test some placements and ad unit color schemes. None of those sites had any pages in the index as a result of the mediabot.
However, I went and checked some established sites. There, the date and time on the Google cached version of the page are identical to the time the mediabot visited the site. (Cached time is GMT; log time is EST.)
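Because the cache shows GMT and the logs show EST, the comparison requires a timezone conversion. The following Python sketch shows one way to do that check. The timestamp formats, the fixed UTC-5 offset (no daylight saving adjustment), and the sample values are assumptions for illustration; they are not taken from the actual logs or cache pages.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server log uses a fixed EST offset (UTC-5) with no
# daylight saving adjustment.
EST = timezone(timedelta(hours=-5), "EST")


def log_time_to_gmt(log_stamp):
    """Parse a log timestamp recorded in EST and return it as GMT/UTC."""
    local = datetime.strptime(log_stamp, "%d/%b/%Y:%H:%M:%S").replace(tzinfo=EST)
    return local.astimezone(timezone.utc)


def matches_cache(log_stamp, cache_stamp, tolerance_seconds=0):
    """Return True if a mediabot log hit and a cache timestamp coincide."""
    cached = datetime.strptime(cache_stamp, "%d %b %Y %H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    delta = abs((log_time_to_gmt(log_stamp) - cached).total_seconds())
    return delta <= tolerance_seconds


# Made-up example values, not real log data.
print(matches_cache("14/Apr/2006:22:12:45", "15 Apr 2006 03:12:45"))
```

A small `tolerance_seconds` can be useful if the two clocks are not exactly in sync.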
The following two are from a site URL I cannot reveal, but I included them to illustrate that the problem occurs across multiple sites and multiple date ranges (JenSense is indexed regularly and there were no cache dates back that far). Again, the times on the cached version of the page and the time of the mediabot visit are identical.
With multiple dates being affected, it doesn't seem to be a case of just a one-day glitch.
It is interesting to note that these pages have been visited by the mediabot since this time, but the new visits are not reflected in the cache.
So what does this all mean? First off, the AdSense support site clearly states that the two bots serve completely different purposes and should not affect each other.
Participating in Google AdSense does not affect your site's rank in Google search results and will not affect the search results we deliver. Google believes strongly in freedom of expression and therefore offers broad access to content across the web. Our search results are unbiased by our relationships with paying advertisers and publishers. We will continue to show search results according to our PageRank technology.
Adding the Google AdSense ad code or AdSense for search code to your site will not queue your pages for crawling by our main index bots. While our bot (starting with 'Mediapartners-Google') does crawl content pages for the purpose of targeting ads, this crawl is not associated with our main index crawl.
There is the possibility that an accidental crossover took place if the AdSense team was keeping cached copies of the pages serving AdSense for quality checking purposes, such as checking to see if a publisher is serving the mediabot something different than what Joe Surfer sees when visiting the page.
It does seem that it is only affecting those sites that are already indexed, and likely pages that were already indexed at the time the mediabot took the cached version snapshot for the regular search index. Among the multiple sites I checked that were not already indexed, I could not find any evidence of an indexing boost via the mediabot. However, could it potentially be an option for getting fresher pages in the index? Possibly. But I also found instances where the mediabot had visited the same pages yet not updated the cached version of the page, so there is likely more to the hows and whens of the mediabot updating the cached copy of a page.
But what is potentially more dangerous is the fact that the Google search index is including what the mediabot sees, and not what the Googlebot would see, as noted by Greg.
The content of that post got indexed in a template that we only serve to AdSense. It has no navigation and no comments; just the actual post.
This could have severe consequences for webmasters, such as Greg, who suddenly had a duplicate content issue to clean up. Webmasters usually wouldn't think to include the mediabot in any special headers or robots.txt instructions they have for the regular Googlebot.
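One way to spot that kind of gap is to check a robots.txt file against both user-agent names. The sketch below uses Python's standard `urllib.robotparser`. The `bot_access` helper, the sample rules, and the example.com URL are hypothetical, and the result only shows how a standard parser reads the file, not how either Google bot actually behaves.

```python
from urllib.robotparser import RobotFileParser


def bot_access(robots_txt, url, agents=("Googlebot", "Mediapartners-Google")):
    """Return {agent: allowed} for a URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Hypothetical rules written with only the regular Googlebot in mind.
SAMPLE_ROBOTS = """\
User-agent: Googlebot
Disallow: /old/
"""

result = bot_access(SAMPLE_ROBOTS, "http://example.com/old/post.html")
for agent, allowed in result.items():
    print(agent, "allowed" if allowed else "blocked")
```

In this sample the rule names only Googlebot, so the parser reports the old path as blocked for Googlebot but still open to Mediapartners-Google.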
But how much does it actually help from a webmaster perspective? On the surface, it saves on bandwidth for those few who complain about how much bandwidth the various Google bots are using. But how it helps in the natural search results is something that needs much more testing.
It will be interesting to see what happens with this issue. I must admit I was pretty surprised to finally see evidence of it, because I have periodically hunted for it over the years. But this is definite, clear-cut evidence that yes, the mediabot is sharing info with the Googlebot, and possibly vice versa.
Posted by Jenstar at April 16, 2006 01:08 AM
Googlebot seems to disguise itself occasionally (perhaps to check whether it's being served the same content as users). Maybe it sometimes disguises itself as the mediabot?
Posted by: av1 at April 16, 2006 07:32 AM
Great reporting jen.
Posted by: Shoemoney at April 16, 2006 11:20 AM
You SEO people have too much free time on your hands. . .
Posted by: Chris Zaharias at April 16, 2006 05:17 PM
If you want more to talk about.... I have a site that Google has indexed EVEN THOUGH EVERY PAGE IS CLEARLY CODED WITH NOINDEX AND GOOGLEBOT IS DISALLOWED IN THE ROBOTS.TXT.
Seems Google is doing a lot it is not supposed to be!
Posted by: me at April 17, 2006 08:19 AM
Is it against AdSense TOS to feed the mediabot different content from what a regular visitor (or Googlebot) sees?
Posted by: Ken at April 17, 2006 07:56 PM
Ken: The AdSense guidelines say not to use cloaking, so, yes. They didn't specify the mediabot, but they did say that you can't feed crawlers different content than regular visitors.
Posted by: catnabbit at April 17, 2006 08:54 PM
Right - AdSense guidelines say this, and AdSense guidelines (well, support in this case) say that, and obviously what they're actually DOING is quite another matter altogether. How come any experienced SEO isn't one bit surprised? Maybe because this isn't exactly the first time that Google's been exerting double standards big time. And maybe, too, it's because if people really and honestly weren't into "doing evil", there'd be no reason to blab about it all the time?
Posted by: Ralph at April 17, 2006 10:46 PM
I have observed this for my website too. Google seems to use AdSense to track the most popular page on a website.
Posted by: pcwork at April 18, 2006 07:51 AM
Posted by: GreenWood at April 18, 2006 02:55 PM