Legacy:Wiki Site Stats/Discuss

From Unreal Wiki, The Unreal Engine Documentation Site
Revision as of 03:57, 24 March 2006 by imported>EricBlade

Recently our Wiki site stats have dropped dramatically. At present, this drop in page views and other stats is attributed to the absence of Unreal Wiki pages from Google's search results. Possible reasons for this include:

  • A change in how Google responds to robots.txt

(More will be added as information is obtained.)

Solution Discussion

GRAF1K: I personally think that completely removing robots.txt would be better overall for the site, at least as a temporary measure until things get sorted out. Then we would at least know for certain whether robots.txt is the problem.

Mychaeel: Currently no robots.txt file is present for the Unreal Wiki, and we never set one up; the usual search engine spiders are well-behaved enough not to put any excessive stress on the servers, and those private harvester tools that occasionally grind the Wiki to a halt are very unlikely to pay any respect to the robots.txt standard at all.

However, about three weeks ago tarquin noticed that there was, indeed, a robots.txt present in our root directory which issued a blanket denial to all robots honoring the robots.txt standard, which includes the GoogleBot. Consequently, Google removed any trace of all Unreal Wiki pages from its index, and all that can be found when googling for the Wiki are links from other sites to it.
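As an illustration of what such a blanket denial does, here is a minimal sketch using Python's standard `urllib.robotparser`. The file contents below are an assumption (the usual form of a deny-all robots.txt), not a copy of the file that was actually found, and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents: the common "deny everything to everyone" form.
BLANKET_DENY = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://wiki.beyondunreal.com/"
    print(is_allowed(BLANKET_DENY, "Googlebot", url))  # blanket denial in place
    print(is_allowed("", "Googlebot", url))            # no rules at all
```

Any robot that honors the standard gets the same answer for every path under a file like this, which is consistent with the whole Wiki vanishing from the index rather than just a few pages. Real crawlers may interpret edge cases differently from Python's parser, so treat this only as an approximation.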

We have resubmitted the Unreal Wiki site address to Google after getting rid of the misguided robots.txt in our root directory, but it takes a while until GoogleBot gets around to reindexing any, let alone all, pages of the Wiki. So whoever put that robots.txt in our root directory quite seriously harmed the site, even though I don't suspect malice behind it – rather a misguided attempt to keep badly behaved harvesters away.

Wormbo: There is still a robots.txt in the www root that allows GoogleBot to access only paths other than /wiki. User agents other than GoogleBot are excluded from the whole site.
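For reference, rules of the kind described here could look like the sketch below. This is a reconstruction from the description only, not the actual file on www.beyondunreal.com; the rule text, the `ASSUMED_RULES` name, the `can_crawl` helper, and the /forums path are all assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: Googlebot may go anywhere except /wiki,
# every other robot is shut out of the whole site.
ASSUMED_RULES = """\
User-agent: Googlebot
Disallow: /wiki

User-agent: *
Disallow: /
"""


def can_crawl(user_agent: str, path: str) -> bool:
    """Check a path against ASSUMED_RULES for the given user agent."""
    parser = RobotFileParser()
    parser.parse(ASSUMED_RULES.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent, path in [
        ("Googlebot", "/wiki/SomePage"),
        ("Googlebot", "/forums"),
        ("SomeOtherBot", "/forums"),
    ]:
        print(agent, path, can_crawl(agent, path))
```

The specific Googlebot section takes precedence over the `*` section for that robot, so only the /wiki prefix is blocked for it, while robots without their own section fall through to the deny-all rule.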

Tarquin: Where is it?

Mychaeel: Ah, you're talking about http://www.beyondunreal.com/robots.txt. The Robot Exclusion Standard specifies that a robot only looks for /robots.txt on the same host it's trying to harvest, that is, wiki.beyondunreal.com (or www.unrealwiki.com) in our case.
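To make the per-host rule concrete, here is a small sketch of how a robot would derive the robots.txt location from a page address. The `robots_url` helper and the example page paths are hypothetical; only the host names come from this discussion.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a robot would consult for page_url.

    Only the scheme and host are kept; the path is always /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Hypothetical page paths on the hosts mentioned above.
    print(robots_url("http://wiki.beyondunreal.com/SomePage"))
    print(robots_url("http://www.beyondunreal.com/some/other/page"))
```

Because the host is part of the lookup, a robot crawling wiki.beyondunreal.com never reads the file on www.beyondunreal.com, and vice versa.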

The explicit exclusion of the /wiki path on www.beyondunreal.com in that robots.txt is a bit surprising indeed, though, especially given that it's not a valid address anyway (check http://www.beyondunreal.com/wiki to see for yourself), let alone linked to from anywhere.

Tarquin: So is there nothing else preventing us from being back on Google? I submitted our URL a couple of times a few weeks ago.

Mychaeel: No, there isn't. I resubmitted it myself a week ago or so, but it may take a while until the GoogleBot picks it up. I fear the bot has quite a packed schedule at this time...

EricBlade: This whole section hasn't been updated in a year or so.. time to delete?