My problem is about HTML links (the anchor tag) and web search engines. As far as I know, web crawlers access all or most of the links inside a page when indexing that page, right? What if I want some links not to be accessed by search engine crawlers? I count the number of times those links are clicked, which is an essential feature of my web application, and crawler visits would inflate the count. Can I use JavaScript somehow?

asked Jun 14, 2012 at 16:10 by Costel Socianu
  • Do a web search for robots.txt. – Graham, Jun 14, 2012 at 16:11
  • Could you be more specific, please? I have heard about robots, but... – Costel Socianu, Jun 14, 2012 at 16:14
  • I see Stack Overflow uses links for voting up and down. – Costel Socianu, Jun 14, 2012 at 16:15

6 Answers

There may not be a single foolproof technique for this. However, you can combine the following to be safe:

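Disallow those links in your robots.txt file. This entails creating a file called /robots.txt at the root of your site and adding the following lines to it (a Disallow rule only takes effect inside a User-agent group):

User-agent: *
Disallow: /YourPage.html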
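To make the matching concrete, here is a rough JavaScript sketch of how a well-behaved crawler might decide whether a path is blocked. It is deliberately simplified: it only reads groups addressed to `User-agent: *` and treats each Disallow value as a plain path prefix, ignoring Allow rules and wildcards. The function names `disallowedPrefixes` and `isDisallowed` are made up for illustration and do not come from any library.

```javascript
// Collect the Disallow prefixes that apply to every crawler ("User-agent: *").
// Simplification: Allow rules, wildcards and "$" anchors are not handled.
function disallowedPrefixes(robotsTxt) {
  const prefixes = [];
  let appliesToAll = false; // is the current group addressed to "*"?
  let inRules = false;      // has the current group already listed a rule?
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // drop comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      // A User-agent line that follows rule lines starts a new group.
      if (inRules) {
        appliesToAll = false;
        inRules = false;
      }
      if (value === '*') appliesToAll = true;
    } else {
      inRules = true;
      // An empty Disallow value means "nothing is disallowed".
      if (field === 'disallow' && appliesToAll && value !== '') {
        prefixes.push(value);
      }
    }
  }
  return prefixes;
}

// True if the path starts with any disallowed prefix.
function isDisallowed(robotsTxt, path) {
  return disallowedPrefixes(robotsTxt).some((prefix) => path.startsWith(prefix));
}

const exampleRobotsTxt = ['User-agent: *', 'Disallow: /YourPage.html'].join('\n');
console.log(isDisallowed(exampleRobotsTxt, '/YourPage.html')); // true
console.log(isDisallowed(exampleRobotsTxt, '/index.html'));    // false
```

Keep in mind that robots.txt is only a request. Crawlers that ignore it will still fetch the URL, so the rule should cover the URL your click counter listens on.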

You can also use a no-follow link:

<a href="http://www.example.com/" rel="nofollow">Link text</a>

However, according to Wikipedia, most, if not all, search engines will still actually follow the link, just not index it or use it in ranking.

Another idea would be to not use a URL at all, and use script instead. Something like:

<a href="javascript:void(0)" onclick="GoSomewhere()">Google Can't Find Me!</a>
<script>
   // Navigate from script so the markup itself contains no crawlable URL.
   function GoSomewhere()
   {
      window.location = '/YourPage.html';
   }
</script>

You might also want to rethink how you count hits. Rather than counting every HTTP request as a hit, you could use JavaScript to register a hit, as a bot will usually not execute any script on the page. This is how things like Google Analytics and Clicky work.
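A minimal sketch of script-based hit counting might look like the following. The `/count-click` endpoint, the `data-link-id` attribute, and the function names are all assumptions made up for this example, so adapt them to whatever your server actually exposes.

```javascript
// Hypothetical endpoint that increments the counter on the server.
const HIT_ENDPOINT = '/count-click';

// Build the request body for one counted click.
function buildHitPayload(linkId) {
  return JSON.stringify({ linkId: String(linkId) });
}

// Report a click through the given transport. `send(url, body)` is injected
// so the same function works with sendBeacon, fetch, or a test stub.
function registerHit(linkId, send) {
  return send(HIT_ENDPOINT, buildHitPayload(linkId));
}

// Browser wiring: count clicks on links carrying a data-link-id attribute.
// Skipped automatically when no DOM is available (for example under Node.js).
if (typeof document !== 'undefined') {
  document.addEventListener('click', (event) => {
    const link = event.target.closest('a[data-link-id]');
    if (link) {
      registerHit(link.dataset.linkId, (url, body) => navigator.sendBeacon(url, body));
    }
  });
}
```

Because the counter is only incremented from script, a crawler that merely fetches the link target does not register a hit. Crawlers that do execute JavaScript and simulate clicks would still be counted.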

You could also exclude any hits that come from a user agent containing the word Googlebot.
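As a rough sketch, the user-agent check could be a single case-insensitive pattern test. The token list below is illustrative and far from complete, and `isLikelyBot` and `shouldCountHit` are hypothetical helper names.

```javascript
// Illustrative, incomplete list of crawler tokens found in User-Agent headers.
const BOT_PATTERN = /googlebot|bingbot|slurp|duckduckbot|baiduspider|yandexbot/i;

// True if the User-Agent string contains one of the known crawler tokens.
function isLikelyBot(userAgent) {
  return typeof userAgent === 'string' && BOT_PATTERN.test(userAgent);
}

// Count the hit only when the request does not look like a crawler.
// Requests without a User-Agent header are counted in this sketch.
function shouldCountHit(userAgent) {
  return !isLikelyBot(userAgent);
}

const googlebotUa = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
console.log(shouldCountHit(googlebotUa)); // false
```

User-agent strings are trivially spoofed and new crawlers appear all the time, so treat this as a filter for well-behaved bots and not as a guarantee.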

Hope this helps!

This question is kind of old, but nofollow does act as a "suggestion" for search bots to not follow links.

Use rel="nofollow" for specific links

Google's documentation describes how it interprets nofollow. Basically, it says that "in general" Google doesn't follow such links, but the target may still appear in the index if other sites link to it without using "nofollow".

Google and Bing webmaster tools also each have a section that lets you remove URLs from their index.

The last option is robots.txt, as someone else mentioned.

You can use nofollow:

 <a rel="nofollow"> Bla Bla </a>

This is a suggestion for the web crawler not to follow the link.

The nofollow option will keep search engines that honor it from following the links. If you want to protect the links from bots, skimmers, etc. as well, I would suggest using JavaScript to add the links to your HTML content on DOM ready.

This will keep most bots, and any search engine crawler that does not execute JavaScript, from even seeing the links in the first place. It will also stop people who scan your site for forms, email addresses, phone numbers, etc. from inadvertently following the links.
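One way to sketch this is to keep the link data in script and build the anchors once the document has been parsed. The container id `vote-up`, the `deferredLinks` list, and the `injectLinks` function are hypothetical names chosen for this example.

```javascript
// Hypothetical list of links that should only exist after the script runs.
const deferredLinks = [
  { containerId: 'vote-up', href: '/YourPage.html', text: 'Vote up' },
];

// Create an <a> element for each entry and append it to its container.
// `doc` is passed in so the function can be exercised without a browser.
function injectLinks(doc, links) {
  let added = 0;
  for (const { containerId, href, text } of links) {
    const container = doc.getElementById(containerId);
    if (!container) continue; // skip entries whose container is missing
    const anchor = doc.createElement('a');
    anchor.href = href;
    anchor.rel = 'nofollow';
    anchor.textContent = text;
    container.appendChild(anchor);
    added += 1;
  }
  return added;
}

// Browser wiring: run once the DOM is ready, whether or not it already is.
if (typeof document !== 'undefined') {
  if (document.readyState === 'loading') {
    document.addEventListener('DOMContentLoaded', () => injectLinks(document, deferredLinks));
  } else {
    injectLinks(document, deferredLinks);
  }
}
```

The served HTML then contains only the empty containers, so anything that reads the markup without running the script never sees the URLs.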

You could use PHP to remove the links if the user agent belongs to a bot.

As Google wants to enable JavaScript for its crawlers, the

<a href="javascript:void(0)" onclick="openLink()">Link</a>

suggestion might be outdated. One can argue that the element is still semantically a link (an a element), so the crawler will follow it. A possible way to prevent that might be to convert all links that shouldn't be followed to spans:

<span onclick="openLink()">Link</span>

This still might not work, though, as the DOM still exposes the fact that this element has a click handler. As a further workaround, one would need to add a single click event listener to body and deduce from the click coordinates which element was actually clicked. This might be computationally very expensive.
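A rough sketch of that coordinate-based workaround follows. The `clickTargets` map, the element id `vote-up`, and the `findRegionAt` helper are assumptions made up for illustration. The map lives only in script, so the spans themselves carry neither a handler nor a URL.

```javascript
// Hypothetical map from element id to destination, kept out of the markup.
const clickTargets = { 'vote-up': '/YourPage.html' };

// Return the id of the first region containing the point (x, y), or null.
function findRegionAt(x, y, regions) {
  for (const { id, left, top, right, bottom } of regions) {
    if (x >= left && x <= right && y >= top && y <= bottom) return id;
  }
  return null;
}

// Browser wiring: one listener on body, hit-testing against element boxes.
if (typeof document !== 'undefined' && document.body) {
  document.body.addEventListener('click', (event) => {
    // Measuring every candidate on each click is the expensive part.
    const regions = [];
    for (const id of Object.keys(clickTargets)) {
      const element = document.getElementById(id);
      if (!element) continue;
      const { left, top, right, bottom } = element.getBoundingClientRect();
      regions.push({ id, left, top, right, bottom });
    }
    // clientX/clientY use the same viewport coordinates as the rectangles.
    const hit = findRegionAt(event.clientX, event.clientY, regions);
    if (hit !== null) window.location = clickTargets[hit];
  });
}
```

The cost grows with the number of candidate elements because each click re-measures all of them. Caching the rectangles would help, but they go stale on scroll or resize.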

Tags: javascript. Original question: How to make an HTML <a> tag so that search engine crawlers don't access them (Stack Overflow)