I want to determine whether incoming requests are from a bot (e.g. Google, Bing) or a human, and serve different data to each: for example, JSON data for client-side JavaScript to construct the site, or preprocessed HTML.

Using expressjs, is there an easy way to do this? Thanks.

asked Sep 22, 2011 at 2:00 by Harry
  • FYI, search engines tend to not like when they get substantially different content from what a normal client gets. – icktoofay Commented Sep 22, 2011 at 2:07
  • @icktoofay It's the same content; if you read Google's AJAX documentation, they expressly allow for this. – Harry Commented Sep 22, 2011 at 2:29

3 Answers

You can check req.header('User-Agent') for 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'. If it matches, you know it's Google and can send it different data.

http://www.google.com/support/webmasters/bin/answer.py?answer=1061943

How to get headers: http://expressjs.com/4x/api.html#req.get
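
A minimal sketch of the check described above, assuming a case-insensitive match on a couple of crawler tokens is good enough for your purposes. `BOT_PATTERN`, `isKnownBot`, and `detectBot` are hypothetical names, not part of Express; `req.get()` is the Express method linked above, and `res.locals` is the usual Express place for per-request data.

```javascript
// Hypothetical pattern: tokens that Googlebot and Bingbot include in their
// User-Agent strings. Extend it for other crawlers you care about.
const BOT_PATTERN = /googlebot|bingbot/i;

// Returns true when the User-Agent string contains a known crawler token.
function isKnownBot(userAgent) {
  return typeof userAgent === 'string' && BOT_PATTERN.test(userAgent);
}

// Express-style middleware: records the result on res.locals so that
// later route handlers can branch on it.
function detectBot(req, res, next) {
  res.locals.isBot = isKnownBot(req.get('User-Agent'));
  next();
}
```

Mounted with `app.use(detectBot)`, a route could then branch on `res.locals.isBot`, for example rendering HTML for crawlers and returning JSON to everyone else.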

I recommend responding according to the requested MIME type (which is present in the "Accept" header). You can do this with Express this way:

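app.get('/route', function (req, res) {
    // req.accepts() checks the Accept header; req.is() checks the request's Content-Type.
    switch (req.accepts(['html', 'json'])) {
        case 'html': res.render('view', {}); break;
        case 'json': res.json(data); break;
        default: res.status(406).send('Not Acceptable');
    }
});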

Checking the User-Agent header or the requested MIME type, as suggested, is not reliable, since any client can set User-Agent and other request headers at will.

The most reliable and secure approach is to check by IP.

Therefore I developed an NPM package that does exactly that. At startup it loads all known IP ranges used by Google bots and crawlers into memory, for very fast middleware processing.

const express = require('express')
const isGCrawler = require('express-is-googlecrawler')

const app = express()
app.use(isGCrawler)

app.get('/', (req, res) => {
  res.send(res.locals.isGoogleCrawler) // Boolean
})

app.listen(3000)
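
To illustrate the general idea of matching a request IP against ranges held in memory, here is a rough sketch. It is not the package's actual implementation, which isn't shown here. It assumes plain dotted-quad IPv4 input only, and all function names are hypothetical. The ranges are placeholders taken from the IPv4 documentation blocks, not real crawler ranges.

```javascript
// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipv4ToInt(ip) {
  return ip.split('.').reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

// Pre-parse a CIDR string such as "192.0.2.0/24" into { base, mask }.
function parseCidr(cidr) {
  const [address, bits] = cidr.split('/');
  const prefix = Number(bits);
  // /0 is special-cased because a shift by 32 wraps around in JavaScript.
  const mask = prefix === 0 ? 0 : (0xffffffff << (32 - prefix)) >>> 0;
  return { base: (ipv4ToInt(address) & mask) >>> 0, mask };
}

// Placeholder ranges (RFC 5737 documentation blocks), NOT real crawler
// ranges. Replace them with the ranges published for the crawler you
// want to recognize. Parsed once at startup.
const CRAWLER_RANGES = ['192.0.2.0/24', '198.51.100.0/24'].map(parseCidr);

// Returns true when the IPv4 address falls inside any configured range.
function isInCrawlerRange(ip) {
  const value = ipv4ToInt(ip);
  return CRAWLER_RANGES.some(({ base, mask }) => ((value & mask) >>> 0) === base);
}
```

In a middleware, `isInCrawlerRange(req.ip)` would produce the same kind of Boolean as above. Note that behind a reverse proxy `req.ip` only reflects the real client when Express's `trust proxy` setting is configured.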
