When using FormRequest.from_response, Scrapy sends all of the form's fields in the request. I can override the value of an input field by passing it in formdata, but I need to exclude a field entirely.

How can I exclude a field, or force Scrapy to skip a particular field in the request payload? Since I'm dealing with ASP.NET pages, I need from_response to handle the various ASP parameters.

asked Feb 7 at 20:02 by Alex Key

1 Answer

You can modify the request body in the process_request method of a downloader middleware, and enable that middleware in the spider's settings.

Spider:

import scrapy


class TestSpider(scrapy.Spider):
    name = 'test'
    custom_settings = {
        "DOWNLOADER_MIDDLEWARES": {
            'xy_spider.middlewares.TestDownloaderMiddleware': 200,
        },
    }

    def start_requests(self):
        yield scrapy.FormRequest(
            'https://postman-echo.com/post',
            formdata={'a': '1', 'b': '2'}  # two fields in formdata
        )

    def parse(self, response):
        print(response.text)

And middlewares.py:

from urllib.parse import parse_qsl, urlencode


class TestDownloaderMiddleware:
    def process_request(self, request, spider):
        # Parse the URL-encoded form body into (name, value) pairs.
        pairs = parse_qsl(request.body.decode('utf8'), keep_blank_values=True)
        kept = [(k, v) for k, v in pairs if k != 'b']  # here you can exclude the `b` field
        if len(kept) != len(pairs):
            # Returning a new request reschedules it; on the next pass `b` is
            # already gone, so this branch is skipped and the request proceeds.
            return request.replace(body=urlencode(kept))
        return None
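
The body rewrite inside process_request can be tried out without running a spider. Below is a minimal sketch using only the standard library; the helper name drop_form_fields and the sample __VIEWSTATE value are made up for illustration and are not part of Scrapy. It shows why the pairs should be re-encoded with urlencode (joined by `&`, values percent-encoded) rather than concatenated by hand, which matters for ASP.NET values that contain `/`, `+` or `=`.

```python
from typing import Iterable
from urllib.parse import parse_qsl, urlencode


def drop_form_fields(body: bytes, names: Iterable[str]) -> bytes:
    """Return a URL-encoded form body with the given field names removed.

    Hypothetical helper for illustration; it is not a Scrapy API.
    """
    excluded = set(names)
    # keep_blank_values=True preserves empty fields such as "a=".
    pairs = parse_qsl(body.decode("utf-8"), keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key not in excluded]
    # urlencode joins pairs with "&" and percent-encodes the values again.
    return urlencode(kept).encode("utf-8")


if __name__ == "__main__":
    # Sample body with a made-up, percent-encoded __VIEWSTATE value.
    original = b"__VIEWSTATE=%2FwEP%3D%3D&a=1&b=2"
    print(drop_form_fields(original, ["b"]))
```

In the middleware, the same logic would apply to request.body, with the result passed to request.replace(body=...).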
