Scrapy CrawlOnce Middleware: struct.error: unpack requires a buffer of 4 bytes on Job Restart
I'm using the CrawlOnce middleware (from the scrapy-crawl-once package) together with a persistent job directory (JOBDIR), so that jobs can be stopped and resumed.
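For context, the relevant configuration looks roughly like this (a sketch: the middleware priorities are the ones documented for scrapy-crawl-once, and the JOBDIR value is a placeholder):

# settings.py (sketch)
SPIDER_MIDDLEWARES = {
    "scrapy_crawl_once.CrawlOnceMiddleware": 100,
}
DOWNLOADER_MIDDLEWARES = {
    "scrapy_crawl_once.CrawlOnceMiddleware": 50,
}

# Persist scheduler state on disk so the job can be stopped and resumed.
JOBDIR = "crawls/my-job"

When I stop the job and later restart it with the same JOBDIR, I sometimes get this error: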
Traceback (most recent call last):
  File "/app/tasks/crawl.py", line 250, in crawl
    crawler_proc.start()
  File "/usr/local/lib/python3.9/site-packages/scrapy/crawler.py", line 346, in start
    reactor.run(installSignalHandlers=False) # blocking call
  File "/usr/local/lib/python3.9/site-packages/twisted/internet/base.py", line 1318, in run
    self.mainLoop()
  File "/usr/local/lib/python3.9/site-packages/twisted/internet/base.py", line 1328, in mainLoop
    reactorBaseSelf.runUntilCurrent()
--- <exception caught here> ---
  File "/usr/local/lib/python3.9/site-packages/twisted/internet/base.py", line 994, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/local/lib/python3.9/site-packages/scrapy/utils/reactor.py", line 51, in __call__
    return self._func(*self._a, **self._kw)
  File "/usr/local/lib/python3.9/site-packages/scrapy/core/engine.py", line 157, in _next_request
    self.crawl(request)
  File "/usr/local/lib/python3.9/site-packages/scrapy/core/engine.py", line 247, in crawl
    self._schedule_request(request, self.spider)
  File "/usr/local/lib/python3.9/site-packages/scrapy/core/engine.py", line 252, in _schedule_request
    if not self.slot.scheduler.enqueue_request(request): # type: ignore[union-attr]
  File "/usr/local/lib/python3.9/site-packages/scrapy/core/scheduler.py", line 241, in enqueue_request
    dqok = self._dqpush(request)
  File "/usr/local/lib/python3.9/site-packages/scrapy/core/scheduler.py", line 280, in _dqpush
    self.dqs.push(request)
  File "/usr/local/lib/python3.9/site-packages/scrapy/pqueues.py", line 89, in push
    self.queues[priority] = self.qfactory(priority)
  File "/usr/local/lib/python3.9/site-packages/scrapy/pqueues.py", line 76, in qfactory
    return create_instance(
  File "/usr/local/lib/python3.9/site-packages/scrapy/utils/misc.py", line 166, in create_instance
    instance = objcls.from_crawler(crawler, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/scrapy/squeues.py", line 68, in from_crawler
    return cls(crawler, key)
  File "/usr/local/lib/python3.9/site-packages/scrapy/squeues.py", line 64, in __init__
    super().__init__(key)
  File "/usr/local/lib/python3.9/site-packages/scrapy/squeues.py", line 23, in __init__
    super().__init__(path, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/queuelib/queue.py", line 208, in __init__
    (self.size,) = struct.unpack(self.SIZE_FORMAT, qsize)
struct.error: unpack requires a buffer of 4 bytes
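From the traceback, the failure happens while the scheduler reopens an on-disk queue. As far as I can tell from queuelib's source, LifoDiskQueue (the queue behind Scrapy's default PickleLifoDiskQueue) stores the number of queued requests as a 4-byte big-endian header at the start of each queue file, so a file whose header got truncated by an unclean shutdown makes struct.unpack fail on the next start. A small script I use to spot the damaged file (the JOBDIR path is a placeholder):

import os
import struct

queue_dir = "crawls/my-job/requests.queue"  # <JOBDIR>/requests.queue

for name in sorted(os.listdir(queue_dir)):
    path = os.path.join(queue_dir, name)
    # Skip metadata such as active.json; the per-priority queue files are plain binary.
    if not os.path.isfile(path) or name.endswith(".json"):
        continue
    with open(path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        # A truncated 4-byte size header is exactly what triggers the struct.error above.
        print(f"truncated queue file: {path}")
    else:
        (size,) = struct.unpack(">L", header)  # queuelib's LifoDiskQueue.SIZE_FORMAT
        print(f"{path}: {size} pending request(s)")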
Deleting the on-disk request queue fixes the error, but it creates another problem: if pagination exists in the request flow, stopping the spider at an intermediate paginated URL and then deleting the queue loses the pagination state. Since my crawler starts only from the base URLs listed in an input file, it loses track of the follow-up pagination requests that would have been generated dynamically.
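A minimal sketch of the pagination flow (spider name, selectors, and the input file are illustrative):

import scrapy

class ListingSpider(scrapy.Spider):
    name = "listing"

    def start_requests(self):
        # Only the base URLs are known up front; every paginated URL is
        # discovered at runtime while parsing a page.
        with open("input_urls.txt") as f:
            for line in f:
                url = line.strip()
                if url:
                    yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        for href in response.css("a.item::attr(href)").getall():
            yield response.follow(href, callback=self.parse_item)

        # The "next page" request exists only inside the scheduler queue once
        # it has been yielded; deleting the queue on restart severs this chain.
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)

    def parse_item(self, response):
        yield {"url": response.url, "title": response.css("title::text").get()}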