ExitCodeException                                                                                         _common.py:271
Traceback (most recent call last):
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\_exec\tesseract.py", line 313, in generate_hocr
    p = run(args_tesseract, stdout=PIPE, stderr=STDOUT, timeout=timeout, check=True)
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\subprocess\__init__.py", line 62, in run
    proc = subprocess_run(args, env=env, check=check, **kwargs)
  File "C:\<USER>\apps\python\current\Lib\subprocess.py", line 579, in run
    raise CalledProcessError(retcode, process.args,
                             output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['C:\\<USER>\\shims\\tesseract.EXE', '-l', 'eng',
'C:\\<USER>\\AppData\\Local\\Temp\\ocrmypdf.io.<RANDOM>\\000045_ocr.png',
'C:\\<USER>\\AppData\\Local\\Temp\\ocrmypdf.io.<RANDOM>\\000045_ocr_hocr', 'hocr', 'txt']' returned
non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\_pipelines\_common.py", line 261, in cli_exception_handler
    return fn(options, plugin_manager)
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\_pipelines\ocr.py", line 181, in _run_pipeline
    optimize_messages = exec_concurrent(context, executor)
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\_pipelines\ocr.py", line 117, in exec_concurrent
    executor(
    ~~~~~~~~^
        use_threads=options.use_threads,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<10 lines>...
        task_finished=update_page,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^ 
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\_concurrent.py", line 78, in __call__
    self._execute(
    ~~~~~~~~~~~~~^
        use_threads=use_threads,
        ^^^^^^^^^^^^^^^^^^^^^^^^
    ...<5 lines>...
        task_finished=task_finished,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\builtin_plugins\concurrency.py", line 144, in _execute
    result = future.result()
  File "C:\<USER>\apps\python\current\Lib\concurrent\futures\_base.py", line 449, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File "C:\<USER>\apps\python\current\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "C:\<USER>\apps\python\current\Lib\concurrent\futures\thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\_pipelines\ocr.py", line 81, in _exec_page_sync
    ocr_out, text_out = _image_to_ocr_text(page_context, ocr_image_out)
                        ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\_pipelines\ocr.py", line 62, in _image_to_ocr_text
    hocr_out, text_out = ocr_engine_hocr(ocr_image_out, page_context)
                         ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\_pipeline.py", line 678, in ocr_engine_hocr
    ocr_engine.generate_hocr(
    ~~~~~~~~~~~~~~~~~~~~~~~~^
        input_file=input_file,
        ^^^^^^^^^^^^^^^^^^^^^^
    ...<9 lines>...
        user_patterns=options.user_patterns,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\builtin_plugins\tesseract_ocr.py", line 268, in generate_hocr
    tesseract.generate_hocr(
    ~~~~~~~~~~~~~~~~~~~~~~~^
        input_file=input_file,
        ^^^^^^^^^^^^^^^^^^^^^^
    ...<9 lines>...
        options=options,
        ^^^^^^^^^^^^^^^^
    )
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\_exec\tesseract.py", line 327, in generate_hocr
    raise SubprocessOutputError() from e
ocrmypdf.exceptions.SubprocessOutputError

I get the above error when running `ocrmypdf --skip-text '.\input.pdf' output.pdf -v`. I installed OCRmyPDF with Scoop on Windows 11. The PDF was originally a DjVu file, which I converted to PostScript and then to PDF.

I followed this tutorial to install OCRmyPDF on Windows: .html

This has been a massive headache, and I haven't found a solution.
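
The traceback only says that Tesseract returned exit status 1; whatever Tesseract itself printed is not shown. One way to see it might be to reproduce the same call outside OCRmyPDF. This is a minimal sketch, assuming `tesseract` is on PATH and that a page image is available; the helper names `build_tesseract_command` and `run_and_capture` are made up for illustration and are not part of OCRmyPDF.

```python
import subprocess
from subprocess import PIPE, STDOUT


def build_tesseract_command(image, output_base, lang="eng", exe="tesseract"):
    """Build the same argument order that appears in the traceback.

    `exe` defaults to "tesseract" on PATH (an assumption); pass the full
    path to the executable if it lives elsewhere.
    """
    return [exe, "-l", lang, str(image), str(output_base), "hocr", "txt"]


def run_and_capture(cmd, timeout=180):
    """Run `cmd` and return (exit code, combined stdout/stderr text).

    check=True is deliberately left out, so a non-zero exit status does
    not raise and the tool's own message is still returned to the caller.
    """
    proc = subprocess.run(
        cmd, stdout=PIPE, stderr=STDOUT, text=True, timeout=timeout
    )
    return proc.returncode, proc.stdout
```

Calling `run_and_capture(build_tesseract_command("page.png", "page_out"))` with a real page image in place of the placeholder `page.png` returns the exit code together with Tesseract's message, which should say more than the bare `SubprocessOutputError`.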

Tags: python; Tesseract OCR Command in ocrmypdf Fails with 'SubprocessOutputError' on Windows; Stack Overflow