Error working with large image files (>5.8 GB) in Python

I am trying to anonymize whole slide imaging files, using Python code from Tomi Lilja (https://scribesroom.wordpress/2024/03/15/anonymizing-ndpi-slide-scans/), which I have modified slightly to aid in debugging (I added print statements, as reproduced in the code below, with the file locations replaced by *** to protect privacy). This program worked perfectly for 56 of my 59 files, ranging in size from 2,397,110 KB to 5,450,684 KB.

Unfortunately, I cannot get it to work for the three largest files - 5,820,441 KB, 5,881,189 KB, and 6,096,842 KB.



import os
import tifftools

source_dir = 'C:\\***\\'
target_dir = 'C:\\***\\'

for filename in os.listdir(source_dir):

    if filename.endswith('.ndpi'):
        print('line 10')

        sourcefile = os.path.join(source_dir, filename)
        print('line 12')

        temporaryfile = os.path.join(target_dir, filename.replace(".ndpi",".tmp"))
        print('line 14')

        targetfile = os.path.join(target_dir, filename)    
        print('line 16')

        slide = tifftools.read_tiff(sourcefile)
        print('line 18')

        # make sure the file is in NDPI format
        if slide['ifds'][0]['tags'][tifftools.Tag.NDPI_FORMAT_FLAG.value]['data'][0] == 1:
            # create Reference- and concat-lists for tifftools commands
            reference_array = []
            concat_array = []

            for x in range(len(slide['ifds'])):
                if slide['ifds'][x]['tags'][tifftools.Tag.NDPI_SOURCELENS.value]['data'][0] != -1.0:
                    concat_array += [sourcefile+","+str(x)]

            for x in range(len(slide['ifds'])-1):
                reference_array += ["NDPI_REFERENCE,"+str(x)]
            print('line 29')

            # remove the label image
            tifftools.tiff_concat(concat_array, temporaryfile)
            print('line 32')

            # remove the Reference tags
            tifftools.tiff_set(temporaryfile, targetfile, unset=reference_array)
            os.remove(temporaryfile)

            print('line36')

print("completed")

I get the following console output and error message (with the file location replaced by *** to protect privacy):

runfile('C:/***/NDPI_Anon.py', wdir='C:/***')
line 10
line 12
line 14
line 16
Traceback (most recent call last):

  File ~\anaconda3\Lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
    exec(code, globals, locals)

  File c:\***\ndpi_anon.py:17
    slide = tifftools.read_tiff(sourcefile)

  File ~\anaconda3\Lib\site-packages\tifftools\tifftools.py:115 in read_tiff
    nextifd = read_ifd(tiff, info, nextifd, info['ifds'])

  File ~\anaconda3\Lib\site-packages\tifftools\tifftools.py:211 in read_ifd
    read_ifd_tag_data(tiff, info, ifd, tagSet)

  File ~\anaconda3\Lib\site-packages\tifftools\tifftools.py:250 in read_ifd_tag_data
    taginfo['data'] = list(struct.unpack(

MemoryError

I put a few print statements in to see where it gets bogged down - it seems to be an issue with tifftools.read_tiff.

I'm running this in Spyder through Anaconda on a Windows 11 machine with 16 GB of RAM, if that matters.

Has anyone run into this issue before with large image files, or have suggestions on how I may be able to resolve this?

asked Mar 27 at 0:01 by user30073632
  • If you directly try just one of the largest files, does it still give the error? I'm guessing it will, but in case it works there might be something you can do with forcing garbage collection. – JonSG Commented Mar 27 at 1:03
  • I tried just running one large file at a time. It still didn't work. I restarted my computer and tried again - that didn't work either. – user30073632 Commented Mar 27 at 2:00
  • Conceptually, it seems possible to me to stream read and write the file and modify the metadata as you go, but I don't know the TIFF file layout at all, so I can't advise you there. – JonSG Commented Mar 27 at 14:20

1 Answer


This is simply a memory problem. I suggest you:

  • Get more RAM. Even 16 GB is clearly not enough here, as the MemoryError shows.

  • Resize the largest files down to the size of the 4th-largest file in the dataset (the largest one that still works). This should have the smallest possible impact on your workflow. A garbage-collection sketch is also shown below.
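
If more RAM is not an option, one lower-effort thing to try, building on the garbage-collection suggestion from the comments, is to release each parsed slide structure and force a collection pass before moving to the next file, so that at most one file's tag data is held in memory at a time. The sketch below is only illustrative: it reuses the tifftools calls from the question unchanged and only adds gc bookkeeping, and it may still fail if a single file's tag data by itself exceeds available memory, since read_tiff must still parse the whole IFD chain.

import gc
import os

import tifftools

source_dir = 'C:\\***\\'
target_dir = 'C:\\***\\'

for filename in os.listdir(source_dir):
    if not filename.endswith('.ndpi'):
        continue

    sourcefile = os.path.join(source_dir, filename)
    temporaryfile = os.path.join(target_dir, filename.replace('.ndpi', '.tmp'))
    targetfile = os.path.join(target_dir, filename)

    slide = tifftools.read_tiff(sourcefile)

    # Skip anything that is not in NDPI format.
    if slide['ifds'][0]['tags'][tifftools.Tag.NDPI_FORMAT_FLAG.value]['data'][0] != 1:
        del slide
        gc.collect()
        continue

    # Build the concat and reference lists, then immediately release the
    # parsed slide structure so its (potentially huge) tag data can be
    # reclaimed before anything else is read.
    concat_array = [
        sourcefile + ',' + str(x)
        for x in range(len(slide['ifds']))
        if slide['ifds'][x]['tags'][tifftools.Tag.NDPI_SOURCELENS.value]['data'][0] != -1.0
    ]
    reference_array = ['NDPI_REFERENCE,' + str(x)
                       for x in range(len(slide['ifds']) - 1)]
    del slide
    gc.collect()

    tifftools.tiff_concat(concat_array, temporaryfile)   # remove the label image
    tifftools.tiff_set(temporaryfile, targetfile, unset=reference_array)  # remove the Reference tags
    os.remove(temporaryfile)
    gc.collect()

print("completed")

Note that tiff_concat has to re-read the source file from disk anyway, so peak memory use per file is unlikely to drop much; if read_tiff alone fails on a given file, processing that file on a machine with more memory is probably the only reliable fix.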
