Memory is slowly leaking #2019

Closed
dmpetrov opened this issue Jul 8, 2016 · 38 comments
Labels
Bug Any unexpected behavior, until confirmed feature. Memory NumPy Platform A catchall for platform-related

Comments

@dmpetrov

dmpetrov commented Jul 8, 2016

Memory slowly leaks when processing many image files one by one.

Leak size:
100K images --> ~100MB
300K images --> ~200MB
1M images --> ~1GB

Code examples that reproduce the problem are in PyWavelets/pywt#180.

@hugovk
Member

hugovk commented Jul 8, 2016

@dmpetrov Could you please give a minimal code example that reproduces the problem using Pillow but no third-party libraries like numpy or pywt? Thanks!

@dmpetrov
Author

dmpetrov commented Jul 9, 2016

import os
from PIL import Image

root_dir = "/Volumes/Seagate/storage/kaggle/avito/images"
onlyfiles = []
# Images are spread over the numbered subdirectories 0..14
for d in range(15):
    filedir = os.path.join(root_dir, str(d))
    onlyfiles += [os.path.join(filedir, f) for f in os.listdir(filedir)
                  if os.path.isfile(os.path.join(filedir, f))]

s = 0
for i, fname in enumerate(onlyfiles):
    # Opened by file name and never explicitly closed
    image = Image.open(fname)
    # Do something
    s += image.size[0] * image.size[1]

    if i % 10000 == 0:
        print('iteration', i)

# Pause so that the process's memory usage can be inspected
input("Press Enter to continue...")

Input data: an image dataset (10M small images) from the Kaggle competition https://www.kaggle.com/c/avito-duplicate-ads-detection

Result: memory usage starts at 221MB, just for the list of file names.
500 images - 374MB
1M images - 510MB

Calling gc.collect() inside the loop did not help.

@homm homm added this to the 3.4.0 milestone Jul 11, 2016
@damaestro

Do we know which release this started in? I'd like to pin to an earlier release to mitigate any production impact without having to set up memory-scaling alarms/actions.

@homm
Member

homm commented Jul 14, 2016

@dmpetrov I can't reproduce this on Ubuntu 14.04 with Python 2.7 on Pillow master. Could you tell us your Pillow, Python, OS and libjpeg versions?

@damaestro This is not confirmed yet.

@homm
Member

homm commented Jul 14, 2016

@dmpetrov I've tried Python 2.7 and 3.4 under OS X. I can confirm the leak in 3.4.

@homm homm changed the title Memory is slowly leaking Memory is slowly leaking in Python 3.x Jul 14, 2016
@homm
Member

homm commented Jul 14, 2016

So, here is the minimal test case I came up with:

https://gist.github.com/homm/c6a79ce7b445f47a74a4d296742a5af8

As you can see, there is no Pillow at all! I'm pretty sure this is a pure Python 3 memory leak.

The script accepts the path to a root folder containing lots of files as an argument and iterates over all the files in the folder twice. The first time, it opens each file and calls its .close() method; at this stage, memory usage doesn't grow. The second time, the script doesn't call .close(), and memory usage increases steadily. I've tested Python 3.4 and 3.5 on OS X and Python 3.4 on Ubuntu 14.04. The leak doesn't appear on Python 2.7.
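The gist's code is not reproduced here. As a rough sketch based only on the description above (not the gist itself), a script of that shape could look like the following; the function name `iterate_files` and the command-line handling are assumptions for illustration.

```python
import os
import sys


def iterate_files(root, close):
    """Open every file under root in binary mode and return how many were opened.

    If close is True, each file is closed explicitly. Otherwise the file
    object is only dropped, and CPython finalizes it when its last
    reference goes away.
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            f = open(os.path.join(dirpath, name), 'rb')
            if close:
                f.close()
            count += 1
            # Without close(), the previous file object is released when
            # `f` is rebound on the next iteration (or when the function returns)
    return count


if __name__ == '__main__' and len(sys.argv) > 1:
    root = sys.argv[1]
    print('pass 1, explicit close():', iterate_files(root, close=True))
    print('pass 2, no close():', iterate_files(root, close=False))
```

To observe the effect described above, one would watch the process's memory (for example in top or Activity Monitor) during each pass on an affected Python version; the sketch itself does not measure memory.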

This is a memory leak inside Python itself, because the application holds no references to the file objects. Most likely it is only a memory leak: at least, lsof doesn't show any open files after the script ends. The leak appears only with different files; it doesn't happen if the same file is reopened again and again. The average memory consumption is 370 bytes per file.

As I understand it, we can't fix this at Pillow's level, because we assume that the file pointer may be shared with other code. There are two possibilities: it can be fixed in Python itself, or it can be fixed at the application level, with something like this:

from PIL import Image

f = open(filename, 'rb')
image = Image.open(f)
image.load()
f.close()

instead of calling Image.open(filename).

@homm
Member

homm commented Jul 14, 2016

I'd like to invite @asvetlov to the thread. Maybe he can clarify something.

@hugovk
Member

hugovk commented Jul 14, 2016

What happens when using with open(file.path, 'rb', 0) as f:?

@dmpetrov
Author

dmpetrov commented Jul 14, 2016

@homm Thank you for the investigation. It is becoming more and more interesting.

Yes, I use Python 3.5. As far as I know, Python does not guarantee that a file will be closed. So it might be a Python 3+ "feature", not a bug.

It would be great to have opinions of Python experts.

@homm
Member

homm commented Jul 14, 2016

Python does not guarantee that file will be closed.

Of course it does. Moreover, it guarantees that the .close() method itself will be called. It just doesn't guarantee when this will happen, that is all.
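The claim that .close() itself is called can be checked with a small sketch. `TrackedFile` below is a hypothetical io.FileIO subclass that records calls to close(); when such an object is finalized without having been closed, CPython's io finalizer calls its close() method. The "immediately when the last reference disappears" timing is specific to CPython's reference counting; other implementations such as PyPy may defer it.

```python
import io


class TrackedFile(io.FileIO):
    """Hypothetical FileIO subclass that records which files got close()d."""

    closed_names = []

    def close(self):
        # Record only the first, effective close, then release the descriptor
        if not self.closed:
            TrackedFile.closed_names.append(self.name)
        super().close()


def open_and_forget(path):
    f = TrackedFile(path, 'rb')
    # No explicit close(): when this function returns, the last reference
    # to `f` disappears and CPython's finalizer calls f.close()
    return f.name
```

After `open_and_forget(path)` returns, `path` appears in `TrackedFile.closed_names` even though no code called close() explicitly.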

But as I said, we are not talking about a file descriptor leak. All files are closed. The problem is a memory leak.

@damaestro

I'm seeing the issue with Py 2.7 running thumbor, so I don't think this is isolated to Py 3.x.

[Screenshot from 2016-07-14 15-22-59]

@homm
Member

homm commented Jul 14, 2016

@damaestro

so I don't think this is isolated to Py 3.x.

Is there any reason to think this is the same leak? Memory leaks are not specific to Python 3, to Python, or to any other platform or library. In this thread, we are discussing a leak in Python 3 that affects Pillow users. You can report thumbor leaks in the appropriate place.

@homm
Member

homm commented Jul 14, 2016

@hugovk The with statement frees all memory correctly, just as .close() does.

@asvetlov

Sorry, guys.
I really don't know of any difference between f.close()/with f and del f, except that deleting a non-closed file should raise a warning in addition to actually closing the file.

@asvetlov

But explicit resource cleanup (either via a with statement or a .close() call) is considered good practice.

@homm
Member

homm commented Jul 15, 2016

deleting of non-closed file should raise a warning

I haven't come across this in the docs. Could you elaborate?

I'll try to simplify the test and file a report on the Python bug tracker over the holidays.

Also, I'm sure that we can at least explicitly close files that were opened by Image.open from a filename. This should fix the bug on our side for most users.
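The ownership rule behind this proposal can be illustrated with the standard library alone: a reader that accepts either a filename or an already-open file object closes the file only when it opened the file itself, and leaves a caller-supplied file object open. `read_header` is a hypothetical helper for illustration, not Pillow's code.

```python
import os


def read_header(fp_or_path, size=16):
    """Return the first `size` bytes from a path or a binary file object."""
    # A str, bytes or os.PathLike argument means we open (and own) the file
    owns_file = isinstance(fp_or_path, (str, bytes, os.PathLike))
    f = open(fp_or_path, 'rb') if owns_file else fp_or_path
    try:
        return f.read(size)
    finally:
        # Close deterministically only what we opened ourselves; a file
        # object passed in by the caller may still be needed elsewhere
        if owns_file:
            f.close()
```

With a path, no file object outlives the call; with a file object, closing stays the caller's responsibility, which is the constraint mentioned earlier about shared file pointers.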

@homm
Member

homm commented Jul 15, 2016

It looks like this is exactly a warnings issue. If I add f._dealloc_warn(f) right before the close, memory leaks as well.

The memory is allocated to store every warning that has already occurred, so that it isn't shown a second time. A horrible decision, in my opinion: unclosed files might lead to memory leaks, while storing a warning message for every file is a guaranteed memory leak.
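Why distinct files matter can be seen from the warning text itself. The minimal sketch below (CPython-specific, since it relies on the unclosed file being finalized as soon as `del` drops the last reference) records the ResourceWarning emitted for each unclosed file. The message embeds the file's name, so every distinct file yields a distinct message; per the explanation above, that is why a registry of already-seen warnings grows per file on affected versions, while reopening the same file does not. The helper name `unclosed_warnings` is hypothetical, and the sketch shows only the messages, not the leak itself.

```python
import warnings


def unclosed_warnings(paths):
    """Open each path, drop it without closing, and collect ResourceWarnings."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning instead of using the default filters,
        # which ignore ResourceWarning
        warnings.simplefilter('always')
        for path in paths:
            f = open(path, 'rb')
            del f  # CPython finalizes the unclosed file right here
    return [str(w.message) for w in caught
            if issubclass(w.category, ResourceWarning)]
```

Each returned message has the form "unclosed file <_io.BufferedReader name='...'>", with the path filled in.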

@damaestro

@homm no reason. I'll continue to follow this discussion and if I find something indicating it's an issue with Pillow I'll speak up. Thanks.

@homm
Member

homm commented Jul 17, 2016

The issue in the Python tracker: http://bugs.python.org/issue27535

@homm
Member

homm commented Oct 3, 2016

Sorry, moving this to 3.5.0 because of a lack of time. The workaround is to use file objects instead of file names:

from PIL import Image

with open(filename, 'rb') as f:
    image = Image.open(f)
    # We need to consume the whole file inside the `with` statement
    image.load()
    # ... work with the image here ...
# Drop the reference here to make sure that nothing happens
# to the image after the file is closed
image = None

@homm homm modified the milestones: 3.5.0, 3.4.0 Oct 3, 2016
@aclark4life aclark4life modified the milestones: 4.1.0, 4.0.0 Jan 5, 2017
@radarhere
Member

Is this now fixed on master?

@radarhere
Member

If anyone in this issue has thoughts on #3577, that could be helpful.

@hugovk
Member

hugovk commented Mar 19, 2019

The issue in the Python tracker: http://bugs.python.org/issue27535

This has been fixed in Python 3.7.

@aclark4life aclark4life added Memory Platform A catchall for platform-related labels May 11, 2019
@radarhere
Member

Note that implicitly closing an image's file has now been deprecated - https://pillow.readthedocs.io/en/stable/releasenotes/6.1.0.html#deprecations

@radarhere radarhere changed the title Memory is slowly leaking in Python 3.x Memory is slowly leaking Oct 1, 2019
@radarhere
Member

Support has now been removed for implicitly closing an image's file - #3577

karolyi added a commit to karolyi/forum-django that referenced this issue May 3, 2020
@SaschaHeyer

It's hard to follow this issue. Can someone explain what the suggested solution is?

@radarhere
Member

@SaschaHeyer I think the problem discussed here is likely resolved. If you have a specific situation, I'd recommend that you open a new issue with a self-contained example.

@SaschaHeyer

@radarhere
Thank you for your quick response.

I wanted to know how the issue is resolved.
What steps are required to solve it?

@radarhere
Member

If you are using the latest version of Pillow, then make sure that you close images properly, either by explicitly calling close():

im = Image.new("RGB", (100, 100))
im.close()

or by using a context manager:

with Image.open("hopper.jpg") as im:
    pass
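Applied to the kind of batch loop from the original report, the context-manager form keeps at most one image file open at a time. A minimal sketch, assuming a reasonably recent Pillow and a list of image paths; `total_pixels` is a hypothetical helper name:

```python
from PIL import Image


def total_pixels(paths):
    """Sum width * height over many image files, closing each one promptly."""
    total = 0
    for path in paths:
        # Image.open only reads the header here; leaving the with block
        # closes the underlying file before the next path is opened
        with Image.open(path) as im:
            width, height = im.size
            total += width * height
    return total
```

Reading only `im.size` avoids decoding pixel data; code that needs the pixels should do that work inside the with block.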

@hugovk
Member

hugovk commented Aug 30, 2020

Let's close this, and we can re-open if needed, or open a new issue.

@radarhere
Member

@radarhere
Thank you for your quick response.

I wanted to know how the issue is resolved.
What steps are required to solve?

To clarify, I meant that I suspect it is resolved in the current version of Pillow, fixed by changes to our code, not to yours.

@Mvbbb

Mvbbb commented Apr 16, 2024

Was this problem fixed in Python 2.7 with Pillow 6.2.2?

@radarhere
Member

It's not completely clear in this issue what the problem was, or if/how it was fixed. Here are some notes though

@Mvbbb

Mvbbb commented Apr 16, 2024

It's not completely clear in this issue what the problem was, or if/how it was fixed. Here are some notes though

Our service encountered memory leaks and continuous memory growth when using Pillow to extract frames from WebP animations. Can you identify any potential issues with the following code? Many thanks!

  • python: 2.7.9
  • pillow: 6.2.2
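import gc
import logging
from io import BytesIO

from PIL import Image

logger = logging.getLogger(__name__)


def extract_gif_use_pillow(image):
    """Re-encode every frame of an animated image (raw bytes) as WEBP bytes."""
    frames = []
    try:
        # BytesIO accepts the raw bytes on both Python 2.7 (str) and 3 (bytes)
        with Image.open(BytesIO(image)) as im:
            with BytesIO() as frame:
                while True:
                    try:
                        # Rewind and truncate the reused buffer so a shorter frame
                        # does not keep trailing bytes from a previous, longer one
                        frame.seek(0)
                        frame.truncate()
                        im.save(frame, format='WEBP', quality=100)
                        frames.append(frame.getvalue())
                        # move to next frame
                        im.seek(im.tell() + 1)
                    except EOFError:
                        break
    except Exception as e:
        logger.error('extract_gif_use_pillow got exception: %s', e)
    gc.collect()
    logger.info('extract_gif_use_pillow got frames num: %s', len(frames))
    return frames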

@radarhere
Member

I see no obvious problems, but I do recommend that you use a version of Python that is still maintained, and a more recent version of Pillow. Pillow has also made significant improvements to how it reads GIF images.

You may like to have a read of #7935 (comment). It is possible that is the explanation for your memory issues.
