Memory is slowly leaking #2019
Comments
@dmpetrov Please can you give a minimal code example reproducing the problem that uses Pillow but no third-party libraries like numpy or pywt? Thanks! |
import os
import PIL
from PIL import Image

dir = "/Volumes/Seagate/storage/kaggle/avito/images"
onlyfiles = []
dirs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
for d in dirs:
    filedir = os.path.join(dir, str(d))
    files = [f for f in os.listdir(filedir) if os.path.isfile(os.path.join(filedir, f))]
    filepaths = map( lambda x: os.path.join(filedir, x), files )
    onlyfiles += list(filepaths)
s = 0
for i in range(len(onlyfiles)):
    fname = onlyfiles[i]
    image = PIL.Image.open(fname)
    # Do something
    s += image.size[0]*image.size[1]
    if i % 10000 == 0:
        print('iteration ', i)
        input("Press Enter to continue...")
Input data: image dataset (10M small images) from Kaggle competition https://www.kaggle.com/c/avito-duplicate-ads-detection
Result: It starts with 221MB for the file names. gc() inside the loop did not help. |
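For anyone who wants numbers instead of watching the process size by hand, here is a minimal sketch of sampling Python-level allocations during such a loop with the standard `tracemalloc` module. The helper name `measure_growth` and its parameters are hypothetical, not part of the report above, and `tracemalloc` only sees memory allocated through Python's allocators, so treat the figures as indicative.

```python
import tracemalloc


def measure_growth(step, iterations, report_every):
    """Call step(i) repeatedly and sample traced memory along the way.

    Returns a list of (iteration, traced_bytes) tuples. `step` is any
    callable taking the iteration index, e.g. one that opens an image.
    """
    samples = []
    tracemalloc.start()
    try:
        for i in range(iterations):
            step(i)
            if i % report_every == 0:
                current, _peak = tracemalloc.get_traced_memory()
                samples.append((i, current))
    finally:
        tracemalloc.stop()
    return samples


if __name__ == "__main__":
    kept = []  # deliberately retained objects, to simulate a leak

    def leaky_step(i):
        kept.append(bytes(1000))

    for iteration, traced in measure_growth(leaky_step, 1000, 100):
        print("iteration", iteration, "traced bytes", traced)
```

If the traced size keeps climbing while the loop holds no references to earlier results, something outside the loop body is retaining memory.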
Do we know what release this started in? I'd like to pin to a lower release to mitigate any production impact without having to set memory scaling alarm/actions. |
@dmpetrov I can't reproduce this on Ubuntu 14.04 with Python 2.7 on Pillow master. Could you tell us your Pillow, Python, OS, and libjpeg versions? @damaestro This is not confirmed yet. |
@dmpetrov I've tried Python 2.7 and 3.4 under OS X. I can confirm the leak in 3.4. |
So, this is the minimal test case I came up with: https://gist.github.com/homm/c6a79ce7b445f47a74a4d296742a5af8 As you can see, there is no Pillow at all! I'm pretty sure this is a pure Python 3 memory leak.
The script accepts the path to a root folder with tons of files as an argument and iterates over all files in the folder twice: First time it opens the files and calls
This is Python's memory leak because there are no pointers to the file objects from the application space. This is most likely only a memory leak. At least
As I understand it, we can't fix it at Pillow's level because we are assuming that the file pointer may be shared with other code. There are two possibilities: It can be fixed in Python itself. Or it can be fixed at the application level. Something like this:
f = open(filename, 'rb')
image = Image.open(f)
image.load()
f.close()
instead of |
I want to invite @asvetlov to the thread. Maybe he could clarify something. |
What happens when using |
@homm thank you for the investigation. It is becoming more and more interesting. Yes, I use Python 3.5. As far as I know, Python does not guarantee that a file will be closed. So, it might be a Python 3+ "feature", not a bug. It would be great to have the opinions of Python experts. |
Of course it guarantees. Moreover, it guarantees that exactly But as I said we are not speaking about the file descriptors leak. All files are closed. The problem is there is a memory leak. |
Any reason why you think this is the same leak? Memory leaks are not isolated to Python 3 or Python or any other platform or library. In this thread, we are discussing a leak in Python 3 which affects Pillow users. You can report thumbor leaks in the appropriate place. |
@hugovk |
Guys, sorry. |
But explicit resource cleanup (either via |
I haven't seen this in the docs. Could you elaborate? I'll try to simplify the test and file a report in the Python bug tracker over the holidays. Also, I'm sure that we can at least explicitly close files which were opened in |
Looks like this is exactly a warnings issue. If I'm adding The memory is allocated to store all warnings that have already occurred, to prevent them from being shown a second time. A horrible decision, in my opinion. Unclosed files might lead to memory leaks, while storing a warning message for every file is an absolutely guaranteed memory leak. |
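To make the mechanism described above concrete, here is a small stdlib-only sketch. It is not a reproduction of the original leak (which concerned older Python 3 versions); it only shows, on a current Python and under the `"default"` filter action, that every distinct warning message leaves an entry in the warning registry. The function name `registry_growth`, the file paths, and `example.py` are hypothetical.

```python
import warnings


def registry_growth(n):
    """Emit n ResourceWarnings with distinct messages and return how many
    per-message entries end up in the warning registry."""
    registry = {}  # stands in for a module's __warningregistry__
    with warnings.catch_warnings(record=True):
        # "default" shows each unique (message, category, line) once,
        # which requires remembering every key seen so far.
        warnings.simplefilter("default")
        for i in range(n):
            warnings.warn_explicit(
                message="unclosed file /hypothetical/path/%d.jpg" % i,
                category=ResourceWarning,
                filename="example.py",  # hypothetical source location
                lineno=1,
                registry=registry,
            )
    # Skip the bookkeeping "version" entry; count only warning keys.
    return sum(1 for key in registry if isinstance(key, tuple))


if __name__ == "__main__":
    print(registry_growth(1000))
```

Because the message text contains the file name, a loop over millions of distinct files produces millions of distinct keys, which matches the slow, steady growth reported in this thread.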
@homm no reason. I'll continue to follow this discussion and if I find something indicating it's an issue with Pillow I'll speak up. Thanks. |
The issue in the Python tracker: http://bugs.python.org/issue27535 |
Sorry, moving to 3.5.0 because of lack of time. The workaround for this is using file objects instead of file names:
with open(filename, 'rb') as f:
    image = Image.open(f)
    # We need to consume the whole file inside the `with` statement
    image.load()
    . . .
# Unref here to be sure that nothing happens
# with the image after the file is closed
image = None |
Is this now fixed on master? |
If anyone in this issue has thoughts on #3577, that could be helpful. |
This has been fixed in Python 3.7. |
Note that implicitly closing an image's file has now been deprecated - https://pillow.readthedocs.io/en/stable/releasenotes/6.1.0.html#deprecations |
Support has now been removed for implicitly closing an image's file - #3577 |
This reverts commit 4c46504.
It's hard to follow this issue. Can someone explain what the suggested solution is? |
@SaschaHeyer I think the problem discussed here is likely resolved. If you have a situation, I'd recommend that you open a new issue with a self-contained example |
@radarhere I wanted to know how the issue is resolved. |
If you are using the latest version of Pillow, then make sure that you close images properly. Either by explicitly calling
im = Image.new("RGB", (100, 100))
im.close()
or using a context manager
with Image.open("hopper.jpg") as im:
    pass |
Let's close this, and we can re-open if needed, or open a new issue. |
To clarify, I meant that I suspect it is resolved in the current version of Pillow, fixed by changes to our code, not to yours. |
Has this problem been fixed for Python 2.7 with Pillow 6.2.2? |
It's not completely clear in this issue what the problem was, or if/how it was fixed. Here are some notes though
|
Our service encountered some memory leaks and continuous memory growth when using Pillow to extract frames from webp animations. Can you identify any potential issues with the following code? Deeply grateful!
def extract_gif_use_pillow(image):
    frames = []
    try:
        with Image.open(StringIO(image)) as im:
            with BytesIO() as frame:
                while True:
                    try:
                        frame.seek(0)
                        im.save(frame, format='WEBP', quality=100)
                        frames.append(frame.getvalue())
                        # move to next frame
                        im.seek(im.tell() + 1)
                    except EOFError:
                        break
    except Exception as e:
        logger.error('extract_gif_use_pillow got exception: %s', e)
    gc.collect()
    logger.info('extract_gif_use_pillow got frames num: %s', len(frames))
    return frames |
I see no obvious problems, but I do recommend that you use a version of Python that is still maintained, and a more recent version of Pillow. Pillow has also made significant improvements to how it reads GIF images. You may like to have a read of #7935 (comment). It is possible that is the explanation for your memory issues. |
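One detail in the snippet above that may be worth checking, separate from memory use: the `BytesIO` buffer is reused with `frame.seek(0)` but never truncated. Assuming `im.save` writes from the buffer's current position, a frame that encodes to fewer bytes than the previous one would keep the old tail in `frame.getvalue()`. The stdlib-only sketch below shows the `BytesIO` behaviour with plain byte strings; the function names are hypothetical.

```python
from io import BytesIO


def write_reusing_seek_only(payloads):
    """Reuse one buffer with seek(0) only, as in the snippet above."""
    results = []
    with BytesIO() as buf:
        for data in payloads:
            buf.seek(0)
            buf.write(data)
            results.append(buf.getvalue())
    return results


def write_reusing_with_truncate(payloads):
    """Same loop, but discard earlier contents before each write."""
    results = []
    with BytesIO() as buf:
        for data in payloads:
            buf.seek(0)
            buf.truncate()  # drop bytes left over from the previous write
            buf.write(data)
            results.append(buf.getvalue())
    return results
```

Creating a fresh `BytesIO()` per frame would avoid the question entirely, at the cost of one small allocation per frame.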
Memory is slowly leaking when processing many image files one by one.
Leak size:
100K images --> ~100MB
300K images --> ~200MB
1M images --> ~1GB
Here you can find reproducing code examples: PyWavelets/pywt#180