-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added new check B910 to suggest using Counter() instead of defaultdict(int) #489
Added new check B910 to suggest using Counter() instead of defaultdict(int) #489
Conversation
README.rst
Outdated
@@ -273,6 +273,8 @@ on the first line and urls or paths that are on their own line:: | |||
"https://some-super-long-domain-name.com/with/some/very/long/paths" | |||
) | |||
|
|||
**B910**: Use Counter() instead of defaultdict(int) to avoid memory leaks. Using the latter can result in using as much as ~1000 MB of memory even for small and simple dicts. Refactoring the code to use the former can reduce memory usage from mentioned ~1000 MB to flat ~20 MB. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this number come from? I would expect it depends on the size of the data; i.e., you can construct a Counter that is a 1000 MB and also a defaultdict that is 20 MB.
It appears the difference is that when you access a key that does not exist, Counter simply returns 0, but defaultdict sets the value to 0 in the dict as well. How much that affects your memory usage obviously depends on how often you access keys that don't exist. We should mention the pattern that leads to the difference, not the exact number, because the exact number is going to depend on your use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the numbers were based on the test case I provided in #323, and yes they will depend on the number of missing keys accessed.
How about: "Use Counter() instead of defaultdict(int) to avoid excessive memory use as the default dict will record missing keys with the default value when accessed."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 - Thanks all for discussing!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I agree that it wasn't the best decision to mention any numbers there. I changed that.
bugbear.py
Outdated
@@ -2380,6 +2391,9 @@ def visit_Lambda(self, node) -> None: | |||
"B909 editing a loop's mutable iterable often leads to unexpected results/bugs" | |||
) | |||
) | |||
B910 = Error( | |||
message="B910 use Counter() instead of defaultdict(int) to avoid memory leaks" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd change "memory leaks" to "memory usage", although that is perhaps a matter of perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Fixed.
@@ -273,6 +273,8 @@ on the first line and urls or paths that are on their own line:: | |||
"https://some-super-long-domain-name.com/with/some/very/long/paths" | |||
) | |||
|
|||
**B910**: Use Counter() instead of defaultdict(int) to avoid excessive memory use as the default dict will record missing keys with the default value when accessed." | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like an unwanted trailing double quote?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes 😕
Fixed
…) instead of defaultdict(int)
3caa1b9
to
a5db163
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for eventually implementing it with tests. Appreciate it.
Implementing #323.
I'm not sure if the number B910 is OK, but it's just the next available number (not counting B950).