I Tried to Reduce Pylint Memory Usage (rtpg.co)
249 points by zdw on Oct 12, 2020 | 32 comments


Pylint is a great tool, but it could definitely be faster. One school of thought says: of course it's too slow, it's written in Python, if you want it faster you should rewrite it in C. But I don't think such a drastic and destabilizing change is necessary. There are critical sites in the code where a slightly wrong move is made and it gets magnified into something larger. This post points out one such site. The 80/20 rule says that fixing those errors will give most of the benefit of a full rewrite with way less effort.

A few years ago I got fed up with how slow Pylint was, so I made some changes to improve its performance. The main issue was that the tree traversal code was written generically. It was nice and elegant in terms of readability, but it meant that there was a lot of unnecessary runtime type-checking. It also meant that a lot of work was getting done for no reason. For example, if you are trying to apply lint rules for assign statements, you only want to check places where assign statements can legally occur. But Pylint was checking every single node for assign statements, including places where they cannot occur, such as inside function calls. Breaking up that generic logic into specialized instances had an enormous impact on performance.

Here are the PRs that implemented these changes:

  https://github.com/PyCQA/astroid/pull/497
  https://github.com/PyCQA/astroid/pull/519
  https://github.com/PyCQA/astroid/pull/552
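
Roughly, the shape of the change was something like this (a made-up sketch using the stdlib ast module, not the actual astroid/pylint code):

  import ast

  class GenericChecker(ast.NodeVisitor):
      """Generic traversal: every rule runs on every node, so each rule
      pays an isinstance() check on nodes it can never match."""
      def generic_visit(self, node):
          for rule in (self.check_assign, self.check_call):
              rule(node)
          super().generic_visit(node)

      def check_assign(self, node):
          if isinstance(node, ast.Assign):
              pass  # assign-statement rules would run here

      def check_call(self, node):
          if isinstance(node, ast.Call):
              pass  # call rules would run here

  class SpecializedChecker(ast.NodeVisitor):
      """Specialized traversal: the visitor dispatches by node type,
      so assign rules only ever see Assign nodes."""
      def visit_Assign(self, node):
          # assign-statement rules, no isinstance() needed
          self.generic_visit(node)

      def visit_Call(self, node):
          # call rules
          self.generic_visit(node)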


Smaller memory footprints can make code faster, but making code faster often creates a bigger memory footprint.

For instance, pipelining an operation so that multiple CPUs can work on a problem at once reduces the pressure on the tallest tent pole, whether that's an IO bottleneck or one slow phase. Total time is still dictated by the tallest tent pole, but no longer also by the fourth through tenth tallest poles.

But now everything is happening at once, which means all of the temporary data structures exist in memory in parallel, instead of sequentially.
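
As a toy illustration (the stages and sizes are invented, not from the article), overlapping the IO-bound stage hides latency but keeps every intermediate buffer alive at once:

  import time
  from concurrent.futures import ThreadPoolExecutor

  def fetch(i):            # stage 1: IO-bound (simulated with a sleep)
      time.sleep(0.1)
      return bytearray(10_000_000)   # large intermediate buffer

  def analyze(blob):       # stage 2: CPU-bound
      return len(blob)

  items = range(8)

  # Sequential: only one intermediate buffer exists at any moment.
  results_seq = [analyze(fetch(i)) for i in items]

  # Pipelined: the IO waits overlap, so wall-clock time drops, but all
  # eight buffers are resident in memory at the same time.
  with ThreadPoolExecutor(max_workers=8) as pool:
      blobs = list(pool.map(fetch, items))
      results_par = [analyze(b) for b in blobs]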

Heavily paraphrasing someone else: The simple things get taken care of early. If we're standing around talking about problems in a successful tool, they're complicated. Although I disagree with that person on one point: we can also be talking about problems that were complicated to diagnose (but straightforward to fix).


> if you want it faster you should rewrite it in C. But I don't think such a drastic and destabilizing change is necessary.

I've worked on a Python linter, and it is written in C. The bits in Python (especially our custom rules) were horribly slow.

A linter builds a tree of the code and walks it recursively. Walking a large tree in pure Python is horribly slow: the interpreter has to re-execute every line and re-resolve every variable on every visit, because they could change at any time. The overhead is insane.

It's the one thing Python really can't do well: walking a parse tree. If Python had a JIT it might be doable, but currently it isn't.
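
To make that concrete, here's roughly what a pure-Python walk looks like (a minimal sketch with the stdlib ast module, not pylint's actual code); every call, attribute lookup, and isinstance() below goes through the interpreter for every node of every file:

  import ast

  def walk(node, checks):
      for check in checks:
          check(node)
      for child in ast.iter_child_nodes(node):
          walk(child, checks)

  def no_bare_except(node):
      # flag `except:` without an exception type
      if isinstance(node, ast.ExceptHandler) and node.type is None:
          print(f"bare except at line {node.lineno}")

  with open(__file__) as f:
      tree = ast.parse(f.read())
  walk(tree, [no_bare_except])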


black (the python auto formatter) uses mypyc

https://github.com/python/mypy/tree/master/mypyc


That looks exciting!

Compiles mypy-annotated Python to a Python C extension. Claims a ~4x speedup, but still quite unstable/buggy, so who knows how much performance they'll have to give up for working software. Fingers crossed!
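
In case anyone is curious what that looks like in practice, it's roughly this (a minimal example, not black's code):

  # fib.py -- ordinary type-annotated Python; mypyc compiles it to a
  # C extension module with no source changes.
  def fib(n: int) -> int:
      a, b = 0, 1
      for _ in range(n):
          a, b = b, a + b
      return a

mypyc ships with mypy, so something like `mypyc fib.py` builds the extension in place, and a plain `import fib` then picks up the compiled module instead of the .py file.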


Try PyPy.


Now that PyPy is up to Python 3.6 and has the bugs from the 3.5 series fixed, I am using it for real work.

The contextvars polyfill brings in the one thing I really need from Py 3.7.

I have a toolkit that puffs up XML, JSON, whatever files into an RDF graph with extra blank nodes that let you annotate anything. A year ago I was complaining about how slow it was; with PyPy it is 5 times faster and I am not complaining.

I hear people get similar speed-ups for branchy Monte Carlo simulations too.


I recently went through a similar exercise using the same tools on a large open source Python codebase. The solution is the same as the author found: don't keep Exception objects around past the actual lifetime of the exception.

An Exception has the traceback, and the traceback has all the frames of the call stack, and each frame has a reference to the local variables in that frame.

Keeping an Exception around past its time can yield a huge tangle of circular references that puts a lot of pressure on the GC.

I reduced memory utilization dramatically by deleting 1 line of code.
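
In sketch form (the names are invented, not the code I actually changed), the before/after looks like this:

  failures = []

  def load_lots_of_data(item):
      return bytearray(50_000_000)  # stand-in for a large working buffer

  def process(item):
      big_buffer = load_lots_of_data(item)
      try:
          raise ValueError(f"bad item {item}")  # stand-in for a real failure
      except ValueError as exc:
          # exc.__traceback__ -> this frame -> big_buffer stays reachable
          failures.append(exc)

  def process_fixed(item):
      big_buffer = load_lots_of_data(item)
      try:
          raise ValueError(f"bad item {item}")
      except ValueError as exc:
          # keep plain data instead; the frame (and big_buffer) can be freed
          failures.append((item, repr(exc)))

  process("a")
  process_fixed("b")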


Does it do the right thing if you unset the traceback portion of the Exception object?


Sure. However, instead of reaching into objects to prune them, the better approach is to ask why you're keeping the object around at all. Extracting just the parts you want into your own structure is usually the better way to go.


A fair point. The scenario I'm thinking of is with APIs like asyncio.gather, which can be asked to return exception objects rather than re-raise them, for coroutines which fail/raise. This isn't in and of itself a case of long-lived exception objects, but I could see how this approach might lead some users to treat the exception as a long-lived results object, perform type checks against it, etc.

If you were in the position of having an app that was set up like this, it would be nice to know you had the option of trimming the Exception's footprint while still allowing it to retain its Exception-ness.
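
Something like this, assuming an asyncio.gather setup (the coroutine is made up): you keep the Exception object around for isinstance checks, but drop its traceback so it stops pinning the frames and their locals.

  import asyncio

  async def fetch(i: int) -> int:
      if i == 2:
          raise ValueError(f"item {i} failed")
      return i * 10

  async def main() -> None:
      results = await asyncio.gather(*(fetch(i) for i in range(4)),
                                     return_exceptions=True)
      for r in results:
          if isinstance(r, Exception):
              r.__traceback__ = None  # keep the Exception-ness, shed the frames
      print(results)

  asyncio.run(main())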


[flagged]


This impacts memory (RAM) usage and has nothing to do with storage (SSD).


I did a similar exercise on a Python program a few years ago now.

In the end I added __slots__ to two classes and the memory usage shrank by 2/3 (there were a lot of those two objects).
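
For anyone who hasn't seen it, the change is roughly this (invented class, not the real one):

  import sys

  class PointDict:           # default: every instance drags a __dict__ around
      def __init__(self, x, y):
          self.x, self.y = x, y

  class PointSlots:          # __slots__: fixed storage, no per-instance __dict__
      __slots__ = ("x", "y")
      def __init__(self, x, y):
          self.x, self.y = x, y

  a, b = PointDict(1, 2), PointSlots(1, 2)
  print(sys.getsizeof(a) + sys.getsizeof(a.__dict__))  # object + its dict
  print(sys.getsizeof(b))                              # slots: smaller, no dict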

I also managed to double the speed by re-writing (re-wording really) a bit of string handling which just happened to be the bottleneck. Can't remember the details now but it was a trivial code change.

Profiling for the win!


I would assume (but haven't tested) that if you have a small class with a lot of instances, you could make it even more efficient by doing the "Array of Structures" to "Structure of Arrays" transformation[0]. With a structure of arrays, you could use numpy and avoid object overhead for every integer or float or bool or whatever it is you are storing.

https://en.wikipedia.org/wiki/AoS_and_SoA
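
Roughly the idea (untested against any real workload, and the field names are invented):

  import numpy as np

  N = 1_000_000

  # "Array of Structures": a million Python objects, each holding boxed floats
  class Particle:
      __slots__ = ("x", "y", "mass")
      def __init__(self, x, y, mass):
          self.x, self.y, self.mass = x, y, mass

  aos = [Particle(float(i), float(i), 1.0) for i in range(N)]

  # "Structure of Arrays": one contiguous float64 array per field,
  # 8 bytes per value and no per-object overhead
  soa_x = np.arange(N, dtype=np.float64)
  soa_y = np.arange(N, dtype=np.float64)
  soa_mass = np.ones(N, dtype=np.float64)

  # Bulk operations also vectorize instead of looping over objects
  center_of_mass_x = (soa_mass * soa_x).sum() / soa_mass.sum()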


This is a great example of the sort of meandering process one might take while examining a performance defect in a large and not entirely familiar code base. You don't always hit the exact right cause immediately, and it's not realistic to assume you will do so without some digging and false starts. Kudos to the author for explaining their work, including the things that didn't pan out.


That's a very important insight: in many real applications, memory is the limiting factor for performance.


It's awesome that the resources of the cloud are less limited.

We can work at a higher level.

However, the old salts who had to care about every byte are rightly horrified at the lack of frugality today.


The cloud is, if anything, where memory hurts more, because it's less shareable than CPU time and so renting it costs more. One of the big reasons I've seen cited for teams with a cloud-based application switching from a GC'd language like Java to Rust is the memory cost of the GC, which can be a 2x or even 3x multiplier compared to the same program in Rust.


Talk about having your priorities backwards...


Seconding TFortunato: in my experience, the people who didn't ignore performance have much better cloud experiences because they don't get a massive bill when a Lambda or auto-scaled system ramps up to handle that inefficient code. When you were running your own servers you could ignore this somewhat, since the sunk cost had already been paid.


I wouldn't say they are less limited, so much as you are able to trade your money for resources a lot faster. You can scale quickly, but at some point the monthly cloud bill comes due, and code optimization starts to look a little more appealing :-)


> code optimization starts to look a little more appealing

If you are on a cloud, you should always consider using cloud native functions and services. Keeping your toe out of the lock-in is expensive :(


I find it curious that "the power of python" is so heavily credited here. Seems most of those tricks are easily done on the jvm, as well. I'd imagine any interpreter based environment could do similar. No?


I'm not super familiar with the JVM, so correct me if I'm wrong on that front, but I think the distinction is that Python provides those kinds of introspection tools in the same language, not just the runtime platform, so it can be easier on tool writers and _much_ easier for individuals who only need to peek into a few of the internals as a small component of some other project.


The profiling tools on the jvm are quite good. Easily comparable to what was presented in this article. I can see some benefit to the idea that you can get a repl looking at the results, though, again, I imagine any runtime based language could give this. (I said interpreter based last time, but it is the runtime that is important, I think.)

That said, I have not seen this done with lisp. I would assume it would look a lot like this.


Hey, I wrote the original post.

Though theoretically any interpreted language gives you all of this, you actually might not have access to the internals in practice. JS (well, node.js and browsers) doesn’t expose GC internals (partly due to a spec requirement that the language be deterministic), and barely gives you good exception introspection.

There is also the ergonomic advantage of no static typing: you just get a reference and can figure out the type later. Very useful when poking around unknown objects. I imagine an API like this in Java would be much more verbose and require a lot of casts, etc.


Having taken a dive on some memory issues with Java, the tools that you have at your disposal are quite nice, actually. https://www.yourkit.com/java/profiler/features/ shows roughly what the memory charts look like.

So, yeah, if what you want is to inspect using the host language, I can see how this makes sense. That said, as you noted in the post, you really just want to get to the root asap. Which is why most visualizations will start from there. :D


Coincidentally, I went through a very similar process recently, debugging the memory usage of an internal application with guppy. One thing I learned is that file dumps and the profile browser are kind of a trap. You discard a huge amount of information when you dump the heap summary to file. The guppy docs aren't too great, but you can poke around the entire heap in detail if you break into a debugger after taking a heap snapshot. You can explore references and referrers, group objects by various categories, and inspect their values without the tedious ctypes cast trick.


You can just use ‘gc.get_referrers’ without any libraries.
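
Minimal example (made-up object names); note that this module's globals also reference the target, so you filter down to the holder you care about:

  import gc

  class Node:
      pass

  leaked = Node()
  registry = {"kept": leaked}   # the reference we're hunting for

  for ref in gc.get_referrers(leaked):
      # both `registry` and the module globals reference `leaked`;
      # pick out the dict that holds it under the key we expect
      if isinstance(ref, dict) and ref.get("kept") is leaked:
          print("found holder:", ref)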


I had a similar issue with clj-kondo, a Clojure linter.

https://github.com/borkdude/clj-kondo/issues/1036

The reason for the memory leak: I used a memoized function on an argument that should have been GC-ed after one run, but since memoized functions store their arguments for future comparisons, it was kept in memory forever.
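
The same pitfall sketched in Python (the actual issue was in Clojure): functools.lru_cache keeps strong references to every argument it has seen, so a "one-shot" large argument lives as long as the cache does.

  import functools

  @functools.lru_cache(maxsize=None)
  def summarize(big_tuple):
      return sum(big_tuple)

  data = tuple(range(10_000_000))
  summarize(data)
  del data                 # no effect: the cache still holds the 10M-element tuple

  summarize.cache_clear()  # one fix: clear (or bound) the cache after the run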


I have some large projects that I wanted to adopt pylint for. The initial hurdle was too large so I wrote this:

https://pypi.org/project/pylint-ignore/


Get more memory.



