Memory not released after parse on Linux/Mac
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| lxml | Confirmed | Medium | scoder | |
Bug Description
Python Info:
There seems to be an issue where memory isn't freed after the ElementTree object goes out of scope, even after forcing a garbage collection via `gc.collect()`. Interestingly, on Windows I don't see the leak. The Python versions differ slightly between the machines, but the lxml version is the same (though the underlying libxml2/libxslt versions differ).
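One hypothesis worth ruling out (an assumption, not something this report establishes) is that libxml2 does free the tree, but glibc's `malloc` keeps the freed pages mapped instead of returning them to the operating system. `gc.collect()` can release the tree back to the C allocator, but it has no control over whether the allocator then returns those pages to the OS. On glibc-based Linux, `malloc_trim(0)` asks the allocator to release free memory, so calling it after `gc.collect()` and re-checking RSS would help tell the two cases apart. A minimal ctypes sketch; the helper name `trim_malloc` is hypothetical:

```python
import ctypes
import ctypes.util
import gc


def trim_malloc():
    """Collect garbage, then ask glibc to hand free heap memory back to the OS.

    Returns 1 if glibc released memory, 0 if it could not, or None when the
    C library has no malloc_trim (for example on macOS or Windows).
    """
    libc_name = ctypes.util.find_library("c")
    if libc_name is None:
        return None
    try:
        # Hypothesis check only: malloc_trim is a glibc extension.
        malloc_trim = ctypes.CDLL(libc_name).malloc_trim
    except (OSError, AttributeError):
        return None
    malloc_trim.argtypes = [ctypes.c_size_t]
    malloc_trim.restype = ctypes.c_int
    gc.collect()  # free unreachable Python objects (and the C memory they own) first
    return malloc_trim(0)
```

Usage: after the parsed tree has gone out of scope, call `trim_malloc()` and compare RSS before and after. A sharp drop would suggest the memory was freed but retained by the allocator; no change would make a real leak the more plausible explanation.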
My test script uses a sample file from Wikipedia. It can be found directly at https:/
Here is my test script (lxmltest.py). Put it next to wikidatawiki-
```
import psutil
from threading import Thread, Lock
from lxml.etree import parse
from pympler import tracker
import gc
import time
import sys
from lxml import etree
PRINT_LOCK = Lock()
FILE_STR = 'wikidatawiki-
def parse_once():
print(
def start_memory_
def memory_usage():
proc = psutil.Process()
while True:
with PRINT_LOCK:
thread = Thread(
thread.daemon = True
thread.start()
if __name__ == '__main__':
print("%-20s: %s" % ('Python', sys.version_info))
print("%-20s: %s" % ('lxml.etree', etree.LXML_
print("%-20s: %s" % ('libxml used', etree.LIBXML_
print("%-20s: %s" % ('libxml compiled', etree.LIBXML_
print("%-20s: %s" % ('libxslt used', etree.LIBXSLT_
print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_
start_
mem_info = tracker.
for i in range(5):
with PRINT_LOCK:
with PRINT_LOCK:
with PRINT_LOCK:
```
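The memory-sampling thread that the script starts can be sketched with the standard library alone. This is a minimal sketch under assumptions: it reads the resident set size from `/proc/self/statm` (Linux only) instead of using `psutil`, and the names `rss_gb`, `sample_memory` and `start_sampler` are hypothetical rather than taken from lxmltest.py.

```python
import mmap
import threading

PRINT_LOCK = threading.Lock()


def rss_gb():
    """Return the current resident set size in GB, or None where /proc is missing."""
    try:
        with open("/proc/self/statm") as f:
            # statm fields are counted in pages; the second one is the resident set size.
            resident_pages = int(f.read().split()[1])
    except (OSError, IndexError, ValueError):
        return None
    return resident_pages * mmap.PAGESIZE / 1024 ** 3


def sample_memory(stop, interval):
    """Print the RSS every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        usage = rss_gb()
        with PRINT_LOCK:  # keep sampler lines from interleaving with other output
            print("Memory usage: %s GB" % usage)
        stop.wait(interval)


def start_sampler(interval=1.0):
    """Run sample_memory on a daemon thread and return the Event that stops it."""
    stop = threading.Event()
    thread = threading.Thread(target=sample_memory, args=(stop, interval), daemon=True)
    thread.start()
    return stop
```

Usage: call `stop = start_sampler()` before parsing and `stop.set()` afterwards. Because the thread is a daemon, it will not keep the interpreter alive if the main thread exits first.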
Here is the output on Windows:
```
C:\Users\
Python : sys.version_
lxml.etree : (5, 4, 0, 0)
libxml used : (2, 11, 9)
libxml compiled : (2, 11, 9)
libxslt used : (1, 1, 39)
libxslt compiled : (1, 1, 39)
Memory usage: 0.0348930358886
Start: 0
=======
function (store_info) | 1 | 152 B
functools.
builtin_
Memory usage: 0.15768051147460938 GB
Memory usage: 0.3460693359375 GB
Memory usage: 0.5390472412109375 GB
Memory usage: 0.7269554138183594 GB
Memory usage: 0.9176521301269531 GB
Memory usage: 1.0989875793457031 GB
Memory usage: 1.2846717834472656 GB
Memory usage: 1.4683494567871094 GB
Memory usage: 1.6554908752441406 GB
<lxml.etree.
End: 0 (pre-collect)
=======
lxml.
lxml.
|------
|------
End: 0 (post-collect)
types | # objects | total size
======= | =========== | ============
list | 1 | 80 B
str | 1 | 74 B
|------
|------
Start: 1
=======
builtin_
Memory usage: 0.04637908935546875 GB
Memory usage: 0.22696685791015625 GB
Memory usage: 0.4188079833984375 GB
Memory usage: 0.6088142395019531 GB
Memory usage: 0.7968864440917969 GB
Memory usage: 0.9865493774414062 GB
Memory usage: 1.1704559326171875 GB
Memory usage: 1.3574256896972656 GB
Memory usage: 1.5435066223144531 GB
<lxml.etree.
Memory usage: 0.04497528076171875 GB
End: 1 (pre-collect)
=======
builtin_
|------
|------
End: 1 (post-collect)
types | # objects | total size
======= | =========== | ============
|------
|------
Start: 2
types | # objects | total size
======= | =========== | ============
Memory usage: 0.1110076904296875 GB
Memory usage: 0.29840850830078125 GB
Memory usage: 0.4913749694824219 GB
Memory usage: 0.6786384582519531 GB
Memory usage: 0.8669891357421875 GB
Memory usage: 1.0520248413085938 GB
Memory usage: 1.2321434020996094 GB
Memory usage: 1.4098663330078125 GB
Memory usage: 1.5959892272949219 GB
<lxml.etree.
Memory usage: 0.0462150573730
End: 2 (pre-collect)
types | # objects | total size
======= | =========== | ============
|------
|------
End: 2 (post-collect)
types | # objects | total size
======= | =========== | ============
|------
|------
Start: 3
types | # objects | total size
======= | =========== | ============
Memory usage: 0.11426544189453125 GB
Memory usage: 0.2979087829589844 GB
Memory usage: 0.4909019470214844 GB
Memory usage: 0.6770286560058594 GB
Memory usage: 0.8621673583984375 GB
Memory usage: 1.0484466552734375 GB
Memory usage: 1.2336997985839844 GB
Memory usage: 1.4193458557128906 GB
Memory usage: 1.6043853759765625 GB
<lxml.etree.
Memory usage: 0.04677581787109375 GB
End: 3 (pre-collect)
types | # objects | total size
======= | =========== | ============
|------
|------
End: 3 (post-collect)
types | # objects | total size
======= | =========== | ============
|------
|------
Start: 4
types | # objects | total size
======= | =========== | ============
Memory usage: 0.11605453491210938 GB
Memory usage: 0.2964820861816406 GB
Memory usage: 0.4882354736328125 GB
Memory usage: 0.6756134033203125 GB
Memory usage: 0.8640823364257812 GB
Memory usage: 1.0492401123046875 GB
Memory usage: 1.2366714477539062 GB
Memory usage: 1.4215164184570312 GB
Memory usage: 1.606719970703125 GB
<lxml.etree.
Memory usage: 0.04656219482421875 GB
End: 4 (pre-collect)
types | # objects | total size
======= | =========== | ============
|------
|------
End: 4 (post-collect)
types | # objects | total size
======= | =========== | ============
|------
|------
```
Notice how on Windows the memory usage follows a sawtooth pattern: it climbs to roughly 1.5 to 1.7 GB while the document loads, then falls back to about 0.045 GB once the tree goes out of scope and is garbage collected. This behavior makes sense and seems fine.
Here is the output on Linux (via WSL, though it also reproduced similarly in Docker). It was much slower, though that was likely due to WSL rather than lxml:
```
csm10495@
Python : sys.version_
lxml.etree : (5, 4, 0, 0)
libxml used : (2, 13, 8)
libxml compiled : (2, 13, 8)
libxslt used : (1, 1, 43)
libxslt compiled : (1, 1, 43)
Memory usage: 0.0167045593261
Start: 0
=======
function (store_info) | 1 | 136 B
builtin_
Memory usage: 0.0414848327636
Memory usage: 0.07395553588867188 GB
Memory usage: 0.10471725463867188 GB
Memory usage: 0.13499069213867188 GB
Memory usage: 0.16550827026367188 GB
Memory usage: 0.19480514526367188 GB
Memory usage: 0.22434616088867188 GB
Memory usage: 0.2546195983886719 GB
Memory usage: 0.2846488952636719 GB
Memory usage: 0.3134574890136719 GB
Memory usage: 0.3434867858886719 GB
Memory usage: 0.3732719421386719 GB
Memory usage: 0.4040336608886719 GB
Memory usage: 0.4328422546386719 GB
Memory usage: 0.4623832702636719 GB
Memory usage: 0.4921684265136719 GB
Memory usage: 0.5212211608886719 GB
Memory usage: 0.5505180358886719 GB
Memory usage: 0.5798149108886719 GB
Memory usage: 0.6088676452636719 GB
Memory usage: 0.6393852233886719 GB
Memory usage: 0.6684379577636719 GB
Memory usage: 0.6967582702636719 GB
Memory usage: 0.7267875671386719 GB
Memory usage: 0.7568168640136719 GB
Memory usage: 0.7863578796386719 GB
Memory usage: 0.8161430358886719 GB
Memory usage: 0.8459281921386719 GB
Memory usage: 0.8742485046386719 GB
Memory usage: 0.9023246765136719 GB
Memory usage: 0.9291801452636719 GB
Memory usage: 0.9560356140136719 GB
Memory usage: 0.9846000671386719 GB
Memory usage: 1.0129203796386719 GB
Memory usage: 1.0427055358886719 GB
Memory usage: 1.0720024108886719 GB
Memory usage: 1.1003227233886719 GB
Memory usage: 1.1279106140136719 GB
Memory usage: 1.1564750671386719 GB
Memory usage: 1.1847953796386719 GB
Memory usage: 1.2133598327636719 GB
Memory usage: 1.2424125671386719 GB
Memory usage: 1.2712211608886719 GB
Memory usage: 1.3000297546386719 GB
Memory usage: 1.3288383483886719 GB
Memory usage: 1.3571586608886719 GB
Memory usage: 1.3852348327636719 GB
Memory usage: 1.4142875671386719 GB
Memory usage: 1.4428520202636719 GB
Memory usage: 1.4719047546386719 GB
Memory usage: 1.5009574890136719 GB
Memory usage: 1.5287895202636719 GB
Memory usage: 1.5563774108886719 GB
Memory usage: 1.5837211608886719 GB
Memory usage: 1.6144828796386719 GB
<lxml.etree.
End: 0 (pre-collect)
=======
lxml.
lxml.
|------
|------
End: 0 (post-collect)
types | # objects | total size
======= | =========== | ============
list | 1 | 80 B
str | 1 | 74 B
|------
|------
Start: 1
types | # objects | total size
======= | =========== | ============
Memory usage: 1.6229896545410156 GB
Memory usage: 1.6261634826660156 GB
Memory usage: 1.6293373107910156 GB
Memory usage: 1.6329994201660156 GB
Memory usage: 1.6369056701660156 GB
Memory usage: 1.6403236389160156 GB
Memory usage: 1.6432533264160156 GB
Memory usage: 1.6474037170410156 GB
Memory usage: 1.6505775451660156 GB
Memory usage: 1.6539955139160156 GB
Memory usage: 1.6579017639160156 GB
Memory usage: 1.6618080139160156 GB
Memory usage: 1.6652259826660156 GB
Memory usage: 1.6686439514160156 GB
Memory usage: 1.6720619201660156 GB
Memory usage: 1.6754798889160156 GB
Memory usage: 1.6791419982910156 GB
Memory usage: 1.6823158264160156 GB
Memory usage: 1.6862220764160156 GB
Memory usage: 1.6901283264160156 GB
Memory usage: 1.6940345764160156 GB
Memory usage: 1.6974525451660156 GB
Memory usage: 1.7008705139160156 GB
Memory usage: 1.7045326232910156 GB
Memory usage: 1.7086830139160156 GB
Memory usage: 1.7123451232910156 GB
Memory usage: 1.7155189514160156 GB
Memory usage: 1.7191810607910156 GB
Memory usage: 1.7228431701660156 GB
Memory usage: 1.7265052795410156 GB
Memory usage: 1.7304115295410156 GB
Memory usage: 1.7338294982910156 GB
Memory usage: 1.7377357482910156 GB
Memory usage: 1.7413978576660156 GB
Memory usage: 1.7453041076660156 GB
Memory usage: 1.7492103576660156 GB
Memory usage: 1.7528724670410156 GB
Memory usage: 1.7565345764160156 GB
Memory usage: 1.7597084045410156 GB
Memory usage: 1.7638587951660156 GB
Memory usage: 1.7680091857910156 GB
Memory usage: 1.7716712951660156 GB
Memory usage: 1.7750892639160156 GB
Memory usage: 1.7785072326660156 GB
Memory usage: 1.7821693420410156 GB
Memory usage: 1.7860755920410156 GB
Memory usage: 1.7897377014160156 GB
Memory usage: 1.7931556701660156 GB
Memory usage: 1.7965736389160156 GB
Memory usage: 1.7999916076660156 GB
Memory usage: 1.8036537170410156 GB
<lxml.etree.
End: 1 (pre-collect)
types | # objects | total size
======= | =========== | ============
|------
|------
End: 1 (post-collect)
types | # objects | total size
======= | =========== | ============
|------
|------
Start: 2
types | # objects | total size
======= | =========== | ============
Memory usage: 1.8046035766601562 GB
Memory usage: 1.8046035766601562 GB
Memory usage: 1.8048477172851562 GB
Memory usage: 1.8050918579101562 GB
Memory usage: 1.8055801391601562 GB
Memory usage: 1.8060684204101562 GB
Memory usage: 1.8065567016601562 GB
Memory usage: 1.8072891235351562 GB
Memory usage: 1.8075332641601562 GB
Memory usage: 1.8077774047851562 GB
Memory usage: 1.8085098266601562 GB
Memory usage: 1.8089981079101562 GB
Memory usage: 1.8092422485351562 GB
Memory usage: 1.8094863891601562 GB
Memory usage: 1.8099746704101562 GB
Memory usage: 1.8107070922851562 GB
Memory usage: 1.8111953735351562 GB
Memory usage: 1.8114395141601562 GB
Memory usage: 1.8119277954101562 GB
Memory usage: 1.8124160766601562 GB
Memory usage: 1.8126602172851562 GB
Memory usage: 1.8129043579101562 GB
Memory usage: 1.8133926391601562 GB
Memory usage: 1.8136367797851562 GB
Memory usage: 1.8141250610351562 GB
Memory usage: 1.8143692016601562 GB
Memory usage: 1.8146133422851562 GB
Memory usage: 1.8151016235351562 GB
Memory usage: 1.8155899047851562 GB
Memory usage: 1.8158340454101562 GB
Memory usage: 1.8165664672851562 GB
Memory usage: 1.8168106079101562 GB
Memory usage: 1.8175430297851562 GB
Memory usage: 1.8180313110351562 GB
Memory usage: 1.8185195922851562 GB
Memory usage: 1.8190078735351562 GB
Memory usage: 1.8192520141601562 GB
Memory usage: 1.8197402954101562 GB
Memory usage: 1.8202285766601562 GB
Memory usage: 1.8204727172851562 GB
Memory usage: 1.8209609985351562 GB
Memory usage: 1.8214492797851562 GB
Memory usage: 1.8216934204101562 GB
Memory usage: 1.8219375610351562 GB
Memory usage: 1.8224258422851562 GB
Memory usage: 1.8229141235351562 GB
Memory usage: 1.8231582641601562 GB
Memory usage: 1.8236465454101562 GB
Memory usage: 1.8238906860351562 GB
Memory usage: 1.8243789672851562 GB
Memory usage: 1.8243789672851562 GB
Memory usage: 1.8253555297851562 GB
<lxml.etree.
End: 2 (pre-collect)
types | # objects | total size
======= | =========== | ============
|------
|------
End: 2 (post-collect)
types | # objects | total size
======= | =========== | ============
code | 0 | 112 B
|------
|------
Start: 3
types | # objects | total size
======= | =========== | ============
Memory usage: 1.8261222839355469 GB
Memory usage: 1.8261222839355469 GB
Memory usage: 1.8261222839355469 GB
Memory usage: 1.8263664245605469 GB
Memory usage: 1.8266105651855469 GB
Memory usage: 1.8266105651855469 GB
Memory usage: 1.8266105651855469 GB
Memory usage: 1.8266105651855469 GB
Memory usage: 1.8266105651855469 GB
Memory usage: 1.8266105651855469 GB
Memory usage: 1.8266105651855469 GB
Memory usage: 1.8266105651855469 GB
Memory usage: 1.8266105651855469 GB
Memory usage: 1.8266105651855469 GB
Memory usage: 1.8268547058105469 GB
Memory usage: 1.8268547058105469 GB
Memory usage: 1.8268547058105469 GB
Memory usage: 1.8268547058105469 GB
Memory usage: 1.8268547058105469 GB
Memory usage: 1.8268547058105469 GB
Memory usage: 1.8268547058105469 GB
Memory usage: 1.8273429870605469 GB
Memory usage: 1.8273429870605469 GB
Memory usage: 1.8273429870605469 GB
Memory usage: 1.8273429870605469 GB
Memory usage: 1.8273429870605469 GB
Memory usage: 1.8273429870605469 GB
Memory usage: 1.8273429870605469 GB
Memory usage: 1.8273429870605469 GB
Memory usage: 1.8273429870605469 GB
Memory usage: 1.8275871276855469 GB
Memory usage: 1.8275871276855469 GB
Memory usage: 1.8278312683105469 GB
Memory usage: 1.8278312683105469 GB
Memory usage: 1.8278312683105469 GB
Memory usage: 1.8278312683105469 GB
Memory usage: 1.8278312683105469 GB
Memory usage: 1.8280754089355469 GB
Memory usage: 1.8280754089355469 GB
Memory usage: 1.8283195495605469 GB
Memory usage: 1.8283195495605469 GB
Memory usage: 1.8283195495605469 GB
Memory usage: 1.8283195495605469 GB
Memory usage: 1.8283195495605469 GB
Memory usage: 1.8283195495605469 GB
Memory usage: 1.8283195495605469 GB
Memory usage: 1.8283195495605469 GB
Memory usage: 1.8285636901855469 GB
Memory usage: 1.8285636901855469 GB
Memory usage: 1.8285636901855469 GB
<lxml.etree.
End: 3 (pre-collect)
types | # objects | total size
======= | =========== | ============
code | 0 | 70 B
|------
|------
Memory usage: 1.8325309753417969 GB
End: 3 (post-collect)
types | # objects | total size
======= | =========== | ============
|------
|------
Start: 4
types | # objects | total size
======= | =========== | ============
Memory usage: 1.8325424194335938 GB
Memory usage: 1.8325424194335938 GB
Memory usage: 1.8325424194335938 GB
Memory usage: 1.8325424194335938 GB
Memory usage: 1.8325424194335938 GB
Memory usage: 1.8325424194335938 GB
Memory usage: 1.8325424194335938 GB
Memory usage: 1.8325424194335938 GB
Memory usage: 1.8325424194335938 GB
Memory usage: 1.8325424194335938 GB
Memory usage: 1.8325424194335938 GB
Memory usage: 1.8325424194335938 GB
Memory usage: 1.8325424194335938 GB
Memory usage: 1.8325424194335938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
Memory usage: 1.8327865600585938 GB
<lxml.etree.
End: 4 (pre-collect)
types | # objects | total size
======= | =========== | ============
|------
|------
End: 4 (post-collect)
types | # objects | total size
======= | =========== | ============
|------
|------
```
On Linux, the memory usage doesn't go down between cycles, not even after a forced garbage collection via `gc.collect()`.
It seems that on Linux, memory is leaked for every file parsed with lxml. I suspect that with multiple different XML files, the memory usage would keep going up. It almost seems as if some parts of the file are cached, since the memory usage eventually levels out (around 1.83 GB by the last cycle) when the same file is parsed over and over.
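The plateau can be quantified from the log itself. The sketch below takes the first and last `Memory usage` sample of Linux cycles `Start: 1` to `Start: 4` (copied from the output above) and converts the difference to MB, assuming the script's "GB" means 1024³ bytes since its conversion code is not shown. The helper name `growth_per_cycle_mb` is hypothetical.

```python
# (first, last) "Memory usage" samples, in GB, for cycles Start: 1 to Start: 4
# of the Linux run above.
LINUX_CYCLES = [
    (1.6229896545410156, 1.8036537170410156),
    (1.8046035766601562, 1.8253555297851562),
    (1.8261222839355469, 1.8285636901855469),
    (1.8325424194335938, 1.8327865600585938),
]


def growth_per_cycle_mb(cycles):
    """Memory added during each cycle, in MB (assuming 1 GB = 1024 MB)."""
    return [(last - first) * 1024 for first, last in cycles]


for cycle, mb in enumerate(growth_per_cycle_mb(LINUX_CYCLES), start=1):
    print("Start: %d  +%.2f MB" % (cycle, mb))
# Start: 1  +185.00 MB
# Start: 2  +21.25 MB
# Start: 3  +2.50 MB
# Start: 4  +0.25 MB
```

Each repeat of the same parse adds roughly an order of magnitude less than the one before, which is consistent with the levelling-out described above. On its own, this does not distinguish a cache from the allocator reusing memory it kept after the first parse.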

On Mac:
```
cmachalo-mn2:/tmp/v $ python lxmltest.py
info(major=3, minor=13, micro=2, releaselevel='final', serial=0)
types | # objects | total size
======= | =========== | ============
list | 1793 | 155.81 KB
str | 1764 | 107.73 KB
int | 360 | 9.84 KB
dict | 2 | 288 B
code | 0 | 112 B
cell | 2 | 80 B
_sre.SRE_Template | 1 | 72 B
function_or_method | 1 | 72 B
_lru_list_elem | 1 | 56 B
tuple | -81 | -5736 B
_ElementTree object at 0x101dd4280>
types | # objects | total size
======= | =========== | ============
list | 5 | 328 B
etree._ParserContext | 1 | 120 B
str | 2 | 120 B
lxml.etree._ErrorLog | 1 | 80 B
etree._TempStore | 1 | 48 B
tuple | -3 | -208 B
code | -1 | -280 B
-----------------------------------------------------------------------------------|
-----------------------------------------------------------------------------------|
-----------------------------------------------------------------------------------|
-----------------------------------------------------------------------------------|
_ElementTree object at 0x101dfbec0>
-----------------------------------------------------------------------------------|
----------------------------------------------------------------------------...
(venv:v) cmachalo@
Python : sys.version_
lxml.etree : (5, 4, 0, 0)
libxml used : (2, 13, 8)
libxml compiled : (2, 13, 8)
libxslt used : (1, 1, 43)
libxslt compiled : (1, 1, 43)
Memory usage: 0.0308380126953125 GB
Start: 0
=======
function (store_info) | 1 | 160 B
builtin_
functools.
Memory usage: 0.2293853759765625 GB
Memory usage: 0.473175048828125 GB
Memory usage: 0.7103729248046875 GB
Memory usage: 0.9484405517578125 GB
Memory usage: 1.175750732421875 GB
Memory usage: 1.39697265625 GB
Memory usage: 1.6256103515625 GB
<lxml.etree.
End: 0 (pre-collect)
=======
lxml.
lxml.
|------
|------
End: 0 (post-collect)
types | # objects | total size
======= | =========== | ============
list | 1 | 80 B
str | 1 | 66 B
|------
|------
Memory usage: 0.8919830322265625 GB
Start: 1
types | # objects | total size
======= | =========== | ============
Memory usage: 0.9423065185546875 GB
Memory usage: 1.128265380859375 GB
Memory usage: 1.2782135009765625 GB
Memory usage: 1.4556732177734375 GB
Memory usage: 1.6227569580078125 GB
Memory usage: 1.7854156494140625 GB
Memory usage: 1.961761474609375 GB
<lxml.etree.
End: 1 (pre-collect)
types | # objects | total size
======= | =========== | ============
|------
|------