Sunday, March 12, 2017

[Python] Deleting files by date from directory with a few million files

Recent problem: a single directory with 5.3 million files. Using find did not work because it filled up the memory. I tried everything I could think of, but in the end I had to go with Python to iterate over the directory. I used https://github.com/benhoyt/scandir
The following code does the job:

import os
import time

import scandir

directory = '/tmp/test'
old = time.time() - 28 * 24 * 60 * 60  # cutoff: 28 days ago

# scandir yields the entries one by one instead of building the full list first
for entry in scandir.scandir(directory):
    try:
        # entry.path is the directory joined with entry.name
        if os.path.getctime(entry.path) < old:
            os.unlink(entry.path)
    except OSError:
        # file vanished in the meantime or cannot be removed: skip it
        pass

scandir has been part of the standard library as os.scandir since Python 3.5, so on 3.5 or later you can call os.scandir instead of importing scandir. If you don't have it, just download scandir.py and put it in the same directory as this code.
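
For Python 3.6 or later, the same idea can be sketched with the built-in os.scandir and no extra module. This is only a rough sketch, not a tested tool: the function name delete_older_than and its dry_run switch are made up for this example. It assumes you only want to remove regular files and that comparing st_ctime is what you want (on Linux that is the last inode change, not the creation time; st_mtime may be the better choice for you).

```python
import os
import time


def delete_older_than(directory, max_age_days, dry_run=True):
    """Delete regular files in `directory` whose ctime is before the cutoff.

    Hypothetical helper for illustration. Returns the number of files that
    were deleted (or, with dry_run=True, that would have been deleted).
    """
    cutoff = time.time() - max_age_days * 24 * 60 * 60
    count = 0
    # os.scandir yields entries lazily, so the full listing is never in memory.
    # Using it as a context manager requires Python 3.6+.
    with os.scandir(directory) as entries:
        for entry in entries:
            try:
                # Only plain files are candidates; skip directories and symlinks.
                if not entry.is_file(follow_symlinks=False):
                    continue
                if entry.stat(follow_symlinks=False).st_ctime < cutoff:
                    if not dry_run:
                        os.unlink(entry.path)
                    count += 1
            except OSError:
                # The file vanished or is not accessible: move on to the next one.
                continue
    return count
```

Calling delete_older_than('/tmp/test', 28) first only counts the matches; pass dry_run=False once the number looks plausible. Returning a count instead of a list of names keeps memory flat, which is the whole point with millions of files.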
