Ben Chuanlong Du's Blog

It is never too late to learn.

Iterate All Descendant Files in a Directory in Python

Using pathlib.Path.glob

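This is the easiest way to iterate through all descendant files of a directory in Python. The pattern "**/*" matches every file and directory at any depth below the starting directory, so filter the results with is_dir() or is_file() to keep only directories or only files.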

In [1]:
from pathlib import Path
In [11]:
paths = Path("cn").glob("**/*")
[path for path in paths if path.is_dir()][:20]
Out[11]:
[PosixPath('cn/__pycache__'),
 PosixPath('cn/output'),
 PosixPath('cn/content'),
 PosixPath('cn/output/tag'),
 PosixPath('cn/output/author'),
 PosixPath('cn/output/feeds'),
 PosixPath('cn/output/.git'),
 PosixPath('cn/output/category'),
 PosixPath('cn/output/theme'),
 PosixPath('cn/output/drafts'),
 PosixPath('cn/output/blog'),
 PosixPath('cn/output/.git/logs'),
 PosixPath('cn/output/.git/objects'),
 PosixPath('cn/output/.git/info'),
 PosixPath('cn/output/.git/hooks'),
 PosixPath('cn/output/.git/branches'),
 PosixPath('cn/output/.git/refs'),
 PosixPath('cn/output/.git/logs/refs'),
 PosixPath('cn/output/.git/logs/refs/remotes'),
 PosixPath('cn/output/.git/logs/refs/heads')]
In [12]:
paths = Path("misc").glob("**/*")
[path for path in paths if path.is_file()][:20]
Out[12]:
[PosixPath('misc/pconf.py'),
 PosixPath('misc/pages/shopping.markdown'),
 PosixPath('misc/pages/links.markdown'),
 PosixPath('misc/pages/tools.markdown'),
 PosixPath('misc/pages/stat.markdown'),
 PosixPath('misc/pages/job.markdown'),
 PosixPath('misc/pages/news.markdown'),
 PosixPath('misc/pages/forum.markdown'),
 PosixPath('misc/pages/learning.markdown'),
 PosixPath('misc/__pycache__/pconf.cpython-37.pyc'),
 PosixPath('misc/output/index45.html'),
 PosixPath('misc/output/index146.html'),
 PosixPath('misc/output/index150.html'),
 PosixPath('misc/output/index56.html'),
 PosixPath('misc/output/index29.html'),
 PosixPath('misc/output/index65.html'),
 PosixPath('misc/output/index160.html'),
 PosixPath('misc/output/index151.html'),
 PosixPath('misc/output/index49.html'),
 PosixPath('misc/output/index60.html')]
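
If only files with a particular extension are needed, Path.rglob can be used instead: rglob(pattern) behaves like glob("**/" + pattern). The following is a minimal sketch rather than part of the original notebook. The helper name list_files and the temporary directory tree are hypothetical, chosen only so that the example runs without depending on the cn or misc directories above.

```python
import tempfile
from pathlib import Path


def list_files(root, pattern="*"):
    """Return all descendant files of `root` matching `pattern`, sorted."""
    # rglob(pattern) behaves like glob("**/" + pattern).
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())


# Build a small throwaway tree so the example does not depend on real data.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pages").mkdir()
    (root / "pconf.py").write_text("")
    (root / "pages" / "links.markdown").write_text("")
    (root / "pages" / "tools.markdown").write_text("")
    markdown_names = [p.name for p in list_files(root, "*.markdown")]
    all_names = [p.name for p in list_files(root)]
```

Both glob and rglob return lazy generators, so wrapping them in sorted() or list() is only needed when the full result has to be held in memory at once.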

Using os.walk

In [3]:
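import os

# os.walk yields one (directory, subdirectory names, file names) tuple
# for every directory under the starting point, including the starting point.
for subdir, dirs, files in os.walk("."):
    for file in files:
        # Names in `files` are bare file names; join them with `subdir`
        # to get a path relative to the starting directory.
        filepath = os.path.join(subdir, file)
        if filepath.endswith(".csv"):
            print(subdir)
            print(dirs)
            print(filepath)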
./f2.csv
[]
./f2.csv/part-00000-96dab35f-bfbb-4134-babe-14553e963d25-c000.csv
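
One advantage of os.walk is that directories can be pruned during the traversal: with the default topdown=True, removing names from the dirs list in place stops os.walk from descending into them. The sketch below illustrates this under stated assumptions. The generator name walk_files, its parameters, and the sample file names are hypothetical and not from the original notebook.

```python
import os
import tempfile


def walk_files(root, suffix="", skip_dirs=(".git", "__pycache__")):
    """Yield paths of files under `root` whose names end with `suffix`."""
    for subdir, dirs, files in os.walk(root):
        # With the default topdown=True, editing `dirs` in place
        # stops os.walk from descending into the removed directories.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for file in files:
            if file.endswith(suffix):
                yield os.path.join(subdir, file)


# Build a small throwaway tree, including a .git directory to be skipped.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "data", ".git"))
    for rel in ("a.csv", "data/b.csv", "data/c.txt", "data/.git/d.csv"):
        open(os.path.join(tmp, *rel.split("/")), "w").close()
    found = sorted(os.path.relpath(p, tmp) for p in walk_files(tmp, ".csv"))
```

Here found contains a.csv and data/b.csv but not the file inside .git, because that directory was removed from dirs before os.walk reached it.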

Alternatively, you can implement the traversal yourself by calling Path.iterdir() recursively.
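
A minimal sketch of such an implementation follows. It is one possible approach, not the author's code: the function name iter_descendants and the sample tree are hypothetical, and the choice not to follow symbolic links is an assumption made here to avoid infinite loops.

```python
import tempfile
from pathlib import Path


def iter_descendants(directory):
    """Recursively yield every file and directory below `directory`."""
    for child in Path(directory).iterdir():
        yield child
        # Recurse into real directories only; skipping symlinks avoids loops.
        if child.is_dir() and not child.is_symlink():
            yield from iter_descendants(child)


# Build a small throwaway tree so the example does not depend on real data.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "output" / "tag").mkdir(parents=True)
    (root / "output" / "index.html").write_text("")
    found = sorted(p.relative_to(root).as_posix() for p in iter_descendants(root))
```

Like glob("**/*"), the generator yields directories as well as files, so the same is_file() or is_dir() filters apply to its results.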
