How to Extract Python Library Imports from Files

Learn how to extract all the Python library imports from your code files with this simple Python script.

Key Takeaways

  • You can extract all the Python library imports from a file using a simple Python script.
  • The script uses the standard-library pathlib module to recursively find all Python files in a directory.
  • The script uses regular expressions to extract the module names from import statements.

Introduction

As your Python project grows, you may find it difficult to keep track of all the libraries that you’ve imported in your code files. This information can be useful for various reasons, such as checking for unused libraries or generating a list of dependencies.

In this article, we’ll show you how to extract all the Python library imports from your code files with a simple Python script. The script will recursively search for all Python files in a directory and extract the library names from import statements.

Step 1: Define the Helper Functions

Before we start, let’s define two helper functions that we’ll use to extract the imports and search for files:


from pathlib import Path
import re

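def get_imports(file_path):
    """Return the module names imported in the Python file at file_path."""
    imports = []
    with open(file_path, "r", encoding="utf-8") as file:
        for line in file:
            # Drop trailing comments so "import os  # note" yields just "os".
            line = line.split("#")[0]
            # "import a, b as c": split on commas and discard any "as" alias.
            match = re.search(r"^\s*import\s+(.+)", line)
            if match:
                names = match.group(1).split(",")
                imports += [n.split(" as ")[0].strip() for n in names if n.strip()]
                continue
            # "from a.b import c": keep only the module being imported from.
            match = re.search(r"^\s*from\s+(\S+)\s+import\b", line)
            if match:
                imports.append(match.group(1))
    return imports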

def get_files(directory):
    files = []
    for path in Path(directory).rglob("*.py"):
        if path.is_file():
            files.append(str(path))
    return files

The get_imports function takes a file path as input and returns a list of library names that are imported in the file.

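The function reads the file line by line and uses two regular expressions to match import statements: one for "import x" statements and one for "from x import y" statements. It extracts the library names from the matched statements and returns them as a list. Because it matches text rather than parsing the file, it is a heuristic: a line that merely looks like an import, for example inside a docstring, is matched as well.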
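To make the matching step concrete, here is a minimal sketch that applies patterns like the ones in get_imports to a few in-memory lines instead of a file. The sample lines and the variable names sample_lines and found are made up for illustration.

```python
import re

# Hypothetical sample lines, chosen only to illustrate the two patterns.
sample_lines = [
    "import os, sys\n",
    "from pathlib import Path\n",
    "x = 1  # not an import\n",
]

found = []
for line in sample_lines:
    # Plain "import a, b" statements: capture everything after "import".
    match = re.search(r"^import ([^\n]+)", line)
    if match:
        found += match.group(1).split(", ")
    else:
        # "from a import b" statements: capture only the module name.
        match = re.search(r"^from ([^\n]+) import", line)
        if match:
            found.append(match.group(1))

print(found)  # ['os', 'sys', 'pathlib']
```

The third line is skipped because neither pattern matches at the start of the line.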

The get_files function takes a directory path as input and returns a list of all Python files in the directory and its subdirectories. It uses Path.rglob from the standard-library pathlib module to recursively search for all files with the .py extension.
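The recursive search can be tried out safely in a throwaway directory. The sketch below is only an illustration: the file names main.py, pkg/util.py, and notes.txt are invented, and the paths are printed relative to the temporary root so the result is the same on every machine.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Build a tiny, hypothetical project layout.
    (root / "pkg").mkdir()
    (root / "main.py").write_text("import os\n")
    (root / "pkg" / "util.py").write_text("import sys\n")
    (root / "notes.txt").write_text("not Python\n")

    # rglob("*.py") walks the directory tree and yields matching paths.
    found = sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*.py")
        if path.is_file()
    )

print(found)  # ['main.py', 'pkg/util.py']
```

The text file is ignored because it does not match the "*.py" pattern, while the file in the subdirectory is picked up without any extra code.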

Step 2: Search for Files and Extract Imports

Now that we have the helper functions, we can use them to search for Python files and extract their imports:


directory_path = "/path/to/your/directory"

for file_path in get_files(directory_path):
    imports = get_imports(file_path)
    if imports:
        print(f"{file_path}:")
        for import_name in imports:
            print(f"    {import_name}")

The code above defines the directory_path variable as the path to the directory where your Python files are located. It then uses the get_files function to get a list of all Python files in the directory and its subdirectories. For each file, it uses the get_imports function to extract the library imports and prints them to the console.

The output will look something like this:

/path/to/your/directory/file1.py:
    os
    sys
/path/to/your/directory/subdirectory/file2.py:
    numpy
    matplotlib.pyplot
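If the goal is a list of dependencies rather than a per-file report, the per-file results can be merged into one set of top-level package names. The following is a rough sketch under stated assumptions: the imports_by_file dictionary and the top_level_packages helper are hypothetical, and the sample data is loosely modeled on the output above.

```python
# Hypothetical per-file results, loosely modeled on the sample output.
imports_by_file = {
    "file1.py": ["os", "sys"],
    "subdirectory/file2.py": ["numpy", "matplotlib.pyplot", "os"],
}

def top_level_packages(imports_by_file):
    """Collect the unique top-level package names across all files."""
    packages = set()
    for names in imports_by_file.values():
        for name in names:
            # "matplotlib.pyplot" -> "matplotlib"
            top = name.split(".")[0]
            # Relative imports such as ".utils" have an empty first part.
            if top:
                packages.add(top)
    return sorted(packages)

print(top_level_packages(imports_by_file))
# ['matplotlib', 'numpy', 'os', 'sys']
```

Note that the result still mixes standard-library modules such as os with third-party packages, and that an import name is not always the same as the name used to install the package.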

Conclusion

With the simple Python script we’ve shown you in this article, you can easily extract all the imports from your code files and get a better understanding of your project’s dependencies.