I've been scouring the internet trying to find a solution to this, and I have yet to find an answer despite this seeming like something that should be a very common use case:

How can I safely type check methods or arguments in Python across different files or packages without creating a circular dependency?

Let me give an example of something that should work, and does work in most other strongly typed languages (e.g. C/C++, Java, and even Hack), but does not seem to have a sane solution in Python. Let's take the example of a very common data structure - a Graph.

I want to organize the structure into multiple files to separate concerns. In duck-typed Python, it might look like this:

# Node.py
class Node:
  def __init__(self):
    self.edges = [] # Will be a list of Edge objects

  def add_edge(self, edge):
    self.edges.append(edge)

  ...
# Edge.py
class Edge:
  def __init__(self, node, weight):
    self.node = node
    self.weight = weight

Now in the above example, neither object needs to import the other, so everything works. At runtime, another class such as Graph might be responsible for creating nodes and edges, and adding the edges to the nodes. No circular dependency so far.

But suppose I want to implement typechecking to ensure no one accidentally attempts to pass an integer type to add_edge, or a string to "Edge()". I would type it like the following:

# Node.py
from graph.Edge import Edge
class Node:
  def __init__(self):
    self.edges: List[Edge] = [] # Will be a list of Edge objects

  def add_edge(self, edge: Edge):
    self.edges.append(edge)

  ...
# Edge.py
from graph.Node import Node
class Edge:
  def __init__(self, node: Node, weight: int):
    self.node: Node = node
    self.weight: int = weight

This of course presents a problem, because we now have a circular dependency. In our attempt to make the code more type-safe, we've actually introduced something that will crash at runtime.
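
To make the crash concrete, here is a minimal sketch that reproduces it outside the graph package. The module names demo_node and demo_edge are hypothetical stand-ins for Node.py and Edge.py; they are written to a temporary directory only so the example is self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-ins for Node.py and Edge.py from the question.
NODE_SRC = """\
from demo_edge import Edge

class Node:
    def __init__(self):
        self.edges: list[Edge] = []
"""

EDGE_SRC = """\
from demo_node import Node

class Edge:
    def __init__(self, node: Node, weight: int):
        self.node = node
        self.weight = weight
"""


def try_circular_import() -> str:
    """Write both modules to a temp directory and try to import one of them."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_node.py").write_text(NODE_SRC)
        Path(tmp, "demo_edge.py").write_text(EDGE_SRC)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("demo_node")
            return "imported"
        except ImportError as exc:
            return f"ImportError: {exc}"
        finally:
            # Clean up so the demo leaves no trace in the interpreter.
            sys.path.remove(tmp)
            sys.modules.pop("demo_node", None)
            sys.modules.pop("demo_edge", None)


result = try_circular_import()
print(result)
```

The import fails because demo_edge asks for the name Node while demo_node is still only partially initialised.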

The most common solution I've seen posted for this is to use if TYPE_CHECKING. But this will cause the typechecker to miss a very common, and critical case - where someone tries to use the import for something other than type hinting. Let's take an example:

# Node.py
if TYPE_CHECKING:
  from graph.Edge import Edge
class Node:
  def __init__(self):
    self.edges: List[Edge] = [] # Will be a list of Edge objects

  def add_edge(self, edge: Edge):
    self.edges.append(edge)

This prevents the circular import, and running pyright or pyre will show no errors, since the typing is correct. But now suppose someone adds a new method to the Node class like the following:

# Node.py
if TYPE_CHECKING:
  from graph.Edge import Edge
class Node:
  def __init__(self):
    self.edges: List[Edge] = [] # Will be a list of Edge objects

  def add_edge(self, edge: Edge):
    self.edges.append(edge)

  def easy_add_edge(self, node: "Node", weight: int):
    self.add_edge(Edge(node, weight))

The developer then runs pyright, which confirms that there are no type errors in this method. They push it to production, and suddenly everything breaks. Since the import for Edge was inside if TYPE_CHECKING, which is False at runtime, Python will fail to resolve Edge in self.add_edge(Edge(node, weight)) and raise a NameError. But the typechecker didn't pick up on this because when it ran, TYPE_CHECKING was True!
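
The failure can be reproduced in a single file. This is only a sketch: the annotations are quoted so the class definition itself succeeds on any Python version, and the graph package is assumed not to be importable at all, which does not matter because the guarded import never runs.

```python
from typing import TYPE_CHECKING, List

if TYPE_CHECKING:
    # Seen only by the type checker. This line never runs in the interpreter,
    # so the graph package does not even need to exist for this demo.
    from graph.Edge import Edge


class Node:
    def __init__(self) -> None:
        self.edges: List["Edge"] = []

    def add_edge(self, edge: "Edge") -> None:
        self.edges.append(edge)

    def easy_add_edge(self, node: "Node", weight: int) -> None:
        # Runtime use of a name that was imported for type checking only.
        self.add_edge(Edge(node, weight))


def call_easy_add_edge() -> str:
    """Call the method and report what happens at runtime."""
    try:
        Node().easy_add_edge(Node(), 1)
        return "ok"
    except NameError as exc:
        return f"NameError: {exc}"


outcome = call_easy_add_edge()
print(outcome)
```

The class is created without complaint; the error only appears when easy_add_edge is actually called.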

You might be thinking "why didn't the developer write tests for that method", and you'd be right, but this is exactly the type of bug that typechecking is supposed to solve for! For most developers coming from other languages, the typechecker would be the test for that type of bug!

Without knowing a ton about the inner workings of the python typehinting system, the solution would normally seem obvious - typehint using fully qualified names and never go near the TYPE_CHECKING variable (we could even write a lint rule to catch cases where people are doing this). But this is the part of the problem where I've hit a wall, as I can't figure out if this is something that is, or ever will be supported in Python.

So the question: How can I safely typecheck code in python in a way that both does not introduce circular dependencies, and also ensures developers aren't trying to use undeclared types at runtime? Can fully-qualified names be used, and if so how?

Note: It is not reasonable to place everything that may need a type hinted dependency in the same file, or even the same package. Separation of concerns into multiple packages is something that is expected on any engineering team for almost all production sized applications. I know there is a way to do this if everything is in the same file, but there's no way I can convert my 200k line Django application into a flat-structured single file app.

Asked Nov 19, 2024 at 18:11 by Ephraim; edited Nov 19, 2024 at 18:32 by wjandrea.
  • Just thinking out loud: Have you considered using stubs? There are pros and cons, but I just wanted to put that out there. Have you considered using interfaces/ABCs instead of concrete classes? – wjandrea Commented Nov 19, 2024 at 18:40
  • Quotes. Use "Edge" instead of Edge in your hints when the name isn't available at runtime. That does only work for hints -- your code using Edge(node, weight) at runtime means you have a separate "real" problem that has nothing to do with type hints / type checking. – Charles Duffy Commented Nov 19, 2024 at 18:41
  • @CharlesDuffy I'm aware of python forward references, but this still requires importing the parent package of the imported class, doesn't it? The main issue is that if I want to have separated concerns, quotes don't seem to work (at least as far as I have been able to figure out) unless I just have one giant flat structured package. For example, if Edge was in an "edges" package, and Node was in a "node" package, I would still have to import edges in Node.py, and import node in Edge.py - which from my understanding still creates a circular dependency. Is my premise on this incorrect? – Ephraim Commented Nov 19, 2024 at 18:51
  • @CharlesDuffy - Maybe this is a question of semantics, but isn't the issue of Edge(node, weight) still a type checking issue? The "true" type of Edge for use at runtime is actually undefined. But the typechecker is "tricked" by the imports in if TYPE_CHECKING into thinking it is the class inside of Edge.py. In reality, the edge in add_edge(edge) and Edge() are in fact two different types within the scope of Node.py. One is the Edge class, and the other is undefined. The problem is that the typechecker thinks they are the same type, which they aren't at runtime. – Ephraim Commented Nov 19, 2024 at 18:59
  • @Ephraim, you'd still have the issue of being unable to call graph.Edge without importing it first at runtime in a version of Python that had no support for type hints at all. I'd argue that that makes it unambiguously not a type-checking issue. – Charles Duffy Commented Nov 19, 2024 at 19:03
3 Answers

Using qualified names to refer to symbols doesn't work and will probably never work, simply because Python doesn't work that way. imports find and run files when and only when they are executed; this is why you need TYPE_CHECKING to begin with.

(As an aside, annotations will be evaluated lazily from Python 3.14 onwards, so update as soon as possible.)

The solution here is, guess what, to also use a linter: Ruff. It has two powerful rules that we will be utilizing: missing required imports (I002) and runtime import in TYPE_CHECKING block (TCH004).

First, put this in a ruff.toml file in the project root (see the docs for advanced usages):

[lint]
select = ["I002", "TCH004"]

[lint.isort]
required-imports = ["from __future__ import annotations"]

After that, go back to Node.py. You should then see something similar to this:

# I002: Missing required import: `from __future__ import annotations`

from typing import TYPE_CHECKING, List

if TYPE_CHECKING:
  from graph.Edge import Edge
  #                      ^^^^
  # TCH004: Import is used for more than type hinting.

class Node:

  def __init__(self):
    # This usage counts as "type hinting" (with the required import).
    #                vvvv
    self.edges: List[Edge] = []

  # This too (with the required import).
  #                        vvvv
  def add_edge(self, edge: Edge):
    self.edges.append(edge)

  # And even this quoted one (with or without).
  #                             vvvvvv
  def easy_add_edge(self, node: "Node", weight: int):
    self.add_edge(Edge(node, weight))
    #             ^^^^ But not this.
    # Try adding the required import above, then removing it.

I use Ruff here because I'm most familiar with it, but you could replace it with any other linter (Pylint, Flake8, etc.) that has equivalent capabilities.

One way to tackle this might be generics. In the initial version of your code you don't need Node and Edge to fully "know" about each other, just to store references to some other generic object. In the later iteration with easy_add_edge, Node still doesn't need Edge in its entirety, it just needs to know the signature of its constructor; you can accomplish that by bounding the generic to a Protocol. (If this feels weird and cumbersome, think of it as declaring an interface in a header file in another language -- header files exist so that implementation files can share interfaces without circularly including each other).

Here's an example where an EdgeProto protocol is declared in a Protos.py file that can be imported anywhere else (since it does not itself import anything else), and Node and Edge have no direct dependencies on each other:

# Protos.py
from typing import Protocol
class EdgeProto[N](Protocol):
    def __init__(self, node: N, weight: int) -> None: ...

# Node.py
from typing import Generic, Type, TypeVar
# from .Protos import EdgeProto
E = TypeVar("E", bound=EdgeProto)

class Node(Generic[E]):
  def __init__(self, edge_cls: Type[E]) -> None:
    self.edges: list[E] = []
    self.edge_cls = edge_cls

  def add_edge(self, edge: E):
    self.edges.append(edge)
    
  def easy_add_edge(self, node: "Node", weight: int):
    self.add_edge(self.edge_cls(node, weight))

# Edge.py
# from .Protos import EdgeProto

class Edge[N](EdgeProto):
  def __init__(self, node: N, weight: int) -> None:
    self.node = node
    self.weight = weight
    
# Graph.py
# from .Edge import Edge
# from .Node import Node

class Graph:
  def __init__(self) -> None:
    self.root = Node(Edge)
    self.root.add_edge(Edge(Node(Edge), 10))

g = Graph()

Note that Graph does know about both classes, and at the time it constructs an actual Node it injects the actual Edge class so as to allow it to construct objects of that type itself.

Try it out here: https://mypy-play.net/?mypy=latest&python=3.12&gist=6c045eba0b6a4c8931e862535b6fc723

There are two separate issues that need to be addressed.

  1. Forward references: how can type annotations refer to an object that does not exist yet?
  2. Circular imports: how can two tightly coupled modules use names from each other?

Forward references are critical to the answer. Circular imports are only required because of how you have chosen to structure your code, and could be avoided with a flatter structure.

Forward References

There is a simpler example that gets straight to the issue: a linked list.

class LinkedList:
    def __init__(self, head, tail=None):
        self.head = head
        assert tail is None or isinstance(tail, LinkedList) 
        self.tail = tail

How do we annotate tail to show that it must be an instance of LinkedList? At the time of creation of the __init__ function, the name LinkedList does not exist. So python is unable to evaluate the name LinkedList for the type annotation.

Statically typed, compiled languages typically solve this by processing the source in two stages, which is what allows a class to type its functions as taking or returning an instance of itself. The first stage parses the source file to find all the declared symbols. The second stage compiles the source fully, using the knowledge from the first stage to identify and fill in any forward references.

Python does not do things this way. This gives it more flexibility, but also causes its own issues. To deal with forward references in Python, you can either provide a string that can be eval-ed to the appropriate type at a later stage, or put from __future__ import annotations at the top of your module, which makes Python store every annotation in that module as a string for you.

Using stringified type annotations

class LinkedList:
    def __init__(self, head: int, tail: 'LinkedList | None') -> None:
        self.head = head
        self.tail = tail

from typing import get_type_hints
hints = get_type_hints(LinkedList.__init__)
# NB. tail annotation is not a string in this dict (it is still a string in the actual
# annotation data)
assert hints == {'head': int, 'tail': LinkedList | None, 'return': type(None)}

From Python 3.14, annotations are processed differently: evaluation is deferred until the annotations are requested, so string annotations are no longer needed for forward references. This would have been done earlier, but there were implementation concerns about how annotations would be able to refer to scopes that no longer existed (i.e. function bodies), or scopes that had changed how their names are referenced (i.e. class bodies).

Circular Imports

Classes that need to refer to each other are usually tightly coupled enough that they should be defined in the same module. However, this is not always desirable. For instance, if each class is very large. By using fully qualified names, you can defer having to reference the tightly coupled classes until they both exist. This takes advantage of how python deals with modules that are in circular imports. When this happens, python makes the module available, but makes no guarantees about the contents of the modules until both modules have finished importing.

Given:

a.py

import b
A = 1

b.py

import a
B = 2

If a gets imported first (by a separate __main__ module), then the execution order will be:

  1. Start import of a (from evaluating a statement in __main__)
  2. Create an uninitialised module called a, and then start executing its source.
  3. Start import of b (when evaluating the first statement of a.py [import b])
  4. Create an uninitialised module called b, and then start executing its source.
  5. Try to import a, but detect that it is currently importing. Instead, store the uninitialised module a in name a in module b (when evaluating the first statement of b.py [import a])
  6. Set b.B equal to 2 (second statement of b.py)
  7. Finish import of b
  8. Continue import of a
  9. Set a.A equal to 1 (second statement of a.py)
  10. Finish import of a
  11. END
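
The steps above can be checked with a small sketch. The module names demo_a and demo_b are hypothetical stand-ins for a.py and b.py, and demo_b records one extra flag (not in the original files) showing whether a's A was already visible during step 5.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-ins for a.py and b.py.
A_SRC = "import demo_b\nA = 1\n"
B_SRC = (
    "import demo_a\n"
    "# demo_a is only partially initialised at this point.\n"
    "A_VISIBLE_DURING_IMPORT = hasattr(demo_a, 'A')\n"
    "B = 2\n"
)


def import_cycle():
    """Import demo_a first and report what each module ends up containing."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_a.py").write_text(A_SRC)
        Path(tmp, "demo_b.py").write_text(B_SRC)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            demo_a = importlib.import_module("demo_a")
            demo_b = sys.modules["demo_b"]
            return (
                demo_a.A,
                demo_b.B,
                demo_b.A_VISIBLE_DURING_IMPORT,
                demo_b.demo_a is demo_a,
            )
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_a", None)
            sys.modules.pop("demo_b", None)


result = import_cycle()
print(result)
```

Both modules finish importing, b holds a reference to the very same module object a, and A was not yet defined while b was executing.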

This works for runtime uses of the class. However, for type annotations, you need to use string forward references as well.

Fully Qualified Names

# from edge import Edge  # would produce an import error
# instead:
import edge

class Node:
   # NB: type annotations can be fully or partially stringified
   # Here we use a partial string, just for the class that does not exist yet
   def __init__(self) -> None:
       self.edges: list['edge.Edge'] = []

   # Using a fully qualified name means that by the time we get to running
   # this function, `edge.Edge` will exist, even though it did not exist when
   # the function was created at import time
   def add_edge(self, weight: float) -> 'edge.Edge':
       # Use a different local name so the `edge` module is not shadowed
       new_edge = edge.Edge(self, weight)
       self.edges.append(new_edge)
       return new_edge
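
Here is a hedged end-to-end sketch of that pattern with both halves of the cycle present. The module names qual_node and qual_edge are hypothetical, and the Edge class is an assumed counterpart to the Node class above; both files are written to a temporary directory so the example runs on its own.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import get_type_hints

# Hypothetical modules: each imports the other module, never a name from it.
NODE_SRC = '''\
import qual_edge

class Node:
    def __init__(self) -> None:
        self.edges: list["qual_edge.Edge"] = []

    def add_edge(self, weight: float) -> "qual_edge.Edge":
        new_edge = qual_edge.Edge(self, weight)
        self.edges.append(new_edge)
        return new_edge
'''

EDGE_SRC = '''\
import qual_node

class Edge:
    def __init__(self, node: "qual_node.Node", weight: float) -> None:
        self.node = node
        self.weight = weight
'''


def build_graph():
    """Import the cyclic modules, create an edge, and resolve a type hint."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "qual_node.py").write_text(NODE_SRC)
        Path(tmp, "qual_edge.py").write_text(EDGE_SRC)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            qual_node = importlib.import_module("qual_node")
            qual_edge = sys.modules["qual_edge"]
            node = qual_node.Node()
            edge = node.add_edge(2.5)
            hints = get_type_hints(qual_node.Node.add_edge)
            return (
                type(edge).__name__,
                edge.node is node,
                node.edges == [edge],
                hints["return"] is qual_edge.Edge,
            )
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("qual_node", None)
            sys.modules.pop("qual_edge", None)


result = build_graph()
print(result)
```

The runtime call works because qual_edge.Edge is looked up only when add_edge runs, and the quoted annotation resolves because the module object is in the function's globals.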

Tags: Python typehinting packages without creating circular imports - Stack Overflow