I've been scouring the internet trying to find a solution to this, and I have yet to find an answer despite this seeming like something that should be a very common use case:
How can I safely type check methods or arguments in Python across different files or packages without creating a circular dependency?
Let me give an example of something that should work, and does work in most other strongly typed languages (e.g. C/C++, Java, and even Hack), but does not seem to have a sane solution in Python. Let's take the example of a very common data structure: a graph.
I want to organize the structure into multiple files to separate concerns. In duck-typed Python, it might look like this:
# Node.py
class Node:
    def __init__(self):
        self.edges = []  # Will be a list of Edge objects

    def add_edge(self, edge):
        self.edges.append(edge)
    ...

# Edge.py
class Edge:
    def __init__(self, node, weight):
        self.node = node
        self.weight = weight
Now in the above example, neither object needs to import the other, so everything works. At runtime, another class such as Graph might be responsible for creating nodes and edges, and adding the edges to the nodes. No circular dependency so far.
But suppose I want to implement typechecking to ensure no one accidentally attempts to pass an integer type to add_edge, or a string to "Edge()". I would type it like the following:
# Node.py
from typing import List
from graph.Edge import Edge

class Node:
    def __init__(self):
        self.edges: List[Edge] = []  # Will be a list of Edge objects

    def add_edge(self, edge: Edge):
        self.edges.append(edge)
    ...

# Edge.py
from graph.Node import Node

class Edge:
    def __init__(self, node: Node, weight: int):
        self.node: Node = node
        self.weight: int = weight
This of course presents a problem, because we now have a circular dependency. In our attempt to make the code more type-safe, we've actually introduced something that will crash at runtime.
The most common solution I've seen posted for this is to use if TYPE_CHECKING. But this will cause the typechecker to miss a very common and critical case: where someone tries to use the import for something other than type hinting. Let's take an example:
# Node.py
from typing import TYPE_CHECKING, List

if TYPE_CHECKING:
    from graph.Edge import Edge

class Node:
    def __init__(self):
        self.edges: List[Edge] = []  # Will be a list of Edge objects

    def add_edge(self, edge: Edge):
        self.edges.append(edge)
This prevents the circular import, and running pyright or pyre will show no errors, since the typing is correct. But now suppose someone adds a new method to the Node class like the following:
# Node.py
from typing import TYPE_CHECKING, List

if TYPE_CHECKING:
    from graph.Edge import Edge

class Node:
    def __init__(self):
        self.edges: List[Edge] = []  # Will be a list of Edge objects

    def add_edge(self, edge: Edge):
        self.edges.append(edge)

    def easy_add_edge(self, node: "Node", weight: int):
        self.add_edge(Edge(node, weight))

The developer then runs pyright, which confirms that there are no type errors in this method. They push it to production, and suddenly everything breaks. Since the import for Edge was inside if TYPE_CHECKING, which is False at runtime, Python will fail to resolve Edge in self.add_edge(Edge(node, weight)) and raise a NameError. But the typechecker didn't pick up on this because when it ran, TYPE_CHECKING was True!
You might be thinking "why didn't the developer write tests for that method", and you'd be right, but this is exactly the type of bug that typechecking is supposed to solve for! For most developers coming from other languages, the typechecker would be the test for that type of bug!
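The failure mode described above can be reproduced in a single file. The following is a minimal sketch, not the original Node/Edge code: it assumes the standard-library decimal.Decimal as a stand-in for Edge, and the function names describe and make_default are hypothetical. The quoted annotations are never evaluated at runtime, so only the call in the function body fails.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by the type checker only; this import never runs at runtime.
    from decimal import Decimal


def describe(value: "Decimal") -> str:
    # Annotation-only use: safe, because the quoted hint is not evaluated.
    return f"value={value}"


def make_default() -> "Decimal":
    # Runtime use: the name Decimal was never bound in this module.
    return Decimal("0")


try:
    make_default()
except NameError as exc:
    runtime_error = exc
else:
    runtime_error = None

print(type(runtime_error).__name__)
```

A type checker treats TYPE_CHECKING as True and therefore resolves Decimal in both functions, which is why a separate lint rule or a test is needed to catch the second one.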
Without knowing a ton about the inner workings of the python typehinting system, the solution would normally seem obvious - typehint using fully qualified names and never go near the TYPE_CHECKING variable (we could even write a lint rule to catch cases where people are doing this). But this is the part of the problem where I've hit a wall, as I can't figure out if this is something that is, or ever will be supported in Python.
So the question: How can I safely typecheck code in python in a way that both does not introduce circular dependencies, and also ensures developers aren't trying to use undeclared types at runtime? Can fully-qualified names be used, and if so how?
Note: It is not reasonable to place everything that may need a type hinted dependency in the same file, or even the same package. Separation of concerns into multiple packages is something that is expected on any engineering team for almost all production sized applications. I know there is a way to do this if everything is in the same file, but there's no way I can convert my 200k line Django application into a flat-structured single file app.
Asked Nov 19, 2024 at 18:11 by Ephraim; edited Nov 19, 2024 at 18:32 by wjandrea.
3 Answers
Using qualified names to refer to symbols doesn't work and will probably never work, simply because Python doesn't work that way. imports find and run files when and only when they are executed; this is why you need TYPE_CHECKING to begin with.
(As an aside, annotations will be evaluated lazily from Python 3.14 onwards, so update as soon as possible.)
The solution here is, guess what, to also use a linter: Ruff. It has two powerful rules that we will be utilizing: missing required imports (I002) and runtime import in TYPE_CHECKING block (TCH004).
First, put this in a ruff.toml file in the project root (see the docs for advanced usages):
[lint]
select = ["I002", "TCH004"]
[lint.isort]
required-imports = ["from __future__ import annotations"]
After that, go back to Node.py. You should then see something similar to this:
# I002: Missing required import: `from __future__ import annotations`
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from graph.Edge import Edge
    #                      ^^^^
    # TCH004: Import is used for more than type hinting.

class Node:
    def __init__(self):
        # This usage counts as "type hinting" (with the required import).
        #                vvvv
        self.edges: List[Edge] = []

    # This too (with the required import).
    #                        vvvv
    def add_edge(self, edge: Edge):
        self.edges.append(edge)

    # And even this quoted one (with or without).
    #                             vvvvvv
    def easy_add_edge(self, node: "Node", weight: int):
        self.add_edge(Edge(node, weight))
        #             ^^^^ But not this.

# Try adding the required import above, then removing it.
I use Ruff here because I'm most familiar with it, but you could replace it with any other linter (Pylint, Flake8, etc.) that has equivalent capabilities.
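To make the idea behind such a rule concrete, here is a rough sketch of the check using only the standard-library ast module. It is an illustration under simplifying assumptions, not how Ruff implements TCH004: it only recognises a bare `if TYPE_CHECKING:` guard, it assumes annotations are deferred (for example via `from __future__ import annotations`), and the helper names typing_only_names, annotation_nodes, and runtime_uses are made up for this example.

```python
import ast

SOURCE = '''
from typing import TYPE_CHECKING, List

if TYPE_CHECKING:
    from graph.Edge import Edge

class Node:
    def __init__(self):
        self.edges: List[Edge] = []

    def add_edge(self, edge: Edge):
        self.edges.append(edge)

    def easy_add_edge(self, node: "Node", weight: int):
        self.add_edge(Edge(node, weight))
'''


def typing_only_names(tree):
    """Collect names bound by imports inside `if TYPE_CHECKING:` blocks."""
    names = set()
    for node in ast.walk(tree):
        is_guard = (
            isinstance(node, ast.If)
            and isinstance(node.test, ast.Name)
            and node.test.id == "TYPE_CHECKING"
        )
        if not is_guard:
            continue
        for stmt in node.body:
            if isinstance(stmt, (ast.Import, ast.ImportFrom)):
                for alias in stmt.names:
                    names.add((alias.asname or alias.name).split(".")[0])
    return names


def annotation_nodes(tree):
    """Return the ids of all AST nodes that sit inside an annotation."""
    ids = set()
    for node in ast.walk(tree):
        roots = []
        if isinstance(node, ast.arg) and node.annotation is not None:
            roots.append(node.annotation)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.returns is not None:
                roots.append(node.returns)
        elif isinstance(node, ast.AnnAssign):
            roots.append(node.annotation)
        for root in roots:
            ids.update(id(child) for child in ast.walk(root))
    return ids


def runtime_uses(source):
    """List (line, name) pairs where a typing-only name is read outside an annotation."""
    tree = ast.parse(source)
    guarded = typing_only_names(tree)
    in_annotation = annotation_nodes(tree)
    return sorted(
        (node.lineno, node.id)
        for node in ast.walk(tree)
        if isinstance(node, ast.Name)
        and isinstance(node.ctx, ast.Load)
        and node.id in guarded
        and id(node) not in in_annotation
    )


findings = runtime_uses(SOURCE)
print(findings)
```

Only the call inside easy_add_edge is reported; the two annotation uses of Edge are ignored. A real linter handles many more cases (aliased guards, quoted annotations, nested scopes), so use an existing tool rather than a sketch like this in practice.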
One way to tackle this might be generics. In the initial version of your code you don't need Node and Edge to fully "know" about each other, just to store references to some other generic object. In the later iteration with easy_add_edge, Node still doesn't need Edge in its entirety, it just needs to know the signature of its constructor; you can accomplish that by bounding the generic to a Protocol. (If this feels weird and cumbersome, think of it as declaring an interface in a header file in another language -- header files exist so that implementation files can share interfaces without circularly including each other).
Here's an example where an EdgeProto protocol is declared in a Protos.py file that can be imported anywhere else (since it does not itself import anything else), and Node and Edge have no direct dependencies on each other:
# Protos.py
from typing import Protocol

class EdgeProto[N](Protocol):
    def __init__(self, node: N, weight: int) -> None: ...

# Node.py
from typing import Generic, Type, TypeVar
# from .Protos import EdgeProto

E = TypeVar("E", bound=EdgeProto)

class Node(Generic[E]):
    def __init__(self, edge_cls: Type[E]) -> None:
        self.edges: list[E] = []
        self.edge_cls = edge_cls

    def add_edge(self, edge: E):
        self.edges.append(edge)

    def easy_add_edge(self, node: "Node", weight: int):
        self.add_edge(self.edge_cls(node, weight))

# Edge.py
# from .Protos import EdgeProto

class Edge[N](EdgeProto):
    def __init__(self, node: N, weight: int) -> None:
        self.node = node
        self.weight = weight

# Graph.py
# from .Edge import Edge
# from .Node import Node

class Graph:
    def __init__(self) -> None:
        self.root = Node(Edge)
        self.root.add_edge(Edge(Node(Edge), 10))

g = Graph()
Note that Graph does know about both classes, and at the time it constructs an actual Node it injects the actual Edge class so as to allow it to construct objects of that type itself.
Try it out here: https://mypy-play/?mypy=latest&python=3.12&gist=6c045eba0b6a4c8931e862535b6fc723
There are two separate issues that need to be addressed.
- Forward references: how can type annotations refer to an object that does not exist yet?
- Circular imports: how can two tightly coupled modules use names from each other?
Forward references are critical to the answer. Circular imports are required only because of how you have chosen to structure your code, and could be avoided with a flatter structure.
Forward References
There is a simpler example that gets straight to the issue: a linked list. For example:
class LinkedList:
    def __init__(self, head, tail=None):
        self.head = head
        assert tail is None or isinstance(tail, LinkedList)
        self.tail = tail
How do we annotate tail to show that it must be an instance of LinkedList? At the time of creation of the __init__ function, the name LinkedList does not exist. So Python is unable to evaluate the name LinkedList for the type annotation.
Strongly typed languages have a two-stage compilation process that is required for a class to be able to type its functions as taking or returning an instance of itself. The first stage parses the source file to find all the possible symbols. The second stage compiles the source fully. It uses knowledge from the first parse to help identify and appropriately fill in any forward references.
Python does not do things this way. This gives it more flexibility, but also causes its own issues. To deal with forward references in Python, you can either provide a string that can be eval-ed to the appropriate type at a later stage, or use from __future__ import annotations at the top of your module, which is just syntactic sugar for the former.
Using stringified type annotations
class LinkedList:
    def __init__(self, head: int, tail: 'LinkedList | None') -> None:
        self.head = head
        self.tail = tail

from typing import get_type_hints
hints = get_type_hints(LinkedList.__init__)
# NB. tail annotation is not a string in this dict (it is still a string in the actual
# annotation data)
assert hints == {'head': int, 'tail': LinkedList | None, 'return': type(None)}
From python 3.14, annotations are processed differently. Evaluation of the annotations is deferred until they are requested. No need to use string annotations any more. This would have been done earlier, but there were implementation concerns on how annotation would be able to refer to scopes that no longer existed (ie. function bodies), or scopes that had changed how their names are referenced (ie. class bodies).
Circular Imports
Classes that need to refer to each other are usually tightly coupled enough that they should be defined in the same module. However, this is not always desirable. For instance, if each class is very large. By using fully qualified names, you can defer having to reference the tightly coupled classes until they both exist. This takes advantage of how python deals with modules that are in circular imports. When this happens, python makes the module available, but makes no guarantees about the contents of the modules until both modules have finished importing.
Given:
# a.py
import b
A = 1

# b.py
import a
B = 2
If a gets imported first (by a separate __main__ module), then the execution order will be:
- Start import of a (from evaluating a statement in __main__).
- Create an uninitialised module called a, and then start executing its source.
- Start import of b (when evaluating the first statement of a.py [import b]).
- Create an uninitialised module called b, and then start executing its source.
- Try to import a, but detect that it is currently importing. Instead, store uninitialised module a in name a in module b (when evaluating the first statement of b.py [import a]).
- Set b.B equal to 2 (second statement of b.py).
- Finish import of b.
- Continue import of a.
- Set a.A equal to 1 (second statement of a.py).
- Finish import of a.
- END
This works for runtime uses of the class. However, for type annotations, you need to use string forward references as well.
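The import order described above can be observed directly. The sketch below writes two throwaway modules to a temporary directory and imports them; the module names demo_a and demo_b and the flag A_READY_DURING_IMPORT are invented for this demonstration and are not part of the original example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # demo_a imports demo_b before defining A, mirroring a.py above.
    Path(tmp, "demo_a.py").write_text("import demo_b\nA = 1\n")
    # demo_b records whether demo_a.A already exists while demo_b is executing.
    Path(tmp, "demo_b.py").write_text(
        "import demo_a\n"
        "A_READY_DURING_IMPORT = hasattr(demo_a, 'A')\n"
        "B = 2\n"
    )
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo_a = importlib.import_module("demo_a")
        demo_b = importlib.import_module("demo_b")
    finally:
        sys.path.remove(tmp)

# While demo_b was running, demo_a existed as a module but A was not set yet.
print(demo_b.A_READY_DURING_IMPORT)
# After both imports finish, everything is in place.
print(demo_a.A, demo_b.B, demo_b.demo_a is demo_a)
```

The half-initialised module is why `import a` succeeds inside b.py while `from a import A` at the same point would fail: the module object exists, but the attribute does not yet.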
Fully Qualified Names
# from edge import Edge  # would produce an import error
# instead:
import edge

class Node:
    # NB: type annotations can be fully or partially stringified
    # Here we use a partial string, just for the class that does not exist yet
    def __init__(self) -> None:
        self.edges: list['edge.Edge'] = []

    # Using a fully qualified name means that by the time we get to running
    # this function, `edge.Edge` will exist, even though it did not exist when
    # the function was created at import time
    def add_edge(self, weight: float) -> 'edge.Edge':
        # The local name must not be `edge`: assigning to `edge` here would
        # shadow the module and raise UnboundLocalError on `edge.Edge`
        new_edge = edge.Edge(self, weight)
        self.edges.append(new_edge)
        return new_edge
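As a quick check that this pattern holds up when both modules import each other, the following sketch builds a pair of temporary modules and exercises them. The module names demo_node and demo_edge are hypothetical stand-ins for node.py and edge.py, and the Edge class shown is an assumed counterpart, since the answer only shows the Node side.

```python
import importlib
import sys
import tempfile
import typing
from pathlib import Path

NODE_SRC = '''
import demo_edge

class Node:
    def __init__(self) -> None:
        self.edges: list = []

    def add_edge(self, weight: float) -> "demo_edge.Edge":
        new_edge = demo_edge.Edge(self, weight)
        self.edges.append(new_edge)
        return new_edge
'''

EDGE_SRC = '''
import demo_node

class Edge:
    def __init__(self, node: "demo_node.Node", weight: float) -> None:
        self.node = node
        self.weight = weight
'''

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_node.py").write_text(NODE_SRC)
    Path(tmp, "demo_edge.py").write_text(EDGE_SRC)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo_node = importlib.import_module("demo_node")
        demo_edge = importlib.import_module("demo_edge")
    finally:
        sys.path.remove(tmp)

# Runtime use works: the qualified name is looked up when the method runs.
node = demo_node.Node()
created = node.add_edge(2.5)

# The string annotations resolve to the real classes once both modules exist.
return_hint = typing.get_type_hints(demo_node.Node.add_edge)["return"]
node_hint = typing.get_type_hints(demo_edge.Edge.__init__)["node"]
print(return_hint.__name__, node_hint.__name__)
```

Because neither module uses `from ... import ...` on the other, the circular import completes, and the qualified names are only resolved after both modules are fully initialised.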
Comments on the question (the start of each comment was lost in extraction and is marked with […]):
- […] "Edge" instead of Edge in your hints when the name isn't available at runtime. That does only work for hints -- your code using Edge(node, weight) at runtime means you have a separate "real" problem that has nothing to do with type hints / type checking. – Charles Duffy, Nov 19, 2024 at 18:41
- […] import edges in Node.py, and import node in Edge.py - which from my understanding still creates a circular dependency. Is my premise on this incorrect – Ephraim, Nov 19, 2024 at 18:51
- […] Edge(node, weight) still a type checking issue? The "true" type of Edge for use in runtime is actually undefined. But the typechecker is "tricked" by the imports in if TYPE_CHECKING into thinking it is the class inside of Edge.py. In reality, the edge in add_edge(edge) and Edge() are in fact two different types within the scope of Node.py. One is the Edge class, and the other is undefined. The problem is that the typechecker thinks they are the same type which they aren't at runtime. – Ephraim, Nov 19, 2024 at 18:59
- […] graph.Edge without importing it first at runtime in a version of Python that had no support for type hints at all. I'd argue that that makes it unambiguously not a type-checking issue. – Charles Duffy, Nov 19, 2024 at 19:03