ruff/crates/ty_python_semantic at 4b80f5fa4f08dd0948761e15e36f5138658793e4 - language-servers/ruff

mirror of https://github.com/astral-sh/ruff.git synced 2025-10-22 08:12:17 +00:00

History

David Peter 4b80f5fa4f [ty] Optimize TDD atom ordering (#20098 ) ## Summary While looking at some logging output that I added to `ReachabilityConstraintBuilder::add_and_constraint` in order to debug https://github.com/astral-sh/ty/issues/1091, I noticed that it seemed to suggest that the TDD was built in an imbalanced way for code like the following, where we have a sequence of non-nested `if` conditions: ```py def f(t1, t2, t3, t4, …): x = 0 if t1: x = 1 if t2: x = 2 if t3: x = 3 if t4: x = 4 … ``` To understand this a bit better, I added some code to the `ReachabilityConstraintBuilder` to render the resulting TDD. On `main`, we get a tree that looks like the following, where you can see a pattern of N sub-trees that grow linearly with N (number of `if` statements). This results in an overall tree structure that has N² nodes (see graph below): <img alt="normal order" src="https://github.com/user-attachments/assets/aab40ce9-e82a-4fcd-823a-811f05f15f66" /> If we zoom in to one of these subgraphs, we can see what the problem is. When we add new constraints that represent combinations like `t1 AND ~t2 AND ~t3 AND t4 AND …`, they start with the evaluation of "early" conditions (`t1`, `t2`, …). This means that we have to create new subgraphs for each new `if` condition because there is little sharing with the previous structure. We evaluate the Boolean condition in a right-associative way: `t1 AND (~t2 AND (~t3 AND t4)))`: <img width="500" align="center" src="https://github.com/user-attachments/assets/31ea7182-9e00-4975-83df-d980464f545d" /> If we change the ordering of TDD atoms, we can change that to a left-associative evaluation: `(((t1 AND ~t2) AND ~t3) AND t4) …`. This means that we can re-use previous subgraphs `(t1 AND ~t2)`, which results in a much more compact graph structure overall (note how "late" conditions are now at the top, and "early" conditions are further down in the graph): <img alt="reverse order" src="https://github.com/user-attachments/assets/96a6b7c1-3d35-4192-a917-0b2d24c6b144" /> If we count the number of TDD nodes for a growing number if `if` statements, we can see that this change results in a slower growth. It's worth noting that the growth is still superlinear, though: <img width="800" height="600" alt="plot" src="https://github.com/user-attachments/assets/22e8394f-e74e-4a9e-9687-0d41f94f2303" /> On the actual code from the referenced ticket (the `t_main.py` file reduced to its main function, with the main function limited to 2000 lines instead of 11000 to allow the version on `main` to run to completion), the effect is much more dramatic. Instead of 26 million TDD nodes (`main`), we now only create 250 thousand (this branch), which is slightly less than 1%. The change in this PR allows us to build the semantic index and type-check the problematic `t_main.py` file in https://github.com/astral-sh/ty/issues/1091 in 9 seconds. This is still not great, but an obvious improvement compared to running out of memory after minutes of execution. An open question remains whether this change is beneficial for all kinds of code patterns, or just this linear sequence of `if` statements. It does not seem unreasonable to think that referring to "earlier" conditions is generally a good idea, but I learned from Doug that it's generally not possible to find a TDD-construction heuristic that is non-pathological for all kinds of inputs. Fortunately, it seems like this change here results in performance improvements across all of our benchmarks, which should increase the confidence in this change: \| Benchmark \| Improvement \| \|---------------------\|-------------------------\| \| hydra-zen \| +13% \| \| DateType \| +5% \| \| sympy (walltime) \| +4% \| \| attrs \| +4% \| \| pydantic (walltime) \| +2% \| \| pandas (walltime) \| +2% \| \| altair (walltime) \| +2% \| \| static-frame \| +2% \| \| anyio \| +1% \| \| freqtrade \| +1% \| \| colour-science \| +1% \| \| tanjun \| +1% \| closes https://github.com/astral-sh/ty/issues/1091 --------- Co-authored-by: Douglas Creager <dcreager@dcreager.net>		2025-08-27 20:42:09 +02:00
..
resources	[ty] Preserve qualifiers when accessing attributes on unions/intersections (#20114 )	2025-08-27 20:01:45 +02:00
src	[ty] Optimize TDD atom ordering (#20098 )	2025-08-27 20:42:09 +02:00
tests	[ty] Track open files in the server (#19264 )	2025-07-18 19:33:35 +05:30
build.rs
Cargo.toml	[ty] Avoid overcounting shared memory usage (#19773 )	2025-08-06 15:32:02 -04:00
mdtest.py
mdtest.py.lock