In the free threaded build, the `_PyObject_LookupSpecial()` call can lead to
reference count contention on the returned function object becuase it
doesn't use stackrefs. Refactor some of the callers to use
`_PyObject_MaybeCallSpecialNoArgs`, which uses stackrefs internally.
This fixes the scaling bottleneck in the "lookup_special" microbenchmark
in `ftscalingbench.py`. However, the are still some uses of
`_PyObject_LookupSpecial()` that need to be addressed in future PRs.
Move pycore_obmalloc.h include from pycore_interp_structs.h to
pycore_runtime_structs.h.
Add also comment explaining the purpose of each include in
pycore_interp_structs.h, pycore_runtime_structs.h and
pycore_structs.h.
Remove <stdbool.h> and <stddef.h> from pycore_structs.h.