Imagine there is a framework which provides a method called logutils.set_up() which sets up the logging according to some config.
Setting up the logging should be done as early as possible, since warnings emitted while importing libraries should not be lost.
Since the old way (if __name__=='__main__':) looks ugly, we use console_script entrypoints to register the main() method.
# foo/daily_report.py
from framework import logutils
logutils.set_up()
def main():...
My problem is that logutils.set_up() might be called twice:
Imagine there is a second console script which calls logutils.set_up() and imports daily_report.py.
I can change the framework code and set_up() to do nothing in the second call to logutils.set_up(), but this feels clumsy. I would like to avoid it.
How can I be sure that logutils.set_up() is executed only once?
There are a few ways to achieve the goal, each with its advantages and disadvantages.
(Some of these overlap with the other answers. I don't mean to plagiarize, only to provide a comprehensive answer.)
Approach 1: The function should do it
One way to guarantee a function only gets executed once is to make the function itself stateful, so that it "remembers" it has already been called. This is more or less what is described by @eestrada and @qarma.
As to implementing this, I agree with @qarma that using memoization is the simplest and most idiomatic way. There are a few simple memoization decorators for Python on the internet. The one included in the standard library is functools.lru_cache. You can simply use it like:
import functools
@functools.lru_cache(maxsize=None)  # the explicit call form also works on Python versions before 3.8
def set_up():  # this is your original set_up() function, now decorated
    <...same as before...>
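To see the effect in isolation, here is a small self-contained sketch. The setup_calls list and the body of set_up are stand-ins I made up for illustration, not the framework's actual code:

```python
import functools

setup_calls = []  # hypothetical record of how often the body really ran


@functools.lru_cache(maxsize=None)
def set_up():
    # Stand-in for the real logging setup (the side effect we want only once).
    setup_calls.append("configured")


set_up()  # first call: the body runs and the (None) result is cached
set_up()  # second call: answered from the cache, the body is skipped

print(len(setup_calls))          # prints 1
print(set_up.cache_info().hits)  # prints 1
```

Note that this only works as a run-once guard because set_up takes no arguments; a call with different arguments would be a cache miss and run the body again.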
The disadvantage here is that it is arguably not set_up's responsibility to maintain the state; it is merely a function. One can argue that it should execute twice if it is called twice, and that it is the caller's responsibility to call it only when needed (what if you really do want to run it twice?). The general argument is that a function (in order to be useful and reusable) should not make assumptions about the context in which it is called.
Is this argument valid in your case? It is up to you to decide.
Another disadvantage here is that this can be considered an abuse of the memoization tool. Memoization is closely related to functional programming and should be applied to pure functions. Memoizing a function implies "no need to run it again, because we already know the result", and not "no need to run it again, because there's some side effect we want to avoid".
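If the memoization semantics feel like a stretch, the same "remember that it already ran" idea can be spelled out as an explicit guard. This is only a minimal sketch: run_once is a hypothetical helper name, set_up is a stand-in for the framework function, and the guard is not thread-safe.

```python
import functools


def run_once(func):
    """Hypothetical decorator: run func on the first call, do nothing afterwards."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if wrapper.has_run:
            return None  # already executed once, skip the side effect
        wrapper.has_run = True
        return func(*args, **kwargs)

    wrapper.has_run = False
    return wrapper


calls = []  # stand-in for the observable side effect


@run_once
def set_up():
    # Stand-in for logutils.set_up()
    calls.append("set_up")


set_up()
set_up()
print(len(calls))  # prints 1
```

Compared to lru_cache, the intent ("skip the side effect") is stated directly, and resetting set_up.has_run to False gives you a way to run the setup again on purpose.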
Approach 2: the one you think is ugly (if __name__=='__main__')
The most common pythonic way, which you already mention in your question, is using the infamous if __name__=='__main__' construct.
This guarantees the function is only called once, because it is only called from the module named __main__, and the interpreter guarantees there is only one such module in your process.
This works. There are no complications or caveats. This is how main code (including setup code) is run in Python. It is considered pythonic simply because it is so darn common in Python (since there are no better ways).
The only disadvantage is that it is arguably ugly (aesthetics-wise, not code-quality-wise). I admit I also winced the first few times I saw it or wrote it, but it grows on you.
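For completeness, a minimal sketch of this layout. Here set_up stands in for logutils.set_up(), and setup_calls and the return value of main are made up for illustration:

```python
import logging

setup_calls = []  # hypothetical record of setup executions


def set_up():
    # Stand-in for logutils.set_up()
    logging.basicConfig(level=logging.INFO)
    setup_calls.append("configured")


def main():
    logging.getLogger(__name__).info("building the daily report")
    return "report done"


if __name__ == "__main__":
    # Runs only when this file is the program's entry module,
    # never when another script merely imports it.
    set_up()
    main()
```

A second script that imports this module gets main() without triggering set_up(), so it stays free to do its own setup exactly once.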
Approach 3: leverage python's module-importing mechanism
Python already has a caching mechanism that prevents modules from being imported twice. You can leverage this mechanism by running the setup code in a new module and then importing it. This is similar to @rll's answer. This is simple to do:
# logging_setup.py
from framework import logutils
logutils.set_up()
Now, each caller can run this by importing the new module:
# foo/daily_report.py
import logging_setup # side effect!
def main():...
Since a module is only imported once, set_up is only called once.
The disadvantage here is that it violates the "explicit is better than implicit" principle. I.e. if you want to call a function, call it. It isn't good practice to run code with side effects at module import time.
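The import caching can be checked with a throwaway experiment. In this sketch the module name logging_setup_demo and the runs.log file are hypothetical; the module body appends one line to the file every time it is actually executed:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway module whose top-level code has a visible side effect:
# each execution of the module body appends one line to runs.log.
tmp_dir = Path(tempfile.mkdtemp())
module_source = (
    "from pathlib import Path\n"
    "with open(Path(__file__).with_name('runs.log'), 'a') as fh:\n"
    "    fh.write('set_up called\\n')\n"
)
(tmp_dir / "logging_setup_demo.py").write_text(module_source)

sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()

first = importlib.import_module("logging_setup_demo")
second = importlib.import_module("logging_setup_demo")  # served from sys.modules

run_lines = (tmp_dir / "runs.log").read_text().splitlines()
print(first is second)  # prints True
print(len(run_lines))   # prints 1
```

The second import returns the module object already stored in sys.modules, so the module body (and with it the setup call) is not executed again.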
Approach 4: monkey patching
This is by far the worst of the approaches in this answer. Don't use it. But it is still a way to get the job done.
The idea is that if you don't want the function to get called after the first call, monkey-patch it (read: vandalize it) after the first call.
from framework import logutils
logutils.set_up_only_once()
Where set_up_only_once can be implemented like:
def set_up_only_once():
    # run the actual setup (or nothing if already vandalized):
    set_up()
    # vandalize it so it never walks again:
    import sys
    # the module is imported as framework.logutils, so that is its key in sys.modules
    sys.modules['framework.logutils'].set_up = lambda: None
Disadvantages: your colleagues will hate you.
tl;dr:
The simplest way is to memoize using functools.lru_cache, but it might not be the best solution code-quality-wise. It is up to you if this solution is good enough in your case.
The safest and most pythonic way, while not pleasing to the eye, is using if __name__=='__main__': ...