Logging in a Framework

2024/9/25 12:27:52

Imagine there is a framework which provides a method called logutils.set_up() which sets up the logging according to some config.

Setting up the logging should be done as early as possible since warnings emitted during importing libraries should not be lost.

Since the old way (if __name__=='__main__':) looks ugly, we use console_script entrypoints to register the main() method.

# foo/daily_report.py
from framework import logutils
logutils.set_up()
def main():...

My problem is that logutils.set_up() might be called twice:

Imagine there is a second console script which calls logutils.set_up() and imports daily_report.py.

I can change the framework code and set_up() to do nothing in the second call to logutils.set_up(), but this feels clumsy. I would like to avoid it.

How can I be sure that logutils.set_up() gets only executed once?

Answer

There are a few ways to achieve the goal, each with its advantages and disadvantages.

(some of these overlap with the other answers. I don't mean to plagiarize, only to provide a comprehensive answer).


Approach 1: The function should do it

One way to guarantee a function only gets executed once, is to make the function itself stateful, making it "remember" it has already been called. This is more or less what is described by @eestrada and @qarma.

As to implementing this, I agree with @qarma that using memoization is the simplest and most ideomatic way. There are a few simple memoization decorators for python on the internet. The one included in the standard library is functools.lru_cache. You can simply use it like:

@functools.lru_cache
def set_up():  # this is your original set_up() function, now decorated<...same as before...>

The disadvantage here is that it is arguably not the set_up's responsibility to maintain the state, it is merely a function. One can argue it should execute twice if being called twice, and it's caller's responsibility to only call it when it needs it (what if you really do want to run it twice)? The general argument is that a function (in order to be useful and reusable) should not make assumptions about the context in which it is called.

Is this argument valid in your case? It is up to you to decide.

Another disadvantage here is that this can be cosidered an abuse of the memoization tool. Memoization is a tool closely related to functional programming, and should be applied to pure functions. Memoizing a funciton implies "no need to run it again, because we already know the result", and not "no need to run it again, because there's some side effect we want to avoid".

Approach 2: the one you think is ugly (if __name__=='__main__')

The most common pythonic way, which you already mention in your question, is using the infamous if __name__=='__main__' construct.

This guarantees the function is only called once, because it is only called from the module named __main__, and the interpreter guarantees there is only one such module in your process.

This works. There are no complications nor caveats. This is the way running main-code (including setup code) is done in python. It is considered pythonic simply because it is so darn common in python (since there are no better ways).

The only disadvantage is that it is arguably ugly (asthetics-wise, not code-quality-wise). I admit I also winced the first few times I saw it or wrote it, but it grows on you.

Approach 3: leverage python's module-importing mechanism

Python already has a caching mechanism preventing modules from being doubly-imported. You can leverage this mechanism by running the setup code in a new module, then import it. This is similar to @rll's answer. This is simple, to do:

# logging_setup.py
from framework import logutils
logutils.set_up()

Now, each caller can run this by importing the new module:

# foo/daily_report.py
import logging_setup # side effect!
def main():...

Since a module is only imported once, set_up is only called once.

The disadvantage here is that it violates the "explicit is better than implicit" principle. I.e. if you want to call a function, call it. It isn't good practice to run code with side-effects on module-import time.

Approach 4: monkey patching

This is by far the worst of the approaches in this answer. Don't use it. But it is still a way to get the job done.

The idea is that if you don't want the function to get called after the first call, monkey-patch it (read: vandalize it) after the first call.

from framework import logutils
logutils.set_up_only_once()

Where set_up_only_once can be implemented like:

def set_up_only_once():# run the actual setup (or nothing if already vandalized):set_up()# vandalize it so it never walks again:import syssys.modules['logutils'].set_up = lambda: None

Disadvantages: your colleagues will hate you.


tl;dr:

The simplest way is to memoize using functools.lru_cache, but it might not be the best solution code-quality-wise. It is up to you if this solution is good enough in your case.

The safest and most pythonic way, while not pleasing to the eye, is using if __name__=='__main__': ....

https://en.xdnf.cn/q/71579.html

Related Q&A

Working of the Earth Mover Loss method in Keras and input arguments data types

I have found a code for the Earth Mover Loss in Keras/Tensrflow. I want to compute the loss for the scores given to images but I can not do it until I get to know the working of the Earth Mover Loss gi…

Django Rest Framework writable nested serializer with multiple nested objects

Im trying to create a writable nested serializer. My parent model is Game and the nested models are Measurements. I am trying to post this data to my DRF application using AJAX. However, when try to po…

Django How to Serialize from ManyToManyField and List All

Im developing a mobile application backend with Django 1.9.1 I implemented the follower model and now I want to list all of the followers of a user but Im currently stuck to do that. I also use Django…

PyDrive and Google Drive - automate verification process?

Im trying to use PyDrive to upload files to Google Drive using a local Python script which I want to automate so it can run every day via a cron job. Ive stored the client OAuth ID and secret for the G…

Using rm * (wildcard) in envoy: No such file or directory

Im using Python and Envoy. I need to delete all files in a directory. Apart from some files, the directory is empty. In a terminal this would be:rm /tmp/my_silly_directory/*Common sense dictates that i…

cant import django model into celery task

i have the following task:from __future__ import absolute_importfrom myproject.celery import appfrom myapp.models import Entity@app.task def add(entity_id):entity = Entity.objects.get(pk=entity_id)retu…

Running unit tests with Nose inside a Python environment such as Autodesk Maya?

Id like to start creating unit tests for my Maya scripts. These scripts must be run inside the Maya environment and rely on the maya.cmds module namespace.How can I run Nose tests from inside a runnin…

Python Newline \n not working in jupyter notebooks

Im trying to display the tuples of a postgreSQL table neatly in my Jupyter Notebook, but the newline \n escape character doesnt seem to work here (it works for my python scripts w/ same code outside of…

Dynamically calling functions - Python

I have a list of functions... e.g.def filter_bunnies(pets): ...def filter_turtles(pets): ...def filter_narwhals(pets): ...Is there a way to call these functions by using a string representing their nam…

py2exe windows service problem

I have successfully converted my python project to a service. When using the usual options of install and start/stop, everything works correctly. However, I wish to compile the project using py2exe, …