Persist Completed Pipeline in Luigi Visualiser

2024/9/23 15:23:02

I'm starting to port a nightly data pipeline from a visual ETL tool to Luigi, and I really enjoy that there is a visualiser to see the status of jobs. However, I've noticed that a few minutes after the last job (named MasterEnd) completes, all of the nodes disappear from the graph except for MasterEnd. This is a little inconvenient, as I'd like to see that everything is complete for the day/past days.

Further, if in the visualiser I go directly to the last job's URL, it can't find any history that it ran: "Couldn't find task MasterEnd(date=2015-09-17, base_url=http://aws.east.com/, log_dir=/home/ubuntu/logs/)". I have verified that it ran successfully this morning.

One thing to note is that I have a cron that runs this pipeline every 15 minutes to check for a file on S3. If the file exists, the pipeline runs; otherwise it stops. I'm not sure whether that is causing the removal of tasks from the visualiser. I've also noticed it generates a new PID every run, but I couldn't find a way in the docs to persist one PID per day.

So, my questions: Is it possible to persist the completed graph for the current day in the visualiser? And is there a way to see what has happened in the past?

Appreciate all the help

Answer

I'm not 100% positive this is correct, but it's what I would try first. When you call luigi.run, pass it --scheduler-remove-delay. My understanding is that this controls how long the scheduler waits before forgetting a task after all of its dependents have completed. If you look through Luigi's source, the default is 600 seconds. For example:

luigi.run(["--workers", "8", "--scheduler-remove-delay", "86400"], main_task_cls=task_name)
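If you'd rather not pass the flag on every invocation (your cron fires every 15 minutes), the same setting can also be put in the scheduler section of your luigi.cfg. This is a minimal sketch, assuming the option name remove_delay under [scheduler] matches your Luigi version — check the configuration docs for the release you're running:

```
[scheduler]
# Keep finished tasks visible in the visualiser for 24 hours (value in seconds)
remove_delay = 86400
```

Note that this only helps while the central scheduler process stays up; if each cron run starts a fresh scheduler, there is no state left to persist between runs.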