Are you frustrated monitoring long-running scripts? Get a jump start
Although I prefer full-fledged solutions and try to keep everything within my product's codebase, I can't deny the importance of scripts in the development process.
Suppose you want to overhaul a major portion of a feature. Beyond the new business logic added to the codebase, some one-off tasks need to be performed to get the feature working, such as data migration and transformation.
When these scripts take a long time to complete, failures become likely and debugging them becomes cumbersome. So I shifted from actively monitoring scripts to passive, or better, event-driven monitoring, with a recovery point and a progress status.
I will use Python to demonstrate the concepts, but you can implement them in any language.
Progress and ETA
Long-running tasks can be overwhelming because we don't know when the execution will complete. We can handle this simply by adding a progress bar that also shows the estimated time to completion.
import time
from tqdm import tqdm

def getDataFromSource():
    # Getting data from DB or extract
    return range(1, 100)

def transformFunction(data):
    # Transforming data
    return data

if __name__ == "__main__":
    data_itr = tqdm(getDataFromSource(), desc="Transforming ")
    for data in data_itr:  # Can be used for any for loop
        data_itr.set_description("Transforming : " + str(data))
        time.sleep(1)  # Just to slow the process, remove it
Now we know the progress and roughly how long it will take to complete. The next step is to check on the script after that approximate time.
Failure notification
Suppose that, upon checking the script, we find it failed with an error. That is very frustrating: we wasted all that time waiting for it to complete when we could have been taking steps to repair it. An email notification on failure removes the need to keep checking manually.
...
import smtplib
...

def notify_concern_person():
    sender_email = "product_email@gmail.com"
    sender_password = "*********"
    receiver_email = "random@gmail.com"
    message = "Subject: {}\n\n{}".format(
        "Product | Data transformation failed",
        "Some issue with script. Please check.\n\nRegards,\nYour Bot")
    server = smtplib.SMTP("smtp.gmail.com", 587)
    server.starttls()
    server.login(sender_email, sender_password)
    server.sendmail(sender_email, receiver_email, message)
...
Now we know when the script fails and can act right away. The next step is to debug the issue and rerun the script after making changes. But suppose you were processing 1 million records and 30% of them had already been processed before the failure: rerunning the script from the start would be painful.
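As a sketch of how the notification hooks into the main loop (the try/except wiring is one possible arrangement, not from the original; the notify callback would be notify_concern_person from the snippet above):

```python
def run_with_notification(process, notify):
    """Run `process()`; call `notify()` on any failure, then re-raise."""
    try:
        process()
    except Exception:
        notify()  # e.g. notify_concern_person from the snippet above
        raise     # re-raise so the traceback still reaches the console
```

Re-raising after notifying keeps the full traceback available for debugging instead of swallowing it.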
Recovery point
You can avoid this by writing a unique key per processed record to a file: open the file in a+ mode, read its contents, use them to skip the already-processed data, and append the key of each new record as you process it.
# Recovery file content
1
2
3
4
5
6
...

# template.py
...
if __name__ == "__main__":
    recovery_file = open("recovery_file", "a+")
    recovery_file.seek(0)  # "a+" positions at end of file; rewind to read
    already_processed_data = [x.strip('\n') for x in recovery_file.readlines()]
    ...
    data_itr = tqdm(getDataFromSource(), desc="Transforming ")
    for data in data_itr:  # Can be used for any for loop
        if str(data) in already_processed_data:
            data_itr.set_description("Skipping : " + str(data))
            continue
        data_itr.set_description("Transforming : " + str(data))
        recovery_file.write(str(data) + "\n")
    ...
You get a jump start directly from where the script failed last time, and it also helps with debugging since you know the exact point of failure.
Recovery could be handled more robustly with loggers, but my objective was the easiest possible implementation, and these scripts are essentially use-and-throw. I do recommend loggers if you plan to integrate this into your product.
Below is the final script template.
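Stitching the three snippets above together, the complete template might look like this (the tqdm import fallback and the try/except/finally arrangement are my additions):

```python
import smtplib

try:
    from tqdm import tqdm  # third-party progress bar: pip install tqdm
except ImportError:        # minimal no-op fallback so the template still runs
    class tqdm:
        def __init__(self, iterable, desc=""):
            self.iterable = iterable
        def __iter__(self):
            return iter(self.iterable)
        def set_description(self, desc):
            pass

def getDataFromSource():
    # Getting data from DB or extract
    return range(1, 100)

def transformFunction(data):
    # Transforming data
    return data

def notify_concern_person():
    sender_email = "product_email@gmail.com"
    sender_password = "*********"
    receiver_email = "random@gmail.com"
    message = "Subject: {}\n\n{}".format(
        "Product | Data transformation failed",
        "Some issue with script. Please check.\n\nRegards,\nYour Bot")
    server = smtplib.SMTP("smtp.gmail.com", 587)
    server.starttls()
    server.login(sender_email, sender_password)
    server.sendmail(sender_email, receiver_email, message)

if __name__ == "__main__":
    recovery_file = open("recovery_file", "a+")
    recovery_file.seek(0)  # "a+" positions at end of file; rewind to read
    already_processed_data = [x.strip("\n") for x in recovery_file.readlines()]
    try:
        data_itr = tqdm(getDataFromSource(), desc="Transforming ")
        for data in data_itr:
            if str(data) in already_processed_data:
                data_itr.set_description("Skipping : " + str(data))
                continue
            data_itr.set_description("Transforming : " + str(data))
            transformFunction(data)
            recovery_file.write(str(data) + "\n")
    except Exception:
        notify_concern_person()  # email on failure, then re-raise
        raise
    finally:
        recovery_file.close()
```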
I hope it helps you get a jump start.