What is the problem?
When a reboot is triggered while a task is processing, processes can sometimes hang for abnormally long times when power is restored. This is usually fixed by canceling the hanging task and restarting it.
However, with 8 tasks queued, after cancelling the hanging task the node did not begin processing another task (or project): every project indicated it was queued, and further attempts to cancel queued projects and relaunch the tasks left the queue length unchanged with no active process (again, I checked every project and every task).
Further investigation turned up a few symptoms:
CPU/GPU/memory/disk usage were all elevated, and the Docker container appeared to be running when monitoring statistics from the server itself (via remote desktop).
The Docker GPU container took 20-30 seconds to spin up when it normally takes about 3 or less:
docker run -dp 3001:3000 --gpus all --name nodeodmgpu opendronemap/nodeodm:gpu || docker start nodeodmgpu && ../webodm.sh start
The issue was eventually traced to the corresponding /var/lib/docker/overlay2/[Very-Long-ID]/diff/var/www/data/tasks.json file for the hanging node: the status codes recorded for each task were not consistent with the web client, and failed to re-sync automatically during any reboot/stop/start.
I.e. a task that was canceled and should read {"code":50} instead showed {"code":10} (Queued) or {"code":20} (Processing?).
The solution was to stop the Docker container, issue a webodm.sh down, and then use the commands below to manually mark the tasks as canceled:
sudo sed -i -e "s/:10}/:50}/g" /var/lib/docker/overlay2/[Very-Long-Node-ID]/diff/var/www/data/tasks.json
sudo sed -i -e "s/:20}/:50}/g" /var/lib/docker/overlay2/[Very-Long-Node-ID]/diff/var/www/data/tasks.json
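The blunt sed replace above works, but it will rewrite any `:10}` or `:20}` substring in the file. A slightly safer sketch (a hypothetical helper, not part of WebODM or NodeODM, and it assumes task status codes live under a "code" key as in the examples above) is to parse the JSON and rewrite only those fields:

```python
import json

# NodeODM task status codes: 10 = QUEUED, 20 = RUNNING,
# 30 = FAILED, 40 = COMPLETED, 50 = CANCELED.
QUEUED, RUNNING, CANCELED = 10, 20, 50

def cancel_stuck_tasks(doc):
    """Recursively walk a parsed tasks.json and flip any queued/running
    status code to CANCELED. The exact tasks.json layout may differ
    between NodeODM versions, so this walks the whole structure."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            if key == "code" and value in (QUEUED, RUNNING):
                doc[key] = CANCELED
            else:
                cancel_stuck_tasks(value)
    elif isinstance(doc, list):
        for item in doc:
            cancel_stuck_tasks(item)
    return doc

# Usage sketch (path placeholder as in the sed commands above):
# path = "/var/lib/docker/overlay2/[Very-Long-Node-ID]/diff/var/www/data/tasks.json"
# with open(path) as f:
#     data = json.load(f)
# with open(path, "w") as f:
#     json.dump(cancel_stuck_tasks(data), f)
```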
The result was that the task list was retained and ~100 GB of task resources were not orphaned. Everything functions normally after restarting the services.
How can we reproduce this? (What steps trigger the problem? What parameters are you using for processing? Include screenshots. If you are having issues processing a dataset, you must include a copy of your dataset uploaded on Dropbox, Google Drive or https://dronedb.app)
It seemed to be triggered by an abrupt reboot from another user account's session, which may have prevented Docker from stopping/starting cleanly. This caused a de-sync between the web client's container information and the node: the node was still commanded to process tasks that the web client showed as canceled or queued.
I would recommend queueing a few tasks, killing the Docker process (or otherwise stopping the Docker container abruptly), and then cancelling the remaining tasks. This should replicate the issue, though I have not yet been able to test it because our production and test environments are both occupied.
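The suspected repro above could be scripted roughly as follows. This is a sketch only: `nodeodmgpu` is the container name from the run command earlier, and the task queueing/cancellation steps still have to be done from the WebODM UI.

```shell
# 1. Start WebODM and the GPU node, then queue several tasks from the UI.
# 2. Simulate an abrupt shutdown of the processing node:
docker kill nodeodmgpu          # hard stop, no clean shutdown
# 3. Bring the node back up:
docker start nodeodmgpu
# 4. In the WebODM UI, cancel the remaining queued tasks in rapid succession.
# 5. Check whether the node's task list still reports queued/running codes:
docker exec nodeodmgpu cat /var/www/data/tasks.json
```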
Update: The issue also reproduces when cancelling projects in rapid succession. Clicking cancel on multiple queued projects replicates the de-sync about 50% of the time, and the queue length remains >0 until the manual fix above is employed. Reboots do not fix it, UNLESS the hanging queue belongs to the default CPU processing node that is set up out of the box, in which case restarting WebODM from the terminal WILL correctly update and clear the queue.
Additionally launched Docker containers for GPU processing will not automatically sync, and upon completion of a project they generate orphaned files in the /var/lib/docker/overlay2/ directory that are not viewable, deletable, or otherwise accessible from the web client. In this case the files must be removed manually after issuing a webodm.sh stop and a docker stop command.
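To find the correct /var/lib/docker/overlay2/[Very-Long-ID]/diff directory for a given node container without guessing, `docker inspect` can report the container's writable overlay2 layer directly. The command below assumes the `nodeodmgpu` container name used earlier:

```shell
# Print the container's writable overlay2 layer (the ".../diff" directory):
docker inspect --format '{{ .GraphDriver.Data.UpperDir }}' nodeodmgpu

# The stuck task list then lives under that directory:
# <UpperDir>/var/www/data/tasks.json
```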
How did you install WebODM (docker, installer, etc.)?
I installed WebODM (and its prerequisites) from a Bash CLI script on Ubuntu 20.04 LTS.
What's your browser and operating system? (Copy/paste the output of https://www.whatismybrowser.com/)
Server: Ubuntu 20.04 LTS
Host: Windows 10 Enterprise | Firefox