Task postponed: Waiting to acquire slot directory lock. Another instance may be running.

Message boards : Number crunching : Task postponed: Waiting to acquire slot directory lock. Another instance may be running.

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1969941 - Posted: 12 Dec 2018, 2:04:11 UTC

I've got two CPU tasks from the last of my cache that refuse to run. If I exit BOINC and then restart it, they run for 34-35 seconds each time and then shift to postponed.

They are the only CPU tasks running. I can't figure out why they won't run. There ISN'T another instance running, which I assume means another instance of BOINC. And just to make sure, they aren't the same task copied to two slots. They are different work units.

Anyone care to offer an explanation as to what is happening?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1969944 - Posted: 12 Dec 2018, 2:12:46 UTC

OK, I think I just figured it out. I think an existing boinc_lockfile in the slot when the task starts computing is what was causing the tasks to keep getting postponed after 35 seconds. I don't think slot cleanup happened after the last task occupied the slot. I deleted the boinc_lockfile in both of the offending slots, along with the task postponement output file, then restarted BOINC, and the tasks are now computing past 35 seconds.
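
For anyone who wants to script that cleanup, here is a minimal Python sketch. It assumes the slot directories sit under `slots/` inside the BOINC data directory and that the stale file is named boinc_lockfile, as in the post. The function names and the `dry_run` flag are made up for illustration and are not part of BOINC. The post does not name the task postponement output file, so the sketch leaves it alone.

```python
from pathlib import Path

LOCKFILE_NAME = "boinc_lockfile"  # file name taken from the post above


def find_slot_lockfiles(data_dir):
    """Return every slots/<n>/boinc_lockfile under the BOINC data directory.

    Assumption: slot directories are direct children of <data_dir>/slots.
    """
    slots = Path(data_dir) / "slots"
    if not slots.is_dir():
        return []
    return sorted(
        slot / LOCKFILE_NAME
        for slot in slots.iterdir()
        if slot.is_dir() and (slot / LOCKFILE_NAME).is_file()
    )


def remove_lockfiles(lockfiles, dry_run=True):
    """Delete the given lock files; with dry_run=True only report them."""
    removed = []
    for path in lockfiles:
        print(("would remove " if dry_run else "removing ") + str(path))
        if not dry_run:
            path.unlink()
            removed.append(path)
    return removed
```

Exit BOINC before running anything like this, since a lock file that belongs to a live task is not stale. A typical call would be `remove_lockfiles(find_slot_lockfiles(data_dir), dry_run=False)`, where `data_dir` is wherever your installation keeps its data; that location varies by system.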
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1969945 - Posted: 12 Dec 2018, 2:14:24 UTC
Last modified: 12 Dec 2018, 2:15:07 UTC

To change the thread title.
Problem solved.
Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 1969962 - Posted: 12 Dec 2018, 4:36:07 UTC - in response to Message 1969944.  

OK, I think I just figured it out. I think an existing boinc_lockfile in the slot when the task starts computing is what was causing the tasks to keep getting postponed after 35 seconds. I don't think slot cleanup happened after the last task occupied the slot. I deleted the boinc_lockfile in both of the offending slots, along with the task postponement output file, then restarted BOINC, and the tasks are now computing past 35 seconds.



Thank you. I have had this issue within memory and ended up rebooting the system to get it cleared. So that is a lockfile symptom.

Tom
A proud member of the OFA (Old Farts Association).
juan BFP Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1969991 - Posted: 12 Dec 2018, 12:11:10 UTC
Last modified: 12 Dec 2018, 12:17:55 UTC

I had a similar problem in the past. I'm not sure whether it is related to what you describe. The source was the way Linux kills the crunching process when called by the rescheduler. It eventually kills the crunching process but does not clear the lock file. That happened randomly, though, and I never really caught when or how the error occurred. I fixed it by adding a step that clears all lock files whenever the rescheduler runs. After that I ran some experiments and discovered that if I change the interval at which BOINC auto-saves the task from 120 secs to 180 secs (more than the time the task needs to complete, which is up to 130 secs on my GPUs), the problem disappears. What I imagine is something like this: if the process is killed while the autosave is creating the backup, it leaves the file locked. That could explain why the error is rare. Why it happens is well beyond my knowledge.
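
The 120 vs 180 second observation comes down to simple arithmetic, sketched below in Python. This is only an illustration of the hypothesis above, under a simplifying assumption that is not confirmed anywhere in the thread: autosaves fire at fixed multiples of the interval, counted from the moment the task starts. The function name is made up for the example.

```python
import math


def autosaves_before_finish(task_secs, autosave_secs):
    """Count autosaves that fire strictly before the task finishes.

    Simplified model (an assumption): autosaves happen at autosave_secs,
    2 * autosave_secs, ... measured from task start. An autosave that
    would land exactly at the finish time is not counted.
    """
    if autosave_secs <= 0:
        raise ValueError("autosave interval must be positive")
    return max(0, math.ceil(task_secs / autosave_secs) - 1)


# Figures from the post: tasks take up to 130 secs on the GPUs.
for interval in (120, 180):
    count = autosaves_before_finish(130, interval)
    print(f"autosave every {interval} s: {count} autosave(s) mid-task")
```

Under this model a 130 second task sees one autosave at the 120 second setting and none at 180 seconds, so with the longer interval there is no window in which a kill could interrupt an autosave.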
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1970010 - Posted: 12 Dec 2018, 16:06:00 UTC - in response to Message 1969991.  

Could be. I did reschedule at the last minute yesterday morning before the outage. That one reschedule could have caught those two CPU tasks with the lockfile present. Now that I understand it, if it happens again I know the simple and fast fix. Thanks for the insight, Juan.



 