Deferred communications and Resource share.
Author | Message |
---|---|
W-K 666 Joined: 18 May 99 Posts: 19534 Credit: 40,757,560 RAC: 67 |
Because communication with Seti is deferred for 5:00 at a time, and nearly all Seti GPU tasks take less than 5:00 to complete, BOINC goes to the other project for replacement work whenever a task completes, as Seti is blocked. So far, in 9 days* on this new computer (RTX 2060 GPU), the secondary project has completed 1250 tasks, when it should have completed only 650 if the resource share had been observed. (* For the first 2 days it only crunched Seti.) Should the project enforce the "Communication deferred" timeout only for a limited period after outages? |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
When your client has issues contacting the Seti project, the normal "backoff" for communication is 60 minutes. If it fails again, the backoff increases by a nominal 40 - 60 minutes, in my observation. I'm not sure what your "5:00" is referencing. Are you referring to the standard 305 second scheduler reply interval? Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
W-K 666 Joined: 18 May 99 Posts: 19534 Credit: 40,757,560 RAC: 67 |
"Are you referring to the standard 305 second scheduler reply interval?" Probably; I can't say that I watched it that closely. |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Until all attached projects have developed a steady-state REC, the scheduler makes best-guess choices about scheduling and resource commitments. It sounds like at least one project (Seti) was just added and had to make up a lot of ground on your other mature projects. A lot of changes have been made to work fetch in the upcoming BOINC 7.16 client release that should address some of the scheduler deficiencies in resource allocation when multiple attached projects are running. |
W-K 666 Joined: 18 May 99 Posts: 19534 Credit: 40,757,560 RAC: 67 |
"Until all attached projects have developed a steady-state REC, the scheduler makes best-guess choices about scheduling and resource commitments. It sounds like at least one project (Seti) was just added and had to make up a lot of ground on your other mature projects." It's a new computer, set up on 27th April. For two days Seti was the only project running, before the side panel was screwed on; that was delayed because I needed extension cables for the 12V 8-pin connector and two of the 4-pin fans. After that I added other projects: Einstein as backup with 0 resource share, and Seti Beta with a 10% share. As expected, Beta ran for a reasonable period to catch up, but since then it has been downloading more tasks than required, because Seti is effectively disabled by the "communication deferred" timeout. P.S. AKA WinterKnight, https://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=1023&postid=22983#22983 the ID 666 is from BOINC. |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Did you dump a bunch of errors on the Seti account? That would account for the communication deferred, as you would have been put into the penalty box and would have to return validated work before the project gives you any more. |
W-K 666 Joined: 18 May 99 Posts: 19534 Credit: 40,757,560 RAC: 67 |
"Did you dump a bunch of errors on the Seti account? That would account for the communication deferred, as you would have been put into the penalty box and would have to return validated work before the project gives you any more." I'm not sure why that happened. A few strange things happened while Win 10 Pro decided the latest version needed a big pile of updates. This caused multiple restarts, and restarts within restarts, while I was busy doing essentials like cooking and cleaning. I live alone. As far as I can tell, Seti requested new work and almost immediately Win 10 decided to reboot or disable comms for a period, so that the requested tasks never reached the computer. As you can see, they were timed out after about 5 mins. [edit] I've PM'd the Event Log. |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I don't see anything done by Windows. I see tasks suspended by user and then NNT set by user. Everything looks normal in the log. To show what the client was requesting and how the scheduler responded, work_fetch_debug and, at minimum, sched_op_debug would have had to be set beforehand. I would recommend that you set the sched_op_debug flag for the Event Log. It doesn't add much extra output to the Event Log, but it does show exactly how much work you are requesting at each scheduler contact. I would only set work_fetch_debug for one scheduler connection cycle, as it produces too much output to leave on permanently. |
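For anyone following along, the two flags named above live in the `<log_flags>` section of the client's `cc_config.xml`. The snippet below is only a sketch that builds such a file's contents in Python; the helper name `build_cc_config` is made up for illustration, and where the file goes (the BOINC data directory) depends on your installation.

```python
import xml.etree.ElementTree as ET


def build_cc_config(sched_op_debug=True, work_fetch_debug=False):
    """Return cc_config.xml text with the two Event Log flags set.

    build_cc_config is a hypothetical helper, not part of BOINC.
    """
    root = ET.Element("cc_config")
    flags = ET.SubElement(root, "log_flags")
    # 1 switches a log flag on, 0 switches it off.
    ET.SubElement(flags, "sched_op_debug").text = "1" if sched_op_debug else "0"
    ET.SubElement(flags, "work_fetch_debug").text = "1" if work_fetch_debug else "0"
    return ET.tostring(root, encoding="unicode")


# sched_op_debug on permanently, work_fetch_debug left off by default.
config_xml = build_cc_config()
print(config_xml)
```

The client has to re-read its config files (or be restarted) before the change takes effect. Following the advice above, work_fetch_debug would be switched on for a single scheduler cycle and then off again.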
W-K 666 Joined: 18 May 99 Posts: 19534 Credit: 40,757,560 RAC: 67 |
"I don't see anything done by Windows. I see tasks suspended by user and then NNT set by user. Everything looks normal in the log." I don't think Windows has had any recent effect on BOINC and the projects since the last restart. What I am seeing, if I allow Beta to download, is this: when Main completes a task, BOINC decides it needs more work. If the 'comms deferred' countdown has completed, it asks Main for work; if the countdown has not completed, it asks Beta. This work from Beta just piles up and is not processed, as the resource share for Beta has been exceeded. I decided to get rid of it by suspending Main and setting Beta to 'no new work'. The Beta 'no new work' is still in place. I suspect that if I allow Beta to download, it will again fetch approx. one Beta task for each Main task, as you can see from the first part of that log. With a resource share of Main 90:10 Beta, this is not right. If it remains as it is, with Main crunching Green Bank VLARs (~04m:30s) and Beta crunching Arecibo mid-range AR (~2m:00s), I would expect to see a task count of Main 20:5 Beta if the resource share is to be maintained. This is NOT happening. |
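The 20:5 figure can be checked with a few lines of arithmetic. This is a rough sketch that assumes resource share divides processing time (not task count) between the projects and uses the approximate run times quoted in the post; the function name is illustrative only.

```python
from fractions import Fraction


def expected_task_ratio(share_a, secs_a, share_b, secs_b):
    """Tasks of project A completed per task of project B.

    Assumes processing time is split share_a:share_b and that every
    task of a project takes the same number of seconds.
    """
    return Fraction(share_a, secs_a) / Fraction(share_b, secs_b)


# Main: 90% share, ~4m30s (270 s) per task. Beta: 10% share, ~2m00s (120 s).
ratio = expected_task_ratio(90, 270, 10, 120)
print(ratio)  # 4, i.e. 20 Main tasks for every 5 Beta tasks
```

So a one-for-one download pattern is about four times more Beta work than the shares call for, under these assumptions.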
rob smith Joined: 7 Mar 03 Posts: 22713 Credit: 416,307,556 RAC: 380 |
The time constant for your 20:5 ratio is weeks, not days, hours or minutes. With a fairly new machine that has had a rebooting issue like yours, it is perfectly normal to get a gross imbalance for a few days until BOINC settles down and sorts things out. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
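One way to picture a time constant of weeks is an exponentially decaying average. The sketch below assumes the client's accounting behaves that way with a 10-day half-life; both the model and the 10-day figure are assumptions for illustration, since the post only says "weeks".

```python
def remaining_weight(elapsed_days, half_life_days=10.0):
    """Fraction of influence that older history still has after elapsed_days.

    Assumes simple exponential decay; half_life_days is an assumed value.
    """
    return 0.5 ** (elapsed_days / half_life_days)


print(remaining_weight(10))  # 0.5
print(remaining_weight(30))  # 0.125
```

Under these assumptions an early imbalance still carries half its weight after 10 days and only fades to about an eighth after a month.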
W-K 666 Joined: 18 May 99 Posts: 19534 Credit: 40,757,560 RAC: 67 |
"The time constant for your 20:5 ratio is weeks, not days, hours or minutes." We are talking 10 days now. |
W-K 666 Joined: 18 May 99 Posts: 19534 Credit: 40,757,560 RAC: 67 |
Try telling me that this is what is expected with a Main 90:10 Beta resource share. 09/05/2019 09:03:34 | SETI@home | Sending scheduler request: To fetch work. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
It can happen, yes. If that first request at 09:03:34 just topped you up to the 'limit of tasks in progress' (100 per GPU), you'd be backed off until 09:08:39, which is 303 seconds from the reply time. So you'd be prevented from fetching from the main project throughout the period of your log. But if your cache length specification ('Store at least --- days of work') hadn't been used up already, BOINC would need to find extra work from somewhere else. Beta only enforces a limit of 7 seconds between requests, so it's an easy target. |
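The timing described above can be sketched as follows. The 303-second and 7-second delays come from the post; the reply time of 09:03:36 is an assumption, obtained by counting 303 seconds back from 09:08:39, and the function names are illustrative only.

```python
from datetime import datetime, timedelta


def next_allowed_request(reply_time, delay_seconds):
    """Earliest time the client may contact a project's scheduler again."""
    return reply_time + timedelta(seconds=delay_seconds)


def projects_open_at(now, ready_times):
    """Names of projects whose request delay has already expired at 'now'."""
    return [name for name, ready in ready_times.items() if ready <= now]


reply = datetime(2019, 5, 9, 9, 3, 36)  # assumed reply time, see lead-in
ready = {
    "Main": next_allowed_request(reply, 303),  # 09:08:39
    "Beta": next_allowed_request(reply, 7),    # 09:03:43
}

# A Main task finishes at 09:05:00 and the client wants replacement work.
print(projects_open_at(datetime(2019, 5, 9, 9, 5, 0), ready))  # ['Beta']
```

With tasks finishing in under five minutes, nearly every top-up request falls inside Main's delay window, so Beta is the only project that can be asked.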
W-K 666 Joined: 18 May 99 Posts: 19534 Credit: 40,757,560 RAC: 67 |
The cache is virtually full; each request is just to top it up. At the moment, because Beta is not getting processed, the cache is >60% Beta and <40% Main, measured by time remaining. Beta is presumably not being processed because it has already done about 40 hrs since the 29th April, when with the resource share it should normally have done only 24 hrs of work: 1 day out of 10. |
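The 24-hour figure follows directly from the share. A minimal sketch, assuming a single GPU running around the clock and a 10% share of processing time for Beta (the function name is made up for illustration):

```python
def entitled_hours(elapsed_days, share_percent):
    """Hours of processing a project is due over the period at its share."""
    return elapsed_days * 24 * share_percent / 100


beta_due = entitled_hours(10, 10)  # 10 days at a 10% share
beta_done = 40                     # hours reported in the post
print(beta_due, beta_done - beta_due)  # 24.0 16.0
```

On these numbers Beta is roughly 16 hours ahead of its share, which fits with its tasks sitting unprocessed in the cache.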
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
But are your requests being inhibited by the 'limit on tasks in progress'? Main certainly has one, Beta probably does (though I'm not sure). If those are kicking in, BOINC will always reach out for the low-hanging fruit - whichever project is free to request work first. Most of the time, that will be Beta. |
W-K 666 Joined: 18 May 99 Posts: 19534 Credit: 40,757,560 RAC: 67 |
"But are your requests being inhibited by the 'limit on tasks in progress'? Main certainly has one, Beta probably does (though I'm not sure). If those are kicking in, BOINC will always reach out for the low-hanging fruit - whichever project is free to request work first. Most of the time, that will be Beta." My cache is small enough that the 100 tasks limit is not a factor**; I'm trying to catch Astropulse. [edit] ** on Main |
W-K 666 Joined: 18 May 99 Posts: 19534 Credit: 40,757,560 RAC: 67 |
Is there a 50 task limit at Beta? One request there got 0 tasks, and there have been none since. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
I think I've seen that number, yes, but I don't have logged evidence. Hitting a limit wouldn't (by itself) stop BOINC asking, but after each attempt to fetch, you'd see a line saying "This computer has reached a limit on tasks in progress". |
W-K 666 Joined: 18 May 99 Posts: 19534 Credit: 40,757,560 RAC: 67 |
"I think I've seen that number, yes, but I don't have logged evidence." I didn't get that message. 09/05/2019 11:41:21 | SETI@home | Started upload of blc33_2bit_guppi_58406_02255_HIP116258_0035.1782.818.21.44.126.vlar_1_r1882011805_0 Since then there have been no more requests to Beta. Checking with Beta: State: All (1301) · In progress (50) · Validation pending (4) · Validation inconclusive (1) · Valid (1246) · Invalid (0) · Error (0) |
Sirius B Joined: 26 Dec 00 Posts: 24926 Credit: 3,081,182 RAC: 7 |
"Is there a 50 task limit at Beta, one request to there got 0 tasks and none since." Yes. |