Message boards :
Number crunching :
The Server Issues / Outages Thread - Panic Mode On! (119)
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I've already cut down to just 3 machines running SETI; now one of those can't get enough work to keep busy. Since the outage yesterday morning it has only had work for about 7 hours. It's currently out of work again: https://setiathome.berkeley.edu/results.php?hostid=6813106 Perhaps I should cut back to just 2 machines? |
Lazydude Send message Joined: 17 Jan 01 Posts: 45 Credit: 96,158,001 RAC: 136 |
I use RTT as an early warning. 32h and above is, in my eyes, OK; below 31h is a WARNING; under 30h, "Houston, we have a small problem". As of the time I wrote this: Result turnaround time (last hour average) 29.60 hours |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
I use RTT as an early warning. By your scale: now at 31.34 hours (just 30 min after your post) we are in the Warning stage. <edit> Reached 32.25 hours at 19:10:04 UTC, a few minutes later. At this rate of increase: are we doomed? |
Lazydude Send message Joined: 17 Jan 01 Posts: 45 Credit: 96,158,001 RAC: 136 |
At this increase rate: Are we doomed? No - it's a warning when the trend is towards shorter times. Right now it's in recovery mode, since the trend is upwards. I may add that anything over roughly 36h means we have had an outage. |
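Lazydude's rule of thumb above can be sketched as a small classifier. The hour thresholds (32h, 31h/30h, ~36h) and the trend caveat come from the posts; the function name and the `trend` argument are my own illustrative assumptions, not anything the server publishes.

```python
# Rule-of-thumb reading of the server's "Result turnaround time
# (last hour average)" figure, per Lazydude's posts. Thresholds are
# taken from the thread; the trend handling reflects the caveat that
# a LOW RTT is only a warning while the trend is still downwards.

def classify_rtt(rtt_hours, trend):
    """trend is 'up' (backlog clearing) or 'down' (queue draining)."""
    if rtt_hours >= 36:
        return "outage has occurred"
    if trend == "up":
        return "recovery mode"  # rising RTT means the system is catching up
    if rtt_hours >= 32:
        return "OK"
    if rtt_hours >= 30:
        return "WARNING"
    return "Houston, we have a small problem"

print(classify_rtt(29.60, "down"))  # the value quoted in the post
```

With the 29.60-hour figure from the post and a falling trend, this lands in the "small problem" band, matching Lazydude's reading.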
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Now the 2nd out of 3 machines has run out of work: https://setiathome.berkeley.edu/results.php?hostid=6796479 That leaves 1 machine still working. I suppose when that one runs out of work I'll just shut everything down and brag about how much money I'm saving on electricity. |
W-K 666 Send message Joined: 18 May 99 Posts: 19463 Credit: 40,757,560 RAC: 67 |
Now the 2nd out of 3 machines has run out of work, https://setiathome.berkeley.edu/results.php?hostid=6796479 I can only assume it is a problem at your end. I've had very few problems since 08:00 UTC on the 26th. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
That's what you get for assuming. Since making that post, the machine that had run out of work now has 400 tasks instead of zero. I had absolutely nothing to do with it. |
Boiler Paul Send message Joined: 4 May 00 Posts: 232 Credit: 4,965,771 RAC: 64 |
Work can be hard to come by. All I've gotten over the past few hours is "Project has no tasks available" in the log. Just need to be patient. |
Boiler Paul Send message Joined: 4 May 00 Posts: 232 Credit: 4,965,771 RAC: 64 |
and, of course, after I post, I receive work! |
Jimbocous Send message Joined: 1 Apr 13 Posts: 1857 Credit: 268,616,081 RAC: 1,349 |
I'm still convinced that somehow, whether it be intent or just net result, the higher your RAC is the lower you are in the priority stack in terms of actually getting work during a recovery from outage. This is entirely too consistent to be the luck of the draw. |
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
I'm still convinced that somehow, whether it be intent or just net result, the higher your RAC is the lower you are in the priority stack in terms of actually getting work during a recovery from outage. This is entirely too consistent to be the luck of the draw. +1. To overcome Heisenberg's: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
W-K 666 Send message Joined: 18 May 99 Posts: 19463 Credit: 40,757,560 RAC: 67 |
Could it be that the assimilation process is the problem? How difficult is it to translate the data we produce, plus all the other details necessary, and put it onto the science database? This is what the Server Status page says: sah_assimilator/ap_assimilator: Takes scientific data from validated results and puts them in the SETI@home (or Astropulse) database for later analysis. |
Ville Saari Send message Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
I'm still convinced that somehow, whether it be intent or just net result, the higher your RAC is the lower you are in the priority stack in terms of actually getting work during a recovery from outage. This is entirely too consistent to be the luck of the draw. It is an illusion. Everyone has the same priority, but the higher your RAC, the more successful scheduler requests you need to keep your cache from depleting. If every 12th request wins the lottery and gets some work, then you get some work once every hour; that may be all a slow host needs to refill its cache to the brim, but it is nowhere near the one-hour production of a fast host. |
Ville Saari Send message Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
Could it be that the assimilation process is the problem? There has clearly been some problem in assimilation for the last several weeks, but the problem could be in many different places. It could be the throughput of the boinc database, which somehow hits the assimilator harder than the other processes. Or it could be a problem in the assimilator program itself. Or it could be the throughput of the science databases, or of the upload filesystem where the result files the assimilator needs to read are stored. |
W-K 666 Send message Joined: 18 May 99 Posts: 19463 Credit: 40,757,560 RAC: 67 |
I'm still convinced that somehow, whether it be intent or just net result, the higher your RAC is the lower you are in the priority stack in terms of actually getting work during a recovery from outage. This is entirely too consistent to be the luck of the draw. Maybe related, but I think it is more to do with how much work the host requests. When I get up on Wednesday mornings (UTC times rule in the UK winter), if the computer hasn't started receiving work, I set the cache to a very low level. I find that usually works after a few attempts, and as I receive work I increase the cache in steps up to 0.6 days, which, unless the servers give me oodles of AP, fills the GPU cache to 150 tasks. Also, here are some numbers on tasks downloaded and validated in the ~24 hours since 08:06:31 UTC, 26th Feb.
After 12 hours, at ~20:00 on the 26th: Downloaded 345; In Progress 150; Valid 86. Processed = 345 - 150 = 195. Percentage of downloaded tasks validated in 12 hours = 100 * 86 / 195 = 44.1%.
After 24 hours, at ~08:00 on the 27th: Downloaded 523; In Progress 150; Valid 253. Processed = 523 - 150 = 373. Percentage of downloaded tasks validated in 24 hours = 100 * 253 / 373 = 67.8%.
I only crunch on the GPU, so it is fairly simple to scroll through the pages, count each page, and then add up the page totals. [edit] Prior to 08:06 yesterday the Seti cache was empty. |
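W-K 666's arithmetic above reduces to one small formula, sketched here as a check. The helper name is my own; "processed" means tasks no longer sitting in the 150-task cache (downloaded minus in-progress), and the percentage is how many of those have already validated.

```python
# Reproduce the validated-percentage figures from the post:
# processed = downloaded - in_progress (tasks already crunched and
# reported), then express valid results as a share of processed.

def validated_pct(downloaded, in_progress, valid):
    processed = downloaded - in_progress
    return round(100 * valid / processed, 1)

print(validated_pct(345, 150, 86))   # 12-hour snapshot -> 44.1
print(validated_pct(523, 150, 253))  # 24-hour snapshot -> 67.8
```

Both figures match the post, so the rising percentage really does show validation catching up over the second 12 hours.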
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14683 Credit: 200,643,578 RAC: 874 |
I'm still convinced that somehow, whether it be intent or just net result, the higher your RAC is the lower you are in the priority stack in terms of actually getting work during a recovery from outage. This is entirely too consistent to be the luck of the draw. Maybe related, but I think it is more to do with how much work the host requests. That's my experience, too. I now have two machines in the 'high RAC' category (top 100): they were both completely dry yesterday morning. I did a little Einstein backup work while the servers were sorting themselves out, but once work started flowing, I ramped them up gently by requesting an hour of work at a time (0.05 days) and increasing the cache a step at a time as they filled up. Reached full cache by evening, with just a little tweak any time I happened to be passing. |
AllgoodGuy Send message Joined: 29 May 01 Posts: 293 Credit: 16,348,499 RAC: 266 |
Validation Pending is still steadily growing; it looks like around 23 million objects are waiting to be satisfied. Still getting work though, despite the RTS showing a pretty steady 0. I even fell asleep in the wrong configuration the night before last, which decreased my Pending column below its normal average, but I'm well over that again. This poor system needs a break. |
Jimbocous Send message Joined: 1 Apr 13 Posts: 1857 Credit: 268,616,081 RAC: 1,349 |
I'm still convinced that somehow, whether it be intent or just net result, the higher your RAC is the lower you are in the priority stack in terms of actually getting work during a recovery from outage. This is entirely too consistent to be the luck of the draw. Maybe related, but I think it is more to do with how much work the host requests. That's my experience, too. I now have two machines in the 'high RAC' category (top 100): they were both completely dry yesterday morning. I did a little Einstein backup work while the servers were sorting themselves out, but once work started flowing, I ramped them up gently by requesting an hour of work at a time (0.05 days) and increasing the cache a step at a time as they filled up. Reached full cache by evening, with just a little tweak any time I happened to be passing. Sounds like a reality, not "an illusion". Main cruncher cache here is around |
rob smith Send message Joined: 7 Mar 03 Posts: 22624 Credit: 416,307,556 RAC: 380 |
I can't help wondering if the splitters are being deliberately throttled in an attempt to reduce the amount of work sitting around in the various queues. After all, work not being split will have that effect. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Wiggo Send message Joined: 24 Jan 00 Posts: 37207 Credit: 261,360,520 RAC: 489 |
I can't help wondering if the splitters are being deliberately throttled in an attempt to reduce the amount of work sitting around in the various queues. After all, work not being split will have that effect. Yes they are; it was stated by Eric that this is being done to try to keep the system within its RAM limits. I just can't remember where that post was made, or whether Eric actually made it or it was passed along, ATM. Cheers. |
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.