Panic Mode On (116) Server Problems?

Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1987914 - Posted: 30 Mar 2019, 2:13:28 UTC - in response to Message 1987913.  
Last modified: 30 Mar 2019, 2:13:43 UTC

Yes, they were cranking at 70/sec for about 90 minutes, then they broke something else in the servers and the splitter output went to zero.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1987914
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13915
Credit: 208,696,464
RAC: 304
Australia
Message 1987915 - Posted: 30 Mar 2019, 2:16:22 UTC - in response to Message 1987913.  
Last modified: 30 Mar 2019, 2:16:36 UTC

Instant timeouts on downloads might not be an issue for too long as the splitter output has fallen to 0, even though the Server Status page shows them as running, and the Ready-to-send buffer is still empty.

Or not.
Splitter output is zero. Ready-to-send buffer is (slowly) filling, and work in progress is slowly increasing. And WUs returned per hour is way, way down (about 30-40k) on its usual levels.
???
Grant
Darwin NT
ID: 1987915
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1987927 - Posted: 30 Mar 2019, 3:11:39 UTC - in response to Message 1987915.  

I see the splitters have come back to life. I always wonder if they are tied to the script that runs the WU and Results purge mechanism. On the Haveland graphs, they seem to run in lockstep. When the purgers hit bottom and stop the decline in the graph, the splitters spring to life. And vice versa, when the purgers start reducing the WU and Results in the database, the splitters take a break.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1987927
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1859
Credit: 268,616,081
RAC: 1,349
United States
Message 1987934 - Posted: 30 Mar 2019, 3:52:38 UTC - in response to Message 1987927.  

I always wonder if they are tied to the script that runs the WU and Results purge mechanism.

I keep wondering if the throttle slows splitting down the further the backup db is behind the main. Since db issues seem to be at the heart of a lot of this, it makes you wonder just what's in that new throttle process.
ID: 1987934
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13915
Credit: 208,696,464
RAC: 304
Australia
Message 1987936 - Posted: 30 Mar 2019, 4:05:43 UTC

And the download server issues seem to alternate between downloading OK and the timeout issue getting worse & worse each time it occurs. Really having to hit Retry pending transfers a lot more than previously now to get any downloads happening.
Grant
Darwin NT
ID: 1987936
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 1987965 - Posted: 30 Mar 2019, 12:54:04 UTC
Last modified: 30 Mar 2019, 12:54:31 UTC

If I am reading my log (and my all tasks screen in the Manager) correctly, I have a full GPU cache but only about 30 CPU tasks waiting.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1987965
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1987981 - Posted: 30 Mar 2019, 14:31:38 UTC

The download problem is hitting me too, and I'm a slow machine that only requests one at a time. I'm getting them to download, but there is an issue with each one.

Sat Mar 30 04:43:29 2019 | SETI@home | Started download of blc23_2bit_guppi_58406_24925_HIP20215_0097.3029.409.22.45.23.vlar
Sat Mar 30 04:43:31 2019 | | Project communication failed: attempting access to reference site
Sat Mar 30 04:43:31 2019 | SETI@home | Temporarily failed download of blc23_2bit_guppi_58406_24925_HIP20215_0097.3029.409.22.45.23.vlar: transient HTTP error
Sat Mar 30 04:43:31 2019 | SETI@home | Backing off 00:02:01 on download of blc23_2bit_guppi_58406_24925_HIP20215_0097.3029.409.22.45.23.vlar
Sat Mar 30 04:43:33 2019 | | Internet access OK - project servers may be temporarily down.
Sat Mar 30 04:45:33 2019 | SETI@home | Started download of blc23_2bit_guppi_58406_24925_HIP20215_0097.3029.409.22.45.23.vlar
Sat Mar 30 04:45:35 2019 | SETI@home | Finished download of blc23_2bit_guppi_58406_24925_HIP20215_0097.3029.409.22.45.23.vlar
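
For anyone wanting to tally how often this happens, here is a minimal sketch of a script that reads event-log lines shaped like the excerpt above and counts attempts and failures per file. The function name `summarize_downloads` and the message patterns are my own, taken only from the lines shown here; other client versions may word their messages differently, so treat this as a starting point rather than a definitive parser.

```python
import re
from datetime import datetime

# Timestamp layout seen in the excerpt, e.g. "Sat Mar 30 04:43:29 2019".
# %a and %b are locale-dependent; this assumes English day/month names.
TS_FORMAT = "%a %b %d %H:%M:%S %Y"

# Message shapes copied from the excerpt; other client versions may differ.
EVENTS = {
    "started": re.compile(r"Started download of (\S+)$"),
    "failed": re.compile(r"Temporarily failed download of (\S+): "),
    "finished": re.compile(r"Finished download of (\S+)$"),
}


def summarize_downloads(lines):
    """Count attempts and failures per file and time each download end to end."""
    stats = {}
    for line in lines:
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3:
            continue  # not a "time | project | message" line
        stamp, _project, message = parts
        try:
            when = datetime.strptime(stamp, TS_FORMAT)
        except ValueError:
            continue  # unrecognised timestamp, skip rather than guess
        for kind, pattern in EVENTS.items():
            match = pattern.match(message)
            if not match:
                continue
            entry = stats.setdefault(
                match.group(1),
                {"attempts": 0, "failures": 0, "first_start": None, "finished": None},
            )
            if kind == "started":
                entry["attempts"] += 1
                if entry["first_start"] is None:
                    entry["first_start"] = when
            elif kind == "failed":
                entry["failures"] += 1
            else:
                entry["finished"] = when
            break
    # Elapsed time from the first start to the finish, when both were seen.
    for entry in stats.values():
        start, end = entry["first_start"], entry["finished"]
        entry["seconds_to_finish"] = (
            (end - start).total_seconds() if start and end else None
        )
    return stats
```

Fed the seven lines above, this reports two attempts, one failure and 126 seconds from first start to finish for the single file, most of which is the two-minute backoff.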

Also my RAC is falling. I don't care about RAC, but I think it is a diagnostic tool. I find it interesting that my RAC is falling when my machine has been continuously crunching and returning WUs. My machine doesn't run out.

Sorry to those who have to babysit their machines through this mess. I hope they figure it out soon.
ID: 1987981
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11451
Credit: 29,581,041
RAC: 66
United States
Message 1987995 - Posted: 30 Mar 2019, 15:26:57 UTC - in response to Message 1987981.  

My RAC is also falling a bit but my pendings are up so in my case that makes sense.
ID: 1987995
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9958
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1988006 - Posted: 30 Mar 2019, 16:14:37 UTC

I think the problem with RAC is that whilst people here do what they can when uploads/downloads time out, many others don't, so the general throughput of tasks falls and hence pendings rise.
ID: 1988006
FurryGuy
Volunteer tester

Joined: 1 Jun 04
Posts: 6
Credit: 9,294,513
RAC: 1
United States
Message 1988018 - Posted: 30 Mar 2019, 17:53:17 UTC

And every time a problem like this occurs over a weekend or holiday, there is an extended wait after the fix before the thrashing by clients begging for work becomes less frantic.
ID: 1988018
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13915
Credit: 208,696,464
RAC: 304
Australia
Message 1988066 - Posted: 30 Mar 2019, 22:36:15 UTC

Still getting the occasional instant timeout.

The drop in RAC would be due to the change in the mix of WUs being processed, in addition to any systems that would have run out of work for some time due to the download issues, and the period where the splitters called it quits for a while (and the Seti servers overall had a bit of an episode).
Grant
Darwin NT
ID: 1988066
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1988069 - Posted: 30 Mar 2019, 22:52:20 UTC

Down around 10-20K RAC on all hosts. My Intel host peaked at 427K back on 18 Mar before all the upsets on the project. Will take months to regain barring any further upsets.
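
For a rough sense of scale on the recovery time, here is a small sketch. It assumes RAC behaves like an exponentially weighted average with a one-week half-life, which is my understanding of how BOINC defines it and is not something stated in this thread. It also assumes the host goes straight back to its old steady output with no change in the work mix, so it is an optimistic floor. The function name `days_to_recover` and the 1K tolerance are hypothetical choices for illustration.

```python
import math


def days_to_recover(shortfall, tolerance, half_life_days=7.0):
    """Days until a RAC shortfall shrinks to `tolerance`.

    Assumption: the gap to the steady-state RAC halves every
    `half_life_days` once the host is back to its old output.
    """
    if shortfall <= 0 or tolerance <= 0:
        raise ValueError("shortfall and tolerance must be positive")
    if tolerance >= shortfall:
        return 0.0  # already within tolerance
    return half_life_days * math.log2(shortfall / tolerance)


# A 20K shortfall closing to within 1K of the old level.
print(round(days_to_recover(20_000, 1_000), 1))
```

Under those assumptions the gap closes in about a month. Any further outage, or a shift to lower-paying work, resets or lowers the target, which is where the longer estimates come from.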
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1988069
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1988075 - Posted: 30 Mar 2019, 23:23:49 UTC

I'd like to remind everyone what happens when the workunits go from a 50% mix of Arecibo tasks to 100% BLC tasks.
Basically, your RAC drops by around 100% after a few months. It appears the Arecibo Rerun show is over.
Gird your loins....
ID: 1988075
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1988080 - Posted: 31 Mar 2019, 0:07:32 UTC - in response to Message 1988075.  

I'd like to remind everyone what happens when the workunits go from a 50% mix of Arecibo tasks to 100% BLC tasks.
Basically, your RAC drops by around 100% after a few months. It appears the Arecibo Rerun show is over.
Gird your loins....

. . Yes it seems the crumbs have all been cleared away. But a change from 100% Arecibo to 100% GBT will cause RACs to roughly halve (50% drop). Going from a mix, even one that was slightly higher in Arecibo tasks, should be somewhat less severe in the drop. But time will tell ... :(
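
The arithmetic behind "somewhat less severe" can be sketched as below. The 0.5 figure simply restates the "roughly halve" claim above and is not a measurement, and the split is assumed to be by crunching time rather than by task count. The function names are made up for the example.

```python
def expected_rac_ratio(arecibo_share, gbt_relative_credit=0.5):
    """Steady-state RAC relative to an all-Arecibo workload.

    arecibo_share: fraction of crunching time spent on Arecibo tasks (0..1).
    gbt_relative_credit: credit per unit time for GBT/BLC work relative to
        Arecibo work. The default 0.5 is an assumption taken from the
        "roughly halve" figure in the post.
    """
    if not 0.0 <= arecibo_share <= 1.0:
        raise ValueError("arecibo_share must be between 0 and 1")
    return arecibo_share + (1.0 - arecibo_share) * gbt_relative_credit


def drop_when_arecibo_ends(arecibo_share, gbt_relative_credit=0.5):
    """Fractional RAC drop when the mix goes to 100% GBT/BLC."""
    before = expected_rac_ratio(arecibo_share, gbt_relative_credit)
    after = expected_rac_ratio(0.0, gbt_relative_credit)
    return 1.0 - after / before


for share in (1.0, 0.6, 0.5):
    print(f"{share:.0%} Arecibo -> drop of {drop_when_arecibo_ends(share):.0%}")
```

With these assumptions, starting from 100% Arecibo gives the 50% drop, while starting from a 50/50 mix gives a drop of about a third.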

Stephen

<shrug>
ID: 1988080
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1859
Credit: 268,616,081
RAC: 1,349
United States
Message 1988165 - Posted: 31 Mar 2019, 16:33:20 UTC

Not sure about anyone else, but caches are full and it's been about 12 hours since I've seen a stalled download ...
ID: 1988165
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9958
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1988176 - Posted: 31 Mar 2019, 17:41:27 UTC - in response to Message 1988165.  

Not sure about anyone else, but caches are full and it's been about 12 hours since I've seen a stalled download ...


Shhh!!!

ID: 1988176
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1859
Credit: 268,616,081
RAC: 1,349
United States
Message 1988180 - Posted: 31 Mar 2019, 18:31:38 UTC - in response to Message 1988176.  

Not sure about anyone else, but caches are full and it's been about 12 hours since I've seen a stalled download ...
Shhh!!!
:)
ID: 1988180
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1988185 - Posted: 31 Mar 2019, 19:09:05 UTC

Results out in the field is over 4.5 million which does seem to mean something is working better.

I am concerned that my uploaded WUs take an hour for an ack to come back.

We are now entering the processing of the tail end (possibly garbage) of a bunch of blc files. It always makes me feel good to see the old files get finished and clear out of the table like a weird digital spring cleaning.
ID: 1988185
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1988186 - Posted: 31 Mar 2019, 19:21:49 UTC - in response to Message 1988165.  

Not sure about anyone else, but caches are full and it's been about 12 hours since I've seen a stalled download ...


. . Same here ...

Stephen

:)
ID: 1988186
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1988236 - Posted: 1 Apr 2019, 0:51:35 UTC

@ ALL

. . Has anyone heard when Parkes' Data might make it to Beta? Thinking of trying to resurrect an old rig for Beta to be ready to try it ...

Stephen

:)
ID: 1988236


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.