Panic Mode On (116) Server Problems?
Author | Message |
---|---|
Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
|
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I have been creating these threads for 10.5 years. . . I believe these threads you create are without doubt the MOST used threads in the system ... :) Stephen :) or should that be :( |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Still having major download issues and keeping on top of the stalled and backoffed downloads. I think Eric's change is the cause of the issue. (My workaround was just to not let connection attempts sit in the local queues for long periods of time. Quick drops are often much better than those that hang around and prevent other connections.) Whatever he changed to shorten the time a connection attempt sits in the local queue is not long enough. The tasks don't even start to download; they just immediately go to backoff when the client asks for work. His comment that it might affect people is true. I can connect, but I can't maintain a steady download queue, and some tasks always stall out on the connection, leaving them hanging around to prevent a normal client connection at the normal intervals. Until those stalled downloads clear, I don't ask for work, which could be for several hours depending on the backoff length. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Still having major download issues and keeping on top of the stalled and backoffed downloads. I think Eric's change is the cause of the issue. . . Since the problem existed before Eric made the change it is certainly NOT the cause but it may be an imperfect cure. It may, as you say, need to be a trifle longer to prevent momentary traffic conflicts from causing the instant and quickly prolonged backoffs. Stephen |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
haven't been here in a while, heard CPU fan revved up and wondered why, saw one AP was running. Checked over in Manager and saw one running.. 8 were downloading. All in project backoff. Came here to see what's up with that.. saw there's complications. Did the only sensible thing you CAN do.. and I remember having to do this all the time back before the move down to the co-lo... hammer the retry button, of course. They DO start transferring after 1-5 tries. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Still having major download issues and keeping on top of the stalled and backoffed downloads. I think Eric's change is the cause of the issue. Yes, we had download issues before. What I was commenting on was the "patch on top of the patch". He made changes back when we lost one entire download server and were reduced to one server. He made some configuration changes to get it back online that were not the normal or previous configuration, if I remember. Now the aforementioned patch on top of that patch. Not optimal. Could we return to the previous download server configuration before that failure? Things were going great beforehand. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
haven't been here in a while, heard CPU fan revved up and wondered why, saw one AP was running. Checked over in Manager and saw one running.. 8 were downloading. All in project backoff. Not in my case. If I hammer the retry button I just increment the backoff by 45 minutes till it hits 6 hours. A fruitless exercise that makes matters worse than before I did anything. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
It's getting late and I still have download problems. The Hammer deal works for me, it's about all that does work at the moment. The biggest problem is the Mac with 5 GPUs and a 500 WU cache, it can't seem to make it 5 minutes without stalling a download. This morning it was Out of work with a cache full of stalled downloads, can't leave it more than a few hours or it stops working. |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
May have figured something out on my download issues. Ian's suggestion to revert to stock <max_file_xfers_per_project>2</max_file_xfers_per_project> seems to have improved things greatly, but it did not solve the issue entirely. I was still having stalled downloads that turned into backoffs. I was also getting the instant retries, though the max_file_xfers change greatly reduced those without eliminating them. What I do think made some difference is putting the <http_transfer_timeout></http_transfer_timeout> back to stock 300 seconds. I had changed that for the earlier problem of only having one download server along with the <max_file_xfers_per_project>2</max_file_xfers_per_project> change a month ago. That value was still set to 90 seconds. I think I realized that, with allowed connections to the project reduced from my normal 8 and several hundred task downloads on every connection, the time it now takes to download that many tasks two at a time may have exceeded the 90 second http_transfer_timeout. That may have been what was forcing so many tasks into backoff and retries. Now that I allow the connection to last for 300 seconds, I am not getting retries or backoffs. Or if I do get a retry, the connection is still alive when the first retry counts down. So if anyone else has made that change in the parameter, I suggest nulling it out again. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
Nice, Keith. My systems have been pretty hands off for me all day. Once I changed it back to max xfers 2, I pretty much didn’t have to touch it. Now I wonder what’s going on with the stagnant RAC. Prior to the outage on Tuesday, my RAC was steadily climbing. I took the hit from the outage and the beast running out of work, but I expected RAC to recover after a day or two like it usually does. Still, RAC has been stagnant for several days now. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Joined: 24 Jan 00 Posts: 37564 Credit: 261,360,520 RAC: 489 |
It's sorta like the upload problem we had before the last outage, but it seems now that I've gotta check my downloads every hour or 2 to stay on top of things. :-( I'm sorta glad that I'm not running Linux w/ SS yet. Cheers. |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I noticed that the stats export to BOINCStats has changed from around 1430 hours UTC to now around 2130 hours UTC. So the later time might mean it doesn't update the stats till the next day. I too have noticed a rather severe drop in RAC across all hosts. Normally it would have recovered by now. But maybe the change in data mix is the thing affecting the RAC. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22 |
The status page is missing for me. I hope it is only me, or just a weird fluke that clears up in 5 minutes. No panic, just weirdness. Hopefully all the systems are working and it is only a page problem. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13903 Credit: 208,696,464 RAC: 304 |
I see we are still having download issues. Came home to find one system out of CPU work as there were several downloads in super extended backoff mode. Cleared those, and the next batch to download was interesting. About half started downloading straight off and at pretty good speed. The others took quite a while to start downloading, and they tended to start & stop, resulting in download speeds of around 10kB/s. So one download server is now mostly OK, the other still borked? Edit- Next couple of mass downloads, all managed to download at reasonable speeds. Grant Darwin NT |
Joined: 16 Mar 00 Posts: 634 Credit: 7,246,513 RAC: 9 |
The status page is missing for me. I hope it is only me, or just a weird fluke that clears up in 5 minutes. no panic, just weirdness. Hopefully all the systems are working and it is only a page problem. Nope, not just you. The Server status page is blank for me too. :) |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13903 Credit: 208,696,464 RAC: 304 |
The status page is missing for me. I hope it is only me, or just a weird fluke that clears up in 5 minutes. no panic, just weirdness. Hopefully all the systems are working and it is only a page problem. And the Haveland graphs are starved for data as well. Grant Darwin NT |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
It appears the Host Web Pages aren't updating either. I am trying to test an App, it would be nice if the Web pages were working. Oh well, I guess it's tested enough anyway... Hey, the Web Pages are working again. I'm going to bed anyway, got all ready, and then it started working again. Blah, false alarm. Only a couple of pages updated but they are still way behind. The other pages never updated. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13903 Credit: 208,696,464 RAC: 304 |
Still getting the occasional instant/near instant download timeout. Grant Darwin NT |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
And the Haveland graphs are starved for data as well. Yes, the data sources went dark at - it seems - exactly 05:00 UTC. But in the two hours before that, the replica database started to fall behind and the MB result creation rate fell to near zero. Not looking good. |
Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
Just had this error show up at the top of my browser Notice: unserialize(): Error at offset 4074 of 4096 bytes in /disks/carolyn/b/home/boincadm/projects/sah/html/inc/user.inc on line 43 Then it went away. ---edit--- Then it came back. Notice: unserialize(): Error at offset 4074 of 4096 bytes in /disks/carolyn/b/home/boincadm/projects/sah/html/inc/user.inc on line 43 Tom A proud member of the OFA (Old Farts Association). |
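
For anyone wanting to compare their own client settings with the change described earlier in the thread (max_file_xfers_per_project back to 2 and http_transfer_timeout back to 300 seconds), the relevant part of cc_config.xml might look roughly like the sketch below. The two option names and values are the ones quoted in the thread. The surrounding cc_config and options wrapper is assumed to be the usual BOINC client layout, and any other options already in your file should stay in place.

```xml
<cc_config>
  <options>
    <!-- Stock value per the thread: 2 simultaneous file transfers per project -->
    <max_file_xfers_per_project>2</max_file_xfers_per_project>
    <!-- Stock value per the thread: 300 seconds (90 was the earlier workaround) -->
    <http_transfer_timeout>300</http_transfer_timeout>
  </options>
</cc_config>
```

Removing the two lines entirely should have the same effect, since the client is then expected to fall back to its defaults. Either way, the client most likely needs to re-read its config files or be restarted before the change takes effect.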
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.