S@h v7 6.98 L/x86 (CPU): Hangs sometimes + seti_698.jpg disappears

-ShEm-
Volunteer tester

Joined: 14 Jun 08
Posts: 23
Credit: 454,002
RAC: 0
Message 43958 - Posted: 5 Oct 2012, 2:47:27 UTC

Computer: Asus K93S laptop using 4 cores
OS: CrunchBang Debian/Sid x86_64, kernel 3.5.0-4.dmz.1-liquorix-amd64
BOINC: 7.0.27 from Debian repositories running as daemon (service)

Got 2 minor issues:
1. Computation hangs sometimes, with elapsed time increasing but no progress. I usually catch them and simply restart the daemon. Missed this one.
2. seti_698.jpg keeps getting deleted and re-downloaded. Can't put my finger on when it happens. A few minutes ago I set the file-permission to 444 (read-only) after seeing this in event log during startup:
2012-10-05T07:56:42 BDT |  | file projects/setiweb.ssl.berkeley.edu_beta/seti_698.jpg not found
<SNIP>
2012-10-05T07:59:15 BDT | SETI@home Beta Test | Started download of seti_698.jpg
<SNIP>
2012-10-05T07:59:21 BDT | SETI@home Beta Test | Finished download of seti_698.jpg

As I haven't received the x86_64 application and WUs yet, I don't know whether the same issues happen with that one.
ID: 43958
-ShEm-
Volunteer tester

Joined: 14 Jun 08
Posts: 23
Credit: 454,002
RAC: 0
Message 43967 - Posted: 6 Oct 2012, 2:38:59 UTC - in response to Message 43958.  

Well, seti_698.jpg still gets deleted and re-downloaded, despite setting permissions to read-only :( I'll try again, this time setting owner to root (daemon is running as user boinc in group boinc).
ID: 43967
Juha
Volunteer tester

Joined: 18 Jun 08
Posts: 76
Credit: 113,089
RAC: 0
Finland
Message 43968 - Posted: 6 Oct 2012, 9:58:29 UTC

The text "file %s not found" appears in BOINC source only in CLIENT_STATE::check_file_existence() in cs_files.cpp (which I happened to copy-paste here just a few days ago).

According to the comment that function is run only at client start-up, so, has the client re-started just before that message appeared?

And, since that message is written to log only if the file's size isn't what the server said it would be, do you by any chance have the "Skip image file verification" checked and do you actually need to use it?

Also, has BOINC removed Astropulse image file ap_601.jpg?

Can you tell if the file was really removed from the disk when you had it set to read-only?

-------------

Does the processing hang at the beginning of the workunit or at some later stage? If at the beginning, is this the first time an Intel processor gets bitten by the multi-core AMD bug?
ID: 43968
-ShEm-
Volunteer tester

Joined: 14 Jun 08
Posts: 23
Credit: 454,002
RAC: 0
Message 43971 - Posted: 7 Oct 2012, 5:27:48 UTC - in response to Message 43968.  
Last modified: 7 Oct 2012, 5:28:50 UTC

The text "file %s not found" appears in BOINC source only in CLIENT_STATE::check_file_existence() in cs_files.cpp (which I happened to copy-paste here just a few days ago).

According to the comment that function is run only at client start-up, so, has the client re-started just before that message appeared?

Yes, that snippet was from after a power-up. Looking in stdoutdae.txt it doesn't always show "file %s not found" after a reboot/restart, but after I resume network* after a restart of BOINC, it will start to download seti_698.jpg again.

And, since that message is written to log only if the file's size isn't what the server said it would be, do you by any chance have the "Skip image file verification" checked and do you actually need to use it?

Yes, I have that checked and it's needed partly due to my current connections* and partly exactly because of SETI beta never giving me expected bytes*. Just tried unchecking "Skip image file verification" (and all beta-WU's error'ed out of course :( ):
2012-10-07T10:23:04 BDT | SETI@home Beta Test | [error] File seti_698.jpg has wrong size: expected 9068, got 5712

And I get 5712 bytes always, not 9068, even with:
shem@Nikita-CB:~/downloads/BOINC$ wget http://boinc2.ssl.berkeley.edu/beta/download/seti_698.jpg
--2012-10-07 10:25:16--  http://boinc2.ssl.berkeley.edu/beta/download/seti_698.jpg
Resolving boinc2.ssl.berkeley.edu (boinc2.ssl.berkeley.edu)... 208.68.240.13, 208.68.240.21
Connecting to boinc2.ssl.berkeley.edu (boinc2.ssl.berkeley.edu)|208.68.240.13|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5712 (5,6K) [image/jpeg]
Saving to: ‘seti_698.jpg’

100%[===================================================================>] 5.712       9,03KB/s   in 0,6s   

2012-10-07 10:25:20 (9,03 KB/s) - ‘seti_698.jpg’ saved [5712/5712]


Also, has BOINC removed Astropulse image file ap_601.jpg?

Yes, but everything related to ap_601 has been removed. However, everything ap_602 related (apps, pics, etc.) is there.

Can you tell if the file was really removed from the disk when you had it set to read-only?

Yes, definitely, but not immediately. Most worryingly, it also got removed after I chown'ed it to root:root :( The folder is of course owned by boinc:boinc. Checking into how that's possible...
Just now did a couple of quick tests: File is still there after I stop boinc-client, but removed immediately when boinc-client is started, even if set to both read-only and root:root :(

Does the processing hang at the beginning of the workunit or at some later stage? If at the beginning, is this the first time an Intel processor gets bitten by the multi-core AMD bug?

When I catch them hanging, it's seemingly at the beginning of the WU. At least progress after 2-5 hours is at 0.000% with elapsed time increasing. Another one I didn't catch here. It says restarted at 60.7% so it must have progressed some.


*I set network activity in BOINC to suspended unless I want it to connect because my internet-connection isn't good and WUs fail/restart when BOINC is waiting for connection. Current connections I use:
When in the capital city, I mostly use a cable connection, because it has unlimited bandwidth. It works exactly like a home network (most ISPs here do that), except there is no DHCP: I have to set IP (local, in the 10.x.x.x range), mask, gateway and DNS manually. It is also locked to a specific MAC address. Speed is max. 39KB/s, and because they use very old equipment I have to set <http_1_0>1</http_1_0> in cc_config.xml ([rant]Why can't BOINC figure this out by itself when seemingly _all_ other programs that go online can?[/rant])
When outside the capital city, I use a mobile modem capable of 3 Mbit/s, but the ISP only gives max. 25KB/s, usually between 5-10KB/s, and the data allowance is very limited (1GB-5GB up/down combined per month depending on price), which is why I notice things transferring unnecessarily. It is capable of HTTP 1.1, so I change that in cc_config.xml.
Both connections go through proxy servers, which shrink pictures (at least JPGs). Getting pictures directly (wget or ctrl-R in browser) usually goes around that.


[edit]Speeling ;)
ID: 43971
Claggy
Volunteer tester

Joined: 29 May 06
Posts: 1037
Credit: 8,440,339
RAC: 0
United Kingdom
Message 43972 - Posted: 7 Oct 2012, 5:47:55 UTC - in response to Message 43971.  

Download seti_698.jpg directly using your browser, then copy it across and overwrite the undersized version in your project folder.

Claggy
ID: 43972
-ShEm-
Volunteer tester

Joined: 14 Jun 08
Posts: 23
Credit: 454,002
RAC: 0
Message 43976 - Posted: 7 Oct 2012, 7:57:22 UTC - in response to Message 43972.  

I did already... several times. Didn't help. Did it again now just to be sure (and hit Shift-R to be sure to get original, not ctrl-R as posted previously), no luck. Seems the image really is 5712 bytes and not 9068 as stated in client_state.xml. What has helped for now is I changed the size in client_state.xml from 9068 to 5712, but I think that's only a temporary solution.
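
For anyone who wants to repeat the size comparison by hand, here is a minimal Python sketch. It is not BOINC's own code; the function name size_matches is made up for illustration, and the numbers 9068 and 5712 are simply the ones quoted in this thread. The demo writes a dummy 5712-byte file instead of touching the real project folder.

```python
import os
import tempfile


def size_matches(path, expected_bytes):
    """Return (ok, actual_bytes); ok is True when the on-disk size equals expected_bytes."""
    actual_bytes = os.path.getsize(path)
    return actual_bytes == expected_bytes, actual_bytes


if __name__ == "__main__":
    # Stand-in for the proxy-compressed download: a dummy 5712-byte file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "seti_698.jpg")
        with open(path, "wb") as f:
            f.write(b"\0" * 5712)
        ok, actual = size_matches(path, 9068)
        print(f"expected 9068, got {actual}:", "OK" if ok else "wrong size")
```

Pointing size_matches at the real file and the size recorded in client_state.xml would show the same kind of mismatch the event log reported.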
ID: 43976
Alex Storey
Volunteer tester

Joined: 10 Feb 12
Posts: 107
Credit: 305,151
RAC: 0
Greece
Message 43977 - Posted: 7 Oct 2012, 10:44:32 UTC - in response to Message 43976.  
Last modified: 7 Oct 2012, 10:56:26 UTC

I just tried it and

http://boinc2.ssl.berkeley.edu/beta/download/seti_698.jpg

is showing up as 9068 in properties.

Maybe your browser/ISP is compressing the image? Are you using Opera? Try getting it from another PC or if you want I guess I could email it to you.

Unless of course I overlooked/misunderstood something that has already been mentioned in this thread, in which case please ignore this msg:)

Edit: Or maybe deleting your Temp Internet files?
ID: 43977
Richard Haselgrove
Volunteer tester

Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 43978 - Posted: 7 Oct 2012, 11:57:04 UTC - in response to Message 43977.  

The copy on my hard disk, downloaded by BOINC on 20 Sept, is also 9068 bytes.

It does sound as if the OP's download is being compressed by his ISP. Are you using some sort of mobile/cellphone internet connection?
ID: 43978
Christoph
Volunteer tester

Joined: 16 Oct 09
Posts: 58
Credit: 662,990
RAC: 0
Germany
Message 43979 - Posted: 7 Oct 2012, 13:08:24 UTC - in response to Message 43978.  

Yes he is partly using mobile connection. He wrote that in the small print in his longer post above.
Christoph
ID: 43979
-ShEm-
Volunteer tester

Joined: 14 Jun 08
Posts: 23
Credit: 454,002
RAC: 0
Message 43980 - Posted: 7 Oct 2012, 14:10:18 UTC - in response to Message 43979.  
Last modified: 7 Oct 2012, 14:20:18 UTC

Hm, ok, thanks all. Yeah, ISP(s) are compressing images here, but they all say (and the browser, Firefox, says so too) to just press "shift+r" to reload and get the uncompressed image. I can see the image reload in the browser, but I'm getting the same size, so apparently something isn't working.

I have one other thing to try: Using my VPN-connection to a server in foreign country to download the image to the server, rename it to non-image to avoid it being compressed, then transfer it here.

BUT that doesn't really solve the underlying problem, does it? I'm telling BOINC to skip image file verification, and yet it still seems to do that on restart? [Edit added] And for me it only happens with Seti Beta...
ID: 43980
-ShEm-
Volunteer tester

Joined: 14 Jun 08
Posts: 23
Credit: 454,002
RAC: 0
Message 43981 - Posted: 7 Oct 2012, 15:14:00 UTC - in response to Message 43980.  

(Sorry, ability to edit just expired while I was updating)

Update: Yes, got it with the correct size (9068 B) through the VPN. And the other projects I'm connected to use PNGs instead of JPGs, which could explain why they're not affected. Still... on Seti Beta I get ap_601.jpg and ap_602.jpg with correct size, but not seti_697.jpg and seti_698.jpg :(
ID: 43981
Juha
Volunteer tester

Joined: 18 Jun 08
Posts: 76
Credit: 113,089
RAC: 0
Finland
Message 43982 - Posted: 7 Oct 2012, 21:12:19 UTC - in response to Message 43981.  

Still... on Seti Beta I get ap_601.jpg and ap_602.jpg with correct size, but not seti_697.jpg and seti_698.jpg :(

Maybe you got those while using cable connection or your mobile ISP had different settings then or they have some minimum size limit on compression or something.

Anyway, BOINC should check the files same way both after download and at client start-up. Does anyone want to report a bug?

Yes, definitely, but not immediately. Most worryingly, it also got removed after I chown'ed it to root:root :( The folder is of course owned by boinc:boinc. Checking into how that's possible...
Just now did a couple of quick tests: File is still there after I stop boinc-client, but removed immediately when boinc-client is started, even if set to both read-only and root:root :(

I think the permissions work so that the owner of the directory has permissions to remove files even if it can't do anything else with the files.

If you want to play more try setting 'i' attribute with chattr. But if something blows up don't blame me...
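
The directory-permission point can be tried out safely in a scratch directory. Below is a small Python sketch, assuming a POSIX system such as Linux; the helper name is hypothetical. On POSIX, deleting a file is an operation on the containing directory (it needs write and search permission there), so mode 444 on the file itself does not prevent it. The sketch covers only the read-only case, since creating a root-owned file would need root.

```python
import os
import tempfile


def can_delete_readonly_file():
    """Create a read-only file in a directory we own, then try to delete it.

    Returns True if the deletion succeeded.
    """
    with tempfile.TemporaryDirectory() as scratch_dir:
        path = os.path.join(scratch_dir, "seti_698.jpg")
        with open(path, "wb") as f:
            f.write(b"dummy")
        os.chmod(path, 0o444)  # read-only for everyone, like the 444 tried above
        os.remove(path)        # governed by the directory's permissions, not the file's
        return not os.path.exists(path)


if __name__ == "__main__":
    print("read-only file deleted:", can_delete_readonly_file())
```

So a client running as user boinc can remove anything inside a folder owned by boinc, whatever the file's own mode or owner, unless something like the immutable attribute mentioned above gets in the way.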
ID: 43982
Juha
Volunteer tester

Joined: 18 Jun 08
Posts: 76
Credit: 113,089
RAC: 0
Finland
Message 43983 - Posted: 7 Oct 2012, 22:35:30 UTC - in response to Message 43971.  

When I catch them hanging, it's seemingly at the beginning of the WU. At least progress after 2-5 hours is at 0.000% with elapsed time increasing. Another one I didn't catch here. It says restarted at 60.7% so it must have progressed some.

I'm wondering whether the app really hangs or does BOINC somehow lose track of it. The next time that happens could you check with top or whatever tool you like to use that the MB executable is really running? Seeing what the app thinks shouldn't hurt either. I think progress is in state.sah in <prog> but I'm not sure, I don't have any Seti workunits at the moment.
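
A rough way to see what the app thinks, sketched in Python. The only thing taken from this thread is that state.sah is expected to hold the progress in a <prog> element; everything else about the file layout is an assumption, and the function name read_progress is made up. The function takes the file contents as text, so it can be tried on a copy of the file.

```python
import re

# Matches <prog>NUMBER</prog>; the surrounding file layout is assumed, not verified.
PROG_PATTERN = re.compile(r"<prog>\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s*</prog>")


def read_progress(state_text):
    """Return the <prog> value as a float, or None if no such element is present."""
    match = PROG_PATTERN.search(state_text)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    example = "<state>\n<prog>0.60700</prog>\n</state>\n"
    print("progress recorded in checkpoint:", read_progress(example))
```

A result of None would mean no progress value was found, which is also what to expect if the file was never written.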

I found two of these workunits on your task list here on Beta. Both of them show different timings for v_avxGetPowerSpectrum so that hints that the app has restarted from beginning (unless it always does the optimal function testing).

Some observations I've done regarding these hangs:

I think that reports like these have appeared quite seldom in the past year or two. Maybe one per half a year or so. I didn't take any notes, but weren't these more common a few years back, maybe one per month? So maybe the bug affects only older AMDs: Athlon 64 X2/X4s and not the newer Phenoms?

Also, maybe it's just my imagination, but have there been more reports of tasks being hung in the past few weeks than previously, and not just MB but also Astropulse, and not only at the beginning of the workunit?
ID: 43983
Josef W. Segur
Volunteer tester

Joined: 14 Oct 05
Posts: 1137
Credit: 1,848,733
RAC: 0
United States
Message 43984 - Posted: 8 Oct 2012, 4:57:33 UTC - in response to Message 43983.  

When I catch them hanging, it's seemingly at the beginning of the WU. At least progress after 2-5 hours is at 0.000% with elapsed time increasing. Another one I didn't catch here. It says restarted at 60.7% so it must have progressed some.

I'm wondering whether the app really hangs or does BOINC somehow lose track of it. The next time that happens could you check with top or whatever tool you like to use that the MB executable is really running? Seeing what the app thinks shouldn't hurt either. I think progress is in state.sah in <prog> but I'm not sure, I don't have any Seti workunits at the moment.

Yes, that field records progress at each checkpoint, but I doubt that checkpoints are being written.

I found two of these workunits on your task list here on Beta. Both of them show different timings for v_avxGetPowerSpectrum so that hints that the app has restarted from beginning (unless it always does the optimal function testing).

The application does always run the function testing, but only prints the result of the tests to stderr if progress is zero. Otherwise it issues the single "Restarted at xx.xx percent" line. There is a -verbose command line option which forces it to show full detail of the testing in both cases, but anonymous platform mode would be needed to use it.

What the Task 11220210 page indicates is that the task originally hung during the function testing. Then it was restarted at zero progress, completed the function testing and began actual crunching. Later it was restarted once more at 60.70 percent but did not finish before the BOINC client killed it for "Maximum elapsed time exceeded". Task 11220197 never did manage to make it through the function testing in four tries.

It seems host 55007 sometimes hangs in the transpose testing which comes just after the chirp testing. There are a lot of transpose variants to test, and looking for a bug is hard when the host usually gets through that testing OK. I'll note that the testing is not affected at all by the data in the WU, but the folding test does adapt to testing the actual array lengths which crunching will use (indirectly an angle range dependence).

Some observations I've done regarding these hangs:

I think that reports like these have appeared quite seldom in the past year or two. Maybe one per half a year or so. I didn't take any notes, but weren't these more common a few years back, maybe one per month? So maybe the bug affects only older AMDs: Athlon 64 X2/X4s and not the newer Phenoms?

On Windows, Athlon 64 x2 hosts were definitely more likely than others to be afflicted, but at least one Phenom has been affected and a few Intel systems too.

Also, maybe it's just my imagination, but have there been more reports of tasks being hung in the past few weeks than previously, and not just MB but also Astropulse, and not only at the beginning of the workunit?

You may be right, though the reports have generally not given enough information for me to judge whether it's the same problem. With the relatively quick purging at main, I'm often too late to see any task details.
                                                                  Joe
ID: 43984
Juha
Volunteer tester

Joined: 18 Jun 08
Posts: 76
Credit: 113,089
RAC: 0
Finland
Message 43992 - Posted: 8 Oct 2012, 17:44:44 UTC - in response to Message 43984.  

Yes, that field records progress at each checkpoint, but I doubt that checkpoints are being written.

The idea I had was to check for the easy stuff first. It would be quite embarrassing to spend an awful lot of time chasing bugs in the science app and maybe even in the OS and hardware if it later turns out to be just another BOINC bug.

[function testing]

Thanks. I haven't looked much into Astropulse code. Does it have variants of different functions or does it work in one-size-fits-all style?


On Windows, Athlon 64 x2 hosts were definitely more likely than others to be afflicted, but at least one Phenom has been affected and a few Intel systems too.

Ok, I didn't know about the Intels.

You may be right, though the reports have generally not given enough information for me to judge whether it's the same problem. With the relatively quick purging at main, I'm often too late to see any task details.

I have to admit I haven't even tried looking into task details. It's just that I've been recently getting a vague feeling that didn't someone ask about this just the other day.
ID: 43992
Josef W. Segur
Volunteer tester

Joined: 14 Oct 05
Posts: 1137
Credit: 1,848,733
RAC: 0
United States
Message 43998 - Posted: 9 Oct 2012, 5:11:25 UTC - in response to Message 43992.  

Yes, that field records progress at each checkpoint, but I doubt that checkpoints are being written.

The idea I had was to check for the easy stuff first. It would be quite embarrassing to spend an awful lot of time chasing bugs in the science app and maybe even in the OS and hardware if it later turns out to be just another BOINC bug.

Definitely it's sensible to check. I left my doubt based on the other evidence and as a sort of warning to -ShEm- that the state.sah file might not even be created before the application hung.

[function testing]

Thanks. I haven't looked much into Astropulse code. Does it have variants of different functions or does it work in one-size-fits-all style?

The CPU Astropulse apps are definitely one size fits all and have no internal checks for capabilities, but the FFTW DLL does check and adapt to the CPU capabilities.
                                                                    Joe
ID: 43998
-ShEm-
Volunteer tester

Joined: 14 Jun 08
Posts: 23
Credit: 454,002
RAC: 0
Message 44005 - Posted: 10 Oct 2012, 17:44:06 UTC - in response to Message 43998.  

FWIW it took 2 tries before boinc finally stopped re-downloading seti_698.jpg... and of course it got the wrong size again. I just copied the correct one back both times and now it seems to stick :)

Regarding hung WUs: Your "discussion" goes a little over my head, but I understand the basics... I think ;) Dunno if it could influence it, but it's a laptop and preferences are set to suspend computation when on battery. That happens quite a lot, since power goes away quite often for 3-5 minutes and for 1 hour at least twice a day (and it's mostly shut down when I sleep). Maybe that interruption is the cause of WUs hanging during those capability checks if it happens at that time?!

I didn't crunch beta for some months on this Linux laptop, since both SETI and AP tasks just errored out when I did. Around October 1st I thought I'd try again when I saw new versions were available. And both seem to work fine... when they work ;)
ID: 44005
-ShEm-
Volunteer tester

Joined: 14 Jun 08
Posts: 23
Credit: 454,002
RAC: 0
Message 44009 - Posted: 12 Oct 2012, 12:12:25 UTC - in response to Message 44005.  
Last modified: 12 Oct 2012, 12:27:53 UTC

Maybe that interruption is the cause of wu's hanging during those capability checks if it happens at that time?!

Replying to myself, I know, but: Yeah, that could actually be part of the reason. Just caught resultid=11286983 hanging for 2½ hours. According to the log, resultid=11286757 started 20 minutes later without hanging. Neither of them is finished at the time of posting this. There was a 1 hour power cut, during which those tasks were downloaded from Seti beta. When power came back, the task that hung was started:
12-Oct-2012 15:00:33 [---] Suspending network activity - user request
12-Oct-2012 15:06:57 [---] Resuming computation
12-Oct-2012 15:06:57 [SETI@home Beta Test] Starting task 05ap10al.14997.13569.140733193388036.14.93_1 using setiathome_v7 version 698 in slot 3


(Edited to correct myself after re-reading the log)
ID: 44009
-ShEm-
Volunteer tester

Joined: 14 Jun 08
Posts: 23
Credit: 454,002
RAC: 0
Message 44129 - Posted: 18 Oct 2012, 6:35:53 UTC

Task 11310330 doesn't seem to want to progress at all for me. BOINC has been restarted 3-4 times now and still no progress (computer was powered down / powered up once). Elapsed times AFAIR have been 9+ hours, 2+ hours, 4+ hours and right now it's 03:36:57 hours, all with 0% progress. Wingman doesn't seem to have had problems. I've suspended it for now. Other beta WUs are running normally ATM. Anyone got some advice?

Here's current stderr.txt for task 11310330:
setiathome_v7 6.98 Revision: 1423 g++ (GCC) 4.4.1 20090725 (Red Hat 4.4.1-2)
libboinc: BOINC 7.1.0

Work Unit Info:
...............
WU true angle range is :  0.422486
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)
           v_avxGetPowerSpectrum 0.000107 0.00000 
setiathome_v7 6.98 Revision: 1423 g++ (GCC) 4.4.1 20090725 (Red Hat 4.4.1-2)
libboinc: BOINC 7.1.0

Work Unit Info:
...............
WU true angle range is :  0.422486
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
setiathome_v7 6.98 Revision: 1423 g++ (GCC) 4.4.1 20090725 (Red Hat 4.4.1-2)
libboinc: BOINC 7.1.0

Work Unit Info:
...............
WU true angle range is :  0.422486
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
setiathome_v7 6.98 Revision: 1423 g++ (GCC) 4.4.1 20090725 (Red Hat 4.4.1-2)
libboinc: BOINC 7.1.0

Work Unit Info:
...............
WU true angle range is :  0.422486
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)
           v_avxGetPowerSpectrum 0.000040 0.00000 
setiathome_v7 6.98 Revision: 1423 g++ (GCC) 4.4.1 20090725 (Red Hat 4.4.1-2)
libboinc: BOINC 7.1.0

Work Unit Info:
...............
WU true angle range is :  0.422486
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)
           v_avxGetPowerSpectrum 0.000057 0.00000 
                 avx_ChirpData_b 0.002757 0.00000 
          v_avxTranspose8x8ntw_a -0.717707 0.00000 
                JS AVX_a folding 0.000621 0.00000 
08:19:05 (5699): No heartbeat from core client for 30 sec - exiting
setiathome_v7 6.98 Revision: 1423 g++ (GCC) 4.4.1 20090725 (Red Hat 4.4.1-2)
libboinc: BOINC 7.1.0

Work Unit Info:
...............
WU true angle range is :  0.422486
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)
           v_avxGetPowerSpectrum 0.000064 0.00000
ID: 44129
Josef W. Segur
Volunteer tester

Joined: 14 Oct 05
Posts: 1137
Credit: 1,848,733
RAC: 0
United States
Message 44172 - Posted: 19 Oct 2012, 5:58:38 UTC - in response to Message 44129.  

Task 11310330 doesn't seem to want to progress at all for me. BOINC has been restarted 3-4 times now and still no progress (computer was powered down / powered up once). Elapsed times AFAIR have been 9+ hours, 2+ hours, 4+ hours and right now it's 03:36:57 hours, all with 0% progress. Wingman doesn't seem to have had problems. I've suspended it for now. Other beta WUs are running normally ATM. Anyone got some advice?

Thanks for the stderr.txt, there's one section near the end which might give further clues:

setiathome_v7 6.98 Revision: 1423 g++ (GCC) 4.4.1 20090725 (Red Hat 4.4.1-2)
libboinc: BOINC 7.1.0

Work Unit Info:
...............
WU true angle range is :  0.422486
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)
           v_avxGetPowerSpectrum 0.000057 0.00000 
                 avx_ChirpData_b 0.002757 0.00000 
          v_avxTranspose8x8ntw_a -0.717707 0.00000 
                JS AVX_a folding 0.000621 0.00000 
08:19:05 (5699): No heartbeat from core client for 30 sec - exiting

That managed to get through the full set of tests but then the application quit for the no heartbeat. The negative timing for the chosen transpose function is obviously bogus, I've only seen one other host showing that symptom, and it means we can't guess how long the tests took. If you can find the BOINC message in the event log showing that the application exited without a finished file at about 08:19:05 and look at earlier messages to find when BOINC started that attempt, it could help limit the possibilities.
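
For sifting through a long stderr.txt like the one posted, a small Python sketch might help. It is not part of the application; it only assumes what is visible in the listing above: each attempt starts with a line beginning "setiathome_v7", and each tested function is reported as a name followed by a timing and an error value. All function names below are made up for illustration.

```python
import re

BANNER = "setiathome_v7"  # first line of every attempt in the posted stderr.txt
# A result line looks like: "<name> <timing> <error>", e.g. "avx_ChirpData_b 0.002757 0.00000"
RESULT_LINE = re.compile(r"^\s*(\S.*?)\s+(-?\d+\.\d+)\s+(\d+\.\d+)\s*$")


def split_attempts(stderr_text):
    """Group the lines of stderr.txt into one list per application start."""
    attempts = []
    for line in stderr_text.splitlines():
        if line.startswith(BANNER):
            attempts.append([])
        elif attempts:
            attempts[-1].append(line)
    return attempts


def timings(attempt_lines):
    """Return (function_name, timing) pairs for the tests that finished in one attempt."""
    found = []
    for line in attempt_lines:
        match = RESULT_LINE.match(line)
        if match:
            found.append((match.group(1), float(match.group(2))))
    return found


def negative_timings(stderr_text):
    """Return (attempt_number, function_name, timing) for every bogus negative timing."""
    return [
        (number, name, value)
        for number, lines in enumerate(split_attempts(stderr_text), start=1)
        for name, value in timings(lines)
        if value < 0
    ]
```

Calling len(timings(lines)) for each attempt also shows how far the function testing got before the hang, which is the information read off by eye above.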

AFAIK there's no logical reason one task should be more likely to show the problem than any other, but that task certainly seems to demonstrate my logic is flawed. You might save a copy of the WU file for offline testing if we think of something useful to do.

That's not particularly helpful advice, maybe someone else will have additional ideas.
                                                                  Joe
ID: 44172

©2023 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.