Astropulse 4.34

Message boards : AstroPulse : Astropulse 4.34

Winterknight
Volunteer tester
Joined: 15 Jun 05
Posts: 693
Credit: 246,694
RAC: 0
Message 34260 - Posted: 18 Jul 2008, 7:05:15 UTC

Task 4113884 completed in 140687s (~39hrs) claimed 1869.04. That is about 5000s (3.5%) slower than units completed with 4.33.

Pappa
Volunteer moderator
Volunteer tester
Joined: 13 Nov 05
Posts: 1724
Credit: 3,121,901
RAC: 0
Message 34263 - Posted: 18 Jul 2008, 15:57:56 UTC

This result was returned with what seems a rather long runtime:

4089386

The runtime is out of whack compared to the others I have returned because the machine crashed at 45 hours. One WU restarted fine; this one started again from zero.

____________
Thanks to Paul and Friends
Please consider a Donation to the Seti Project

vonkorff
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 10 Feb 07
Posts: 84
Credit: 24,876
RAC: 0
Message 34264 - Posted: 18 Jul 2008, 23:15:05 UTC - in response to Message 34255.
Last modified: 18 Jul 2008, 23:15:37 UTC

Found a second linux host, which has at least 50% longer runtimes with APv4.34 compared to APv4.33. Check the tasklist of hostid 23827. I think that verifies my earlier statements: APv4.34 on linux has a problem.


I just reproduced the problem on my own machine. The problem appears to be that 4.33 was compiled with optimization (-O2) and 4.34 was not. The public version will be compiled with optimization. Thanks for pointing this out.
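One way a build could make a missing optimization flag visible is to report its own optimization state at startup. The following is only a sketch: `build_optimization_label` is a hypothetical helper, not something in the AstroPulse source, and it relies on the `__OPTIMIZE__` macro that GCC and Clang predefine for optimizing builds (other compilers may not define it).

```cpp
#include <string>

// Hypothetical helper (an assumption, not AstroPulse code).
// GCC and Clang predefine __OPTIMIZE__ when compiling with -O1 or higher,
// so an -O0 build falls through to the "unoptimized" branch.
std::string build_optimization_label() {
#ifdef __OPTIMIZE__
    return "optimized";
#else
    return "unoptimized";
#endif
}
```

Writing this label to stderr once at startup would let a tester see from the task output whether a given binary was built with -O2 or without it.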

vonkorff
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 10 Feb 07
Posts: 84
Credit: 24,876
RAC: 0
Message 34265 - Posted: 18 Jul 2008, 23:20:21 UTC - in response to Message 34263.
Last modified: 18 Jul 2008, 23:22:23 UTC

This result was returned with what seems a rather long runtime:

4089386

The runtime is out of whack compared to the others I have returned because the machine crashed at 45 hours. One WU restarted fine; this one started again from zero.


The lines in stderr that say:

In ap_fileio.cpp, Statefile::write, statefile is 0'd, trying again: iteration 100
Statefile::Read: 2 iterations with 0'd statefile.


mean that this WU encountered the "0'd statefile" problem, which is unrecoverable since the statefile is corrupt. So the application has to start over. Evidently the application is capable of finishing successfully in spite of this problem, which I wasn't sure about. I still don't know why the statefile gets corrupted.
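For readers wondering what a check for this condition might look like, here is a minimal sketch. It is not taken from ap_fileio.cpp; the function name is hypothetical, and it assumes that a "0'd" statefile means a file that is empty or contains nothing but zero bytes.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper, not the actual Statefile code.
// Assumption: "0'd" means the file is empty or holds only zero bytes.
// Returns true in that case; returns false as soon as a non-zero byte
// is seen, or if the file cannot be opened at all.
bool statefile_looks_zeroed(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return false;  // cannot open: a different failure, not "zeroed"
    }
    char byte = 0;
    while (in.get(byte)) {
        if (byte != 0) {
            return false;  // real data present
        }
    }
    return true;  // reached end of file without seeing any data
}
```

A check like this only detects the corruption after the fact; it says nothing about why the file was zeroed in the first place.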

Pappa
Volunteer moderator
Volunteer tester
Joined: 13 Nov 05
Posts: 1724
Credit: 3,121,901
RAC: 0
Message 34266 - Posted: 19 Jul 2008, 0:45:04 UTC - in response to Message 34265.

Josh,

This occurred during testing of BOINC 6.2.12, when the machine locked up. After about 5 minutes of banging on the keyboard with no response, I had to use the power-off button.

As near as I can figure, the statefile was probably open for writing when the power went away. When Astropulse restarted, it could not open an "already open" statefile.

To test for this, I suppose you could set a semaphore when the file is open, and check for the semaphore when AP cannot open or write to the statefile, then issue a Statefile::close command.
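A rough sketch of that idea, using a marker file as the "semaphore", might look like the following. All names here (`write_with_marker`, `write_was_interrupted`, the `.writing` suffix) are made up for illustration and are not part of the Astropulse code.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical sketch; names and file layout are assumptions.
// Creates "<path>.writing" before touching the statefile and removes it
// only after the write has completed without a stream error.
bool write_with_marker(const std::string& path, const std::string& contents) {
    const std::string marker = path + ".writing";
    {
        std::ofstream flag(marker);  // create the marker ("set the semaphore")
        if (!flag) {
            return false;
        }
    }
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out << contents;
    out.close();
    if (out.fail()) {
        return false;  // leave the marker behind so the failure is visible
    }
    return std::remove(marker.c_str()) == 0;  // clear the marker
}

// True if a marker was left behind, i.e. the last write never finished.
bool write_was_interrupted(const std::string& path) {
    std::ifstream flag(path + ".writing");
    return flag.good();
}
```

On restart, the application could call `write_was_interrupted` before reading the statefile and treat a leftover marker as a sign that the contents cannot be trusted. Note that this sketch does not force data to disk, so by itself it would not guarantee anything across a power loss.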


____________
Thanks to Paul and Friends
Please consider a Donation to the Seti Project

Jim Wilkins
Volunteer tester
Joined: 1 Nov 06
Posts: 55
Credit: 344,829
RAC: 0
Message 34267 - Posted: 19 Jul 2008, 2:54:47 UTC

Sigh! I have yet to successfully complete one of these AP tasks. Here is my latest failure:

At ap_graphics_init
boinc_graphics_make_shmem failed: 0

Jim

Father Ambrose
Volunteer tester
Joined: 1 May 07
Posts: 546
Credit: 4,809,913
RAC: 3,088
Message 34270 - Posted: 19 Jul 2008, 8:08:31 UTC

Just completed two more AP 4.34 WUs, both pending.

(4146632) 42:44:16, claimed 1869.04

(4148192) 42:32:10, claimed 1869.04

Urs Echternacht
Volunteer tester
Joined: 18 Jan 06
Posts: 824
Credit: 9,627,770
RAC: 21,295
Message 34274 - Posted: 19 Jul 2008, 16:00:04 UTC - in response to Message 34264.

I just reproduced the problem on my own machine. The problem appears to be that 4.33 was compiled with optimization (-O2) and 4.34 was not. The public version will be compiled with optimization. Thanks for pointing this out.

Luckily, an easy problem to solve. Good that you found the reason so quickly.
____________
_\|/_
Urs

Richard U
Volunteer tester
Joined: 6 May 07
Posts: 69
Credit: 496,888
RAC: 3
Message 34299 - Posted: 21 Jul 2008, 7:50:06 UTC - in response to Message 34139.
Last modified: 21 Jul 2008, 7:51:18 UTC

....In the process, it creates a file called zeroed_statefile_log.txt. The number in the file represents the number of times it has tried so far. (Don't change the number.)

Let me know if this happens to you ...


I have three of these
ap_23ap08aa_B3_P0_00052_20080714_20232.wu_0_0 - zeroed_statefile_log.txt = 2
ap_23ap08ab_B1_P0_00103_20080716_06439.wu_1_0 - zeroed_statefile_log.txt = 1
ap_23ap08aa_B3_P0_00037_20080714_20232.wu_1_0 - zeroed_statefile_log.txt = 2

All three are back at just over 1% completed, at 60, 75 and 79 hours of CPU time.
____________
Richard U


Copyright © 2013 University of California

AstroPulse is funded in part by the NSF through grant AST-0307956