Astropulse 7,00 released for Linux 32&64, Win 32&64, Win32+AMD/NVIDIA/Intel GPU

Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51502 - Posted: 6 Jul 2014, 18:46:47 UTC - in response to Message 51498.  

No problems here on my GTX-750.
Still waiting for the first task with blanking (I sure hated these 'wasting' resources) so I can see how much improvement there is.
Are there any other changes in the new v7 version, or is it just the improved handling of tasks with blanking?

Tom


Most important one - blanking handling.
There were many incremental changes and one quite big one, FFA_TWIN, so everything that would have appeared in the still-delayed new Lunatics installer release for GPU AstroPulse can be seen here.
ID: 51502 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51503 - Posted: 6 Jul 2014, 19:48:47 UTC

Here is how an iGPU gone mad can look on an AP task:

http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=17284447

Single pulse: peak_power=2.997e+004 dm=-4528 fft_num=6291456 peak_bin=6291462 scale=0
Single pulse: peak_power=3.043e+004 dm=-4528 fft_num=6291456 peak_bin=6291470 scale=0
Single pulse: peak_power=3.027e+004 dm=-4528 fft_num=6291456 peak_bin=6291478 scale=0
Single pulse: peak_power=2.995e+004 dm=-4528 fft_num=6291456 peak_bin=6291486 scale=0
Single pulse: peak_power=2.984e+004 dm=-4528 fft_num=6291456 peak_bin=6291494 scale=0
Single pulse: peak_power=2.968e+004 dm=-4528 fft_num=6291456 peak_bin=6291502 scale=0


Perhaps we need to add some sanity checking to AstroPulse as we did for MultiBeam recently.
ID: 51503 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51504 - Posted: 6 Jul 2014, 21:41:45 UTC
Last modified: 6 Jul 2014, 21:51:00 UTC

BOINC 6.10.60
Can't get ATi work:

7/7/2014 01:40:05 AM SETI@home Beta Test Requesting new tasks for GPU
7/7/2014 01:40:05 AM SETI@home Beta Test [sched_op_debug] CPU work request: 0.00 seconds; 0.00 CPUs
7/7/2014 01:40:05 AM SETI@home Beta Test [sched_op_debug] ATI GPU work request: 503430.59 seconds; 1.00 GPUs
7/7/2014 01:40:08 AM SETI@home Beta Test Scheduler request completed: got 0 new tasks
7/7/2014 01:40:08 AM SETI@home Beta Test [sched_op_debug] Server version 703
7/7/2014 01:40:08 AM SETI@home Beta Test Project requested delay of 7 seconds
7/7/2014 01:40:08 AM SETI@home Beta Test [sched_op_debug] Deferring communication for 7 sec
7/7/2014 01:40:08 AM SETI@home Beta Test [sched_op_debug] Reason: requested by project

7/7/2014 01:37:55 AM ATI GPU 0: ATI unknown (CAL version 1.4.1848, 256MB, 44 GFLOPS peak)

No OpenCL in this BOINC.
Will we support such versions with APv7? If yes, a new plan class is needed.
ID: 51504 · Report as offensive
Josef W. Segur
Volunteer tester

Joined: 14 Oct 05
Posts: 1137
Credit: 1,848,733
RAC: 0
United States
Message 51505 - Posted: 7 Jul 2014, 0:53:42 UTC - in response to Message 51503.  

Here is how an iGPU gone mad can look on an AP task:

http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=17284447

Single pulse: peak_power=2.997e+004 dm=-4528 fft_num=6291456 peak_bin=6291462 scale=0
Single pulse: peak_power=3.043e+004 dm=-4528 fft_num=6291456 peak_bin=6291470 scale=0
Single pulse: peak_power=3.027e+004 dm=-4528 fft_num=6291456 peak_bin=6291478 scale=0
Single pulse: peak_power=2.995e+004 dm=-4528 fft_num=6291456 peak_bin=6291486 scale=0
Single pulse: peak_power=2.984e+004 dm=-4528 fft_num=6291456 peak_bin=6291494 scale=0
Single pulse: peak_power=2.968e+004 dm=-4528 fft_num=6291456 peak_bin=6291502 scale=0


Perhaps we need to add some sanity checking to AstroPulse as we did for MultiBeam recently.

IMO, what makes that set of signals definitely wrong is it would surely have been detected at earlier dispersions with lower powers (but plenty to be reported and get to the single pulse limit). Perhaps a solitary single pulse with that kind of power may be possible, I'm not sure.

I'm in favor of sanity checks wherever they can be added without slowing processing significantly. A simple check might only take a couple of nanoseconds on a current processor, so if done 500 million times would add 1 second to the run time. More complex checks could be carefully placed to not run too often, of course.
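The run-time estimate above is easy to reproduce. This is only a back-of-envelope sketch using the figures from the post (about 2 ns per check, run 500 million times); the variable names are illustrative.

```python
# Back-of-envelope cost of a cheap sanity check, using the figures from the post.
NS_PER_CHECK = 2                 # assumed cost of one simple check, in nanoseconds
CHECKS_PER_TASK = 500_000_000    # assumed number of times the check runs per task

added_seconds = NS_PER_CHECK * CHECKS_PER_TASK / 1e9
print(f"added run time: {added_seconds:.1f} s")
```

With those assumptions the overhead comes to one second per task, which is negligible next to AstroPulse run times of several hours.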

The other difficult thing is deciding what's best to do when there's an apparent problem. For a case with significant progress on a task, IMO restarting from the last checkpoint is most sensible and can simply use BOINC's temporary exit feature. That does a fresh initialization and rereads the WU file, so it can cure some corrupted data cases. BOINC doesn't tell the application anything about restarts, though, so if there's no cure the app will try the temporary exit again and again until BOINC stops the cycle for too many exits. Perhaps we should consider adding application code to keep track to allow shifting to a different strategy.
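One way such tracking might look is a small counter persisted next to the task's other files. This is a hypothetical sketch only: the file name, the limit of 3, and the `next_action` helper are all assumptions, and in the real application "restart" would correspond to BOINC's temporary exit while "give_up" would be whatever alternative strategy is chosen.

```python
import os
import tempfile

MAX_RESTARTS = 3  # hypothetical limit before switching strategy


def next_action(counter_path, max_restarts=MAX_RESTARTS):
    """Return 'restart' while retries remain, else 'give_up'.

    The number of restarts already requested is kept in a small text file
    so that it survives the application being stopped and started again.
    """
    try:
        with open(counter_path) as f:
            count = int(f.read().strip() or 0)
    except (FileNotFoundError, ValueError):
        count = 0  # no counter yet, or unreadable: treat as first attempt
    if count >= max_restarts:
        return "give_up"
    with open(counter_path, "w") as f:
        f.write(str(count + 1))
    return "restart"


# Simulate four consecutive sanity-check failures on the same task.
with tempfile.TemporaryDirectory() as slot_dir:
    path = os.path.join(slot_dir, "sanity_restarts.txt")  # hypothetical name
    actions = [next_action(path) for _ in range(4)]
    print(actions)
```

The first three failures ask for a restart and the fourth switches strategy, so the app no longer depends on BOINC's "too many exits" limit to break the cycle.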

That task and the similar invalid AP v6 task 15788381 last January both show some variation in the peaks, but having the peaks occur at 8 sample intervals in both cases is possibly significant. It's really the single pulse code claiming it sees a repetitive pulse sequence with a 3.2 usec period, too fast to be seen by even the short FFA.
                                                                  Joe
ID: 51505 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51507 - Posted: 7 Jul 2014, 6:41:12 UTC - in response to Message 51505.  

I was looking rather at the fft_num and peak_bin values.
A peak_bin of 6.3 million with only 32k in the array... quite a strong sign of error IMO.
The restart-from-checkpoint approach (as we already discussed) will not cure signals already logged. Hence it could not save the task from turning invalid in the end; it would just add time for restarting and reprocessing. Restarting from checkpoint is applicable only when we are sure we have detected the very first attempt to log an invalid signal.
ID: 51507 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51510 - Posted: 7 Jul 2014, 7:47:22 UTC - in response to Message 51471.  

  403de6:       8b 0d c0 b5 4e 00       mov    0x4eb5c0,%ecx
  403dec:       8b 15 c8 eb 4c 00       mov    0x4cebc8,%edx
  403df2:       8b 04 91                mov    (%ecx,%edx,4),%eax
  403df5:       50                      push   %eax
  403df6:       ff 15 b8 90 4a 00       call   *0x4a90b8


That seems odd to begin with. It appears to be reading into an array of ints starting at address 0x4eb5c0 (5158336) to get element 0x4cebc8 (5041096), which is at address 0x18264E0 (25322720). These numbers appear to be set at compile time, not at load time.

0x4eb5c0 appears a lot of times in the code. It looks like it's a statically allocated structure. The value of the first 4-byte element, possibly a boolean, is tested a lot. That's about all I can tell without a symbol table.
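
The address arithmetic in the quoted analysis can be checked directly. This sketch takes the two constants as base and index, exactly as the quoted reading does. Note, as a caveat, that in AT&T syntax an operand without a `$` prefix is a memory reference, so the first two instructions may instead be loading a pointer and an index that are stored at those two addresses.

```python
# Effective address of (%ecx,%edx,4) under the quoted reading of the listing.
BASE = 0x4EB5C0    # first constant in the listing
INDEX = 0x4CEBC8   # second constant in the listing
ELEM_SIZE = 4      # scale factor in (%ecx,%edx,4), i.e. 4-byte elements

effective = BASE + INDEX * ELEM_SIZE
print(hex(effective), effective)
```

This gives 0x18264e0 (25322720), matching the figure quoted above.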


Symbol table: https://dl.dropboxusercontent.com/u/60381958/for_APv7_00_AP7_win_x86_SSE2_OpenCL_NV.pdb.7z
ID: 51510 · Report as offensive
Josef W. Segur
Volunteer tester

Joined: 14 Oct 05
Posts: 1137
Credit: 1,848,733
RAC: 0
United States
Message 51513 - Posted: 7 Jul 2014, 16:42:14 UTC - in response to Message 51507.  

I was looking rather at the fft_num and peak_bin values.
A peak_bin of 6.3 million with only 32k in the array... quite a strong sign of error IMO.
The restart-from-checkpoint approach (as we already discussed) will not cure signals already logged. Hence it could not save the task from turning invalid in the end; it would just add time for restarting and reprocessing. Restarting from checkpoint is applicable only when we are sure we have detected the very first attempt to log an invalid signal.

fft_num and peak_bin are locations within the full 32 Mebisample array. fft_num marks the beginning of the particular data chunk; the difference between peak_bin and fft_num is the location within the data chunk. Those differences go 6, 14, 22, 30... for your task 17284447, a pattern with period 8.
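
The offsets can be recomputed from the six log lines posted earlier in the thread. The snippet below is just an illustration; the regular expression assumes the log format shown in those lines.

```python
import re

LOG = """\
Single pulse: peak_power=2.997e+004 dm=-4528 fft_num=6291456 peak_bin=6291462 scale=0
Single pulse: peak_power=3.043e+004 dm=-4528 fft_num=6291456 peak_bin=6291470 scale=0
Single pulse: peak_power=3.027e+004 dm=-4528 fft_num=6291456 peak_bin=6291478 scale=0
Single pulse: peak_power=2.995e+004 dm=-4528 fft_num=6291456 peak_bin=6291486 scale=0
Single pulse: peak_power=2.984e+004 dm=-4528 fft_num=6291456 peak_bin=6291494 scale=0
Single pulse: peak_power=2.968e+004 dm=-4528 fft_num=6291456 peak_bin=6291502 scale=0
"""

# Offset of each peak inside its data chunk: peak_bin - fft_num.
offsets = [
    int(m.group(2)) - int(m.group(1))
    for m in re.finditer(r"fft_num=(\d+) peak_bin=(\d+)", LOG)
]
# Spacing between consecutive peaks.
steps = [b - a for a, b in zip(offsets, offsets[1:])]

print(offsets)  # [6, 14, 22, 30, 38, 46]
print(steps)    # [8, 8, 8, 8, 8]
```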

Restarting from checkpoint absolutely discards any signals which were logged after the checkpoint. Those signals have not been written to disk; starting from checkpoint loads the signal vector from the pulse.out file written at the checkpoint. In this case all 30 single pulses are in a single data chunk at a single dispersion, so a sanity check which could detect the error pattern before the next checkpoint is all that's needed. The restart would begin with no signals.

Making a smart sanity check which could be triggered by having reached the single pulse limit is one possibility; since it would only be run once, it could be quite complex. Determining the parameters it should use is the difficult part.
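
As a rough illustration of what such a once-only check might test, the sketch below flags the pattern seen in this task: the limit is reached, every pulse sits in one data chunk at one dispersion, and the peaks are evenly spaced. The `Pulse` tuple, the function name and the criteria themselves are assumptions made for the example, not the application's actual code or a proposal for the final parameters.

```python
from collections import namedtuple

Pulse = namedtuple("Pulse", "dm fft_num peak_bin")
SINGLE_PULSE_LIMIT = 30  # the limit reported in this thread's task output


def looks_like_runaway(pulses, limit=SINGLE_PULSE_LIMIT):
    """Return True if the pulse list matches the suspicious pattern.

    Criteria (illustrative only): the limit was reached, all pulses share
    one (dm, fft_num) pair, and the peak offsets are evenly spaced.
    """
    if len(pulses) < limit:
        return False
    if len({(p.dm, p.fft_num) for p in pulses}) != 1:
        return False
    offsets = sorted(p.peak_bin - p.fft_num for p in pulses)
    steps = {b - a for a, b in zip(offsets, offsets[1:])}
    return len(steps) == 1 and 0 not in steps


# Synthetic data shaped like the bad task: 30 peaks, 8 samples apart.
bad = [Pulse(-4528, 6291456, 6291456 + 6 + 8 * i) for i in range(30)]
print(looks_like_runaway(bad))
```

Because the function only does real work once the limit has been hit, its cost per task is negligible, which is the point made above.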
                                                                  Joe
ID: 51513 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51519 - Posted: 7 Jul 2014, 19:36:17 UTC
Last modified: 7 Jul 2014, 20:35:46 UTC

(I can't post the image; it changed every time someone added a new screenshot to the service.)
In short, the beta project properties on that host say there is no ATi app, and hence no ATi work is requested :(

How so??
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 51519 · Report as offensive
Matthias Lehmkuhl
Volunteer tester

Joined: 15 Jul 05
Posts: 176
Credit: 1,674,830
RAC: 0
Germany
Message 51523 - Posted: 8 Jul 2014, 8:46:25 UTC - in response to Message 51452.  

I got some errors on one Windows 7 x64 machine with AstroPulse v7 v7.00 (sse2) and AstroPulse v7 v7.00 (sse)
<core_client_version>7.3.19</core_client_version>
CPUID: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
AstroPulse v7 Windows x64 rev 2488, V7 match

Exit status -226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS

BOINC client no longer exists - exiting
timer handler: client dead, exiting
...
all results had a run time of 10 sec.

Edit: boinc is running as service.


Sounds like the app can't communicate with the BOINC client.
Try changing the BOINC version, and ask about this issue on the BOINC forums too.

It's no longer a matter of communicating with the BOINC client. BOINC puts its ProcessID in the init_data.xml file, the application periodically checks whether that ProcessID is still active and exits if not.
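
The idea of the check can be sketched in a few lines. This is not BOINC's API code, only a POSIX illustration of "is that ProcessID still active"; the function name is an assumption, and reading the ProcessID out of init_data.xml is left out because the exact element name is not given in this thread.

```python
import os


def process_exists(pid):
    """Return True if a process with this PID currently exists (POSIX only).

    Signal 0 performs the existence and permission checks without delivering
    a signal. Do not use this on Windows, where os.kill() with an ordinary
    signal number terminates the target process instead.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False          # no such process: the "client" is gone
    except PermissionError:
        return True           # exists, but owned by another user
    return True


# Stand-in for the PID the client would have recorded; here we use ourselves.
client_pid = os.getpid()
print(process_exists(client_pid))
```

An application would run a check like this periodically and exit when it returns False, which is the "client dead, exiting" behavior reported above.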

That checking of course is done by BOINC API code built into the application, and for the issue to get any attention from BOINC developers they would want to know what revision of that API code is being used. However, I agree it might make sense to replace the 7.3.19 alpha with the latest alpha of BOINC just in case the issue has already been fixed.

BOINC running as a service is sort of rare these days, that may be significant.
                                                                  Joe


I've updated to the latest alpha of BOINC (7.4.8) last Friday -> same behavior
got some additional work on that machine with AstroPulse v7 v7.00 (sse + sse2)
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=17297264 <core_client_version>7.4.8</core_client_version>

on the other hand, I could finish results from AstroPulse v7 v7.00 on both BOINC alpha versions, so there must be a difference in the sse + sse2 builds

http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=17246249 <core_client_version>7.3.19</core_client_version>

http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=17271866 <core_client_version>7.4.8</core_client_version>

Now I've gone back to the latest stable BOINC 7.2.42, waiting for sse and sse2 results
Matthias

ID: 51523 · Report as offensive
Conan
Volunteer tester
Joined: 4 Jun 10
Posts: 6
Credit: 526,721
RAC: 0
Australia
Message 51526 - Posted: 8 Jul 2014, 12:09:41 UTC
Last modified: 8 Jul 2014, 12:11:17 UTC

Would it be possible for the CPUs that are SSE2 capable to only get the SSE2 work units?
The non-SSE2 capable work units take over 2 1/2 times as long to run on the same computer.

AMD Phenom XII 64 Bit Linux

AstroPulse v7 v7.00 92,979 seconds

AstroPulse v7 v7.00 SSE2 37,436 seconds

Thanks

Conan
ID: 51526 · Report as offensive
Josef W. Segur
Volunteer tester

Joined: 14 Oct 05
Posts: 1137
Credit: 1,848,733
RAC: 0
United States
Message 51527 - Posted: 8 Jul 2014, 15:01:39 UTC - in response to Message 51526.  

Would it be possible for the CPUs that are SSE2 capable to only get the SSE2 work units?
The non-SSE2 capable work units take over 2 1/2 times as long to run on the same computer.

AMD Phenom XII 64 Bit Linux

AstroPulse v7 v7.00 92,979 seconds

AstroPulse v7 v7.00 SSE2 37,436 seconds

Thanks

Conan

This is Beta testing and all the application versions need to be tested. But the server code favors the faster application so your host should not get many tasks for the slower generic version.
                                                                   Joe
ID: 51527 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51528 - Posted: 8 Jul 2014, 15:41:14 UTC - in response to Message 51519.  

(I can't post the image; it changed every time someone added a new screenshot to the service.)
In short, the beta project properties on that host say there is no ATi app, and hence no ATi work is requested :(

How so??


This host: http://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=39394 can't receive ATi work on beta, because the project properties state that this project has no application for AMD/ATi.
Please, fix! What can I do client-side to solve this issue?
Host receives APv7 CPU work OK.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 51528 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51529 - Posted: 8 Jul 2014, 15:48:45 UTC - in response to Message 51527.  
Last modified: 8 Jul 2014, 15:50:16 UTC

Would it be possible for the CPUs that are SSE2 capable to only get the SSE2 work units?
The non-SSE2 capable work units take over 2 1/2 times as long to run on the same computer.

AMD Phenom XII 64 Bit Linux

AstroPulse v7 v7.00 92,979 seconds

AstroPulse v7 v7.00 SSE2 37,436 seconds

Thanks

Conan

This is Beta testing and all the application versions need to be tested. But the server code favors the faster application so your host should not get many tasks for the slower generic version.
                                                                   Joe


Also, that very mechanism is under testing too. Hence a "true tester" should not abort slower tasks until it is evident that the mechanism has gone wrong (more than a few dozen tasks processed for the slowest app while still unable to collect 10 tasks for the faster app), as, for example, this host does (an example of wrong behavior):
http://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=70383
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 51529 · Report as offensive
Claggy
Volunteer tester

Joined: 29 May 06
Posts: 1037
Credit: 8,440,339
RAC: 0
United Kingdom
Message 51533 - Posted: 8 Jul 2014, 20:06:51 UTC - in response to Message 51528.  
Last modified: 8 Jul 2014, 20:12:04 UTC

(I can't post the image; it changed every time someone added a new screenshot to the service.)
In short, the beta project properties on that host say there is no ATi app, and hence no ATi work is requested :(

How so??


This host: http://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=39394 can't receive ATi work on beta, because the project properties state that this project has no application for AMD/ATi.
Please, fix! What can I do client-side to solve this issue?
Host receives APv7 CPU work OK.

Update to Boinc 7.2.42

[boinc_alpha] Boinc 7.2.18, after removing a device specific app_info, Boinc won't ask for work for other devices.

It now clears the flags on every scheduler RPC; that should suffice.
-- David


The workaround is to remove the <no_rsc_apps>type of device</no_rsc_apps> entry from the client_state.xml
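
For anyone who prefers not to edit the file by hand, the removal can be scripted. This is a minimal sketch under assumptions: the entry is taken to sit on a line of its own, the `strip_no_rsc_apps` helper is hypothetical, and the sample text (including the `ATI` value) is made up for the demonstration. Stop BOINC and keep a backup of client_state.xml before trying anything like this on a real file.

```python
import re


def strip_no_rsc_apps(state_text):
    """Return the text with every <no_rsc_apps>...</no_rsc_apps> line removed."""
    pattern = re.compile(
        r"^[ \t]*<no_rsc_apps>.*?</no_rsc_apps>[ \t]*\r?\n?",
        re.MULTILINE,
    )
    return pattern.sub("", state_text)


# Made-up fragment for illustration; a real client_state.xml is far larger.
sample = (
    "<project>\n"
    "    <project_name>SETI@home Beta Test</project_name>\n"
    "    <no_rsc_apps>ATI</no_rsc_apps>\n"
    "</project>\n"
)
cleaned = strip_no_rsc_apps(sample)
print(cleaned)
```

A plain text substitution is used rather than an XML parser so that the rest of the file is left byte-for-byte unchanged.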

Claggy
ID: 51533 · Report as offensive
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 51534 - Posted: 8 Jul 2014, 21:52:49 UTC - in response to Message 51533.  
Last modified: 8 Jul 2014, 21:54:57 UTC

The workaround is to remove the <no_rsc_apps>type of device</no_rsc_apps> entry from the client_state.xml

Claggy


Thanks, Claggy! Will try.

EDIT: Yes, it asks for work now! Thanks again!
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 51534 · Report as offensive
SusieQ
Volunteer tester

Joined: 12 Nov 10
Posts: 1132
Credit: 31,600,160
RAC: 13,143
United Kingdom
Message 51536 - Posted: 9 Jul 2014, 8:26:23 UTC
Last modified: 9 Jul 2014, 8:28:31 UTC

Hi

Running on a Windows 7 (64-bit) machine - BOINC 7.2.42.

In the Task list an AstroPulse v7.7.00 (sse) work unit lists as 100% Progress and Computation Error and the following message is in the Event Log

09/07/2014 09:20:32 | SETI@home Beta Test | Task ap_10fe09ab_B4_P0_00143_20140707_28726.wu_0 exited with zero status but no 'finished' file
09/07/2014 09:20:32 | SETI@home Beta Test | If this happens repeatedly you may need to reset the project.

It has just started another work unit (sse2) and the time elapsed gets up to 09 seconds and then restarts

SusieQ
ID: 51536 · Report as offensive
Conan
Volunteer tester
Joined: 4 Jun 10
Posts: 6
Credit: 526,721
RAC: 0
Australia
Message 51538 - Posted: 9 Jul 2014, 10:56:16 UTC - in response to Message 51536.  

Hi

Running on a Windows 7 (64-bit) machine - BOINC 7.2.42.

In the Task list an AstroPulse v7.7.00 (sse) work unit lists as 100% Progress and Computation Error and the following message is in the Event Log

09/07/2014 09:20:32 | SETI@home Beta Test | Task ap_10fe09ab_B4_P0_00143_20140707_28726.wu_0 exited with zero status but no 'finished' file
09/07/2014 09:20:32 | SETI@home Beta Test | If this happens repeatedly you may need to reset the project.

It has just started another work unit (sse2) and the time elapsed gets up to 09 seconds and then restarts

SusieQ


In the error log it says the work unit is losing contact with the BOINC Client, saying it "no longer exists and is dead".
The work unit then starts over again.
This will continue until BOINC says "Too Many Exits" and terminates the work unit.

I have no idea why.

Conan
ID: 51538 · Report as offensive
Matthias Lehmkuhl
Volunteer tester

Joined: 15 Jul 05
Posts: 176
Credit: 1,674,830
RAC: 0
Germany
Message 51539 - Posted: 9 Jul 2014, 13:51:27 UTC

I found an invalid AstroPulse v7 v7.00 (opencl_intel_gpu_100) result
http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=6478986
Valid are
AstroPulse v7 v7.00 (opencl_ati_100)
AstroPulse v7 v7.00 (opencl_nvidia_100)
with
state.fold_buf_size_short=65536; state.fold_buf_size_long=262144

single pulses: 2
repetitive pulses: 2
percent blanked: 0.00

invalid is AstroPulse v7 v7.00 (opencl_intel_gpu_100)
state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
Found 30 single pulses and 30 repeating pulses, exiting.
percent blanked: 0.00

How can it be possible to find so many additional pulses on the same WU?

Currently I also get valid results for AstroPulse v7 v7.00 (opencl_intel_gpu_100) on that machine
e.g.: http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=17298162
Matthias

ID: 51539 · Report as offensive
SusieQ
Volunteer tester

Joined: 12 Nov 10
Posts: 1132
Credit: 31,600,160
RAC: 13,143
United Kingdom
Message 51540 - Posted: 9 Jul 2014, 15:05:43 UTC - in response to Message 51538.  

Hi

Running on a Windows 7 (64-bit) machine - BOINC 7.2.42.

In the Task list an AstroPulse v7.7.00 (sse) work unit lists as 100% Progress and Computation Error and the following message is in the Event Log

09/07/2014 09:20:32 | SETI@home Beta Test | Task ap_10fe09ab_B4_P0_00143_20140707_28726.wu_0 exited with zero status but no 'finished' file
09/07/2014 09:20:32 | SETI@home Beta Test | If this happens repeatedly you may need to reset the project.

It has just started another work unit (sse2) and the time elapsed gets up to 09 seconds and then restarts

SusieQ


In the error log it says the work unit is losing contact with the BOINC Client, saying it "no longer exists and is dead".
The work unit then starts over again.
This will continue until BOINC says "Too Many Exits" and terminates the work unit.

I have no idea why.

Conan



I've got a couple of AstroPulse 7.7.00 work units on the go now that have been running for 4 hrs+ and 1 hr+ respectively so it looks as though the error was only on the (sse) and (sse2) work units.

SusieQ
ID: 51540 · Report as offensive
Josef W. Segur
Volunteer tester

Joined: 14 Oct 05
Posts: 1137
Credit: 1,848,733
RAC: 0
United States
Message 51542 - Posted: 9 Jul 2014, 17:26:22 UTC - in response to Message 51539.  

...
invalid is AstroPulse v7 v7.00 (opencl_intel_gpu_100)
state.fold_buf_size_short=65536; state.fold_buf_size_long=262144
Found 30 single pulses and 30 repeating pulses, exiting.
percent blanked: 0.00

How can it be possible to find so many additional pulses on the same WU?
...

The exact "how" generally cannot be pinned down. Could be a software bug, hardware related, or a combination.

If the GPU were being used to play a game or a video, that might be a brief flash of wrong color or brightness someplace on the screen which you might not notice at all. A GPU manufacturer might even know that the hardware can spontaneously do that sometimes, but release it anyhow because it happens rarely enough.

Almost any GPU or CPU will do similar things if it gets too hot, and RAM is also more subject to bit flips if too hot. A PSU can contribute to the problem if it cannot stay well regulated under peak loads.

All in all, doing distributed science processing with consumer grade equipment requires some kind of checking of the results. Redundant processing and validation using cross-checking is the method in use here, and appears to be effective though probably not perfect.
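
A toy version of that cross-checking, applied to the three results reported for this workunit, might look like the following. It is a deliberately simplified sketch: it compares only the pulse counts shown in this thread, whereas a real validator would presumably compare the signals themselves, and the `pick_canonical` helper and the quorum of 2 are assumptions.

```python
from collections import Counter


def pick_canonical(results, quorum=2):
    """Pick the result that at least `quorum` hosts agree on.

    `results` maps an app name to (single_pulses, repetitive_pulses).
    Returns (agreed_result, names_marked_invalid), or (None, []) when no
    result has reached the quorum yet.
    """
    if not results:
        return None, []
    winner, votes = Counter(results.values()).most_common(1)[0]
    if votes < quorum:
        return None, []
    invalid = sorted(name for name, r in results.items() if r != winner)
    return winner, invalid


# Pulse counts as reported for the workunit discussed above.
reported = {
    "opencl_ati_100": (2, 2),
    "opencl_nvidia_100": (2, 2),
    "opencl_intel_gpu_100": (30, 30),
}
canonical, invalid = pick_canonical(reported)
print(canonical, invalid)
```

The two agreeing results form the quorum and the Intel GPU result is the odd one out, which matches what the workunit page showed.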
                                                                  Joe
ID: 51542 · Report as offensive


 
©2020 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.