Tools for analyzing CUDA special app results

Profile Joseph Stateson Project Donor
Volunteer tester
Joined: 27 May 99
Posts: 309
Credit: 70,759,933
RAC: 3
United States
Message 2022107 - Posted: 7 Dec 2019, 14:22:38 UTC
Last modified: 7 Dec 2019, 14:26:09 UTC

Upgraded Petri's "special sauce" to NVIDIA's CUDA 10.2 library. It did not make as much of a difference in elapsed time as the driver change.

I first tried the _autosetup and configure approach, but that was a disaster, so I followed TBar's recommendation of editing the Makefile, which worked.

The change from 10.1 to 10.2 went in at midnight, which was convenient for my statistics program. Only non-erroring results are included in the stats below.

TB85 (5 GTX 1070 or slightly better)
Driver 440 with 10.1 CUDA
Number of selections 9,506
AVG elapsed (minutes) 1.22	00:01:13
STD of elapsed time 0.60

====using 10.2 CUDA same driver===
Number of selections 1,717
AVG elapsed (minutes) 1.34	00:01:20
STD of elapsed time 0.45


My other system had driver 435, so it would not run the 10.2 build, as I did not build the CUDA90 version.
H110BTC (7 GTX 1060)
Driver 435
Number of selections 6,105
AVG elapsed (minutes) 2.27	00:02:16
STD of elapsed time 1.08 

===using 10.2 CUDA and upgrade to 440 driver===
Number of selections 1,577
AVG elapsed (minutes) 2.09	00:02:05
STD of elapsed time 0.81
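
A rough way to check whether differences like these are more than noise, using only the summary figures posted above, is a Welch-style standard error of the difference in means. This is only a sketch: the function name is made up for illustration, and the result says nothing about whether the two samples contained the same mix of work units.

```python
import math

def welch_t(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Return (difference in means, standard error, t statistic) from summary stats."""
    diff = mean_b - mean_a
    se = math.sqrt(std_a**2 / n_a + std_b**2 / n_b)
    return diff, se, diff / se

# TB85: CUDA 10.1 vs CUDA 10.2, elapsed minutes (figures from the stats above)
diff, se, t = welch_t(1.22, 0.60, 9506, 1.34, 0.45, 1717)
print(f"TB85    diff={diff:+.2f} min  se={se:.4f}  t={t:.1f}")

# H110BTC: driver 435 vs CUDA 10.2 with driver 440
diff, se, t = welch_t(2.27, 1.08, 6105, 2.09, 0.81, 1577)
print(f"H110BTC diff={diff:+.2f} min  se={se:.4f}  t={t:.1f}")
```

A large absolute t value only says the two samples differ. It cannot tell a change caused by the CUDA library apart from a change in the kind of work units that happened to arrive.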
ID: 2022107
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2022117 - Posted: 7 Dec 2019, 15:00:04 UTC - in response to Message 2022107.  

Joseph, I would recommend using a collection of 10-20 WUs of varying types and running them through an offline benchmarking app to quantify the effect of system changes.

Different WUs run at different speeds, sometimes by a very large margin. That means the "mix" of WUs in your first set of 9500 WUs can be very different from the set of 1700 WUs you compared it with.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2022117
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2022119 - Posted: 7 Dec 2019, 15:05:26 UTC - in response to Message 2022117.  

IMHO the only way to be sure is to run the same set of WUs with each build by using a test program.
The test set must contain a wide range of WUs with different AR.
ID: 2022119
Profile Joseph Stateson Project Donor
Message 2022125 - Posted: 7 Dec 2019, 15:29:24 UTC - in response to Message 2022119.  
Last modified: 7 Dec 2019, 15:30:33 UTC

I would like to do all of that. I recall there is a way to run the app under BOINC without reporting or uploading; I did that once for debugging purposes.

Can the AR be determined somehow from the file name like the ones below?
16jl09ab.22871.9065.7.34.215_1
blc61_2bit_guppi_58642_05986_HIP54072_0023.6488.0.21.44.69.vlar_0
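
A minimal sketch of what can be read off a task name by itself. It assumes, without confirmation from this thread, that the name carries no numeric angle range, that a trailing ".vlar" tag marks a very low angle range task, and that the exact value is recorded inside the workunit file instead. The function and key names are hypothetical.

```python
import re

def describe_task(name):
    """Return a dict of what the task name alone reveals (no exact AR)."""
    base = re.sub(r"_\d+$", "", name)      # drop the trailing _0 / _1 result suffix
    return {
        "workunit": base,
        "source": "blc" if base.startswith("blc") else "other",
        "vlar_tag": base.endswith(".vlar"),
    }

for task in (
    "16jl09ab.22871.9065.7.34.215_1",
    "blc61_2bit_guppi_58642_05986_HIP54072_0023.6488.0.21.44.69.vlar_0",
):
    print(describe_task(task))
```

This is enough to sort a cache into rough groups for a benchmark set, but not to rank tasks by AR.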


It is convenient to use the BoincTasks "history" feature to obtain runtime data but, unfortunately, its source is not available for a special-purpose mod. However, I can put together a script to run my own "special sauce BOINC client" and iterate through some "benchmark" work units. I will need those units.

I also want to get the _autosetup and configure approach to work; editing the Makefile is a real PITA. In addition, I want to look at a Windows version. I retired when my company switched platforms from Windows to Linux & CORBA. Linux was OK with me, but I had exactly ZERO interest in CORBA.
ID: 2022125
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2022144 - Posted: 7 Dec 2019, 17:16:34 UTC

The best way to compare applications on the same set of work units is with an offline bench program. I think the easiest to use is Rick's benchMT program (RueiKe).
It can be downloaded from GitHub:
https://github.com/Ricks-Lab/benchMT
It is very simple to use, needing only some slight manipulation of the conf file to allow or disallow the applications you want to bench. It already has the standard stock application installed and benched for the standard test WUs. You can add whatever additional WUs you want to bench from the current project mix. It benches both CPU and GPU, and both AP and MB. Just drop your new CUDA 10.2 special app into the Test Applications folder and run benchMT.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2022144
Profile Joseph Stateson Project Donor
Message 2022166 - Posted: 7 Dec 2019, 19:15:46 UTC - in response to Message 2022144.  
Last modified: 7 Dec 2019, 19:41:44 UTC

Ran this but had to fix a problem in the python script. I am not a regex person. My guess is that lspci puts out something different now than it did a year ago. Took a guess and changed
            'lspci | grep -E \
                     \"^.*(VGA|Display).*\[AMD\/ATI\].*$\" | grep -Eo \"^([0-9a-fA-F]+:[0-9a-fA-F]+.[0-9a-fA-F])\"',


to get rid of the AMD ATI stuff. It was returning null from the first grep so nothing went into the second one.

     'lspci | grep -E \
                     \"^.*(VGA|Display).*$\" | grep -Eo \"^([0-9a-fA-F]+:[0-9a-fA-F]+.[0-9a-fA-F])\"',

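The original pattern requires the literal text [AMD/ATI] on the line, which lspci lines for NVIDIA cards would not contain, so that is a likely reason the first grep came back empty. The same filtering can be sketched in Python without the shell pipeline. This is an illustration, not the code benchMT uses; the function name and the sample lspci lines are made up.

```python
import re

# Matches the leading bus:device.function ID, e.g. "01:00.0", on VGA/Display lines.
PCI_ID = re.compile(r"^([0-9a-fA-F]+:[0-9a-fA-F]+\.[0-9a-fA-F])\s.*(VGA|Display)")

def gpu_pci_ids(lspci_output):
    """Return PCI IDs of all VGA/Display devices, regardless of vendor."""
    ids = []
    for line in lspci_output.splitlines():
        match = PCI_ID.match(line)
        if match:
            ids.append(match.group(1))
    return ids

# Illustrative lspci lines (made up for this example, not captured from a real host)
sample = """\
00:00.0 Host bridge: Intel Corporation Device 0000
01:00.0 VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060 6GB]
02:00.0 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller
"""
print(gpu_pci_ids(sample))
```

Unlike the grep version, the dot before the function digit is escaped here, so it only matches a literal ".".
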

Ran the following on a GTX 1060 with 6 GB of memory:

 ./benchMT --boinc_home /usr/bin --max_gpus 1 --gpu_devices 0 --std_signals


Got some results for 10.1 and 10.2:

┌────┬────┬───┬────────────────────────────────────────────────────────────┬────────┬────────┬───────────┬────────┐
│Job#│Slot│xPU│app_name                                                    │  start │ finish │tot_time   │ state  │
│    │    │   │app_args                                                    │wu_name                               │
├────┼────┼───┼────────────────────────────────────────────────────────────┼────────┬────────┬───────────┬────────┤
│0   │ NA │GPU│setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101         │18:53:41│18:53:59│0:00:18.036│COMPLETE│
│    │    │   │ -device 0                                                  │PG0009_v8.wu                          │
├────┼────┼───┼────────────────────────────────────────────────────────────┼────────┬────────┬───────────┬────────┤
│1   │ NA │GPU│setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda102         │18:53:59│18:54:17│0:00:18.020│COMPLETE│
│    │    │   │ -device 0                                                  │PG0009_v8.wu                          │
├────┼────┼───┼────────────────────────────────────────────────────────────┼────────┬────────┬───────────┬────────┤
│2   │ NA │GPU│setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101         │18:54:17│18:54:29│0:00:12.014│COMPLETE│
│    │    │   │ -device 0                                                  │PG0444_v8.wu                          │
├────┼────┼───┼────────────────────────────────────────────────────────────┼────────┬────────┬───────────┬────────┤
│3   │ NA │GPU│setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda102         │18:54:29│18:54:41│0:00:12.014│COMPLETE│
│    │    │   │ -device 0                                                  │PG0444_v8.wu                          │
├────┼────┼───┼────────────────────────────────────────────────────────────┼────────┬────────┬───────────┬────────┤
│4   │ NA │GPU│setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101         │18:54:41│18:54:53│0:00:12.012│COMPLETE│
│    │    │   │ -device 0                                                  │PG1327_v8.wu                          │
├────┼────┼───┼────────────────────────────────────────────────────────────┼────────┬────────┬───────────┬────────┤
│5   │ NA │GPU│setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda102         │18:54:53│18:55:05│0:00:12.013│COMPLETE│
│    │    │   │ -device 0                                                  │PG1327_v8.wu                          │
├────┼────┼───┼────────────────────────────────────────────────────────────┼────────┬────────┬───────────┬────────┤
│6   │ NA │GPU│setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101         │18:55:05│18:55:17│0:00:12.014│COMPLETE│
│    │    │   │ -device 0                                                  │PG0395_v8.wu                          │
├────┼────┼───┼────────────────────────────────────────────────────────────┼────────┬────────┬───────────┬────────┤
│7   │ NA │GPU│setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda102         │18:55:17│18:55:29│0:00:12.014│COMPLETE│
│    │    │   │ -device 0                                                  │PG0395_v8.wu                          │
└────┴────┴───┴────────────────────────────────────────────────────────────┴──────────────────────────────────────┘


Looks like the first work unit did slightly better with 10.2 (18.020 s vs. 18.036 s).
For the remaining ones there is essentially no difference.
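
To make the per-work-unit comparison explicit, the tot_time values from the table can be converted to seconds and subtracted pairwise. A small sketch with the values copied from the table; the helper name is made up.

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS.mmm' string from the tot_time column to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# (work unit, cuda101 tot_time, cuda102 tot_time), copied from the table above
runs = [
    ("PG0009_v8.wu", "0:00:18.036", "0:00:18.020"),
    ("PG0444_v8.wu", "0:00:12.014", "0:00:12.014"),
    ("PG1327_v8.wu", "0:00:12.012", "0:00:12.013"),
    ("PG0395_v8.wu", "0:00:12.014", "0:00:12.014"),
]

for wu, t101, t102 in runs:
    delta = to_seconds(t102) - to_seconds(t101)
    print(f"{wu:14s} 10.1={t101}  10.2={t102}  delta={delta:+.3f} s")
```

All totals sit very close to 12 s or 18 s, so a coarse polling interval in the benchmark may be hiding the real differences. That is a guess; longer work units should make any difference easier to see.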
ID: 2022166
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2022171 - Posted: 7 Dec 2019, 19:35:58 UTC - in response to Message 2022166.  

Try the old 10.1 App from the Old All-In-One with these tasks, http://www.arkayn.us/lunatics/Test_WUs.7z
Those tasks were selected because the results were not as expected.
Then try those same tasks with the New 10.2 App from the All-In-One.
See what you get. I'm not familiar with the Benchmark App you are running, does it compare the Result Values with each other the way the Lunatics App does?
With the Lunatics App the values are compared the same way the SETI Server Validator compares them.
ID: 2022171
Profile Keith Myers Special Project $250 donor
Message 2022174 - Posted: 7 Dec 2019, 19:56:09 UTC - in response to Message 2022166.  

You really ought to grab a handful of current work from your host cache and put it into the WU_Test folder to crunch, so the results are indicative of how the two CUDA versions compare on what we are processing now.
Just copy the WUs and append .wu to the task name of each. Remove --std_signals from your command line to crunch the new WUs.

I use a mix of Arecibo and BLC tasks since they each run very differently on Nvidia hardware because of the angle ranges.
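
The copy-and-rename step can be scripted. A minimal sketch: the function name is made up, and the demo uses throwaway directories with a dummy file, so the real BOINC project directory and benchMT test folder paths would have to be substituted.

```python
import shutil
import tempfile
from pathlib import Path

def stage_test_wus(cache_dir, test_dir, task_names):
    """Copy the named tasks from cache_dir into test_dir, appending '.wu'."""
    cache, dest = Path(cache_dir), Path(test_dir)
    dest.mkdir(parents=True, exist_ok=True)
    staged = []
    for name in task_names:
        target = dest / (name + ".wu")
        shutil.copy2(cache / name, target)   # copy, so BOINC's own file is untouched
        staged.append(target)
    return staged

# Self-contained demo with throwaway directories and a dummy task file.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache"
    cache.mkdir()
    (cache / "example_task.vlar").write_text("dummy")
    for path in stage_test_wus(cache, Path(tmp) / "test_wus", ["example_task.vlar"]):
        print(path.name)
```

Copying instead of moving leaves the host cache intact, so the same tasks can still be crunched and reported normally.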
ID: 2022174
Profile Keith Myers Special Project $250 donor
Message 2022175 - Posted: 7 Dec 2019, 19:58:20 UTC - in response to Message 2022171.  

Yes, it uses the rescmpv5_l program to compare against the standard results of the stock apps. Unfortunately, it can't compare AP tasks.
ID: 2022175
TBar
Message 2022178 - Posted: 7 Dec 2019, 20:05:07 UTC

But can it compare the two GPU Apps against a CPU App the way the Lunatics App does?
You really need to compare Both GPU Apps against a CPU App to see which GPU App is correct.
ID: 2022178
Profile Joseph Stateson Project Donor
Message 2022186 - Posted: 7 Dec 2019, 20:26:21 UTC - in response to Message 2022178.  
Last modified: 7 Dec 2019, 20:44:04 UTC

Am I running CPU apps? Maybe that is why nothing is happening when I put those files into the data directory.

[edit] Figured it out. The benchmark is running the 5 CPU programs and will then run the 5 GPU ones.
Five CPU work units means a very big coffee break for me. At least I have 4 cores * 2 threads and not a Celeron.
ID: 2022186
TBar
Message 2022191 - Posted: 7 Dec 2019, 20:37:35 UTC - in response to Message 2022186.  
Last modified: 7 Dec 2019, 20:50:06 UTC

Apparently the Benchmark Apps are different. In the Lunatics version it is common to place the Test App(s) into APPS and then a CPU App into REF_APPs. Although you can place any App in the REF_APPs folder, you would generally test a GPU App against a CPU App. On a Mac it looks like this:
TBarsMacPro:~ Tom$ cd /Users/Tom/KWSN-OSX-bench-MB_v2.1.07 
TBarsMacPro:KWSN-OSX-bench-MB_v2.1.07 Tom$ ./benchmark
KWSN-Darwin-MBbench v2.1.07
Running on TBarsMacPro.local at Sat Dec 7 20:12:04 2019
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
blc14_2bit_guppi_58691_83520_HIP79781_0103.15702.0.21.44.152.vlar.wu blc14_2bit_guppi_58691_83520_HIP79781_0103.25370.0.22.45.105.vlar.wu blc14_2bit_guppi_58691_83520_HIP79781_0103.8969.0.22.45.117.vlar.wu blc14_2bit_guppi_58692_02937_HIP79792_0121.10280.409.21.44.21.vlar.wu blc64_2bit_guppi_58642_02075_3C295_0008.17286.0.22.45.27.vlar.wu

Listing executable(s) in /APPS :
setiathome_x41p_v0.98b1_x86_64-apple-darwin_cuda101 setiathome_x41p_v0.98b1_x86_64-apple-darwin_cuda102

Listing executable in /REF_APPs :
MBv8_8.22r3605_avx2_x86_64-apple-darwin
---------------------------------------------------
Current WU: blc14_2bit_guppi_58691_83520_HIP79781_0103.15702.0.21.44.152.vlar.wu
---------------------------------------------------
Skipping default app MBv8_8.22r3605_avx2_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 2287 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_v0.98b1_x86_64-apple-darwin_cuda101 -nobs -device 0
      304.84 real       272.34 user        23.27 sys
Elapsed Time : ……………………………… 305 seconds
Speed compared to default : 749 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      5      5      5      0        0      5      5      5      0
     Autocorr      0      0      0      0      0        0      0      0      0      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0      8      8      8      2        0      8      8      8      0
      Triplet      0      4      4      4      0        0      4      4      4      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      1      1      1      1      0        1      1      1      1      0
   Best Pulse      0      0      0      0      1        0      0      0      0      1
 Best Triplet      0      1      1      1      0        0      1      1      1      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   1     21     21     21      3        1     21     21     21      1

Unmatched signal(s) in R1 at line(s) 511 559 815
Unmatched signal(s) in R2 at line(s) 753
For R1:R2 matched signals only, Q= 99.27%
Result      : Weakly similar.
---------------------------------------------------
Running app with command : setiathome_x41p_v0.98b1_x86_64-apple-darwin_cuda102 -nobs -device 0
      308.15 real       277.09 user        22.87 sys
Elapsed Time : ……………………………… 308 seconds
Speed compared to default : 742 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.27%
---------------------------------------------------
Done with blc14_2bit_guppi_58691_83520_HIP79781_0103.15702.0.21.44.152.vlar.wu.
Current WU: blc14_2bit_guppi_58691_83520_HIP79781_0103.25370.0.22.45.105.vlar.wu
---------------------------------------------------
Skipping default app MBv8_8.22r3605_avx2_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 2331 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_v0.98b1_x86_64-apple-darwin_cuda101 -nobs -device 0
      305.54 real       275.85 user        22.95 sys
Elapsed Time : ……………………………… 306 seconds
Speed compared to default : 761 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      3      3      3      0        0      3      3      3      0
     Autocorr      0      0      0      0      0        0      0      0      0      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0      4      4      4      2        0      4      4      4      0
      Triplet      0      4      4      4      0        0      4      4      4      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      1      1      1      1      0        1      1      1      1      0
   Best Pulse      0      0      0      0      1        0      0      0      0      1
 Best Triplet      0      1      1      1      0        0      1      1      1      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   1     15     15     15      3        1     15     15     15      1

Unmatched signal(s) in R1 at line(s) 501 532 676
Unmatched signal(s) in R2 at line(s) 614
For R1:R2 matched signals only, Q= 99.93%
Result      : Weakly similar.
---------------------------------------------------
Running app with command : setiathome_x41p_v0.98b1_x86_64-apple-darwin_cuda102 -nobs -device 0
      309.23 real       279.33 user        22.52 sys
Elapsed Time : ……………………………… 310 seconds
Speed compared to default : 751 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.76%
---------------------------------------------------
Done with blc14_2bit_guppi_58691_83520_HIP79781_0103.25370.0.22.45.105.vlar.wu.
Current WU: blc14_2bit_guppi_58691_83520_HIP79781_0103.8969.0.22.45.117.vlar.wu
---------------------------------------------------
Skipping default app MBv8_8.22r3605_avx2_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 2309 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_v0.98b1_x86_64-apple-darwin_cuda101 -nobs -device 0
      303.91 real       277.18 user        21.30 sys
Elapsed Time : ……………………………… 304 seconds
Speed compared to default : 759 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      3      3      3      0        0      3      3      3      0
     Autocorr      0      0      0      0      0        0      0      0      0      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0     11     11     11      1        0     11     11     11      0
      Triplet      0      0      0      0      0        0      0      0      0      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      1      1      1      1      0        1      1      1      1      0
   Best Pulse      0      0      0      0      1        0      0      0      0      1
 Best Triplet      0      0      0      0      0        0      0      0      0      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   1     17     17     17      2        1     17     17     17      1

Unmatched signal(s) in R1 at line(s) 536 746
Unmatched signal(s) in R2 at line(s) 715
For R1:R2 matched signals only, Q= 99.37%
Result      : Weakly similar.
---------------------------------------------------
Running app with command : setiathome_x41p_v0.98b1_x86_64-apple-darwin_cuda102 -nobs -device 0
      306.61 real       278.21 user        21.98 sys
Elapsed Time : ……………………………… 306 seconds
Speed compared to default : 754 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.37%
---------------------------------------------------
Done with blc14_2bit_guppi_58691_83520_HIP79781_0103.8969.0.22.45.117.vlar.wu.
Current WU: blc14_2bit_guppi_58692_02937_HIP79792_0121.10280.409.21.44.21.vlar.wu
---------------------------------------------------
Skipping default app MBv8_8.22r3605_avx2_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 1794 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_v0.98b1_x86_64-apple-darwin_cuda101 -nobs -device 0
      223.55 real       201.49 user        16.82 sys
Elapsed Time : ……………………………… 223 seconds
Speed compared to default : 804 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      0      0      0      0        0      0      0      0      0
     Autocorr      0      1      1      1      0        0      1      1      1      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0      4      4      4      1        0      4      4      4      0
      Triplet      0      1      1      1      0        0      1      1      1      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      1      1      1      1      0        1      1      1      1      0
   Best Pulse      0      1      1      1      0        0      1      1      1      0
 Best Triplet      0      1      1      1      0        0      1      1      1      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   1     11     11     11      1        1     11     11     11      0

Unmatched signal(s) in R1 at line(s) 370
For R1:R2 matched signals only, Q= 99.75%
Result      : Weakly similar.
---------------------------------------------------
Running app with command : setiathome_x41p_v0.98b1_x86_64-apple-darwin_cuda102 -nobs -device 0
Although this is a Mac, I have seen similar results in Linux. Just interested in what you get using those WUs I posted.
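
The "Speed compared to default" figures in the log are consistent with the reference time divided by the test time, times 100, truncated to a whole number, using the whole-second elapsed times. That is inferred from the numbers shown, not taken from the benchmark script's source; the function name is made up.

```python
def speed_vs_default(ref_seconds, test_seconds):
    """Percent speed relative to the reference app, truncated to a whole number."""
    return (100 * ref_seconds) // test_seconds

# (reference CPU seconds, GPU app seconds, percentage printed in the log above)
logged = [
    (2287, 305, 749), (2287, 308, 742),
    (2331, 306, 761), (2331, 310, 751),
    (2309, 304, 759), (2309, 306, 754),
    (1794, 223, 804),
]
for ref, test, shown in logged:
    print(ref, test, speed_vs_default(ref, test), shown)
```

On these work units the two GPU builds differ by roughly 1 % in speed, so the result comparison (Weakly similar vs. Strongly similar) is the more interesting difference.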
ID: 2022191
Profile Keith Myers Special Project $250 donor
Message 2022193 - Posted: 7 Dec 2019, 20:45:46 UTC - in response to Message 2022178.  

It does if you run the task against the standard CPU app, or you can just use the already-computed standard signals. You configure the bench to run the standard CPU app and your two test GPU apps, and they all get compared to each other with the standard comparison app outputs. My conf file looks like this:

#############################################################################
## Blank lines and any part of a line beginning with # are ignored
#############################################################################
##
## List of applications with desired arguments.
##
## Format as would be used for executing the application from a
## command line, app -arg -arg etc. Multiple instances of the
## same app with different (or same) arguments will run that
## many if the app is in the APPS_[C,G]PU directories, although
## the --num_repetitions argument is the preferred way of running
## an entry more than once.
##
## Needs the full application name with extension. Zero to many
## arguments are possible. The -device N option of a GPU app will
## be ignored, as this command line option is used to manage slot
## assignment.  Specifying physical GPUs can be accomplished with the
## --max_gpus X and --gpu_devices 0,1 options. The value X must be 
## equal to the number of devices specified.
##
##
##############################################################################
## Set benchMT command line options
##############################################################################
## 
## Command line options can be specified as modes in the BenchCFG file or an
## alternate CFG file specified on the command line.  Options specified on
## the command line will override those specified with mode in a CFG file.
## 
##
#Don't ask confirmation before running jobs
mode yes False
#
#Specify name of this run
mode run_name Petri test
#
#Specify path for BOINC
mode boinc_home /home/keith/Desktop/BOINC/
#
#Do not suspend BOINC
#mode noBS False
#
#Display compact run status
#mode display_compact False
#
#Display run status by slots instead of jobs
#mode display_slots False
#
#Specify number of times to run benchmark
mode num_repetitions 1
#
#Specify max number of threads to load
#mode max_threads 8
#
#Specify max number of GPUs to load
mode max_gpus 1
#
#Specify GPU devices to use
mode gpu_devices 0
#
#Specify GPU mapping between boinc device # and driver card #, required for energy option
mode devmap 0:1
#
#Specify Energy mode
#mode energy True
#
#Specify AstroPulse mode
#mode astropulse True
#
#Use standard signal WUs instead of Test WUs
#mode std_signals True
##
##
##############################################################################
## Entries to define benchmark run
##############################################################################
##
#setiathome_8.22_x86_64-pc-linux-gnu__opencl_nvidia_SoG -sbs 1024 -period_iterations_num 1 -tt 1500 -high_perf -high_prec_timer -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#setiathome_x41p_V0.97b2_x86_64-pc-linux-gnu_cuda100 -nobs
#setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101 -nobs
#setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101 -nobs -unroll 2
#astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100 -unroll 20 -oclFFT_plan 256 16 256 -ffa_block 2304 -ffa_block_fetch 1152 -tune 1 64 8 1 -tune 2 64 8 1



#ap_7.05r2728_sse3_linux64 --nographics
#MBv8_8.04r3306_sse2_linux64 --nographics
#MBv8_8.04r3306_sse41_linux64 --nographics
#MBv8_8.04r3306_ssse3_linux64 --nographics
MBv8_8.22r3711_sse41_x86_64-pc-linux-gnu --nographics
#MBv8_8.22r3712_avx2_x86_64-pc-linux-gnu --nographics
#MBv8_8.04r3306_sse42_linux64 --nographics
MBv8_8.05r3345_avx_linux64 --nographics
#MBv8_8.04r3306_sse3_linux64 --nographics
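
To see which entries are active in a file like this, the rules stated in its header (blank lines and anything after # are ignored, "mode" lines set options, other lines are application command lines) can be sketched as a small parser. This is an illustration of the format only, not benchMT's actual parsing code; the function name is made up and the sample reuses a few lines from the file above.

```python
def parse_bench_cfg(text):
    """Split config text into ({mode: value}, [app command lines])."""
    modes, apps = {}, []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        if line.startswith("mode "):
            parts = line.split(None, 2)        # 'mode', name, rest of line
            modes[parts[1]] = parts[2] if len(parts) > 2 else ""
        else:
            apps.append(line)
    return modes, apps

sample = """\
#Specify name of this run
mode run_name Petri test
mode max_gpus 1
#mode std_signals True
#setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101 -nobs
MBv8_8.22r3711_sse41_x86_64-pc-linux-gnu --nographics
"""
modes, apps = parse_bench_cfg(sample)
print(modes)
print(apps)
```

Read this way, the file as posted has only the two MBv8 CPU apps uncommented, and the GPU app lines would need their leading # removed to be benched.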

ID: 2022193
Profile Keith Myers Special Project $250 donor
Message 2022196 - Posted: 7 Dec 2019, 20:54:35 UTC - in response to Message 2022186.  

You can change the number of iterations of each app and just run a single cpu app or single gpu app for comparison. Just comment out the applications you don't want to run and leave the apps you want to run uncommented. If you use the already computed standard cpu results, you don't have to crunch them again and the benchmark finishes quickly for the gpu runs.
ID: 2022196