Large amount of Images, using large bandwidth. Don't open unless you are aware. For Raistmer

Zalster
Volunteer tester

Joined: 30 Dec 13
Posts: 258
Credit: 12,340,341
RAC: 0
United States
Message 57871 - Posted: 17 Apr 2016, 7:07:44 UTC

The purpose of this thread is to show Raistmer what I was seeing while using -cpu_lock on a system with multiple GPUs and a set number of cores.

I created it so that I don't congest his SoG thread with all these images.

I suspect I explained it wrong to him in his thread, and my understanding is probably not correct, but it's hard to explain without seeing it. Hence this thread.

When the total number of work units was greater than the number of physical cores and -cpu_lock was in use, I noticed a very irregular occurrence.

First, how work on the 4 GPUs with 3 work units each should progress without -cpu_lock on an 8-core machine hyperthreaded to 16:



Next is an image of how it progresses normally:



Next are the 4 GPUs running 3 work units each with -cpu_lock in the command line. Notice how 2 of the 3 work units on each GPU have low CPU usage while the 3rd uses almost a full core:




Next we see how those 4 work units with a full core progress faster than the other 8 work units




Below are the links to the stderr of the first 4 work units to finish

https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23626106
https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23625489
https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23626016
https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23625977

Next we see that 4 of the original 8 have moved up the column. Their elapsed time is still the same but the %Progress has returned to 0 and is starting anew. There are also 4 new work units at the bottom of the column of work




The 4 work units that restarted are now utilizing more cores than they were before and rapidly make up the % progress compared to the other work units.



Finally they get near completion. Look at the elapsed time for those 4 work units


Here is an image as 2 complete and 2 of the semi-completed ones now lose all % progress and restart at 0, but time elapsed continues.



Here are the 4 that started over and progressed to completion.

https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23625796
https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23625802
https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23625914
https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23625945

This is the image of the last 4 of the original 12 as they start from scratch again; notice the time elapsed.



Progression; notice how the percentage of CPU is increasing:


Near completion:



Here we have panic mode, as the computer now thinks the work is taking much longer than it normally would:



And the last image shows the last of the original 12 work units finally being completed.




Links to the stderr of the last of the original 12 work units

https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23626079
https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23626105
https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23626102
https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23626195

At this point I suspended all new work and let all the rest just run.

Currently I'm looking at over 1 hr 30 min per work unit.
ID: 57871
Grumpy Swede
Volunteer tester
Joined: 10 Mar 12
Posts: 1700
Credit: 13,216,373
RAC: 0
Sweden
Message 57872 - Posted: 17 Apr 2016, 8:06:08 UTC - in response to Message 57871.  
Last modified: 17 Apr 2016, 8:07:47 UTC

You haven't included the command "-instances_per_device N". Can't find the text "Number of app instances per device set to:N" in your stderrs.

As Raistmer said on main:

"CPUlock will hardly work correctly w/o knowing number of instances per GPU."
ID: 57872
Profile Mike
Volunteer tester
Joined: 16 Jun 05
Posts: 2530
Credit: 1,074,556
RAC: 0
Germany
Message 57873 - Posted: 17 Apr 2016, 8:30:54 UTC

Also try -total_GPU_instances_num N.
In conjunction with -use_sleep it should help when running multiple instances.
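
For example, for 4 GPUs running 3 tasks each (the numbers here are only illustrative for that setup; adjust them to your own):

-cpu_lock -use_sleep -instances_per_device 3 -total_GPU_instances_num 12
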
With each crime and every kindness we birth our future.
ID: 57873
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 57874 - Posted: 17 Apr 2016, 9:03:05 UTC
Last modified: 17 Apr 2016, 9:23:23 UTC

1. Unfortunately, you ignored my request for an AFFINITY distribution screenshot. None of those images show how the CPU is allocated for a particular process, so I'm unable to see whether CPUlock works right or wrong.

2. From one of the stderrs:

Running on device number: 0
Maximum single buffer size set to:512MB
Number of period iterations for PulseFind set to:80
SpikeFind FFT size threshold override set to:8192
TUNE: kernel 1 now has workgroup size of (64,1,4)
oclFFT global radix override set to:256
oclFFT local radix override set to:16
oclFFT max WG size override set to:256
oclFFT max local FFT size override set to:512
oclFFT number of local memory banks set to:64
oclFFT minimal memory coalesce width set to:64
CPU affinity adjustment enabled
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: NVIDIA Corporation
GPUlock enabled. Use -instances_per_device N switch to provide number of instances to run if BOINC is configured to launch few tasks per device.
BOINC assigns device 0, slots 0 to 0 (including) will be checked
WARNING: Can't follow BOINC's proposal, will try to find free device..
Used slot is 0; Info: BOINC provided OpenCL device ID used
Info: CPU affinity mask used: 1; system mask is ffff

All inconsistencies highlighted. As was said a few times before (and as the stderr itself says): if you want to run several instances per GPU, tell the app how many. Without this additional info the app assumes the default, that is, 1 task per GPU, and distributes them according to THAT assumption and not your wishes. A telepathic agent is out of scope for this app ;)

3. So, please, correct the cmd line and try again. And I don't need BOINCTasks pictures, I need Task Manager screenshots with the affinity dialog open (in case the issue remains even after the cmd line correction).

P.S. That "will try to find free device" warning mostly describes the issue you observe. The app doesn't see an available slot for execution (its own slot, governed by a MutEx). So it waits on the mutex until a slot is freed. In that waiting state it consumes ZERO CPU, makes ZERO progress and doesn't even allocate memory buffers for execution. So all the progress, and the resets of that progress, you observed are just BOINC'S OWN ARTEFACTS: BOINC attempts to emulate execution progress.
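
A minimal sketch of that "zero CPU while waiting" behaviour (hypothetical names, not the actual SoG code): a process blocked on a Windows named mutex burns no CPU time and makes no progress until the current owner releases the slot.

#include <windows.h>
#include <cstdio>

int main() {
    // One named mutex per execution slot; another instance may already own it.
    // The slot name is made up for this example.
    HANDLE slot = CreateMutexA(NULL, FALSE, "sog_gpu_slot_0");
    if (!slot) return 1;

    // If the slot is busy, this call blocks at ~0% CPU until the owner releases it;
    // meanwhile BOINC may still simulate "progress" for the task.
    WaitForSingleObject(slot, INFINITE);

    printf("slot acquired, starting GPU work\n");
    /* ... actual processing would happen here ... */

    ReleaseMutex(slot);
    CloseHandle(slot);
    return 0;
}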

P.P.S.
Currently I'm looking at over 1 hr 30 min per work unit.

And yes, I'm thankful for your dedication and for the very rich and thorough description with good illustrations... but please, if you would follow my guidance a little more directly next time, it would both save you effort and speed up the ultimate resolution of the issue.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 57874
Grumpy Swede
Volunteer tester
Joined: 10 Mar 12
Posts: 1700
Credit: 13,216,373
RAC: 0
Sweden
Message 57875 - Posted: 17 Apr 2016, 9:11:54 UTC
Last modified: 17 Apr 2016, 9:14:23 UTC

Hmm, and running SIV, as I see from the pictures, could potentially also be a problem, especially if SIV's own CPU affinity function is ticked on.
ID: 57875
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 57876 - Posted: 17 Apr 2016, 9:20:38 UTC - in response to Message 57875.  
Last modified: 17 Apr 2016, 9:20:55 UTC

Hmm, and running SIV, as I see from the pictures, could potentially also be a problem, especially if SIV's own CPU affinity function is ticked on.

The main issue is plain waiting, with zero CPU consumption, on the mutex, because the number of actually running processes exceeds the number of slots allocated for them. GPUlock in action - its lock part :)
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 57876
Zalster
Volunteer tester

Joined: 30 Dec 13
Posts: 258
Credit: 12,340,341
RAC: 0
United States
Message 57879 - Posted: 17 Apr 2016, 13:24:56 UTC
Last modified: 17 Apr 2016, 13:38:30 UTC

1. Unfortunately, you ignored my request for an AFFINITY distribution screenshot. None of those images show how the CPU is allocated for a particular process, so I'm unable to see whether CPUlock works right or wrong.


It also helps if you tell me how to find such a screenshot.

Unless it's the standard view, I'll have to do some Google searches to try and find it.

I'm not a developer or coder, so I have to guess what you want when you say you want a certain thing....

@Mike

I will try the -total_GPU_instances_num N and see what it does.

I had assumed the app_config would tell it how many to run per device but didn't know it needed to be in the command line rather than the app_config, so I guess that makes app_config useless?

@ Tut

I would have to find where that CPU affinity check box is. SIV runs stock here, since there are like a million settings you can have with it.

But this will have to wait...Work has priority
ID: 57879
Richard Haselgrove
Volunteer tester

Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 57880 - Posted: 17 Apr 2016, 14:23:25 UTC - in response to Message 57879.  

It also helps if you tell me how to find such a screenshot.

He said Task Manager. Ctrl-Alt-Del, it's the last item on the list. Click the Processes tab, right-click the SoG application name, choose 'Set affinity...' and voila:



That's not development, that's knowing how to drive an operating system.

I had assumed the app_config would tell it how many to run per device but didn't know it needed to be in the command line rather than the app_config, so I guess that makes app_config useless?

That's a reasonable assumption, but no. App_config is necessary, because that is what makes BOINC run the application the way you want. You have to tell Raistmer's application the same thing: it would be far better if the application could read the same instructions (or work it out for itself), but it can't. Currently, both instructions are necessary.
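
For reference, a minimal sketch of the app_config.xml half of that, assuming the application name is setiathome_v8 (check client_state.xml for the exact name on your host); a gpu_usage of 0.33 is what tells BOINC to run 3 tasks per GPU:

<app_config>
  <app>
    <name>setiathome_v8</name>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

The matching -instances_per_device 3 still has to be passed to the application itself on its command line.
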
ID: 57880
Zalster
Volunteer tester

Joined: 30 Dec 13
Posts: 258
Credit: 12,340,341
RAC: 0
United States
Message 57881 - Posted: 17 Apr 2016, 15:03:13 UTC - in response to Message 57880.  


That's not development, that's knowing how to drive an operating system.


Sort of like my telling you to go listen to the fetal heart tones on the lady in bed 5 with the ultrasound machine. It's just knowing how to operate the machine.......

right?

You don't use that machine every day and I don't go crawling around the OS of my machines either. I appreciate the help, not the attitude.
ID: 57881
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 57883 - Posted: 17 Apr 2016, 15:38:19 UTC - in response to Message 57880.  

it would be far better if the application could read the same instructions (or work it out for itself), but it can't. Currently, both instructions are necessary.

Yes, it would be simpler and more consistent indeed. Are you aware of a way to query BOINC about the number of tasks?
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 57883
Richard Haselgrove
Volunteer tester

Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 57888 - Posted: 17 Apr 2016, 18:53:59 UTC - in response to Message 57883.  

it would be far better if the application could read the same instructions (or work it out for itself), but it can't. Currently, both instructions are necessary.

Yes, it would be simpler and more consistent indeed. Are you aware of a way to query BOINC about the number of tasks?

From what you've said in the documentation, presumably you mean the number of tasks allowed as a maximum. What tasks would you need to include?

Categories I can think of include:

Your builds with CPU_lock enabled
Other GPU tasks which don't use affinity
Current project, or all projects (even SETI Main + SETI Beta)
Current BOINC instance, or multiple BOINCs on the same host

and probably more. It's not an easy question.
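
For what it's worth, a rough sketch of the closest thing the API currently exposes: newer clients pass the per-task GPU share in APP_INIT_DATA, so an app could at least recover its own instances-per-device figure for the simple single-project, single-client case (assuming the client is recent enough to fill in gpu_usage; the other categories above remain unaccounted for).

#include "boinc_api.h"   // boinc_get_init_data(), APP_INIT_DATA
#include <cmath>

// Guess how many tasks BOINC schedules per GPU for this app version.
static int instances_per_device_guess() {
    APP_INIT_DATA aid;
    boinc_get_init_data(aid);
    if (aid.gpu_usage > 0 && aid.gpu_usage <= 1.0) {
        // gpu_usage is the GPU fraction this task was scheduled with,
        // e.g. 0.33 from app_config.xml -> roughly 3 instances per device.
        return (int)std::floor(1.0 / aid.gpu_usage + 0.5);
    }
    return 1;   // old client or no info: fall back to the 1-per-GPU default
}
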
ID: 57888
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 57896 - Posted: 18 Apr 2016, 8:22:42 UTC - in response to Message 57888.  

It's not an easy question.

That's why I use an additional cmd line param for GPUlock.
But maybe it can be avoided for CPUlock-only configs. I'll look over the code again for that possibility.
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 57896
Zalster
Volunteer tester

Joined: 30 Dec 13
Posts: 258
Credit: 12,340,341
RAC: 0
United States
Message 57909 - Posted: 18 Apr 2016, 17:06:17 UTC - in response to Message 57896.  

Well, adding the number-of-instances switches to the command line does help with -cpu_lock.
It's now running like it's supposed to, and each work unit has affinity for only 1 core.

Currently testing to see how much longer it takes to process the work with the -use_sleep commandline

It's taking longer than it did 2 days ago, not sure if it's because the work units are from a different source and are more complex or if something with the server changed.

By tonight, I should be done testing and can post my findings.
ID: 57909
Profile Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 57913 - Posted: 18 Apr 2016, 18:49:33 UTC - in response to Message 57909.  

It's not necessarily the case that the NV build benefits from -cpu_lock to the same degree as the ATi one does. It depends on the runtime implementation by the vendor.
So -use_sleep should be tested with and w/o -cpu_lock IMHO.
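
Roughly, the idea behind a sleep switch is to replace the runtime's busy-wait on the GPU with poll-and-sleep. A generic OpenCL sketch of that idea (not the actual SoG code; it assumes the command queue has already been flushed):

#include <CL/cl.h>
#include <thread>
#include <chrono>

// Busy-wait flavour: on some vendor runtimes clWaitForEvents()/clFinish()
// spin, eating ~100% of a CPU core while the GPU kernel runs.
//
// Sleep flavour: poll the event status and yield the CPU between polls,
// trading a little latency for a much lower CPU load.
void wait_with_sleep(cl_event ev) {
    cl_int status = CL_QUEUED;
    while (status != CL_COMPLETE) {
        clGetEventInfo(ev, CL_EVENT_COMMAND_EXECUTION_STATUS,
                       sizeof(status), &status, NULL);
        if (status < 0) break;   // a negative status means the command failed
        if (status != CL_COMPLETE)
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}
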
News about SETI opt app releases: https://twitter.com/Raistmer
ID: 57913
Zalster
Volunteer tester

Joined: 30 Dec 13
Posts: 258
Credit: 12,340,341
RAC: 0
United States
Message 57925 - Posted: 18 Apr 2016, 19:57:40 UTC - in response to Message 57913.  

I'm doing that as well.

At least several more hours before I get some results.

Will post tomorrow with what I have

So far tested

commandline with -cpu_lock
commandline with -cpu_lock and -use_sleep
commandline with -no_cpu_lock
commandline with -no_cpu_lock and -use_sleep

Now testing the above with multiple instances of work units per card

1 per card (finished testing)
2 per card (currently testing)
3 per card (needs to be repeated)
ID: 57925
Zalster
Volunteer tester

Joined: 30 Dec 13
Posts: 258
Credit: 12,340,341
RAC: 0
United States
Message 57944 - Posted: 19 Apr 2016, 1:17:07 UTC - in response to Message 57925.  
Last modified: 19 Apr 2016, 1:46:03 UTC

Ok, sorry about all the editing.

So here are my preliminary results

This is based on 4 GPUs in an 8 physical core (hyperthreaded to 16) machine.

As stated above, there are 2 places where you need to specify how many work units you will be running on each GPU. The first is the app_config.xml; the second is the command line itself.

Thanks to Mike for giving me the commands and to Richard for pointing out the 2 different places where the number of work units needed to be specified.


-use_sleep -sbs 512 -total_GPU_instances_num 12 -instances_per_device 3 -period_iterations_num 80 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -cpu_lock

Using the -cpu_lock does cause the work unit to have an affinity for 1 core.
Using the -use_sleep drops the % of each core down to a more reasonable level

The main drawback is the increase in run time when using -use_sleep, but that can be compensated for by running more than 1 work unit per card, even though the time to complete each individual work unit increases.
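
(To put purely hypothetical numbers on that trade-off: if one task running alone finishes in 20 minutes but three running together finish in 45 minutes each, every individual task is slower, yet the card turns out 3 tasks per 45 minutes instead of 3 per 60, so overall throughput still improves.)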

I have not yet tested how having more work units than cores affects the systems. That is the next test.



Edit..

CPU usage went from 97% of a core for each work unit down to 3-4% of a core with -use_sleep

There is some variation in times to complete with -use_sleep. I've noticed a distribution around 2 different times (maybe due to the data?), something not really seen when there is no -use_sleep.

This is a rough collection but gives a good starting point.
ID: 57944
Zalster
Volunteer tester

Joined: 30 Dec 13
Posts: 258
Credit: 12,340,341
RAC: 0
United States
Message 57958 - Posted: 19 Apr 2016, 22:55:16 UTC - in response to Message 57944.  

On the question of core affinity,

I was able to run more instances of work than the number of cores while using the -use_sleep command. (Otherwise, without -use_sleep, the CPU cores max out at 100% and the system stalls.)

Some cores had 2 work units assigned to them.

The work units progressed as normal until complete.

So that completes my testing; I hope this helps some people.
ID: 57958
Zalster
Volunteer tester

Joined: 30 Dec 13
Posts: 258
Credit: 12,340,341
RAC: 0
United States
Message 57959 - Posted: 20 Apr 2016, 0:06:24 UTC - in response to Message 57958.  
Last modified: 20 Apr 2016, 0:08:38 UTC

So I just noticed something....

Looking at the stderr reports, I noticed that the priority was wrong.

-hp was missing from the command lines of all the work units.

So I don't know how much that is going to affect the results.

If anyone copied the command line I posted, make sure you include -hp as well.
ID: 57959
