Message boards : SETI@home Enhanced : Large amount of Images, using large bandwidth. Don't open unless you are aware. For Raistmer
Joined: 30 Dec 13 Posts: 258 Credit: 12,340,341 RAC: 0
The purpose of this thread is to show Raistmer what I was seeing while using -cpu_lock on a system with multiple GPUs and a set number of cores. I created it so that I don't congest his SoG thread with all these images. I expect I explained it wrong in his thread, and my understanding is probably not correct, but it's hard to explain without seeing it. Hence this thread.

When the total number of work units was greater than the number of physical cores and -cpu_lock was in use, I noticed a very irregular occurrence.

First, how work on the 4 GPUs with 3 work units each should progress without -cpu_lock on an 8-core machine hyperthreaded to 16:

![]()

Next is an image of how it progresses normally:

![]()

Next is 4 GPUs running 3 work units each with -cpu_lock in the command line. Notice how 2 of the 3 work units on each GPU have low CPU usage while the 3rd uses almost a full core:

![]()

Next we see how those 4 work units with a full core progress faster than the other 8 work units:

![]()

Below are the links to the stderr of the first 4 work units to finish:

https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23626106
https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23625489
https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23626016
https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23625977

Next we see that 4 of the original 8 have moved up the column. Their elapsed time is still the same, but the %Progress has returned to 0 and is starting anew. There are also 4 new work units at the bottom of the column of work:

![]()

The 4 work units that restarted are now utilizing more cores than they were before and rapidly make up the %progress compared to the other work units:

![]()

Finally they get near completion. Look at the elapsed time for those 4 work units:

![]()

Here is an image as 2 complete and 2 of the semi-completed ones now lose all %progress and restart at 0, but time elapsed continues:

![]()

Here are the 4 that started over and progressed to finished:

https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23625796
https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23625802
https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23625914
https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23625945

This is the image of the last 4 of the original 12 as they start from scratch again; notice the time elapsed:

![]()

Progression; notice how the percentage of CPU is increasing:

![]()

Near completion:

![]()

Here we have a panic mode, as the computer now thinks the work is taking much longer than it normally would:

![]()

And the last image is as the last of the original 12 work units are finally completed:

![]()

Links to the stderr of the last of the original 12 work units:

https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23626079
https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23626105
https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23626102
https://setiweb.ssl.berkeley.edu/beta//result.php?resultid=23626195

At this point I suspended all new work and let the rest just run. Currently I'm looking at over 1 hr 30 min per work unit.
Joined: 10 Mar 12 Posts: 1700 Credit: 13,216,373 RAC: 0
You haven't included the command "-instances_per_device N". I can't find the text "Number of app instances per device set to:N" in your stderrs. As Raistmer said on main: "CPUlock will hardly work correctly w/o knowing number of instances per GPU."
Joined: 16 Jun 05 Posts: 2530 Credit: 1,074,556 RAC: 0
Also try -total_GPU_instances_num N. In conjunction with -use_sleep it should help when running multiple instances.

With each crime and every kindness we birth our future.
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
1. Unfortunately, you ignored my request for an AFFINITY distribution screenshot. All those images don't show how CPU is allocated for a particular process, so I'm unable to see whether CPUlock works right or wrong.

2. From one of the stderrs:

Running on device number: 0
Maximum single buffer size set to:512MB
Number of period iterations for PulseFind set to:80
SpikeFind FFT size threshold override set to:8192
TUNE: kernel 1 now has workgroup size of (64,1,4)
oclFFT global radix override set to:256
oclFFT local radix override set to:16
oclFFT max WG size override set to:256
oclFFT max local FFT size override set to:512
oclFFT number of local memory banks set to:64
oclFFT minimal memory coalesce width set to:64
CPU affinity adjustment enabled
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: NVIDIA Corporation
GPUlock enabled. Use -instances_per_device N switch to provide number of instances to run if BOINC is configured to launch few tasks per device.
BOINC assigns device 0, slots 0 to 0 (including) will be checked
WARNING: Can't follow BOINC's proposal, will try to find free device..
Used slot is 0;
Info: BOINC provided OpenCL device ID used
Info: CPU affinity mask used: 1; system mask is ffff

All inconsistencies highlighted. As was said a few times before (and as the stderr itself says!): if you want to run a few instances per GPU, tell the app how many. Without this additional info the app assumes the default, that is, 1 task per GPU, and distributes tasks according to THAT assumption, not your wishes. A telepathic agent is out of scope of this app ;)

3. So, please, correct the cmd line and try again. And I don't need BOINCTasks pictures, I need Task Manager screenshots with the affinity dialog open (in case the issue remains even after the cmd line correction).

P.S. That "will try to find free device" warning mostly describes the issue you observe. The app doesn't see an available slot for execution (its own slot, governed by a MutEx). So it waits on the mutex until a slot is freed. In that waiting state it consumes ZERO CPU, makes ZERO progress and doesn't even allocate memory buffers for execution. So all the progress, and the reset of that progress, you observed are just BOINC's OWN ARTEFACTS: BOINC attempts to emulate execution progress.

P.P.S. "Currently I'm looking at over 1 hr 30 min per work unit." And yes, I'm thankful for your dedication and the very rich and thorough description with good illustrations... but please, if you would follow my guidance a little more directly next time, it would both save your efforts and speed up the ultimate solution of the issue.

News about SETI opt app releases: https://twitter.com/Raistmer
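For readers unfamiliar with the slot mechanism described above, here is a minimal sketch in C of how such a per-GPU mutex lock behaves. This is an illustration of the behaviour Raistmer describes, not the app's actual code; the mutex names and slot layout are assumptions.

```c
/* Illustration only: the app advertises N execution slots per GPU
 * (-instances_per_device N, default 1) and guards each slot with a named
 * mutex. An instance that finds no free slot simply waits, consuming ~0% CPU
 * and making no real progress. Names and layout here are hypothetical.
 */
#include <windows.h>
#include <stdio.h>

/* Try to acquire one of `slots` per-GPU slot mutexes; NULL if all are busy. */
static HANDLE acquire_slot(int device, int slots, int *slot_out)
{
    char name[64];
    for (int s = 0; s < slots; ++s) {
        snprintf(name, sizeof(name), "Local\\sog_gpu%d_slot%d", device, s);
        HANDLE h = CreateMutexA(NULL, FALSE, name);
        if (h && WaitForSingleObject(h, 0) == WAIT_OBJECT_0) {
            *slot_out = s;
            return h;               /* slot is ours until ReleaseMutex() */
        }
        if (h) CloseHandle(h);      /* slot busy: try the next one */
    }
    return NULL;
}

int main(void)
{
    int slot = -1;
    HANDLE h = acquire_slot(0, 1, &slot);   /* default: 1 instance per device */
    while (!h) {                            /* all slots busy: wait at ~0% CPU */
        Sleep(1000);
        h = acquire_slot(0, 1, &slot);
    }
    printf("Used slot is %d;\n", slot);
    /* ... feed the GPU ... */
    ReleaseMutex(h);
    CloseHandle(h);
    return 0;
}
```

With only the default single slot advertised but BOINC launching three tasks per GPU, two of the three instances sit in that waiting state, which matches the "low CPU usage" tasks in the screenshots.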
Joined: 10 Mar 12 Posts: 1700 Credit: 13,216,373 RAC: 0
Hmm, and running SIV, as I see from the pictures, could potentially also be a problem, especially if SIV's own CPU affinity function is ticked on. |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
Hmm, and running SIV, as I see from the pictures, could potentially also be a problem, especially if SIV's own CPU affinity function is ticked on.

The main issue is plain waiting on the mutex with zero CPU consumption, because the number of actually running processes exceeds the number of slots allocated for them. GPUlock in action - its lock part :)

News about SETI opt app releases: https://twitter.com/Raistmer
Joined: 30 Dec 13 Posts: 258 Credit: 12,340,341 RAC: 0
1. Unfortunately, you ignored my request for an AFFINITY distribution screenshot. All those images don't show how CPU is allocated for a particular process, so I'm unable to see whether CPUlock works right or wrong.

It also helps if you tell me how to find such a screenshot. Unless it's the standard view, I'll have to do some Google searches to try and find it. I'm not a developer or a coder, so I have to guess what you want when you say you want a certain thing....

@Mike
I will try -total_GPU_instances_num N and see what it does. I had assumed the app_config would tell it how many to run per device; I didn't know it needed to be in the command line rather than the app_config, so I guess that makes app_config useless?

@Tut
I would have to find where that CPU affinity check box is. SIV runs stock here, since there are about a million settings you can have with it.

But this will have to wait... Work has priority.
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0
It also helps if you tell me how to find such a screenshot.

He said Task Manager. Ctrl-Alt-Del, it's the last item on the list. Click the Processes tab, right-click the SoG application name, choose 'Affinity...' and voila:

![]()

That's not development, that's knowing how to drive an operating system.

I had assumed the app_config would tell it how many to run per device; I didn't know it needed to be in the command line rather than the app_config, so I guess that makes app_config useless?

That's a reasonable assumption, but no. App_config is necessary, because that is what makes BOINC run the application the way you want. You have to tell Raistmer's application the same thing: it would be far better if the application could read the same instructions (or work it out for itself), but it can't. Currently, both instructions are necessary.
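For reference, that affinity dialog is simply a view of the process affinity bitmask, the same thing the stderr line "CPU affinity mask used: 1; system mask is ffff" reports. A minimal sketch of reading and setting it with plain Win32 calls (not the app's code) looks like this:

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    DWORD_PTR proc_mask = 0, sys_mask = 0;

    /* One bit per logical CPU: mask 1 = pinned to CPU 0, ffff = all 16 CPUs. */
    if (GetProcessAffinityMask(GetCurrentProcess(), &proc_mask, &sys_mask))
        printf("CPU affinity mask used: %llx; system mask is %llx\n",
               (unsigned long long)proc_mask, (unsigned long long)sys_mask);

    /* Pin the process to a single core, the kind of thing -cpu_lock arranges. */
    SetProcessAffinityMask(GetCurrentProcess(), 1);
    return 0;
}
```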
Joined: 30 Dec 13 Posts: 258 Credit: 12,340,341 RAC: 0
Sort of like my telling you to go listen to the fetal heart tones on the lady in bed 5 with the ultrasound machine. It's just knowing how to operate the machine....... right? You don't use that machine every day and I don't go crawling around the OS of my machines either. I appreciate the help, not the attitude. |
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
it would be far better if the application could read the same instructions (or work it out for itself), but it can't. Currently, both instructions are necessary.

Yes, it would be simpler and more consistent indeed. Are you aware of a way to query BOINC about the number of tasks?

News about SETI opt app releases: https://twitter.com/Raistmer
Joined: 3 Jan 07 Posts: 1451 Credit: 3,272,268 RAC: 0
it would be far better if the application could read the same instructions (or work it out for itself), but it can't. Currently, both instructions are necessary.

From what you've said in the documentation, presumably you mean the maximum number of tasks allowed. What tasks would you need to include? Categories I can think of:

Your builds with CPU_lock enabled
Other GPU tasks which don't use affinity
Current project, or all projects (even SETI Main + SETI Beta)
Current BOINC instance, or multiple BOINCs on the same host

and probably more. It's not an easy question.
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
It's not an easy question.

That's why I use an additional cmd line param for GPUlock. But maybe it can be avoided for CPUlock-only configs. I'll re-check the code for that possibility.

News about SETI opt app releases: https://twitter.com/Raistmer
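One conceivable route, sketched below purely as an illustration of "working it out for itself": read the share of a GPU that BOINC assigns to the task from init_data.xml in the slot directory, and derive instances per device as roughly 1/gpu_usage. This assumes the client version in use actually writes a <gpu_usage> element there (recent 7.x clients do); the element name, fallback value and rounding are assumptions, not a statement of what Raistmer's app will do.

```c
#include <stdio.h>
#include <math.h>

/* Hedged sketch: derive "instances per device" from init_data.xml instead of
 * a command-line switch. Assumes a <gpu_usage> element is present (e.g. 0.33
 * when BOINC runs 3 tasks per GPU); falls back to 1 otherwise.
 */
static int instances_from_init_data(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f) return 1;                       /* no file: assume the default */
    char line[512];
    double gpu_usage = 0.0;
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, " <gpu_usage>%lf", &gpu_usage) == 1)
            break;
    }
    fclose(f);
    if (gpu_usage <= 0.0 || gpu_usage > 1.0) return 1;
    return (int)floor(1.0 / gpu_usage + 0.5);   /* 0.33 -> 3 instances */
}

int main(void)
{
    printf("Number of app instances per device set to:%d\n",
           instances_from_init_data("init_data.xml"));
    return 0;
}
```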
Joined: 30 Dec 13 Posts: 258 Credit: 12,340,341 RAC: 0
Well, adding the lines for the number of instances to the command line helps with -cpu_lock. It's now running like it's supposed to, and each work unit has affinity for only 1 core.

Currently testing to see how much longer it takes to process the work with the -use_sleep command line. It's taking longer than it did 2 days ago; not sure if it's because the work units are from a different source and are more complex, or if something on the server changed.

By tonight I should be done testing and can post my findings.
Joined: 18 Aug 05 Posts: 2423 Credit: 15,878,738 RAC: 0
It's not necessarily the case that the NV build benefits from -cpu_lock to the same degree as the ATi one does. It depends on the runtime implementation by the vendor. So -use_sleep should be tested with and w/o -cpu_lock IMHO.

News about SETI opt app releases: https://twitter.com/Raistmer
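For context, the effect -use_sleep aims at can be pictured as replacing the runtime's busy-wait on kernel completion with a polling loop that sleeps between checks. The sketch below is only an illustration of that idea; the real implementation inside the app and the vendor runtime may differ.

```c
#include <windows.h>
#include <CL/cl.h>

/* Wait for an OpenCL event without spinning: poll its status and Sleep(1)
 * between checks, so the waiting thread costs a few percent of a core
 * instead of a whole one. Used in place of a busy clWaitForEvents()/clFinish().
 */
static cl_int wait_with_sleep(cl_event ev)
{
    cl_int status = CL_QUEUED;
    for (;;) {
        cl_int err = clGetEventInfo(ev, CL_EVENT_COMMAND_EXECUTION_STATUS,
                                    sizeof(status), &status, NULL);
        if (err != CL_SUCCESS) return err;
        if (status == CL_COMPLETE) return CL_SUCCESS;
        if (status < 0) return status;   /* negative status: event terminated abnormally */
        Sleep(1);                        /* kernel still running: yield the core */
    }
}
```

The trade-off the tests above show (lower CPU use, longer elapsed time per task) follows directly from this: the sleeping thread reacts to kernel completion a little later than a spinning one would.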
Joined: 30 Dec 13 Posts: 258 Credit: 12,340,341 RAC: 0
I'm doing that as well. At least several more hours before I get some results. Will post tomorrow with what I have.

So far tested:
commandline with -cpu_lock
commandline with -cpu_lock and -use_sleep
commandline with -no_cpu_lock
commandline with -no_cpu_lock and -use_sleep

Now testing the above with multiple instances of work units per card:
1 per card (finished testing)
2 per card (currently testing)
3 per card (needs to be repeated)
Joined: 30 Dec 13 Posts: 258 Credit: 12,340,341 RAC: 0
Ok, sorry about all the editing. So here are my preliminary results.

This is based on 4 GPUs in an 8-physical-core (hyperthreaded to 16) machine.

As stated above, there are 2 places where you specify how many work units you will be running on each GPU. The first is the app_config.xml; the second is the command line itself (see the app_config sketch after this post). Thanks to Mike for giving me the commands and to Richard for pointing out the 2 different places where the number of work units needed to be set.

-use_sleep -sbs 512 -total_GPU_instances_num 12 -instances_per_device 3 -period_iterations_num 80 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -cpu_lock

Using -cpu_lock does cause each work unit to have an affinity for 1 core. Using -use_sleep drops the % of each core down to a more reasonable level. The main drawback is the increase in time when using -use_sleep, but that can be compensated for by running more than 1 work unit per card, although the time to complete each individual work unit increases. I have not yet tested how having more work units than cores affects the system. That is the next test.

![]()

Edit: CPU usage went from 97% of a core for each work unit down to 3-4% of a core with -use_sleep. There is some variation in times to complete with -use_sleep; I've noticed a distribution at 2 different times (maybe due to data?) that isn't really seen when there is no -use_sleep. This is a rough collection, but it gives a good starting point.
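For completeness, the app_config.xml side of this looks roughly like the sketch below. The <name> element is an assumption and should be taken from client_state.xml for the app you actually run; <gpu_usage> of 0.33 is what makes BOINC start 3 tasks per card, matching -instances_per_device 3 on the command line.

```xml
<app_config>
  <app>
    <name>setiathome_v8</name>          <!-- assumption: check client_state.xml for the real app name -->
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>       <!-- 1/3 of a GPU per task, i.e. 3 tasks per card -->
      <cpu_usage>1.0</cpu_usage>        <!-- budget one CPU core per GPU task -->
    </gpu_versions>
  </app>
</app_config>
```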
Joined: 30 Dec 13 Posts: 258 Credit: 12,340,341 RAC: 0
On the question of core affinity: I was able to run more instances of work than the number of cores while using the -use_sleep command. (Without -use_sleep, the CPU cores max out at 100% and the system stalls.) Some cores had 2 work units assigned to a single core. The work units progressed as normal until complete.

So that completes my testing. I hope this helps some people.
Joined: 30 Dec 13 Posts: 258 Credit: 12,340,341 RAC: 0
So I just noticed something.... Looking at the stderr reports, I noticed that the priority was wrong: -hp was missing from the command lines of all the work units. I don't know how much that will affect the results. If anyone copied the command line I posted, make sure you include -hp as well.
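Going by the "high priority class used" and "worker thread raised" lines in the earlier stderr, the priority adjustment amounts to the two Win32 calls sketched below. This is shown only to illustrate what the switch changes, not as the app's actual code.

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Raise the whole process to the high priority class... */
    if (SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS))
        printf("Priority of process adjusted successfully, high priority class used\n");

    /* ...and bump the (worker) thread's priority within that class. */
    if (SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_ABOVE_NORMAL))
        printf("Priority of worker thread raised successfully\n");
    return 0;
}
```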