Errors with Fermi

Message boards : SETI@home Enhanced : Errors with Fermi

Todd Hebert
Volunteer tester
Joined: 16 Apr 10
Posts: 2
Credit: 3,823
RAC: 0
United States
Message 39132 - Posted: 16 Apr 2010, 16:25:48 UTC

Good day. Every WU that is assigned to any of my three GTX 480s crashes.
I am running the .43 client on Windows Server 2008 x64 (not R2) with the WHQL drivers for this card. The cards are not overclocked, and the system performed exactly as it should until the cards were swapped out. The app crashes within seconds of the task starting.

I have tried with one, two and all three cards installed, with the same behavior. None of the cards are running in SLI and no bridge is installed. The fans are running at maximum to keep these high-power cards properly cooled.

The system configuration is an Intel Skulltrail D5400XS motherboard, dual quad-core Intel Xeon X5470s and SATA RAID 0.

Any help would be appreciated.
Thanks!
ID: 39132
Claggy
Volunteer tester
Joined: 29 May 06
Posts: 1037
Credit: 8,440,339
RAC: 21
United Kingdom
Message 39133 - Posted: 16 Apr 2010, 16:51:48 UTC - in response to Message 39132.  

You could try BOINC 6.10.45; it has a couple of fixes for Fermi GPUs, though it's unlikely to be the solution:

- client: NVIDIA peak FLOPS estimate was wrong for Fermi (32 cores, not 8).

- client: Fermi compute capability is 2, not 3.
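The two changelog items go together: a client estimating a GPU's peak speed has to know how many cores each multiprocessor has, and that depends on the compute capability. A minimal sketch, assuming the common estimate peak = multiprocessors * cores per multiprocessor * clock * 2 (one multiply-add per core per clock); BOINC's actual formula may differ, and the function names are hypothetical. The inputs are the multiProcessorCount, clockRate and major/minor values from the GTX 470 stderr output quoted later in this thread; whether clockRate is the shader clock on these early drivers is also an assumption.

```python
# Hypothetical peak-FLOPS estimate for an NVIDIA GPU, assuming
# peak = multiprocessors * cores_per_mp * clock * 2 (one fused multiply-add,
# i.e. 2 FLOPs, per core per clock). This is NOT BOINC's own code.


def cores_per_mp(major: int, minor: int) -> int:
    """CUDA cores per multiprocessor for the compute capabilities of the era."""
    if major == 1:
        return 8   # pre-Fermi parts (compute capability 1.x)
    if (major, minor) == (2, 0):
        return 32  # Fermi GF100 (GTX 470/480)
    if (major, minor) == (2, 1):
        return 48  # later Fermi parts with compute capability 2.1
    raise ValueError(f"unknown compute capability {major}.{minor}")


def peak_flops(mp_count: int, clock_khz: int, cores: int) -> float:
    """Peak single-precision FLOPS; clock_khz is CUDA's clockRate field (kHz)."""
    return mp_count * cores * clock_khz * 1e3 * 2


# multiProcessorCount and clockRate from the GTX 470 stderr in this thread.
MP_COUNT, CLOCK_KHZ = 14, 810000

wrong = peak_flops(MP_COUNT, CLOCK_KHZ, cores=8)  # 8 cores per MP, as before the fix
right = peak_flops(MP_COUNT, CLOCK_KHZ, cores=cores_per_mp(2, 0))
print(f"{wrong / 1e9:.2f} GFLOPS vs {right / 1e9:.2f} GFLOPS")
```

With 8 cores per multiprocessor the estimate comes out at exactly a quarter of the 32-core figure, which fits the "very low" FLOPS reports further down the thread.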

The only other host I've seen here with GTX 480s is this one: hostid=45362

It's running the 256.60 drivers(!) and BOINC 6.10.18, and is successfully completing Fermi WUs; cuda23 WUs were another matter, though.

Claggy
ID: 39133
Todd Hebert
Volunteer tester
Joined: 16 Apr 10
Posts: 2
Credit: 3,823
RAC: 0
United States
Message 39141 - Posted: 17 Apr 2010, 3:59:15 UTC - in response to Message 39133.  

Thanks for the information! I know these are totally new cards, so it will take a while to get everything ironed out.
ID: 39141
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,266,428
RAC: 0
United Kingdom
Message 39220 - Posted: 27 Apr 2010, 22:23:11 UTC

Has anybody got the Fermi application to download and run yet?

I can see an application at

http://boinc2.ssl.berkeley.edu/beta/download/setiathome_6.09_windows_intelx86__cuda_fermi.exe

but it has dependencies on

LIBFFTW3F-3-1-1A_UPX.DLL
CUDART32_30_14.DLL
CUFFT32_30_14.DLL
NVCUDA.DLL

- and none of those three CUDA DLLs are in the download directory.

It would be helpful if someone could post the <app_version> data for <plan_class> fermi.
ID: 39220
ftpd
Volunteer tester
Joined: 12 May 10
Posts: 4
Credit: 24,104
RAC: 0
Netherlands
Message 39303 - Posted: 12 May 2010, 9:45:39 UTC - in response to Message 39132.  
Last modified: 12 May 2010, 9:48:04 UTC

Todd,

I am new to this site (joined today).
I also have a GTX 470 and a GTX 480.

I have read somewhere that the NVIDIA drivers for these cards do not support
Windows Server.

Perhaps that is the problem!

Good luck.

I just downloaded Astropulse 5.05, and not the Fermi app!!

Ton (ftpd) Netherlands
ID: 39303
Numo
Volunteer tester
Joined: 12 May 10
Posts: 1
Credit: 0
RAC: 0
Message 39315 - Posted: 12 May 2010, 19:09:16 UTC

Ah... Finally found the right place. I am new here also.

My system: Windows 7 Pro 64-bit
NVIDIA GTX 480
AMD Phenom II X4

What I get is each WU running to 100% in a second or two, then moving on to the next one.

Also I noticed that BOINC reports the FLOP count on my card very low considering what it should be.
ID: 39315
Claggy
Volunteer tester
Joined: 29 May 06
Posts: 1037
Credit: 8,440,339
RAC: 21
United Kingdom
Message 39316 - Posted: 12 May 2010, 20:19:51 UTC - in response to Message 39315.  

> Also I noticed that BOINC reports the FLOP count on my card very low considering what it should be.


See the 2nd post in this thread.

Claggy
ID: 39316
ftpd
Volunteer tester
Joined: 12 May 10
Posts: 4
Credit: 24,104
RAC: 0
Netherlands
Message 39345 - Posted: 14 May 2010, 8:32:46 UTC - in response to Message 39315.  

Numo,

The Fermi cards (GTX 470/480) do not work for SETI at the moment.
Also, Windows 7 is slow with these cards.
Upgrade your BOINC Manager to 6.10.55 and the FLOPS count is better, but not always correct.
You can use your card for GPUGRID or Collatz; those applications are working OK.
Ton (ftpd) Netherlands
ID: 39345
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,266,428
RAC: 0
United Kingdom
Message 39386 - Posted: 15 May 2010, 12:49:00 UTC - in response to Message 39220.  

> ....
> It would be helpful if someone could post the <app_version> data for <plan_class> fermi.

Now I know how much it costs to get a question answered round here. (£239.99 + VAT, if you're asking)

<app_version>
  <app_name>setiathome_enhanced</app_name>
  <version_num>609</version_num>
  <platform>windows_intelx86</platform>
  <avg_ncpus>0.800592</avg_ncpus>
  <max_ncpus>0.800592</max_ncpus>
  <flops>142429261514.939360</flops>
  <plan_class>cuda_fermi</plan_class>
  <api_version>6.3.22</api_version>
  <file_ref>
    <file_name>setiathome_6.09_windows_intelx86__cuda_fermi.exe</file_name>
    <main_program/>
  </file_ref>
  <file_ref>
    <file_name>libfftw3f-3-1-1a_upx.dll</file_name>
    <open_name>libfftw3f-3-1-1a_upx.dll</open_name>
  </file_ref>
  <file_ref>
    <file_name>seti_609.jpg</file_name>
    <open_name>seti_logo</open_name>
  </file_ref>
  <file_ref>
    <file_name>setiathome-6.09_cuda_AUTHORS</file_name>
    <open_name>setiathome-6.09_cuda_AUTHORS</open_name>
  </file_ref>
  <file_ref>
    <file_name>setiathome-6.09_cuda_COPYING</file_name>
    <open_name>setiathome-6.09_cuda_COPYING</open_name>
  </file_ref>
  <file_ref>
    <file_name>setiathome-6.09_cuda_COPYRIGHT</file_name>
    <open_name>setiathome-6.09_cuda_COPYRIGHT</open_name>
  </file_ref>
  <file_ref>
    <file_name>setiathome-6.09_cuda_README</file_name>
    <open_name>setiathome-6.09_cuda_README</open_name>
  </file_ref>
  <file_ref>
    <file_name>cudart32_30_13.dll</file_name>
    <open_name>cudart.dll</open_name>
    <copy_file/>
  </file_ref>
  <file_ref>
    <file_name>cufft32_30_13.dll</file_name>
    <open_name>cufft.dll</open_name>
    <copy_file/>
  </file_ref>
  <coproc>
    <type>CUDA</type>
    <count>1.000000</count>
  </coproc>
  <gpu_ram>209715200.000000</gpu_ram>
</app_version>

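That's never going to work: the dependencies are wrong. The app needs CUDART32_30_14.DLL and CUFFT32_30_14.DLL, but this app_version supplies the 30_13 DLLs under the open names cudart.dll and cufft.dll. I confidently predict that task 7936847 will crash out with exactly the same error as Michael Malis reported this morning. I'll let it happen (apologies to wingmates) so we have a smoking gun for Eric.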

No point in testing Fermi any more until I can raise David or Eric and get those DLLs swapped over. Once that's done, we should be good to go - the app itself is OK (-ish: still lousy at VLAR, and look at the CPU demand), and I've had a validation through on the main project this morning (with app_info).
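The mismatch can be checked mechanically. A minimal sketch, assuming a simplified model in which each <file_ref> puts a file into the task's slot directory under its <open_name> (or its <file_name> if there is none), and the executable can only load DLLs by the exact names it imports. The XML is trimmed from the app_version above; the import list comes from the earlier post listing the executable's dependencies, with NVCUDA.DLL left out on the assumption that the display driver supplies it. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed from the cuda_fermi <app_version> above: only the DLL <file_ref>s.
APP_VERSION_XML = """
<app_version>
  <plan_class>cuda_fermi</plan_class>
  <file_ref>
    <file_name>libfftw3f-3-1-1a_upx.dll</file_name>
    <open_name>libfftw3f-3-1-1a_upx.dll</open_name>
  </file_ref>
  <file_ref>
    <file_name>cudart32_30_13.dll</file_name>
    <open_name>cudart.dll</open_name>
    <copy_file/>
  </file_ref>
  <file_ref>
    <file_name>cufft32_30_13.dll</file_name>
    <open_name>cufft.dll</open_name>
    <copy_file/>
  </file_ref>
</app_version>
"""

# DLLs the Fermi executable imports (NVCUDA.DLL assumed to come from the driver).
IMPORTED_DLLS = ["LIBFFTW3F-3-1-1A_UPX.DLL", "CUDART32_30_14.DLL", "CUFFT32_30_14.DLL"]


def missing_dlls(app_version_xml: str, imported: list[str]) -> list[str]:
    """Return imported DLLs that no <file_ref> would provide under that name.

    Windows file names are case-insensitive, so names are compared in lower case.
    """
    root = ET.fromstring(app_version_xml.strip())
    provided = set()
    for ref in root.iter("file_ref"):
        # The app sees the file under <open_name>, falling back to <file_name>.
        name = ref.findtext("open_name") or ref.findtext("file_name")
        provided.add(name.lower())
    return [dll for dll in imported if dll.lower() not in provided]


print(missing_dlls(APP_VERSION_XML, IMPORTED_DLLS))
```

Both CUDA DLLs come up missing: the app wants the 30_14 builds by their full names, while the plan class ships 30_13 builds renamed to cudart.dll and cufft.dll.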
ID: 39386
Claggy
Volunteer tester
Joined: 29 May 06
Posts: 1037
Credit: 8,440,339
RAC: 21
United Kingdom
Message 39389 - Posted: 15 May 2010, 13:51:02 UTC - in response to Message 39386.  

>> ....
>> It would be helpful if someone could post the <app_version> data for <plan_class> fermi.
>
> Now I know how much it costs to get a question answered round here. (£239.99 + VAT, if you're asking)

Tell me about it. So that there was someone to test Raistmer's SSE-only r280 hybrid app, I spent £51.35 + VAT on an HD4650,
then, to make my XP3200 host reliable, I spent £78.77 (+VAT) on a 750 W PSU and £26.98 (+VAT) on new memory, when all I needed to do was clock the memory down from 200 MHz to 166 MHz :(
Then I spent another month trying to get Windows to work with the HD4650 before I could tell him it wasn't quite SSE-only.

Claggy
ID: 39389
Claggy
Volunteer tester
Joined: 29 May 06
Posts: 1037
Credit: 8,440,339
RAC: 21
United Kingdom
Message 39395 - Posted: 15 May 2010, 15:22:46 UTC - in response to Message 39386.  

> That's never going to work - the dependencies are wrong. I confidently predict that task 7936847 will crash out with exactly the same error as Michael Malis reported this morning - I'll let it happen (apologies to wingmates) so we have a smoking gun for Eric.
>
> No point in testing Fermi any more until I can raise David or Eric and get those DLLs swapped over. Once that's done, we should be good to go - the app itself is OK (-ish: still lousy at VLAR, and look at the CPU demand), and I've had a validation through on the main project this morning (with app_info).

Eh? Task 7936847 has completed and validated:

<core_client_version>6.10.55</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
Device 1 : GeForce GTX 470
totalGlobalMem = 1309081600
sharedMemPerBlock = 49152
regsPerBlock = 32768
warpSize = 32
memPitch = 2147483647
maxThreadsPerBlock = 1024
clockRate = 810000
totalConstMem = 65536
major = 2
minor = 0
textureAlignment = 512
deviceOverlap = 1
multiProcessorCount = 14
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 470 is okay
SETI@home using CUDA accelerated device GeForce GTX 470
setiathome_enhanced 6.09 Visual Studio/Microsoft C++
libboinc: 6.3.22

Work Unit Info:
...............
WU true angle range is : 0.448183
Optimal function choices:
-----------------------------------------------------
name
-----------------------------------------------------
v_BaseLineSmooth (no other)
v_GetPowerSpectrum 0.00030 0.00000
v_ChirpData 0.03024 0.00000
v_Transpose4 0.00908 0.00000
FPU opt folding 0.00541 0.00000

Flopcounter: 42133406360370.508000

Spike count: 5
Pulse count: 0
Triplet count: 4
Gaussian count: 0
called boinc_finish

</stderr_txt>

Claggy
ID: 39395
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,266,428
RAC: 0
United Kingdom
Message 39396 - Posted: 15 May 2010, 15:25:22 UTC - in response to Message 39386.  

> I confidently predict that task 7936847 will crash out with exactly the same error as Michael Malis reported this morning - I'll let it happen (apologies to wingmates) so we have a smoking gun for Eric.

Well, there are occasions when it's a pleasure to be proved wrong, and this is one of them. Apology to wingmates hereby withdrawn!

Process Explorer shows what happened.



I had downloaded and installed cudatoolkit_3.0_win_32.exe; that's where I got the DLLs for my app_infos from. I just left the installation directory tree alone after it had unpacked itself, and it must have registered the location where the DLLs were available for use. If anyone else wants to try their Fermi card before the project gets round to fixing the plan_class, it might work for you too. [Warning: 52 MB download from NVIDIA]

And the proof of the pudding - WU 2442876 validated against stock v6.03 - very near a 100x difference in run times!
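One plausible explanation for the task succeeding anyway is that the toolkit installer made its DLL directory visible on the system search path. A minimal sketch, assuming a simplified lookup (application directory first, then each PATH entry); the real Windows DLL search order has more steps, and the directory names and the find_dll helper are hypothetical:

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_dll(name: str, app_dir: Path, path_dirs: list[Path]) -> Optional[Path]:
    """Simplified DLL lookup: the application's directory first, then each PATH entry.

    Deliberately ignores the rest of the real Windows search order
    (system directories, the Windows directory, safe search mode, etc.).
    """
    for directory in [app_dir, *path_dirs]:
        if not directory.is_dir():
            continue
        for entry in directory.iterdir():
            if entry.name.lower() == name.lower():  # Windows names are case-insensitive
                return entry
    return None


with tempfile.TemporaryDirectory() as tmp:
    slot = Path(tmp, "slots", "0")          # hypothetical BOINC slot directory
    toolkit_bin = Path(tmp, "CUDA", "bin")  # hypothetical toolkit install location
    slot.mkdir(parents=True)
    toolkit_bin.mkdir(parents=True)
    (slot / "cudart.dll").touch()           # what the cuda_fermi app_version provides
    (toolkit_bin / "cudart32_30_14.dll").touch()

    print(find_dll("CUDART32_30_14.DLL", slot, []))             # None: the app cannot start
    print(find_dll("CUDART32_30_14.DLL", slot, [toolkit_bin]))  # the toolkit's copy
```

Under this model a host without the toolkit crashes, while a host whose search path includes the toolkit's directory finds the 30_14 DLL there.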
ID: 39396
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,266,428
RAC: 0
United Kingdom
Message 39430 - Posted: 19 May 2010, 23:57:19 UTC

David Anderson has re-installed the Fermi app as v6.10, this time with the correct DLLs. It all seems to be working properly: my 470 is spitting through shorties in under 200 seconds, and I hate to think what a 480 would do!

If a few of the other testers who attached Fermis last time could allow new work, and verify that it works for them too, we should be able to get this transferred to the main project quickly - then everyone will benefit, not just the few prepared to fiddle around with a private drop.
ID: 39430
ftpd
Volunteer tester
Joined: 12 May 10
Posts: 4
Credit: 24,104
RAC: 0
Netherlands
Message 39434 - Posted: 20 May 2010, 8:12:04 UTC - in response to Message 39430.  
Last modified: 20 May 2010, 8:13:36 UTC

Richard,

I just downloaded 50 WUs for my Fermi GTX 480.
Run time is between 1 min 50 s and 2 min 04 s.
CPU time is between 20 and 35 s.
The machine is also running 8 other CPU BOINC jobs at the same time.

Ton (ftpd) Netherlands
ID: 39434
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,266,428
RAC: 0
United Kingdom
Message 39435 - Posted: 20 May 2010, 12:15:28 UTC
Last modified: 20 May 2010, 12:15:54 UTC

Something screwy happened this morning. My Fermi was happily downloading away, getting new tasks assigned to plan_class cuda_fermi and app_version 6.10, when this happened:

20/05/2010 10:43:56 SETI@home Beta Test Sending scheduler request: To fetch work.
20/05/2010 10:43:56 SETI@home Beta Test Requesting new tasks for GPU
20/05/2010 10:44:01 SETI@home Beta Test Scheduler request completed: got 11 new tasks
<snip v6.10 tasks>
20/05/2010 10:44:12 SETI@home Beta Test Sending scheduler request: To fetch work.
20/05/2010 10:44:12 SETI@home Beta Test Requesting new tasks for GPU
20/05/2010 10:44:16 SETI@home Beta Test Scheduler request completed: got 1 new tasks
20/05/2010 10:44:41 SETI@home Beta Test Started download of setiathome_6.09_windows_intelx86__cuda23.exe
20/05/2010 10:45:10 SETI@home Beta Test Started download of cudart_23_win32.dll
20/05/2010 10:45:14 SETI@home Beta Test Started download of cufft_23_win32.dll

That looks like a server (scheduler) mistake to me (remember we're still running with experimental BOINC server code here): I'll report it, but you may wish to check your caches.
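Until the scheduler is fixed, a saved copy of the event log can be scanned for the wrong kind of download. A minimal sketch, assuming the log line format shown above and assuming that any file name containing "cuda23" or "_23_" marks CUDA 2.3 work that a Fermi card cannot run; the patterns and the function name are hypothetical:

```python
import re

# Lines copied from the event log above.
LOG = """\
20/05/2010 10:44:12 SETI@home Beta Test Sending scheduler request: To fetch work.
20/05/2010 10:44:16 SETI@home Beta Test Scheduler request completed: got 1 new tasks
20/05/2010 10:44:41 SETI@home Beta Test Started download of setiathome_6.09_windows_intelx86__cuda23.exe
20/05/2010 10:45:10 SETI@home Beta Test Started download of cudart_23_win32.dll
20/05/2010 10:45:14 SETI@home Beta Test Started download of cufft_23_win32.dll
"""

# "date time ... Started download of <file>"
DOWNLOAD = re.compile(r"^(\S+ \S+) .*Started download of (\S+)$")
# Assumed marker for CUDA 2.3 builds, which fail on Fermi cards.
SUSPECT = re.compile(r"cuda23|_23_", re.IGNORECASE)


def suspect_downloads(log: str) -> list[tuple[str, str]]:
    """Return (timestamp, file name) for each download that looks like cuda23 work."""
    hits = []
    for line in log.splitlines():
        m = DOWNLOAD.match(line)
        if m and SUSPECT.search(m.group(2)):
            hits.append((m.group(1), m.group(2)))
    return hits


for when, name in suspect_downloads(LOG):
    print(when, name)
```

Any hit is the cue to set No New Tasks for the project.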
ID: 39435
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,266,428
RAC: 0
United Kingdom
Message 39436 - Posted: 20 May 2010, 18:38:18 UTC

Heads up, folks

According to David, I had downloaded a daily quota of 100 tasks for cuda_fermi. That's about 3.5 hours of crunching with this morning's all-shorty tape.

Once my quota was exhausted, the cuda23 tasks were sent in substitution. THIS DOES NOT WORK - as we already knew, the older apps give the pseudo -9 overflow outcome on a Fermi. No use to anyone, and a big drain on the project's resources.

Please manage your caches manually until something is worked out: set No New Tasks as soon as (or preferably before) you see cuda23 work being downloaded.

NB This applies to the experimental server code on this Beta site only: the Main site does not (yet) have separate quotas for the different app_versions.
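The quota figures can be checked with simple arithmetic: 100 tasks in about 3.5 hours implies roughly 126 s per task, in line with the "under 200 seconds" per shorty reported earlier in this thread. A minimal sketch, assuming one task at a time per GPU; the function names are hypothetical:

```python
def hours_to_exhaust(quota: int, seconds_per_task: float) -> float:
    """Wall-clock hours for one GPU to use up a daily quota, one task at a time."""
    return quota * seconds_per_task / 3600


def implied_seconds_per_task(quota: int, hours: float) -> float:
    """Average run time per task implied by finishing `quota` tasks in `hours`."""
    return hours * 3600 / quota


print(implied_seconds_per_task(100, 3.5))  # 126.0
print(hours_to_exhaust(100, 200))          # upper bound if every shorty took 200 s
```

Even at a full 200 s per task, a fast card would exhaust a 100-task quota in well under a day.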
ID: 39436
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,266,428
RAC: 0
United Kingdom
Message 39453 - Posted: 25 May 2010, 14:39:57 UTC

FYI

The quota-limiting code on this Beta server has been fixed: for the last two days, my GTX 470 has stopped downloading cuda_fermi work ("reached daily quota"), and not gone on to download the incompatible cuda23 or plain cuda work. So it's now safe to leave Fermi cards attached without manual cache manipulation.
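The fixed behaviour amounts to a selection rule: once the cuda_fermi quota is used up, a Fermi host gets nothing rather than a substitute app_version it cannot run. A hypothetical sketch of that rule, not the BOINC scheduler's actual code; the AppVersion fields, including runs_on_fermi, are invented for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AppVersion:
    plan_class: str
    runs_on_fermi: bool  # hypothetical flag: does this build work on a Fermi GPU?
    daily_quota: int
    sent_today: int = 0


def pick_version(versions: list[AppVersion], host_is_fermi: bool) -> Optional[AppVersion]:
    """Return the first usable app_version with quota left, or None.

    None corresponds to "reached daily quota": nothing is sent, rather than
    substituting a version the host cannot run.
    """
    for v in versions:
        if host_is_fermi and not v.runs_on_fermi:
            continue  # never fall back to cuda23 or plain cuda on a Fermi card
        if v.sent_today < v.daily_quota:
            return v
    return None


versions = [
    AppVersion("cuda_fermi", runs_on_fermi=True, daily_quota=100, sent_today=100),
    AppVersion("cuda23", runs_on_fermi=False, daily_quota=100),
]
print(pick_version(versions, host_is_fermi=True))  # None: quota reached, no cuda23 substitute
```

Dropping the compatibility check reproduces the earlier bug: with the cuda_fermi quota exhausted, the loop would fall through to cuda23.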

The daily quota has been set at 100, and with these 'shorty' VHAR tasks, I get through that number in less than six hours. Doesn't matter: this is a Beta site, and it's better to test with a variety of hosts and platforms - if one host hoovered up every available WU, testing wouldn't be as thorough!
ID: 39453
Richard Haselgrove
Volunteer tester
Joined: 3 Jan 07
Posts: 1451
Credit: 3,266,428
RAC: 0
United Kingdom
Message 39454 - Posted: 25 May 2010, 14:47:30 UTC

Does anyone have any idea why some illiterate Mod moved this thread from the setiathome_enhanced message board where it started and belongs, and dumped it on Astropulse? The only links with Astropulse are the letters 'AP' in the title.

I'm not the thread originator, so I didn't get any 'thread move' moderation notice: so I can't take it up with the perpetrator personally. Maybe Todd could look in his inbox.

I did follow approved practice, and reported the problem to the Mods via the 'Red-X' a couple of hours after it happened. That was four days ago: fat lot of good it's done.
ID: 39454
Raistmer
Volunteer tester
Joined: 18 Aug 05
Posts: 2423
Credit: 15,878,738
RAC: 0
Russia
Message 39455 - Posted: 26 May 2010, 7:09:03 UTC

Yeah, it's a strange move. I would also remove "AP" from the thread title; it's misleading.

About the daily quota of 100 tasks:
It would be good if Beta tested some new quota-related approaches. Limiting fast hardware to 100 tasks looks counterproductive (though with the low number of hosts on Beta it does still make sense).

News about SETI opt app releases: https://twitter.com/Raistmer
ID: 39455


 
©2019 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.