Stderr Truncations

Jeff Buck (Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1339
Credit: 138,461,842
RAC: 216,474
United States
Message 1702278 - Posted: 16 Jul 2015, 19:24:11 UTC - in response to Message 1702270.  

... Notice that I said "modification" rather than "fix", because now that I believe I've actually caught on to what that code change is doing (a light bulb that flashed on about 2 A.M., of course), I'm wondering if this might just be exchanging one type of rare task failure ("instant" Invalid) for another (Error while computing - Error code 32), at least from a S@h perspective. (For MW, it probably really is a fix.)


Lol, when you reach that point, it's actually a pretty unique feeling, isn't it? Kind of relief that something's done, mixed with disappointment at the particular choice of chewing gum, bits of string, and duct tape to plug the holes.

Fingers crossed this raft never gets used on the open ocean.

[E.T. calls up via Arecibo just to ask "What the heck is that thing you're driving? Impressive! Don't build spaceships though!"]

Heh, my first reaction when I thought I understood why the truncations really were fixed was quite satisfying. The second reaction, after crawling back into bed, was to wonder whether the cure might simply introduce a different disease, which is not exactly a sleep-inducing thought.

Now, having just looked at that code block again with my non-C++ trained eyes, I'm wondering whether I've still misinterpreted what will happen if that 5-second grace period expires. Does the Sharing Violation actually end up causing that Error Code 32 to be reported, or does the normal code path simply resume, presumably still producing a truncated stderr.txt? I dunno, some expert is gonna hafta 'splain that to me! ;^)
ID: 1702278
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7485
Credit: 91,082,550
RAC: 2,712
Australia
Message 1702280 - Posted: 16 Jul 2015, 19:34:44 UTC - in response to Message 1702278.  
Last modified: 16 Jul 2015, 19:37:49 UTC

... Notice that I said "modification" rather than "fix", because now that I believe I've actually caught on to what that code change is doing (a light bulb that flashed on about 2 A.M., of course), I'm wondering if this might just be exchanging one type of rare task failure ("instant" Invalid) for another (Error while computing - Error code 32), at least from a S@h perspective. (For MW, it probably really is a fix.)


Lol, when you reach that point, it's actually a pretty unique feeling, isn't it? Kind of relief that something's done, mixed with disappointment at the particular choice of chewing gum, bits of string, and duct tape to plug the holes.

Fingers crossed this raft never gets used on the open ocean.

[E.T. calls up via Arecibo just to ask "What the heck is that thing you're driving? Impressive! Don't build spaceships though!"]

Heh, my first reaction when I thought I understood why the truncations really were fixed was quite satisfying. The second reaction, after crawling back into bed, was to wonder whether the cure might simply introduce a different disease, which is not exactly a sleep-inducing thought.

Now, having just looked at that code block again with my non-C++ trained eyes, I'm wondering whether I've still misinterpreted what will happen if that 5-second grace period expires. Does the Sharing Violation actually end up causing that Error Code 32 to be reported, or does the normal code path simply resume, presumably still producing a truncated stderr.txt? I dunno, some expert is gonna hafta 'splain that to me! ;^)


Boinc committee next meeting, urgent matters: "Oh no! the Windows people have discovered process monitor!"

A lot becomes clearer if you go look at the scheduler authentication code. It makes this part look solid as... well, something really really solid.

[Edit:] sometimes not being trained in something can be an advantage too. It's easier to point out that it doesn't work, lol.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1702280
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 11570
Credit: 107,498,236
RAC: 62,621
United Kingdom
Message 1702283 - Posted: 16 Jul 2015, 19:43:29 UTC - in response to Message 1702278.  

Now, having just looked at that code block again with my non-C++ trained eyes, I'm wondering whether I've still misinterpreted what will happen if that 5-second grace period expires. Does the Sharing Violation actually end up causing that Error Code 32 to be reported, or does the normal code path simply resume, presumably still producing a truncated stderr.txt? I dunno, some expert is gonna hafta 'splain that to me! ;^)

To my (also untrained) eye, it looks like it simply falls out of the bottom, five seconds later. So it does what it was going to do anyway, but slower.
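For anyone who wants to see the "falls out of the bottom" behaviour spelled out, here is a minimal sketch of a bounded retry loop. It is not the BOINC code: `OpenResult`, `open_with_grace`, and the `try_open` callback are hypothetical names invented for illustration, and the real client may be structured quite differently. The point is only that when the grace period runs out, the function returns the last result and the caller carries on as before.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical outcome of one attempt to open a file; not a BOINC type.
enum class OpenResult { Ok, SharingViolation, OtherError };

// Keep calling try_open while it reports a sharing violation, but only until
// the grace period expires. The last result is returned either way, so an
// expired grace period simply hands control back to the normal code path.
OpenResult open_with_grace(const std::function<OpenResult()>& try_open,
                           std::chrono::milliseconds grace,
                           std::chrono::milliseconds poll) {
    const auto deadline = std::chrono::steady_clock::now() + grace;
    OpenResult result = try_open();
    while (result == OpenResult::SharingViolation &&
           std::chrono::steady_clock::now() < deadline) {
        std::this_thread::sleep_for(poll);  // give the other holder time to let go
        result = try_open();
    }
    return result;  // may still be SharingViolation after the deadline
}
```

Called with a 5-second grace period and a callback that never succeeds, this returns `OpenResult::SharingViolation` roughly five seconds later; what happens next depends entirely on what the caller does with that value.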
ID: 1702283
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7485
Credit: 91,082,550
RAC: 2,712
Australia
Message 1702286 - Posted: 16 Jul 2015, 20:02:13 UTC - in response to Message 1702283.  

Now, having just looked at that code block again with my non-C++ trained eyes, I'm wondering whether I've still misinterpreted what will happen if that 5-second grace period expires. Does the Sharing Violation actually end up causing that Error Code 32 to be reported, or does the normal code path simply resume, presumably still producing a truncated stderr.txt? I dunno, some expert is gonna hafta 'splain that to me! ;^)

To my (also untrained) eye, it looks like it simply falls out of the bottom, five seconds later. So it does what it was going to do anyway, but slower.


Will grab a look with as fresh eyeballs as possible, once things settle down. Even a simple 5 second delay that does nothing else can be an eternity to let things stabilise. The programming-to-the-lowest-common-denominator approach described to me by Rom doesn't really add a lot of confidence. But at the same time I'm just glad it's being looked at, which is a great start IMO.
ID: 1702286
Rom Walton (BOINC)
Volunteer tester
Joined: 28 Apr 00
Posts: 579
Credit: 130,733
RAC: 0
United States
Message 1702287 - Posted: 16 Jul 2015, 20:06:43 UTC - in response to Message 1702280.  

... Notice that I said "modification" rather than "fix", because now that I believe I've actually caught on to what that code change is doing (a light bulb that flashed on about 2 A.M., of course), I'm wondering if this might just be exchanging one type of rare task failure ("instant" Invalid) for another (Error while computing - Error code 32), at least from a S@h perspective. (For MW, it probably really is a fix.)


Lol, when you reach that point, it's actually a pretty unique feeling, isn't it? Kind of relief that something's done, mixed with disappointment at the particular choice of chewing gum, bits of string, and duct tape to plug the holes.

Fingers crossed this raft never gets used on the open ocean.

[E.T. calls up via Arecibo just to ask "What the heck is that thing you're driving? Impressive! Don't build spaceships though!"]

Heh, my first reaction when I thought I understood why the truncations really were fixed was quite satisfying. The second reaction, after crawling back into bed, was to wonder whether the cure might simply introduce a different disease, which is not exactly a sleep-inducing thought.

Now, having just looked at that code block again with my non-C++ trained eyes, I'm wondering whether I've still misinterpreted what will happen if that 5-second grace period expires. Does the Sharing Violation actually end up causing that Error Code 32 to be reported, or does the normal code path simply resume, presumably still producing a truncated stderr.txt? I dunno, some expert is gonna hafta 'splain that to me! ;^)


Boinc committee next meeting, urgent matters: "Oh no! the Windows people have discovered process monitor!"

A lot becomes clearer if you go look at the scheduler authentication code. It makes this part look solid as... well, something really really solid.

[Edit:] sometimes not being trained in something can be an advantage too. It's easier to point out that it doesn't work, lol.


Funny, we have been through 10 or so code reviews and security audits in the last ten years by various companies. IBM (in-house, we have to go through a full audit every time they want a new branded client), Intel (through a third party), a bank or two, an oil company, and at least one hospital.

Your gripes appear to be more about aesthetics than how solid/stable something is. Don't confuse the two.
----- Rom
BOINC Development Team, U.C. Berkeley
ID: 1702287
Jeff Buck (Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1339
Credit: 138,461,842
RAC: 216,474
United States
Message 1702289 - Posted: 16 Jul 2015, 20:14:26 UTC - in response to Message 1702283.  

Now, having just looked at that code block again with my non-C++ trained eyes, I'm wondering whether I've still misinterpreted what will happen if that 5-second grace period expires. Does the Sharing Violation actually end up causing that Error Code 32 to be reported, or does the normal code path simply resume, presumably still producing a truncated stderr.txt? I dunno, some expert is gonna hafta 'splain that to me! ;^)

To my (also untrained) eye, it looks like it simply falls out of the bottom, five seconds later. So it does what it was going to do anyway, but slower.

If that's the case, then it will definitely be a significant improvement for S@h, too, even if it is more of a patch than a true fix. And to use Jason's raft analogy ("Fingers crossed this raft never gets used on the open ocean."), you may not want to launch a raft with this sort of patch, but if you're already in the middle of the ocean, it sure will be helpful to patch the pinholes any way you can, until you can get the raft back to shore and build a whole new one!
ID: 1702289
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7485
Credit: 91,082,550
RAC: 2,712
Australia
Message 1702291 - Posted: 16 Jul 2015, 20:15:45 UTC - in response to Message 1702287.  

Funny, we have been through 10 or so code reviews and security audits in the last ten years by various companies. IBM (in-house, we have to go through a full audit every time they want a new branded client), Intel (through a third party), a bank or two, an oil company, and at least one hospital.

Your gripes appear to be more about aesthetics than how solid/stable something is. Don't confuse the two.


haha true :) At that point it would start to become a pointless contest about who's had more security experience than whom, whether aesthetics are really important at all, costs, and what level of exposure is acceptable, none of which are my decisions. The only thing I know for certain is that when my employers run a security audit, they have some pretty stringent aesthetically based rules.
ID: 1702291
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 11570
Credit: 107,498,236
RAC: 62,621
United Kingdom
Message 1702293 - Posted: 16 Jul 2015, 20:21:46 UTC - in response to Message 1702287.  

Boinc committee next meeting, urgent matters: "Oh no! the Windows people have discovered process monitor!"

A lot becomes clearer if you go look at the scheduler authentication code. It makes this part look solid as... well, something really really solid.

[Edit:] sometimes not being trained in something can be an advantage too. It's easier to point out that it doesn't work, lol.

Funny, we have been through 10 or so code reviews and security audits in the last ten years by various companies. IBM (in-house, we have to go through a full audit every time they want a new branded client), Intel (through a third party), a bank or two, an oil company, and at least one hospital.

Your gripes appear to be more about aesthetics than how solid/stable something is. Don't confuse the two.

I hope I'm not doing that. I don't know what the brief for a 'code review' would be, but a 'security audit' is presumably focused on not causing damage - not interfering with the working of the host machine, not leaking data to third parties, that sort of thing. All vitally important to the reputation of scientific researchers who submit papers on the basis of research data passed through the BOINC infrastructure. And from that, the reputation of BOINC itself.

But as we've seen in recent days, those checks haven't prevented deficiencies in functionality. We've seen wastage (tasks abandoned for unknown reasons, while hosts continue to burn electricity processing them), and at Milkyway I've seen science just thrown away, when these 'validate errors' accumulate to take a workunit beyond the maximum error count.

So maybe there's a case for a different kind of audit, for conformity to design schema?
ID: 1702293
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7485
Credit: 91,082,550
RAC: 2,712
Australia
Message 1702295 - Posted: 16 Jul 2015, 20:33:05 UTC - in response to Message 1702293.  

So maybe there's a case for a different kind of audit, for conformity to design schema?


Don't know about the companies Rom listed, but here a security audit typically involves assessing exposure to risks, and minimising the visible footprint.

There are quite valid points Rom appears to be making about what it looks like being less important from an external threat perspective. However, the aesthetics and engineering principles come more into play when you consider risks/vulnerabilities of a more internal nature, which can include having to change the code and the risks/costs of doing so having unintended consequences.

Obviously the need for change hasn't come up a lot in certain areas, so I imagine those audits would regard the maintainability of that particular code as low priority. Different companies/projects, different needs and risks.
ID: 1702295
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 350
Credit: 1,004,404
RAC: 1,744
Finland
Message 1702304 - Posted: 16 Jul 2015, 20:49:46 UTC - in response to Message 1701958.  

I mean, the kernel ought to know when it has closed all files and flushed buffers, right?


Only problem there, is they decided (for unknown reasons) to use an asynchronous TerminateProcess() call and put a 1 second sleep and hard crash, instead of waiting on a synchronisation primitive. No idea why the entire codebase seems to be allergic to synchronisation, and likes to use magic numbers (fixed time intervals) on a non-realtime OS.

So the exit code can indeed appear before the app [especially at low or idle priority] has really finished exiting. No idea why they chose the ugliest possible implementation.
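To make the contrast between a fixed sleep and a synchronisation primitive concrete, here is a small portable sketch. It is an analogy only, not BOINC code: `ExitSignal` and its members are hypothetical names, and the worker is assumed to call `notify_finished` only after all of its own cleanup is complete. On Windows the nearest equivalent would be waiting on the process handle with `WaitForSingleObject`, though whether that alone guarantees the file contents are visible is exactly what is being debated in this thread.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for "the worker has really finished exiting".
class ExitSignal {
public:
    // Called by the worker once all of its cleanup is done.
    void notify_finished(int exit_code) {
        {
            std::lock_guard<std::mutex> lock(m_);
            exit_code_ = exit_code;
            finished_ = true;
        }
        cv_.notify_all();
    }

    // Block until the worker has finished or the timeout expires.
    // Returns true and fills exit_code only if the worker really finished.
    bool wait_finished(std::chrono::milliseconds timeout, int& exit_code) {
        std::unique_lock<std::mutex> lock(m_);
        if (!cv_.wait_for(lock, timeout, [this] { return finished_; })) {
            return false;  // timed out: the caller knows the state is not final
        }
        exit_code = exit_code_;
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool finished_ = false;
    int exit_code_ = 0;
};
```

The design difference from a bare `sleep` is that the waiter wakes as soon as the event happens, and a timeout is reported explicitly instead of being silently treated as success.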


Guess I need to explain.

What I expected is that when the kernel gets a request to terminate a process, whether the request comes through ExitProcess or TerminateProcess or some other way, the kernel will flush buffers, close files and other handles, release any memory the process is using, and do whatever else is part of the cleanup. And when the only thing that's left of the process is the process object, then the kernel would update the exit code.

Now, since you like so much to mock everything BOINC devs do, why don't you tell us how you would do all this?
ID: 1702304
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7485
Credit: 91,082,550
RAC: 2,712
Australia
Message 1702312 - Posted: 16 Jul 2015, 21:07:25 UTC - in response to Message 1702304.  
Last modified: 16 Jul 2015, 21:28:12 UTC

Now, since you like so much to mock everything BOINC devs do, why don't you tell us how you would do all this?


Actually those suggestions, to actually commit the files involved were apparently taken on board, and a patch applied in another way, and as I said I'm happy it's being looked at. That I would provide callback points for more flexibility with changing technology, some plugin-ness, has been mentioned.

That my descriptions come across as 'mocking everything they do' is unfortunate, but it is the result of a long road of frustration. It appears that annoying people and rocking a few boats gets people talking, looking, debating, analysing, fixing etc. Not something that I used to be equipped for, and it doesn't necessarily sit right with me either, but perhaps not liking or agreeing with anything I say was a part of it all. After all, AFAIK in Berkeley's history there's quite a bit of that sort of thing.

[Edit:] For the important technical points you make and the questions you raise: the core issue is that multithreaded C runtimes became standard circa 2005 in the case of Windows, so it requires a mindset shift from sequential/procedural to parallel and out-of-order operation. That's proven over time to be a lot tougher than most people I know expected, and for me too. Non-deterministic behaviour is a pretty big red flag for this kind of thing too.
ID: 1702312
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 350
Credit: 1,004,404
RAC: 1,744
Finland
Message 1702330 - Posted: 16 Jul 2015, 22:19:00 UTC - in response to Message 1702312.  

Now, since you like so much to mock everything BOINC devs do, why don't you tell us how you would do all this?


Actually those suggestions, to actually commit the files involved were apparently taken on board, and a patch applied in another way, and as I said I'm happy it's being looked at. That I would provide callback points for more flexibility with changing technology, some plugin-ness, has been mentioned.


I don't see the commit mode change actually fixing the problem. The writes have been delayed somewhere by something for some reason. Now all that is changed is that the app is forced to wait until the writes hit the disk. The fact that the copy that's sitting in filesystem cache is up to date when the client goes to read it is just a fortunate side-effect of the change.
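As an illustration of what "the app is forced to wait until the writes hit the disk" means in practice, here is a minimal sketch. It is not the actual BOINC change: `append_committed` is a hypothetical helper, and the explicit `fsync`/`_commit` call is one way to get the effect. With the Microsoft C runtime, a similar result can be requested by adding the `c` commit flag to the `fopen` mode string so that `fflush` writes through to disk.

```cpp
#include <cstdio>
#include <string>
#if defined(_WIN32)
#include <io.h>      // _commit, _fileno
#else
#include <stdio.h>   // fileno
#include <unistd.h>  // fsync
#endif

// Append a line to a log file and do not return until the data has left the
// C runtime buffer and the OS has been asked to commit it to storage.
bool append_committed(const std::string& path, const std::string& line) {
    std::FILE* f = std::fopen(path.c_str(), "a");
    if (!f) return false;
    bool ok = std::fputs(line.c_str(), f) >= 0;
    ok = (std::fflush(f) == 0) && ok;        // C runtime buffer -> OS
#if defined(_WIN32)
    ok = (_commit(_fileno(f)) == 0) && ok;   // OS cache -> disk
#else
    ok = (fsync(fileno(f)) == 0) && ok;      // OS cache -> disk
#endif
    ok = (std::fclose(f) == 0) && ok;
    return ok;
}
```

Note that this only makes the writer slower and more careful. It does not by itself tell a reader when the writer is finished, which is the question raised in the next paragraph.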

And I don't think the commit mode change makes a difference for wrapped apps for that matter.

So how does the client, or the app, tell when it's safe to read the files without simply trying and trying again? I can't see callbacks or anything like that helping. I'm just trying to get from 'ok this works sort of' to how it's really done right (if it can).

That my descriptions come across as 'mocking everything they do' is unfortunate, but the result of a long road of frustration.


My issue with your criticism of BOINC is that for the past few weeks it's been excessive and exaggerated, and quite a few other adjectives. It's easy to read your criticism as if you are saying that BOINC is the worst piece of software ever written and the devs are the worst devs to ever live. BOINC just isn't that bad.
ID: 1702330
Rom Walton (BOINC)
Volunteer tester
Joined: 28 Apr 00
Posts: 579
Credit: 130,733
RAC: 0
United States
Message 1702334 - Posted: 16 Jul 2015, 22:31:59 UTC - in response to Message 1702312.  

[Edit:] For the important technical points you make and the questions you raise: the core issue is that multithreaded C runtimes became standard circa 2005 in the case of Windows, so it requires a mindset shift from sequential/procedural to parallel and out-of-order operation. That's proven over time to be a lot tougher than most people I know expected, and for me too. Non-deterministic behaviour is a pretty big red flag for this kind of thing too.


Back even further than that. 1992 (Windows NT 3.1 October Beta) is when I had to hunker down and learn the basics of processes, threads, and thread sync mechanisms. Prior experience to that was just Windows 3.1 (16-bit cooperative multitasking).

IIRC, the Microsoft CRT hadn't even been developed yet. It would be a year or two later, when vendors didn't jump on the NT bandwagon fast enough complaining about difficulties in porting their software to Windows NT.

Anyways, the difficulties are the primary reason why BOINC is not already multi-threaded. At this point it would be more trouble than it is worth. BOINC itself doesn't use much CPU time and, for the most part, isn't time sensitive in that it doesn't require millisecond response times. So going multi-threaded just adds complexity and debugging headaches. More so for platforms other than Windows.
----- Rom
BOINC Development Team, U.C. Berkeley
ID: 1702334
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7485
Credit: 91,082,550
RAC: 2,712
Australia
Message 1702336 - Posted: 16 Jul 2015, 22:40:09 UTC - in response to Message 1702330.  

Now, since you like so much to mock everything BOINC devs do, why don't you tell us how you would do all this?


Actually those suggestions, to actually commit the files involved were apparently taken on board, and a patch applied in another way, and as I said I'm happy it's being looked at. That I would provide callback points for more flexibility with changing technology, some plugin-ness, has been mentioned.


I don't see the commit mode change actually fixing the problem. The writes have been delayed somewhere by something for some reason. Now all that is changed is that the app is forced to wait until the writes hit the disk. The fact that the copy that's sitting in filesystem cache is up to date when the client goes to read it is just a fortunate side-effect of the change.

And I don't think the commit mode change makes a difference for wrapped apps for that matter.

So how does the client, or the app, tell when it's safe to read the files without simply trying and trying again? I can't see callbacks or anything like that helping. I'm just trying to get from 'ok this works sort of' to how it's really done right (if it can).

That my descriptions come across as 'mocking everything they do' is unfortunate, but the result of a long road of frustration.


My issue with your criticism of BOINC is that for the past few weeks it's been excessive and exaggerated, and quite a few other adjectives. It's easy to read your criticism as if you are saying that BOINC is the worst piece of software ever written and the devs are the worst devs to ever live. BOINC just isn't that bad.


No, commit mode is indeed only a workaround. How the BOINC devs decide to do it better is completely up to them.

Thanks for the criticism, and I'll certainly take it on board. Whether I choose to conform or not from here on will certainly be influenced by the sensible discussions I've now had with Rom, which have been far more constructive than the past 7 years of wishful thinking, and watching not knowing how to make the (very real) problems recognised. It took Milkyway validation failure to do that.

Was it over the top? Should I change? Probably. At the same time I am now much better equipped to say that BOINC devs are not the worst I've met, and Boinc is slightly better in a small way. I don't need them to like me, nor do things my way, nor even read anything I post/submit, but if putting some noses out of joint did any more to stir any thought at all this time around, then I can live with that.

You're as entitled to not like my behaviour just as much as I am sick to the stomach of saying the things I felt I needed to. That you don't agree with them, or that they needed saying is good too, along with pointing out we aren't the same.
ID: 1702336
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7485
Credit: 91,082,550
RAC: 2,712
Australia
Message 1702339 - Posted: 16 Jul 2015, 22:48:02 UTC - in response to Message 1702334.  

[Edit:] For the important technical points you make and the questions you raise: the core issue is that multithreaded C runtimes became standard circa 2005 in the case of Windows, so it requires a mindset shift from sequential/procedural to parallel and out-of-order operation. That's proven over time to be a lot tougher than most people I know expected, and for me too. Non-deterministic behaviour is a pretty big red flag for this kind of thing too.


Back even further than that. 1992 (Windows NT 3.1 October Beta) is when I had to hunker down and learn the basics of processes, threads, and thread sync mechanisms. Prior experience to that was just Windows 3.1 (16-bit cooperative multitasking).

IIRC, the Microsoft CRT hadn't even been developed yet. It would be a year or two later, when vendors didn't jump on the NT bandwagon fast enough complaining about difficulties in porting their software to Windows NT.

Anyways, the difficulties are the primary reason why BOINC is not already multi-threaded. At this point it would be more trouble than it is worth. BOINC itself doesn't use much CPU time and, for the most part, isn't time sensitive in that it doesn't require millisecond response times. So going multi-threaded just adds complexity and debugging headaches. More so for platforms other than Windows.


Yes, tough road. There'll be a few more hurdles with that legacy, but I am very grateful for the discussion and explanations. Thanks for taking the technical approach, and I'm sorry I felt I've had to kick up a royal stink of late. Despite criticisms, it isn't something that came naturally, and I hope I find a better way, even though I'm not convinced returning to a totally conformist attitude is going to be the answer either.

Thanks again,
Jason
ID: 1702339
Rom Walton (BOINC)
Volunteer tester
Joined: 28 Apr 00
Posts: 579
Credit: 130,733
RAC: 0
United States
Message 1702345 - Posted: 16 Jul 2015, 23:42:31 UTC - in response to Message 1702330.  
Last modified: 16 Jul 2015, 23:45:24 UTC

I don't see the commit mode change actually fixing the problem. The writes have been delayed somewhere by something for some reason.


I have a hypothesis on this, but I don't have a way to prove or disprove it yet.

Suppose that when an app calls cuInit() to initialize the CUDA/OpenCL library it passes the current stderr/stdout handles to the CUDA kernel code so that fatal compiler errors can be trapped/written to a file for the calling app.

During this process they duplicate and internalize the handle thereby causing it to increase its ref count.

Normally the CUDA library assumes it can clean things up during the dllmain unload event, but because boinc_exit() calls TerminateProcess() the event is never fired.

The kernel decrements the ref count of the handle after TerminateProcess() is called and the process is cleaned up, but doesn't close it down because its ref count is still greater than zero.

It isn't until the CUDA kernel driver has attempted to do something that it discovers that a handle it holds is no longer valid and cleans things up on its end thereby releasing the write lock on stderr.txt.
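The reference-counting part of this hypothesis can be illustrated with a small analogy in portable C++. This is not how Windows handles or the CUDA library work internally; `SharedFile` and `open_shared` are hypothetical names, and `std::shared_ptr` merely stands in for a kernel object whose underlying file is released only when the last reference goes away.

```cpp
#include <cstdio>
#include <memory>

// A shared, reference-counted file: it is closed only when the last owner
// lets go. An analogy for the hypothesis, not real driver behaviour.
using SharedFile = std::shared_ptr<std::FILE>;

SharedFile open_shared(const char* path, const char* mode) {
    std::FILE* f = std::fopen(path, mode);
    // The custom deleter runs once, when the reference count reaches zero.
    return SharedFile(f, [](std::FILE* p) { if (p) std::fclose(p); });
}
```

If the application holds one `SharedFile` and a library keeps a copy, `use_count()` is 2. When the application's copy is reset (the process "goes away"), the count drops to 1 and the file stays open until the library's copy is also released, which is the kind of lingering write lock the hypothesis describes.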

The CUDA library doesn't really provide a clean-up routine you are supposed to call after you are done, so there isn't a way to test this.

We would need to talk to somebody at Nvidia to find out what underlying assumption the CUDA library is making with regards to cleaning up on application shutdown to know what is really going on.
----- Rom
BOINC Development Team, U.C. Berkeley
ID: 1702345
Jeff Buck (Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1339
Credit: 138,461,842
RAC: 216,474
United States
Message 1702348 - Posted: 16 Jul 2015, 23:51:08 UTC - in response to Message 1702345.  
Last modified: 17 Jul 2015, 0:02:51 UTC

If I could just interject one thing (without understanding much other than CUDA in that post), it would be that truncated Stderr is not unique to the NVIDIA GPUs. It happens on ATI cards and CPUs as well.

EDIT: See ancient Message 1469381 in "Strange Invalid MB Overflow tasks with truncated Stderr outputs..." for an example and discussion. Also in other messages in that thread.
ID: 1702348
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 11570
Credit: 107,498,236
RAC: 62,621
United Kingdom
Message 1702350 - Posted: 16 Jul 2015, 23:58:07 UTC - in response to Message 1702345.  

And also that in the Milkyway case, it's the OpenCL component of the NVidia driver/runtime suite which is active.
ID: 1702350
Rom Walton (BOINC)
Volunteer tester
Joined: 28 Apr 00
Posts: 579
Credit: 130,733
RAC: 0
United States
Message 1702352 - Posted: 17 Jul 2015, 0:02:43 UTC - in response to Message 1702348.  

If I could just interject one thing (without understanding much other than CUDA in that post), it would be that truncated Stderr is not unique to the NVIDIA GPUs. It happens on ATI cards and CPUs as well.


Okay, that blows that theory out of the water.
----- Rom
BOINC Development Team, U.C. Berkeley
ID: 1702352
Rom Walton (BOINC)
Volunteer tester
Joined: 28 Apr 00
Posts: 579
Credit: 130,733
RAC: 0
United States
Message 1702353 - Posted: 17 Jul 2015, 0:05:03 UTC - in response to Message 1702350.  

And also that in the Milkyway case, it's the OpenCL component of the NVidia driver/runtime suite which is active.


True, but I suspect that the OpenCL compiler just converts OpenCL code into CUDA instructions.
----- Rom
BOINC Development Team, U.C. Berkeley
ID: 1702353
 
©2017 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.