========================================================================= Path: yktvmv!jbs Subject: Re: Computer of the century From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000113.152211.159@yktvmv.watson.ibm.com> Date: Thu, 13 Jan 2000 20:22:11 GMT References: <87wvpl45jk.fsf@think.mihalis.net> <85j9da$lhn@nnrp3.farm.idt.net> <85kpe8$14na$1@ausnews.austin.ibm.com> <85l9ac$cl0$1@nntp3.atl.mindspring.net> In article <85l9ac$cl0$1@nntp3.atl.mindspring.net>, on 13 Jan 2000 19:33:32 GMT, sjc@netcom.com (Steven Correll) writes: >Yet a third reason in DoD's mind was software reliability. At the time, there >was a belief in the software engineering community that none of the languages >which were popular among DoD contractors at the time performed enough >compile-time and run-time checking to ensure reliable software, and Ada was >designed from the ground up for that purpose. Note it was Ada's run-time checking which caused the Ariane rocket failure. James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Computer of the century From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000113.221100.622@yktvmv.watson.ibm.com> Date: Fri, 14 Jan 2000 03:11:00 GMT References: <87wvpl45jk.fsf@think.mihalis.net> <85j9da$lhn@nnrp3.farm.idt.net> <85kpe8$14na$1@ausnews.austin.ibm.com> <85l9ac$cl0$1@nntp3.atl.mindspring.net> <20000113.152211.159@yktvmv.watson.ibm.com> <87ya9tfke7.fsf@think.mihalis.net> In article <87ya9tfke7.fsf@think.mihalis.net>, on 13 Jan 2000 21:37:52 -0500, Chris Morgan writes: >jbs@watson.ibm.com writes: > >> Note it was Ada's run-time checking which caused the Ariane >> rocket failure. > >Your implication is totally wrong. Go read the report. The failure was >caused by incorrect software reuse. The software which shut down >operated as designed for Ariane 4. 
It was designed to shut down if a >certain measurement exceeded some design parameter which was outside >the acceptable range, 'cept it wasn't shut off after it had finished >doing its job (before takeoff), and on the Ariane 5 after takeoff the >parameter did exceed the range, so the software did what it was >designed to do. > >The analogy with C would be some value was guaranteed not to cause >overflow, so overflow was not protected, but because of incorrect >software reuse the code happily overflowed the hardware resolution and >pointed the rocket at a random bit of the ground - same result, >different mechanism. I have read the report. Your account is wrong. The overflow was harmless and the rocket would not have failed if the overflow had not been detected. It is true that the design was very poor but it would not have mattered without the run time error checking. James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Ariane 5 rocket failure (was: Computer of the century) From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000113.221939.402@yktvmv.watson.ibm.com> Date: Fri, 14 Jan 2000 03:19:39 GMT References: <20000113.152211.159@yktvmv.watson.ibm.com> <2000Jan13.205451.1@eisner> In article <2000Jan13.205451.1@eisner>, on Fri, 14 Jan 2000 01:54:51 GMT, kilgallen@eisner.decus.org (Larry Kilgallen) writes: >In article <20000113.152211.159@yktvmv.watson.ibm.com>, jbs@watson.ibm.com writes: > >> Note it was Ada's run-time checking which caused the Ariane >> rocket failure. > >No, it was a management decision to use the Ariane 4 software in >the Ariane 5 rocket without engaging in design review. Ariane 5 >was more powerful -- powerful enough to overflow the acceleration >field in the machine. No language eliminates the need for design. Nevertheless, without run time checking there would not have been a failure. James B. 
Shearer PS: The design was rotten even in the context of Ariane 4. ========================================================================= Path: yktvmv!jbs Subject: Re: Computer of the century From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000114.190116.061@yktvmv.watson.ibm.com> Date: Sat, 15 Jan 2000 00:01:16 GMT References: <87wvpl45jk.fsf@think.mihalis.net> <85j9da$lhn@nnrp3.farm.idt.net> <85kpe8$14na$1@ausnews.austin.ibm.com> <85l9ac$cl0$1@nntp3.atl.mindspring.net> <20000113.152211.159@yktvmv.watson.ibm.com> <87ya9tfke7.fsf@think.mihalis.net> <20000113.221100.622@yktvmv.watson.ibm.com> <87so00g7lq.fsf@think.mihalis.net> In article <87so00g7lq.fsf@think.mihalis.net>, on 14 Jan 2000 07:28:49 -0500, Chris Morgan writes: >jbs@watson.ibm.com writes: > >> I have read the report. Your account is wrong. The >> overflow was harmless and the rocket would not have failed if the >> overflow had not been detected. > >The software was required to detect the overflow and shut down. In C >you would have had to code the check explicitly. Or alternatively in >Ada you can always suppress these checks either by choosing different >numerical types or using pragmas or compiler switches. You are inventing this requirement. The overflow occurred in the conversion of a 64-bit real to a 16-bit integer. There was no requirement to detect an overflow here; similar conversions were "protected" to prevent overflows. Protection was omitted in this case to save time. The programmers needed to save time to meet the real-time requirements. Of course with a more efficient language without all of Ada's runtime checking there might have been no need to cut corners so you can blame this on Ada also. James B. 
Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Computer of the century From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000114.192140.591@yktvmv.watson.ibm.com> Date: Sat, 15 Jan 2000 00:21:40 GMT References: <87wvpl45jk.fsf@think.mihalis.net> <85j9da$lhn@nnrp3.farm.idt.net> <85kpe8$14na$1@ausnews.austin.ibm.com> <85l9ac$cl0$1@nntp3.atl.mindspring.net> <20000113.152211.159@yktvmv.watson.ibm.com> <87ya9tfke7.fsf@think.mihalis.net> <20000113.221100.622@yktvmv.watson.ibm.com> <%2Nf4.479$dw3.17512@news.wenet.net> In article <%2Nf4.479$dw3.17512@news.wenet.net>, on Fri, 14 Jan 2000 14:05:27 -0800, "Mike Silva" writes: > >jbs@watson.ibm.com wrote in message ><20000113.221100.622@yktvmv.watson.ibm.com>... > >>>> Note it was Ada's run-time checking which caused the Ariane >>>> rocket failure. > > >> I have read the report. Your account is wrong. The >>overflow was harmless and the rocket would not have failed if the >>overflow had not been detected. It is true that the design was >>very poor but it would not have mattered without the run time >>error checking. >> James B. Shearer > >You have an unusual understanding of the word "cause". This is like >connecting too many hair dryers to an outlet and, when the fuse blows, >saying the fuse "caused" the power outage. Well if when the fuse blows it starts a fire and burns down your house whereas without the fuse nothing would have happened, I think it is fair to say the fuse caused the fire. >The fact that another design (without runtime error checking) may have >ignored the conversion error is ironic but not useful. The lesson of Ariane >is not "ignore errors and you'll be better off". The Ada runtime can't tell >which errors are "harmless" unless the programmer tells it this information >(which can be done and was done for other conversions in the same area of >the Ariane code). 
The Ada runtime has the perfectly reasonable approach of >being vigilant unless told otherwise. The "cause" of the Ariane failure was >both allowing "bad" data into the program's input (both by allowing the >routine to run after it was no longer needed, and by not re-checking the >input ranges for the new design), and incorrectly handling the resultant >error signal output (by dumping debug info onto the system bus, causing the >rocket engines to slam over and rip the rocket apart). > >Blame the hair dryer plugger-inners, not the fuse. The lesson is ignore errors and sometimes you will be better off (as in this case). Most safety features can themselves cause problems a fact that is often ignored by their proponents. Unfortunately telling Ada to ignore this conversion error was expensive (in terms of execution time) which is why it was not done. So this dangerous and useless fuse was difficult to remove. Why shouldn't I blame Ada for this? James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Computer of the century From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000114.220352.674@yktvmv.watson.ibm.com> Date: Sat, 15 Jan 2000 03:03:52 GMT References: <87wvpl45jk.fsf@think.mihalis.net> <85j9da$lhn@nnrp3.farm.idt.net> <85kpe8$14na$1@ausnews.austin.ibm.com> <85l9ac$cl0$1@nntp3.atl.mindspring.net> <20000113.152211.159@yktvmv.watson.ibm.com> <87ya9tfke7.fsf@think.mihalis.net> <20000113.221100.622@yktvmv.watson.ibm.com> <%2Nf4.479$dw3.17512@news.wenet.net> <20000114.192140.591@yktvmv.watson.ibm.com> <85olth$tmd$1@nnrp1.deja.com> In article <85olth$tmd$1@nnrp1.deja.com>, on Sat, 15 Jan 2000 02:27:08 GMT, Ed Falis writes: >In article <20000114.192140.591@yktvmv.watson.ibm.com>, > jbs@watson.ibm.com wrote: > >> Unfortunately telling Ada to ignore this conversion error >> was expensive (in terms of execution time) which is why it was not 
>> done. So this dangerous and useless fuse was difficult to remove. >> Why shouldn't I blame Ada for this? >> James B. Shearer >> > >So, exactly how is it that Ada is such a perverse language that it defies >the laws of physics by costing more to do nothing than to do something? >This should be entertaining. I can only assume you didn't read the report >without some pretty severe prejudice - if you read the report. Yes, I have read the report. A quote "It has been stated to the Board that not all the conversions were protected because a maximum workload target of 80% had been set for the SRI computer. ..." clearly implying that "protection" was costly. As to why this should be so I would guess protection means replacing
      int=real
with something like
      if(real.gt.16383.d0)real=16383.d0
      if(real.lt.-16384.d0)real=-16384.d0
      int=real
but I do not know this for a fact. Anyone know for sure? James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Computer of the century From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000115.231057.698@yktvmv.watson.ibm.com> Date: Sun, 16 Jan 2000 04:10:57 GMT References: <87wvpl45jk.fsf@think.mihalis.net> <85j9da$lhn@nnrp3.farm.idt.net> <85kpe8$14na$1@ausnews.austin.ibm.com> <85l9ac$cl0$1@nntp3.atl.mindspring.net> <20000113.152211.159@yktvmv.watson.ibm.com> <87ya9tfke7.fsf@think.mihalis.net> <20000113.221100.622@yktvmv.watson.ibm.com> <%2Nf4.479$dw3.17512@news.wenet.net> <20000114.192140.591@yktvmv.watson.ibm.com> <85ptsf$f40$1@news.campuscwix.net> In article <85ptsf$f40$1@news.campuscwix.net>, on Sat, 15 Jan 2000 06:47:17 -0700, "Larry Elmore" writes: > wrote in message >news:20000114.192140.591@yktvmv.watson.ibm.com... >> In article <%2Nf4.479$dw3.17512@news.wenet.net>, >> on Fri, 14 Jan 2000 14:05:27 -0800, >> "Mike Silva" writes: >> > >> >You have an unusual understanding of the word "cause". 
This is like >> >connecting too many hair dryers to an outlet and, when the fuse blows, >> >saying the fuse "caused" the power outage. >> >> Well if when the fuse blows it starts a fire and burns down >> your house whereas without the fuse nothing would have happened, I >> think it is fair to say the fuse caused the fire. > >You do realize, I hope, that when a fuse blows, the effect is to _prevent_ a >fire. It _can't_ start one, unless, of course, the fuse is for too high an >amperage for the circuit and _doesn't_ blow when it's supposed to do so. > >Larry For the purpose of this perhaps strained analogy we have to stipulate an extremely defective fuse which starts a fire when it blows. Or another analogy. You are on an iron lung, if the power goes off you will die. Are you safer with a fuse on the circuit? James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Computer of the century From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000115.231637.569@yktvmv.watson.ibm.com> Date: Sun, 16 Jan 2000 04:16:37 GMT References: <87wvpl45jk.fsf@think.mihalis.net> <85j9da$lhn@nnrp3.farm.idt.net> <85kpe8$14na$1@ausnews.austin.ibm.com> <85l9ac$cl0$1@nntp3.atl.mindspring.net> <20000113.152211.159@yktvmv.watson.ibm.com> <87ya9tfke7.fsf@think.mihalis.net> <20000113.221100.622@yktvmv.watson.ibm.com> <%2Nf4.479$dw3.17512@news.wenet.net> <20000114.192140.591@yktvmv.watson.ibm.com> <85olth$tmd$1@nnrp1.deja.com> <20000114.220352.674@yktvmv.watson.ibm.com> <85q3uk$rb6$1@nnrp1.deja.com> In article <85q3uk$rb6$1@nnrp1.deja.com>, on Sat, 15 Jan 2000 15:32:36 GMT, Ed Falis writes: >In article <20000114.220352.674@yktvmv.watson.ibm.com>, > jbs@watson.ibm.com wrote: > >> > >> >> Unfortunately telling Ada to ignore this conversion error >> >> was expensive (in terms of execution time) which is why it was not >> >> done. 
So this dangerous and useless fuse was difficult to remove. >> >> Why shouldn't I blame Ada for this? >> >> James B. Shearer >> >> >> > >> >So, exactly how is it that Ada is such a perverse language that it defies >> >the laws of physics by costing more to do nothing than to do something? > > >> "It has been stated to the Board that not all the >> conversions were protected >> because a maximum workload target of 80% had been >> set for the SRI computer. ..." >> >> clearly implying that "protection" was costly. > >We must have had a communication breakdown along the way here. >Your first statement said that checking was left on because suppressing >it is too costly in Ada, which is what prompted my remark. Your second >statement, which makes sense, is that checking has a performance >impact, and that as a result the decision to suppress checks was >taken in order to meet workload budgets. Perhaps you should read the report. Protection did not mean suppressing the checks it meant doing something (unspecified in the report) to prevent an overflow from triggering the check, generating an exception and blowing up the rocket. This something was apparently in addition to the check increasing execution time. It was not the checks that were suppressed to save time but the protection. So when an overflow occurred in an unprotected conversion the check generated an exception which caused the computer to shut down which caused the rocket to go out of control and destroy itself. >> As to why this >> should be so I would guess protection means replacing >> int=real >> with something like >> if(real.gt.16383.d0)real=16383.d0 >> if(real.lt.-16384.d0)real=-16384.d0 >> int=real >> but I do not know this for a fact. Anyone know for sure? > > >Depends on the target architecture. 
For a x86 architecture (which was >not what was used in this case - I believe it was 680x0), the assignment >from an FP register to a general purpose register would be made, followed >by an operation that would cause an overflow flag to be set if the value >was too large. Then an INTO instruction would be executed, which >would cause a hardware exception if it did. Not that much overhead. > >Empirically, on the 32 bit x86 architecture, we found that for a test like >Dhrystone, leaving all checks on without special optimizations for bounds >checking resulted in 15-20% speed degradation relative to suppressing >all checks. When the redundant check elimination optimization was >applied (removing only unnecessary checks), the result was around 7% >degradation relative to suppressed checks. This is generally a pretty >good tradeoff in favor of retaining the checks. You don't understand, the programmers didn't want the check but couldn't eliminate it cheaply. So they left it in and it destroyed the rocket. James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Computer of the century From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000115.233036.950@yktvmv.watson.ibm.com> Date: Sun, 16 Jan 2000 04:30:36 GMT References: <87wvpl45jk.fsf@think.mihalis.net> <85j9da$lhn@nnrp3.farm.idt.net> <85kpe8$14na$1@ausnews.austin.ibm.com> <85l9ac$cl0$1@nntp3.atl.mindspring.net> <20000113.152211.159@yktvmv.watson.ibm.com> <87ya9tfke7.fsf@think.mihalis.net> <20000113.221100.622@yktvmv.watson.ibm.com> <%2Nf4.479$dw3.17512@news.wenet.net> <20000114.192140.591@yktvmv.watson.ibm.com> <3880C05F.8DC1B063@bellatlantic.net> In article <3880C05F.8DC1B063@bellatlantic.net>, on Sat, 15 Jan 2000 17:43:39 GMT, "Jeffrey S. Dutky" writes: >jbs@watson.ibm.com wrote: >> >> The lesson is ignore errors and sometimes you will >> be better off (as in this case). 
Most safety features can >> themselves cause problems a fact that is often ignored by >> their proponents. > >Good lord! The only time you are better off ignoring errors >is when they aren't happening. Otherwise, you may discard >erroneous inputs safely, but you should at least log the >fact that the error happened. Simply ignoring errors will >ensure that you will be unable to either detect or fix >problems. You are better when the cure (as in this case) is worse than the disease. Or you could consider this a false alarm. You are better off not having alarms if the cost of false alarms will exceed the benefits of detecting real errors. James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Computer of the century - Ariane subthread From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000116.190537.322@yktvmv.watson.ibm.com> Date: Mon, 17 Jan 2000 00:05:37 GMT References: <87wvpl45jk.fsf@think.mihalis.net> <85j9da$lhn@nnrp3.farm.idt.net> <85kpe8$14na$1@ausnews.austin.ibm.com> <85l9ac$cl0$1@nntp3.atl.mindspring.net> <20000113.152211.159@yktvmv.watson.ibm.com> <87ya9tfke7.fsf@think.mihalis.net> <20000113.221100.622@yktvmv.watson.ibm.com> <%2Nf4.479$dw3.17512@news.wenet.net> <20000114.192140.591@yktvmv.watson.ibm.com> <3880C05F.8DC1B063@bellatlantic.net> <20000115.233036.950@yktvmv.watson.ibm.com> <388157b8_1@news.jps.net> In article <388157b8_1@news.jps.net>, on Sat, 15 Jan 2000 21:34:02 -0800, "Mike Silva" writes: > >jbs@watson.ibm.com wrote in message ><20000115.233036.950@yktvmv.watson.ibm.com>... > >> You are better when the cure (as in this case) is worse >>than the disease. >> Or you could consider this a false alarm. You are >>better off not having alarms if the cost of false alarms will >>exceed the benefits of detecting real errors. >> James B. 
Shearer > > >These are never the only two choices (would you fly on a plane that ignored >real errors based on this approach?). The solution, assuming you want to >build rockets that actually work all the time, is a combination of better >input constraint and analysis, better error handling and recovery, and more >thorough testing (oh, and don't run functions that aren't even being >used...). This is pretty obvious stuff, and, not surprisingly, matches the >essential conclusions of the Ariane investigation. The investigation did >not, by the way, recommend turning off all runtime checks, or changing >languages, since the investigation understood that neither of these "caused" >the failure. > >Mike It is impossible to build rockets which work all the time. Pretending otherwise leads to fuzzy thinking and errors. James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: fuses, iron lungs, range checking (was Re: Computer of the century) From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000116.191333.504@yktvmv.watson.ibm.com> Date: Mon, 17 Jan 2000 00:13:33 GMT References: <87wvpl45jk.fsf@think.mihalis.net> <85j9da$lhn@nnrp3.farm.idt.net> <85kpe8$14na$1@ausnews.austin.ibm.com> <85l9ac$cl0$1@nntp3.atl.mindspring.net> <20000113.152211.159@yktvmv.watson.ibm.com> <87ya9tfke7.fsf@think.mihalis.net> <20000113.221100.622@yktvmv.watson.ibm.com> <%2Nf4.479$dw3.17512@news.wenet.net> <20000114.192140.591@yktvmv.watson.ibm.com> <85ptsf$f40$1@news.campuscwix.net> <20000115.231057.698@yktvmv.watson.ibm.com> In article , on 16 Jan 2000 01:11:06 -0800, Eric Smith writes: >The idea that some people consider random, unpredictable behavior to be >better than carefully controlled exception handling boggles my mind. In this case it would have been. Sometimes airbags kill people. Why is this so difficult to accept? James B. 
Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Computer of the century - Ariane subthread From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000116.191626.100@yktvmv.watson.ibm.com> Date: Mon, 17 Jan 2000 00:16:26 GMT References: <85j9da$lhn@nnrp3.farm.idt.net> <85kpe8$14na$1@ausnews.austin.ibm.com> <85l9ac$cl0$1@nntp3.atl.mindspring.net> <20000113.152211.159@yktvmv.watson.ibm.com> <87ya9tfke7.fsf@think.mihalis.net> <20000113.221100.622@yktvmv.watson.ibm.com> <%2Nf4.479$dw3.17512@news.wenet.net> <20000114.192140.591@yktvmv.watson.ibm.com> <85olth$tmd$1@nnrp1.deja.com> <20000114.220352.674@yktvmv.watson.ibm.com> <85q3uk$rb6$1@nnrp1.deja.com> <20000115.231637.569@yktvmv.watson.ibm.com> <85sjn5$j6j$1@news.campuscwix.net> In article <85sjn5$j6j$1@news.campuscwix.net>, on Sun, 16 Jan 2000 07:12:10 -0700, "Larry Elmore" writes: > wrote in message >news:20000115.231637.569@yktvmv.watson.ibm.com... >> In article <85q3uk$rb6$1@nnrp1.deja.com>, >> on Sat, 15 Jan 2000 15:32:36 GMT, >> Ed Falis writes: >> > >> >We must have had a communication breakdown along the way here. >> >Your first statement said that checking was left on because suppressing >> >it is too costly in Ada, which is what prompted my remark. Your second >> >statement, which makes sense, is that checking has a performance >> >impact, and that as a result the decision to suppress checks was >> >taken in order to meet workload budgets. >> >> Perhaps you should read the report. Protection did not >> mean suppressing the checks it meant doing something (unspecified >> in the report) to prevent an overflow from triggering the check, >> generating an exception and blowing up the rocket. This something >> was apparently in addition to the check increasing execution time. >> It was not the checks that were suppressed to save time but the >> protection. 
So when an overflow occurred in an unprotected >> conversion the check generated an exception which caused the >> computer to shut down which caused the rocket to go out of control >> and destroy itself. > >It was my understanding that the "protection" would have consisted of an >exception handler for that type of error. When the error occurred, rather >than a handler spending time dealing with it, it threw it all the way to the >top. In the Ariane 4 design, that type of error could only occur in the >event of a hardware failure, so shutting down and handing control over to >the backup kind of makes sense. It's not how I would design it, but that was >their choice. Using the Ariane 4 subsystem in the Ariane 5 without change, >apparently without even any review, is what caused the rocket to go out of >control and destroy itself. Your understanding appears to be wrong since the report clearly states 4 out of 7 conversions were protected. James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Computer of the century - Ariane subthread From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000116.192105.722@yktvmv.watson.ibm.com> Date: Mon, 17 Jan 2000 00:21:05 GMT References: <85j9da$lhn@nnrp3.farm.idt.net> <85kpe8$14na$1@ausnews.austin.ibm.com> <85l9ac$cl0$1@nntp3.atl.mindspring.net> <20000113.152211.159@yktvmv.watson.ibm.com> <87ya9tfke7.fsf@think.mihalis.net> <20000113.221100.622@yktvmv.watson.ibm.com> <%2Nf4.479$dw3.17512@news.wenet.net> <20000114.192140.591@yktvmv.watson.ibm.com> <85olth$tmd$1@nnrp1.deja.com> <20000114.220352.674@yktvmv.watson.ibm.com> <85q3uk$rb6$1@nnrp1.deja.com> <20000115.231637.569@yktvmv.watson.ibm.com> <85smhl$hb3@spool.cs.wisc.edu> In article <85smhl$hb3@spool.cs.wisc.edu>, on Sun, 16 Jan 2000 09:09:42 -0600, "Andy Glew" writes: >> You don't understand, the programmers didn't want the check >> but couldn't 
eliminate it cheaply. So they left it in and it >> destroyed the rocket. > >Actually, it was more like the programmers did not believe the >calculation could ever overflow, so they believed that the check, >although wasteful, was harmless. The programmers were wrong. Sure the programmers made an error but the issue at hand is whether the software environment they were working in encouraged this error. Obviously it did since without runtime checking they could not have made this particular error. James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Computer of the century - Ariane subthread From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000116.192933.681@yktvmv.watson.ibm.com> Date: Mon, 17 Jan 2000 00:29:33 GMT References: <87wvpl45jk.fsf@think.mihalis.net> <85j9da$lhn@nnrp3.farm.idt.net> <85kpe8$14na$1@ausnews.austin.ibm.com> <85l9ac$cl0$1@nntp3.atl.mindspring.net> <20000113.152211.159@yktvmv.watson.ibm.com> <87ya9tfke7.fsf@think.mihalis.net> <20000113.221100.622@yktvmv.watson.ibm.com> <%2Nf4.479$dw3.17512@news.wenet.net> <20000114.192140.591@yktvmv.watson.ibm.com> <3880C05F.8DC1B063@bellatlantic.net> <20000115.233036.950@yktvmv.watson.ibm.com> <85sna8$hu7@spool.cs.wisc.edu> In article <85sna8$hu7@spool.cs.wisc.edu>, on Sun, 16 Jan 2000 09:22:49 -0600, "Andy Glew" writes: >> You are better when the cure (as in this case) is worse >> than the disease. >> Or you could consider this a false alarm. You are >> better off not having alarms if the cost of false alarms will >> exceed the benefits of detecting real errors. >> James B. Shearer > >For a rocket, the cost of a false alarm is a dead rocket, >many millions of dollars lost in rocket, satellite, etc. > >The benefit of detecting a real error is, worst case, not having >a rocket plow into an inhabited area. (French Guiana does have >people, after all, as do neighbouring areas.) 
> >The Ariane bug was wasteful and costly. And it is *still* a >worthwhile tradeoff. Maybe this particular overflow bug would >not have caused a catastrophe that cost human lives, but do you >want to prove the same for all other possible overflow errors in >the rockets? This is all nonsense. When the exception was detected the rocket did not say to itself "Gee, I seem to be confused, I better destroy myself before I damage something." What happened was when the computer in the inertial reference unit detected the exception it started dumping diagnostics on the bus. The flight control computer interpreted these diagnostics as navigational information, decided it was suddenly severely off course and ordered large thrust deflections to correct. This caused the rocket to start to tumble and it began to break up from aerodynamic forces. This breakup was detected and triggered destruct charges. Debris fell over a wide area. So the cost of the false alarm was an out of control rocket which fortunately did not kill anybody. Explain again why this was such a great tradeoff. James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Ariane 5 rocket failure (was: Computer of the century) From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch Message-ID: <20000116.194652.301@yktvmv.watson.ibm.com> Date: Mon, 17 Jan 2000 00:46:52 GMT References: <2000Jan14.105616.4668@lorelei.approve.se> <2000Jan14.140835.6404@lorelei.approve.se> <85nnr7$lc9$1@news1.tc.umn.edu> <387FBD87.60D3@compuserve.com> In article , on Sun, 16 Jan 2000 15:21:43 GMT, "George R. Gonzalez" writes: > >Sam Yorko wrote in message <387FBD87.60D3@compuserve.com>... >>George R. Gonzalez wrote: > >>One thing that keeps getting lost in this discussion is that the team >>who wrote the software knew that they could do overflow checking, and >>deliberately did not do so. 
>> >>Overflow checking takes time, so the design team had to make a decision >>at every point in the software to do an explicit overflow check or not. >>We're talking real-time software here, so time counts. > >So the choices were: > >(1) No checks: Up side: Speed up the code maybe 1%. > Down side: Lost $100 million mission if one bad bit >gets into the data. > > >(2) Checks: > down side: Code runs maybe 1% slower. > Up side: Don't lose $100 mil on one glitch. > >I think I know which option I'd pick. >---- No these were not the choices. The real choices appear to have been: (1) leave automatic check alone - downside if both units generate exceptions rocket is doomed (2) Add additional code to ensure automatic checks can't generate exceptions - downside this increases execution time and if we take too much time rocket is doomed. The question is why there was not another option. (3) Disable automatic check for this conversion - downside may fail to detect hardware failure and switch to healthy unit dooming rocket. James B. 
Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Computer of the century - Ariane subthread From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000116.195650.478@yktvmv.watson.ibm.com> Date: Mon, 17 Jan 2000 00:56:50 GMT References: <85kpe8$14na$1@ausnews.austin.ibm.com> <85l9ac$cl0$1@nntp3.atl.mindspring.net> <20000113.152211.159@yktvmv.watson.ibm.com> <87ya9tfke7.fsf@think.mihalis.net> <20000113.221100.622@yktvmv.watson.ibm.com> <%2Nf4.479$dw3.17512@news.wenet.net> <20000114.192140.591@yktvmv.watson.ibm.com> <85olth$tmd$1@nnrp1.deja.com> <20000114.220352.674@yktvmv.watson.ibm.com> <85q3uk$rb6$1@nnrp1.deja.com> <20000115.231637.569@yktvmv.watson.ibm.com> <85sjn5$j6j$1@news.campuscwix.net> In article , on 16 Jan 2000 16:34:13 +0000, Tim Bradshaw writes: >* Larry Elmore wrote: > >> It was my understanding that the "protection" would have consisted of an >> exception handler for that type of error. When the error occurred, rather >> than a handler spending time dealing with it, it threw it all the way to the >> top. > >So presumably the performance hit they were worried about was that >involved in setting up & taking down the exception handler? I can see >that being significant I suppose. No this is not accurate, other similar conversions were protected so it wasn't a matter of writing an exception handler for real to integer conversion overflows. Also this shouldn't cost anything for the expected case of no conversion errors. James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: exception handling vs. 
random, unpredicatble behavior (was Re: fuses, iron lungs, range checking) From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000118.194842.365@yktvmv.watson.ibm.com> Date: Wed, 19 Jan 2000 00:48:42 GMT References: <85kpe8$14na$1@ausnews.austin.ibm.com> <85l9ac$cl0$1@nntp3.atl.mindspring.net> <20000113.152211.159@yktvmv.watson.ibm.com> <87ya9tfke7.fsf@think.mihalis.net> <20000113.221100.622@yktvmv.watson.ibm.com> <%2Nf4.479$dw3.17512@news.wenet.net> <20000114.192140.591@yktvmv.watson.ibm.com> <85ptsf$f40$1@news.campuscwix.net> <20000115.231057.698@yktvmv.watson.ibm.com> <20000116.191333.504@yktvmv.watson.ibm.com> In article , on 17 Jan 2000 11:14:19 -0800, Eric Smith writes: >I wrote: >> The idea that some people consider random, unpredictable behavior to be >> better than carefully controlled exception handling boggles my mind. > >jbs@watson.ibm.com writes: >> In this case it would have been. Sometimes airbags kill >> people. Why is this so difficult to accept? > >If you had a drive-by-wire car, would you rather have a range check on >an array access result in an exception, and have the hierarchy of handlers >attempt to perform reasonable error recovery actions specifically designed >with the intent of maintaining passenger safety? > >Or would you rather have the range error go undetected, let the software >scribble effectively random byte into who-knows-what data structures, and >have the car behave in a completely unpredictable manner? > >It is certainly true that in the first case, there may be instances where >the carefully-designed error recovery fails to protect the passenger, just >like the airbag might kill the passenger. > >However, wouldn't you expect the car of the second case, when exhibiting >completely unpredictable behavior, to have a higher likelyhood of harming >the passenger? If not, please explain why. 
You are assuming that run time checking will always be accompanied by carefully designed error recovery. The Ariane case shows that this is not necessarily true. In judging whether a safety feature is a good idea you must consider how it will be implemented in the real world, not in some theoretical world in which people don't make mistakes. In the real world projects have budgets, and resources spent on the carefully designed error recovery procedures will take resources from other areas such as carefully designed testing. Given this it is not at all clear to me that run time checks are a win. In any case even if they are a good idea in general they still caused the Ariane 5 failure in particular. James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: risk management, fly-by-wire (was Re: fuses, iron lungs, range checking) From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000118.203322.946@yktvmv.watson.ibm.com> Date: Wed, 19 Jan 2000 01:33:22 GMT References: <85l9ac$cl0$1@nntp3.atl.mindspring.net> <20000113.152211.159@yktvmv.watson.ibm.com> <87ya9tfke7.fsf@think.mihalis.net> <20000113.221100.622@yktvmv.watson.ibm.com> <%2Nf4.479$dw3.17512@news.wenet.net> <20000114.192140.591@yktvmv.watson.ibm.com> <85ptsf$f40$1@news.campuscwix.net> <20000115.231057.698@yktvmv.watson.ibm.com> <85tjvb$5hm$1@nnrp1.deja.com> In article , on 17 Jan 2000 22:21:16 +0100, Jan Vorbrueggen writes: >Eric Smith writes: > >> All other things being equal (to the limited extent that I can evaluate >> them), I expect a fly-by-wire aircraft to be less reliable than a >> non-fly-by-wire aircraft.
> >All others things being equal, I'd prefer to be in a fly-by-wire aircraft if >the middle engine (DC-10, Sioux City) or badly repaired pressure dome (B747, >Nagasaki) breaks the elevator and aileron hydraulics, because the digital >flight control can still enable the pilot to bring the aircraft to a >controlled landing, while three people in the DC-10 did so-so (still an >utterly amazing feat) and no one on board survived the B747 incident. Actually I believe 4 out of 524 on board survived the Japan Airlines crash. James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Ariane 5 rocket failure (was: Computer of the century) From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch Message-ID: <20000118.204003.378@yktvmv.watson.ibm.com> Date: Wed, 19 Jan 2000 01:40:03 GMT References: <2000Jan14.105616.4668@lorelei.approve.se> <2000Jan14.140835.6404@lorelei.approve.se> <85nnr7$lc9$1@news1.tc.umn.edu> <387FBD87.60D3@compuserve.com> <87zou480e4.fsf@think.mihalis.net> In article <87zou480e4.fsf@think.mihalis.net>, on 17 Jan 2000 23:33:07 -0500, Chris Morgan writes: >ehrice@his.com (Edward Rice) writes: > >> The original authors of the code ensured their blamelessness by putting >> BOLD notices about things like "we decided not to check for overflow" right >> up front in the main module, of course -- right? And it was in the >> documentation for the package, in a well-flagged and easy to find place >> that nobody could miss, right? >> >> Or was this another case of "it runs, ship it -- real programmers don't >> write docs"? > >The original authors were working on Ariane 4, so the fact that it >failed on Ariane 5 is not really their fault. >-- >Chris Morgan http://mihalis.net From the failure report: ......................................................................
n) During design of the software of the inertial reference system used for Ariane 4 and Ariane 5, a decision was taken that it was not necessary to protect the inertial system computer from being made inoperative by an excessive value of the variable related to the horizontal velocity, a protection which was provided for several other variables of the alignment software. When taking this design decision, it was not analysed or fully understood which values this particular variable might assume when the alignment software was allowed to operate after lift-off. o) In Ariane 4 flights using the same type of inertial reference system there has been no such failure because the trajectory during the first 40 seconds of flight is such that the particular variable related to horizontal velocity cannot reach, with an adequate operational margin, a value beyond the limit present in the software. ..................................................................... So they just lucked out for Ariane 4. James B. 
Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Computer of the century From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000119.215150.737@yktvmv.watson.ibm.com> Date: Thu, 20 Jan 2000 02:51:50 GMT References: <85l9ac$cl0$1@nntp3.atl.mindspring.net> <20000113.152211.159@yktvmv.watson.ibm.com> <87ya9tfke7.fsf@think.mihalis.net> <20000113.221100.622@yktvmv.watson.ibm.com> <%2Nf4.479$dw3.17512@news.wenet.net> <20000114.192140.591@yktvmv.watson.ibm.com> <85olth$tmd$1@nnrp1.deja.com> <20000114.220352.674@yktvmv.watson.ibm.com> <85q3uk$rb6$1@nnrp1.deja.com> <20000115.231637.569@yktvmv.watson.ibm.com> <85smhl$hb3@spool.cs.wisc.edu> <863b5t$fer$8@nntp2.atl.mindspring.net> In article <863b5t$fer$8@nntp2.atl.mindspring.net>, on 19 Jan 2000 03:31:09 GMT, Zalman Stern writes: >In comp.arch Andy Glew wrote: >: Actually, it was more like the programmers did not believe the >: calculation could ever overflow, so they believed that the check, >: although wasteful, was harmless. The programmers were wrong. > >Ok, but the argument here is what do you do in such a situation TO BE >SAFE!!!! If you realize that the check firing an exception will surely kill >the rocket and that simply letting the conversion produce bogus results will >probably do nothing, then you try really hard to have the conversion >produce bogus results. In any event, you should try to figure out what will >happen and it sounds like that was part of the problem with the design >methodology -- they believed that it wouldn't fail and hence didn't analyze >the case. > >James is claiming that it was difficult to have the conversion *not* raise >an exception in event of overflow. Which if true, would be a concern. They analyzed the case. However they had an execution time problem which meant they were reluctant to add unneeded protection.
This appears to imply there was no cheap (in terms of execution time) way of preventing conversion errors from raising exceptions. The analysis did not properly account for what would happen when the alignment function, which was only performing a meaningful computation before liftoff, continued to operate after liftoff. It happened that they were ok for Ariane 4 but not Ariane 5. James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Computer of the century From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000119.221231.213@yktvmv.watson.ibm.com> Date: Thu, 20 Jan 2000 03:12:31 GMT References: <85l9ac$cl0$1@nntp3.atl.mindspring.net> <20000113.152211.159@yktvmv.watson.ibm.com> <87ya9tfke7.fsf@think.mihalis.net> <20000113.221100.622@yktvmv.watson.ibm.com> <%2Nf4.479$dw3.17512@news.wenet.net> <20000114.192140.591@yktvmv.watson.ibm.com> <85olth$tmd$1@nnrp1.deja.com> <20000114.220352.674@yktvmv.watson.ibm.com> <85q3uk$rb6$1@nnrp1.deja.com> <20000115.231637.569@yktvmv.watson.ibm.com> <85smhl$hb3@spool.cs.wisc.edu> <863b5t$fer$8@nntp2.atl.mindspring.net> <864m3m$kbj@spool.cs.wisc.edu> In article <864m3m$kbj@spool.cs.wisc.edu>, on Wed, 19 Jan 2000 09:51:56 -0600, "Andy Glew" writes: >Killing the rocket has to stay the deep backup plan. Expensive, but safe. Perhaps not very safe if the software incorrectly kills the rocket on the pad. James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: exception handling vs. 
random, unpredicatble behavior (was Re: fuses, iron lungs, range checking) From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000119.221717.201@yktvmv.watson.ibm.com> Date: Thu, 20 Jan 2000 03:17:17 GMT References: <20000113.152211.159@yktvmv.watson.ibm.com> <87ya9tfke7.fsf@think.mihalis.net> <20000113.221100.622@yktvmv.watson.ibm.com> <%2Nf4.479$dw3.17512@news.wenet.net> <20000114.192140.591@yktvmv.watson.ibm.com> <85ptsf$f40$1@news.campuscwix.net> <20000115.231057.698@yktvmv.watson.ibm.com> <20000116.191333.504@yktvmv.watson.ibm.com> <20000118.194842.365@yktvmv.watson.ibm.com> <864mpf$u2k$4@jetsam.uits.indiana.edu> In article <864mpf$u2k$4@jetsam.uits.indiana.edu>, on 19 Jan 2000 15:55:27 GMT, galexand@sietch.bloomington.in.us (Greg Alexander) writes (in part): > >To repeat the already stated obvious: they would have shot themselves in >the foot given an AK-47 or a BB gun. People (deadlines, management, >budget concerns) killed Ariane 5, not software. It was not inevitable that the rocket would fail. The software worked on the Ariane 4 and almost worked on Ariane 5 (the alignment function failed after 40 seconds; it would have been shut off at 50 seconds). It's not like there were 10 other fatal problems with the rocket. Consider a plane which runs out of gas and crashes because the pilot was distracted by a false landing-gear-up alarm. It is not correct or helpful to say the flight was doomed anyway with such a stupid pilot. James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: exception handling vs.
random, unpredicatble behavior (was Re: fuses, iron lungs, range checking) From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000119.222618.417@yktvmv.watson.ibm.com> Date: Thu, 20 Jan 2000 03:26:18 GMT References: <20000113.152211.159@yktvmv.watson.ibm.com> <87ya9tfke7.fsf@think.mihalis.net> <20000113.221100.622@yktvmv.watson.ibm.com> <%2Nf4.479$dw3.17512@news.wenet.net> <20000114.192140.591@yktvmv.watson.ibm.com> <85ptsf$f40$1@news.campuscwix.net> <20000115.231057.698@yktvmv.watson.ibm.com> <20000116.191333.504@yktvmv.watson.ibm.com> <20000118.194842.365@yktvmv.watson.ibm.com> In article , on Wed, 19 Jan 2000 10:05:36 -0800, "Mike Silva" writes: > >jbs@watson.ibm.com wrote in message ><20000118.194842.365@yktvmv.watson.ibm.com>... > >> In the real world projects have budgets, resource spent on >>the carefully designed error recovery procedures will take resource >>from other areas such as carefully designed testing. Given this >>it is not at all clear to me that run time checks are a win. > >Actually runtime checks are a fantastic aid to testing, since they can >pinpoint an error that otherwise might propagate and trash something far >down the line, leaving the programmer to backtrack from the manifestation of >the error to the source. I have seen bugs where days of debugging would >have been reduced to minutes with runtime checks. I have nothing against run time checks for debugging. Note this does not require a carefully designed error recovery procedure, just a pointer to the offending statement. However this is not the issue at hand. The issue at hand is whether, in production, the software is more likely to fail with run time checking enabled or disabled. >> In any >>case even if they are a good idea in general they still caused the >>Ariane 5 failure in particular. > >Once again, no, runtime checking didn't "cause" the failure. 
If the code had not had run time checking it would not have failed. Of course, as in many accidents, the failure had multiple causes. James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: risk management, fly-by-wire (was Re: fuses, iron lungs, range checking) From: jbs@watson.ibm.com Organization: IBM Newsgroups: alt.folklore.computers,comp.arch,comp.sys.unisys Message-ID: <20000119.224221.040@yktvmv.watson.ibm.com> Date: Thu, 20 Jan 2000 03:42:21 GMT References: <%2Nf4.479$dw3.17512@news.wenet.net> <20000114.192140.591@yktvmv.watson.ibm.com> <85ptsf$f40$1@news.campuscwix.net> <20000115.231057.698@yktvmv.watson.ibm.com> <85tjvb$5hm$1@nnrp1.deja.com> <3884084E.6F74@hda.hydro.com> <38842856.7D6C@hda.hydro.com> In article , on Wed, 19 Jan 2000 10:22:21 -0800, handleym@ricochet.net (Maynard Handley) writes: >It boils down to: Most of most flights is routine where FBW does better. >On certain rare occasions, the unexpected kicks in and manual flight does >better. >What are the relative frequencies, and what is my personal weighting of >them? This is something of interest to me and no-one else, and likewise >with every other persons weighing in. If people feel a need to share their >particular fear or not of flying, they might better do so in a chat room. Nonsense, the relative frequencies influence the safest design which is of interest to many people. James B. 
Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Monopoly From: jbs@watson.ibm.com Organization: IBM Newsgroups: comp.sys.unisys,alt.folklore.computers,comp.arch Message-ID: <20000215.151733.056@yktvmv.watson.ibm.com> Date: Tue, 15 Feb 2000 20:17:33 GMT References: <86auac$is3$1@mail.pl.unisys.com> <87nurv$cnu@gwis2.circ.gwu.edu> <88c32l$lqe$1@hawkins.cba.uni.edu> <88cama$1140$1@news.rchland.ibm.com> In article <88cama$1140$1@news.rchland.ibm.com>, on 15 Feb 2000 19:50:34 GMT, cecchi@signa.rchland.ibm.com (Del Cecchi) writes: >Isn't this sort of tautological? After all what is the definition of >"competitive industry"? And "long run" for that matter. Yes, an industry with >no barriers to entry like start up cost, and perfect information, etc will have >no profit. Of course there are very few such industries in the real world, but >the math is fun and one can get published. > >Del "realist" cecchi. Not exactly. In such an industry the rate of return on capital (i.e. profit) should be about average and the average return is not zero. >What was the name of that hedge fund with all the nobel laureates and professors >and stuff that almost sunk the world's economy? Something Capital Management? Long-Term Capital Management, LTCM. James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Itanium and libm and IA32 From: jbs@watson.ibm.com Organization: IBM Newsgroups: comp.arch Message-ID: <20000424.190706.080@yktvmv.watson.ibm.com> Date: Mon, 24 Apr 2000 23:07:06 GMT References: <8dl6fm$4hr@deadzone.rsn.hp.com> <8dla2k$pdm$1@server05.icaen.uiowa.edu> <8e1mvv$dnb@deadzone.rsn.hp.com> In article <8e1mvv$dnb@deadzone.rsn.hp.com>, on 24 Apr 2000 09:50:39 -0500, Patrick F.
McGehearty writes: >I can't comment at this time on the details of HP's compiler optimization >plans for the IA64 architecture (mostly because I haven't asked permission >and any undelivered product needs a formal approval before public >discussion). However, it is fair to say that the HP PA-RISC optimizer has >had pair-wise versions of some math library functions which pass two sets of >input values and return two sets of results per call. Since the PA8000 >series has two complete sets of functional units, these pair-wise math >functions only take slightly longer to get two results than the nominal >single result functions take to get one result. When a loop which uses one >or more these functions is unrolled an even number of times, the compiler >replaces pairs of calls with pair-wise calls, for a near doubling in >performance. The concept is not difficult, once explained. The difficulty >is in getting all the details right for maximum performance and deciding >which math routines are worth the trouble. :-) > >Response in advance: I can already hear someone saying "well, just do them >all!". The act of optimizing all hundred or so std math library routines >requires substantial effort. That effort could be applied to other >optimizations for perhaps greater benefits. Available staff to implement, >test, measure, and validate optimizations is always limited, so the decision >of which optimizations to go after is an optimization problem itself. This is interesting. An alternative approach is to write vector intrinsic functions. Does HP do this also? If not, could you comment on why HP chose to write two at a time functions rather than n at a time functions? James B.
Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Itanium and libm and IA32 From: jbs@watson.ibm.com Organization: IBM Newsgroups: comp.arch Message-ID: <20000425.123928.406@yktvmv.watson.ibm.com> Date: Tue, 25 Apr 2000 16:39:28 GMT References: <8dl6fm$4hr@deadzone.rsn.hp.com> <8dla2k$pdm$1@server05.icaen.uiowa.edu> <8e1mvv$dnb@deadzone.rsn.hp.com> <20000424.190706.080@yktvmv.watson.ibm.com> <390551E7.49D77837@hda.hydro.com> In article <390551E7.49D77837@hda.hydro.com>, on Tue, 25 Apr 2000 10:05:59 +0200, Terje Mathisen writes: >jbs@watson.ibm.com wrote: >> >> In article <8e1mvv$dnb@deadzone.rsn.hp.com>, >> on 24 Apr 2000 09:50:39 -0500, >> Patrick F. McGehearty writes: >> >Response in advance: I can already hear someone saying "well, just do them >> >all!". The act of optimizing all hundred or so std math library routines >> >requires substantial effort. That effort could be applied to other >> >optimizations for perhaps greater benefits. Available staff to implement, >> >test, measure, and validate optimizations is always limited, so the decision >> >of which optimizations to go after is an optimization problem itself. >> >> This is interesting. An alternative approach is to write >> vector intrinsic functions. Does HP do this also? If not could you >> comment on why HP chose to write two at time functions rather than >> n at a time functions? > >I agree that this is very interesting, and a very good idea as well. > >I'll make a guess that the main reasons for doing pairs instead of N, >was that >a) unrolling by two was a reasonable tradeoff between increasing >throughput and increasing code size. > >b) unrolling by more than two would cause them to run out of registers, >having to use memory buffers instead, and this could quickly cost more >than the speed gained by doubling the math function throughput. I see I was too brief. 
If you use vector intrinsic functions you don't unroll the loop at all; you split it and replace the intrinsic part with a call to a vector intrinsic function. For example, suppose your loop is
      do j=1,n
      y(j)=y(j)+a*exp(b*x(j))
      enddo
you would introduce a temporary array t and split the loop into 3:
      do j=1,n
      t(j)=b*x(j)
      enddo
      do j=1,n
      t(j)=exp(t(j))
      enddo
      do j=1,n
      y(j)=y(j)+a*t(j)
      enddo
Then you would replace the middle loop with a vector intrinsic function call. James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Itanium and libm and IA32 From: jbs@watson.ibm.com Organization: IBM Newsgroups: comp.arch Message-ID: <20000425.191952.874@yktvmv.watson.ibm.com> Date: Tue, 25 Apr 2000 23:19:52 GMT References: <8dl6fm$4hr@deadzone.rsn.hp.com> <8dla2k$pdm$1@server05.icaen.uiowa.edu> <8e1mvv$dnb@deadzone.rsn.hp.com> <20000424.190706.080@yktvmv.watson.ibm.com> <390551E7.49D77837@hda.hydro.com> <20000425.123928.406@yktvmv.watson.ibm.com> <3905F434.9CD@hda.hydro.com> In article <3905F434.9CD@hda.hydro.com>, on Tue, 25 Apr 2000 21:38:28 +0200, Terje Mathisen writes: >jbs@watson.ibm.com wrote: >> >> In article <390551E7.49D77837@hda.hydro.com>, >> on Tue, 25 Apr 2000 10:05:59 +0200, >> Terje Mathisen writes: >> >b) unrolling by more than two would cause them to run out of registers, >> >having to use memory buffers instead, and this could quickly cost more >> >than the speed gained by doubling the math function throughput. >> >> I see I was too brief. If you use vector intrinsic functions >> you don't unroll the loop at all you split it and replace the intrinsic >> part with a call to a vector intrinsic functions.
For example suppose >> your loop is >> do j=1,n >> y(j)=y(j)+a*exp(b*x(j)) >> enddo >> you would introduce a temporary array t and split the loop into 3: >> do j=1,n >> t(j)=b*x(j) >> enddo >> do j=1,n >> t(j)=exp(t(j)) >> enddo >> do j=1,n >> y(j)=y(j)+a*t(j) >> endo >> Then you would replace the middle loop with a vector intrinsic >> function call. >> James B. Shearer > >No, no, no! > >This is _exactly_ what I meant about running out registers: > > "you would introduce a temporary array t" > >This is a huge performance limiter, unless you at the very least do L1 >cache blocking, so t is always a small part of available L1 size, and >then preferably use streaming load/store operations for the base >vectors. Yes, this approach will be a real disaster if your vectors do not fit in cache on a machine with a lousy memory system. However if you are working in cache the speed up can be more than two which is the maximum obtainable by the two at a time method. Anyway the problem is memory bandwidth, not the number of registers. James B. Shearer ========================================================================= Path: yktvmv!jbs Subject: Re: Itanium and libm and IA32 From: jbs@watson.ibm.com Organization: IBM Newsgroups: comp.arch Message-ID: <20000426.194444.960@yktvmv.watson.ibm.com> Date: Wed, 26 Apr 2000 23:44:44 GMT References: <8dl6fm$4hr@deadzone.rsn.hp.com> <8dla2k$pdm$1@server05.icaen.uiowa.edu> <8e1mvv$dnb@deadzone.rsn.hp.com> <20000424.190706.080@yktvmv.watson.ibm.com> <390551E7.49D77837@hda.hydro.com> <20000425.123928.406@yktvmv.watson.ibm.com> <3905F434.9CD@hda.hydro.com> <3906B6F0.76C3F05B@mikron.de> In article <3906B6F0.76C3F05B@mikron.de>, on Wed, 26 Apr 2000 11:29:20 +0200, Bernd Paysan writes: >Terje Mathisen wrote: >> >> jbs@watson.ibm.com wrote: >> > For example suppose >> > your loop is >> > do j=1,n >> > y(j)=y(j)+a*exp(b*x(j)) >> > enddo >> > you would introduce a temporary array t and split the loop into 3: >... >> No, no, no! 
>> >> This is _exactly_ what I meant about running out registers: >> >> "you would introduce a temporary array t" >> >> This is a huge performance limiter, unless you at the very least do L1 >> cache blocking, so t is always a small part of available L1 size, and >> then preferably use streaming load/store operations for the base >> vectors. > >Indeed. The right way is to unfold exp() in the proper sequence of >multiply by magic constant (e.g. e/2), extract integer/fraction, >multiply fraction by the reciprocal magic constant, perform Tailor >polynom (with sufficient order) on result, add integer part to exponent >(or a faster solution like using 8e as magic constant and indexing a >table with the lower 4 bits of the integer part - you multiply with the >table value at the end, and the polynom converts much faster). That's >perhaps ~10 operations that can be pipelined. Now, get your two >multiplications and the addition into this sequence, and you have an >instruction sequence that can be vectorized. If you can execute two macs >with a latency of 4 cycles each (fully pipelined), the optimal unrolling >would be 8. I don't think it is practical for a compiler to do this. The exp code will need a rare path to handle things like exp(-1000.) making eight way unrolling rather messy. Do you know of any actual compiler that does anything like this? There is an actual compiler which generates vector intrinsic function calls as I described. See http://www.npcai.edu/online/v3.9/SCAN1.html for example. James B. Shearer PS: The magic constant is of course 1./log(2.) not e/2. 
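The two ideas in this exchange can be sketched roughly as follows, in Python rather than the thread's Fortran. This is only an illustration under stated assumptions, not the code of any compiler or math library mentioned above: the names exp_reduced, vexp and update are hypothetical, the Taylor order is arbitrary, and the overflow/underflow cutoffs are approximate. It shows the range reduction with the constant 1./log(2.), the rare path needed for arguments like exp(-1000.), and the three-way loop split with the middle loop replaced by a single n at a time call.

```python
import math

LN2 = math.log(2.0)
INV_LN2 = 1.0 / LN2   # the constant from the PS: 1/log(2)

def exp_reduced(x, terms=14):
    """Approximate exp(x) as 2**k * exp(r), where x = k*log(2) + r."""
    # Rare path: results that overflow or underflow a double.
    # The cutoffs are rough illustrative values, not exact limits.
    if x > 709.0:
        return math.inf
    if x < -745.0:
        return 0.0
    k = round(x * INV_LN2)      # integer part
    r = x - k * LN2             # remainder, |r| <= log(2)/2
    p = 1.0                     # Taylor polynomial for exp(r), Horner form
    for n in range(terms, 0, -1):
        p = 1.0 + r * p / n
    return math.ldexp(p, k)     # add k to the exponent: p * 2**k

def vexp(t):
    """Hypothetical n-at-a-time exp: one call handles the whole array."""
    return [exp_reduced(v) for v in t]

def update(y, a, b, x):
    """y(j) = y(j) + a*exp(b*x(j)), with the loop split into three."""
    t = [b * xj for xj in x]                       # loop 1: fill temporary t
    t = vexp(t)                                    # loop 2: vector intrinsic call
    return [yj + a * tj for yj, tj in zip(y, t)]   # loop 3: accumulate
```

The temporary list t is the cost Terje Mathisen objects to; blocking the three loops over chunks of x small enough to stay in cache would be the obvious refinement, and is left out here to keep the sketch short.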
========================================================================= Path: yktvmv!jbs Subject: Bush's IQ (was Re: Al Gore and the Internet From: jbs@watson.ibm.com Organization: IBM Newsgroups: comp.arch Message-ID: <20001019.154642.410@yktvmv.WATSON.IBM.COM> Date: Thu, 19 Oct 2000 19:46:42 GMT References: <9cuG5.235$xE1.131813@news.pacbell.net> <8sikoq02o0g@enews3.newsguy.com> <39ED6E89.64699CE0@mikron.de> <8sk4jj01cgf@enews2.newsguy.com> <39EDC4BF.AEC2F0AD@mikron.de> In article <39EDC4BF.AEC2F0AD@mikron.de>, on Wed, 18 Oct 2000 17:41:52 +0200, Bernd Paysan writes: >"John S. Dyson" wrote: >> BTW, Bush did graduate from his ivy-league college... He is NO fool. > >Two-digit means "below average". Many people below average should be >able to graduate from a *college*. It's not really that difficult. I >haven't heard a single intelligent sentence from Bush. Ok, most of the >time, he tries to defend execusions, and with arguments somewhere >between Filbinger and Freisler. This is silly. Bush's SAT scores were 566 verbal, 640 math. Adjusted for the fact that not everyone takes the SAT, this puts him around 95 percentile verbal and 98 percentile math. This corresponds to an IQ of about 125-130. James B. Shearer PS: Gore scored 625 verbal, 730 math.