Topic P6 from CPU FAQ base

Please note the date of the message presented here! The information about addresses, telephone numbers, organizations and people has almost certainly gone out of date and lost its practical value; it has, however, acquired historical value, which is why it is still kept...

- 299.COMP.SYS.INTEL (2:5020/299) ------------------------ 299.COMP.SYS.INTEL -
From : 2:5020/299.100                                   Fri 03 Feb 95 20:57
Subj : P6 Info
--------------------------------------------------------------------------------
This information is available on Intel's P6 processor

The P6 family of processors will be the next generation of Intel's processor
technology. All members of the P6 family will be designed for complete
compatibility with all PC software. The high performance of the P6 will make
it especially well-suited for upcoming desktop applications like speech
recognition and multimedia authoring, as well as for more demanding server
applications.

Intel's P6 family of processors...

  - Ensures complete binary compatibility with previous generations of the
    Intel Architecture.
  - Delivers superior performance through an innovation called Dynamic
    Execution.
  - Provides support for enhanced data integrity and reliability features:
    ECC (Error Checking and Correction), Fault Analysis & Recovery, and
    Functional Redundancy Checking.
  - Includes features that will greatly simplify the design of
    multiprocessing systems.

The first member of the P6 processor family...

  - Arrives in desktops and servers in 1995.
  - Operates at an expected 250-300 MIPS.
  - Integrates about 5.5 million transistors on the chip, compared to
    approximately 3.1 million transistors on the Pentium processor.
  - Will initially be produced on the same high volume 0.6 micron process
    currently used for the 90 & 100 MHz versions of the Pentium processor,
    and will then move to a 0.35 micron process.
  - Delivers performance that will scale up to 1000 MIPS with four
    processors.

Intel is presenting a technical paper on the P6 microarchitecture at the 1995
IEEE International Solid-State Circuits Conference on February 16th. Please
visit this page again after this date for updated information on Intel's P6
processor.

--
Fred Gruner                     Intel Corporation
e-mail:                         2200 Mission College Blvd. MS RN6-16
phone: (408) 765-8882           Santa Clara, CA 95056
Fax: (408) 765-6688             URL:

My opinions are strictly my own and I do not speak for Intel Corporation.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
---
 * Origin: a kind of gate (2:5020/299.100)

- 299.COMP.SYS.INTEL (2:5020/299) ------------------------ 299.COMP.SYS.INTEL -
From : 2:5020/299.100                                   Tue 21 Feb 95 18:34
Subj : P6 operation explained
--------------------------------------------------------------------------------
In article <3ic9cq$> (Mike Schmit) writes:

> The Intel documents say that the P6 has a 12-stage pipeline, however there
> are no specific details of the 12 stages. However, the stages can be
> grouped into 3 phases:
>
>   1. fetch and decode
>   2. dispatch and execute
>   3. retire and write-back

I thought the foils I used at ISSCC last week were put online at Intel's www
site. If so, the pipeline diagram does show some details on what the pipe
stages are:

In-order front end:
    1 cycle      BTB prediction
    2.5 cycles   IFU (instruction cache) lookup
    2.5 cycles   decode to micro-ops
    1 cycle      register rename
    1 cycle      reservation station (RS) write
Out-of-order core:
    2 cycles     RS dispatch
    1 cycle      execute (for an integer operation)
In-order retirement:
    3 cycles

> In the dispatch/execute phase the micro-ops are sent to an instruction
> pool that holds 20 or 30 micro-ops. The dispatcher chooses ops to be
> executed based on dependencies and available resources to perform the
> execution. When an execution unit finishes with an operation the results
> are sent back to the pool (not to memory or an actual physical register).

Need to be precise here. An EU's results actually DO go "back to the pool",
meaning the reservation station, in case some uop is waiting for exactly that
data in order for it to become ready for execution. Those results also go to
the Reorder Buffer, which is the pool of "physical registers", so that they
can eventually be committed to permanent machine state.
I believe permanent machine state is what you meant by "physical registers",
but I'm drawing the distinction between the renamed register entries in the
ROB, which we also call "physical registers" in P6 parlance, and the ISA
registers in the Retirement Register File.

...

> How, exactly, ports 0 and 1 work have not been disclosed (i.e. can the FPU
> be working on a long multi-cycle operation and the integer unit 0 still be
> used. I assume this is true). Five micro-ops can be issued in

It is true. The FPU's are pipelined and only tie up an RS port when an FP
operation is being dispatched, or when one is writing back a result.

> any given cycle, one to each resource port. A throughput of only 3
> micro-ops can be sustained, apparently due to a limitation of the retire
> unit. Ports 0 and 1 seem very much like the Pentium's U and V pipelines.

Really? I don't see the connection.

Bob Colwell
Intel Corp. JF1-19
5200 NE Elam Young Parkway
Hillsboro, Oregon 97124
---
 * Origin: a kind of gate (2:5020/299.100)

- 299.COMP.SYS.INTEL (2:5020/299) ------------------------ 299.COMP.SYS.INTEL -
From : 2:5020/299.100                                   Tue 21 Feb 95 22:13
Subj : some P6 Q&A
--------------------------------------------------------------------------------
Sender:

> Are actual SPECint numbers (of the individual SPEC benchmarks, not the
> composite number) from the prototype system available? I'm interested to
> know how dynamic execution affects the performance of programs like gcc
> that have a lot of decision trees.

Yep, gcc's a good benchmark. We relied on it heavily in performance tuning
P6, especially earlier in the design effort. When we go through the formal
SPEC process of validating our numbers, we'll make the whole suite of
benchmark results available on the net.

> What is the estimated SpecFP speed?

We haven't disclosed that yet.

> Does the dynamic execution stuff apply to floating point instructions?

Yes, the FP uops go through exactly the same process as the integer uops do.
FP uops do tend to take more than one clock cycle to execute, however.

> Does the FPU have a separate adder and multiplier that can operate at the
> same time?

Yes.

> Do you still have to put delay slots in your source code before you can
> use the multiplier output? What I'm getting at is, if you write an inner
> product routine in the obvious way (straight line code saying
> sum += x[n]*y[n]), can the P6 start an add/multiply on every cycle except
> maybe the first few? Can it do the add *and* multiply in a single cycle?

The FP adder and multiplier are on the same port, so P6 cannot initiate an
FADD and FMPY on every clock. Both units are pipelined, however, so it can
sustain multiple simultaneous FP ops on both. But there is a certain amount
of memory bandwidth implied in your question, and P6 can do but one load and
one store per clock (this is not a vector machine!).

> Are any new instructions added? What is this native signal processing
> stuff? Is bit reversed addressing supported, or anything like that?

We added conditional move instructions to both the integer and FP sides, to
help in cases of especially flaky branches. P6 does a good job on signal
processing because it naturally unrolls inner loops and extracts lots of
parallelism. But there is no suite of new instructions that's been added
beyond that.

> Will Intel publish a manual that actually says how to program the chip, or
> will it be like the stupid Pentium manual where all the good parts
> (Appendix H) are left out?

None of us are real anxious to help the competition reverse-engineer this
thing, but on the other hand we do want our customers to get maximal
performance. We're going to try really hard to get this balancing act right.
Stay tuned.

Bob Colwell
Intel Corp. JF1-19
5200 NE Elam Young Parkway
Hillsboro, Oregon 97124
---
 * Origin: a kind of gate (2:5020/299.100)

- 299.COMP.SYS.INTEL (2:5020/299) ------------------------ 299.COMP.SYS.INTEL -
From : 2:5020/299.100                                   Wed 01 Mar 95 10:28
Subj : Unknown
--------------------------------------------------------------------------------
In article <3ivvmh$> (Mike Schmit) writes:

> In (Robert Colwell) writes:
> >P6 has a single port on the Reservation Station for FP operations, but it
> >actually has several FP units, including an FP adder, an FP multiplier, and
> >an FP divider. The adder and multiplier are pipelined, the divider is not.
>
> So I guess this means that a program could have a series of FADD, FMUL,
> FADD FMUL and they would issue in 1-cycle increments (assuming no data
> dependencies). And that the dispatcher could/would rearrange the order to
> eliminate the dependencies?

Not quite; FADD can accept a new operation every cycle, latency 3 cycles.
FMUL can accept a new op every other cycle, latency 5 cycles. The RS will
ensure that FP ops are issued only when their true data dependencies have
been satisfied (same as integer ops).

> How does this affect the FXCH optimizations for the Pentium? Does FXCH
> take up a full RS cycle for the FPU?

P6 renames FP registers anyway, so P6 sees an FXCH as a directive to swap two
of the entries in the FP rename table. FXCH's don't even appear in the RS,
since they don't need an execution unit in order to have their effect.

Bob Colwell
Intel Corp. JF1-19
5200 NE Elam Young Parkway
Hillsboro, Oregon 97124
---
 * Origin: a kind of gate (2:5020/299.100)

- SU.HARDW.PC.CPU (2:5020/299) ------------------------------ SU.HARDW.PC.CPU -
From : Alex Iliynsky                2:5020/23           Wed 06 Sep 95 13:10
Subj : P6 news
--------------------------------------------------------------------------------
Hi All!
Intel prepares P6 successor with clock speeds pushing 200MHz
September 4, 1995
By Tom Davey

Responding to concerns that its next-generation P6 chip may be slower than
comparable Pentiums on some tasks, Intel Corp. is rapidly developing an
encore, tentatively dubbed the P6S, that features an initial clock speed of
at least 180MHz.

Expected to premiere in the second half of 1996, the new chip will be built
using 0.35-micron technology instead of the 0.60-micron technology that Intel
is using for its initial P6, which is due later this year. By the end of
1996, the P6S could reach clock speeds of 266MHz, sources close to the
company said.

The P6S will likely double the L1 (Level 1) cache size of its predecessor,
providing 16K bytes of L1 cache for data and another 16K bytes for
instructions, sources said.

The P6S is one of several development efforts in the works at Intel, sources
said. In the first half of 1997, the Santa Clara, Calif., company is expected
to ship a multimedia version of its P6 chip, built on either 0.35-micron or
0.25-micron technology. The so-called P68 will likely have more L1 and L2
cache memory and an enhanced floating-point architecture to better support
digital-signal processing and the MPEG video standard, said Linley Gwennap,
editor of The Microprocessor Report, in Sebastopol, Calif.

But unlike the other versions of the P6, which have their L2 cache in the
same module as the microprocessor, the P68 will likely have an external L2
cache and an enhanced floating-point unit while retaining the 32-bit
architecture, said Martin Reynolds, an analyst at Dataquest Inc., in San
Jose, Calif. Intel may offer a variety of external cache configurations to
match the needs and budgets of system makers.

Some users see the advance in P6 technology as giving them more choices in
their networks.

"We're going to skip Windows 95 and move to Windows NT.
The P6 would fit in well with that strategy," said Jack Paulson, technology
team leader for Eastman Kodak Inc., in Victor, N.Y. "That seems to cut right
into Digital [Equipment Corp.]'s Alpha strategy."

Intel is also developing two distinct versions of the P7, the successor to
the P6 family, observers said.

The first P7, which Intel is developing without assistance from technology
partner Hewlett-Packard Co., will be a 64-bit chip designed for backward
compatibility with the X86 family, sources said. That chip is expected to
ship in 1997.

The second 64-bit P7 chip, being jointly developed with HP, of Palo Alto,
Calif., will combine RISC architecture with a still-experimental technology
known as VLIW (very long instruction word), sources said. VLIW will probably
not become available until at least 1997 or 1998.

Some observers questioned whether both of the P7 designs will ever ship.

"I think they will hedge," said Mike Feibus, an analyst with Mercury
Research, in Scottsdale, Ariz., explaining that Intel will likely select the
best of the two technologies for a single chip launch.

"There will only be one P7," said Feibus, "but it still remains to be seen
which is the P7."

Intel officials declined to comment.

Copyright (c) 1995 Ziff-Davis Publishing Company. All rights reserved.
Reproduction in whole or in part in any form or medium without express
written permission of Ziff-Davis Publishing Company is prohibited. PC Week
and the PC Week logo are trademarks of Ziff-Davis Publishing Company. PC Week
Online and the PC Week Online logo are trademarks of Ziff-Davis Publishing
Company.

Starder
--- IBM OS/2 Dialog Editor
 * Origin: (C) Starder. From Home (%TEMP%) (2:5020/23)

- SU.HARDW.PC.CPU (2:5020/299) ------------------------------ SU.HARDW.PC.CPU -
From : Alex Iliynsky                2:5020/23           Wed 20 Sep 95 03:36
Subj : The P6 gets a name
--------------------------------------------------------------------------------
Hi All!
The P6 gets a name
September 19, 1995
By Charles Cooper

Now we know: it won't be the Sextium.

Intel Corp. today announced that it had chosen the name Pentium Pro processor
for its upcoming P6 chip.

"It's a good idea. We expected they would stick with the Pentium name," said
Linley Gwennap, the editor in chief of Microprocessor Report, in Sebastopol,
Calif. "They've invested hundreds of millions of dollars in building consumer
and business identification with the name."

And indeed, that was the gist of the marketing strategy behind Intel's
decision to go with the new name.

"We've built up tremendous equity in the market -- both with consumers and
business -- in Pentium," said Carl Everett, senior vice president and general
manager of desktop products at the Santa Clara, Calif., company. "Using that
equity which has been built up there is very important. The suffix that we've
used here simply designates something different."

Intel does not usually disclose how much it spends on advertising. In the
only break from that policy, Intel last year said it would spend $150 million
in marketing the Pentium in 1994. However, analysts believe the sum total
that Intel has invested in promoting the Pentium could exceed hundreds of
millions of dollars.

Everett declined to say whether Intel planned to continue using this naming
convention for future microprocessor generations. However, he did seem to
indicate that Intel was considering something along those lines.

"This naming convention gives me the flexibility to do whatever I believe is
correct," Everett said.

He said that the decision to go with the Pentium Pro processors "was
primarily a management decision ... All the cooks contributed to this soup."

The first advertisements from Intel promoting the new microprocessor are
expected to begin after the chip officially debuts in the fourth quarter.

He said that Intel never considered using the P6 designation beyond the
product's testing period.
"Processor-code names were never intended to be code names," Everett said,
predicting that Pentium Pro processor "will soon become words on everyone's
lips."

Intel, which plans to introduce the chip sometime during the fourth quarter,
said the Pro processor will be targeted at "workstation and high-end desktop
systems, as well as cost-effective servers."

Meanwhile, Intel's chief rival, Advanced Micro Devices Inc., downplayed the
significance of the announcement.

"It's not surprising, considering that they spent so much time and money
trying to brand the word Pentium," said an AMD spokesman. "We expected they
would try and leverage off the Pentium name. With the Pro designator, I take
it to mean that they are pretty clearly trying to identify this as a
processor for a high-end machine."

>Copyright (c) 1995 Ziff-Davis Publishing Company. All rights reserved.
>Reproduction in whole or in part in any form or medium without express written
>permission of Ziff-Davis Publishing Company is prohibited. PC Week and the PC
>Week logo are trademarks of Ziff-Davis Publishing Company. PC Week Online and
>the PC Week Online logo are trademarks of Ziff-Davis Publishing Company.

Starder
--- IBM OS/2 Dialog Editor
 * Origin: (C) Starder. From Home (%TEMP%) (2:5020/23)

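The FP figures Bob Colwell gives in the 21 Feb 95 and 01 Mar 95 messages above
(one RS port shared by the FP adder and multiplier, FADD accepting a new op
every cycle with 3-cycle latency, FMUL accepting a new op every other cycle,
one load per clock) can be combined into a rough lower bound for the
inner-product loop sum += x[n]*y[n] asked about in the Q&A. The Python below
is only a back-of-the-envelope sketch, not an Intel model: the function name,
the idea of keeping several independent partial sums, and the assumption that
the individual limits can simply be combined with max() are illustrative
assumptions that do not come from the messages. It also ignores the port time
used by result write-back, so real code may well be slower than this bound.

```python
# Figures quoted in the messages above.
FADD_LATENCY = 3         # cycles from FADD dispatch to result
FMUL_ISSUE_INTERVAL = 2  # FMUL accepts a new op every other cycle
LOADS_PER_CLOCK = 1      # "one load and one store per clock"


def min_cycles_per_iteration(accumulators: int = 1) -> float:
    """Lower bound on steady-state cycles per iteration of sum += x[n]*y[n].

    Hypothetical helper: 'accumulators' is the number of independent
    partial sums the code keeps (an assumption, not from the source).
    """
    if accumulators < 1:
        raise ValueError("need at least one accumulator")
    # One FMUL plus one FADD per iteration share the single FP port,
    # which starts at most one FP op per cycle.
    port_bound = 2
    # The multiplier itself takes a new op only every other cycle.
    fmul_bound = FMUL_ISSUE_INTERVAL
    # x[n] and y[n] each need a load.
    load_bound = 2 / LOADS_PER_CLOCK
    # Each add into the same partial sum must wait for the previous add.
    chain_bound = FADD_LATENCY / accumulators
    return float(max(port_bound, fmul_bound, load_bound, chain_bound))


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, min_cycles_per_iteration(k))
```

Under these assumptions a single running sum is limited by the 3-cycle FADD
latency, while two or more partial sums are limited to 2 cycles per iteration
by the shared FP port, the multiplier's issue rate and the single load per
clock.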