Topic P6 from CPU FAQ base

Please note the date of the message presented here! Information about addresses, phone numbers, organizations and people is almost certainly outdated and has lost its practical value, while gaining historical value, which is the reason it is still kept...


— 299.COMP.SYS.INTEL (2:5020/299)  ———————————————————————— 299.COMP.SYS.INTEL —
 From : fgruner@iil.intel.com               2:5020/299.100  Fri 03 Feb 95 20:57 
 Subj : P6 Info                                                                 
————————————————————————————————————————————————————————————————————————————————
This information is available on www.intel.com:

Intel's P6 processor


The P6 family of processors will be the next generation of Intel's processor
technology. All members of the P6 family will be designed for complete
compatibility with all PC software. The high performance of the P6 will
make it especially well-suited for upcoming desktop applications like
speech recognition and multimedia authoring, as well as for more
demanding server applications.

Intel's P6 family of processors...

   Ensures complete binary compatibility with previous generations of
   the Intel Architecture.

   Delivers superior performance through an innovation called
   Dynamic Execution.

   Provides support for enhanced data integrity and reliability features:
   ECC (Error Checking and Correction), Fault Analysis & Recovery,
   and Functional Redundancy Checking.

   Includes features that will greatly simplify the design of
   multiprocessing systems.

The first member of the P6 processor family...

   Arrives in desktops and servers in 1995.

   Operates at an expected 250-300 MIPS.

   Integrates about 5.5 million transistors on the chip, compared to
   approximately 3.1 million transistors on the Pentium processor.

   Will initially be produced on the same high volume 0.6 micron
   process currently used for the 90 & 100 MHz versions of the
   Pentium processor, and will then move to a 0.35 micron process.

   Delivers performance that will scale up to 1000 MIPS with four
   processors.

Intel is presenting a technical paper on the P6 microarchitecture at the 1995
IEEE International Solid-State Circuits Conference on February 16th.
Please visit this page again after this date for updated information on Intel's
P6 processor.

--
Fred Gruner    Intel Corporation             e-mail: fgruner@mipos2.intel.com
2200 Mission College Blvd. MS RN6-16         phone:  (408) 765-8882
Santa Clara, CA 95056                        Fax:    (408) 765-6688

URL: http://www.best.com/~fgruner/

My opinions are strictly my own and I do not speak for Intel Corporation.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
---
 * Origin: a kind of gate (2:5020/299.100)

— 299.COMP.SYS.INTEL (2:5020/299)  ———————————————————————— 299.COMP.SYS.INTEL —
 From : colwell@pdx145.intel.com            2:5020/299.100  Tue 21 Feb 95 18:34 
 Subj : P6 operation explained                                                  
————————————————————————————————————————————————————————————————————————————————
In article <3ic9cq$bd7@ixnews3.ix.netcom.com> mschmit@ix.netcom.com (Mike
Schmit) writes:

   The Intel documents say that the P6 has a 12-stage pipeline, however
   there are no specific details of the 12 stages. However, the stages
   can be grouped into 3 phases:

    1. fetch and decode
    2. dispatch and execute
    3. retire and write-back

I thought the foils I used at ISSCC last week were put online at Intel's
www site. If so, the pipeline diagram does show some details on what the
pipe stages are:

In-order front end:

  1 cycle BTB prediction
  2.5 cycles IFU (instruction cache) lookup
  2.5 cycles decode to micro-ops
  1 cycle register rename
  1 cycle reservation station (RS) write

Out-of-order core:

  2 cycles RS dispatch
  1 cycle execute (for an integer operation)

In-order retirement:

  3 cycles
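
As a rough illustration, the cycle counts listed above can be tallied per
phase. This is only a bookkeeping sketch: the dictionary layout and the
function names are made up for the example, and the sum is a count of cycles
for the integer case as listed, which is not necessarily the same thing as
the number of pipe stages quoted above.

```python
# Cycle counts copied from the list above. The data layout and the helper
# names are hypothetical, chosen only for this sketch.
PIPELINE = {
    "in-order front end": [
        ("BTB prediction", 1.0),
        ("IFU (instruction cache) lookup", 2.5),
        ("decode to micro-ops", 2.5),
        ("register rename", 1.0),
        ("reservation station (RS) write", 1.0),
    ],
    "out-of-order core": [
        ("RS dispatch", 2.0),
        ("execute (integer operation)", 1.0),
    ],
    "in-order retirement": [
        ("retire", 3.0),
    ],
}


def phase_totals(pipeline):
    """Sum the listed cycle counts within each phase."""
    return {phase: sum(cycles for _, cycles in steps)
            for phase, steps in pipeline.items()}


def total_cycles(pipeline):
    """Sum the listed cycle counts across all phases."""
    return sum(phase_totals(pipeline).values())


if __name__ == "__main__":
    for phase, cycles in phase_totals(PIPELINE).items():
        print(f"{phase}: {cycles} cycles")
    print("total:", total_cycles(PIPELINE))
```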

   In the dispatch/execute phase the micro-ops are sent to an instruction
   pool that holds 20 or 30 micro-ops.  The dispatcher chooses ops to be
   executed based on dependencies and available resources to perform the
   execution.  When an execution unit finishes with an operation the
   results are sent back to the pool (not to memory or an actual physical
   register).

Need to be precise here. An EU's results actually DO go "back to the pool",
meaning the reservation station, in case some uop is waiting for exactly
that data in order for it to become ready for execution. Those results also
go to the Reorder Buffer, which is the pool of "physical registers", so
that they can eventually be committed to permanent machine state. I
believe permanent machine state is what you meant by "physical registers",
but I'm drawing the distinction between the renamed register entries in the
ROB, which we also call "physical registers" in P6 parlance, and the ISA
registers in the Retirement Register File.
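
The distinction drawn above can be sketched as a toy model: a result is
forwarded to any waiting uop in the reservation station, is also recorded in
its Reorder Buffer entry, and reaches the Retirement Register File only at
retirement. Everything here (class names, fields, the use of dictionaries) is
a hypothetical simplification for illustration and says nothing about how the
hardware is actually organized. In-order retirement is not enforced.

```python
from dataclasses import dataclass, field


@dataclass
class Uop:
    name: str
    dest: int                      # ROB entry that will hold this uop's result
    sources: list                  # ROB entries this uop is waiting on
    ready_values: dict = field(default_factory=dict)

    def is_ready(self):
        """A uop can execute once every source value has been forwarded."""
        return all(src in self.ready_values for src in self.sources)


class ToyCore:
    def __init__(self):
        self.rs = []               # reservation station: uops awaiting operands
        self.rob = {}              # ROB entry -> (ISA register, value or None)
        self.rrf = {}              # retirement register file: committed state

    def allocate(self, rob_entry, isa_reg):
        """Reserve a ROB entry (a renamed register) for an ISA register."""
        self.rob[rob_entry] = (isa_reg, None)

    def writeback(self, rob_entry, value):
        """An execution unit finished: forward to the RS and record in the ROB."""
        for uop in self.rs:
            if rob_entry in uop.sources:
                uop.ready_values[rob_entry] = value
        isa_reg, _ = self.rob[rob_entry]
        self.rob[rob_entry] = (isa_reg, value)

    def retire(self, rob_entry):
        """Commit a completed ROB entry to permanent machine state."""
        isa_reg, value = self.rob[rob_entry]
        if value is None:
            raise ValueError("result has not been written back yet")
        del self.rob[rob_entry]
        self.rrf[isa_reg] = value
```

In this model a consumer uop becomes ready as soon as `writeback` runs, while
`rrf` stays unchanged until `retire` is called for that entry.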

   ...
   Exactly how ports 0 and 1 work has not been disclosed (e.g. can the
   FPU be working on a long multi-cycle operation while integer unit 0
   is still used?  I assume this is true).  Five micro-ops can be issued in

It is true. The FPU's are pipelined and only tie up an RS port when an FP
operation is being dispatched, or when one is writing back a result.

   any given cycle, one to each resource port.  A throughput of only 3
   micro-ops can be sustained, apparently due to a limitation of the retire
   unit.  Ports 0 and 1 seem very much like the Pentium's U and V
   pipelines.

Really? I don't see the connection.

Bob Colwell  colwell@ichips.intel.com
Intel Corp.  JF1-19
5200 NE Elam Young Parkway
Hillsboro, Oregon 97124
---
 * Origin: a kind of gate (2:5020/299.100)

— 299.COMP.SYS.INTEL (2:5020/299)  ———————————————————————— 299.COMP.SYS.INTEL —
 From : colwell@pdx045.intel.com            2:5020/299.100  Tue 21 Feb 95 22:13 
 Subj : some P6 Q&A                                                             
————————————————————————————————————————————————————————————————————————————————
   Sender: phr@netcom9.netcom.com

   Are actual SPECint numbers (of the individual SPEC benchmarks, not the
   composite number) from the prototype system available?  I'm interested
   to know how dynamic execution affects the performance of programs like
   gcc that have a lot of decision trees.

Yep, gcc's a good benchmark. We relied on it heavily in performance tuning
P6, especially earlier in the design effort. When we go through the formal
SPEC process of validating our numbers, we'll make the whole suite of
benchmark results available on the net.

   What is the estimated SpecFP speed?

We haven't disclosed that yet.

   Does the dynamic execution stuff apply to floating point instructions?

Yes, the FP uops go through exactly the same process as the integer uops
do. FP uops do tend to take more than one clock cycle to execute, however.

   Does the FPU have a separate adder and multiplier that can operate at
   the same time?

Yes.

   Do you still have to put delay slots in your source
   code before you can use the multiplier output?  What I'm getting at
   is, if you write an inner product routine in the obvious way (straight
   line code saying sum += x[n]*y[n]), can the P6 start an add/multiply
   on every cycle except maybe the first few?  Can it do the add *and*
   multiply in a single cycle?

The FP adder and multiplier are on the same port, so P6 cannot initiate an
FADD and FMPY on every clock. Both units are pipelined, however, so it can
sustain multiple simultaneous FP ops on both. But there is a certain amount
of memory bandwidth implied in your question, and P6 can do but one load
and one store per clock (this is not a vector machine!).
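
A back-of-the-envelope estimate for the sum += x[n]*y[n] loop, using only the
limits mentioned in this thread: one load per clock, FADD and FMUL sharing one
port, and the FADD/FMUL figures given in a later message here (FADD latency 3
cycles, FMUL accepting a new op every other cycle). The function and its
parameter names are hypothetical, and the model assumes one dispatch per clock
on the shared port and ignores everything else (decode, retirement, cache
misses), so treat the result as a rough bound rather than a measurement.

```python
def cycles_per_iteration(loads=2, fadds=1, fmuls=1,
                         loads_per_clock=1,
                         fp_port_dispatch_per_clock=1,   # assumption
                         fmul_issue_interval=2,
                         fadd_latency=3,
                         loop_carried_fadd=True):
    """Return (cycles, bottleneck, limits) for one loop iteration.

    Each limit is a lower bound in cycles per iteration; the largest wins.
    """
    limits = {
        # x[n] and y[n] both have to be loaded.
        "load bandwidth": loads / loads_per_clock,
        # FADD and FMUL are dispatched through the same port.
        "shared FP port": (fadds + fmuls) / fp_port_dispatch_per_clock,
        # The multiplier accepts a new op only every other cycle.
        "FMUL issue interval": fmuls * fmul_issue_interval,
        # sum depends on the previous iteration's FADD result.
        "FADD dependency chain": fadd_latency if loop_carried_fadd else 0,
    }
    bottleneck = max(limits, key=limits.get)
    return limits[bottleneck], bottleneck, limits


if __name__ == "__main__":
    cycles, bottleneck, _ = cycles_per_iteration()
    print(f"about {cycles} cycles per iteration, limited by {bottleneck}")
```

Under these assumptions the single running sum is the limiting factor.
Passing `loop_carried_fadd=False` models a hypothetical rewrite with several
independent partial sums, where the remaining limits take over.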

   Are any new instructions added?  What is this native signal processing
   stuff?  Is bit reversed addressing supported, or anything like that?

We added conditional move instructions to both the integer and FP sides, to
help in cases of especially flaky branches. P6 does a good job on signal
processing because it naturally unrolls inner loops and extracts lots of
parallelism. But there is no suite of new instructions that's been added
beyond that.

   Will Intel publish a manual that actually says how to program the
   chip, or will it be like the stupid Pentium manual where all the good
   parts (Appendix H) are left out?

None of us are real anxious to help the competition reverse-engineer this
thing, but on the other hand we do want our customers to get maximal
performance. We're going to try really hard to get this balancing act
right. Stay tuned.

Bob Colwell  colwell@ichips.intel.com
Intel Corp.  JF1-19
5200 NE Elam Young Parkway
Hillsboro, Oregon 97124
---
 * Origin: a kind of gate (2:5020/299.100)

— 299.COMP.SYS.INTEL (2:5020/299)  ———————————————————————— 299.COMP.SYS.INTEL —
 From : colwell@pdx145.intel.com            2:5020/299.100  Wed 01 Mar 95 10:28 
 Subj : Unknown                                                                 
————————————————————————————————————————————————————————————————————————————————
In article <3ivvmh$45e@ixnews2.ix.netcom.com> mschmit@ix.netcom.com (Mike
Schmit) writes:

   In  colwell@pdx044.intel.com (Robert
   Colwell) writes:

   >P6 has a single port on the Reservation Station for FP operations, but it
   >actually has several FP units, including an FP adder, an FP multiplier, and
   >an FP divider. The adder and multiplier are pipelined, the divider is not.

   So I guess this means that a program could have a series of FADD, FMUL, FADD
   FMUL and they would issue in 1-cycle increments (assuming no data
   dependencies). And that the dispatcher could/would rearrange the
   order to eliminate the dependencies?

Not quite; FADD can accept a new operation every cycle, latency 3 cycles.
FMUL can accept a new op every other cycle, latency 5 cycles.

The RS will ensure that FP ops are issued only when their true data
dependencies have been satisfied (same as integer ops).
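
The figures above can be turned into a small issue-timing sketch for a run of
independent FP ops sent through the single FP port. The model is deliberately
crude and its names are hypothetical: ops are issued in program order, the
port dispatches at most one op per cycle, and the cycles in which a result
writeback ties up the port are ignored.

```python
# Issue interval and latency per unit, taken from the figures quoted above.
UNITS = {
    "FADD": {"interval": 1, "latency": 3},
    "FMUL": {"interval": 2, "latency": 5},
}


def schedule(ops):
    """Return a list of (op, issue_cycle, result_cycle) for independent ops.

    An op issues at the first cycle in which both the shared port and its
    own unit can accept it.
    """
    next_port_cycle = 0
    next_unit_cycle = {name: 0 for name in UNITS}
    timeline = []
    for op in ops:
        unit = UNITS[op]
        issue = max(next_port_cycle, next_unit_cycle[op])
        timeline.append((op, issue, issue + unit["latency"]))
        next_port_cycle = issue + 1
        next_unit_cycle[op] = issue + unit["interval"]
    return timeline


if __name__ == "__main__":
    for op, issue, done in schedule(["FADD", "FMUL", "FADD", "FMUL"]):
        print(f"{op}: issue at cycle {issue}, result at cycle {done}")
```

In this simplified model an alternating FADD/FMUL sequence issues in
consecutive cycles, while back-to-back FMULs are spaced two cycles apart.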

   How does this affect the FXCH optimizations for the Pentium? Does FXCH
   take up a full RS cycle for the FPU?

P6 renames FP registers anyway, so P6 sees an FXCH as a directive to swap
two of the entries in the FP rename table. FXCH's don't even appear in the
RS, since they don't need an execution unit in order to have their effect.
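
The idea of treating FXCH as a swap of rename-table entries can be sketched
as follows. This is a minimal illustration with made-up names: it models the
eight x87 stack registers as slots that point at physical entries, and it
ignores the top-of-stack pointer and everything else the real rename logic
has to track.

```python
class FpRenameTable:
    """Maps FP stack slots ST(0)..ST(7) to physical register entries."""

    def __init__(self, depth=8):
        # Start with an identity mapping: ST(i) -> physical entry i.
        self.map = list(range(depth))

    def fxch(self, i):
        """Exchange ST(0) and ST(i) by swapping table entries only."""
        self.map[0], self.map[i] = self.map[i], self.map[0]

    def physical(self, i):
        """Return the physical entry currently named by ST(i)."""
        return self.map[i]


def read(table, physical_regs, i):
    """Read ST(i) through the rename table."""
    return physical_regs[table.physical(i)]


if __name__ == "__main__":
    regs = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
    table = FpRenameTable()
    table.fxch(3)
    print(read(table, regs, 0), read(table, regs, 3))
```

The stored values never move; only the mapping changes, which is why no
execution unit is needed in this picture.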

Bob Colwell  colwell@ichips.intel.com
Intel Corp.  JF1-19
5200 NE Elam Young Parkway
Hillsboro, Oregon 97124
---
 * Origin: a kind of gate (2:5020/299.100)

— SU.HARDW.PC.CPU (2:5020/299)  —————————————————————————————— SU.HARDW.PC.CPU —
 From : Alex Iliynsky                       2:5020/23       Wed 06 Sep 95 13:10 
 Subj : P6 news                                                                 
————————————————————————————————————————————————————————————————————————————————
Hi All!


Intel prepares P6 successor with clock speeds pushing 200MHz
September 4, 1995

By Tom Davey

Responding to concerns that its next-generation P6 chip may be slower than
comparable Pentiums on some tasks, Intel Corp. is rapidly developing an encore,
tentatively dubbed the P6S, that features an initial clock speed of at least
180MHz.

Expected to premiere in the second half of 1996, the new chip will be built
using 0.35-micron technology instead of the 0.60-micron technology that Intel is
using for its initial P6, which is due later this year.

By the end of 1996, the P6S could reach clock speeds of 266MHz, sources close
to the company said.

The P6S will likely double the L1 (Level 1) cache size of its predecessor,
providing 16K bytes of L1 cache for data and another 16K bytes for instructions,
sources said.

The P6S is one of several development efforts in the works at Intel, sources
said. In the first half of 1997, the Santa Clara, Calif., company is expected to
ship a multimedia version of its P6 chip, built on either 0.35-micron or
0.25-micron technology.

The so-called P68 will likely have more L1 and L2 cache memory and an enhanced
floating-point architecture to better support digital-signal processing and the
MPEG video standard, said Linley Gwennap, editor of The Microprocessor Report,
in Sebastopol, Calif.

But unlike the other versions of the P6, which have their L2 cache in the same
module as the microprocessor, the P68 will likely have an external L2 cache and
an enhanced floating-point unit while retaining the 32-bit architecture, said
Martin Reynolds, an analyst at Dataquest Inc., in San Jose, Calif. Intel may
offer a variety of external cache configurations to match the needs and budgets
of system makers.

Some users see the advance in P6 technology as giving them more choices in
their networks.

"We're going to skip Windows 95 and move to Windows NT. The P6 would fit in
well with that strategy," said Jack Paulson, technology team leader for Eastman
Kodak Inc., in Victor, N.Y. "That seems to cut right into Digital [Equipment
Corp.]'s Alpha strategy."

Intel is also developing two distinct versions of the P7, the successor to the
P6 family, observers said. The first P7, which Intel is developing without
assistance from technology partner Hewlett-Packard Co., will be a 64-bit chip
designed for backward compatibility with the X86 family, sources said. That chip
is expected to ship in 1997.

The second 64-bit P7 chip, being jointly developed with HP, of Palo Alto,
Calif., will combine RISC architecture with a still-experimental technology
known as VLIW (very long instruction word), sources said. VLIW will probably not
become available until at least 1997 or 1998.

Some observers questioned whether both of the P7 designs will ever ship.

"I think they will hedge," said Mike Feibus, an analyst with Mercury Research,
in Scottsdale, Ariz., explaining that Intel will likely select the best of the
two technologies for a single chip launch. "There will only be one P7," said
Feibus, "but it still remains to be seen which is the P7."

Intel officials declined to comment.


Copyright (c) 1995 Ziff-Davis Publishing Company. All rights reserved.
Reproduction in whole or in part in any form or medium without express written
permission of Ziff-Davis Publishing Company is prohibited. PC Week and the PC
Week logo are trademarks of Ziff-Davis Publishing Company. PC Week Online and
the PC Week Online logo are trademarks of Ziff-Davis Publishing Company.


   Starder

--- IBM OS/2 Dialog Editor
 * Origin:  (C) Starder. From Home (%TEMP%)  (2:5020/23)

— SU.HARDW.PC.CPU (2:5020/299)  —————————————————————————————— SU.HARDW.PC.CPU —
 From : Alex Iliynsky                       2:5020/23       Wed 20 Sep 95 03:36 
 Subj : The P6 gets a name                                                      
————————————————————————————————————————————————————————————————————————————————
Hi All!


The P6 gets a name

September 19, 1995

By Charles Cooper

Now we know: it won't be the Sextium.

Intel Corp. today announced that it had chosen the name Pentium Pro processor
for its upcoming P6 chip.

"It's a good idea. We expected they would stick with the Pentium name," said
Linley Gwennap, the editor in chief of Microprocessor Report, in Sebastopol,
Calif. "They've invested hundreds of millions of dollars in building consumer
and business identification with the name."

And indeed, that was the gist of the marketing strategy behind Intel's decision
to go with the new name.

"We've built up tremendous equity in the market -- both with consumers and
business -- in Pentium," said Carl Everett, senior vice president and general
manager of desktop products at the Santa Clara, Calif., company. "Using that
equity which has been built up there is very important. The suffix that we've
used here simply designates something different."

Intel does not usually disclose how much it spends on advertising. In the only
break from that policy, Intel last year said it would spend $150 million in
marketing the Pentium in 1994. However, analysts believe the sum total that
Intel has invested in promoting the Pentium could exceed hundreds of millions of
dollars.

Everett declined to say whether Intel planned to continue using this naming
convention for future microprocessor generations. However, he did seem to
indicate that Intel was considering something along those lines.

"This naming convention gives me the flexibility to do whatever I believe is
correct," Everett said.

He said that the decision to go with the Pentium Pro processors "was primarily
a management decision ... All the cooks contributed to this soup."

The first advertisements from Intel promoting the new microprocessor are
expected to begin after the chip officially debuts in the fourth quarter.

He said that Intel never considered using the P6 designation beyond the
product's testing period.

"Processor-code names were never intended to be code names," Everett said,
predicting that Pentium Pro processor "will soon become words on everyone's
lips."

Intel, which plans to introduce the chip sometime during the fourth quarter,
said the Pro processor will be targeted at "workstation and high-end desktop
systems, as well as cost-effective servers."

Meanwhile, Intel's chief rival, Advanced Micro Devices Inc., downplayed the
significance of the announcement.

"It's not surprising, considering that they spent so much time and money
trying to brand the word Pentium," said an AMD spokesman. "We expected they
would try and leverage off the Pentium name. With the Pro designator, I take it
to mean that they are pretty clearly trying to identify this as a processor for
a high-end machine."


   Starder

--- IBM OS/2 Dialog Editor
 * Origin:  (C) Starder. From Home (%TEMP%)  (2:5020/23)