Power ISA™ Version 2.05

October 23, 2007

© Copyright International Business Machines Corporation, 1994, 2007. All rights reserved.

The following paragraph does not apply to the United Kingdom or any country or state where such provisions are inconsistent with local law.

The specifications in this manual are subject to change without notice. This manual is provided "AS IS". International Business Machines Corp. makes no warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose.

International Business Machines Corp. does not warrant that the contents of this publication or the accompanying source code examples, whether individually or as one or more groups, will meet your requirements or that the publication or the accompanying source code examples are error-free.

This publication could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication.

Address comments to IBM Corporation, 11400 Burnett Road, Austin, Texas 78758-3493. IBM may use or distribute whatever information you supply in any way it believes appropriate without incurring any obligation to you.

The following terms are trademarks of the International Business Machines Corporation in the United States and/or other countries: IBM®, Power ISA, PowerPC®, Power Architecture, PowerPC Architecture, Power Family, RISC/System 6000®, POWER, POWER2, POWER4, POWER4+, POWER5, System/370, System z.

The POWER ARCHITECTURE and POWER.ORG word marks and the Power and Power.org logos and related marks are trademarks and service marks licensed by Power.org.

AltiVec is a trademark of Freescale Semiconductor, Inc. used under license.

Notice to U.S.
Government Users--Documentation Related to Restricted Rights--Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corporation.

Preface

The roots of the Power ISA (Instruction Set Architecture) extend back over a quarter of a century, to IBM Research. The POWER (Performance Optimization With Enhanced RISC) Architecture was introduced with the RISC System/6000 product family in early 1990. In 1991, Apple, IBM, and Motorola began the collaboration to evolve to the PowerPC Architecture, expanding the architecture's applicability. In 1997, Motorola and IBM began another collaboration, focused on optimizing PowerPC for embedded systems, which produced Book E.

In 2006, Freescale and IBM collaborated on the creation of the Power ISA Version 2.03, which represented the reunification of the architecture by combining Book E content with the more general purpose PowerPC Version 2.02. A significant benefit of the reunification is the establishment of a single, compatible, 64-bit programming model. The combining also extends explicit architectural endorsement and control to Auxiliary Processing Units (APUs), units of function that were originally developed as implementation- or product family-specific extensions in the context of the Book E allocated opcode space. With the resulting architectural superset comes a framework that clearly establishes requirements and identifies options.

To a very large extent, application program compatibility has been maintained throughout the history of the architecture, with the main exception being application exploitation of APUs. The framework identifies the base, pervasive, part of the architecture, and differentiates it from "categories" of optional function (see Section 1.3.5 of Book I). Because of the substantial differences in the supervisor (privileged) architecture that developed as Book E was optimized for embedded systems, the supervisor architectures for embedded and general purpose implementations are represented as mutually exclusive categories. Future versions of the architecture will seek to converge on a common solution where possible.

This document defines the Power ISA version 2.05. It comprises five books and a set of appendices.

Book I, Power ISA User Instruction Set Architecture, covers the base instruction set and related facilities available to the application programmer. It includes five chapters derived from APU function, including the vector extension also known as AltiVec.

Book II, Power ISA Virtual Environment Architecture, defines the storage model and related instructions and facilities available to the application programmer.

Book III-S, Power ISA Operating Environment Architecture, defines the supervisor instructions and related facilities used for general purpose implementations.

Book III-E, Power ISA Operating Environment Architecture, defines the supervisor instructions and related facilities used for embedded implementations. It was derived from Book E and extended to include APU function.

Book VLE, Power ISA Variable Length Encoded Instructions Architecture, defines alternative instruction encodings and definitions intended to increase instruction density for very low end implementations. It was derived from an APU description developed by Freescale Semiconductor.

As used in this document, the term "Power ISA" refers to the instructions and facilities described in Books I, II, III-S, III-E, and VLE.

Usage of the phrase "Book III" refers to both Book III-S and Book III-E. An exception to this rule is when, at the beginning of a Section or Book, it is specified that usage of the phrase "Book III" implies only either "Book III-S" or "Book III-E".

Change bars have been included to indicate changes from the Power ISA Version 2.04.
Preface iii Version 2.05 Summary of Changes in Power ISA 2.05 The PowerISA was created by applying the following Parity Instructions: Two new instructions are added requests for change(RFCs) to Power ISA version 2.04. that compute the parity on a word and a doubleword; see Section 3.3.12 of Book I. Power Management Architecture: Four new hypervi- sor-level instructions are added that put the processor Compare Byte Instruction: A new instruction is added into power-saving modes in which execution is sus- that compares each byte of a register to a byte-sized pended and power consumption is reduced to varying token; see Section 3.3.12 of Book I. degrees; see Chapter 3.3.2 of Book III-S. Come-From Address Register: A new hypervisor- Decimal Floating-Point: Decimal Floating-Point (DFP) accessible register is added that is set to the effective support is added as a category to the architecture; see address of the rfid instruction upon execution of the Chapter 5 of Book I - III. instruction. When a Branch instruction is executed and the branch is taken, the register is set to the effective PCR (Program Compatibility Register): A hypervisor- address of an instruction in the instruction cache block accessible register is added that controls the availabil- containing the Branch instruction; see Section 8.1.1 of ity of processor resources not available on implementa- Book III-S. tions of previous versions of the architecture; see Section 2.6 of Book III-S. Floating-Point Estimate: Additional language is added to allow for implementations that provide higher than The next 2 RFCs facilitate Decimal Floating Point emu- the minimum architected precision. (To cover current lation, as an alternative to supporting the instructions implementations, a new variant is introduced as cate- described in RFC02080 in hardware. 
gory: Phased-Out that allows denormalized operands Binary Coded Decimal Assist (BCD) Instructions: to be treated as 0, however this variant will be removed Three new hypervisor-level instructions are aded that in the next revision.) See Section 4.6.6.1 of Book I. operate on Binary Coded Decimal operands; see Load/Store Floating-Point Double Pair: New instruc- Section 4.4.3 of Book III-S. tions are added that transfer pairs of doublewords Hypervisor Emulation Assistance: Illegal Instruction between adjacent locations in memory and adjacent type Program interrupts are routed to a new interrupt floating-point registers. These instructions are catego- (Hypervisor Emulation Assistance interrupt), which rized as category: Phased-Out so that software does goes to the hypervisor, and the instruction image is not develop a dependency on them. See Section 4.6.1 copied into a new register (HEIR); see Section 6.5.19 of Book I. of Book III-S. Miscellaneous Changes for V 2.05: Miscellaneous pri- Changes to mtspr and mfspr: The behavior of the marily editorial enhancements. mtspr and mfspr instructions is defined for when the Disable Secondary Page Table Search: A new field is specified SPR is inaccessable due to the privilege added to the LPCR to disable the secondary hash func- level; see Section 4.4.5 of Book III-S. tion during a page table search; see Section 5.7.7.3 of Load Floating-Point as Integer Word: A new instruction Book III-S. is added that loads the specified word into the low- SLB Find Entry ESID Instruction: A new instruction is order half of an FPR, and propagates the sign bit to fill added that searches the SLB for an entry that matches the high-order half of the FPR; see Section 4.6.2 of the ESID specified by a GPR operand; see Book I. Section 5.9.3.1 of Book III-S. 
FPSCR extended to 64 bits: The FPSCR is extended Executed no-op Instruction: The instruction xori 0,0,0 to 64 bits to accomodate an anticipated need for more is designated as a form of no-op that is excluded from floating-point status and control bits. The mffs, mtfsfi, run-time optimizations related to no-ops; see and mtfsf instructions are extended to provide a Section 3.3.12 of Book I. means of managing the extended FPSCR; see Section 4.2.2 of Book I. Relaxed Page Table Alignment: The option is provided to align the Page Table at any 218-byte boundary Floating-Point Copy Sign: A new instruction is added instead of at a boundary that is a multiple of its size; that combines the sign from one register with the rest of see Section 5.7.7.4 of Book III-S. the floating-point number in another register, and the result is placed in the target register. This instruction Data Cache Block Flush Local Primary: A new variant can be used for building a floating-point number effi- of the dcbf instruction is added that flushes the speci- ciently (without bitwise manipulation); see Section 4.6.5 fied block from the local primary cache, but not from of Book I. lower level caches; see Section 3.3.2 of Book II. Reserved no-op: Book E reserved-no-op instructions are added to Power ISA as category: Phased-In. These iv Power ISATM Version 2.05 instructions are intended to be redefined as perfor- age location is Caching Inhibited and Guarded; see mance hint type instructions in the future while treated Section 4.4.1 of Book III-S. as no-ops in earlier processors; see Section 1.8.3 of Hypervisor Maintenance Interrupt: A new type of inter- Book I. rupt is added that is caused by certain conditions in the Stream Prefetching Extensions: The ability to specify a hardware requiring the attention of the Hypervisor but default prefetch depth for hardware-detected and soft- that are not serious enough to require a Machine ware-specified streams is added. 
Also, software-speci- Check; see Section 6.5.20 of Book III-S. fication of store data streams is introduced; see Mediated External Interrupt: A new type of External Section 3.3.2 of Book II. exception is added, called the "Mediated External Mutex Hint: A hint specification is added to the Load exception". The currently defined External exception is And Reserve instructions to indicate the type of mutual renamed to be a "Direct External exception". A new bit, exclusion algorithm represented by the corresponding called the "Mediated External Exception Request" sequence of instructions; see Section 3.4.2 of Book II. (MER) bit, is added to the LPCR, to indicate that a Mediated External exception is requested; see Enhanced Lookaside Buffer ManagementHint bits are Section 6.5.7 of Book III-S. added to the slbia instruction for limiting the invalida- tion of implementation-specific lookaside information Scaled Processor Utilization of Resources (SPURR): A (e.g. ERAT); see Section 5.9.3.1 of Book III-S. new SPR is added that measures the fraction of hard- ware resources used by a processor (as does the Caching Inhibited Load/Store Instructions (Hypervisor PURR), but takes into account changes in processing Only): The RMI bit in the LPCR is redefined as capacity made to help manage the thermal environ- reserved. Eight new hypervisor-level instructions are ment; see Section 7.6 of Book III-S. added to replace the function previously provided by the RMI bit: The storage accesses caused by the new Version Verification instructions are performed as though the specified stor- 1 See the Power ISA representative for your com- pany. Preface v Version 2.05 vi Power ISATM Version 2.05 Table of Contents 1.6.8 XL-FORM . . . . . . . . . . . . . . . . . . . 16 1.6.9 XFX-FORM . . . . . . . . . . . . . . . . . 16 1.6.10 XFL-FORM. . . . . . . . . . . . . . . . . 16 1.6.11 XS-FORM . . . . . . . . . . . . . . . . . . 17 1.6.12 XO-FORM . . . . . . . . . . . . . . . . . 17 Preface. . . . . . . 
. . . . . . . . . . . . . . . . . . iii 1.6.13 A-FORM . . . . . . . . . . . . . . . . . . . 17 Summary of Changes in Power ISA 2.05 1.6.14 M-FORM . . . . . . . . . . . . . . . . . . 17 iv 1.6.15 MD-FORM . . . . . . . . . . . . . . . . . 17 1.6.16 MDS-FORM . . . . . . . . . . . . . . . . 17 Table of Contents . . . . . . . . . . . . . . . vii 1.6.17 VA-FORM . . . . . . . . . . . . . . . . . . 17 1.6.18 VC-FORM . . . . . . . . . . . . . . . . . 17 1.6.19 VX-FORM. . . . . . . . . . . . . . . . . . 17 Figures. . . . . . . . . . . . . . . . . . . . . . . xxiii 1.6.20 EVX-FORM . . . . . . . . . . . . . . . . 17 1.6.21 EVS-FORM . . . . . . . . . . . . . . . . 17 Book I: 1.6.22 Z22-FORM . . . . . . . . . . . . . . . . . 18 1.6.23 Z23-FORM . . . . . . . . . . . . . . . . . 18 Power ISA User Instruction Set 1.6.24 Instruction Fields . . . . . . . . . . . . 18 1.7 Classes of Instructions . . . . . . . . . . 21 Architecture. . . . . . . . . . . . . . . . . . . . 1 1.7.1 Defined Instruction Class . . . . . . . 21 1.7.2 Illegal Instruction Class . . . . . . . . 21 Chapter 1. Introduction . . . . . . . . . . 3 1.7.3 Reserved Instruction Class . . . . . 21 1.1 Overview. . . . . . . . . . . . . . . . . . . . . . 3 1.8 Forms of Defined Instructions . . . . . 21 1.2 Instruction Mnemonics and Operands3 1.8.1 Preferred Instruction Forms . . . . . 21 1.3 Document Conventions . . . . . . . . . . 4 1.8.2 Invalid Instruction Forms . . . . . . . 21 1.3.1 Definitions . . . . . . . . . . . . . . . . . . . 4 1.8.3 Reserved-no-op Instructions [Cate- 1.3.2 Notation . . . . . . . . . . . . . . . . . . . . . 4 gory: Phased-In (sV2.07)] . . . . . . . . . . . 22 1.3.3 Reserved Fields and Reserved Val- 1.9 Exceptions. . . . . . . . . . . . . . . . . . . . 22 ues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.10 Storage Addressing. . . . . . . . . . . . 23 1.3.4 Description of Instruction Operation 7 1.10.1 Storage Operands . . . . . . . . . . . 23 1.3.5 Categories . . . . . . . . . . . 
. . . . . . . . 9 1.10.2 Instruction Fetches. . . . . . . . . . . 24 1.3.5.1 Phased-In/Phased-Out . . . . . . . 10 1.10.3 Effective Address Calculation. . . 26 1.3.5.2 Corequisite Category . . . . . . . . 10 1.3.5.3 Category Notation. . . . . . . . . . . 10 Chapter 2. Branch Processor. . . . . 29 1.3.6 Environments. . . . . . . . . . . . . . . . 11 2.1 Branch Processor Overview . . . . . . 29 1.4 Processor Overview . . . . . . . . . . . . 12 2.2 Instruction Execution Order. . . . . . . 29 1.5 Computation modes . . . . . . . . . . . . 14 2.3 Branch Processor Registers . . . . . . 30 1.5.1 Modes [Category: Server] . . . . . . 14 2.3.1 Condition Register . . . . . . . . . . . . 30 1.5.2 Modes [Category: Embedded] . . . 14 2.3.2 Link Register . . . . . . . . . . . . . . . . 31 1.6 Instruction formats . . . . . . . . . . . . . 14 2.3.3 Count Register . . . . . . . . . . . . . . . 31 1.6.1 I-FORM . . . . . . . . . . . . . . . . . . . . 14 2.4 Branch Instructions . . . . . . . . . . . . . 31 1.6.2 B-FORM . . . . . . . . . . . . . . . . . . . 14 2.5 Condition Register Instructions . . . . 37 1.6.3 SC-FORM . . . . . . . . . . . . . . . . . . 15 2.5.1 Condition Register Logical Instruc- 1.6.4 D-FORM . . . . . . . . . . . . . . . . . . . 15 tions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 1.6.5 DS-FORM . . . . . . . . . . . . . . . . . . 15 2.5.2 Condition Register Field Instruction . 1.6.6 DQ-FORM . . . . . . . . . . . . . . . . . . 15 38 1.6.7 X-FORM . . . . . . . . . . . . . . . . . . . 16 2.6 System Call Instruction . . . . . . . . . 39 Table of Contents vii Version 2.05 Chapter 3. Fixed-Point Processor . 41 3.3.14.2 Move To/From System Registers 3.1 Fixed-Point Processor Overview . . .41 [Category: Embedded] . . . . . . . . . . . . . 97 3.2 Fixed-Point Processor Registers . . .42 3.2.1 General Purpose Registers. . . . . .42 Chapter 4. Floating-Point Processor 3.2.2 Fixed-Point Exception Register. . .42 [Category: Floating-Point] . . . . . . . 
99 3.2.3 Program Priority Register [Category: 4.1 Floating-Point Processor Overview 99 Server] . . . . . . . . . . . . . . . . . . . . . . . . . .43 4.2 Floating-Point Processor Registers100 3.2.4 Software Use SPRs [Category: 4.2.1 Floating-Point Registers . . . . . . 100 Embedded] . . . . . . . . . . . . . . . . . . . . . . .43 4.2.2 Floating-Point Status and Control 3.2.5 Device Control Registers Register. . . . . . . . . . . . . . . . . . . . . . . . 101 [Category: Embedded] . . . . . . . . . . . . . .43 4.3 Floating-Point Data . . . . . . . . . . . . 103 3.3 Fixed-Point Processor Instructions .44 4.3.1 Data Format. . . . . . . . . . . . . . . . 103 3.3.1 Fixed-Point Storage Access Instruc- 4.3.2 Value Representation . . . . . . . . 104 tions . . . . . . . . . . . . . . . . . . . . . . . . . . . .44 4.3.3 Sign of Result . . . . . . . . . . . . . . 105 3.3.1.1 Storage Access Exceptions . . . .44 4.3.4 Normalization and 3.3.2 Fixed-Point Load Instructions . . . .44 Denormalization . . . . . . . . . . . . . . . . . 106 3.3.2.1 64-bit Fixed-Point Load Instruc- 4.3.5 Data Handling and Precision . . . 106 tions [Category: 64-Bit] . . . . . . . . . . . . . .49 4.3.5.1 Single-Precision Operands . . . 106 3.3.3 Fixed-Point Store Instructions . . . .51 4.3.5.2 Integer-Valued Operands . . . . 107 3.3.3.1 64-bit Fixed-Point Store Instruc- 4.3.6 Rounding . . . . . . . . . . . . . . . . . . 107 tions [Category: 64-Bit] . . . . . . . . . . . . . .54 4.4 Floating-Point Exceptions . . . . . . . 108 3.3.4 Fixed-Point Load and Store with Byte 4.4.1 Invalid Operation Exception. . . . 110 Reversal Instructions . . . . . . . . . . . . . . .55 4.4.1.1 Definition. . . . . . . . . . . . . . . . . 110 3.3.5 Fixed-Point Load and Store Multiple 4.4.1.2 Action . . . . . . . . . . . . . . . . . . . 110 Instructions . . . . . . . . . . . . . . . . . . . . . . .56 4.4.2 Zero Divide Exception . . . . . . . . .111 3.3.6 Fixed-Point Move Assist Instructions 4.4.2.1 Definition. . . . . . . . . . . 
. . . . . . .111 [Category: Move Assist] . . . . . . . . . . . . .58 4.4.2.2 Action . . . . . . . . . . . . . . . . . . . .111 3.3.7 Other Fixed-Point Instructions. . . .61 4.4.3 Overflow Exception . . . . . . . . . . .111 3.3.8 Fixed-Point Arithmetic Instructions62 4.4.3.1 Definition. . . . . . . . . . . . . . . . . .111 3.3.8.1 64-bit Fixed-Point Arithmetic 4.4.3.2 Action . . . . . . . . . . . . . . . . . . . .111 Instructions [Category: 64-Bit] . . . . . . . .69 4.4.4 Underflow Exception . . . . . . . . . 112 3.3.9 Fixed-Point Compare Instructions.71 4.4.4.1 Definition. . . . . . . . . . . . . . . . . 112 3.3.10 Fixed-Point Trap Instructions . . .73 4.4.4.2 Action . . . . . . . . . . . . . . . . . . . 112 3.3.10.1 64-bit Fixed-Point Trap Instruc- 4.4.5 Inexact Exception . . . . . . . . . . . 113 tions [Category: 64-Bit] . . . . . . . . . . . . . .74 4.4.5.1 Definition. . . . . . . . . . . . . . . . . 113 3.3.11 Fixed-Point Select [Category: 4.4.5.2 Action . . . . . . . . . . . . . . . . . . . 113 Phased-In (sV2.06)] . . . . . . . . . . . . . . . .74 4.5 Floating-Point Execution Models . 113 3.3.12 Fixed-Point Logical Instructions .75 4.5.1 Execution Model for IEEE Opera- 3.3.12.1 64-bit Fixed-Point Logical Instruc- tions . . . . . . . . . . . . . . . . . . . . . . . . . . 113 tions [Category: 64-Bit] . . . . . . . . . . . . . .81 4.5.2 Execution Model for 3.3.12.2 Phased-In Fixed-Point Logical Multiply-Add Type Instructions . . . . . . 115 Instructions [Category: Phased-In 4.6 Floating-Point Processor Instructions . (sV2.05)] . . . . . . . . . . . . . . . . . . . . . . . . .81 116 3.3.13 Fixed-Point Rotate and Shift 4.6.1 Floating-Point Storage Access Instructions . . . . . . . . . . . . . . . . . . . . . . .82 Instructions . . . . . . . . . . . . . . . . . . . . . 117 3.3.13.1 Fixed-Point Rotate Instructions 82 4.6.1.1 Storage Access Exceptions . . 
117 3.3.13.1.1 64-bit Fixed-Point Rotate 4.6.2 Floating-Point Load Instructions 117 Instructions [Category: 64-Bit] . . . . . . . .85 4.6.3 Floating-Point Store Instructions 121 3.3.13.2 Fixed-Point Shift Instructions . .88 4.6.4 Floating-Point Load Store Double- 3.3.13.2.1 64-bit Fixed-Point Shift Instruc- word Pair Instructions [Category: Floating- tions [Category: 64-Bit] . . . . . . . . . . . . . .90 Point.Phased-Out] . . . . . . . . . . . . . . . . 125 3.3.14 Move To/From System Register 4.6.5 Floating-Point Move Instructions 126 Instructions . . . . . . . . . . . . . . . . . . . . . . .92 4.6.6 Floating-Point Arithmetic Instructions 3.3.14.1 Move to/From One Condition 127 Register Field Instructions [Category: 4.6.6.1 Floating-Point Elementary Arith- Phased-In (sV2.05)] . . . . . . . . . . . . . . . .96 metic Instructions . . . . . . . . . . . . . . . . 127 viii Power ISATM I-III, VLE Version 2.05 4.6.6.2 Floating-Point Multiply-Add Instruc- 5.5.11 Summary of Normal Rounding And tions . . . . . . . . . . . . . . . . . . . . . . . . . . 132 Range Actions . . . . . . . . . . . . . . . . . . . 160 4.6.7 Floating-Point Rounding and Con- 5.6 DFP Instruction Descriptions. . . . . 162 version Instructions . . . . . . . . . . . . . . . 134 5.6.1 DFP Arithmetic Instructions . . . . 163 4.6.7.1 Floating-Point Rounding Instruc- 5.6.2 DFP Compare Instructions. . . . . 167 tion . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 5.6.3 DFP Test Instructions. . . . . . . . . 170 4.6.7.2 Floating-Point Convert To/From 5.6.4 DFP Quantum Adjustment Instruc- Integer Instructions . . . . . . . . . . . . . . . 134 tions . . . . . . . . . . . . . . . . . . . . . . . . . . . 173 4.6.7.3 Floating Round to Integer Instruc- 5.6.5 DFP Conversion Instructions . . . 182 tions [Category: Floating-Point.Phased-In 5.6.5.1 DFP Data-Format Conversion (sV2.05)] . . . . . . . . . . . . . . . . . . . . . . . 136 Instructions . . . . . . . . . . . . . . . . . . . . . 
182 4.6.8 Floating-Point Compare Instructions 5.6.5.2 DFP Data-Type Conversion 138 Instructions . . . . . . . . . . . . . . . . . . . . . 185 4.6.9 Floating-Point Select Instruction 139 5.6.6 DFP Format Instructions . . . . . . 187 4.6.10 Floating-Point Status and Control 5.6.7 DFP Instruction Summary . . . . . 191 Register Instructions . . . . . . . . . . . . . . 140 Chapter 6. Vector Processor Chapter 5. Decimal Floating-Point [Category: Vector] . . . . . . . . . . . . . 193 [Category: Decimal Floating-Point]. . 6.1 Vector Processor Overview. . . . . . 194 143 6.2 Chapter Conventions . . . . . . . . . . 194 5.1 Decimal Floating-Point (DFP) Proces- 6.2.1 Description of Instruction Operation. sor Overview . . . . . . . . . . . . . . . . . . . . 143 194 5.2 DFP Register Handling . . . . . . . . . 144 6.3 Vector Processor Registers . . . . . 195 5.2.1 DFP Usage of Floating-Point Regis- 6.3.1 Vector Registers. . . . . . . . . . . . . 195 ters . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 6.3.2 Vector Status and Control Register . 5.3 DFP Support for Non-DFP Data Types 195 146 6.3.3 VR Save Register. . . . . . . . . . . . 196 5.4 DFP Number Representation . . . . 147 6.4 Vector Storage Access Operations 196 5.4.1 DFP Data Format . . . . . . . . . . . 148 6.4.1 Accessing Unaligned Storage Oper- 5.4.1.1 Fields Within the Data Format 148 ands. . . . . . . . . . . . . . . . . . . . . . . . . . . 198 5.4.1.2 Summary of DFP Data Formats . . 6.5 Vector Integer Operations . . . . . . . 199 149 6.5.1 Integer Saturation. . . . . . . . . . . . 199 5.4.1.3 Preferred DPD Encoding . . . . 149 6.6 Vector Floating-Point Operations . 200 5.4.2 Classes of DFP Data . . . . . . . . . 149 6.6.1 Floating-Point Overview . . . . . . . 200 5.5 DFP Execution Model . . . . . . . . . . 150 6.6.2 Floating-Point Exceptions . . . . . 200 5.5.1 Rounding . . . . . . . . . . . . . . . . . . 150 6.6.2.1 NaN Operand Exception . . . . . 201 5.5.2 Rounding Mode Specification . . 
151 6.6.2.2 Invalid Operation Exception . . 201 5.5.3 Formation of Final Result. . . . . . 152 6.6.2.3 Zero Divide Exception . . . . . . . 201 5.5.3.1 Use of Ideal Exponent . . . . . . 152 6.6.2.4 Log of Zero Exception . . . . . . . 201 5.5.4 Arithmetic Operations . . . . . . . . 152 6.6.2.5 Overflow Exception . . . . . . . . . 201 5.5.4.1 Sign of Arithmetic Result . . . . 152 6.6.2.6 Underflow Exception . . . . . . . . 202 5.5.5 Compare Operations . . . . . . . . . 153 6.7 Vector Storage Access Instructions . . 5.5.6 Test Operations . . . . . . . . . . . . . 153 202 5.5.7 Quantum Adjustment Operations 153 6.7.1 Storage Access Exceptions . . . . 202 5.5.8 Conversion Operations . . . . . . . 153 6.7.2 Vector Load Instructions . . . . . . 203 5.5.8.1 Data-Format Conversion . . . . 153 6.7.3 Vector Store Instructions . . . . . . 206 5.5.8.2 Data-Type Conversion . . . . . . 154 6.7.4 Vector Alignment Support Instruc- 5.5.9 Format Operations. . . . . . . . . . . 154 tions . . . . . . . . . . . . . . . . . . . . . . . . . . . 208 5.5.10 DFP Exceptions . . . . . . . . . . . . 154 6.8 Vector Permute and Formatting 5.5.10.1 Invalid Operation Exception . 156 Instructions . . . . . . . . . . . . . . . . . . . . . 209 5.5.10.2 Zero Divide Exception . . . . . 157 6.8.1 Vector Pack and Unpack Instructions 5.5.10.3 Overflow Exception. . . . . . . . 157 209 5.5.10.4 Underflow Exception. . . . . . . 158 6.8.2 Vector Merge Instructions . . . . . 214 5.5.10.5 Inexact Exception . . . . . . . . . 159 6.8.3 Vector Splat Instructions . . . . . . 216 6.8.4 Vector Permute Instruction. . . . . 217 6.8.5 Vector Select Instruction . . . . . . 217 Table of Contents ix Version 2.05 6.8.6 Vector Shift Instructions . . . . . . .218 7.3.8 Saturation, Shift, and Bit Reverse 6.9 Vector Integer Instructions . . . . . . .220 Models . . . . . . . . . . . . . . . . . . . . . . . . 267 6.9.1 Vector Integer Arithmetic Instructions 7.3.8.1 Saturation . . . . . . . . . . . . . . . . 
267 220 7.3.8.2 Shift Left . . . . . . . . . . . . . . . . . 267 6.9.1.1 Vector Integer Add Instructions 220 7.3.8.3 Bit Reverse . . . . . . . . . . . . . . . 267 6.9.1.2 Vector Integer Subtract Instruc- 7.3.9 SPE Instruction Set . . . . . . . . . . 268 tions . . . . . . . . . . . . . . . . . . . . . . . . . . .223 6.9.1.3 Vector Integer Multiply Instructions Chapter 8. Embedded Floating-Point 226 [Category: SPE.Embedded Float Scal 6.9.1.4 Vector Integer Multiply-Add/Sum Instructions . . . . . . . . . . . . . . . . . . . . . .228 ar Double] 6.9.1.5 Vector Integer Sum-Across Instruc- [Category: SPE.Embedded Float Scal tions . . . . . . . . . . . . . . . . . . . . . . . . . . .233 ar Single] 6.9.1.6 Vector Integer Average Instruc- [Category: SPE.Embedded Float Vect tions . . . . . . . . . . . . . . . . . . . . . . . . . . .235 6.9.1.7 Vector Integer Maximum and Mini- or]. . . . . . . . . . . . . . . . . . . . . . . . . . 315 mum Instructions . . . . . . . . . . . . . . . . .237 8.1 Overview. . . . . . . . . . . . . . . . . . . . 315 6.9.2 Vector Integer Compare Instructions 8.2 Programming Model . . . . . . . . . . . 316 241 8.2.1 Signal Processing Embedded Float- 6.9.3 Vector Logical Instructions . . . . .244 ing-Point Status and Control Register 6.9.4 Vector Integer Rotate and Shift (SPEFSCR). . . . . . . . . . . . . . . . . . . . . 316 Instructions . . . . . . . . . . . . . . . . . . . . . .245 8.2.2 Floating-Point Data Formats . . . 316 6.10 Vector Floating-Point Instruction Set . 8.2.3 Exception Conditions . . . . . . . . . 317 249 8.2.3.1 Denormalized Values on Input 317 6.10.1 Vector Floating-Point Arithmetic 8.2.3.2 Embedded Floating-Point Overflow Instructions . . . . . . . . . . . . . . . . . . . . . .249 and Underflow . . . . . . . . . . . . . . . . . . . 317 6.10.2 Vector Floating-Point Maximum and 8.2.3.3 Embedded Floating-Point Invalid Minimum Instructions . . . . . . . . . . . . . .251 Operation/Input Errors . . . . . . . . . . . . 
317 6.10.3 Vector Floating-Point Rounding and 8.2.3.4 Embedded Floating-Point Round Conversion Instructions . . . . . . . . . . . .252 (Inexact) . . . . . . . . . . . . . . . . . . . . . . . 317 6.10.4 Vector Floating-Point Compare 8.2.3.5 Embedded Floating-Point Divide Instructions . . . . . . . . . . . . . . . . . . . . . .255 by Zero . . . . . . . . . . . . . . . . . . . . . . . . 317 6.10.5 Vector Floating-Point Estimate 8.2.3.6 Default Results . . . . . . . . . . . . 318 Instructions . . . . . . . . . . . . . . . . . . . . . .257 8.2.4 IEEE 754 Compliance . . . . . . . . 318 6.11 Vector Status and Control Register 8.2.4.1 Sticky Bit Handling For Exception Instructions . . . . . . . . . . . . . . . . . . . . . .259 Conditions . . . . . . . . . . . . . . . . . . . . . . 318 8.3 Embedded Floating-Point Instructions Chapter 7. Signal Processing Engine 319 8.3.1 Load/Store Instructions . . . . . . . 319 (SPE) 8.3.2 SPE.Embedded Float Vector Instruc- [Category: Signal Processing Engine tions [Category: SPE.Embedded Float ] . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261 Vector]. . . . . . . . . . . . . . . . . . . . . . . . . 319 7.1 Overview . . . . . . . . . . . . . . . . . . . .261 8.3.3 SPE.Embedded Float Scalar Single 7.2 Nomenclature and Conventions. . .261 Instructions 7.3 Programming Model. . . . . . . . . . . .262 [Category: SPE.Embedded Float Scalar 7.3.1 General Operation. . . . . . . . . . . .262 Single] . . . . . . . . . . . . . . . . . . . . . . . . . 328 7.3.2 GPR Registers . . . . . . . . . . . . . .262 8.3.4 SPE.Embedded Float Scalar Double 7.3.3 Accumulator Register . . . . . . . . .262 Instructions 7.3.4 Signal Processing Embedded Float- [Category: SPE.Embedded Float Scalar ing-Point Status and Control Register Double] . . . . . . . . . . . . . . . . . . . . . . . . 335 (SPEFSCR) . . . . . . . . . . . . . . . . . . . . .262 8.4 Embedded Floating-Point Results 7.3.5 Data Formats . . . . . . . . . . . . . . .265 Summary. . . . . . . . . 
. . . . . . . . . . . . . . 344 7.3.5.1 Integer Format . . . . . . . . . . . . .265 7.3.5.2 Fractional Format . . . . . . . . . . .265 7.3.6 Computational Operations . . . . .266 7.3.7 SPE Instructions . . . . . . . . . . . . .267 x Power ISATM I-III, VLE Version 2.05 Chapter 9. Legacy Move Assist C.5 Convert to Single-Precision Embed- Instruction [Category: Legacy Move ded Floating-Point from Integer Word . 381 C.6 Convert to Double-Precision Embed- Assist] . . . . . . . . . . . . . . . . . . . . . . 349 ded Floating-Point from Integer Word . 381 C.7 Convert to Double-Precision Embed- Chapter 10. Legacy Integer Multiply- ded Floating-Point from Integer Double- Accumulate Instructions word. . . . . . . . . . . . . . . . . . . . . . . . . . . 382 [Category: Legacy Integer Multiply- Accumulate] . . . . . . . . . . . . . . . . . 351 Appendix D. Assembler Extended Mnemonics . . . . . . . . . . . . . . . . . . . 383 Appendix A. Suggested Floating- D.1 Symbols . . . . . . . . . . . . . . . . . . . . 383 D.2 Branch Mnemonics . . . . . . . . . . . . 384 Point Models [Category: Floating- D.2.1 BO and BI Fields . . . . . . . . . . . . 384 Point] . . . . . . . . . . . . . . . . . . . . . . . 361 D.2.2 Simple Branch Mnemonics . . . . 384 A.1 Floating-Point Round to Single-Preci- D.2.3 Branch Mnemonics Incorporating sion Model. . . . . . . . . . . . . . . . . . . . . . 361 Conditions . . . . . . . . . . . . . . . . . . . . . . 385 A.2 Floating-Point Convert to Integer D.2.4 Branch Prediction . . . . . . . . . . . 386 Model . . . . . . . . . . . . . . . . . . . . . . . . . 365 D.3 Condition Register Logical Mnemonics A.3 Floating-Point Convert from Integer 387 Model . . . . . . . . . . . . . . . . . . . . . . . . . 368 D.4 Subtract Mnemonics . . . . . . . . . . . 387 A.4 Floating-Point Round to Integer Model D.4.1 Subtract Immediate . . . . . . . . . . 387 369 D.4.2 Subtract . . . . . . . . . . . . . . . . . . . 387 D.5 Compare Mnemonics . . . . . . . . . . 388 Appendix A. 
Densely Packed D.5.1 Doubleword Comparisons . . . . . 388 D.5.2 Word Comparisons . . . . . . . . . . 388 Decimal . . . . . . . . . . . . . . . . . . . . . 371 D.6 Trap Mnemonics . . . . . . . . . . . . . . 389 A.1 BCD-to-DPD Translation . . . . . . . 371 D.7 Rotate and Shift Mnemonics . . . . 391 A.2 DPD-to-BCD Translation . . . . . . . 371 D.7.1 Operations on Doublewords . . . 391 A.3 Preferred DPD encoding . . . . . . . 372 D.7.2 Operations on Words. . . . . . . . . 392 D.8 Move To/From Special Purpose Reg- Appendix B. Vector RTL Functions ister Mnemonics . . . . . . . . . . . . . . . . . 393 [Category: Vector]. . . . . . . . . . . . . 375 D.9 Miscellaneous Mnemonics . . . . . . 393 Appendix C. Embedded Floating- Appendix E. Programming Examples Point RTL Functions 397 E.1 Multiple-Precision Shifts . . . . . . . . 397 E.2 Floating-Point Conversions [Category: [Category: SPE.Embedded Float Floating-Point] . . . . . . . . . . . . . . . . . . . 400 Scalar Double] E.2.1 Conversion from [Category: SPE.Embedded Float Floating-Point Number to Scalar Single] Floating-Point Integer . . . . . . . . . . . . . 400 E.2.2 Conversion from [Category: SPE.Embedded Float Floating-Point Number to Signed Fixed- Vector] . . . . . . . . . . . . . . . . . . . . . . 377 Point Integer Doubleword . . . . . . . . . . 400 C.1 Common Functions . . . . . . . . . . . 377 E.2.3 Conversion from C.2 Convert from Single-Precision Embed- Floating-Point Number to Unsigned Fixed- ded Floating-Point to Integer Word with Point Integer Doubleword . . . . . . . . . . 400 Saturation . . . . . . . . . . . . . . . . . . . . . . 378 E.2.4 Conversion from C.3 Convert from Double-Precision Floating-Point Number to Signed Fixed- Embedded Floating-Point to Integer Word Point Integer Word. . . . . . . . . . . . . . . . 400 with Saturation . . . . . . . . . . . . . . . . . . 
379 E.2.5 Conversion from C.4 Convert from Double-Precision Floating-Point Number to Unsigned Fixed- Embedded Floating-Point to Integer Dou- Point Integer Word. . . . . . . . . . . . . . . . 401 bleword with Saturation. . . . . . . . . . . . 380 E.2.6 Conversion from Signed Fixed-Point Chapter 2. Effect of Operand Integer Doubleword to Floating-Point Num- Placement on Performance . . . . . 421 ber . . . . . . . . . . . . . . . . . . . . . . . . . . . .401 2.1 Instruction Restart . . . . . . . . . . . 423 E.2.7 Conversion from Unsigned Fixed- Point Integer Doubleword to Floating-Point Number. . . . . . . . . . . . . . . . . . . . . . . . .401 Chapter 3. Storage Control E.2.8 Conversion from Signed Fixed-Point Instructions . . . . . . . . . . . . . . . . . . 425 Integer Word to Floating-Point Number 401 3.1 Parameters Useful to Application Pro- E.3 Floating-Point Selection [Category: grams . . . . . . . . . . . . . . . . . . . . . . . . . 425 Floating-Point] . . . . . . . . . . . . . . . . . . .402 3.2 Data Stream Control Register (DSCR) E.3.1 Comparison to Zero . . . . . . . . . .402 [Category: Stream] . . . . . . . . . . . . . . . 426 E.3.2 Minimum and Maximum . . . . . . .402 3.3 Cache Management Instructions . 427 E.3.3 Simple if-then-else 3.3.1 Instruction Cache Instructions . . 428 Constructions . . . . . . . . . . . . . . . . . . . .402 3.3.2 Data Cache Instructions . . . . . . 429 E.3.4 Notes . . . . . . . . . . . . . . . . . . . . .402 3.3.2.1 Obsolete Data Cache Instructions E.4 Vector Unaligned Storage Operations [Category: Vector.Phased-Out] . . . . . . 437 [Category: Vector]. . . . . . . . . . . . . . . . .403 3.4 Synchronization Instructions. . . . . 440 E.4.1 Loading an Unaligned Quadword 3.4.1 Instruction Synchronize Instruction . Using Permute from Big-Endian Storage . . 440 403 3.4.2 Load and Reserve and Store Condi- tional Instructions . . . . . . . . . . . . . . . .
440 Book II: 3.4.2.1 64-Bit Load and Reserve and Store Conditional Instructions [Category: 64-Bit] 444 Power ISA Virtual Environment 3.4.3 Memory Barrier Instructions . . . 446 Architecture . . . . . . . . . . . . . . . . . . 405 3.4.4 Wait Instruction . . . . . . . . . . . . . 449 Chapter 1. Storage Model . . . . . . 407 Chapter 4. Time Base . . . . . . . . . 451 1.1 Definitions . . . . . . . . . . . . . . . . . . .407 4.1 Time Base Overview . . . . . . . . . . . 451 1.2 Introduction . . . . . . . . . . . . . . . . . .408 4.2 Time Base . . . . . . . . . . . . . . . . . . 451 1.3 Virtual Storage . . . . . . . . . . . . . . .408 4.2.1 Time Base Instructions . . . . . . . 451 1.4 Single-copy Atomicity . . . . . . . . . .409 4.3 Alternate Time Base [Category: Alter- 1.5 Cache Model . . . . . . . . . . . . . . . . .409 nate Time Base] . . . . . . . . . . . . . . . . . 454 1.6 Storage Control Attributes . . . . . . .410 1.6.1 Write Through Required . . . . . . .410 Chapter 5. External Control 1.6.2 Caching Inhibited . . . . . . . . . . . . 411 [Category: External Control] . . . 455 1.6.3 Memory Coherence Required [Cate- 5.1 External Access Instructions . . . . 456 gory: Memory Coherence] . . . . . . . . . . 411 1.6.4 Guarded . . . . . . . . . . . . . . . . . . 411 1.6.5 Endianness [Category: Embed- Appendix A. Assembler Extended ded.Little-Endian] . . . . . . . . . . . . . . . . .412 Mnemonics . . . . . . . . . . . . . . . . . . 457 1.6.6 Variable Length Encoded (VLE) A.1 Data Cache Block Flush Mnemonics . Instructions . . . . . . . . . . . . . . . . . . . . . .412 457 1.7 Shared Storage . . . . . . . . . . . . . .413 A.2 Synchronize Mnemonics . . . . . . . 457 1.7.1 Storage Access Ordering . . . . .413 1.7.2 Storage Ordering of I/O Accesses . . Appendix B. Programming Examples 415 1.7.3 Atomic Update. . . . . . . . . . . . . . .415 for Sharing Storage . . . . . . . . . . . 459 1.7.3.1 Reservations . . . . . . . . . . . . .415 B.1 Atomic Update Primitives . . . . . . . 
459 1.7.3.2 Forward Progress . . . . . . . . . .417 B.2 Lock Acquisition and Release, and 1.8 Instruction Storage . . . . . . . . . . . . .417 Related Techniques . . . . . . . . . . . . . . 461 1.8.1 Concurrent Modification and Execu- B.2.1 Lock Acquisition and Import Barriers tion of Instructions . . . . . . . . . . . . . . . .419 461 B.2.1.1 Acquire Lock and Import Shared Storage . . . . . . . . . . . . . . . . . . . . . . . . 461 xii Power ISATM I-III, VLE Version 2.05 B.2.1.2 Obtain Pointer and Import Shared 3.3.2.1 Entering and Exiting Power-Sav- Storage . . . . . . . . . . . . . . . . . . . . . . . . 461 ing Mode . . . . . . . . . . . . . . . . . . . . . . . 485 B.2.2 Lock Release and Export Barriers. . 462 Chapter 4. Fixed-Point Processor 487 B.2.2.1 Export Shared Storage and 4.1 Fixed-Point Processor Overview . . 487 Release Lock . . . . . . . . . . . . . . . . . . . 462 4.2 Special Purpose Registers . . . . . . 487 B.2.2.2 Export Shared Storage and 4.3 Fixed-Point Processor Registers. . 487 Release Lock using lwsync . . . . . . . . . 462 4.3.1 Processor Version Register . . . . 487 B.2.3 Safe Fetch . . . . . . . . . . . . . . . . . 462 4.3.2 Processor Identification Register 487 B.3 List Insertion . . . . . . . . . . . . . . . . . 463 4.3.3 Control Register . . . . . . . . . . . . . 488 B.4 Notes . . . . . . . . . . . . . . . . . . . . . . 463 4.3.4 Program Priority Register. . . . . . 488 4.3.5 Software-use SPRs . . . . . . . . . . 489 Book III-S: 4.4 Fixed-Point Processor Instructions 490 4.4.1 Fixed-Point Load and Store Caching Power ISA Operating Environment Inhibited Instructions . . . . . . . . . . . . . . 490 4.4.2 Fixed-Point Load and Store Quad- Architecture - Server Environment . . word Instructions [Category: Load/Store 465 Quadword] . . . . . . . . . . . . . . . . . . . . . . 493 4.4.3 Binary Coded Decimal (BCD) Assis- Chapter 1. Introduction . . . . . . . . 467 tance Instructions [Category: BCD Assis- 1.1 Overview. . . . . . . . . . . . . . . 
. . . . . 467 tance]. . . . . . . . . . . . . . . . . . . . . . . . . . 494 1.2 Document Conventions . . . . . . . . 467 4.4.4 OR Instruction . . . . . . . . . . . . . . 496 1.2.1 Definitions and Notation. . . . . . . 467 4.4.5 Move To/From System Register 1.2.2 Reserved Fields. . . . . . . . . . . . . 468 Instructions . . . . . . . . . . . . . . . . . . . . . 496 1.3 General Systems Overview . . . . . 468 1.4 Exceptions . . . . . . . . . . . . . . . . . . 469 Chapter 5. Storage Control . . . . . 505 1.5 Synchronization . . . . . . . . . . . . . . 469 5.1 Overview . . . . . . . . . . . . . . . . . . . . 505 1.5.1 Context Synchronization . . . . . . 469 5.2 Storage Exceptions . . . . . . . . . . . . 506 1.5.2 Execution Synchronization . . . . 469 5.3 Instruction Fetch . . . . . . . . . . . . . . 506 5.3.1 Implicit Branch . . . . . . . . . . . . . . 506 Chapter 2. Logical Partitioning 5.3.2 Address Wrapping Combined with (LPAR) . . . . . . . . . . . . . . . . . . . . . . 471 Changing MSR Bit SF . . . . . . . . . . . . . 506 5.4 Data Access . . . . . . . . . . . . . . . . . 506 2.1 Overview. . . . . . . . . . . . . . . . . . . . 471 5.5 Performing Operations 2.2 Logical Partitioning Control Register Out-of-Order . . . . . . . . . . . . . . . . . . . . 506 (LPCR) . . . . . . . . . . . . . . . . . . . . . . . . 471 5.6 Invalid Real Address . . . . . . . . . . . 507 2.3 Real Mode Offset Register (RMOR) . . 5.7 Storage Addressing. . . . . . . . . . . . 508 473 5.7.1 32-Bit Mode . . . . . . . . . . . . . . . . 508 2.4 Hypervisor Real Mode Offset Register 5.7.2 Virtualized Partition Memory (VPM) (HRMOR) . . . . . . . . . . . . . . . . . . . . . . 474 Mode . . . . . . . . . . . . . . . . . . . . . . . . . . 508 2.5 Logical Partition 5.7.3 Real And Virtual Real Addressing Identification Register (LPIDR) . . . . . . 474 Modes . . . . . . . . . . . . . . . . . . . . . . . . . 508 2.6 Processor Compatibility Register 5.7.3.1 Hypervisor Offset Real Mode (PCR) . . . . . . . . . . 
. . . . . . . . . . . . . . . 474 Address . . . . . . . . . . . . . . . . . . . . . . . . 509 2.7 Other Hypervisor Resources . . . . 475 5.7.3.2 Offset Real Mode Address . . . 509 2.8 Sharing Hypervisor Resources . . . 476 5.7.3.3 Storage Control Attributes for 2.9 Hypervisor Interrupt Little-Endian Accesses in Real and Hypervisor Real (HILE) Bit . . . . . . . . . . . . . . . . . . . . . . 476 Addressing Modes. . . . . . . . . . . . . . . . 510 5.7.3.3.1 Hypervisor Real Mode Storage Chapter 3. Branch Processor . . . 477 Control . . . . . . . . . . . . . . . . . . . . . . . . . 510 3.1 Branch Processor Overview . . . . . 477 5.7.3.4 Virtual Real Mode Addressing 3.2 Branch Processor Registers . . . . . 477 Mechanism . . . . . . . . . . . . . . . . . . . . . 510 3.2.1 Machine State Register . . . . . . . 477 5.7.3.5 Storage Control Attributes for 3.3 Branch Processor Instructions . . . 479 Implicit Storage Accesses . . . . . . . . . . 511 3.3.1 System Linkage Instructions . . . 479 5.7.4 Address Ranges Having Defined 3.3.2 Power-Saving Mode Instructions 481 Uses . . . . . . . . . . . . . . . . . . . . . . . . . . 512 Table of Contents xiii Version 2.05 5.7.5 Address Translation Overview . .514 6.2.2 Hypervisor Machine Status Save/ 5.7.6 Virtual Address Generation . . . . .514 Restore Registers . . . . . . . . . . . . . . . . 548 5.7.6.1 Segment Lookaside Buffer (SLB) . 6.2.3 Data Address Register . . . . . . . 548 514 6.2.4 Hypervisor Data Address Register 5.7.6.2 SLB Search . . . . . . . . . . . . . . .515 548 5.7.7 Virtual to Real Translation. . . . . .517 6.2.5 Data Storage Interrupt 5.7.7.1 Page Table. . . . . . . . . . . . . . . .518 Status Register . . . . . . . . . . . . . . . . . . 548 5.7.7.2 Storage Description 6.2.6 Hypervisor Data Storage Interrupt Register 1 . . . . . . . . . . . . . . . . . . . . . . .520 Status Register . . . . . . . . . . . . . . . . . 549 5.7.7.3 Page Table Search . . . . . . . . 
.520 6.2.7 Hypervisor Emulation Instruction 5.7.7.4 Relaxed Page Table Alignment Register [Category: Hypervisor Emula- [Category: Server.Relaxed Page Table tion Assistance] . . . . . . . . . . . . . . . . 549 Alignment]. . . . . . . . . . . . . . . . . . . . . . .522 6.2.8 Hypervisor Maintenance Exception 5.7.8 Reference and Change Recording . . Register. . . . . . . . . . . . . . . . . . . . . . . . 549 522 6.2.9 Hypervisor Maintenance Exception 5.7.9 Storage and Virtual Page Class Key Enable Register . . . . . . . . . . . . . . . . . 549 Protection . . . . . . . . . . . . . . . . . . . . . . .524 6.3 Interrupt Synchronization . . . . . . . 550 5.7.9.1 Virtual Page Class Key Protection . 6.4 Interrupt Classes . . . . . . . . . . . . . 550 524 6.4.1 Precise Interrupt . . . . . . . . . . . . 550 5.7.9.2 Storage Protection, Address 6.4.2 Imprecise Interrupt. . . . . . . . . . . 550 Translation Enabled . . . . . . . . . . . . . . .525 6.4.3 Interrupt Processing . . . . . . . . . 551 5.7.9.3 Storage Protection, Address 6.4.4 Implicit alteration of HSRR0 and Translation Disabled. . . . . . . . . . . . . . .526 HSRR1 . . . . . . . . . . . . . . . . . . . . . . . . 554 5.8 Storage Control Attributes . . . . . . .527 6.5 Interrupt Definitions. . . . . . . . . . . . 555 5.8.1 Guarded Storage. . . . . . . . . . . . .527 6.5.1 System Reset Interrupt . . . . . . . 556 5.8.1.1 Out-of-Order Accesses to Guarded 6.5.2 Machine Check Interrupt . . . . . . 557 Storage . . . . . . . . . . . . . . . . . . . . . . . . .527 6.5.3 Data Storage Interrupt . . . . . . . . 559 5.8.2 Storage Control Bits . . . . . . . . . .527 6.5.4 Data Segment Interrupt . . . . . . . 560 5.8.2.1 Storage Control Bit Restrictions . . . 6.5.5 Instruction Storage Interrupt . . . 560 528 6.5.6 Instruction Segment 5.8.2.2 Altering the Storage Control Bits . . Interrupt. . . . . . . . . . . . . . . . . . . . . . . . 561 528 6.5.7 External Interrupt . . . . . . . . . . . . 561 5.9 Storage Control Instructions. . . . 
. .529 6.5.8 Alignment Interrupt . . . . . . . . . . 562 5.9.1 Cache Management Instructions 529 6.5.9 Program Interrupt . . . . . . . . . . . 563 5.9.2 Synchronize Instruction. . . . . . . .529 6.5.10 Floating-Point Unavailable 5.9.3 Lookaside Buffer Interrupt. . . . . . . . . . . . . . . . . . . . . . . . 564 Management. . . . . . . . . . . . . . . . . . . . .529 6.5.11 Decrementer Interrupt . . . . . . . 565 5.9.3.1 SLB Management Instructions 530 6.5.12 Hypervisor Decrementer 5.9.3.2 Bridge to SLB Architecture [Cate- Interrupt. . . . . . . . . . . . . . . . . . . . . . . . 565 gory:Server.Phased-Out] . . . . . . . . . . .536 6.5.13 System Call Interrupt . . . . . . . . 565 5.9.3.2.1 Segment Register 6.5.14 Trace Interrupt [Category: Trace] . Manipulation Instructions . . . . . . . . . . .536 565 5.9.3.3 TLB Management Instructions .539 6.5.15 Hypervisor Data Storage Inter- 5.10 Page Table Update Synchronization rupt . . . . . . . . . . . . . . . . . . . . . . . . . . . 566 Requirements . . . . . . . . . . . . . . . . . . . .543 6.5.16 Hypervisor Instruction Storage 5.10.1 Page Table Updates . . . . . . . . .543 Interrupt . . . . . . . . . . . . . . . . . . . . . . . 567 5.10.1.1 Adding a Page Table Entry . .544 6.5.17 Hypervisor Data Segment Inter- 5.10.1.2 Modifying a Page Table Entry 545 rupt . . . . . . . . . . . . . . . . . . . . . . . . . . . 567 5.10.1.3 Deleting a Page Table Entry .546 6.5.18 Hypervisor Instruction Segment Interrupt . . . . . . . . . . . . . . . . . . . . . . . 568 Chapter 6. Interrupts . . . . . . . . . . 547 6.5.19 Hypervisor Emulation Assis- 6.1 Overview . . . . . . . . . . . . . . . . . . . .547 tance Interrupt [Category: Hypervisor 6.2 Interrupt Registers . . . . . . . . . . . . .548 Emulation Assistance] . . . . . . . . . . . 568 6.2.1 Machine Status Save/Restore Regis- 6.5.20 Hypervisor Maintenance Interrupt . ters . . . . . . . . . . . . . . . . . . . . . . . . . . . 
.548 568 6.5.21 Performance Monitor B.2.4 Monitor Mode Control Register A 596 Interrupt [Category: Server.Performance B.2.5 Sampled Instruction Address Regis- Monitor] . . . . . . . . . . . . . . . . . . . . . . . . 569 ter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597 6.5.22 Vector Unavailable Interrupt [Cate- B.2.6 Sampled Data Address Register 597 gory: Vector] . . . . . . . . . . . . . . . . . . . . 569 B.3 Performance Monitor 6.6 Partially Executed Interrupt . . . . . . . . . . . . . . . . . . . . . . . . 598 Instructions . . . . . . . . . . . . . . . . . . . . . 570 B.4 Interaction with the Trace Facility . 598 6.7 Exception Ordering . . . . . . . . . . . . 571 6.7.1 Unordered Exceptions . . . . . . . . 571 Appendix C. Example Trace 6.7.2 Ordered Exceptions . . . . . . . . . . 571 Extensions . . . . . . . . . . . . . . . . . . . 599 6.8 Interrupt Priorities . . . . . . . . . . . . . 571 Chapter 7. Timer Facilities. . . . . . 575 Appendix D. Interpretation of the 7.1 Overview. . . . . . . . . . . . . . . . . . . . 575 DSISR as Set by an Alignment 7.2 Time Base (TB) . . . . . . . . . . . . . . 575 Interrupt . . . . . . . . . . . . . . . . . . . . . 601 7.2.1 Writing the Time Base . . . . . . . . 576 7.3 Decrementer . . . . . . . . . . . . . . . . . 576 Appendix E. Programming Examples 7.3.1 Writing and Reading the Decre- 603 menter . . . . . . . . . . . . . . . . . . . . . . . . . 577 7.4 Hypervisor Decrementer. . . . . . . . 577 E.1 Unsigned Single-Precision BCD Arith- 7.5 Processor Utilization of Resources metic . . . . . . . . . . . . . . . . . . . . . . . . . . 603 Register (PURR) . . . . . . . . . . . . . . . . . 578 E.2 Signed Single-Precision BCD Arith- 7.6 Scaled Processor Utilization of metic . . . . . . . . . . . . . . . . . . . . . . . . . . 603 Resources Register (SPURR) . . . . . . . . 578 E.3 Unsigned Extended-Precision BCD Arithmetic. . . . . . . . . . . . . . . . . . . . . . . 604 Chapter 8.
Debug Facilities . . . . 581 Book III-E: 8.1 Overview. . . . . . . . . . . . . . . . . . . . 581 8.1.1 Come-From Address Register . . 581 8.1.2 Data Address Breakpoint. . . . . . 581 Power ISA Operating Environment Architecture - Embedded Chapter 9. External Control Environment . . . . . . . . . . . . . . . . . . 605 [Category: External Control] . . . . 583 9.1 External Access Register . . . . . . . 583 Chapter 1. Introduction. . . . . . . . . 607 9.2 External Access Instructions. . . . . 583 1.1 Overview . . . . . . . . . . . . . . . . . . . . 607 1.2 32-Bit Implementations . . . . . . . . . 607 Chapter 10. Synchronization 1.3 Document Conventions . . . . . . . . . 607 Requirements for Context Alterations 1.3.1 Definitions and Notation . . . . . . . 607 585 1.3.2 Reserved Fields . . . . . . . . . . . . . 608 1.4 General Systems Overview. . . . . . 608 1.5 Exceptions. . . . . . . . . . . . . . . . . . . 608 Appendix A. Assembler Extended 1.6 Synchronization . . . . . . . . . . . . . . 609 Mnemonics . . . . . . . . . . . . . . . . . . 589 1.6.1 Context Synchronization . . . . . . 609 A.1 Move To/From Special Purpose Reg- 1.6.2 Execution Synchronization. . . . . 609 ister Mnemonics . . . . . . . . . . . . . . . . . 589 Chapter 2. Branch Processor. . . . 611 Appendix B. Example Performance 2.1 Branch Processor Overview . . . . . 611 Monitor. . . . . . . . . . . . . . . . . . . . . . 591 2.2 Branch Processor Registers . . . . . 611 B.1 PMM Bit of the Machine State Register 2.2.1 Machine State Register . . . . . . . 611 592 2.3 Branch Processor Instructions . . . 613 B.2 Special Purpose Registers . . . . . . 592 2.4 System Linkage Instructions . . . . . 613 B.2.1 Performance Monitor Counter Regis- ters . . . . . . . . . . . . . . . . . . . . . . . . . . . 593 Chapter 3. Fixed-Point Processor 617 B.2.2 Monitor Mode Control Register 0 594 3.1 Fixed-Point Processor Overview . . 617 B.2.3 Monitor Mode Control Register 1 596 3.2 Special Purpose Registers . . . . . . 
617 Table of Contents xv Version 2.05 3.3 Fixed-Point Processor Registers . .617 4.9.2.1 Lock Setting and Clearing . . . 653 3.3.1 Processor Version Register . . . .617 4.9.2.2 Error Conditions . . . . . . . . . . . 653 3.3.2 Processor Identification Register 617 4.9.2.2.1 Overlocking . . . . . . . . . . . . . 653 3.3.3 Software-use SPRs. . . . . . . . . . .618 4.9.2.2.2 Unable-to-lock and Unable-to- 3.3.4 External Process ID Registers [Cate- unlock Conditions . . . . . . . . . . . . . . . . 654 gory: Embedded.External PID] . . . . . . .619 4.9.2.3 Cache Locking Instructions . . 655 3.3.4.1 External Process ID Load Context 4.9.3 Synchronize Instruction . . . . . . . 657 (EPLC) Register . . . . . . . . . . . . . . . . . .619 4.9.4 Lookaside Buffer 3.3.4.2 External Process ID Store Context Management . . . . . . . . . . . . . . . . . . . . 657 (EPSC) Register . . . . . . . . . . . . . . . . . .620 4.9.4.1 TLB Management Instructions 658 3.4 Fixed-Point Processor Instructions 621 3.4.1 Move To/From System Register Chapter 5. Interrupts and Exceptions Instructions . . . . . . . . . . . . . . . . . . . . . .621 661 3.4.2 External Process ID Instructions 5.1 Overview. . . . . . . . . . . . . . . . . . . . 662 [Category: Embedded.External PID] . .627 5.2 Interrupt Registers . . . . . . . . . . . . 662 5.2.1 Save/Restore Register 0 . . . . . . 662 Chapter 4. Storage Control . . . . . 639 5.2.2 Save/Restore Register 1 . . . . . . 662 4.1 Storage Addressing . . . . . . . . . . . .639 5.2.3 Critical Save/Restore Register 0 663 4.2 Storage Exceptions . . . . . . . . . . . .639 5.2.4 Critical Save/Restore Register 1 663 4.3 Instruction Fetch . . . . . . . . . . . . . .640 5.2.5 Debug Save/Restore Register 0 4.3.1 Implicit Branch. . . . . . . . . . . . . . .640 [Category: Embedded.Enhanced Debug] . 4.3.2 Address Wrapping Combined with 663 Changing MSR Bit CM . . . . . . . . . . . . .640 5.2.6 Debug Save/Restore Register 1 4.4 Data Access . . . . . . . . . . . . . . . . . 
.640 [Category: Embedded.Enhanced Debug] . 4.5 Performing Operations 663 Out-of-Order . . . . . . . . . . . . . . . . . . . . .640 5.2.7 Data Exception Address Register . . 4.6 Invalid Real Address . . . . . . . . . . .641 664 4.7 Storage Control . . . . . . . . . . . . . . .641 5.2.8 Interrupt Vector Prefix Register . 664 4.7.1 Storage Control Registers. . . . . .641 5.2.9 Exception Syndrome Register . . 665 4.7.1.1 Process ID Register . . . . . . . . .641 5.2.10 Interrupt Vector Offset Registers . . 4.7.1.2 Translation Lookaside Buffer . .641 666 4.7.2 Page Identification. . . . . . . . . . . .643 5.2.11 Machine Check Registers . . . . 666 4.7.3 Address Translation . . . . . . . . . .646 5.2.11.1 Machine Check Save/Restore 4.7.4 Storage Access Control . . . . . . .647 Register 0 . . . . . . . . . . . . . . . . . . . . . . 667 4.7.4.1 Execute Access . . . . . . . . . . . .647 5.2.11.2 Machine Check Save/Restore 4.7.4.2 Write Access . . . . . . . . . . . . . .647 Register 1 . . . . . . . . . . . . . . . . . . . . . . 667 4.7.4.3 Read Access . . . . . . . . . . . . . .647 5.2.11.3 Machine Check Syndrome Regis- 4.7.4.4 Storage Access Control Applied to ter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 667 Cache Management Instructions . . . . .647 5.2.12 External Proxy Register [Category: 4.7.4.5 Storage Access Control Applied to External Proxy] . . . . . . . . . . . . . . . . . . 667 String Instructions. . . . . . . . . . . . . . . . .648 5.3 Exceptions . . . . . . . . . . . . . . . . . . 668 4.7.5 TLB Management . . . . . . . . . . . .648 5.4 Interrupt Classification . . . . . . . . . 668 4.8 Storage Control Attributes . . . . . . .649 5.4.1 Asynchronous Interrupts . . . . . . 668 4.8.1 Guarded Storage. . . . . . . . . . . . .649 5.4.2 Synchronous Interrupts . . . . . . . 668 4.8.1.1 Out-of-Order Accesses to Guarded 5.4.2.1 Synchronous, Precise Interrupts . Storage . . . . . . . . . . . . . . . . . . . . . . . . .650 669 4.8.2 User-Definable . . . . . . . 
. . . . . . .650 5.4.2.2 Synchronous, Imprecise Interrupts 4.8.3 Storage Control Bits . . . . . . . . . .650 669 4.8.3.1 Storage Control Bit Restrictions . . . 5.4.3 Interrupt Classes . . . . . . . . . . . . 669 650 5.4.4 Machine Check Interrupts . . . . . 669 4.8.3.2 Altering the Storage Control Bits . . 5.5 Interrupt Processing . . . . . . . . . . . 670 651 5.6 Interrupt Definitions. . . . . . . . . . . . 672 4.9 Storage Control Instructions. . . . . .652 5.6.1 Critical Input Interrupt . . . . . . . . 674 4.9.1 Cache Management Instructions 652 5.6.2 Machine Check Interrupt . . . . . . 674 4.9.2 Cache Locking [Category: Embed- 5.6.3 Data Storage Interrupt . . . . . . . . 675 ded Cache Locking] . . . . . . . . . . . . . . .653 5.6.4 Instruction Storage Interrupt . . . 676 xvi Power ISATM I-III, VLE Version 2.05 5.6.5 External Input Interrupt . . . . . . . 676 5.9.1.5 Exception Priorities for Defined 5.6.6 Alignment Interrupt . . . . . . . . . . 677 Trap Instructions . . . . . . . . . . . . . . . . . 690 5.6.7 Program Interrupt . . . . . . . . . . . 678 5.9.1.6 Exception Priorities for Defined 5.6.8 Floating-Point Unavailable Interrupt System Call Instruction . . . . . . . . . . . . 691 679 5.9.1.7 Exception Priorities for Defined 5.6.9 System Call Interrupt . . . . . . . . . 679 Branch Instructions . . . . . . . . . . . . . . . 691 5.6.10 Auxiliary Processor Unavailable 5.9.1.8 Exception Priorities for Defined Interrupt. . . . . . . . . . . . . . . . . . . . . . . . 679 Return From Interrupt Instructions. . . . 691 5.6.11 Decrementer Interrupt . . . . . . . 680 5.9.1.9 Exception Priorities for Other 5.6.12 Fixed-Interval Timer Interrupt . 680 Defined Instructions . . . . . . . . . . . . . . . 691 5.6.13 Watchdog Timer Interrupt . . . . 680 5.9.2 Exception Priorities for Reserved 5.6.14 Data TLB Error Interrupt . . . . . 681 Instructions . . . . . . . . . . . . . . . . . . . . . 691 5.6.15 Instruction TLB Error Interrupt . 681 5.6.16 Debug Interrupt . . . . . . . . 
. . . . 682 Chapter 6. Reset and Initialization. . . 5.6.17 SPE/Embedded Floating-Point/ 693 Vector Unavailable Interrupt 6.1 Background . . . . . . . . . . . . . . . . . . 693 [Categories: SPE.Embedded Float Scalar 6.2 Reset Mechanisms . . . . . . . . . . . . 693 Double, SPE.Embedded Float Vector, 6.3 Processor State After Reset . . . . . 693 Vector]. . . . . . . . . . . . . . . . . . . . . . . . . 683 6.4 Software Initialization Requirements . . 5.6.18 Embedded Floating-Point Data 694 Interrupt [Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Sin- Chapter 7. Timer Facilities . . . . . . 695 gle, SPE.Embedded Float Vector] . . . 684 7.1 Overview . . . . . . . . . . . . . . . . . . . . 695 5.6.19 Embedded Floating-Point Round 7.2 Time Base (TB) . . . . . . . . . . . . . . . 695 Interrupt 7.2.1 Writing the Time Base . . . . . . . . 696 [Categories: SPE.Embedded Float Scalar 7.3 Decrementer . . . . . . . . . . . . . . . . . 697 Double, SPE.Embedded Float Scalar Sin- 7.3.1 Writing and Reading the Decre- gle, SPE.Embedded Float Vector] . . . 684 menter . . . . . . . . . . . . . . . . . . . . . . . . . 697 5.6.20 Performance Monitor Interrupt [Cat- 7.3.2 Decrementer Events . . . . . . . . . 697 egory: Embedded.Performance Monitor]. . 7.4 Decrementer Auto-Reload Register . . 685 698 5.6.21 Processor Doorbell Interrupt [Cate- 7.5 Timer Control Register . . . . . . . . . 698 gory: Embedded.Processor Control] . . 685 7.5.1 Timer Status Register . . . . . . . . . 700 5.6.22 Processor Doorbell Critical Interrupt 7.6 Fixed-Interval Timer . . . . . . . . . . . 700 [Category: Embedded.Processor Control]. 7.7 Watchdog Timer . . . . . . . . . . . . . . 701 685 7.8 Freezing the Timer Facilities . . . . . 702 5.7 Partially Executed Instructions . . . 686 5.8 Interrupt Ordering and Masking . . 687 Chapter 8. Debug Facilities . . . . . 703 5.8.1 Guidelines for System Software 688 8.1 Overview . . . . . . . . . . . . . . . . . . . . 703 5.8.2 Interrupt Order . . 
. . . . . . . . . . . . 689 8.2 Internal Debug Mode. . . . . . . . . . . 703 5.9 Exception Priorities . . . . . . . . . . . . 689 8.3 External Debug Mode [Category: 5.9.1 Exception Priorities for Defined Embedded.Enhanced Debug] . . . . . . . 704 Instructions . . . . . . . . . . . . . . . . . . . . . 690 8.4 Debug Events . . . . . . . . . . . . . . . . 704 5.9.1.1 Exception Priorities for Defined 8.4.1 Instruction Address Compare Debug Floating-Point Load and Store Instructions Event . . . . . . . . . . . . . . . . . . . . . . . . . . 705 690 8.4.2 Data Address Compare Debug 5.9.1.2 Exception Priorities for Other Event . . . . . . . . . . . . . . . . . . . . . . . . . . 707 Defined Load and Store Instructions and 8.4.3 Trap Debug Event . . . . . . . . . . . 708 Defined Cache Management Instructions . 8.4.4 Branch Taken Debug Event . . . . 708 690 8.4.5 Instruction Complete Debug Event . 5.9.1.3 Exception Priorities for Other 709 Defined Floating-Point Instructions . . . 690 8.4.6 Interrupt Taken Debug Event . . . 709 5.9.1.4 Exception Priorities for Defined 8.4.6.1 Causes of Interrupt Taken Debug Privileged Instructions . . . . . . . . . . . . . 690 Events . . . . . . . . . . . . . . . . . . . . . . . . . 709 Table of Contents xvii Version 2.05 8.4.6.2 Interrupt Taken Debug Event A.2.1.1 Data Cache Debug Tag Register Description . . . . . . . . . . . . . . . . . . . . . .709 High. . . . . . . . . . . . . . . . . . . . . . . . . . . 728 8.4.7 Return Debug Event . . . . . . . . . .710 A.2.1.2 Data Cache Debug Tag Register 8.4.8 Unconditional Debug Event. . . . .710 Low . . . . . . . . . . . . . . . . . . . . . . . . . . . 728 8.4.9 Critical Interrupt Taken Debug Event A.2.1.3 Instruction Cache Debug Data [Category: Embedded.Enhanced Debug]. . Register. . . . . . . . . . . . . . . . . . . . . . . . 729 710 A.2.1.4 Instruction Cache Debug Tag Reg- 8.4.10 Critical Interrupt Return Debug ister High . . . . . . . . . . . . . . . . . . . . . . . 
729 Event [Category: Embedded.Enhanced A.2.1.5 Instruction Cache Debug Tag Reg- Debug] . . . . . . . . . . . . . . . . . . . . . . . . . 711 ister Low . . . . . . . . . . . . . . . . . . . . . . . 729 8.5 Debug Registers . . . . . . . . . . . . . . 711 A.2.2 Embedded Cache Debug Instruc- 8.5.1 Debug Control Registers. . . . . . . 711 tions . . . . . . . . . . . . . . . . . . . . . . . . . . 730 8.5.1.1 Debug Control Register 0 (DBCR0) 711 Appendix B. Assembler Extended 8.5.1.2 Debug Control Register 1 (DBCR1) Mnemonics . . . . . . . . . . . . . . . . . . 733 712 B.1 Move To/From Special Purpose Reg- 8.5.1.3 Debug Control Register 2 (DBCR2) ister Mnemonics . . . . . . . . . . . . . . . . . 734 714 8.5.2 Debug Status Register . . . . . . . .715 8.5.3 Instruction Address Compare Regis- Appendix C. Guidelines for 64-bit ters . . . . . . . . . . . . . . . . . . . . . . . . . . . .716 Implementations in 32-bit Mode and 8.5.4 Data Address Compare Registers . . 32-bit Implementations . . . . . . . . 735 716 C.1 Hardware Guidelines . . . . . . . . . . 735 8.5.5 Data Value Compare Registers .717 C.1.1 64-bit Specific Instructions . . . . 735 8.6 Debugger Notify Halt Instruction C.1.2 Registers on 32-bit Implementations [Category: Embedded.Enhanced Debug]. . 735 718 C.1.3 Addressing on 32-bit Implementa- tions . . . . . . . . . . . . . . . . . . . . . . . . . . 735 Chapter 9. Processor Control C.1.4 TLB Fields on 32-bit Implementa- [Category: Embedded.Processor tions . . . . . . . . . . . . . . . . . . . . . . . . . . 735 Control] . . . . . . . . . . . . . . . . . . . . . 719 C.2 32-bit Software Guidelines . . . . . . 735 C.2.1 32-bit Instruction Selection . . . . 735 9.1 Overview . . . . . . . . . . . . . . . . . . . .719 9.2 Programming Model. . . . . . . . . . . .719 9.2.1 Processor Message Handling and Appendix D. Type FSL Storage Filtering. . . . . . . . . . . . . . . . . . . . . . . . .719 Control 9.2.1.1 Doorbell Message Filtering . . 
.720 [Category: Embedded.MMU Type 9.2.1.2 Doorbell Critical Message Filtering FSL] . . . . . . . . . . . . . . . . . . . . . . . . 737 720 9.3 Processor Control Instructions . . . .721 D.1 Type FSL Storage Control Overview . 737 D.2 Type FSL Storage Control Registers . Chapter 10. Synchronization 737 Requirements for Context Alterations D.2.1 Process ID Registers (PIDn) . . . 737 723 D.2.2 Translation Lookaside Buffer . . 737 D.2.3 Address Space Identifiers . . . . . 738 Appendix A. Implementation- D.2.4 MMU Assist Registers. . . . . . . . 738 D.2.4.1 MAS0 Register . . . . . . . . . . . . 738 Dependent Instructions . . . . . . . . 727 D.2.4.2 MAS1 Register . . . . . . . . . . . . 739 A.1 Embedded Cache Initialization D.2.4.3 MAS2 Register . . . . . . . . . . . . 739 [Category: Embedded.Cache Initialization] D.2.4.4 MAS3 Register . . . . . . . . . . . . 740 727 D.2.4.5 MAS4 Register . . . . . . . . . . . . 740 A.2 Embedded Cache Debug Facility D.2.4.6 MAS6 Register . . . . . . . . . . . . 741 [Category: Embedded.Cache Debug]. .728 D.2.4.7 MAS7 Register . . . . . . . . . . . . 741 A.2.1 Embedded Cache Debug Registers . D.2.5 MMU Configuration and Control 728 Registers . . . . . . . . . . . . . . . . . . . . . . . 743 D.2.5.1 MMU Configuration Register (MMUCFG) . . . . . . . . . . . . . . . . . . . . . 743 Variable Length Encoding (VLE) Environment. . . . . . . . . . . . . . . . . . . . . . 759 D.2.5.2 TLB Configuration Registers (TLBnCFG) . . . . . . . . . . . . . . . . . . . . . 743 D.2.5.3 MMU Control and Status Register Chapter 1. Variable Length Encoding (MMUCSR0) . . . . . . . . . . . . . . . . . . . . 743 Introduction . . . . . . . . . . . . . . . . . . 761 D.3 Page Identification and Address 1.1 Overview . . . . . . . . . . . . . . . . . . . . 761 Translation . . . . . . . . . . . . . . . . . . . . . 744 1.2 Documentation Conventions . . . . . 762 D.4 TLB Management. . . . . . . . . . . . .
744 1.2.1 Description of Instruction Operation. D.4.1 Reading TLB Entries . . . . . . . . . 744 762 D.4.2 Writing TLB Entries . . . . . . . . . . 744 1.3 Instruction Mnemonics and Operands D.4.3 Invalidating TLB Entries . . . . . . 745 762 D.4.4 Searching TLB Entries . . . . . . . 745 1.4 VLE Instruction Formats . . . . . . . . 762 D.4.5 TLB Replacement Hardware Assist 1.4.1 BD8-form (16-bit Branch Instruc- 745 tions) . . . . . . . . . . . . . . . . . . . . . . . . . . 762 D.5 32-bit and 64-bit Specific MMU Behav- 1.4.2 C-form (16-bit Control Instructions) . ior . . . . . . . . . . . . . . . . . . . . . . . . . . . . 746 762 D.6 Type FSL MMU Instructions. . . . . 747 1.4.3 IM5-form (16-bit register + immediate Instructions) . . . . . . . . . . . . . . . . . . . . . 762 Appendix E. Example Performance 1.4.4 OIM5-form (16-bit register + offset Monitor [Category: immediate Instructions) . . . . . . . . . . . . 762 1.4.5 IM7-form (16-bit Load immediate Embedded.Performance Monitor] 751 Instructions) . . . . . . . . . . . . . . . . . . . . 762 E.1 Overview . . . . . . . . . . . . . . . . . . . 751 1.4.6 R-form (16-bit Monadic Instructions) E.2 Programming Model . . . . . . . . . . . 751 763 E.2.1 Event Counting . . . . . . . . . . . . . 752 1.4.7 RR-form (16-bit Dyadic Instructions) E.2.2 Processor Context Configurability . . 763 752 1.4.8 SD4-form (16-bit Load/Store Instruc- E.2.3 Event Selection . . . . . . . . . . . . . 752 tions) . . . . . . . . . . . . . . . . . . . . . . . . . . 763 E.2.4 Thresholds . . . . . . . . . . . . . . . . 753 1.4.9 BD15-form . . . . . . . . . . . . . . . . . 763 E.2.5 Performance Monitor Exception 753 1.4.10 BD24-form . . . . . . . . . . . . . . . . 763 E.2.6 Performance Monitor Interrupt . 753 1.4.11 D8-form . . . . . . . . . . . . . . . . . . 763 E.3 Performance Monitor Registers . . 753 1.4.12 I16A-form . . . . . . . . . . . . . . . . . 763 E.3.1 Performance Monitor Global Control 1.4.13 I16L-form . . . . . . . . . . . . . . . . . 
763 Register 0 . . . . . . . . . . . . . . . . . . . . . . 753 1.4.14 M-form . . . . . . . . . . . . . . . . . . . 763 E.3.2 Performance Monitor Local Control 1.4.15 SCI8-form. . . . . . . . . . . . . . . . . 763 A Registers . . . . . . . . . . . . . . . . . . . . 754 1.4.16 LI20-form . . . . . . . . . . . . . . . . . 763 E.3.3 Performance Monitor Local Control 1.4.17 Instruction Fields . . . . . . . . . . . 763 B Registers . . . . . . . . . . . . . . . . . . . . 754 E.3.4 Performance Monitor Counter Regis- Chapter 2. VLE Storage Addressing . ters . . . . . . . . . . . . . . . . . . . . . . . . . . . 755 E.4 Performance Monitor Instructions 756 767 E.5 Performance Monitor Software Usage 2.1 Data Storage Addressing Modes . 767 Notes. . . . . . . . . . . . . . . . . . . . . . . . . . 757 2.2 Instruction Storage Addressing Modes E.5.1 Chaining Counters . . . . . . . . . . 757 768 E.5.2 Thresholding . . . . . . . . . . . . . . . 757 2.2.1 Misaligned, Mismatched, and Byte Ordering Instruction Storage Exceptions . . Book VLE: 768 2.2.2 VLE Exception Syndrome Bits . . 768 Power ISA Operating Environment Chapter 3. VLE Compatibility with Architecture - Books I­III . . . . . . . . . . . . . . . . . . . . 771 3.1 Overview . . . . . . . . . . . . . . . . . . . . 771 3.2 VLE Processor and Storage Control Extensions . . . . . . . . . . . . . . . . . . . . . . 771 3.2.1 Instruction Extensions . . . . . . . . 771 Table of Contents xix Version 2.05 3.2.2 MMU Extensions . . . . . . . . . . . . .771 7.6 External PID . . . . . . . . . . . . . . . . . 811 3.3 VLE Limitations . . . . . . . . . . . . . . .771 7.7 Embedded Performance Monitor . 812 7.8 Processor Control . . . . . . . . . . . . . 812 Chapter 4. Branch Operation Instructions . . . . . . . . . . . . . . . . . . 773 Appendix A. VLE Instruction Set 4.1 Branch Processor Registers . . . . .773 Sorted by Mnemonic . . . . . . . . . . 813 4.1.1 Condition Register (CR) . . . . . . 
.773 4.1.1.1 Condition Register Setting for Appendix B. VLE Instruction Set Compare Instructions . . . . . . . . . . . . . .774 Sorted by Opcode. . . . . . . . . . . . . 829 4.1.1.2 Condition Register Setting for the Bit Test Instruction . . . . . . . . . . . . . . . .774 4.1.2 Link Register (LR) . . . . . . . . . . . .774 Appendices: 4.1.3 Count Register (CTR) . . . . . . . . .774 4.2 Branch Instructions . . . . . . . . . . . .775 Power ISA Book I-III Appendices 845 4.3 System Linkage Instructions . . . . .778 4.4 Condition Register Instructions . . .781 Appendix A. Incompatibilities with Chapter 5. Fixed-Point Instructions . the POWER Architecture . . . . . . . 847 A.1 New Instructions, Formerly Privileged 783 Instructions . . . . . . . . . . . . . . . . . . . . . 847 5.1 Fixed-Point Load Instructions. . . . .783 A.2 Newly Privileged 5.2 Fixed-Point Store Instructions . . . .787 Instructions . . . . . . . . . . . . . . . . . . . . . 847 5.3 Fixed-Point Load and Store with Byte A.3 Reserved Fields in Reversal Instructions . . . . . . . . . . . . . .790 Instructions . . . . . . . . . . . . . . . . . . . . . 847 5.4 Fixed-Point Load and Store Multiple A.4 Reserved Bits in Registers . . . . . . 847 Instructions . . . . . . . . . . . . . . . . . . . . . .790 A.5 Alignment Check . . . . . . . . . . . . . 847 5.5 Fixed-Point Arithmetic Instructions.791 A.6 Condition Register . . . . . . . . . . . . 848 5.6 Fixed-Point Compare and Bit Test A.7 LK and Rc Bits . . . . . . . . . . . . . . . 848 Instructions . . . . . . . . . . . . . . . . . . . . . .795 A.8 BO Field . . . . . . . . . . . . . . . . . . . . 848 5.7 Fixed-Point Trap Instructions . . . . .799 A.9 BH Field . . . . . . . . . . . . . . . . . . . . 848 5.8 Fixed-Point Select Instruction . . . .799 A.10 Branch Conditional to Count Register 5.9 Fixed-Point Logical, Bit, and Move 848 Instructions . . . . . . . . . . . . . . . . . . . . . .800 A.11 System Call . . . . . . . . . . . . . . . . 
848 5.10 Fixed-Point Rotate and Shift Instruc- A.12 Fixed-Point Exception tions . . . . . . . . . . . . . . . . . . . . . . . . . . .805 Register (XER) . . . . . . . . . . . . . . . . . . 849 5.11 Move To/From System Register A.13 Update Forms of Storage Access Instructions . . . . . . . . . . . . . . . . . . . . . .808 Instructions . . . . . . . . . . . . . . . . . . . . . 849 A.14 Multiple Register Loads . . . . . . . 849 Chapter 6. Storage Control A.15 Load/Store Multiple Instructions . 849 Instructions . . . . . . . . . . . . . . . . . . 809 A.16 Move Assist Instructions . . . . . . 849 6.1 Storage Synchronization Instructions. . A.17 Move To/From SPR . . . . . . . . . . 849 809 A.18 Effects of Exceptions on FPSCR Bits 6.2 Cache Management Instructions . .810 FR and FI . . . . . . . . . . . . . . . . . . . . . . 850 6.3 Cache Locking Instructions . . . . . .810 A.19 Store Floating-Point Single Instruc- 6.4 TLB Management Instructions . . . .810 tions . . . . . . . . . . . . . . . . . . . . . . . . . . 850 6.5 Instruction Alignment and Byte Order- A.20 Move From FPSCR . . . . . . . . . . 850 ing. . . . . . . . . . . . . . . . . . . . . . . . . . . . .810 A.21 Zeroing Bytes in the Data Cache 850 A.22 Synchronization . . . . . . . . . . . . . 850 Chapter 7. Additional Categories A.23 Move To Machine State Register Instruction . . . . . . . . . . . . . . . . . . . . . . 850 Available in VLE. . . . . . . . . . . . . . . 811 A.24 Direct-Store Segments . . . . . . . . 850 7.1 Move Assist . . . . . . . . . . . . . . . . . . 811 A.25 Segment Register 7.2 Vector. . . . . . . . . . . . . . . . . . . . . . . 811 Manipulation Instructions . . . . . . . . . . 850 7.3 Signal Processing Engine . . . . . . . 811 A.26 TLB Entry Invalidation . . . . . . . . 851 7.4 Embedded Floating Point. . . . . . . . 811 A.27 Alignment Interrupts . . . . . . . . . . 851 7.5 Legacy Move Assist . . . . . . . . . . . . 811 A.28 Floating-Point Interrupts. . . . . . . 
851 xx Power ISATM I-III, VLE Version 2.05 A.29 Timing Facilities . . . . . . . . . . . . . 851 A.29.1 Real-Time Clock . . . . . . . . . . . 851 A.29.2 Decrementer . . . . . . . . . . . . . . 851 A.30 Deleted Instructions . . . . . . . . . . 852 A.31 Discontinued Opcodes . . . . . . . . 852 A.32 POWER2 Compatibility . . . . . . . 853 A.32.1 Cross-Reference for Changed POWER2 Mnemonics . . . . . . . . . . . . . 853 A.32.2 Load/Store Floating-Point Double . 853 A.32.3 Floating-Point Conversion to Inte- ger . . . . . . . . . . . . . . . . . . . . . . . . . . . . 853 A.32.4 Floating-Point Interrupts . . . . . 854 A.32.5 Trace . . . . . . . . . . . . . . . . . . . . 854 A.33 Deleted Instructions . . . . . . . . . . 854 A.33.1 Discontinued Opcodes . . . . . . 854 Appendix B. Platform Support Requirements . . . . . . . . . . . . . . . . 855 Appendix C. Complete SPR List . 859 Appendix D. Illegal Instructions . 863 Appendix E. Reserved Instructions . 865 Appendix F. Opcode Maps . . . . . 867 Appendix G. Power ISA Instruction Set Sorted by Mnemonic . . . . . . . 889 Appendix H. Power ISA Instruction Set Sorted by Category . . . . . . . . 907 Appendix I. Power ISA Instruction Set Sorted by Opcode . . . . . . . . . 925 Index . . . . . . . . . . . . . . . . . . . . . . . . 943 Last Page - End of Document . . . . 953 Table of Contents xxi Version 2.05 xxii Power ISATM I-III, VLE Version 2.05 Figures Preface ................................................. iii 35. Condition Register . . . . . . . . . . . . . . . . . . . . . . 30 36. Link Register . . . . . . . . . . . . . . . . . . . . . . . . . . 31 37. Count Register . . . . . . . . . . . . . . . . . . . . . . . . . 31 Table of Contents ................................ vii 38. BO field encodings . . . . . . . . . . . . . . . . . . . . . . 32 39. "at" bit encodings . . . . . . . . . . . . . . . . . . . . . . . 32 Figures.............................................. xxiii 40. BH field encodings . . . . . . . . . . . . . . . . 
. . . . . . 32 41. General Purpose Registers . . . . . . . . . . . . . . . 42 42. Fixed-Point Exception Register . . . . . . . . . . . . 42 Book I: 43. Program Priority Register. . . . . . . . . . . . . . . . . 43 44. Software-use SPRs . . . . . . . . . . . . . . . . . . . . . 43 Power ISA User Instruction Set Architec- 45. Priority levels for or Rx,Rx,Rx . . . . . . . . . . . . . 77 46. Floating-Point Registers. . . . . . . . . . . . . . . . . 101 ture ....................................................... 1 47. Floating-Point Status and Control Register . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 1. Category Listing . . . . . . . . . . . . . . . . . . . . . . . . . 9 48. Floating-Point Result Flags . . . . . . . . . . . . . . 103 2. Logical processing model . . . . . . . . . . . . . . . . . 12 49. Floating-point single format. . . . . . . . . . . . . . 103 3. Power ISA user register set. . . . . . . . . . . . . . . . 13 50. Floating-point double format . . . . . . . . . . . . . 104 4. I instruction format. . . . . . . . . . . . . . . . . . . . . . . 14 51. IEEE floating-point fields . . . . . . . . . . . . . . . . 104 5. B instruction format . . . . . . . . . . . . . . . . . . . . . . 14 52. Approximation to real numbers . . . . . . . . . . . 104 6. SC instruction format. . . . . . . . . . . . . . . . . . . . . 15 53. Selection of Z1 and Z2 . . . . . . . . . . . . . . . . . . 108 7. D instruction format . . . . . . . . . . . . . . . . . . . . . . 15 54. IEEE 64-bit execution model . . . . . . . . . . . . . 114 8. DS instruction format. . . . . . . . . . . . . . . . . . . . . 15 55. Interpretation of G, R, and X bits . . . . . . . . . . 114 9. DQ instruction format . . . . . . . . . . . . . . . . . . . . 15 56. Location of the Guard, Round, and 10. X Instruction Format . . . . . . . . . . . . . . . . . . . . 16 Sticky bits in the IEEE execution model . . . 114 11. XL instruction format . . . . . . . . . . . . . . . . . . . . 16 57. 
Multiply-add 64-bit execution model. . . . . . . . 115 12. XFX instruction format. . . . . . . . . . . . . . . . . . . 16 58. Location of the Guard, Round, and Sticky bits in the 13. XFL instruction format . . . . . . . . . . . . . . . . . . . 16 multiply-add execution model . . . . . . . . . . . 115 14. XS instruction format . . . . . . . . . . . . . . . . . . . . 17 60. Format for Unsigned Decimal Data . . . . . . . . 147 15. XO instruction format. . . . . . . . . . . . . . . . . . . . 17 61. Format for Signed Decimal Data . . . . . . . . . . 147 16. A instruction format . . . . . . . . . . . . . . . . . . . . . 17 62. Summary of BCD Digit and Sign Codes . . . . 147 17. M instruction format. . . . . . . . . . . . . . . . . . . . . 17 63. DFP Short format . . . . . . . . . . . . . . . . . . . . . . 148 18. MD instruction format . . . . . . . . . . . . . . . . . . . 17 64. DFP Long format . . . . . . . . . . . . . . . . . . . . . . 148 19. MDS instruction format . . . . . . . . . . . . . . . . . . 17 65. DFP Extended format. . . . . . . . . . . . . . . . . . . 148 20. VA instruction format . . . . . . . . . . . . . . . . . . . . 17 66. Encoding of the G field for Special Symbols . 148 21. VC instruction format. . . . . . . . . . . . . . . . . . . . 17 67. Encoding of bits 0:4 of the G field for Finite Numbers 22. VX instruction format . . . . . . . . . . . . . . . . . . . . 17 148 23. EVX instruction format. . . . . . . . . . . . . . . . . . . 17 68. Summary of DFP Formats . . . . . . . . . . . . . . . 149 24. EVS instruction format . . . . . . . . . . . . . . . . . . 17 69. Value Ranges for Finite Number Data Classes . . 25. Z22 instruction format . . . . . . . . . . . . . . . . . . . 18 150 26. Z23 instruction format . . . . . . . . . . . . . . . . . . . 18 70. Encoding of NaN and Infinity Data Classes . . 150 27. Storage operands and byte ordering. . . . . . . . 23 71. Rounding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 28. 
C structure `s', showing values of elements . . 24 72. Encoding of DFP Rounding-Mode Control (DRN) . 29. Big-Endian mapping of structure `s'. . . . . . . . . 24 151 30. Little-Endian mapping of structure `s' . . . . . . . 24 73. Primary Encoding of Rounding-Mode Control 152 31. Instructions and byte ordering . . . . . . . . . . . . . 24 74. Secondary Encoding of Rounding-Mode Control. . 32. Assembly language program `p' . . . . . . . . . . . 24 152 33. Big-Endian mapping of program `p' . . . . . . . . . 24 75. Summary of Ideal Exponents . . . . . . . . . . . . . 152 34. Little-Endian mapping of program `p'. . . . . . . . 25 76. Overflow Results When Exception Is Disabled 158 Figures xxiii Version 2.05 77. Rounding and Range Actions (Part 1). . . . . . 160 4. Logical Partition Identification Register . . . . . . 474 78. Rounding and Range Actions (Part 2). . . . . . 161 5. Processor Compatibility Register . . . . . . . . . . . 474 79. Actions: Add . . . . . . . . . . . . . . . . . . . . . . . . . 164 6. Machine State Register . . . . . . . . . . . . . . . . . . 477 80. Actions: Multiply . . . . . . . . . . . . . . . . . . . . . . 165 7. Processor Version Register . . . . . . . . . . . . . . . 487 81. Actions: Divide. . . . . . . . . . . . . . . . . . . . . . . . 166 8. Processor Identification Register . . . . . . . . . . . 488 82. Actions: Compare Unordered . . . . . . . . . . . . 168 9. Control Register . . . . . . . . . . . . . . . . . . . . . . . . 488 83. Actions: Compare Ordered . . . . . . . . . . . . . . 169 10. Program Priority Register. . . . . . . . . . . . . . . . 488 84. Actions: Test Exponent . . . . . . . . . . . . . . . . . 171 11. Software-use SPRs . . . . . . . . . . . . . . . . . . . . 489 85. Actions: Test Significance . . . . . . . . . . . . . . . 172 12. SPRs for use by hypervisor programs . . . . . . 489 86. DFP Quantize examples . . . . . . . . . . . . . . . . 174 13. Priority levels for or Rx,Rx,Rx . . . . . . . . . . . . 496 87. 
Actions (part 1) Quantize. . . . . . . . . . . . . . . . 175 14. SPR encodings . . . . . . . . . . . . . . . . . . . . . . . 497 88. Actions (part2) Quantize . . . . . . . . . . . . . . . . 175 15. SLBE for VRMA . . . . . . . . . . . . . . . . . . . . . . . 511 89. DFP Reround examples . . . . . . . . . . . . . . . . 177 16. Address translation overview . . . . . . . . . . . . . 514 90. Actions: Reround. . . . . . . . . . . . . . . . . . . . . . 178 17. Translation of 64-bit effective address to 91. Actions: Round to FP Integer With Inexact . . 180 78 bit virtual address. . . . . . . . . . . . . . . . . . 514 92. Actions: Round to FP Integer Without Inexact 181 18. SLB Entry . . . . . . . . . . . . . . . . . . . . . . . . . . . 515 93. Actions: Data-Format Conversion Instructions 182 19. SLBLL||LP Encoding . . . . . . . . . . . . . . . . . . . . 515 94. Actions: Convert To Fixed . . . . . . . . . . . . . . . 186 20. Translation of 78-bit virtual address to 60-bit real 95. Actions: Insert Biased Exponent . . . . . . . . . . 189 address . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517 96. Decimal Floating-Point Instructions Summary 191 21. Page Table Entry . . . . . . . . . . . . . . . . . . . . . . 518 97. Vector Register elements . . . . . . . . . . . . . . . 195 22. Format of PTELP. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519 98. Vector Registers . . . . . . . . . . . . . . . . . . . . . . 195 23. SDR1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520 99. Vector Status and Control Register. . . . . . . . 195 24. Setting the Reference and Change bits . . . . . 523 100. VR Save Register . . . . . . . . . . . . . . . . . . . . 196 25. Authority Mask Register (AMR) . . . . . . . . . . . 524 101. Aligned quadword storage operand . . . . . . 197 26. PP bit protection states, address 102. Vector Register contents for aligned quadword translation enabled . . . . . . . . . . . . . . . . . . . 526 Load or Store . . . . . . . . . 
. . . . . . . . . . . . . . 197 27. Protection states, address translation 103. Unaligned quadword storage operand . . . . 197 disabled . . . . . . . . . . . . . . . . . . . . . . . . . . . 526 104. Vector Register contents . . . . . . . . . . . . . . . 197 28. Storage control bits . . . . . . . . . . . . . . . . . . . . 528 105. GPR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262 29. GPR contents for slbmte . . . . . . . . . . . . . . . . 533 106. Accumulator . . . . . . . . . . . . . . . . . . . . . . . . 262 30. GPR contents for slbmfev . . . . . . . . . . . . . . . 534 107. Signal Processing and Embedded Floating-Point 31. GPR contents for slbmfee . . . . . . . . . . . . . . . 534 Status and Control Register. . . . . . . . . . . . . . 262 32. GPR contents for slbfee. . . . . . . . . . . . . . . . . 535 108. Floating-Point Data Format . . . . . . . . . . . . . 316 33. GPR contents for mtsr, mtsrin, mfsr, and mfsrin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536 34. Save/Restore Registers . . . . . . . . . . . . . . . . . 548 Book II: 35. Hypervisor Save/Restore Registers . . . . . . . . 548 36. Data Address Register . . . . . . . . . . . . . . . . . . 548 Power ISA Virtual Environment Architec- 37. Hypervisor Data Address Register. . . . . . . . . 548 ture .................................................... 405 38. Data Storage Interrupt Status Register . . . . . 548 39. Hypervisor Data Storage Interrupt Status Register 549 1. Performance effects of storage operand placement 40. Hypervisor Emulation Instruction Register . . . 549 422 41. Hypervisor Maintenance Exception Register . 549 2. [Category: Server] Performance effects of storage 42. Hypervisor Maintenance Exception Enable Regis- operand placement, Little-Endian . . . . . . . 422 ter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549 3. Data Stream Control Register . . . . . . . . . . . . 426 43. MSR setting due to interrupt . . . . . . . . . . . . . 
555 4. Time Base . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451 44. Effective address of interrupt vector by 5. Alternate Time Base . . . . . . . . . . . . . . . . . . . . 454 interrupt type. . . . . . . . . . . . . . . . . . . . . . . . 556 45. Time Base . . . . . . . . . . . . . . . . . . . . . . . . . . . 575 Book III-S: 46. Decrementer . . . . . . . . . . . . . . . . . . . . . . . . . 576 47. Hypervisor Decrementer . . . . . . . . . . . . . . . . 577 48. Processor Utilization of Resources Register . 578 Power ISA Operating Environment Archi- 49. Scaled Processor Utilization of Resources Register tecture - Server Environment ............ 465 578 50. Come-From Address Register . . . . . . . . . . . . 581 1. Logical Partitioning Control Register . . . . . . . . 471 51. Data Address Breakpoint Register. . . . . . . . . 582 2. Real Mode Offset Register . . . . . . . . . . . . . . . 473 52. Data Address Breakpoint Register Extension 582 3. Hypervisor Real Mode Offset Register . . . . . . 474 53. External Access Register . . . . . . . . . . . . . . . . 583 xxiv Power ISATM I-III, VLE Version 2.05 54. Performance Monitor SPR encodings for 40. MMU Control and Status Register 0 . . . . . . . 744 mfspr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593 41. Processor States and PMLCan Bit Settings. . 752 55. Performance Monitor SPR encodings for 42. [User] Performance Monitor Global Control Regis- mtspr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593 ter 0. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 753 56. Performance Monitor Counter registers . . . . 593 43. [User] Performance Monitor Local Control A Regis- 57. Monitor Mode Control Register 0 . . . . . . . . . 594 ters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 754 58. Monitor Mode Control Register 1 . . . . . . . . . 596 44. [User] Performance Monitor Local Control B Regis- 59. Monitor Mode Control Register A . . . . . . . . . 596 ter . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . 754 60. Sampled Instruction Address Register . . . . . 597 45. [User] Performance Monitor Counter Registers . . . 61. Sampled Data Address Register . . . . . . . . . . 597 755 46. Embedded.Peformance Monitor PMRs . . . . . 756 Book III-E: Book VLE: Power ISA Operating Environment Archi- tecture - Embedded Environment ..... 605 Power ISA Operating Environment Archi- tecture - 1. Machine State Register . . . . . . . . . . . . . . . . . . 611 Variable Length Encoding (VLE) Environ 2. Processor Version Register. . . . . . . . . . . . . . . 617 ment.................................................. 759 3. Processor Identification Register. . . . . . . . . . . 618 4. Special Purpose Registers . . . . . . . . . . . . . . . 618 1. BD8 instruction format . . . . . . . . . . . . . . . . . . . 762 5. External Process ID Load Context Register . . 619 2. C instruction format . . . . . . . . . . . . . . . . . . . . . 762 6. External Process ID Store Context Register . . 620 3. IM5 instruction format. . . . . . . . . . . . . . . . . . . . 762 7. SPR Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 621 4. OIM5 instruction format . . . . . . . . . . . . . . . . . . 762 8. Virtual Address to TLB Entry Match Process . 644 5. IM7 instruction format. . . . . . . . . . . . . . . . . . . . 762 9. Effective-to-Real Address Translation Flow . . 645 6. R instruction format . . . . . . . . . . . . . . . . . . . . . 763 10. Access Control Process . . . . . . . . . . . . . . . . 646 7. RR instruction format . . . . . . . . . . . . . . . . . . . . 763 11. Storage control bits . . . . . . . . . . . . . . . . . . . . 650 8. SD4 instruction format . . . . . . . . . . . . . . . . . . . 763 12. Exception Syndrome Register 9. BD15 instruction format . . . . . . . . . . . . . . . . . . 763 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 665 10. BD24 instruction format . . . . . . . . . . . . . . . . . 763 13. 
Interrupt Vector Offset Register 11. D8 instruction format . . . . . . . . . . . . . . . . . . . 763 Assignments . . . . . . . . . . . . . . . . . . . . . . . 666 12. I16A instruction format . . . . . . . . . . . . . . . . . . 763 14. External Proxy Register. . . . . . . . . . . . . . . . . 667 13. I16L instruction format . . . . . . . . . . . . . . . . . . 763 15. Interrupt and Exception Types . . . . . . . . . . . 673 14. M instruction format . . . . . . . . . . . . . . . . . . . . 763 16. Interrupt Hierarchy. . . . . . . . . . . . . . . . . . . . . 687 15. SC18 instruction format . . . . . . . . . . . . . . . . . 763 17. Machine State Register Initial Values . . . . . . 693 16. LI20 instruction format . . . . . . . . . . . . . . . . . . 763 18. TLB Initial Values . . . . . . . . . . . . . . . . . . . . . 694 17. Condition Register . . . . . . . . . . . . . . . . . . . . . 773 19. Time Base . . . . . . . . . . . . . . . . . . . . . . . . . . . 695 18. BO32 field encodings . . . . . . . . . . . . . . . . . . . 775 20. Decrementer . . . . . . . . . . . . . . . . . . . . . . . . . 697 19. BO16 field encodings . . . . . . . . . . . . . . . . . . . 775 21. Decrementer . . . . . . . . . . . . . . . . . . . . . . . . . 698 22. . . . . . . . Relationships of the Timer Facilities 699 23. Watchdog State Machine . . . . . . . . . . . . . . . 701 Appendices: 24. Watchdog Timer Controls . . . . . . . . . . . . . . . 702 25. Data Cache Debug Tag Register High . . . . . 728 26. Data Cache Debug Tag Register Low. . . . . . 728 Power ISA Book I-III Appendices...... 845 27. Instruction Cache Debug Data Register . . . . 729 28. Instruction Cache Debug Tag Register High. 729 20. Platform Support Requirements. . . . . . . . . . . 856 29. Instruction Cache Debug Tag Register Low . 729 21. SPR Numbers . . . . . . . . . . . . . . . . . . . . . . . . 859 30. Process ID Register (PID0­PID2) . . . . . . . . . 737 31. MAS0 register . . . . . . . . . . . . . . . . . . . . . . . . 
738 Index ................................................. 943 32. MAS1 register . . . . . . . . . . . . . . . . . . . . . . . . 739 33. MAS2 register . . . . . . . . . . . . . . . . . . . . . . . . 739 34. MAS3 register . . . . . . . . . . . . . . . . . . . . . . . . 740 Last Page - End of Document........... 953 35. MAS4 register . . . . . . . . . . . . . . . . . . . . . . . . 740 36. MAS6 register . . . . . . . . . . . . . . . . . . . . . . . . 741 37. MAS7 register . . . . . . . . . . . . . . . . . . . . . . . . 741 38. MMU Configuration Register . . . . . . . . . . . . . 743 39. TLB Configuration Register . . . . . . . . . . . . . . 743 Figures xxv Version 2.05 xxvi Power ISATM I-III, VLE Version 2.05 Book I: Power ISA User Instruction Set Architecture Book I: Power ISA User Instruction Set Architecture 1 Version 2.05 2 Power ISATM I Version 2.05 Chapter 1. Introduction 1.1 Overview. . . . . . . . . . . . . . . . . . . . . . 3 1.6.12 XO-FORM . . . . . . . . . . . . . . . . . 17 1.2 Instruction Mnemonics and Operands3 1.6.13 A-FORM . . . . . . . . . . . . . . . . . . . 17 1.3 Document Conventions . . . . . . . . . . 4 1.6.14 M-FORM . . . . . . . . . . . . . . . . . . 17 1.3.1 Definitions . . . . . . . . . . . . . . . . . . . 4 1.6.15 MD-FORM . . . . . . . . . . . . . . . . . 17 1.3.2 Notation . . . . . . . . . . . . . . . . . . . . . 4 1.6.16 MDS-FORM . . . . . . . . . . . . . . . . 17 1.3.3 Reserved Fields and Reserved Val- 1.6.17 VA-FORM . . . . . . . . . . . . . . . . . . 17 ues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.6.18 VC-FORM . . . . . . . . . . . . . . . . . 17 1.3.4 Description of Instruction Operation 7 1.6.19 VX-FORM. . . . . . . . . . . . . . . . . . 17 1.3.5 Categories . . . . . . . . . . . . . . . . . . . 9 1.6.20 EVX-FORM . . . . . . . . . . . . . . . . 17 1.3.5.1 Phased-In/Phased-Out . . . . . . . 10 1.6.21 EVS-FORM . . . . . . . . . . . . . . . . 17 1.3.5.2 Corequisite Category . . . . . . . . 10 1.6.22 Z22-FORM . . . . 
. . . . . . . . . . . . . 18 1.3.5.3 Category Notation. . . . . . . . . . . 11 1.6.23 Z23-FORM . . . . . . . . . . . . . . . . . 18 1.3.6 Environments. . . . . . . . . . . . . . . . 11 1.6.24 Instruction Fields . . . . . . . . . . . . 18 1.4 Processor Overview . . . . . . . . . . . . 12 1.7 Classes of Instructions . . . . . . . . . . 21 1.5 Computation modes . . . . . . . . . . . . 14 1.7.1 Defined Instruction Class . . . . . . . 21 1.5.1 Modes [Category: Server] . . . . . . 14 1.7.2 Illegal Instruction Class . . . . . . . . 21 1.5.2 Modes [Category: Embedded] . . . 14 1.7.3 Reserved Instruction Class . . . . . 21 1.6 Instruction formats . . . . . . . . . . . . . 14 1.8 Forms of Defined Instructions . . . . . 21 1.6.1 I-FORM . . . . . . . . . . . . . . . . . . . . 15 1.8.1 Preferred Instruction Forms . . . . . 21 1.6.2 B-FORM . . . . . . . . . . . . . . . . . . . 15 1.8.2 Invalid Instruction Forms . . . . . . . 21 1.6.3 SC-FORM . . . . . . . . . . . . . . . . . . 15 1.8.3 Reserved-no-op Instructions [Cate- 1.6.4 D-FORM . . . . . . . . . . . . . . . . . . . 15 gory: Phased-In (sV2.07)] . . . . . . . . . . . 22 1.6.5 DS-FORM . . . . . . . . . . . . . . . . . . 15 1.9 Exceptions. . . . . . . . . . . . . . . . . . . . 22 1.6.6 DQ-FORM . . . . . . . . . . . . . . . . . . 15 1.10 Storage Addressing. . . . . . . . . . . . 23 1.6.7 X-FORM . . . . . . . . . . . . . . . . . . . 16 1.10.1 Storage Operands . . . . . . . . . . . 23 1.6.8 XL-FORM . . . . . . . . . . . . . . . . . . 16 1.10.2 Instruction Fetches. . . . . . . . . . . 24 1.6.9 XFX-FORM . . . . . . . . . . . . . . . . . 16 1.10.3 Effective Address Calculation. . . 26 1.6.10 XFL-FORM . . . . . . . . . . . . . . . . 16 1.6.11 XS-FORM . . . . . . . . . . . . . . . . . 17 1.1 Overview stw RS,D(RA) This chapter describes computation modes, document addis RT,RA,SI conventions, a processor overview, instruction formats, Power ISA-compliant Assemblers will support the mne- storage addressing, and instruction fetching. 
monics and operand lists exactly as shown. They should also provide certain extended mnemonics, such as the ones described in Appendix D of Book I.

1.2 Instruction Mnemonics and Operands

The description of each instruction includes the mnemonic and a formatted list of operands. Some examples are the following.

1.3 Document Conventions

1.3.1 Definitions

The following definitions are used throughout this document.

program
    A sequence of related instructions.

application program
    A program that uses only the instructions and resources described in Books I and II.

quadwords, doublewords, words, halfwords, and bytes
    128 bits, 64 bits, 32 bits, 16 bits, and 8 bits, respectively.

positive
    Means greater than zero.

negative
    Means less than zero.

floating-point single format (or simply single format)
    Refers to the representation of a single-precision binary floating-point value in a register or storage.

floating-point double format (or simply double format)
    Refers to the representation of a double-precision binary floating-point value in a register or storage.

system library program
    A component of the system software that can be called by an application program using a Branch instruction.

system service program
    A component of the system software that can be called by an application program using a System Call instruction.

system trap handler
    A component of the system software that receives control when the conditions specified in a Trap instruction are satisfied.

system error handler
    A component of the system software that receives control when an error occurs. The system error handler includes a component for each of the various kinds of error. These error-specific components are referred to as the system alignment error handler, the system data storage error handler, etc.

latency
    Refers to the interval from the time an instruction begins execution until it produces a result that is available for use by a subsequent instruction.

unavailable
    Refers to a resource that cannot be used by the program. For example, storage is unavailable if access to it is denied. See Book III.

undefined value
    May vary between implementations, and between different executions on the same implementation, and similarly for register contents, storage contents, etc., that are specified as being undefined.

boundedly undefined
    The results of executing a given instruction are said to be boundedly undefined if they could have been achieved by executing an arbitrary finite sequence of instructions (none of which yields boundedly undefined results) in the state the processor was in before executing the given instruction. Boundedly undefined results may include the presentation of inconsistent state to the system error handler as described in Section 1.8.1 of Book II. Boundedly undefined results for a given instruction may vary between implementations, and between different executions on the same implementation.

"must"
    If software violates a rule that is stated using the word "must" (e.g., "this field must be set to 0"), the results are boundedly undefined unless otherwise stated.

sequential execution model
    The model of program execution described in Section 2.2, "Instruction Execution Order" on page 29.

Auxiliary Processor
    An implementation-specific processing unit. Previous versions of the architecture use the term Auxiliary Processing Unit (APU) to describe this extension of the architecture. Architectural support for auxiliary processors is part of the Embedded category.

1.3.2 Notation

The following notation is used throughout the Power ISA documents.

- All numbers are decimal unless specified in some special way.
    - 0bnnnn means a number expressed in binary format.
    - 0xnnnn means a number expressed in hexadecimal format.
  Underscores may be used between digits.
- RT, RA, R1, ... refer to General Purpose Registers.
- FRT, FRA, FR1, ... refer to Floating-Point Registers.
- FRTp, FRAp, FRBp, ... refer to an even-odd pair of Floating-Point Registers. Values must be even, otherwise the instruction form is invalid.
- VRT, VRA, VR1, ... refer to Vector Registers.
- (x) means the contents of register x, where x is the name of an instruction field. For example, (RA) means the contents of register RA, and (FRA) means the contents of register FRA, where RA and FRA are instruction fields. Names such as LR and CTR denote registers, not fields, so parentheses are not used with them. Parentheses are also omitted when register x is the register into which the result of an operation is placed.
- (RA|0) means the contents of register RA if the RA field has the value 1-31, or the value 0 if the RA field is 0.
- Bits in registers, instructions, fields, and bit strings are specified as follows. In the last three items (definition of X_p etc.), if X is a field that specifies a GPR, FPR, or VR (e.g., the RS field of an instruction), the definitions apply to the register, not to the field.
    - Bits in instructions, fields, and bit strings are numbered from left to right, starting with bit 0.
    - For all registers except the Vector category, bits in registers that are less than 64 bits start with bit number 64-L, where L is the register length; for the Vector category, bits in registers that are less than 128 bits start with bit number 128-L.
    - The leftmost bit of a sequence of bits is the most significant bit of the sequence.
    - X_p (X with subscript p) means bit p of register/instruction/field/bit_string X.
    - X_p:q means bits p through q of register/instruction/field/bit_string X.
    - X_p q ... means bits p, q, ... of register/instruction/field/bit_string X.
- ¬(RA) means the one's complement of the contents of register RA.
- A period (.) as the last character of an instruction mnemonic means that the instruction records status information in certain fields of the Condition Register as a side effect of execution.
- The symbol || is used to describe the concatenation of two values. For example, 010 || 111 is the same as 010111.
- xⁿ (post-superscript) means x raised to the nth power.
- ⁿx (pre-superscript) means the replication of x, n times (i.e., x concatenated to itself n-1 times). (n)0 and (n)1 are special cases:
    - ⁿ0 means a field of n bits with each bit equal to 0. Thus ⁵0 is equivalent to 0b00000.
    - ⁿ1 means a field of n bits with each bit equal to 1. Thus ⁵1 is equivalent to 0b11111.
- Each bit and field in instructions, and in status and control registers (e.g., XER, FPSCR) and Special Purpose Registers, is either defined or reserved. Some defined fields contain reserved values. In such cases when this document refers to the specific field, it refers only to the defined values, unless otherwise specified.
- /, //, ///, ... denotes a reserved field, in a register, instruction, field, or bit string.
- ?, ??, ???, ... denotes an implementation-dependent field in a register, instruction, field or bit string.

1.3.3 Reserved Fields and Reserved Values

Reserved fields in instructions are ignored by the processor. This is a requirement in the Server environment and is being phased into the Embedded environment.

In some cases a defined field of an instruction has certain values that are reserved. This includes cases in which the field is shown in the instruction layout as containing a particular value; in such cases all other values of the field are reserved. In general, if an instruction is coded such that a defined field contains a reserved value the instruction form is invalid; see Section 1.8.2 on page 21. The only exceptions to the preceding rule are that it does not apply to Reserved and Illegal classes of instructions (see Section 1.7) or to portions of defined fields that are specified, in the instruction description, as being treated as reserved fields.

To maximize compatibility with future architecture extensions, software must ensure that reserved fields in instructions contain zero and that defined fields of instructions do not contain reserved values.

The handling of reserved bits in System Registers (e.g., XER, FPSCR) is implementation-dependent. Unless otherwise stated, software is permitted to write any value to such a bit. A subsequent reading of the bit returns 0 if the value last written to the bit was 0 and returns an undefined value (0 or 1) otherwise.

In some cases a defined field of a System Register has certain values that are reserved. Software must not set a defined field of a System Register to a reserved value.

References elsewhere in this document to a defined field (in an instruction or System Register) that has reserved values assume the field does not contain a reserved value, unless otherwise stated or obvious from context.

Assembler Note
    Assemblers should report uses of reserved values of defined fields of instructions as errors.

Programming Note
    It is the responsibility of software to preserve bits that are now reserved in System Registers, because they may be assigned a meaning in some future version of the architecture.

    In order to accomplish this preservation in implementation-independent fashion, software should do the following.
    - Initialize each such register supplying zeros for all reserved bits.
    - Alter (defined) bit(s) in the register by reading the register, altering only the desired bit(s), and then writing the new value back to the register.

    The XER and FPSCR are partial exceptions to this recommendation.
    Software can alter the status bits in these registers, preserving the reserved bits, by executing instructions that have the side effect of altering the status bits. Similarly, software can alter any defined bit in the FPSCR by executing a Floating-Point Status and Control Register instruction. Using such instructions is likely to yield better performance than using the method described in the second item above.

1.3.4 Description of Instruction Operation

Instruction descriptions (including related material such as the introduction to the section describing the instructions) mention that the instruction may cause a system error handler to be invoked, under certain conditions, if and only if the system error handler may treat the case as a programming error. (An instruction may cause a system error handler to be invoked under other conditions as well; see Chapter 6 of Book III-S and Chapter 5 of Book III-E).

A formal description is given of the operation of each instruction. In addition, the operation of most instructions is described by a semiformal language at the register transfer level (RTL). This RTL uses the notation given below, in addition to the notation described in Section 1.3.2. Some of this notation is also used in the formal descriptions of instructions. RTL notation not summarized here should be self-explanatory.

The RTL descriptions cover the normal execution of the instruction, except that "standard" setting of status registers, such as the Condition Register, is not shown. ("Non-standard" setting of these registers, such as the setting of the Condition Register by the Compare instructions, is shown.) The RTL descriptions do not cover cases in which the system error handler is invoked, or for which the results are boundedly undefined.

The RTL descriptions specify the architectural transformation performed by the execution of an instruction. They do not imply any particular implementation.

Notation                Meaning
←                       Assignment
←iea                    Assignment of an instruction effective address. In 32-bit mode the high-order 32 bits of the 64-bit target address are set to 0.
¬                       NOT logical operator
+                       Two's complement addition
-                       Two's complement subtraction, unary minus
×                       Multiplication
×si                     Signed-integer multiplication
×ui                     Unsigned-integer multiplication
/                       Division
÷                       Division, with result truncated to integer
√                       Square root
=, ≠                    Equals, Not Equals relations
<, ≤, >, ≥              Signed comparison relations
<u, >u                  Unsigned comparison relations
?                       Unordered comparison relation
&, |                    AND, OR logical operators
⊕, ≡                    Exclusive OR, Equivalence logical operators ((a≡b) = (a⊕¬b))
ABS(x)                  Absolute value of x
CEIL(x)                 Least integer ≥ x
DCR(x)                  Device Control Register x
DOUBLE(x)               Result of converting x from floating-point single format to floating-point double format, using the model shown on page 117
EXTS(x)                 Result of extending x on the left with sign bits
FLOOR(x)                Greatest integer ≤ x
GPR(x)                  General Purpose Register x
MASK(x, y)              Mask having 1s in positions x through y (wrapping if x > y) and 0s elsewhere
MEM(x, y)               Contents of a sequence of y bytes of storage. The sequence depends on the byte ordering used for storage access, as follows.
                        Big-Endian byte ordering: The sequence starts with the byte at address x and ends with the byte at address x+y-1.
                        Little-Endian byte ordering: The sequence starts with the byte at address x+y-1 and ends with the byte at address x.
ROTL64(x, y)            Result of rotating the 64-bit value x left y positions
ROTL32(x, y)            Result of rotating the 64-bit value x||x left y positions, where x is 32 bits long
SINGLE(x)               Result of converting x from floating-point double format to floating-point single format, using the model shown on page 121
SPR(x)                  Special Purpose Register x
TRAP                    Invoke the system trap handler
characterization        Reference to the setting of status bits, in a standard way that is explained in the text
undefined               An undefined value.
CIA                     Current Instruction Address, which is the 64-bit address of the instruction being described by a sequence of RTL. Used by relative branches to set the Next Instruction Address (NIA), and by Branch instructions with LK=1 to set the Link Register. Does not correspond to any architected register.
NIA                     Next Instruction Address, which is the 64-bit address of the next instruction to be executed. For a successful branch, the next instruction address is the branch target address: in RTL, this is indicated by assigning a value to NIA. For other instructions that cause non-sequential instruction fetching (see Book III), the RTL is similar. For instructions that do not branch, and do not otherwise cause instruction fetching to be non-sequential, the next instruction address is CIA+4 (VLE behavior is different; see Book VLE). Does not correspond to any architected register.
if... then... else...   Conditional execution, indenting shows range; else is optional.
do                      Do loop, indenting shows range. "To" and/or "by" clauses specify incrementing an iteration variable, and a "while" clause gives termination conditions.
leave                   Leave innermost do loop, or do loop described in leave statement.
for                     For loop, indenting shows range. Clause after "for" specifies the entities for which to execute the body of the loop.

The precedence rules for RTL operators are summarized in Table 1. Operators higher in the table are applied before those lower in the table. Operators at the same level in the table associate from left to right, from right to left, or not at all, as shown. (For example, - associates from left to right, so a-b-c = (a-b)-c.) Parentheses are used to override the evaluation order implied by the table or to increase clarity; parenthesized expressions are evaluated before serving as operands.

Table 1: Operator precedence

Operators                                                           Associativity
subscript, function evaluation                                      left to right
pre-superscript (replication), post-superscript (exponentiation)    right to left
unary -, ¬                                                          right to left
×, ÷                                                                left to right
+, -                                                                left to right
||                                                                  left to right
=, ≠, <, ≤, >, ≥, <u, >u, ?                                         left to right
&, ⊕, ≡                                                             left to right
|                                                                   left to right
: (range)                                                           none
←, ←iea                                                             none

1.3.5 Categories

Each facility (including registers and fields therein) and instruction is in exactly one of the categories listed in Figure 1.

A category may be defined as a dependent category. These are categories that are supported only if the category they are dependent on is also supported. Dependent categories are identified by the "." in their category name, e.g., if an implementation supports the Floating-Point.Record category, then the Floating-Point category is also supported.

An implementation that supports a facility or instruction in a given category, except for the two categories described in Section 1.3.5.1, supports all facilities and instructions in that category.

Category | Abvr. | Notes
Base | B | Required for all implementations
Server | S | Required for Server implementations
Embedded | E | Required for Embedded implementations
Alternate Time Base | ATB | An additional Time Base; see Book II
BCD Assistance | BCDA | Binary Coded Decimal Assistance Instructions
Cache Specification | CS | Specify a specific cache for some instructions; see Book II
Decimal Floating-Point | DFP | Decimal Floating-Point facilities
Embedded.Cache Debug | E.CD | Provides direct access to cache data and directory content
Embedded.Cache Initialization | E.CI | Instructions that invalidate the entire cache
Embedded.Enhanced Debug | E.ED | Embedded Enhanced Debug facility; see Book III-E
Embedded.External PID | E.PD | Embedded External PID facility; see Book III-E
Embedded.Little-Endian | E.LE | Embedded Little-Endian page attribute; see Book III-E
Embedded.MMU Type FSL | E.MF | Embedded MMU example Type FSL; see Book III-E
Embedded.Performance Monitor | E.PM | Embedded performance monitor example; see Book III-E
Embedded.Processor Control | E.PC | Processor control facility; see Book III-E
Embedded Cache Locking | ECL | Embedded Cache Locking facility; see Book III-E
External Control | EC | External Control facility; see Book II
External Proxy | EXP | External Proxy facility; see Book III-E
Floating-Point | FP | Floating-Point Facilities
Floating-Point.Record | FP.R | Floating-Point instructions with Rc=1
Hypervisor Emulation Assistance | HEA | Hypervisor Emulation Assistance Facilities
Legacy Integer Multiply-Accumulate [1] | LMA | Legacy Integer Multiply-accumulate instructions
Legacy Move Assist | LMV | Determine Left most Zero Byte instruction
Load/Store Quadword | LSQ | Load/Store Quadword instructions; see Book III-S
Memory Coherence | MMC | Requirement for Memory Coherence; see Book II
Move Assist | MA | Move Assist instructions
Processor Compatibility | PCR | Processor Compatibility Register
Server.Performance Monitor | S.PM | Performance monitor example for Servers; see Book III-S
Server.Relaxed Page Table Alignment | S.RPTA | HTAB alignment on 256 KB boundary; see Book III-S
Signal Processing Engine [1, 2] | SP | Facility for signal processing
SPE.Embedded Float Scalar Double | SP.FD | GPR-based Floating-Point double-precision instruction set
SPE.Embedded Float Scalar Single | SP.FS | GPR-based Floating-Point single-precision instruction set
SPE.Embedded Float Vector | SP.FV | GPR-based Floating-Point Vector instruction set
Stream | STM | Stream variant of dcbt instruction; see Book II
Trace | TRC | Trace Facility; see Book III-S
Variable Length Encoding | VLE | Variable Length Encoding facility; see Book VLE
Vector [1] | V | Vector facilities
Vector.Little-Endian | V.LE | Little-Endian support for Vector storage operations.
Wait | WT | wait instruction; see Book II
64-Bit | 64 | Required for 64-bit implementations; not defined for 32-bit impl's

[1] Because of overlapping opcode usage, SPE is mutually exclusive with Vector and with Legacy Integer Multiply-Accumulate, and Legacy Integer Multiply-Accumulate is mutually exclusive with Vector.
[2] The SPE-dependent Floating-Point categories are collectively referred to as SPE.Embedded Float_* or SP.*.

Figure 1. Category Listing

An instruction in a category that is not supported by the implementation is treated as an illegal instruction or an unimplemented instruction on that implementation (see Section 1.7.2).

For an instruction that is supported by the implementation with field values that are defined by the architecture, the field values defined as part of a category that is not supported by the implementation are treated as reserved values on that implementation (see Section 1.3.3 and Section 1.8.2).

Bits in a register that are in a category that is not supported by the implementation are treated as reserved.

1.3.5.1 Phased-In/Phased-Out

There are two special dependent categories, Phased-In and Phased-Out, defined below.

Phased-In (sVxxx)
    These are facilities and instructions that, in some future version of the architecture, will be required as part of the category they are dependent on. Starting with version 2.05, servers may not implement a facility in this category until the version indicated. Starting with the version indicated, servers must implement the facility. Servers that comply with earlier versions of this architecture may have optionally implemented features that were category Phased-In.

Phased-Out
    These are facilities and instructions that, in some future version of the architecture, will be dropped out of the architecture. System developers should develop a migration plan to eliminate use of them in new systems. These facilities are required for the Server Platform.

Programming Note
    Warning: Instructions and facilities being phased out of the architecture are likely to perform poorly on future implementations. New programs should not use them.

Programming Note
    Facilities are categorized as Phased-In only in cases where there is a difference between the Server and Embedded environments. As soon as the facility is supported by both environments, the Phased-In categorization will be removed.

1.3.5.2 Corequisite Category

A corequisite category is an additional category that is associated with an instruction or facility, and must be implemented if the instruction or facility is implemented.

1.3.5.3 Category Notation

Instructions and facilities are considered part of the Base category unless otherwise marked. If a section is marked with a specific category tag, all material in that section and its subsections is considered part of the category, unless otherwise marked. Overview sections may contain discussion of instructions and facilities from various categories without being explicitly marked.

An example of a category tag is: [Category: Server].

An example of a dependent category is: [Category: Server.Phased-In]

The shorthand <E> and <S> may also be used for Category: Embedded and Server respectively.

1.3.6 Environments

All implementations support one of the two defined environments, Server or Embedded. Environments refer to common subsets of instructions that are shared across many implementations. The Server environment describes implementations that support Category: Base and Server. The Embedded environment describes implementations that support Category: Base and Embedded.

1.4 Processor Overview

The processor implements the instruction set, the storage model, and other facilities defined in this document. There are four basic classes of instructions:

- branch instructions (Chapter 2)
- fixed-point instructions (Chapter 3), and other instructions that use the fixed-point registers (Chapters 7, 8, 9, and 10)
- floating-point instructions (Chapter 4) and decimal floating-point instructions (Chapter 5)
- vector instructions (Chapter 6)

Fixed-point instructions operate on byte, halfword, word, and doubleword operands. Floating-point instructions operate on single-precision and double-precision floating-point operands. Vector instructions operate on vectors of scalar quantities and on scalar quantities where the scalar size is byte, halfword, word, and quadword.
The Power ISA uses instructions that are four bytes long and word-aligned (VLE has different instruction characteristics; see Book VLE). It provides for byte, halfword, word, and doubleword operand fetches and stores between storage and a set of 32 General Purpose Registers (GPRs). It provides for word and doubleword operand fetches and stores between storage and a set of 32 Floating-Point Registers (FPRs). It also provides for byte, halfword, word, and quadword operand fetches and stores between storage and a set of 32 Vector Registers (VRs).

Signed integers are represented in two's complement form.

There are no computational instructions that modify storage; instructions that reference storage may reformat the data (e.g. load halfword algebraic). To use a storage operand in a computation and then modify the same or another storage location, the contents of the storage operand must be loaded into a register, modified, and then stored back to the target location. Figure 2 is a logical representation of instruction processing. Figure 3 shows the registers of the Power ISA User Instruction Set Architecture.

Figure 2. Logical processing model (diagram; its labeled blocks are Branch Processing, Fixed-Point Processing, Floating-Point Processing, Vector Processing, and Storage, with fixed-point, floating-point, and vector instructions flowing to the processing units, data moving to/from Storage, and instructions arriving from Storage)

Register(s) | Bits | Described in
CR | 32:63 | "Condition Register" on page 30
LR | 0:63 | "Link Register" on page 31
CTR | 0:63 | "Count Register" on page 31
GPR 0 ... GPR 31 | 0:63 | "General Purpose Registers" on page 42
XER | 0:63 | "Fixed-Point Exception Register" on page 42
Category: Embedded: | |
SPRG4 ... SPRG7 | 0:63 | "Software-use SPRs" on page 43
Category: Embedded, Vector: | |
VRSAVE | 32:63 | "VR Save Register" on page 196
Category: Floating-Point, Decimal Floating-Point: | |
FPR 0 ... FPR 31 | 0:63 | "Floating-Point Registers" on page 101
FPSCR | 0:63 | "Floating-Point Registers" on page 101 and "DFP Usage of Floating-Point Registers" on page 144
Category: Vector: | |
VR 0 ... VR 31 | 0:127 | "Vector Registers" on page 195
VSCR | 96:127 | "Vector Status and Control Register" on page 195
Category: SPE: | |
Accumulator | 0:63 | "Accumulator" on page 262
SPEFSCR | 32:63 | "Signal Processing and Embedded Floating-Point Status and Control Register" on page 262

Figure 3. Power ISA user register set

1.5 Computation modes

1.5.1 Modes [Category: Server]

Processors provide two execution modes, 64-bit mode and 32-bit mode. In both of these modes, instructions that set a 64-bit register affect all 64 bits. The computational mode controls how the effective address is interpreted, how status bits are set, how the Link Register is set by Branch instructions in which LK=1, and how the Count Register is tested by Branch Conditional instructions. Nearly all instructions are available in both modes (the only exceptions are a few instructions that are defined in Book III-S). In both modes, effective address computations use all 64 bits of the relevant registers (General Purpose Registers, Link Register, Count Register, etc.) and produce a 64-bit result. However, in 32-bit mode the high-order 32 bits of the computed effective address are ignored for the purpose of addressing storage; see Section 1.10.3 for additional details.

1.5.2 Modes [Category: Embedded]

Processors may provide 32-bit mode, or both 64-bit mode and 32-bit mode. The modes differ in the following ways.

- In 64-bit mode, the processor behaves as described for 64-bit mode in the Server environment; see Section 1.5.1.
- In 32-bit mode, instructions other than SP, SP.Embedded Float Scalar Double, and SP.Embedded Float Vector use only the lower 32 bits of a GPR and produce a 32-bit result. Results written to the GPRs write only the lower 32 bits, and the upper 32 bits are undefined, except for SP.Embedded Float Scalar Single instructions, which leave the upper 32 bits unchanged. SP, SP.Embedded Float Scalar Double, and SP.Embedded Float Vector instructions use all 64 bits of a GPR and produce a 64-bit result regardless of the mode.

  Instructions that set condition bits do so based on the 32-bit result computed. Effective addresses and all SPRs operate on the lower 32 bits only unless otherwise stated. The instructions in the 64-Bit category are not necessarily available; if they are not available, attempting to execute such an instruction causes the system illegal instruction error handler to be invoked.

Floating-Point and Decimal Floating-Point instructions operate on FPRs, and Vector instructions operate on VRs, independent of mode.

1.6 Instruction formats

All instructions are four bytes long and word-aligned (except for VLE instructions; see Book VLE). Thus, whenever instruction addresses are presented to the processor (as in Branch instructions) the low-order two bits are ignored. Similarly, whenever the processor develops an instruction address the low-order two bits are zero.

Bits 0:5 always specify the opcode (OPCD, below). Many instructions also have an extended opcode (XO, below). The remaining bits of the instruction contain one or more fields as shown below for the different instruction formats.

The format diagrams given below show horizontally all valid combinations of instruction fields. The diagrams include instruction fields that are used only by instructions defined in Book II or in Book III.

Split Field Notation

In some cases an instruction field occupies more than one contiguous sequence of bits, or occupies one contiguous sequence of bits that are used in permuted order. Such a field is called a split field. In the format diagrams given below and in the individual instruction layouts, the name of a split field is shown in small letters, once for each of the contiguous sequences. In the RTL description of an instruction having a split field, and in certain other places where individual bits of a split field are identified, the name of the field in small letters represents the concatenation of the sequences from left to right. In all other places, the name of the field is capitalized and represents the concatenation of the sequences in some order, which need not be left to right, as described for each affected instruction.

In each format below, the first line gives the bit positions marked on the diagram, and each following line gives one valid combination of fields, from left to right.

1.6.1 I-FORM

Bit positions: 0 6 30 31
OPCD LI AA LK

Figure 4. I instruction format

1.6.2 B-FORM

Bit positions: 0 6 11 16 30 31
OPCD BO BI BD AA LK

Figure 5. B instruction format

1.6.3 SC-FORM

Bit positions: 0 6 11 16 20 27 30 31
OPCD /// /// // LEV // 1 /
OPCD /// /// /// /// // 1 /

Figure 6. SC instruction format

1.6.4 D-FORM

Bit positions: 0 6 11 16 31
OPCD RT RA D
OPCD RT RA SI
OPCD RS RA D
OPCD RS RA UI
OPCD BF / L RA SI
OPCD BF / L RA UI
OPCD TO RA SI
OPCD FRT RA D
OPCD FRS RA D

Figure 7. D instruction format

1.6.5 DS-FORM

Bit positions: 0 6 11 16 30 31
OPCD RT RA DS XO
OPCD RS RA DS XO
OPCD RSp RA DS XO
OPCD FRTp RA DS XO
OPCD FRSp RA DS XO

Figure 8. DS instruction format

1.6.6 DQ-FORM

Bit positions: 0 6 11 16 28 31
OPCD RTp RA DQ ///

Figure 9. DQ instruction format

1.6.7 X-FORM

Bit positions: 0 6 11 16 21 31
OPCD RT RA /// XO /
OPCD RT RA RB XO /
OPCD RT RA RB XO EH
OPCD RT RA NB XO /
OPCD RT / SR /// XO /
OPCD RT /// RB XO /
OPCD RT /// RB XO 1
OPCD RT /// /// XO /
OPCD RS RA RB XO Rc
OPCD RT RA RB XO Rc
OPCD RS RA RB XO 1
OPCD RS RA RB XO /
OPCD RS RA NB XO /
OPCD RS RA SH XO Rc
OPCD RS RA /// XO Rc
OPCD RS RA /// XO /
OPCD RS / SR /// XO /
OPCD RS /// RB XO /
OPCD RS /// /// XO /
OPCD RS /// L /// XO /
OPCD TH RA RB XO /
OPCD BF / L RA RB XO /
OPCD BF // FRA FRB XO /
OPCD BF // BFA // /// XO /
OPCD BF // /// W U / XO Rc
OPCD BF // /// /// XO /
OPCD TH RA RB XO /
OPCD / CT /// /// XO /
OPCD / CT RA RB XO /
OPCD /// L RA RB XO /
OPCD /// L /// RB XO /
OPCD /// L /// /// XO /
OPCD TO RA RB XO /
OPCD FRT RA RB XO /
OPCD FRT FRA FRB XO /
OPCD FRTp RA RB XO /
OPCD FRT /// FRB XO Rc
OPCD FRT /// FRBp XO Rc
OPCD FRT /// /// XO Rc
OPCD FRTp /// FRB XO Rc
OPCD FRTp /// FRBp XO Rc
OPCD FRTp FRA FRBp XO Rc
OPCD FRTp FRAp FRBp XO Rc
OPCD BF // FRA FRBp XO /
OPCD BF // FRAp FRBp XO /
OPCD FRT S FRB XO Rc
OPCD FRTp S FRBp XO Rc
OPCD FRS RA RB XO /
OPCD FRSp RA RB XO /
OPCD BT /// /// XO Rc
OPCD /// RA RB XO /
OPCD /// /// RB XO /
OPCD /// /// /// XO /
OPCD /// /// E /// XO /
OPCD // IH /// /// XO /
OPCD ??? RA RB XO ?
OPCD ??? ??? ??? XO /
OPCD VRT RA RB XO /
OPCD VRS RA RB XO /
OPCD MO /// /// XO /

Figure 10. X instruction format

1.6.8 XL-FORM

Bit positions: 0 6 11 16 21 31
OPCD BT BA BB XO /
OPCD BO BI /// BH XO LK
OPCD BF // BFA // /// XO /
OPCD /// /// /// XO /

Figure 11. XL instruction format

1.6.9 XFX-FORM

Bit positions: 0 6 11 21 31
OPCD RT spr XO /
OPCD RT tbr XO /
OPCD RT 0 /// XO /
OPCD RT 1 FXM / XO /
OPCD RT dcr XO /
OPCD RT pmrn XO /
OPCD DUI DUIS XO /
OPCD RS 0 FXM / XO /
OPCD RS 1 FXM / XO /
OPCD RS spr XO /
OPCD RS dcr XO /
OPCD RS pmrn XO /

Figure 12. XFX instruction format

1.6.10 XFL-FORM

Bit positions: 0 6 7 15 16 21 31
OPCD L FLM W FRB XO Rc

Figure 13. XFL instruction format

1.6.11 XS-FORM

Bit positions: 0 6 11 16 21 30 31
OPCD RS RA sh XO sh Rc

Figure 14. XS instruction format

1.6.12 XO-FORM

Bit positions: 0 6 11 16 21 22 31
OPCD RT RA RB OE XO Rc
OPCD RT RA RB / XO Rc
OPCD RT RA RB / XO /
OPCD RT RA /// OE XO Rc

Figure 15. XO instruction format

1.6.13 A-FORM

Bit positions: 0 6 11 16 21 26 31
OPCD FRT FRA FRB FRC XO Rc
OPCD FRT FRA FRB /// XO Rc
OPCD FRT FRA /// FRC XO Rc
OPCD FRT /// FRB /// XO Rc
OPCD FRT /// L FRB /// XO Rc
OPCD RT RA RB BC XO /

Figure 16. A instruction format

1.6.14 M-FORM

Bit positions: 0 6 11 16 21 26 31
OPCD RS RA RB MB ME Rc
OPCD RS RA SH MB ME Rc

Figure 17. M instruction format

1.6.15 MD-FORM

Bit positions: 0 6 11 16 21 27 30 31
OPCD RS RA sh mb XO sh Rc
OPCD RS RA sh me XO sh Rc

Figure 18. MD instruction format

1.6.16 MDS-FORM

Bit positions: 0 6 11 16 21 27 31
OPCD RS RA RB mb XO Rc
OPCD RS RA RB me XO Rc

Figure 19. MDS instruction format

1.6.17 VA-FORM

Bit positions: 0 6 11 16 21 26 31
OPCD VRT VRA VRB VRC XO
OPCD VRT VRA VRB / SHB XO

Figure 20. VA instruction format

1.6.18 VC-FORM

Bit positions: 0 6 11 16 21 22 31
OPCD VRT VRA VRB Rc XO

Figure 21. VC instruction format

1.6.19 VX-FORM

Bit positions: 0 6 11 16 21 31
OPCD VRT VRA VRB XO
OPCD VRT /// VRB XO
OPCD VRT UIM VRB XO
OPCD VRT / UIM VRB XO
OPCD VRT // UIM VRB XO
OPCD VRT /// UIM VRB XO
OPCD VRT SIM /// XO
OPCD VRT /// XO
OPCD /// VRB XO

Figure 22. VX instruction format

1.6.20 EVX-FORM

Bit positions: 0 6 11 16 21 31
OPCD RS RA RB XO
OPCD RS RA UI XO
OPCD RT /// RB XO
OPCD RT RA RB XO
OPCD RT RA /// XO
OPCD RT UI RB XO
OPCD BF // RA RB XO
OPCD RT RA UI XO
OPCD RT SI /// XO

Figure 23. EVX instruction format

1.6.21 EVS-FORM

Bit positions: 0 6 11 16 21 29 31
OPCD RT RA RB XO BFA

Figure 24. EVS instruction format

1.6.22 Z22-FORM

Bit positions: 0 6 11 15 16 22 31
OPCD BF // FRA DCM XO /
OPCD BF // FRAp DCM XO /
OPCD BF // FRA DGM XO /
OPCD BF // FRAp DGM XO /
OPCD FRT FRA SH XO Rc
OPCD FRTp FRAp SH XO Rc

Figure 25. Z22 instruction format

1.6.23 Z23-FORM

Bit positions: 0 6 11 16 21 23 31
OPCD FRT TE FRB RMC XO Rc
OPCD FRTp TE FRBp RMC XO Rc
OPCD FRT FRA FRB RMC XO Rc
OPCD FRTp FRA FRBp RMC XO Rc
OPCD FRTp FRAp FRBp RMC XO Rc
OPCD FRT /// R FRB RMC XO Rc
OPCD FRTp /// R FRBp RMC XO Rc

Figure 26. Z23 instruction format

1.6.24 Instruction Fields

AA (30)
    Absolute Address bit.
    0   The immediate field represents an address relative to the current instruction address. For I-form branches the effective address of the branch target is the sum of the LI field sign-extended to 64 bits and the address of the branch instruction. For B-form branches the effective address of the branch target is the sum of the BD field sign-extended to 64 bits and the address of the branch instruction.
    1   The immediate field represents an absolute address. For I-form branches the effective address of the branch target is the LI field sign-extended to 64 bits. For B-form branches the effective address of the branch target is the BD field sign-extended to 64 bits.

BA (11:15)
    Field used to specify a bit in the CR to be used as a source.

BB (16:20)
    Field used to specify a bit in the CR to be used as a source.

BC (21:25)
    Field used to specify a bit in the CR to be used as a source.

BD (16:29)
    Immediate field used to specify a 14-bit signed two's complement branch displacement which is concatenated on the right with 0b00 and sign-extended to 64 bits.

BF (6:8)
    Field used to specify one of the CR fields or one of the FPSCR fields to be used as a target.

BFA (11:13 or 29:31)
    Field used to specify one of the CR fields or one of the FPSCR fields to be used as a source.

BH (19:20)
    Field used to specify a hint in the Branch Conditional to Link Register and Branch Conditional to Count Register instructions. The encoding is described in Section 2.4, "Branch Instructions".

BI (11:15)
    Field used to specify a bit in the CR to be tested by a Branch Conditional instruction.

BO (6:10)
    Field used to specify options for the Branch Conditional instructions. The encoding is described in Section 2.4, "Branch Instructions".

BT (6:10)
    Field used to specify a bit in the CR or in the FPSCR to be used as a target.

CT (7:10)
    Field used in X-form instructions to specify a cache target (see Section 3.3.2 of Book II).

D (16:31)
    Immediate field used to specify a 16-bit signed two's complement integer which is sign-extended to 64 bits.

DCM (16:21)
    Immediate field used as the Data Class Mask.

DCR (11:20)
    Field used by the Move To/From Device Control Register instructions (see Book III-E).

DGM (16:21)
    Immediate field used as the Data Group Mask.

DQ (16:27)
    Immediate field used to specify a 12-bit signed two's complement integer which is concatenated on the right with 0b0000 and sign-extended to 64 bits.

DS (16:29)
    Immediate field used to specify a 14-bit signed two's complement integer which is concatenated on the right with 0b00 and sign-extended to 64 bits.

DUI (6:10)
    Field used by the dnh instruction (see Book II).

DUIS (11:20)
    Field used by the dnh instruction (see Book II).

E (16)
    Field used by the Write MSR External Enable instruction (see Book III-E).

EH (31)
    Field used to specify a hint in the Load and Reserve instructions. The meaning is described in Section 3.4.2, "Load and Reserve and Store Conditional Instructions", in Book II.

FLM (7:14)
    Field mask used to identify the FPSCR fields that are to be updated by the mtfsf instruction.

FRA (11:15)
    Field used to specify an FPR to be used as a source.

FRAp (11:15)
    Field used to specify an even/odd pair of FPRs to be concatenated and used as a source.

FRB (16:20)
    Field used to specify an FPR to be used as a source.

FRBp (16:20)
    Field used to specify an even/odd pair of FPRs to be concatenated and used as a source.

FRC (21:25)
    Field used to specify an FPR to be used as a source.

FRS (6:10)
    Field used to specify an FPR to be used as a source.

FRSp (6:10)
    Field used to specify an even/odd pair of FPRs to be concatenated and used as a source.

FRT (6:10)
    Field used to specify an FPR to be used as a target.

FRTp (6:10)
    Field used to specify an even/odd pair of FPRs to be concatenated and used as a target.

FXM (12:19)
    Field mask used to identify the CR fields that are to be written by the mtcrf and mtocrf instructions, or read by the mfocrf instruction.

IH (8:10)
    Field used to specify a hint in the SLB Invalidate All instruction. The meaning is described in Section 5.9.3.1, "SLB Management Instructions", in Book III-S.

L (6)
    Field used to specify whether the mtfsf instruction updates the entire FPSCR.

L (10 or 15)
    Field used to specify whether a fixed-point Compare instruction is to compare 64-bit numbers or 32-bit numbers.

    Field used by the Data Cache Block Flush instruction (see Section 3.3.2 of Book II).

    Field used by the Move To Machine State Register and TLB Invalidate Entry instructions (see Book III).

    Field used to specify whether the Floating-Point Estimate instructions may treat denormalized operands as 0.

L (9:10)
    Field used by the Synchronize instruction (see Section 3.4.1 of Book II).

LEV (20:26)
    Field used by the System Call instruction.

LI (6:29)
    Immediate field used to specify a 24-bit signed two's complement integer which is concatenated on the right with 0b00 and sign-extended to 64 bits.

LK (31)
    LINK bit.
    0   Do not set the Link Register.
    1   Set the Link Register. The address of the instruction following the Branch instruction is placed into the Link Register.

MB (21:25) and ME (26:30)
    Fields used in M-form instructions to specify a 64-bit mask consisting of 1-bits from bit MB+32 through bit ME+32 inclusive and 0-bits elsewhere, as described in Section 3.3.13, "Fixed-Point Rotate and Shift Instructions" on page 82.

MB (21:26)
    Field used in MD-form and MDS-form instructions to specify the first 1-bit of a 64-bit mask, as described in Section 3.3.13, "Fixed-Point Rotate and Shift Instructions" on page 82.

ME (21:26)
    Field used in MD-form and MDS-form instructions to specify the last 1-bit of a 64-bit mask, as described in Section 3.3.13, "Fixed-Point Rotate and Shift Instructions" on page 82.

MO (6:10)
    Field used in X-form instructions to specify a subset of storage accesses.

    Immediate field that specifies signed versus unsigned conversion.

SH (16:20, or 16:20 and 30, or 16:21)
    Field used to specify a shift amount.

SHB (22:25)
    Field used to specify a shift amount in bytes.
NB (16:20) SI (16:31 or 11:15) Field used to specify the number of bytes to move Immediate field used to specify a 16-bit signed in an immediate Move Assist instruction. integer. OPCD (0:5) SIM (11:15) Primary opcode field. Immediate field used to specify a 5-bit signed inte- ger. OE (21) Field used by XO-form instructions to enable set- SP (11:12) ting OV and SO in the XER. Immediate field that specifies signed versus unsigned conversion. PMRN (11:20) Field used to specify a Performance Monitor Reg- SPR (11:20) ister for the mfpmr and mtpmr instructions. Field used to specify a Special Purpose Register for the mtspr and mfspr instructions. R (15) Immediate field that specifies whether the RMC is SR (12:15) specifiying the primary or secondary encoding Field used by the Segment Register Manipulation instructions (see Book III-S). RA (11:15) Field used to specify a GPR to be used as a TBR (11:20) source or as a target. Field used by the Move From Time Base instruc- tion (see Section 4.2.1 of Book II). RB (16:20) Field used to specify a GPR to be used as a TE (11:15) source. Immediate field that specifies a DFP exponent. Rc (21 OR 31) TH (6:10) RECORD bit. Field used by the data stream variant of the dcbt 0 Do not alter the Condition Register. and dcbtst instructions (see Section 3.3.2 of Book 1 Set Condition Register Field 0, Field 1, or II). Field 6 as described in Section 2.3.1, TO (6:10) "Condition Register" on page 30. Field used to specify the conditions on which to RMC (21:22) trap. The encoding is described in Section 3.3.10, Immediate field used for DFP rounding mode con- "Fixed-Point Trap Instructions" on page 73. trol. U (16:19) RS (6:10) Immediate field used as the data to be placed into Field used to specify a GPR to be used as a a field in the FPSCR. source. UI (11:15, 16:20, or 16:31) RSp (6:10) Immediate field used to specify an unsigned inte- Field used to specify an even/odd pair of GPRs to ger. be concatenated and used as a source. 
UIM (11:15, 12:15, 13:15, 14:15) RT (6:10) Immediate field used to specify an unsigned inte- Field used to specify a GPR to be used as a target. ger. RTp (6:10) VRA (11:15) Field used to specify an even/odd pair of GPRs to Field used to specify a VR to be used as a source. be concatenated and used as a target. VRB (16:20) S (11) Field used to specify a VR to be used as a source. 20 Power ISATM I Version 2.05 VRC (21:25) 1.7.3 Reserved Instruction Class Field used to specify a VR to be used as a source. This class of instructions contains the set of instruc- VRS (6:10) tions described in Appendix E of Book Appendices. Field used to specify a VR to be used as a source. Reserved instructions are allocated to specific pur- VRT (6:10) poses that are outside the scope of the Power ISA. Field used to specify a VR to be used as a target. Any attempt to execute a reserved instruction will: W (15) 1 perform the actions described by the implementa- Field used by the mtfsfi and mtfsf instructions to tion if the instruction is implemented; or specify the target word in the FPSCR. 1 cause the system illegal instruction error handler to XO (21:28, 21:29, 21:30, 21:31, 22:30, 22:31, 23:30, be invoked if the instruction is not implemented. 26:30, 26:31, 27:29, 27:30, or 30:31) Extended opcode field. 1.8 Forms of Defined Instruc- tions 1.7 Classes of Instructions An instruction falls into exactly one of the following 1.8.1 Preferred Instruction Forms three classes: Some of the defined instructions have preferred forms. Defined For such an instruction, the preferred form will execute Illegal in an efficient manner, but any other form may take sig- Reserved nificantly longer to execute than the preferred form. The class is determined by examining the opcode, and Instructions having preferred forms are: the extended opcode if any. 
If the opcode, or combina- 1 the Condition Register Logical instructions tion of opcode and extended opcode, is not that of a 1 the Load/Store Multiple instructions defined instruction or a reserved instruction, the 1 the Load/Store String instructions instruction is illegal. 1 the Or Immediate instruction (preferred form of no-op) 1.7.1 Defined Instruction Class 1 the Move To Condition Register Fields instruction This class of instructions contains all the instructions defined in this document. 1.8.2 Invalid Instruction Forms A defined instruction can have preferred and/or invalid Some of the defined instructions can be coded in a forms, as described in Section 1.8.1, "Preferred Instruc- form that is invalid. An instruction form is invalid if one tion Forms" and Section 1.8.2, "Invalid Instruction or more fields of the instruction, excluding the opcode Forms". Instructions that are part of a category that is field(s), are coded incorrectly in a manner that can be not supported are treated as illegal instructions. deduced by examining only the instruction encoding. In general, any attempt to execute an invalid form of an 1.7.2 Illegal Instruction Class instruction will either cause the system illegal instruc- tion error handler to be invoked or yield boundedly This class of instructions contains the set of instruc- undefined results. Exceptions to this rule are stated in tions described in Appendix D of Book Appendices. Ille- the instruction descriptions. gal instructions are available for future extensions of Some instruction forms are invalid because the instruc- the Power ISA ; that is, some future version of the tion contains a reserved value in a defined field (see Power ISA may define any of these instructions to per- Section 1.3.3 on page 5); these invalid forms are not form new functions. discussed further. All other invalid forms are identified Any attempt to execute an illegal instruction will cause in the instruction descriptions. 
the system illegal instruction error handler to be References to instructions elsewhere in this document invoked and will have no other effect. assume the instruction form is not invalid, unless other- An instruction consisting entirely of binary 0s is guaran- wise stated or obvious from context. teed always to be an illegal instruction. This increases the probability that an attempt to execute data or unini- tialized storage will result in the invocation of the sys- tem illegal instruction error handler. Chapter 1. Introduction 21 Version 2.05 1 the execution of a System Call instruction (system Assembler Note service program) Assemblers should report uses of invalid instruc- tion forms as errors. 1 the execution of a Trap instruction that traps (sys- tem trap handler) 1 the execution of a floating-point instruction that 1.8.3 Reserved-no-op Instructions causes a floating-point enabled exception to exist (system floating-point enabled exception error [Category: Phased-In (sV2.07)] handler) Reserved-no-op instructions include the following 1 the execution of an auxiliary processor instruction extended opcodes under primary opcode 31: 530, 562, that causes an auxiliary processor enabled excep- 594, 626, 658, 690, 722, and 754. tion to exist (system auxiliary processor enabled Reserved-no-op instructions are provided in the archi- exception error handler) tecture to anticipate the eventual adoption of perfor- The exceptions that can be caused by an asynchro- mance hint instructions to the architecture. For these nous event are described in Book III. 
instructions, which cause no visible change to archi- tected state, employing a reserved-no-op opcode will The invocation of the system error handler is precise, allow software to use this new capability on new imple- except that the invocation of the auxiliary processor mentations that support it while remaining compatible enabled exception error handler may be imprecise, and with existing implementations that may not support the if one of the imprecise modes for invoking the system new function. floating-point enabled exception error handler is in effect (see page 109), then the invocation of the system When a reserved-no-op instruction is executed, no floating-point enabled exception error handler may also operation is performed. be imprecise. When the system error handler is Reserved-no-op instructions are not assigned instruc- invoked imprecisely, the excepting instruction does not tion names or mnemonics. There are no individual appear to complete before the next instruction starts descriptions of reserved-no-op instructions in this docu- (because one of the effects of the excepting instruction, ment. namely the invocation of the system error handler, has not yet occurred). Additional information about exception handling can be 1.9 Exceptions found in Book III. There are two kinds of exception, those caused directly by the execution of an instruction and those caused by an asynchronous event. In either case, the exception may cause one of several components of the system software to be invoked. 
The exceptions that can be caused directly by the exe- cution of an instruction include the following: 1 an attempt to execute an illegal instruction, or an attempt by an application program to execute a "privileged" instruction (see Book III) (system ille- gal instruction error handler or system privileged instruction error handler) 1 the execution of a defined instruction using an invalid form (system illegal instruction error han- dler or system privileged instruction error handler) 1 an attempt to execute an instruction that is not pro- vided by the implementation (system illegal instruction error handler) 1 an attempt to access a storage location that is unavailable (system instruction storage error han- dler or system data storage error handler) 1 an attempt to access storage with an effective address alignment that is invalid for the instruction (system alignment error handler) 22 Power ISATM I Version 2.05 1.10 Storage Addressing When a storage operand of length N bytes starting at effective address EA is copied between storage and a A program references storage using the effective register that is R bytes long (i.e., the register contains address computed by the processor when it executes a bytes numbered from 0, most significant, through R-1, Storage Access or Branch instruction (or certain other least significant), the bytes of the operand are placed instructions described in Book II and Book III), or when into the register or into storage in a manner that it fetches the next sequential instruction. depends on the byte ordering for the storage access as shown in Figure 27, unless otherwise specified in the Bytes in storage are numbered consecutively starting instruction description. with 0. Each number is the address of the correspond- ing byte. Big-Endian Byte Ordering The byte ordering (Big-Endian or Little-Endian) for a Load Store storage access is specified by the operating system. 
In for i=0 to N-1: for i=0 to N-1: the Embedded environment this ordering is a page RT(R-N)+i1 MEM(EA+i,1) MEM(EA+i,1) 1 (RS)(R-N)+i attribute (see Book II) and is specified independently Little-Endian Byte Ordering for each virtual page, while in the Server environment it Load Store is a mode (see Book III-S) and applies to all storage. for i=0 to N-1: for i=0 to N-1: RT(R-1)-i 1 MEM(EA+i,1) MEM(EA+i,1) 1 (RS)(R-1)-i 1.10.1 Storage Operands Notes: 1. In this table, subscripts refer to bytes in a register Storage operands may be bytes, halfwords, words, rather than to bits as defined in Section 1.3.2. doublewords, or quadwords (see book III), or, for the 2. This table does not apply to the lvebx, lvehx, Load/Store Multiple and Move Assist instructions, a lvewx, stvebx, stvehx, and stvewx instructions. sequence of bytes or words. The address of a storage operand is the address of its first byte (i.e., of its low- Figure 27. Storage operands and byte ordering est-numbered byte). Figure 28 shows an example of a C language Operand length is implicit for each instruction. structure s containing an assortment of scalars and one character string. The value assumed to be in each The operand of a single-register Storage Access structure element is shown in hex in the C comments; instruction or quadword Load or Store instruction, has a these values are used below to show how the bytes "natural" alignment boundary equal to the operand making up each structure element are mapped into length. In other words, the "natural" address of an oper- storage. It is assumed that structure s is compiled for and is an integral multiple of the operand length. A stor- 32-bit mode or for a 32-bit implementation. (This affects age operand is said to be aligned if it is aligned at its the length of the pointer to c.) natural boundary; otherwise it is said to be unaligned. See the following table. 
C structure mapping rules permit the use of padding (skipped bytes) in order to align the scalars on desir- Operand Length Addr60:63 if aligned able boundaries. Figures 29 and 30 show each scalar Byte 8 bits xxxx aligned at its natural boundary. This alignment intro- Halfword 2 bytes xxx0 duces padding of four bytes between a and b, one byte Word 4 bytes xx00 between d and e, and two bytes between e and f. The same amount of padding is present for both Big-Endian Doubleword 8 bytes x000 and Little-Endian mappings. Quadword 16 bytes 0000 Note: An "x" in an address bit position indicates that The Big-Endian mapping of structure s is shown in the bit can be 0 or 1 independent of the contents of Figure 29. Addresses are shown in hex at the left of other bits in the address. each doubleword, and in small figures below each byte. The contents of each byte, as indicated in the C exam- The concept of alignment is also applied more gener- ple in Figure 28, are shown in hex (as characters for ally, to any datum in storage. For example, a 12-byte the elements of the string). datum in storage is said to be word-aligned if its address is an integral multiple of 4. The Little-Endian mapping of structure s is shown in Figure 30. Doublewords are shown laid out from right Some instructions require their storage operands to to left, which is the common way of showing storage have certain alignments. In addition, alignment may maps for processors that implement only Little-Endian affect performance. For single-register Storage Access byte ordering. instructions and quadword Load and Store instructions, the best performance is obtained when storage oper- ands are aligned. Additional effects of data placement on performance are described in Chapter 2 of Book II. Chapter 1. 
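The natural-alignment rule and the byte placement rules of Figure 27 can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the helper names `is_aligned`, `load`, and `store` are hypothetical (they are not defined by the architecture), storage is modeled as a Python byte sequence, and register bytes that a load does not write are simply left as zero, which is a simplification of what individual Load instructions specify.

```python
# Minimal sketch of natural alignment and of the Figure 27 byte placement.
# All function names are hypothetical helpers for illustration only.


def is_aligned(ea: int, length: int) -> bool:
    """Return True if an operand of `length` bytes at address `ea` sits on
    its natural boundary, i.e. `ea` is an integral multiple of `length`."""
    return ea % length == 0


def load(mem: bytes, ea: int, n: int, r: int, little_endian: bool) -> bytes:
    """Copy the n-byte operand at mem[ea] into an r-byte register image.

    Register bytes are numbered 0 (most significant) through r-1 (least
    significant). Bytes not written by the copy stay zero in this sketch
    (an assumption, not an architectural statement).
    """
    reg = bytearray(r)
    for i in range(n):
        if little_endian:
            reg[(r - 1) - i] = mem[ea + i]   # RT_((R-1)-i) <- MEM(EA+i,1)
        else:
            reg[(r - n) + i] = mem[ea + i]   # RT_((R-N)+i) <- MEM(EA+i,1)
    return bytes(reg)


def store(mem: bytearray, ea: int, n: int, reg: bytes, little_endian: bool) -> None:
    """Copy the low-order n bytes of the register image `reg` to mem[ea]."""
    r = len(reg)
    for i in range(n):
        if little_endian:
            mem[ea + i] = reg[(r - 1) - i]   # MEM(EA+i,1) <- (RS)_((R-1)-i)
        else:
            mem[ea + i] = reg[(r - n) + i]   # MEM(EA+i,1) <- (RS)_((R-N)+i)


if __name__ == "__main__":
    word = bytes([0x11, 0x12, 0x13, 0x14])
    print(load(word, 0, 4, 8, little_endian=False).hex())  # 0000000011121314
    print(load(word, 0, 4, 8, little_endian=True).hex())   # 0000000014131211
    print(is_aligned(0x1004, 8))                           # False
```

Storing the register value 0x1112_1314 with Little-Endian byte ordering in this model places byte 0x14 at the lowest address, which matches the layout of element a in the Little-Endian mapping of structure s.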
    struct {
        int     a;      /* 0x1112_1314                          word           */
        double  b;      /* 0x2122_2324_2526_2728                doubleword     */
        char *  c;      /* 0x3132_3334                          word           */
        char    d[7];   /* 'A', 'B', 'C', 'D', 'E', 'F', 'G'    array of bytes */
        short   e;      /* 0x5152                               halfword       */
        int     f;      /* 0x6162_6364                          word           */
    } s;

Figure 28. C structure 's', showing values of elements

    00   11   12   13   14
         00   01   02   03   04   05   06   07
    08   21   22   23   24   25   26   27   28
         08   09   0A   0B   0C   0D   0E   0F
    10   31   32   33   34   'A'  'B'  'C'  'D'
         10   11   12   13   14   15   16   17
    18   'E'  'F'  'G'       51   52
         18   19   1A   1B   1C   1D   1E   1F
    20   61   62   63   64
         20   21   22   23

Figure 29. Big-Endian mapping of structure 's'

                             11   12   13   14   00
         07   06   05   04   03   02   01   00
         21   22   23   24   25   26   27   28   08
         0F   0E   0D   0C   0B   0A   09   08
         'D'  'C'  'B'  'A'  31   32   33   34   10
         17   16   15   14   13   12   11   10
                   51   52        'G'  'F'  'E'  18
         1F   1E   1D   1C   1B   1A   19   18
                             61   62   63   64   20
                             23   22   21   20

Figure 30. Little-Endian mapping of structure 's'

1.10.2 Instruction Fetches

Instructions are always four bytes long and word-aligned (except for VLE instructions; see Book VLE).

When an instruction starting at effective address EA is fetched from storage, the relative order of the bytes within the instruction depends on the byte ordering for the storage access as shown in Figure 31.

    Big-Endian Byte Ordering
        for i=0 to 3:
          inst_i ← MEM(EA+i,1)

    Little-Endian Byte Ordering
        for i=0 to 3:
          inst_(3-i) ← MEM(EA+i,1)

    Note: In this table, subscripts refer to bytes of the instruction rather than to bits as defined in Section 1.3.2.

Figure 31. Instructions and byte ordering

Figure 32 shows an example of a small assembly language program p.

    loop:
        cmplwi r5,0
        beq    done
        lwzux  r4,r5,r6
        add    r7,r7,r4
        subi   r5,r5,4
        b      loop
    done:
        stw    r7,total

Figure 32. Assembly language program 'p'

The Big-Endian mapping of program p is shown in Figure 33 (assuming the program starts at address 0).

    00   loop: cmplwi r5,0        beq done
         00   01   02   03        04   05   06   07
    08   lwzux r4,r5,r6           add r7,r7,r4
         08   09   0A   0B        0C   0D   0E   0F
    10   subi r5,r5,4             b loop
         10   11   12   13        14   15   16   17
    18   done: stw r7,total
         18   19   1A   1B        1C   1D   1E   1F

Figure 33. Big-Endian mapping of program 'p'

The Little-Endian mapping of program p is shown in Figure 34.

         beq done                 loop: cmplwi r5,0     00
         07   06   05   04        03   02   01   00
         add r7,r7,r4             lwzux r4,r5,r6        08
         0F   0E   0D   0C        0B   0A   09   08
         b loop                   subi r5,r5,4          10
         17   16   15   14        13   12   11   10
                                  done: stw r7,total    18
         1F   1E   1D   1C        1B   1A   19   18

Figure 34. Little-Endian mapping of program 'p'

Programming Note

The terms Big-Endian and Little-Endian come from Part I, Chapter 4, of Jonathan Swift's Gulliver's Travels. Here is the complete passage, from the edition printed in 1734 by George Faulkner in Dublin.

    ... our Histories of six Thousand Moons make no Mention of any other Regions, than the two great Empires of Lilliput and Blefuscu. Which two mighty Powers have, as I was going to tell you, been engaged in a most obstinate War for six and thirty Moons past. It began upon the following Occasion. It is allowed on all Hands, that the primitive Way of breaking Eggs before we eat them, was upon the larger End: But his present Majesty's Grand-father, while he was a Boy, going to eat an Egg, and breaking it according to the ancient Practice, happened to cut one of his Fingers. Whereupon the Emperor his Father, published an Edict, commanding all his Subjects, upon great Penalties, to break the smaller End of their Eggs. The People so highly resented this Law, that our Histories tell us, there have been six Rebellions raised on that Account; wherein one Emperor lost his Life, and another his Crown. These civil Commotions were constantly fomented by the Monarchs of Blefuscu; and when they were quelled, the Exiles always fled for Refuge to that Empire. It is computed that eleven Thousand Persons have, at several Times, suffered Death, rather than submit to break their Eggs at the smaller End. Many hundred large Volumes have been published upon this Controversy: But the Books of the Big-Endians have been long forbidden, and the whole Party rendered incapable by Law of holding Employments. During the Course of these Troubles, the Emperors of Blefuscu did frequently expostulate by their Ambassadors, accusing us of making a Schism in Religion, by offending against a fundamental Doctrine of our great Prophet Lustrog, in the fifty-fourth Chapter of the Brundrecal, (which is their Alcoran.) This, however, is thought to be a mere Strain upon the text: For the Words are these; That all true Believers shall break their Eggs at the convenient End: and which is the convenient End, seems, in my humble Opinion, to be left to every Man's Conscience, or at least in the Power of the chief Magistrate to determine. Now the Big-Endian Exiles have found so much Credit in the Emperor of Blefuscu's Court; and so much private Assistance and Encouragement from their Party here at home, that a bloody War has been carried on between the two Empires for six and thirty Moons with various Success; during which Time we have lost Forty Capital Ships, and a much greater Number of smaller Vessels, together with thirty thousand of our best Seamen and Soldiers; and the Damage received by the Enemy is reckoned to be somewhat greater than ours. However, they have now equipped a numerous Fleet, and are just preparing to make a Descent upon us: and his Imperial Majesty, placing great Confidence in your Valour and Strength, hath commanded me to lay this Account of his Affairs before you.

1.10.3 Effective Address Calculation

An effective address is computed by the processor when executing a Storage Access or Branch instruction (or certain other instructions described in Book II, Book III, and Book VLE), when fetching the next sequential instruction, or when invoking a system error handler. The following provides an overview of this process. More detail is provided in the individual instruction descriptions.

Effective address calculations, for both data and instruction accesses, use 64-bit two's complement addition. All 64 bits of each address component participate in the calculation regardless of mode (32-bit or 64-bit). In this computation one operand is an address (which is by definition an unsigned number) and the second is a signed offset. Carries out of the most significant bit are ignored.

In 64-bit mode, the entire 64-bit result comprises the 64-bit effective address. The effective address arithmetic wraps around from the maximum address, 2^64 - 1, to address 0, except that if the current instruction is at effective address 2^64 - 4 the effective address of the next sequential instruction is undefined.

In 32-bit mode, the low-order 32 bits of the 64-bit result, preceded by 32 0 bits, comprise the 64-bit effective address for the purpose of addressing storage. When an effective address is placed into a register by an instruction or event, the value placed into the high-order 32 bits of the register differs between the Server environment and the Embedded environment.

• Server environment:
  - Load with Update and Store with Update instructions set the high-order 32 bits of register RA to the high-order 32 bits of the 64-bit result.
  - In all other cases (e.g., the Link Register when
    set by Branch instructions having LK=1, Special Purpose Registers when set to an effective address by invocation of a system error handler) the high-order 32 bits of the register are set to 0s except as described in the last sentence of this paragraph.
• Embedded environment:
  The high-order 32 bits of the register are set to an undefined value.

As used to address storage, the effective address arithmetic appears to wrap around from the maximum address, 2^32 - 1, to address 0, except that if the current instruction is at effective address 2^32 - 4 the effective address of the next sequential instruction is undefined.

The 64-bit current instruction address is not affected by a change from 32-bit mode to 64-bit mode, but is affected by a change from 64-bit mode to 32-bit mode. In the latter case, the high-order 32 bits are set to 0. The same is true for the 64-bit next instruction address, except as described in the last item of the list below.

RA is a field in the instruction which specifies an address component in the computation of an effective address. A zero in the RA field indicates the absence of the corresponding address component. A value of zero is substituted for the absent component of the effective address computation. This substitution is shown in the instruction descriptions as (RA|0).

Effective addresses are computed as follows. In the descriptions below, it should be understood that "the contents of a GPR" refers to the entire 64-bit contents, independent of mode, but that in 32-bit mode only bits 32:63 of the 64-bit result of the computation are used to address storage.

• With X-form instructions, in computing the effective address of a data element, the contents of the GPR designated by RB (or the value zero for lswi and stswi) are added to the contents of the GPR designated by RA or to zero if RA=0.
• With D-form instructions, the 16-bit D field is sign-extended to form a 64-bit address component. In computing the effective address of a data element, this address component is added to the contents of the GPR designated by RA or to zero if RA=0.
• With DS-form instructions, the 14-bit DS field is concatenated on the right with 0b00 and sign-extended to form a 64-bit address component. In computing the effective address of a data element, this address component is added to the contents of the GPR designated by RA or to zero if RA=0.
• With I-form Branch instructions, the 24-bit LI field is concatenated on the right with 0b00 and sign-extended to form a 64-bit address component. If AA=0, this address component is added to the address of the Branch instruction to form the effective address of the next instruction. If AA=1, this address component is the effective address of the next instruction.
• With B-form Branch instructions, the 14-bit BD field is concatenated on the right with 0b00 and sign-extended to form a 64-bit address component. If AA=0, this address component is added to the address of the Branch instruction to form the effective address of the next instruction. If AA=1, this address component is the effective address of the next instruction.
• With XL-form Branch instructions, bits 0:61 of the Link Register or the Count Register are concatenated on the right with 0b00 to form the effective address of the next instruction.
• With sequential instruction fetching, the value 4 is added to the address of the current instruction to form the effective address of the next instruction, except that if the current instruction is at the maximum instruction effective address for the mode (2^64 - 4 in 64-bit mode, 2^32 - 4 in 32-bit mode) the effective address of the next sequential instruction is undefined. (There is one other exception to this rule; this exception involves changing between 32-bit mode and 64-bit mode and is described in Section 5.3.2 of Book III-S and Section 4.3.2 of Book III-E.)

If the size of the operand of a storage access instruction is more than one byte, the effective address for each byte after the first is computed by adding 1 to the effective address of the preceding byte.

Chapter 2. Branch Processor

2.1 Branch Processor Overview
2.2 Instruction Execution Order
2.3 Branch Processor Registers
    2.3.1 Condition Register
    2.3.2 Link Register
    2.3.3 Count Register
2.4 Branch Instructions
2.5 Condition Register Instructions
    2.5.1 Condition Register Logical Instructions
    2.5.2 Condition Register Field Instruction
2.6 System Call Instruction

2.1 Branch Processor Overview

This chapter describes the registers and instructions that make up the Branch Processor facility.

2.2 Instruction Execution Order

In general, instructions appear to execute sequentially, in the order in which they appear in storage. The exceptions to this rule are listed below.

• Branch instructions for which the branch is taken cause execution to continue at the target address specified by the Branch instruction.
• Trap instructions for which the trap conditions are satisfied, and System Call instructions, cause the appropriate system handler to be invoked.
• Exceptions can cause the system error handler to be invoked, as described in Section 1.9, "Exceptions" on page 22.
• Returning from a system service program, system trap handler, or system error handler causes execution to continue at a specified address.

The model of program execution in which the processor appears to execute one instruction at a time, completing each instruction before beginning to execute the next instruction is called the "sequential execution model". In general, the processor obeys the sequential execution model. For the instructions and facilities defined in this Book, the only exceptions to this rule are the following.

• A floating-point exception occurs when the processor is running in one of the Imprecise floating-point exception modes (see Section 4.4). The instruction that causes the exception need not complete before the next instruction begins execution, with respect to setting exception bits and (if the exception is enabled) invoking the system error handler.
• A Store instruction modifies one or more bytes in an area of storage that contains instructions that will subsequently be executed. Before an instruction in that area of storage is executed, software synchronization is required to ensure that the instructions executed are consistent with the results produced by the Store instruction.

  Programming Note
      This software synchronization will generally be provided by system library programs (see Section 1.8 of Book II). Application programs should call the appropriate system library program before attempting to execute modified instructions.

2.3 Branch Processor Registers

2.3.1 Condition Register

The Condition Register (CR) is a 32-bit register which reflects the result of certain operations, and provides a mechanism for testing (and branching).

    CR
    32                                63

Figure 35. Condition Register

The bits in the Condition Register are grouped into eight 4-bit fields, named CR Field 0 (CR0), ..., CR Field 7 (CR7), which are set in one of the following ways.

• Specified fields of the CR can be set by a move to the CR from a GPR (mtcrf, mtocrf).
• A specified field of the CR can be set by a move to the CR from another CR field (mcrf), from XER_32:35 (mcrxr), or from the FPSCR (mcrfs).
• CR Field 0 can be set as the implicit result of a fixed-point instruction.
• CR Field 1 can be set as the implicit result of a floating-point instruction.
• CR Field 6 can be set as the implicit result of a vector instruction.
• A specified CR field can be set as the result of a Compare instruction.

Instructions are provided to perform logical operations on individual CR bits and to test individual CR bits.

For all fixed-point instructions in which Rc=1, and for addic., andi., and andis., the first three bits of CR Field 0 (bits 32:34 of the Condition Register) are set by signed comparison of the result to zero, and the fourth bit of CR Field 0 (bit 35 of the Condition Register) is copied from the SO field of the XER. "Result" here refers to the entire 64-bit value placed into the target register in 64-bit mode, and to bits 32:63 of the 64-bit value placed into the target register in 32-bit mode.

    if (64-bit mode)
        then M ← 0
        else M ← 32
    if      (target_register)_M:63 < 0 then c ← 0b100
    else if (target_register)_M:63 > 0 then c ← 0b010
    else                                    c ← 0b001
    CR0 ← c || XER_SO

If any portion of the result is undefined, then the value placed into the first three bits of CR Field 0 is undefined.

The bits of CR Field 0 are interpreted as follows.

    Bit   Description
    0     Negative (LT)
          The result is negative.
    1     Positive (GT)
          The result is positive.
    2     Zero (EQ)
          The result is zero.
    3     Summary Overflow (SO)
          This is a copy of the contents of XER_SO at the completion of the instruction.

The stwcx. and stdcx. instructions (see Section 3.4.2, "Load and Reserve and Store Conditional Instructions", in Book II) also set CR Field 0.

For all floating-point instructions in which Rc=1, CR Field 1 (bits 36:39 of the Condition Register) is set to the Floating-Point exception status, copied from bits 0:3 of the Floating-Point Status and Control Register. This occurs regardless of whether any exceptions are enabled, and regardless of whether the writing of the result is suppressed (see Section 4.4, "Floating-Point Exceptions" on page 108). These bits are interpreted as follows.

    Bit   Description
    0     Floating-Point Exception Summary (FX)
          This is a copy of the contents of FPSCR_FX at the completion of the instruction.
    1     Floating-Point Enabled Exception Summary (FEX)
          This is a copy of the contents of FPSCR_FEX at the completion of the instruction.
    2     Floating-Point Invalid Operation Exception Summary (VX)
          This is a copy of the contents of FPSCR_VX at the completion of the instruction.
    3     Floating-Point Overflow Exception (OX)
          This is a copy of the contents of FPSCR_OX at the completion of the instruction.

For Compare instructions, a specified CR field is set to reflect the result of the comparison. The bits of the specified CR field are interpreted as follows. A complete description of how the bits are set is given in the instruction descriptions in Section 3.3.9, "Fixed-Point Compare Instructions" on page 71, Section 4.6.8, "Floating-Point Compare Instructions" on page 138, and Section 7.3.9, "SPE Instruction Set" on page 268.

    Bit   Description
    0     Less Than, Floating-Point Less Than (LT, FL)
          For fixed-point Compare instructions, (RA) < SI or (RB) (signed comparison) or (RA) <u UI or (RB) (unsigned comparison). For floating-point Compare instructions, (FRA) < (FRB).
    1     Greater Than, Floating-Point Greater Than (GT, FG)
          For fixed-point Compare instructions, (RA) > SI or (RB) (signed comparison) or (RA) >u UI or (RB) (unsigned comparison). For floating-point Compare instructions, (FRA) > (FRB).

The sequence of instruction execution can be changed by the Branch instructions. Because all instructions are on word boundaries, bits 62 and 63 of the generated
branch target address are ignored by the processor in 2 Equal, Floating-Point Equal (EQ, FE) performing the branch. For fixed-point Compare instructions, (RA) = The Branch instructions compute the effective address SI, UI, or (RB). For floating-point Compare (EA) of the target in one of the following four ways, as instructions, (FRA) = (FRB). described in Section 1.10.3, "Effective Address Calcu- 3 Summary Overflow, Floating-Point Unor- lation" on page 26. dered (SO,FU) 1. Adding a displacement to the address of the For fixed-point Compare instructions, this is a Branch instruction (Branch or Branch Conditional copy of the contents of XERSO at the comple- with AA=0). tion of the instruction. For floating-point Com- pare instructions, one or both of (FRA) and 2. Specifying an absolute address (Branch or Branch (FRB) is a NaN. Conditional with AA=1). 3. Using the address contained in the Link Register 2.3.2 Link Register (Branch Conditional to Link Register). The Link Register (LR) is a 64-bit register. It can be 4. Using the address contained in the Count Register used to provide the branch target address for the (Branch Conditional to Count Register). Branch Conditional to Link Register instruction, and it In all four cases, in 32-bit mode the final step in the holds the return address after Branch instructions for address computation is setting the high-order 32 bits of which LK=1. the target address to 0. LR For the first two methods, the target addresses can be computed sufficiently ahead of the Branch instruction 0 63 that instructions can be prefetched along the target Figure 36. Link Register path. For the third and fourth methods, prefetching instructions along the target path is also possible pro- vided the Link Register or the Count Register is loaded 2.3.3 Count Register sufficiently ahead of the Branch instruction. The Count Register (CTR) is a 64-bit register. 
It can be Branching can be conditional or unconditional, and the used to hold a loop count that can be decremented dur- return address can optionally be provided. If the return ing execution of Branch instructions that contain an address is to be provided (LK=1), the effective address appropriately coded BO field. If the value in the Count of the instruction following the Branch instruction is Register is 0 before being decremented, it is -1 after- placed into the Link Register after the branch target ward. The Count Register can also be used to provide address has been computed; this is done regardless of the branch target address for the Branch Conditional to whether the branch is taken. Count Register instruction. For Branch Conditional instructions, the BO field speci- CTR fies the conditions under which the branch is taken, as shown in Figure 38. In the figure, M=0 in 64-bit mode 0 63 and M=32 in 32-bit mode. Figure 37. Count Register Chapter 2. Branch Processor 31 Version 2.05 provides a hint about the use of the instruction, as shown in Figure 40. 
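The four ways of forming a branch target address described in Section 2.4 can be sketched in Python as follows. This is an illustrative model only, not part of the architecture: the function names, the `kind` strings, and the representation of registers as plain integers are assumptions made for the example. The field widths (24-bit LI, 14-bit BD) follow the instruction formats.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide two's-complement field to 64 bits."""
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK64


def branch_target(kind, cia=0, field=0, aa=0, lr=0, ctr=0, mode64=True):
    """Return the branch target address (hypothetical helper).

    kind  : 'b' (I-form, 24-bit LI), 'bc' (B-form, 14-bit BD),
            'bclr' or 'bcctr' (XL-form)
    cia   : address of the Branch instruction
    field : raw LI or BD field value
    """
    if kind in ("b", "bc"):
        width = 24 if kind == "b" else 14
        disp = sign_extend(field << 2, width + 2)      # EXTS(field || 0b00)
        target = disp if aa else (cia + disp) & MASK64
    elif kind == "bclr":
        target = lr & ~0b11 & MASK64                   # LR[0:61] || 0b00
    elif kind == "bcctr":
        target = ctr & ~0b11 & MASK64                  # CTR[0:61] || 0b00
    else:
        raise ValueError(f"unknown branch kind: {kind}")
    if not mode64:
        target &= 0xFFFF_FFFF                          # high-order 32 bits set to 0
    return target
```

For example, `branch_target("b", cia=0x1000, field=0xFFFFFF)` models a relative branch with a displacement of -4 and yields 0xFFC.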
BO     Description
0000z  Decrement the CTR, then branch if the decremented CTR[M:63] ≠ 0 and CR[BI+32] = 0
0001z  Decrement the CTR, then branch if the decremented CTR[M:63] = 0 and CR[BI+32] = 0
001at  Branch if CR[BI+32] = 0
0100z  Decrement the CTR, then branch if the decremented CTR[M:63] ≠ 0 and CR[BI+32] = 1
0101z  Decrement the CTR, then branch if the decremented CTR[M:63] = 0 and CR[BI+32] = 1
011at  Branch if CR[BI+32] = 1
1a00t  Decrement the CTR, then branch if the decremented CTR[M:63] ≠ 0
1a01t  Decrement the CTR, then branch if the decremented CTR[M:63] = 0
1z1zz  Branch always

Notes:
1. "z" denotes a bit that is ignored.
2. The "a" and "t" bits are used as described below.

Figure 38. BO field encodings

The "a" and "t" bits of the BO field can be used by software to provide a hint about whether the branch is likely to be taken or is likely not to be taken, as shown in Figure 39.

at   Hint
00   No hint is given
01   Reserved
10   The branch is very likely not to be taken
11   The branch is very likely to be taken

Figure 39. "at" bit encodings

Programming Note
Many implementations have dynamic mechanisms for predicting whether a branch will be taken. Because the dynamic prediction is likely to be very accurate, and is likely to be overridden by any hint provided by the "at" bits, the "at" bits should be set to 0b00 unless the static prediction implied by at=0b10 or at=0b11 is highly likely to be correct.

For Branch Conditional to Link Register and Branch Conditional to Count Register instructions, the BH field provides a hint about the use of the instruction, as shown in Figure 40.

BH   Hint
00   bclr[l]:  The instruction is a subroutine return
     bcctr[l]: The instruction is not a subroutine return; the target address is likely to be the same as the target address used the preceding time the branch was taken
01   bclr[l]:  The instruction is not a subroutine return; the target address is likely to be the same as the target address used the preceding time the branch was taken
     bcctr[l]: Reserved
10   Reserved
11   bclr[l] and bcctr[l]: The target address is not predictable

Figure 40. BH field encodings

Programming Note
The hint provided by the BH field is independent of the hint provided by the "at" bits (e.g., the BH field provides no indication of whether the branch is likely to be taken).

Extended mnemonics for branches

Many extended mnemonics are provided so that Branch Conditional instructions can be coded with portions of the BO and BI fields as part of the mnemonic rather than as part of a numeric operand. Some of these are shown as examples with the Branch instructions. See Appendix D for additional extended mnemonics.

Programming Note
The hints provided by the "at" bits and by the BH field do not affect the results of executing the instruction.
The "z" bits should be set to 0, because they may be assigned a meaning in some future version of the architecture.

Programming Note
Many implementations have dynamic mechanisms for predicting the target addresses of bclr[l] and bcctr[l] instructions. These mechanisms may cache return addresses (i.e., Link Register values set by Branch instructions for which LK=1 and for which the branch was taken) and recently used branch target addresses. To obtain the best performance across the widest range of implementations, the programmer should obey the following rules.

• Use Branch instructions for which LK=1 only as subroutine calls (including function calls, etc.).
• Pair each subroutine call (i.e., each Branch instruction for which LK=1 and the branch is taken) with a bclr instruction that returns from the subroutine and has BH=0b00.
• Do not use bclrl as a subroutine call. (Some implementations access the return address cache at most once per instruction; such implementations are likely to treat bclrl as a subroutine return, and not as a subroutine call.)
• For bclr[l] and bcctr[l], use the appropriate value in the BH field.

The following are examples of programming conventions that obey these rules. In the examples, BH is assumed to contain 0b00 unless otherwise stated. In addition, the "at" bits are assumed to be coded appropriately.

Let A, B, and Glue be specific programs.

• Loop counts:
  Keep them in the Count Register, and use a bc instruction (LK=0) to decrement the count and to branch back to the beginning of the loop if the decremented count is nonzero.
• Computed goto's, case statements, etc.:
  Use the Count Register to hold the address to branch to, and use a bcctr instruction (LK=0, and BH=0b11 if appropriate) to branch to the selected address.
• Direct subroutine linkage:
  Here A calls B and B returns to A. The two branches should be as follows.
  - A calls B: use a bl or bcl instruction (LK=1).
  - B returns to A: use a bclr instruction (LK=0) (the return address is in, or can be restored to, the Link Register).
• Indirect subroutine linkage:
  Here A calls Glue, Glue calls B, and B returns to A rather than to Glue. (Such a calling sequence is common in linkage code used when the subroutine that the programmer wants to call, here B, is in a different module from the caller; the Binder inserts "glue" code to mediate the branch.) The three branches should be as follows.
  - A calls Glue: use a bl or bcl instruction (LK=1).
  - Glue calls B: place the address of B into the Count Register, and use a bcctr instruction (LK=0).
  - B returns to A: use a bclr instruction (LK=0) (the return address is in, or can be restored to, the Link Register).
• Function call:
  Here A calls a function, the identity of which may vary from one instance of the call to another, instead of calling a specific program B. This case should be handled using the conventions of the preceding two bullets, depending on whether the call is direct or indirect, with the following differences.
  - If the call is direct, place the address of the function into the Count Register, and use a bcctrl instruction (LK=1) instead of a bl or bcl instruction.
  - For the bcctr[l] instruction that branches to the function, use BH=0b11 if appropriate.

Compatibility Note
The bits corresponding to the current "a" and "t" bits, and to the current "z" bits except in the "branch always" BO encoding, had different meanings in versions of the architecture that precede Version 2.00.

• The bit corresponding to the "t" bit was called the "y" bit. The "y" bit indicated whether to use the architected default prediction (y=0) or to use the complement of the default prediction (y=1). The default prediction was defined as follows.
  - If the instruction is bc[l][a] with a negative value in the displacement field, the branch is taken. (This is the only case in which the prediction corresponding to the "y" bit differs from the prediction corresponding to the "t" bit.)
  - In all other cases (bc[l][a] with a nonnegative value in the displacement field, bclr[l], or bcctr[l]), the branch is not taken.
• The BO encodings that test both the Count Register and the Condition Register had a "y" bit in place of the current "z" bit. The meaning of the "y" bit was as described in the preceding item.
• The "a" bit was a "z" bit.

Because these bits have always been defined either to be ignored or to be treated as hints, a given program will produce the same result on any implementation regardless of the values of the bits. Also, because even the "y" bit is ignored, in practice, by most processors that comply with versions of the architecture that precede Version 2.00, the performance of a given program on those processors will not be affected by the values of the bits.
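The BO encodings of Figure 38 and the "at" hints of Figure 39 can be exercised with a short Python sketch. It is a model under stated assumptions rather than a definitive implementation: the helper names are invented for this example, BO is passed as a 5-bit integer with BO0 as the most significant bit, and `cr_bit` is the already-selected Condition Register bit (0 or 1).

```python
MASK64 = (1 << 64) - 1

AT_HINTS = {
    0b00: "no hint",
    0b01: "reserved",
    0b10: "very likely not taken",
    0b11: "very likely taken",
}


def bo_bits(bo):
    """Split a 5-bit BO value into [BO0, BO1, BO2, BO3, BO4]."""
    return [(bo >> (4 - i)) & 1 for i in range(5)]


def resolve_branch(bo, cr_bit, ctr, mode64=True):
    """Return (taken, new_ctr) for a Branch Conditional (hypothetical helper)."""
    b0, b1, b2, b3, _ = bo_bits(bo)
    if not b2:                                   # BO2=0: decrement the CTR
        ctr = (ctr - 1) & MASK64
    tested = ctr if mode64 else ctr & 0xFFFF_FFFF    # CTR[M:63]
    ctr_ok = bool(b2) or ((tested != 0) != bool(b3))
    cond_ok = bool(b0) or (cr_bit == b1)
    return ctr_ok and cond_ok, ctr


def at_hint(bo):
    """Return the Figure 39 hint, or None for encodings without "a"/"t" bits."""
    b0, b1, b2, b3, b4 = bo_bits(bo)
    if not b0 and b2:                            # 001at and 011at
        return AT_HINTS[(b3 << 1) | b4]
    if b0 and not b2:                            # 1a00t and 1a01t
        return AT_HINTS[(b1 << 1) | b4]
    return None
```

With BO=16 (0b10000, the encoding behind bdnz), `resolve_branch(16, 0, 5)` decrements the count to 4 and reports the branch as taken. Starting from a count of 0, the decremented value wraps to all ones, which matches the statement that a Count Register value of 0 becomes -1.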
Branch I-form

b     target_addr    (AA=0 LK=0)
ba    target_addr    (AA=1 LK=0)
bl    target_addr    (AA=0 LK=1)
bla   target_addr    (AA=1 LK=1)

    18 | LI | AA | LK
    0    6    30   31

    if AA then NIA ←iea EXTS(LI || 0b00)
    else NIA ←iea CIA + EXTS(LI || 0b00)
    if LK then LR ←iea CIA + 4

target_addr specifies the branch target address.

If AA=0 then the branch target address is the sum of LI || 0b00 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If AA=1 then the branch target address is the value LI || 0b00 sign-extended, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
    LR (if LK=1)

Branch Conditional B-form

bc    BO,BI,target_addr    (AA=0 LK=0)
bca   BO,BI,target_addr    (AA=1 LK=0)
bcl   BO,BI,target_addr    (AA=0 LK=1)
bcla  BO,BI,target_addr    (AA=1 LK=1)

    16 | BO | BI | BD | AA | LK
    0    6    11   16   30   31

    if (64-bit mode)
        then M ← 0
        else M ← 32
    if ¬BO[2] then CTR ← CTR - 1
    ctr_ok ← BO[2] | ((CTR[M:63] ≠ 0) ⊕ BO[3])
    cond_ok ← BO[0] | (CR[BI+32] ≡ BO[1])
    if ctr_ok & cond_ok then
        if AA then NIA ←iea EXTS(BD || 0b00)
        else NIA ←iea CIA + EXTS(BD || 0b00)
    if LK then LR ←iea CIA + 4

BI+32 specifies the Condition Register bit to be tested. The BO field is used to resolve the branch as described in Figure 38. target_addr specifies the branch target address.

If AA=0 then the branch target address is the sum of BD || 0b00 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If AA=1 then the branch target address is the value BD || 0b00 sign-extended, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.
Special Registers Altered:
    CTR (if BO[2]=0)
    LR (if LK=1)

Extended Mnemonics:

Examples of extended mnemonics for Branch Conditional:

Extended:           Equivalent to:
blt   target        bc 12,0,target
bne   cr2,target    bc 4,10,target
bdnz  target        bc 16,0,target

Branch Conditional to Link Register XL-form

bclr   BO,BI,BH    (LK=0)
bclrl  BO,BI,BH    (LK=1)

    19 | BO | BI | /// | BH | 16 | LK
    0    6    11   16    19   21   31

    if (64-bit mode)
        then M ← 0
        else M ← 32
    if ¬BO[2] then CTR ← CTR - 1
    ctr_ok ← BO[2] | ((CTR[M:63] ≠ 0) ⊕ BO[3])
    cond_ok ← BO[0] | (CR[BI+32] ≡ BO[1])
    if ctr_ok & cond_ok then NIA ←iea LR[0:61] || 0b00
    if LK then LR ←iea CIA + 4

BI+32 specifies the Condition Register bit to be tested. The BO field is used to resolve the branch as described in Figure 38. The BH field is used as described in Figure 40. The branch target address is LR[0:61] || 0b00, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
    CTR (if BO[2]=0)
    LR (if LK=1)

Extended Mnemonics:

Examples of extended mnemonics for Branch Conditional to Link Register:

Extended:      Equivalent to:
bclr 4,6       bclr 4,6,0
bltlr          bclr 12,0,0
bnelr cr2      bclr 4,10,0
bdnzlr         bclr 16,0,0

Branch Conditional to Count Register XL-form

bcctr   BO,BI,BH    (LK=0)
bcctrl  BO,BI,BH    (LK=1)

    19 | BO | BI | /// | BH | 528 | LK
    0    6    11   16    19   21    31

    cond_ok ← BO[0] | (CR[BI+32] ≡ BO[1])
    if cond_ok then NIA ←iea CTR[0:61] || 0b00
    if LK then LR ←iea CIA + 4

BI+32 specifies the Condition Register bit to be tested. The BO field is used to resolve the branch as described in Figure 38. The BH field is used as described in Figure 40. The branch target address is CTR[0:61] || 0b00, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

If the "decrement and test CTR" option is specified (BO[2]=0), the instruction form is invalid.

Special Registers Altered:
    LR (if LK=1)

Extended Mnemonics:

Examples of extended mnemonics for Branch Conditional to Count Register:

Extended:      Equivalent to:
bcctr 4,6      bcctr 4,6,0
bltctr         bcctr 12,0,0
bnectr cr2     bcctr 4,10,0

Programming Note
bclr, bclrl, bcctr, and bcctrl each serve as both a basic and an extended mnemonic. The Assembler will recognize a bclr, bclrl, bcctr, or bcctrl mnemonic with three operands as the basic form, and a bclr, bclrl, bcctr, or bcctrl mnemonic with two operands as the extended form. In the extended form the BH operand is omitted and assumed to be 0b00.

2.5 Condition Register Instructions

2.5.1 Condition Register Logical Instructions

The Condition Register Logical instructions have preferred forms; see Section 1.8.1. In the preferred forms, the BT and BB fields satisfy the following rule.

• The bit specified by BT is in the same Condition Register field as the bit specified by BB.

Extended mnemonics for Condition Register logical operations

A set of extended mnemonics is provided that allow additional Condition Register logical operations, beyond those provided by the basic Condition Register Logical instructions, to be coded easily. Some of these are shown as examples with the Condition Register Logical instructions. See Appendix D for additional extended mnemonics.
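Because the Condition Register is divided into eight 4-bit fields, a 5-bit operand such as BI, BT, BA, or BB can be read as 4 × field + position within the field. The Python sketch below illustrates that numbering and the preferred-form rule stated above. The function names and the lowercase bit names are assumptions made for this example only.

```python
# Position of each bit within a 4-bit CR field (LT, GT, EQ, SO order).
BIT_IN_FIELD = {"lt": 0, "gt": 1, "eq": 2, "so": 3}


def cr_bit_operand(field, name):
    """Return the 5-bit operand value that names bit `name` of CR field `field`."""
    if not 0 <= field <= 7:
        raise ValueError("CR field must be in 0..7")
    return 4 * field + BIT_IN_FIELD[name]


def cr_field_of(operand):
    """Return the CR field (0..7) containing the bit named by a 5-bit operand."""
    return operand // 4


def is_preferred_form(bt, bb):
    """Preferred form: the BT and BB bits lie in the same CR field."""
    return cr_field_of(bt) == cr_field_of(bb)
```

This agrees with the extended mnemonic examples: `bne cr2,target` is equivalent to `bc 4,10,target`, and `cr_bit_operand(2, "eq")` evaluates to 10.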
Condition Register AND XL-form

crand BT,BA,BB

    19 | BT | BA | BB | 257 | /
    0    6    11   16   21    31

    CR[BT+32] ← CR[BA+32] & CR[BB+32]

The bit in the Condition Register specified by BA+32 is ANDed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR[BT+32]

Condition Register NAND XL-form

crnand BT,BA,BB

    19 | BT | BA | BB | 225 | /
    0    6    11   16   21    31

    CR[BT+32] ← ¬(CR[BA+32] & CR[BB+32])

The bit in the Condition Register specified by BA+32 is ANDed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR[BT+32]

Condition Register OR XL-form

cror BT,BA,BB

    19 | BT | BA | BB | 449 | /
    0    6    11   16   21    31

    CR[BT+32] ← CR[BA+32] | CR[BB+32]

The bit in the Condition Register specified by BA+32 is ORed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR[BT+32]

Extended Mnemonics:

Example of extended mnemonics for Condition Register OR:

Extended:        Equivalent to:
crmove Bx,By     cror Bx,By,By

Condition Register XOR XL-form

crxor BT,BA,BB

    19 | BT | BA | BB | 193 | /
    0    6    11   16   21    31

    CR[BT+32] ← CR[BA+32] ⊕ CR[BB+32]

The bit in the Condition Register specified by BA+32 is XORed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR[BT+32]

Extended Mnemonics:

Example of extended mnemonics for Condition Register XOR:

Extended:        Equivalent to:
crclr Bx         crxor Bx,Bx,Bx
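The four Condition Register Logical instructions described so far, together with the crmove and crclr extended mnemonics, can be modeled on a 32-bit integer. This is a sketch under stated assumptions: the CR is held as a Python integer whose most significant bit is CR bit 32, operands BT, BA, and BB are in the range 0..31, and all function names are invented for the example.

```python
def get_cr_bit(cr, n):
    """Return CR bit n+32, where operand n=0 is the most significant bit."""
    return (cr >> (31 - n)) & 1


def set_cr_bit(cr, n, value):
    """Return a copy of cr with CR bit n+32 set to value (0 or 1)."""
    mask = 1 << (31 - n)
    return (cr | mask) if value else (cr & ~mask)


CR_OPS = {
    "crand":  lambda a, b: a & b,
    "crnand": lambda a, b: 1 - (a & b),
    "cror":   lambda a, b: a | b,
    "crxor":  lambda a, b: a ^ b,
}


def cr_logical(op, cr, bt, ba, bb):
    """Apply one CR Logical instruction and return the new CR value."""
    result = CR_OPS[op](get_cr_bit(cr, ba), get_cr_bit(cr, bb))
    return set_cr_bit(cr, bt, result)


def crmove(cr, bx, by):
    """Extended mnemonic: crmove Bx,By is cror Bx,By,By."""
    return cr_logical("cror", cr, bx, by, by)


def crclr(cr, bx):
    """Extended mnemonic: crclr Bx is crxor Bx,Bx,Bx."""
    return cr_logical("crxor", cr, bx, bx, bx)
```

The extended mnemonics work because ORing a bit with itself copies it and XORing a bit with itself always gives 0, so no dedicated move or clear instruction is needed.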
Condition Register NOR XL-form

crnor BT,BA,BB

    19 | BT | BA | BB | 33 | /
    0    6    11   16   21   31

    CR[BT+32] ← ¬(CR[BA+32] | CR[BB+32])

The bit in the Condition Register specified by BA+32 is ORed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR[BT+32]

Extended Mnemonics:

Example of extended mnemonics for Condition Register NOR:

Extended:        Equivalent to:
crnot Bx,By      crnor Bx,By,By

Condition Register Equivalent XL-form

creqv BT,BA,BB

    19 | BT | BA | BB | 289 | /
    0    6    11   16   21    31

    CR[BT+32] ← CR[BA+32] ≡ CR[BB+32]

The bit in the Condition Register specified by BA+32 is XORed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR[BT+32]

Extended Mnemonics:

Example of extended mnemonics for Condition Register Equivalent:

Extended:        Equivalent to:
crset Bx         creqv Bx,Bx,Bx

Condition Register AND with Complement XL-form

crandc BT,BA,BB

    19 | BT | BA | BB | 129 | /
    0    6    11   16   21    31

    CR[BT+32] ← CR[BA+32] & ¬CR[BB+32]

The bit in the Condition Register specified by BA+32 is ANDed with the complement of the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR[BT+32]

Condition Register OR with Complement XL-form

crorc BT,BA,BB

    19 | BT | BA | BB | 417 | /
    0    6    11   16   21    31

    CR[BT+32] ← CR[BA+32] | ¬CR[BB+32]

The bit in the Condition Register specified by BA+32 is ORed with the complement of the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR[BT+32]

2.5.2 Condition Register Field Instruction

Move Condition Register Field XL-form

mcrf BF,BFA

    19 | BF | // | BFA | // | /// | 0  | /
    0    6    9    11    14   16    21   31

    CR[4×BF+32:4×BF+35] ← CR[4×BFA+32:4×BFA+35]

The contents of Condition Register field BFA are copied to Condition Register field BF.

Special Registers Altered:
    CR field BF

2.6 System Call Instruction

This instruction provides the means by which a program can call upon the system to perform a service.

System Call SC-form

sc LEV

    17 | /// | /// | // | LEV | // | 1  | /
    0    6     11    16   20    27   30   31

This instruction calls the system to perform a service. A complete description of this instruction can be found in Book III.

The use of the LEV field is described in Book III. The LEV values greater than 1 are reserved, and bits 0:5 of the LEV field (instruction bits 20:25) are treated as a reserved field.

When control is returned to the program that executed the System Call instruction, the contents of the registers will depend on the register conventions used by the program providing the system service.

This instruction is context synchronizing (see Book III).

Special Registers Altered:
    Dependent on the system service

Programming Note
sc serves as both a basic and an extended mnemonic. The Assembler will recognize an sc mnemonic with one operand as the basic form, and an sc mnemonic with no operand as the extended form. In the extended form the LEV operand is omitted and assumed to be 0.
In application programs the value of the LEV operand for sc should be 0.

Chapter 3. Fixed-Point Processor
3.1 Fixed-Point Processor Overview
3.2 Fixed-Point Processor Registers
    3.2.1 General Purpose Registers
    3.2.2 Fixed-Point Exception Register
    3.2.3 Program Priority Register [Category: Server]
    3.2.4 Software Use SPRs [Category: Embedded]
    3.2.5 Device Control Registers [Category: Embedded]
3.3 Fixed-Point Processor Instructions
    3.3.1 Fixed-Point Storage Access Instructions
        3.3.1.1 Storage Access Exceptions
    3.3.2 Fixed-Point Load Instructions
        3.3.2.1 64-bit Fixed-Point Load Instructions [Category: 64-Bit]
    3.3.3 Fixed-Point Store Instructions
        3.3.3.1 64-bit Fixed-Point Store Instructions [Category: 64-Bit]
    3.3.4 Fixed-Point Load and Store with Byte Reversal Instructions
    3.3.5 Fixed-Point Load and Store Multiple Instructions
    3.3.6 Fixed-Point Move Assist Instructions [Category: Move Assist]
    3.3.7 Other Fixed-Point Instructions
    3.3.8 Fixed-Point Arithmetic Instructions
        3.3.8.1 64-bit Fixed-Point Arithmetic Instructions [Category: 64-Bit]
    3.3.9 Fixed-Point Compare Instructions
    3.3.10 Fixed-Point Trap Instructions
        3.3.10.1 64-bit Fixed-Point Trap Instructions [Category: 64-Bit]
    3.3.11 Fixed-Point Select [Category: Phased-In (sV2.06)]
    3.3.12 Fixed-Point Logical Instructions
        3.3.12.1 64-bit Fixed-Point Logical Instructions [Category: 64-Bit]
        3.3.12.2 Phased-In Fixed-Point Logical Instructions [Category: Phased-In (sV2.05)]
    3.3.13 Fixed-Point Rotate and Shift Instructions
        3.3.13.1 Fixed-Point Rotate Instructions
            3.3.13.1.1 64-bit Fixed-Point Rotate Instructions [Category: 64-Bit]
        3.3.13.2 Fixed-Point Shift Instructions
            3.3.13.2.1 64-bit Fixed-Point Shift Instructions [Category: 64-Bit]
    3.3.14 Move To/From System Register Instructions
        3.3.14.1 Move To/From One Condition Register Field Instructions [Category: Phased-In (sV2.05)]
        3.3.14.2 Move To/From System Registers [Category: Embedded]
56 Register Field Instructions [Category: 3.3.6 Fixed-Point Move Assist Instructions Phased-In (sV2.05)]. . . . . . . . . . . . . . . . 96 [Category: Move Assist] . . . . . . . . . . . . 58 3.3.14.2 Move To/From System Registers 3.3.7 Other Fixed-Point Instructions . . . 61 [Category: Embedded]. . . . . . . . . . . . . . 97 3.3.8 Fixed-Point Arithmetic Instructions62 3.3.8.1 64-bit Fixed-Point Arithmetic Instructions [Category: 64-Bit]. . . . . . . . 69 3.1 Fixed-Point Processor Overview This chapter describes the registers and instructions that make up the Fixed-Point Processor facility. Chapter 3. Fixed-Point Processor 41 Version 2.05 3.2 Fixed-Point Processor Registers 3.2.1 General Purpose Registers causes SO to be set to 0 and OV to be set to 1. All manipulation of information is done in registers 33 Overflow (OV) internal to the Fixed-Point Processor. The principal The Overflow bit is set to indicate that an over- storage internal to the Fixed-Point Processor is a set of flow has occurred during execution of an 32 General Purpose Registers (GPRs). See Figure 41. instruction. XO-form Add, Subtract From, and Negate GPR 0 instructions having OE=1 set it to 1 if the carry GPR 1 out of bit M is not equal to the carry out of bit ... M+1, and set it to 0 otherwise. XO-form Multiply Low and Divide instructions ... having OE=1 set it to 1 if the result cannot be GPR 30 represented in 64 bits (mulld, divd, divdu) or in 32 bits (mullw, divw, divwu), and set it to 0 GPR 31 otherwise. The OV bit is not altered by Com- 0 63 pare instructions, nor by other instructions (except mtspr to the XER, and mcrxr) that Figure 41. General Purpose Registers cannot overflow. Each GPR is a 64-bit register. [Category: Legacy Integer Multiply-Accumulate] 3.2.2 Fixed-Point Exception Reg- XO-form Legacy Integer Multiply-Accumulate instructions set OV when OE=1 to reflect over- ister flow of the 32-bit result. 
For signed-integer accumulation, overflow occurs when the add produces a carry out of bit 32 that is not equal to the carry out of bit 33. For unsigned-integer accumulation, overflow occurs when the add produces a carry out of bit 32.

The Fixed-Point Exception Register (XER) is a 64-bit register.

    | XER |
    0 63

Figure 42. Fixed-Point Exception Register

The bit definitions for the Fixed-Point Exception Register are shown below. Here M=0 in 64-bit mode and M=32 in 32-bit mode.

The bits are set based on the operation of an instruction considered as a whole, not on intermediate results (e.g., the Subtract From Carrying instruction, the result of which is specified as the sum of three values, sets bits in the Fixed-Point Exception Register based on the entire operation, not on an intermediate sum).

Bit(s)  Description

0:31    Reserved

32      Summary Overflow (SO)
        The Summary Overflow bit is set to 1 whenever an instruction (except mtspr) sets the Overflow bit. Once set, the SO bit remains set until it is cleared by an mtspr instruction (specifying the XER) or an mcrxr instruction. It is not altered by Compare instructions, nor by other instructions (except mtspr to the XER, and mcrxr) that cannot overflow. Executing an mtspr instruction to the XER, supplying the values 0 for SO and 1 for OV, causes SO to be set to 0 and OV to be set to 1.

34      Carry (CA)
        The Carry bit is set as follows, during execution of certain instructions. Add Carrying, Subtract From Carrying, Add Extended, and Subtract From Extended types of instructions set it to 1 if there is a carry out of bit M, and set it to 0 otherwise. Shift Right Algebraic instructions set it to 1 if any 1-bits have been shifted out of a negative operand, and set it to 0 otherwise. The CA bit is not altered by Compare instructions, nor by other instructions (except Shift Right Algebraic, mtspr to the XER, and mcrxr) that cannot carry.

35:56   Reserved

57:63   This field specifies the number of bytes to be transferred by a Load String Indexed or Store String Indexed instruction.

        [Category: Legacy Move Assist]
        This field is used as a target by dmlzb to indicate the byte location of the leftmost zero byte found.

3.2.3 Program Priority Register [Category: Server]

The Program Priority Register (PPR) is a 64-bit register that controls the program's priority. The layout of the PPR is shown in Figure 43.

    | /// | PRI | /// | ??? |
    0 11 14 44 63

Figure 43. Program Priority Register

Bit(s)  Description

11:13   Program Priority (PRI)
        010  low
        011  medium low
        100  medium (normal)

44:63   implementation-specific (read-only; values written to this field by software are ignored)

All other fields are reserved.

Programming Note
By setting the PRI field, a programmer may be able to improve system throughput by causing system resources to be used more efficiently.
E.g., if a program is waiting on a lock (see Section B.2 of Book II), it could set low priority, with the result that more processor resources would be diverted to the program that holds the lock. This diversion of resources may enable the lock-holding program to complete the operation under the lock more quickly, and then relinquish the lock to the waiting program.

Programming Note
or Rx,Rx,Rx can be used to modify the PRI field; see Section 3.3.14.

Programming Note
When the system error handler is invoked, the PRI field may be set to an undefined value.

3.2.4 Software Use SPRs [Category: Embedded]

Software Use SPRs are 64-bit registers that have no defined functionality. SPRG4-7 can be read by application programs. Additional Software Use SPRs are defined in Book III.

    | SPRG4 |
    | SPRG5 |
    | SPRG6 |
    | SPRG7 |
    0 63

Figure 44. Software-use SPRs

The VRSAVE is a 32-bit register that also can be used as a software use SPR. VRSAVE is also defined as part of Category: Embedded and Vector (see Section 6.3.3).

Programming Note
USPRG0 was made a 32-bit register and renamed to VRSAVE; see Section 6.3.3.

3.2.5 Device Control Registers [Category: Embedded]

Device Control Registers (DCRs) are on-chip registers that exist architecturally outside the processor and thus are not actually part of the processor architecture. This specification simply defines the existence of a Device Control Register "address space" and the instructions to access them and does not define the Device Control Registers themselves.

Device Control Registers may control the use of on-chip peripherals, such as memory controllers (the definition of specific Device Control Registers is implementation-dependent).

The contents of user-mode-accessible Device Control Registers can be read using mfdcrux and written using mtdcrux.

3.3 Fixed-Point Processor Instructions

3.3.1 Fixed-Point Storage Access Instructions

The Storage Access instructions compute the effective address (EA) of the storage to be accessed as described in Section 1.10.3.

Programming Note
The la extended mnemonic permits computing an effective address as a Load or Store instruction would, but loads the address itself into a GPR rather than loading the value that is in storage at that address.

Programming Note
The DS field in DS-form Storage Access instructions is a word offset, not a byte offset like the D field in D-form Storage Access instructions. However, for programming convenience, Assemblers should support the specification of byte offsets for both forms of instruction.

3.3.1.1 Storage Access Exceptions

Storage accesses will cause the system data storage error handler to be invoked if the program is not allowed to modify the target storage (Store only), or if the program attempts to access storage that is unavailable.

3.3.2 Fixed-Point Load Instructions

The byte, halfword, word, or doubleword in storage addressed by EA is loaded into register RT.
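The Programming Note in Section 3.3.1 about the DS field can be illustrated with a small sketch. The helper names `encode_ds` and `decode_ds` are hypothetical and are not part of the architecture or of any assembler; the sketch only assumes that the DS field is 14 bits wide (instruction bits 16:29) and that the displacement is EXTS(DS || 0b00), as in the DS-form definitions.

```python
def encode_ds(byte_offset: int) -> int:
    """Return the 14-bit DS field an assembler might emit for a byte offset.

    Hypothetical helper: the byte offset must be a multiple of 4 because the
    two low-order bits of the displacement are always 0b00.
    """
    if byte_offset % 4 != 0:
        raise ValueError("DS-form byte offset must be a multiple of 4")
    if not -32768 <= byte_offset <= 32764:
        raise ValueError("offset does not fit in a 14-bit DS field")
    return (byte_offset >> 2) & 0x3FFF


def decode_ds(ds: int) -> int:
    """Return EXTS(DS || 0b00) as a signed byte displacement."""
    value = (ds & 0x3FFF) << 2          # DS || 0b00, a 16-bit quantity
    if value & 0x8000:                  # sign bit of the 16-bit quantity
        value -= 0x10000
    return value
```

Under these assumptions a source operand such as `8(RA)` would be encoded with DS = 2, and a negative byte offset of -4 would be encoded as DS = 0x3FFF.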
Many of the Load instructions have an "update" form, in which register RA is updated with the effective address. For these forms, if RA≠0 and RA≠RT, the effective address is placed into register RA and the storage element (byte, halfword, word, or doubleword) addressed by EA is loaded into RT.

Programming Note
In some implementations, the Load Algebraic and Load with Update instructions may have greater latency than other types of Load instructions. Moreover, Load with Update instructions may take longer to execute in some implementations than the corresponding pair of a non-update Load instruction and an Add instruction.

Load Byte and Zero D-form

    lbz RT,D(RA)

    | 34 | RT | RA | D |
    0 6 11 16 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    RT ← ⁵⁶0 || MEM(EA, 1)

Let the effective address (EA) be the sum (RA|0) + D. The byte in storage addressed by EA is loaded into RT₅₆:₆₃. RT₀:₅₅ are set to 0.

Special Registers Altered:
    None

Load Byte and Zero Indexed X-form

    lbzx RT,RA,RB

    | 31 | RT | RA | RB | 87 | / |
    0 6 11 16 21 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← ⁵⁶0 || MEM(EA, 1)

Let the effective address (EA) be the sum (RA|0) + (RB). The byte in storage addressed by EA is loaded into RT₅₆:₆₃. RT₀:₅₅ are set to 0.

Special Registers Altered:
    None

Load Byte and Zero with Update D-form

    lbzu RT,D(RA)

    | 35 | RT | RA | D |
    0 6 11 16 31

    EA ← (RA) + EXTS(D)
    RT ← ⁵⁶0 || MEM(EA, 1)
    RA ← EA

Let the effective address (EA) be the sum (RA) + D. The byte in storage addressed by EA is loaded into RT₅₆:₆₃. RT₀:₅₅ are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Byte and Zero with Update Indexed X-form

    lbzux RT,RA,RB

    | 31 | RT | RA | RB | 119 | / |
    0 6 11 16 21 31

    EA ← (RA) + (RB)
    RT ← ⁵⁶0 || MEM(EA, 1)
    RA ← EA

Let the effective address (EA) be the sum (RA) + (RB). The byte in storage addressed by EA is loaded into RT₅₆:₆₃. RT₀:₅₅ are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Halfword and Zero D-form

    lhz RT,D(RA)

    | 40 | RT | RA | D |
    0 6 11 16 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    RT ← ⁴⁸0 || MEM(EA, 2)

Let the effective address (EA) be the sum (RA|0) + D. The halfword in storage addressed by EA is loaded into RT₄₈:₆₃. RT₀:₄₇ are set to 0.

Special Registers Altered:
    None

Load Halfword and Zero Indexed X-form

    lhzx RT,RA,RB

    | 31 | RT | RA | RB | 279 | / |
    0 6 11 16 21 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← ⁴⁸0 || MEM(EA, 2)

Let the effective address (EA) be the sum (RA|0) + (RB). The halfword in storage addressed by EA is loaded into RT₄₈:₆₃. RT₀:₄₇ are set to 0.

Special Registers Altered:
    None

Load Halfword and Zero with Update D-form

    lhzu RT,D(RA)

    | 41 | RT | RA | D |
    0 6 11 16 31

    EA ← (RA) + EXTS(D)
    RT ← ⁴⁸0 || MEM(EA, 2)
    RA ← EA

Let the effective address (EA) be the sum (RA) + D. The halfword in storage addressed by EA is loaded into RT₄₈:₆₃. RT₀:₄₇ are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Halfword and Zero with Update Indexed X-form

    lhzux RT,RA,RB

    | 31 | RT | RA | RB | 311 | / |
    0 6 11 16 21 31

    EA ← (RA) + (RB)
    RT ← ⁴⁸0 || MEM(EA, 2)
    RA ← EA

Let the effective address (EA) be the sum (RA) + (RB). The halfword in storage addressed by EA is loaded into RT₄₈:₆₃. RT₀:₄₇ are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Halfword Algebraic D-form

    lha RT,D(RA)

    | 42 | RT | RA | D |
    0 6 11 16 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    RT ← EXTS(MEM(EA, 2))

Let the effective address (EA) be the sum (RA|0) + D. The halfword in storage addressed by EA is loaded into RT₄₈:₆₃. RT₀:₄₇ are filled with a copy of bit 0 of the loaded halfword.

Special Registers Altered:
    None

Load Halfword Algebraic Indexed X-form

    lhax RT,RA,RB

    | 31 | RT | RA | RB | 343 | / |
    0 6 11 16 21 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← EXTS(MEM(EA, 2))

Let the effective address (EA) be the sum (RA|0) + (RB). The halfword in storage addressed by EA is loaded into RT₄₈:₆₃. RT₀:₄₇ are filled with a copy of bit 0 of the loaded halfword.

Special Registers Altered:
    None

Load Halfword Algebraic with Update D-form

    lhau RT,D(RA)

    | 43 | RT | RA | D |
    0 6 11 16 31

    EA ← (RA) + EXTS(D)
    RT ← EXTS(MEM(EA, 2))
    RA ← EA

Let the effective address (EA) be the sum (RA) + D. The halfword in storage addressed by EA is loaded into RT₄₈:₆₃. RT₀:₄₇ are filled with a copy of bit 0 of the loaded halfword.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Halfword Algebraic with Update Indexed X-form

    lhaux RT,RA,RB

    | 31 | RT | RA | RB | 375 | / |
    0 6 11 16 21 31

    EA ← (RA) + (RB)
    RT ← EXTS(MEM(EA, 2))
    RA ← EA

Let the effective address (EA) be the sum (RA) + (RB). The halfword in storage addressed by EA is loaded into RT₄₈:₆₃. RT₀:₄₇ are filled with a copy of bit 0 of the loaded halfword.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Word and Zero D-form

    lwz RT,D(RA)

    | 32 | RT | RA | D |
    0 6 11 16 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    RT ← ³²0 || MEM(EA, 4)

Let the effective address (EA) be the sum (RA|0) + D. The word in storage addressed by EA is loaded into RT₃₂:₆₃. RT₀:₃₁ are set to 0.

Special Registers Altered:
    None

Load Word and Zero Indexed X-form

    lwzx RT,RA,RB

    | 31 | RT | RA | RB | 23 | / |
    0 6 11 16 21 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← ³²0 || MEM(EA, 4)

Let the effective address (EA) be the sum (RA|0) + (RB). The word in storage addressed by EA is loaded into RT₃₂:₆₃. RT₀:₃₁ are set to 0.

Special Registers Altered:
    None

Load Word and Zero with Update D-form

    lwzu RT,D(RA)

    | 33 | RT | RA | D |
    0 6 11 16 31

    EA ← (RA) + EXTS(D)
    RT ← ³²0 || MEM(EA, 4)
    RA ← EA

Let the effective address (EA) be the sum (RA) + D. The word in storage addressed by EA is loaded into RT₃₂:₆₃. RT₀:₃₁ are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Word and Zero with Update Indexed X-form

    lwzux RT,RA,RB

    | 31 | RT | RA | RB | 55 | / |
    0 6 11 16 21 31

    EA ← (RA) + (RB)
    RT ← ³²0 || MEM(EA, 4)
    RA ← EA

Let the effective address (EA) be the sum (RA) + (RB). The word in storage addressed by EA is loaded into RT₃₂:₆₃. RT₀:₃₁ are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

3.3.2.1 64-bit Fixed-Point Load Instructions [Category: 64-Bit]

Load Word Algebraic DS-form

    lwa RT,DS(RA)

    | 58 | RT | RA | DS | 2 |
    0 6 11 16 30 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(DS || 0b00)
    RT ← EXTS(MEM(EA, 4))

Let the effective address (EA) be the sum (RA|0) + (DS||0b00). The word in storage addressed by EA is loaded into RT₃₂:₆₃. RT₀:₃₁ are filled with a copy of bit 0 of the loaded word.

Special Registers Altered:
    None

Load Word Algebraic Indexed X-form

    lwax RT,RA,RB

    | 31 | RT | RA | RB | 341 | / |
    0 6 11 16 21 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← EXTS(MEM(EA, 4))

Let the effective address (EA) be the sum (RA|0) + (RB). The word in storage addressed by EA is loaded into RT₃₂:₆₃. RT₀:₃₁ are filled with a copy of bit 0 of the loaded word.

Special Registers Altered:
    None

Load Word Algebraic with Update Indexed X-form

    lwaux RT,RA,RB

    | 31 | RT | RA | RB | 373 | / |
    0 6 11 16 21 31

    EA ← (RA) + (RB)
    RT ← EXTS(MEM(EA, 4))
    RA ← EA

Let the effective address (EA) be the sum (RA) + (RB). The word in storage addressed by EA is loaded into RT₃₂:₆₃. RT₀:₃₁ are filled with a copy of bit 0 of the loaded word.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Doubleword DS-form

    ld RT,DS(RA)

    | 58 | RT | RA | DS | 0 |
    0 6 11 16 30 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(DS || 0b00)
    RT ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0) + (DS||0b00). The doubleword in storage addressed by EA is loaded into RT.

Special Registers Altered:
    None

Load Doubleword Indexed X-form

    ldx RT,RA,RB

    | 31 | RT | RA | RB | 21 | / |
    0 6 11 16 21 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0) + (RB). The doubleword in storage addressed by EA is loaded into RT.

Special Registers Altered:
    None

Load Doubleword with Update DS-form

    ldu RT,DS(RA)

    | 58 | RT | RA | DS | 1 |
    0 6 11 16 30 31

    EA ← (RA) + EXTS(DS || 0b00)
    RT ← MEM(EA, 8)
    RA ← EA

Let the effective address (EA) be the sum (RA) + (DS||0b00). The doubleword in storage addressed by EA is loaded into RT.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Doubleword with Update Indexed X-form

    ldux RT,RA,RB

    | 31 | RT | RA | RB | 53 | / |
    0 6 11 16 21 31

    EA ← (RA) + (RB)
    RT ← MEM(EA, 8)
    RA ← EA

Let the effective address (EA) be the sum (RA) + (RB). The doubleword in storage addressed by EA is loaded into RT.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

3.3.3 Fixed-Point Store Instructions

The contents of register RS are stored into the byte, halfword, word, or doubleword in storage addressed by EA.

Many of the Store instructions have an "update" form, in which register RA is updated with the effective address. For these forms, the following rules apply.

• If RA≠0, the effective address is placed into register RA.
• If RS=RA, the contents of register RS are copied to the target storage element and then EA is placed into RA (RS).
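The difference between the "and Zero" and "Algebraic" loads, and the ordering rule for update-form stores when RS=RA, can be sketched with a small register-and-storage model. This is an illustrative sketch only: the function names mirror the mnemonics but the model itself (a Python list for the GPRs, a `bytearray` for storage, Big-Endian byte ordering, 64-bit mode, and a displacement passed as an already sign-extended integer) is an assumption made for the example.

```python
MASK64 = (1 << 64) - 1


def lhz(mem: bytearray, ea: int) -> int:
    """RT <- 48 zero bits || MEM(EA, 2): the halfword is zero-extended."""
    return int.from_bytes(mem[ea:ea + 2], "big")


def lha(mem: bytearray, ea: int) -> int:
    """RT <- EXTS(MEM(EA, 2)): bit 0 of the halfword fills RT bits 0:47."""
    return int.from_bytes(mem[ea:ea + 2], "big", signed=True) & MASK64


def stwu(gpr: list, mem: bytearray, rs: int, ra: int, d: int) -> None:
    """Store Word with Update; d is the sign-extended displacement."""
    if ra == 0:
        raise ValueError("RA=0: the instruction form is invalid")
    ea = (gpr[ra] + d) & MASK64
    # The old contents of RS are stored first, even when RS and RA are the
    # same register; only then is EA placed into RA.
    mem[ea:ea + 4] = (gpr[rs] & 0xFFFFFFFF).to_bytes(4, "big")
    gpr[ra] = ea
```

With RS=RA=1 and a negative displacement, `stwu` in this model stores the old value of GPR 1 at the new address and then leaves that address in GPR 1, which matches the second rule above.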
Store Byte D-form

    stb RS,D(RA)

    | 38 | RS | RA | D |
    0 6 11 16 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    MEM(EA, 1) ← (RS)₅₆:₆₃

Let the effective address (EA) be the sum (RA|0) + D. (RS)₅₆:₆₃ are stored into the byte in storage addressed by EA.

Special Registers Altered:
    None

Store Byte Indexed X-form

    stbx RS,RA,RB

    | 31 | RS | RA | RB | 215 | / |
    0 6 11 16 21 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    MEM(EA, 1) ← (RS)₅₆:₆₃

Let the effective address (EA) be the sum (RA|0) + (RB). (RS)₅₆:₆₃ are stored into the byte in storage addressed by EA.

Special Registers Altered:
    None

Store Byte with Update D-form

    stbu RS,D(RA)

    | 39 | RS | RA | D |
    0 6 11 16 31

    EA ← (RA) + EXTS(D)
    MEM(EA, 1) ← (RS)₅₆:₆₃
    RA ← EA

Let the effective address (EA) be the sum (RA) + D. (RS)₅₆:₆₃ are stored into the byte in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Byte with Update Indexed X-form

    stbux RS,RA,RB

    | 31 | RS | RA | RB | 247 | / |
    0 6 11 16 21 31

    EA ← (RA) + (RB)
    MEM(EA, 1) ← (RS)₅₆:₆₃
    RA ← EA

Let the effective address (EA) be the sum (RA) + (RB). (RS)₅₆:₆₃ are stored into the byte in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Halfword D-form

    sth RS,D(RA)

    | 44 | RS | RA | D |
    0 6 11 16 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    MEM(EA, 2) ← (RS)₄₈:₆₃

Let the effective address (EA) be the sum (RA|0) + D. (RS)₄₈:₆₃ are stored into the halfword in storage addressed by EA.

Special Registers Altered:
    None

Store Halfword Indexed X-form

    sthx RS,RA,RB

    | 31 | RS | RA | RB | 407 | / |
    0 6 11 16 21 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    MEM(EA, 2) ← (RS)₄₈:₆₃

Let the effective address (EA) be the sum (RA|0) + (RB). (RS)₄₈:₆₃ are stored into the halfword in storage addressed by EA.

Special Registers Altered:
    None

Store Halfword with Update D-form

    sthu RS,D(RA)

    | 45 | RS | RA | D |
    0 6 11 16 31

    EA ← (RA) + EXTS(D)
    MEM(EA, 2) ← (RS)₄₈:₆₃
    RA ← EA

Let the effective address (EA) be the sum (RA) + D. (RS)₄₈:₆₃ are stored into the halfword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Halfword with Update Indexed X-form

    sthux RS,RA,RB

    | 31 | RS | RA | RB | 439 | / |
    0 6 11 16 21 31

    EA ← (RA) + (RB)
    MEM(EA, 2) ← (RS)₄₈:₆₃
    RA ← EA

Let the effective address (EA) be the sum (RA) + (RB). (RS)₄₈:₆₃ are stored into the halfword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Word D-form

    stw RS,D(RA)

    | 36 | RS | RA | D |
    0 6 11 16 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    MEM(EA, 4) ← (RS)₃₂:₆₃

Let the effective address (EA) be the sum (RA|0) + D. (RS)₃₂:₆₃ are stored into the word in storage addressed by EA.

Special Registers Altered:
    None

Store Word Indexed X-form

    stwx RS,RA,RB

    | 31 | RS | RA | RB | 151 | / |
    0 6 11 16 21 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    MEM(EA, 4) ← (RS)₃₂:₆₃

Let the effective address (EA) be the sum (RA|0) + (RB). (RS)₃₂:₆₃ are stored into the word in storage addressed by EA.

Special Registers Altered:
    None

Store Word with Update D-form

    stwu RS,D(RA)

    | 37 | RS | RA | D |
    0 6 11 16 31

    EA ← (RA) + EXTS(D)
    MEM(EA, 4) ← (RS)₃₂:₆₃
    RA ← EA

Let the effective address (EA) be the sum (RA) + D. (RS)₃₂:₆₃ are stored into the word in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Word with Update Indexed X-form

    stwux RS,RA,RB

    | 31 | RS | RA | RB | 183 | / |
    0 6 11 16 21 31

    EA ← (RA) + (RB)
    MEM(EA, 4) ← (RS)₃₂:₆₃
    RA ← EA

Let the effective address (EA) be the sum (RA) + (RB). (RS)₃₂:₆₃ are stored into the word in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

3.3.3.1 64-bit Fixed-Point Store Instructions [Category: 64-Bit]

Store Doubleword DS-form

    std RS,DS(RA)

    | 62 | RS | RA | DS | 0 |
    0 6 11 16 30 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(DS || 0b00)
    MEM(EA, 8) ← (RS)

Let the effective address (EA) be the sum (RA|0) + (DS||0b00). (RS) is stored into the doubleword in storage addressed by EA.

Special Registers Altered:
    None

Store Doubleword Indexed X-form

    stdx RS,RA,RB

    | 31 | RS | RA | RB | 149 | / |
    0 6 11 16 21 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    MEM(EA, 8) ← (RS)

Let the effective address (EA) be the sum (RA|0) + (RB). (RS) is stored into the doubleword in storage addressed by EA.

Special Registers Altered:
    None

Store Doubleword with Update DS-form

    stdu RS,DS(RA)

    | 62 | RS | RA | DS | 1 |
    0 6 11 16 30 31

    EA ← (RA) + EXTS(DS || 0b00)
    MEM(EA, 8) ← (RS)
    RA ← EA

Let the effective address (EA) be the sum (RA) + (DS||0b00). (RS) is stored into the doubleword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Doubleword with Update Indexed X-form

    stdux RS,RA,RB

    | 31 | RS | RA | RB | 181 | / |
    0 6 11 16 21 31

    EA ← (RA) + (RB)
    MEM(EA, 8) ← (RS)
    RA ← EA

Let the effective address (EA) be the sum (RA) + (RB). (RS) is stored into the doubleword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

3.3.4 Fixed-Point Load and Store with Byte Reversal Instructions

Programming Note
These instructions have the effect of loading and storing data in the opposite byte ordering from that which would be used by other Load and Store instructions.

Programming Note
In some implementations, the Load Byte-Reverse instructions may have greater latency than other Load instructions.
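The effect described in the first note can be sketched by comparing an ordinary word load with a byte-reversed one. This is a minimal model, assuming Big-Endian mode and a `bytearray` standing in for storage; the function names follow the mnemonics lwz, lwbrx, and stwbrx, but the Python signatures are invented for the example.

```python
def lwz(mem: bytearray, ea: int) -> int:
    """Ordinary load: RT <- 32 zero bits || MEM(EA, 4)."""
    return int.from_bytes(mem[ea:ea + 4], "big")


def lwbrx(mem: bytearray, ea: int) -> int:
    """Byte-reversed load: byte 0 of the word in storage lands in RT bits 56:63."""
    return int.from_bytes(mem[ea:ea + 4], "little")


def stwbrx(mem: bytearray, ea: int, rs_value: int) -> None:
    """Byte-reversed store: (RS) bits 56:63 go to byte 0 of the word in storage."""
    mem[ea:ea + 4] = (rs_value & 0xFFFFFFFF).to_bytes(4, "little")
```

In this model, storage bytes 0x12 0x34 0x56 0x78 read with `lwz` give 0x12345678, while `lwbrx` gives 0x78563412; `stwbrx` followed by `lwbrx` at the same address returns the original low-order word.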
Load Halfword Byte-Reverse Indexed X-form

    lhbrx RT,RA,RB

    | 31 | RT | RA | RB | 790 | / |
    0 6 11 16 21 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    load_data ← MEM(EA, 2)
    RT ← ⁴⁸0 || load_data₈:₁₅ || load_data₀:₇

Let the effective address (EA) be the sum (RA|0) + (RB). Bits 0:7 of the halfword in storage addressed by EA are loaded into RT₅₆:₆₃. Bits 8:15 of the halfword in storage addressed by EA are loaded into RT₄₈:₅₅. RT₀:₄₇ are set to 0.

Special Registers Altered:
    None

Store Halfword Byte-Reverse Indexed X-form

    sthbrx RS,RA,RB

    | 31 | RS | RA | RB | 918 | / |
    0 6 11 16 21 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    MEM(EA, 2) ← (RS)₅₆:₆₃ || (RS)₄₈:₅₅

Let the effective address (EA) be the sum (RA|0) + (RB). (RS)₅₆:₆₃ are stored into bits 0:7 of the halfword in storage addressed by EA. (RS)₄₈:₅₅ are stored into bits 8:15 of the halfword in storage addressed by EA.

Special Registers Altered:
    None

Load Word Byte-Reverse Indexed X-form

    lwbrx RT,RA,RB

    | 31 | RT | RA | RB | 534 | / |
    0 6 11 16 21 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    load_data ← MEM(EA, 4)
    RT ← ³²0 || load_data₂₄:₃₁ || load_data₁₆:₂₃
             || load_data₈:₁₅ || load_data₀:₇

Let the effective address (EA) be the sum (RA|0) + (RB). Bits 0:7 of the word in storage addressed by EA are loaded into RT₅₆:₆₃. Bits 8:15 of the word in storage addressed by EA are loaded into RT₄₈:₅₅. Bits 16:23 of the word in storage addressed by EA are loaded into RT₄₀:₄₇. Bits 24:31 of the word in storage addressed by EA are loaded into RT₃₂:₃₉. RT₀:₃₁ are set to 0.

Special Registers Altered:
    None

Store Word Byte-Reverse Indexed X-form

    stwbrx RS,RA,RB

    | 31 | RS | RA | RB | 662 | / |
    0 6 11 16 21 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    MEM(EA, 4) ← (RS)₅₆:₆₃ || (RS)₄₈:₅₅ || (RS)₄₀:₄₇
                 || (RS)₃₂:₃₉

Let the effective address (EA) be the sum (RA|0) + (RB). (RS)₅₆:₆₃ are stored into bits 0:7 of the word in storage addressed by EA. (RS)₄₈:₅₅ are stored into bits 8:15 of the word in storage addressed by EA. (RS)₄₀:₄₇ are stored into bits 16:23 of the word in storage addressed by EA. (RS)₃₂:₃₉ are stored into bits 24:31 of the word in storage addressed by EA.

Special Registers Altered:
    None

3.3.5 Fixed-Point Load and Store Multiple Instructions

The Load/Store Multiple instructions have preferred forms; see Section 1.8.1, "Preferred Instruction Forms". In the preferred forms, storage alignment satisfies the following rule.

• The combination of the EA and RT (RS) is such that the low-order byte of GPR 31 is loaded (stored) from (into) the last byte of an aligned quadword in storage.

For the Server environment, the Load/Store Multiple instructions are not supported in Little-Endian mode. If they are executed in Little-Endian mode, the system alignment error handler is invoked.

Load Multiple Word D-form

    lmw RT,D(RA)

    | 46 | RT | RA | D |
    0 6 11 16 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    r ← RT
    do while r ≤ 31
      GPR(r) ← ³²0 || MEM(EA, 4)
      r ← r + 1
      EA ← EA + 4

Let n = (32-RT). Let the effective address (EA) be the sum (RA|0) + D.

n consecutive words starting at EA are loaded into the low-order 32 bits of GPRs RT through 31. The high-order 32 bits of these GPRs are set to zero.

If RA is in the range of registers to be loaded, including the case in which RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Multiple Word D-form

    stmw RS,D(RA)

    | 47 | RS | RA | D |
    0 6 11 16 31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    r ← RS
    do while r ≤ 31
      MEM(EA, 4) ← GPR(r)₃₂:₆₃
      r ← r + 1
      EA ← EA + 4

Let n = (32-RS). Let the effective address (EA) be the sum (RA|0) + D.

n consecutive words starting at EA are stored from the low-order 32 bits of GPRs RS through 31.

Special Registers Altered:
    None
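The register loop shared by Load Multiple Word and Store Multiple Word can be sketched as follows. This is an illustrative model rather than a definitive implementation: it assumes Big-Endian mode, 64-bit mode, a Python list of 32 integers for the GPRs, a `bytearray` for storage, and a displacement `d` that has already been sign-extended. The invalid-form test `ra >= rt` is this sketch's reading of "RA is in the range of registers to be loaded, including the case in which RA=0".

```python
MASK64 = (1 << 64) - 1


def lmw(gpr: list, mem: bytearray, rt: int, ra: int, d: int) -> None:
    """Load words into the low-order 32 bits of GPRs RT through 31."""
    if ra >= rt:
        # RA lies in RT..31 (this also covers RA=0 together with RT=0).
        raise ValueError("RA in range of registers to be loaded: invalid form")
    ea = ((gpr[ra] if ra != 0 else 0) + d) & MASK64
    for r in range(rt, 32):
        gpr[r] = int.from_bytes(mem[ea:ea + 4], "big")  # high 32 bits are 0
        ea += 4


def stmw(gpr: list, mem: bytearray, rs: int, ra: int, d: int) -> None:
    """Store the low-order 32 bits of GPRs RS through 31 as consecutive words."""
    ea = ((gpr[ra] if ra != 0 else 0) + d) & MASK64
    for r in range(rs, 32):
        mem[ea:ea + 4] = (gpr[r] & 0xFFFFFFFF).to_bytes(4, "big")
        ea += 4
```

For RT=29 the loop runs three times (n = 32-29), so twelve bytes of storage are read or written.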
Fixed-Point Processor 57 Version 2.05 3.3.6 Fixed-Point Move Assist Instructions [Category: Move Assist] The Move Assist instructions allow movement of data 1 RT = 4 or 5 from storage to registers or from registers to storage 1 last register loaded/stored 12 without concern for alignment. These instructions can For some implementations, using GPR 4 for RS and be used for a short move between arbitrary storage RT may result in slightly faster execution than using locations or to initiate a long move between unaligned GPR 5. storage fields. For the Server environment, the Move Assist instruc- The Load/Store String instructions have preferred tions are not supported in Little-Endian mode. If they forms; see Section 1.8.1, "Preferred Instruction Forms" are executed in Little-Endian mode, the system align- on page 21. In the preferred forms, register usage sat- ment error handler may be invoked or the instructions isfies the following rules. may be treated as no-ops if the number of bytes speci- 1 RS = 4 or 5 fied by the instruction is 0. 58 Power ISATM I Version 2.05 Load String Word Immediate X-form Load String Word Indexed X-form lswi RT,RA,NB lswx RT,RA,RB 31 RT RA NB 597 / 31 RT RA RB 533 / 0 6 11 16 21 31 0 6 11 16 21 31 if RA = 0 then EA 1 0 if RA = 0 then b 1 0 else EA 1 (RA) else b 1 (RA) if NB = 0 then n 1 32 EA 1 b + (RB) else n 1 NB n 1 XER57:63 r 1 RT - 1 r 1 RT - 1 i 1 32 i 1 32 do while n > 0 RT 1 undefined if i = 32 then do while n > 0 r 1 r + 1 (mod 32) if i = 32 then GPR(r) 1 0 r 1 r + 1 (mod 32) GPR(r)i:i+7 1 MEM(EA, 1) GPR(r) 1 0 i 1 i + 8 GPR(r)i:i+7 1 MEM(EA, 1) if i = 64 then i 1 32 i 1 i + 8 EA 1 EA + 1 if i = 64 then i 1 32 n 1 n - 1 EA 1 EA + 1 n 1 n - 1 Let the effective address (EA) be (RA|0). Let n = NB if NB0, n = 32 if NB=0; n is the number of bytes to load. Let the effective address (EA) be the sum Let nr=CEIL(n/4); nr is the number of registers to (RA|0)+ (RB). Let n=XER57:63; n is the number of bytes receive data. to load. 
Let nr=CEIL(n/4); nr is the number of registers to receive data. n consecutive bytes starting at EA are loaded into GPRs RT through RT+nr-1. Data are loaded into the If n>0, n consecutive bytes starting at EA are loaded low-order four bytes of each GPR; the high-order four into GPRs RT through RT+nr-1. Data are loaded into bytes are set to 0. the low-order four bytes of each GPR; the high-order four bytes are set to 0. Bytes are loaded left to right in each register. The sequence of registers wraps around to GPR 0 if Bytes are loaded left to right in each register. The required. If the low-order four bytes of register sequence of registers wraps around to GPR 0 if RT+nr-1 are only partially filled, the unfilled low-order required. If the low-order four bytes of register byte(s) of that register are set to 0. RT+nr-1 are only partially filled, the unfilled low-order byte(s) of that register are set to 0. If RA is in the range of registers to be loaded, including the case in which RA=0, the instruction form is invalid. If n=0, the contents of register RT are undefined. Special Registers Altered: If RA or RB is in the range of registers to be loaded, None including the case in which RA=0, the instruction is treated as if the instruction form were invalid. If RT=RA or RT=RB, the instruction form is invalid. Special Registers Altered: None Chapter 3. 
Fixed-Point Processor 59 Version 2.05 Store String Word Immediate X-form Store String Word Indexed X-form stswi RS,RA,NB stswx RS,RA,RB 31 RS RA NB 725 / 31 RS RA RB 661 / 0 6 11 16 21 31 0 6 11 16 21 31 if RA = 0 then EA 1 0 if RA = 0 then b 1 0 else EA 1 (RA) else b 1 (RA) if NB = 0 then n 1 32 EA 1 b + (RB) else n 1 NB n 1 XER57:63 r 1 RS - 1 r 1 RS - 1 i 1 32 i 1 32 do while n > 0 do while n > 0 if i = 32 then r 1 r + 1 (mod 32) if i = 32 then r 1 r + 1 (mod 32) MEM(EA, 1) 1 GPR(r)i:i+7 MEM(EA, 1) 1 GPR(r)i:i+7 i 1 i + 8 i 1 i + 8 if i = 64 then i 1 32 if i = 64 then i 1 32 EA 1 EA + 1 EA 1 EA + 1 n 1 n - 1 n 1 n - 1 Let the effective address (EA) be (RA|0). Let n = NB if Let the effective address (EA) be the sum NB0, n = 32 if NB=0; n is the number of bytes to store. (RA|0)+ (RB). Let n = XER57:63; n is the number of Let nr =CEIL(n/4); nr is the number of registers to sup- bytes to store. Let nr = CEIL(n/4); nr is the number of ply data. registers to supply data. n consecutive bytes starting at EA are stored from If n>0, n consecutive bytes starting at EA are stored GPRs RS through RS+nr-1. Data are stored from the from GPRs RS through RS+nr-1. Data are stored from low-order four bytes of each GPR. the low-order four bytes of each GPR. Bytes are stored left to right from each register. The Bytes are stored left to right from each register. The sequence of registers wraps around to GPR 0 if sequence of registers wraps around to GPR 0 if required. required. Special Registers Altered: If n=0, no bytes are stored. None Special Registers Altered: None 60 Power ISATM I Version 2.05 3.3.7 Other Fixed-Point Instructions The remainder of the fixed-point instructions use the these bits are set by signed comparison of the result to contents of the General Purpose Registers (GPRs) as zero. In 32-bit mode, these bits are set by signed com- source operands, and place results into GPRs, into the parison of the low-order 32 bits of the result to zero. 
Fixed-Point Exception Register (XER), and into Condi- Unless otherwise noted and when appropriate, when tion Register fields. In addition, the Trap instructions CR Field 0 and the XER are set they reflect the value test the contents of a GPR or XER bit, invoking the sys- placed into the target register. tem trap handler if the result of the specified test is true. These instructions treat the source operands as signed Programming Note integers unless the instruction is explicitly identified as Instructions with the OE bit set or that set CA may performing an unsigned operation. execute slowly or may prevent the execution of sub- The X-form and XO-form instructions with Rc=1, and sequent instructions until the instruction has com- the D-form instructions addic., andi., and andis., set pleted. the first three bits of CR Field 0 to characterize the result placed into the target register. In 64-bit mode, Chapter 3. Fixed-Point Processor 61 Version 2.05 3.3.8 Fixed-Point Arithmetic Instructions The XO-form Arithmetic instructions with Rc=1, and the Extended mnemonics for addition and D-form Arithmetic instruction addic., set the first three subtraction bits of CR Field 0 as described in Section 3.3.7, "Other Fixed-Point Instructions". Several extended mnemonics are provided that use the Add Immediate and Add Immediate Shifted instructions addic, addic., subfic, addc, subfc, adde, subfe, to load an immediate value or an address into a target addme, subfme, addze, and subfze always set CA, to register. Some of these are shown as examples with reflect the carry out of bit 0 in 64-bit mode and out of bit the two instructions. 32 in 32-bit mode. The XO-form Arithmetic instructions set SO and OV when OE=1 to reflect overflow of the The Power ISA supplies Subtract From instructions, result. Except for the Multiply Low and Divide instruc- which subtract the second operand from the third. 
A set tions, the setting of these bits is mode-dependent, and of extended mnemonics is provided that use the more reflects overflow of the 64-bit result in 64-bit mode and "normal" order, in which the third operand is subtracted overflow of the low-order 32-bit result in 32-bit mode. from the second, with the third operand being either an For XO-form Multiply Low and Divide instructions, the immediate field or a register. Some of these are shown setting of these bits is mode-independent, and reflects as examples with the appropriate Add and Subtract overflow of the 64-bit result for mulld, divd, and divdu, From instructions. and overflow of the low-order 32-bit result for mullw, See Appendix D for additional extended mnemonics. divw, and divwu. Programming Note Notice that CR Field 0 may not reflect the "true" (infinitely precise) result if overflow occurs. Add Immediate D-form Add Immediate Shifted D-form addi RT,RA,SI addis RT,RA,SI 14 RT RA SI 15 RT RA SI 0 6 11 16 31 0 6 11 16 31 if RA = 0 then RT 1 EXTS(SI) if RA = 0 then RT 1 EXTS(SI || 160) else RT 1 (RA) + EXTS(SI) else RT 1 (RA) + EXTS(SI || 160) The sum (RA|0) + SI is placed into register RT. The sum (RA|0) + (SI || 0x0000) is placed into register RT. Special Registers Altered: None Special Registers Altered: None Extended Mnemonics: Extended Mnemonics: Examples of extended mnemonics for Add Immediate: Examples of extended mnemonics for Add Immediate Extended: Equivalent to: Shifted: li Rx,value addi Rx,0,value la Rx,disp(Ry) addi Rx,Ry,disp Extended: Equivalent to: subi Rx,Ry,value addi Rx,Ry,-value lis Rx,value addis Rx,0,value subis Rx,Ry,value addis Rx,Ry,-value Programming Note addi, addis, add, and subf are the preferred instructions for addition and subtraction, because they set few status bits. Notice that addi and addis use the value 0, not the contents of GPR 0, if RA=0. 62 Power ISATM I Version 2.05 Add XO-form Subtract From XO-form add RT,RA,RB (OE=0 Rc=0) subf RT,RA,RB (OE=0 Rc=0) add. 
RT,RA,RB (OE=0 Rc=1) subf. RT,RA,RB (OE=0 Rc=1) addo RT,RA,RB (OE=1 Rc=0) subfo RT,RA,RB (OE=1 Rc=0) addo. RT,RA,RB (OE=1 Rc=1) subfo. RT,RA,RB (OE=1 Rc=1) 31 RT RA RB OE 266 Rc 31 RT RA RB OE 40 Rc 0 6 11 16 21 22 31 0 6 11 16 21 22 31 RT 1 (RA) + (RB) RT 1 ¬(RA) + (RB) + 1 The sum (RA) + (RB) is placed into register RT. The sum ¬(RA) + (RB) +1 is placed into register RT. Special Registers Altered: Special Registers Altered: CR0 (if Rc=1) CR0 (if Rc=1) SO OV (if OE=1) SO OV (if OE=1) Extended Mnemonics: Example of extended mnemonics for Subtract From: Extended: Equivalent to: sub Rx,Ry,Rz subf Rx,Rz,Ry Add Immediate Carrying D-form Add Immediate Carrying and Record D-form addic RT,RA,SI addic. RT,RA,SI 12 RT RA SI 0 6 11 16 31 13 RT RA SI 0 6 11 16 31 RT 1 (RA) + EXTS(SI) RT 1 (RA) + EXTS(SI) The sum (RA) + SI is placed into register RT. The sum (RA) + SI is placed into register RT. Special Registers Altered: CA Special Registers Altered: CR0 CA Extended Mnemonics: Extended Mnemonics: Example of extended mnemonics for Add Immediate Carrying: Example of extended mnemonics for Add Immediate Carrying and Record: Extended: Equivalent to: subic Rx,Ry,value addic Rx,Ry,-value Extended: Equivalent to: subic. Rx,Ry,value addic. Rx,Ry,-value Chapter 3. Fixed-Point Processor 63 Version 2.05 Subtract From Immediate Carrying D-form subfic RT,RA,SI 8 RT RA SI 0 6 11 16 31 RT 1 ¬(RA) + EXTS(SI) + 1 The sum ¬(RA) + SI + 1 is placed into register RT. Special Registers Altered: CA Add Carrying XO-form Subtract From Carrying XO-form addc RT,RA,RB (OE=0 Rc=0) subfc RT,RA,RB (OE=0 Rc=0) addc. RT,RA,RB (OE=0 Rc=1) subfc. RT,RA,RB (OE=0 Rc=1) addco RT,RA,RB (OE=1 Rc=0) subfco RT,RA,RB (OE=1 Rc=0) addco. RT,RA,RB (OE=1 Rc=1) subfco. RT,RA,RB (OE=1 Rc=1) 31 RT RA RB OE 10 Rc 31 RT RA RB OE 8 Rc 0 6 11 16 21 22 31 0 6 11 16 21 22 31 RT 1 (RA) + (RB) RT 1 ¬(RA) + (RB) + 1 The sum (RA) + (RB) is placed into register RT. The sum ¬(RA) + (RB) + 1 is placed into register RT. 
Special Registers Altered (Add Carrying and Subtract From Carrying):
    CA
    CR0      (if Rc=1)
    SO OV    (if OE=1)

Extended Mnemonics:

Example of extended mnemonics for Subtract From Carrying:

    Extended:            Equivalent to:
    subc  Rx,Ry,Rz       subfc Rx,Rz,Ry

Add Extended XO-form

adde    RT,RA,RB    (OE=0 Rc=0)
adde.   RT,RA,RB    (OE=0 Rc=1)
addeo   RT,RA,RB    (OE=1 Rc=0)
addeo.  RT,RA,RB    (OE=1 Rc=1)

    31    RT    RA    RB    OE    138    Rc
    0     6     11    16    21    22     31

    RT ← (RA) + (RB) + CA

The sum (RA) + (RB) + CA is placed into register RT.

Special Registers Altered:
    CA
    CR0      (if Rc=1)
    SO OV    (if OE=1)

Subtract From Extended XO-form

subfe   RT,RA,RB    (OE=0 Rc=0)
subfe.  RT,RA,RB    (OE=0 Rc=1)
subfeo  RT,RA,RB    (OE=1 Rc=0)
subfeo. RT,RA,RB    (OE=1 Rc=1)

    31    RT    RA    RB    OE    136    Rc
    0     6     11    16    21    22     31

    RT ← ¬(RA) + (RB) + CA

The sum ¬(RA) + (RB) + CA is placed into register RT.

Special Registers Altered:
    CA
    CR0      (if Rc=1)
    SO OV    (if OE=1)

Add to Minus One Extended XO-form

addme   RT,RA    (OE=0 Rc=0)
addme.  RT,RA    (OE=0 Rc=1)
addmeo  RT,RA    (OE=1 Rc=0)
addmeo. RT,RA    (OE=1 Rc=1)

    31    RT    RA    ///    OE    234    Rc
    0     6     11    16     21    22     31

    RT ← (RA) + CA - 1

The sum (RA) + CA + ⁶⁴1 is placed into register RT.

Special Registers Altered:
    CA
    CR0      (if Rc=1)
    SO OV    (if OE=1)

Subtract From Minus One Extended XO-form

subfme   RT,RA    (OE=0 Rc=0)
subfme.  RT,RA    (OE=0 Rc=1)
subfmeo  RT,RA    (OE=1 Rc=0)
subfmeo. RT,RA    (OE=1 Rc=1)

    31    RT    RA    ///    OE    232    Rc
    0     6     11    16     21    22     31

    RT ← ¬(RA) + CA - 1

The sum ¬(RA) + CA + ⁶⁴1 is placed into register RT.

Special Registers Altered:
    CA
    CR0      (if Rc=1)
    SO OV    (if OE=1)

Add to Zero Extended XO-form

addze   RT,RA    (OE=0 Rc=0)
addze.  RT,RA    (OE=0 Rc=1)
addzeo  RT,RA    (OE=1 Rc=0)
addzeo. RT,RA    (OE=1 Rc=1)

    31    RT    RA    ///    OE    202    Rc
    0     6     11    16     21    22     31

    RT ← (RA) + CA

The sum (RA) + CA is placed into register RT.

Special Registers Altered:
    CA
    CR0      (if Rc=1)
    SO OV    (if OE=1)

Subtract From Zero Extended XO-form

subfze   RT,RA    (OE=0 Rc=0)
subfze.  RT,RA    (OE=0 Rc=1)
subfzeo  RT,RA    (OE=1 Rc=0)
subfzeo. RT,RA    (OE=1 Rc=1)

    31    RT    RA    ///    OE    200    Rc
    0     6     11    16     21    22     31

    RT ← ¬(RA) + CA

The sum ¬(RA) + CA is placed into register RT.

Special Registers Altered:
    CA
    CR0      (if Rc=1)
    SO OV    (if OE=1)

Programming Note
The setting of CA by the Add and Subtract From instructions, including the Extended versions thereof, is mode-dependent. If a sequence of these instructions is used to perform extended-precision addition or subtraction, the same mode should be used throughout the sequence.

Negate XO-form

neg     RT,RA    (OE=0 Rc=0)
neg.    RT,RA    (OE=0 Rc=1)
nego    RT,RA    (OE=1 Rc=0)
nego.   RT,RA    (OE=1 Rc=1)

    31    RT    RA    ///    OE    104    Rc
    0     6     11    16     21    22     31

    RT ← ¬(RA) + 1

The sum ¬(RA) + 1 is placed into register RT.

If the processor is in 64-bit mode and register RA contains the most negative 64-bit number (0x8000_0000_0000_0000), the result is the most negative number and, if OE=1, OV is set to 1. Similarly, if the processor is in 32-bit mode and (RA)_{32:63} contain the most negative 32-bit number (0x8000_0000), the low-order 32 bits of the result contain the most negative 32-bit number and, if OE=1, OV is set to 1.

Special Registers Altered:
    CR0      (if Rc=1)
    SO OV    (if OE=1)

Multiply Low Immediate D-form

mulli   RT,RA,SI

    7     RT    RA    SI
    0     6     11    16    31

    prod_{0:127} ← (RA) × EXTS(SI)
    RT ← prod_{64:127}

The 64-bit first operand is (RA). The 64-bit second operand is the sign-extended value of the SI field. The low-order 64 bits of the 128-bit product of the operands are placed into register RT.

Both operands and the product are interpreted as signed integers.

Special Registers Altered:
    None

Multiply Low Word XO-form

mullw   RT,RA,RB    (OE=0 Rc=0)
mullw.  RT,RA,RB    (OE=0 Rc=1)
mullwo  RT,RA,RB    (OE=1 Rc=0)
mullwo. RT,RA,RB    (OE=1 Rc=1)

    31    RT    RA    RB    OE    235    Rc
    0     6     11    16    21    22     31

    RT ← (RA)_{32:63} × (RB)_{32:63}

The 32-bit operands are the low-order 32 bits of RA and of RB. The 64-bit product of the operands is placed into register RT.

If OE=1 then OV is set to 1 if the product cannot be represented in 32 bits.

Both operands and the product are interpreted as signed integers.

Special Registers Altered:
    CR0      (if Rc=1)
    SO OV    (if OE=1)

Multiply High Word XO-form

mulhw   RT,RA,RB    (Rc=0)
mulhw.  RT,RA,RB    (Rc=1)

    31    RT    RA    RB    /     75    Rc
    0     6     11    16    21    22    31

    prod_{0:63} ← (RA)_{32:63} × (RB)_{32:63}
    RT_{32:63} ← prod_{0:31}
    RT_{0:31}  ← undefined

The 32-bit operands are the low-order 32 bits of RA and of RB. The high-order 32 bits of the 64-bit product of the operands are placed into RT_{32:63}. The contents of RT_{0:31} are undefined.

Both operands and the product are interpreted as signed integers.

Special Registers Altered:
    CR0 (bits 0:2 undefined in 64-bit mode)    (if Rc=1)

Multiply High Word Unsigned XO-form

mulhwu  RT,RA,RB    (Rc=0)
mulhwu. RT,RA,RB    (Rc=1)

    31    RT    RA    RB    /     11    Rc
    0     6     11    16    21    22    31

    prod_{0:63} ← (RA)_{32:63} × (RB)_{32:63}
    RT_{32:63} ← prod_{0:31}
    RT_{0:31}  ← undefined

The 32-bit operands are the low-order 32 bits of RA and of RB. The high-order 32 bits of the 64-bit product of the operands are placed into RT_{32:63}. The contents of RT_{0:31} are undefined.

Both operands and the product are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero.

Special Registers Altered:
    CR0 (bits 0:2 undefined in 64-bit mode)    (if Rc=1)

Programming Note
For mulli and mullw, the low-order 32 bits of the product are the correct 32-bit product for 32-bit mode.

For mulli and mulld, the low-order 64 bits of the product are independent of whether the operands are regarded as signed or unsigned 64-bit integers. For mulli and mullw, the low-order 32 bits of the product are independent of whether the operands are regarded as signed or unsigned 32-bit integers.
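The Programming Note above says that the low-order bits of a product do not depend on whether the operands are read as signed or unsigned, while the high-order halves do. A small Python model can make that concrete. This is only a sketch: the function names mirror the mnemonics for readability, the undefined bits RT_{0:31} of mulhw and mulhwu are arbitrarily modeled as zero, and OV and CR0 are not modeled.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def to_signed(value, bits):
    """Interpret the low-order `bits` of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def mullw(ra, rb):
    """64-bit signed product of the low-order words of RA and RB."""
    return (to_signed(ra, 32) * to_signed(rb, 32)) & MASK64

def mulhw(ra, rb):
    """High-order 32 bits of the signed 64-bit product (RT_{32:63} only)."""
    return ((to_signed(ra, 32) * to_signed(rb, 32)) >> 32) & MASK32

def mulhwu(ra, rb):
    """High-order 32 bits of the unsigned 64-bit product (RT_{32:63} only)."""
    return ((ra & MASK32) * (rb & MASK32)) >> 32
```

With RA = 0xFFFF_FFFF and RB = 2, the low word of the product is 0xFFFF_FFFE under either interpretation, but mulhw yields 0xFFFF_FFFF (the product is -2) whereas mulhwu yields 1.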
Divide Word XO-form

divw    RT,RA,RB    (OE=0 Rc=0)
divw.   RT,RA,RB    (OE=0 Rc=1)
divwo   RT,RA,RB    (OE=1 Rc=0)
divwo.  RT,RA,RB    (OE=1 Rc=1)

    31    RT    RA    RB    OE    491    Rc
    0     6     11    16    21    22     31

    dividend_{0:63} ← EXTS((RA)_{32:63})
    divisor_{0:63}  ← EXTS((RB)_{32:63})
    RT_{32:63} ← dividend ÷ divisor
    RT_{0:31}  ← undefined

The 64-bit dividend is the sign-extended value of (RA)_{32:63}. The 64-bit divisor is the sign-extended value of (RB)_{32:63}. The 64-bit quotient is formed. The low-order 32 bits of the 64-bit quotient are placed into RT_{32:63}. The contents of RT_{0:31} are undefined. The remainder is not supplied as a result.

Both operands and the quotient are interpreted as signed integers. The quotient is the unique signed integer that satisfies

    dividend = (quotient × divisor) + r

where 0 ≤ r < |divisor| if the dividend is nonnegative, and -|divisor| < r ≤ 0 if the dividend is negative.

If an attempt is made to perform any of the divisions

    0x8000_0000 ÷ -1
    <anything> ÷ 0

then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In these cases, if OE=1 then OV is set to 1.

Special Registers Altered:
    CR0 (bits 0:2 undefined in 64-bit mode)    (if Rc=1)
    SO OV    (if OE=1)

Programming Note
The 32-bit signed remainder of dividing (RA)_{32:63} by (RB)_{32:63} can be computed as follows, except in the case that (RA)_{32:63} = -2³¹ and (RB)_{32:63} = -1.

    divw   RT,RA,RB    # RT = quotient
    mullw  RT,RT,RB    # RT = quotient×divisor
    subf   RT,RT,RA    # RT = remainder

Divide Word Unsigned XO-form

divwu   RT,RA,RB    (OE=0 Rc=0)
divwu.  RT,RA,RB    (OE=0 Rc=1)
divwuo  RT,RA,RB    (OE=1 Rc=0)
divwuo. RT,RA,RB    (OE=1 Rc=1)

    31    RT    RA    RB    OE    459    Rc
    0     6     11    16    21    22     31

    dividend_{0:63} ← ³²0 || (RA)_{32:63}
    divisor_{0:63}  ← ³²0 || (RB)_{32:63}
    RT_{32:63} ← dividend ÷ divisor
    RT_{0:31}  ← undefined

The 64-bit dividend is the zero-extended value of (RA)_{32:63}. The 64-bit divisor is the zero-extended value of (RB)_{32:63}. The 64-bit quotient is formed. The low-order 32 bits of the 64-bit quotient are placed into RT_{32:63}. The contents of RT_{0:31} are undefined. The remainder is not supplied as a result.

Both operands and the quotient are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero. The quotient is the unique unsigned integer that satisfies

    dividend = (quotient × divisor) + r

where 0 ≤ r < divisor.

If an attempt is made to perform the division

    <anything> ÷ 0

then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In this case, if OE=1 then OV is set to 1.

Special Registers Altered:
    CR0 (bits 0:2 undefined in 64-bit mode)    (if Rc=1)
    SO OV    (if OE=1)

Programming Note
The 32-bit unsigned remainder of dividing (RA)_{32:63} by (RB)_{32:63} can be computed as follows.

    divwu  RT,RA,RB    # RT = quotient
    mullw  RT,RT,RB    # RT = quotient×divisor
    subf   RT,RT,RA    # RT = remainder

3.3.8.1 64-bit Fixed-Point Arithmetic Instructions [Category: 64-Bit]

Multiply Low Doubleword XO-form

mulld   RT,RA,RB    (OE=0 Rc=0)
mulld.  RT,RA,RB    (OE=0 Rc=1)
mulldo  RT,RA,RB    (OE=1 Rc=0)
mulldo. RT,RA,RB    (OE=1 Rc=1)

    31    RT    RA    RB    OE    233    Rc
    0     6     11    16    21    22     31

    prod_{0:127} ← (RA) × (RB)
    RT ← prod_{64:127}

The 64-bit operands are (RA) and (RB). The low-order 64 bits of the 128-bit product of the operands are placed into register RT.

If OE=1 then OV is set to 1 if the product cannot be represented in 64 bits.

Both operands and the product are interpreted as signed integers.

Special Registers Altered:
    CR0      (if Rc=1)
    SO OV    (if OE=1)

Programming Note
The XO-form Multiply instructions may execute faster on some implementations if RB contains the operand having the smaller absolute value.

Multiply High Doubleword XO-form

mulhd   RT,RA,RB    (Rc=0)
mulhd.  RT,RA,RB    (Rc=1)

    31    RT    RA    RB    /     73    Rc
    0     6     11    16    21    22    31

    prod_{0:127} ← (RA) × (RB)
    RT ← prod_{0:63}

The 64-bit operands are (RA) and (RB). The high-order 64 bits of the 128-bit product of the operands are placed into register RT.

Both operands and the product are interpreted as signed integers.

Special Registers Altered:
    CR0      (if Rc=1)

Multiply High Doubleword Unsigned XO-form

mulhdu  RT,RA,RB    (Rc=0)
mulhdu. RT,RA,RB    (Rc=1)

    31    RT    RA    RB    /     9     Rc
    0     6     11    16    21    22    31

    prod_{0:127} ← (RA) × (RB)
    RT ← prod_{0:63}

The 64-bit operands are (RA) and (RB). The high-order 64 bits of the 128-bit product of the operands are placed into register RT.

Both operands and the product are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero.

Special Registers Altered:
    CR0      (if Rc=1)

Divide Doubleword XO-form

divd    RT,RA,RB    (OE=0 Rc=0)
divd.   RT,RA,RB    (OE=0 Rc=1)
divdo   RT,RA,RB    (OE=1 Rc=0)
divdo.  RT,RA,RB    (OE=1 Rc=1)

    31    RT    RA    RB    OE    489    Rc
    0     6     11    16    21    22     31

    dividend_{0:63} ← (RA)
    divisor_{0:63}  ← (RB)
    RT ← dividend ÷ divisor

The 64-bit dividend is (RA). The 64-bit divisor is (RB). The 64-bit quotient of the dividend and divisor is placed into register RT. The remainder is not supplied as a result.

Both operands and the quotient are interpreted as signed integers. The quotient is the unique signed integer that satisfies

    dividend = (quotient × divisor) + r

where 0 ≤ r < |divisor| if the dividend is nonnegative, and -|divisor| < r ≤ 0 if the dividend is negative.

If an attempt is made to perform any of the divisions

    0x8000_0000_0000_0000 ÷ -1
    <anything> ÷ 0

then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In these cases, if OE=1 then OV is set to 1.

Special Registers Altered:
    CR0      (if Rc=1)
    SO OV    (if OE=1)

Programming Note
The 64-bit signed remainder of dividing (RA) by (RB) can be computed as follows, except in the case that (RA) = -2⁶³ and (RB) = -1.

    divd   RT,RA,RB    # RT = quotient
    mulld  RT,RT,RB    # RT = quotient×divisor
    subf   RT,RT,RA    # RT = remainder

Divide Doubleword Unsigned XO-form

divdu   RT,RA,RB    (OE=0 Rc=0)
divdu.  RT,RA,RB    (OE=0 Rc=1)
divduo  RT,RA,RB    (OE=1 Rc=0)
divduo. RT,RA,RB    (OE=1 Rc=1)

    31    RT    RA    RB    OE    457    Rc
    0     6     11    16    21    22     31

    dividend_{0:63} ← (RA)
    divisor_{0:63}  ← (RB)
    RT ← dividend ÷ divisor

The 64-bit dividend is (RA). The 64-bit divisor is (RB). The 64-bit quotient of the dividend and divisor is placed into register RT. The remainder is not supplied as a result.

Both operands and the quotient are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero. The quotient is the unique unsigned integer that satisfies

    dividend = (quotient × divisor) + r

where 0 ≤ r < divisor.

If an attempt is made to perform the division

    <anything> ÷ 0

then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In this case, if OE=1 then OV is set to 1.

Special Registers Altered:
    CR0      (if Rc=1)
    SO OV    (if OE=1)

Programming Note
The 64-bit unsigned remainder of dividing (RA) by (RB) can be computed as follows.

    divdu  RT,RA,RB    # RT = quotient
    mulld  RT,RT,RB    # RT = quotient×divisor
    subf   RT,RT,RA    # RT = remainder

3.3.9 Fixed-Point Compare Instructions

The fixed-point Compare instructions compare the contents of register RA with (1) the sign-extended value of the SI field, (2) the zero-extended value of the UI field, or (3) the contents of register RB. The comparison is signed for cmpi and cmp, and unsigned for cmpli and cmpl.

The L field controls whether the operands are treated as 64-bit or 32-bit quantities, as follows:

    L    Operand length
    0    32-bit operands
    1    64-bit operands

L=1 is part of Category: 64-Bit.

When the operands are treated as 32-bit signed quantities, bit 32 of the register (RA or RB) is the sign bit.

The Compare instructions set one bit in the leftmost three bits of the designated CR field to 1, and the other two to 0. XER_SO is copied to bit 3 of the designated CR field.

The CR field is set as follows.

    Bit    Name    Description
    0      LT      (RA) < SI or (RB) (signed comparison)
                   (RA) <u UI or (RB) (unsigned comparison)
    1      GT      (RA) > SI or (RB) (signed comparison)
                   (RA) >u UI or (RB) (unsigned comparison)
    2      EQ      (RA) = SI, UI, or (RB)
    3      SO      Summary Overflow from the XER

Extended mnemonics for compares

A set of extended mnemonics is provided so that compares can be coded with the operand length as part of the mnemonic rather than as a numeric operand. Some of these are shown as examples with the Compare instructions. See Appendix D for additional extended mnemonics.

Compare Immediate D-form

cmpi    BF,L,RA,SI

    11    BF    /    L     RA    SI
    0     6     9    10    11    16    31

    if L = 0 then a ← EXTS((RA)_{32:63})
             else a ← (RA)
    if      a < EXTS(SI) then c ← 0b100
    else if a > EXTS(SI) then c ← 0b010
    else                      c ← 0b001
    CR_{4×BF+32:4×BF+35} ← c || XER_SO

The contents of register RA ((RA)_{32:63} sign-extended to 64 bits if L=0) are compared with the sign-extended value of the SI field, treating the operands as signed integers. The result of the comparison is placed into CR field BF.

Compare X-form

cmp     BF,L,RA,RB

    31    BF    /    L     RA    RB    0     /
    0     6     9    10    11    16    21    31

    if L = 0 then a ← EXTS((RA)_{32:63})
                  b ← EXTS((RB)_{32:63})
             else a ← (RA)
                  b ← (RB)
    if      a < b then c ← 0b100
    else if a > b then c ← 0b010
    else               c ← 0b001
    CR_{4×BF+32:4×BF+35} ← c || XER_SO

The contents of register RA ((RA)_{32:63} if L=0) are compared with the contents of register RB ((RB)_{32:63} if L=0), treating the operands as signed integers. The result of the comparison is placed into CR field BF.
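The way the L field and the signed or unsigned interpretation combine to produce the 4-bit CR field can be sketched in Python as follows. This is an illustrative model only: compare stands in for cmp (and for cmpi once SI has been sign-extended), compare_logical for cmpl (and cmpli with a zero-extended UI), and the returned integer holds LT, GT, EQ, SO from most to least significant bit. The function names and the so parameter (the current XER_SO value) are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def exts(value, bits):
    """Sign-extend the low-order `bits` of value to a Python integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def cr_field(a, b, so):
    """Build c || XER_SO: exactly one of LT, GT, EQ is set, then SO is appended."""
    if a < b:
        c = 0b100
    elif a > b:
        c = 0b010
    else:
        c = 0b001
    return (c << 1) | (so & 1)

def compare(ra, rb, l, so=0):
    """Signed comparison; L=0 uses bits 32:63 only, L=1 the whole doubleword."""
    bits = 64 if l else 32
    return cr_field(exts(ra, bits), exts(rb, bits), so)

def compare_logical(ra, rb, l, so=0):
    """Unsigned comparison; L=0 zero-extends bits 32:63, L=1 uses all 64 bits."""
    mask = MASK64 if l else MASK32
    return cr_field(ra & mask, rb & mask, so)
```

For example, with RA = 0xFFFF_FFFF and RB = 1 the 32-bit signed compare reports LT, while the 64-bit signed compare and the 32-bit logical compare both report GT.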
Special Registers Altered (Compare Immediate and Compare):
    CR field BF

Extended Mnemonics:

Examples of extended mnemonics for Compare Immediate:

    Extended:             Equivalent to:
    cmpdi  Rx,value       cmpi  0,1,Rx,value
    cmpwi  cr3,Rx,value   cmpi  3,0,Rx,value

Examples of extended mnemonics for Compare:

    Extended:             Equivalent to:
    cmpd   Rx,Ry          cmp   0,1,Rx,Ry
    cmpw   cr3,Rx,Ry      cmp   3,0,Rx,Ry

Compare Logical Immediate D-form

cmpli   BF,L,RA,UI

    10    BF    /    L     RA    UI
    0     6     9    10    11    16    31

    if L = 0 then a ← ³²0 || (RA)_{32:63}
             else a ← (RA)
    if      a <u (⁴⁸0 || UI) then c ← 0b100
    else if a >u (⁴⁸0 || UI) then c ← 0b010
    else                         c ← 0b001
    CR_{4×BF+32:4×BF+35} ← c || XER_SO

The contents of register RA ((RA)_{32:63} zero-extended to 64 bits if L=0) are compared with ⁴⁸0 || UI, treating the operands as unsigned integers. The result of the comparison is placed into CR field BF.

Special Registers Altered:
    CR field BF

Extended Mnemonics:

Examples of extended mnemonics for Compare Logical Immediate:

    Extended:             Equivalent to:
    cmpldi  Rx,value      cmpli  0,1,Rx,value
    cmplwi  cr3,Rx,value  cmpli  3,0,Rx,value

Compare Logical X-form

cmpl    BF,L,RA,RB

    31    BF    /    L     RA    RB    32    /
    0     6     9    10    11    16    21    31

    if L = 0 then a ← ³²0 || (RA)_{32:63}
                  b ← ³²0 || (RB)_{32:63}
             else a ← (RA)
                  b ← (RB)
    if      a <u b then c ← 0b100
    else if a >u b then c ← 0b010
    else                c ← 0b001
    CR_{4×BF+32:4×BF+35} ← c || XER_SO

The contents of register RA ((RA)_{32:63} if L=0) are compared with the contents of register RB ((RB)_{32:63} if L=0), treating the operands as unsigned integers. The result of the comparison is placed into CR field BF.

Special Registers Altered:
    CR field BF

Extended Mnemonics:

Examples of extended mnemonics for Compare Logical:

    Extended:             Equivalent to:
    cmpld  Rx,Ry          cmpl  0,1,Rx,Ry
    cmplw  cr3,Rx,Ry      cmpl  3,0,Rx,Ry

3.3.10 Fixed-Point Trap Instructions

The Trap instructions are provided to test for a specified set of conditions. If any of the conditions tested by a Trap instruction are met, the system trap handler is invoked. If none of the tested conditions are met, instruction execution continues normally.

The contents of register RA are compared with either the sign-extended value of the SI field or the contents of register RB, depending on the Trap instruction. For tdi and td, the entire contents of RA (and RB) participate in the comparison; for twi and tw, only the contents of the low-order 32 bits of RA (and RB) participate in the comparison.

This comparison results in five conditions which are ANDed with TO. If the result is not 0 the system trap handler is invoked. These conditions are as follows.

    TO Bit    ANDed with Condition
    0         Less Than, using signed comparison
    1         Greater Than, using signed comparison
    2         Equal
    3         Less Than, using unsigned comparison
    4         Greater Than, using unsigned comparison

Extended mnemonics for traps

A set of extended mnemonics is provided so that traps can be coded with the condition as part of the mnemonic rather than as a numeric operand. Some of these are shown as examples with the Trap instructions. See Appendix D for additional extended mnemonics.

Trap Word Immediate D-form

twi     TO,RA,SI

    3     TO    RA    SI
    0     6     11    16    31

    a ← EXTS((RA)_{32:63})
    if (a <  EXTS(SI)) & TO_0 then TRAP
    if (a >  EXTS(SI)) & TO_1 then TRAP
    if (a =  EXTS(SI)) & TO_2 then TRAP
    if (a <u EXTS(SI)) & TO_3 then TRAP
    if (a >u EXTS(SI)) & TO_4 then TRAP

The contents of RA_{32:63} are compared with the sign-extended value of the SI field. If any bit in the TO field is set to 1 and its corresponding condition is met by the result of the comparison, the system trap handler is invoked.

If the trap conditions are met, this instruction is context synchronizing (see Book III).

Special Registers Altered:
    None

Extended Mnemonics:

Examples of extended mnemonics for Trap Word Immediate:

    Extended:             Equivalent to:
    twgti   Rx,value      twi  8,Rx,value
    twllei  Rx,value      twi  6,Rx,value

Trap Word X-form

tw      TO,RA,RB

    31    TO    RA    RB    4     /
    0     6     11    16    21    31

    a ← EXTS((RA)_{32:63})
    b ← EXTS((RB)_{32:63})
    if (a <  b) & TO_0 then TRAP
    if (a >  b) & TO_1 then TRAP
    if (a =  b) & TO_2 then TRAP
    if (a <u b) & TO_3 then TRAP
    if (a >u b) & TO_4 then TRAP

The contents of RA_{32:63} are compared with the contents of RB_{32:63}. If any bit in the TO field is set to 1 and its corresponding condition is met by the result of the comparison, the system trap handler is invoked.

If the trap conditions are met, this instruction is context synchronizing (see Book III).

Special Registers Altered:
    None

Extended Mnemonics:

Examples of extended mnemonics for Trap Word:

    Extended:             Equivalent to:
    tweq   Rx,Ry          tw  4,Rx,Ry
    twlge  Rx,Ry          tw  5,Rx,Ry
    trap                  tw  31,0,0

3.3.10.1 64-bit Fixed-Point Trap Instructions [Category: 64-Bit]

Trap Doubleword Immediate D-form

tdi     TO,RA,SI

    2     TO    RA    SI
    0     6     11    16    31

    a ← (RA)
    b ← EXTS(SI)
    if (a <  b) & TO_0 then TRAP
    if (a >  b) & TO_1 then TRAP
    if (a =  b) & TO_2 then TRAP
    if (a <u b) & TO_3 then TRAP
    if (a >u b) & TO_4 then TRAP

The contents of register RA are compared with the sign-extended value of the SI field. If any bit in the TO field is set to 1 and its corresponding condition is met by the result of the comparison, the system trap handler is invoked.

If the trap conditions are met, this instruction is context synchronizing (see Book III).

Trap Doubleword X-form

td      TO,RA,RB

    31    TO    RA    RB    68    /
    0     6     11    16    21    31

    a ← (RA)
    b ← (RB)
    if (a <  b) & TO_0 then TRAP
    if (a >  b) & TO_1 then TRAP
    if (a =  b) & TO_2 then TRAP
    if (a <u b) & TO_3 then TRAP
    if (a >u b) & TO_4 then TRAP

The contents of register RA are compared with the contents of register RB. If any bit in the TO field is set to 1 and its corresponding condition is met by the result of the comparison, the system trap handler is invoked.

If the trap conditions are met, this instruction is context synchronizing (see Book III).
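The TO field acts as a 5-bit selector over the five comparison outcomes, with TO_0 as the most significant bit. The Python sketch below models that test. It is a simplification under stated assumptions: trap_taken is a hypothetical helper, the second operand is passed already sign-extended (for the immediate forms), and bits selects 32 for twi/tw or 64 for tdi/td. It returns whether the trap handler would be invoked and does not model the handler itself.

```python
def to_signed(value, bits):
    """Interpret the low-order `bits` of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def trap_conditions(a, b, bits):
    """Return the five conditions as a 5-bit value aligned with TO_0..TO_4."""
    mask = (1 << bits) - 1
    ua, ub = a & mask, b & mask
    sa, sb = to_signed(ua, bits), to_signed(ub, bits)
    return (((sa < sb) << 4)     # TO_0: less than, signed
            | ((sa > sb) << 3)   # TO_1: greater than, signed
            | ((ua == ub) << 2)  # TO_2: equal
            | ((ua < ub) << 1)   # TO_3: less than, unsigned
            | int(ua > ub))      # TO_4: greater than, unsigned

def trap_taken(to, a, b, bits=64):
    """True if any condition selected by the TO field is met."""
    return (to & trap_conditions(a, b, bits)) != 0
```

The TO values used by the extended mnemonics in this section follow directly: 4 selects Equal (tweq), 5 selects Equal or unsigned Greater Than (twlge), and 31 selects every condition, so tw 31,0,0 always traps.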
Special Registers Altered (Trap Doubleword Immediate and Trap Doubleword):
    None

Extended Mnemonics:

Examples of extended mnemonics for Trap Doubleword Immediate:

    Extended:             Equivalent to:
    tdlti  Rx,value       tdi  16,Rx,value
    tdnei  Rx,value       tdi  24,Rx,value

Examples of extended mnemonics for Trap Doubleword:

    Extended:             Equivalent to:
    tdge   Rx,Ry          td  12,Rx,Ry
    tdlnl  Rx,Ry          td  5,Rx,Ry

3.3.11 Fixed-Point Select [Category: Phased-In (sV2.06)]

Integer Select A-form

isel    RT,RA,RB,BC

    31    RT    RA    RB    BC    15    /
    0     6     11    16    21    26    31

    if RA=0 then a ← 0 else a ← (RA)
    if CR_{BC+32}=1 then RT ← a
    else                 RT ← (RB)

If the contents of bit BC+32 of the Condition Register are equal to 1, then the contents of register RA (or 0) are placed into register RT. Otherwise, the contents of register RB are placed into register RT.

Special Registers Altered:
    None

3.3.12 Fixed-Point Logical Instructions

The Logical instructions perform bit-parallel operations on 64-bit operands.

The X-form Logical instructions with Rc=1, and the D-form Logical instructions andi. and andis., set the first three bits of CR Field 0 as described in Section 3.3.7, "Other Fixed-Point Instructions" on page 61. The Logical instructions do not change the SO, OV, and CA bits in the XER.

Extended mnemonics for logical operations

Extended mnemonics are provided that generate two different types of "no-ops" (instructions that do nothing). The first type is the preferred form, which is optimized to minimize its use of the processor's execution resources. This form is based on the OR Immediate instruction. The second type is the executed form, which is intended to consume the same amount of the processor's execution resources as if it were not a no-op. This form is based on the XOR Immediate instruction. (There are also no-ops which affect program priority, for which extended mnemonics have not been assigned.)

Extended mnemonics are provided that use the OR and NOR instructions to copy the contents of one register to another, with and without complementing. These are shown as examples with the two instructions.

See Appendix D, "Assembler Extended Mnemonics" on page 383 for additional extended mnemonics.

AND Immediate D-form

andi.   RA,RS,UI

    28    RS    RA    UI
    0     6     11    16    31

    RA ← (RS) & (⁴⁸0 || UI)

The contents of register RS are ANDed with ⁴⁸0 || UI and the result is placed into register RA.

Special Registers Altered:
    CR0

AND Immediate Shifted D-form

andis.  RA,RS,UI

    29    RS    RA    UI
    0     6     11    16    31

    RA ← (RS) & (³²0 || UI || ¹⁶0)

The contents of register RS are ANDed with ³²0 || UI || ¹⁶0 and the result is placed into register RA.

Special Registers Altered:
    CR0

OR Immediate D-form

ori     RA,RS,UI

    24    RS    RA    UI
    0     6     11    16    31

    RA ← (RS) | (⁴⁸0 || UI)

The contents of register RS are ORed with ⁴⁸0 || UI and the result is placed into register RA.

The preferred "no-op" (an instruction that does nothing) is:

    ori 0,0,0

Special Registers Altered:
    None

Extended Mnemonics:

Example of extended mnemonics for OR Immediate:

    Extended:             Equivalent to:
    no-op                 ori  0,0,0

OR Immediate Shifted D-form

oris    RA,RS,UI

    25    RS    RA    UI
    0     6     11    16    31

    RA ← (RS) | (³²0 || UI || ¹⁶0)

The contents of register RS are ORed with ³²0 || UI || ¹⁶0 and the result is placed into register RA.

Special Registers Altered:
    None

XOR Immediate D-form

xori    RA,RS,UI

    26    RS    RA    UI
    0     6     11    16    31

    RA ← (RS) XOR (⁴⁸0 || UI)

The contents of register RS are XORed with ⁴⁸0 || UI and the result is placed into register RA.

The executed form of a "no-op" (an instruction that does nothing, but consumes execution resources nevertheless) is:

    xori 0,0,0

Special Registers Altered:
    None

Extended Mnemonics:

Example of extended mnemonics for XOR Immediate:

    Extended:             Equivalent to:
    xnop                  xori  0,0,0

Programming Note
The executed form of no-op should be used only when the intent is to alter the timing of a program.

XOR Immediate Shifted D-form

xoris   RA,RS,UI

    27    RS    RA    UI
    0     6     11    16    31

    RA ← (RS) XOR (³²0 || UI || ¹⁶0)

The contents of register RS are XORed with ³²0 || UI || ¹⁶0 and the result is placed into register RA.

Special Registers Altered:
    None

AND X-form

and     RA,RS,RB    (Rc=0)
and.    RA,RS,RB    (Rc=1)

    31    RS    RA    RB    28    Rc
    0     6     11    16    21    31

    RA ← (RS) & (RB)

The contents of register RS are ANDed with the contents of register RB and the result is placed into register RA.

Special Registers Altered:
    CR0      (if Rc=1)

OR X-form

or      RA,RS,RB    (Rc=0)
or.     RA,RS,RB    (Rc=1)

    31    RS    RA    RB    444    Rc
    0     6     11    16    21     31

    RA ← (RS) | (RB)

The contents of register RS are ORed with the contents of register RB and the result is placed into register RA.

For implementations that support the PPR (see Section 3.2.3), or Rx,Rx,Rx can be used to set PPR_PRI as shown in Figure 45. or. Rx,Rx,Rx does not set PPR_PRI.

    Rx    PPR_PRI    Priority
    1     010        low
    6     011        medium low
    2     100        medium (normal)

    Figure 45. Priority levels for or Rx,Rx,Rx

Special Registers Altered:
    CR0      (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for OR:

    Extended:             Equivalent to:
    mr  Rx,Ry             or  Rx,Ry,Ry

Programming Note
Warning: Other forms of or Rx,Rx,Rx that are not described in Figure 45 may also cause program priority to change. Use of these forms should be avoided except when software explicitly intends to alter program priority. If a no-op is needed, the preferred no-op (ori 0,0,0) should be used.

XOR X-form

xor     RA,RS,RB    (Rc=0)
xor.    RA,RS,RB    (Rc=1)

    31    RS    RA    RB    316    Rc
    0     6     11    16    21     31

    RA ← (RS) ⊕ (RB)

The contents of register RS are XORed with the contents of register RB and the result is placed into register RA.

Special Registers Altered:
    CR0      (if Rc=1)

NAND X-form

nand    RA,RS,RB    (Rc=0)
nand.   RA,RS,RB    (Rc=1)
    31    RS    RA    RB    476    Rc
    0     6     11    16    21     31

    RA ← ¬((RS) & (RB))

The contents of register RS are ANDed with the contents of register RB and the complemented result is placed into register RA.

Special Registers Altered:
    CR0      (if Rc=1)

Programming Note
nand or nor with RS=RB can be used to obtain the one's complement.

NOR X-form

nor     RA,RS,RB    (Rc=0)
nor.    RA,RS,RB    (Rc=1)

    31    RS    RA    RB    124    Rc
    0     6     11    16    21     31

    RA ← ¬((RS) | (RB))

The contents of register RS are ORed with the contents of register RB and the complemented result is placed into register RA.

Special Registers Altered:
    CR0      (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for NOR:

    Extended:             Equivalent to:
    not  Rx,Ry            nor  Rx,Ry,Ry

Equivalent X-form

eqv     RA,RS,RB    (Rc=0)
eqv.    RA,RS,RB    (Rc=1)

    31    RS    RA    RB    284    Rc
    0     6     11    16    21     31

    RA ← (RS) ≡ (RB)

The contents of register RS are XORed with the contents of register RB and the complemented result is placed into register RA.

Special Registers Altered:
    CR0      (if Rc=1)

AND with Complement X-form

andc    RA,RS,RB    (Rc=0)
andc.   RA,RS,RB    (Rc=1)

    31    RS    RA    RB    60    Rc
    0     6     11    16    21    31

    RA ← (RS) & ¬(RB)

The contents of register RS are ANDed with the complement of the contents of register RB and the result is placed into register RA.

Special Registers Altered:
    CR0      (if Rc=1)

OR with Complement X-form

orc     RA,RS,RB    (Rc=0)
orc.    RA,RS,RB    (Rc=1)

    31    RS    RA    RB    412    Rc
    0     6     11    16    21     31

    RA ← (RS) | ¬(RB)

The contents of register RS are ORed with the complement of the contents of register RB and the result is placed into register RA.

Special Registers Altered:
    CR0      (if Rc=1)

Extend Sign Byte X-form

extsb   RA,RS    (Rc=0)
extsb.  RA,RS    (Rc=1)

    31    RS    RA    ///    954    Rc
    0     6     11    16     21     31

    s ← (RS)_56
    RA_{56:63} ← (RS)_{56:63}
    RA_{0:55}  ← ⁵⁶s

(RS)_{56:63} are placed into RA_{56:63}. RA_{0:55} are filled with a copy of (RS)_56.

Special Registers Altered:
    CR0      (if Rc=1)

Extend Sign Halfword X-form

extsh   RA,RS    (Rc=0)
extsh.  RA,RS    (Rc=1)

    31    RS    RA    ///    922    Rc
    0     6     11    16     21     31

    s ← (RS)_48
    RA_{48:63} ← (RS)_{48:63}
    RA_{0:47}  ← ⁴⁸s

(RS)_{48:63} are placed into RA_{48:63}. RA_{0:47} are filled with a copy of (RS)_48.

Special Registers Altered:
    CR0      (if Rc=1)

Count Leading Zeros Word X-form

cntlzw  RA,RS    (Rc=0)
cntlzw. RA,RS    (Rc=1)

    31    RS    RA    ///    26    Rc
    0     6     11    16     21    31

    n ← 32
    do while n < 64
       if (RS)_n = 1 then leave
       n ← n + 1
    RA ← n - 32

A count of the number of consecutive zero bits starting at bit 32 of register RS is placed into register RA. This number ranges from 0 to 32, inclusive.

If Rc=1, CR Field 0 is set to reflect the result.

Special Registers Altered:
    CR0      (if Rc=1)

Programming Note
For both Count Leading Zeros instructions, if Rc=1 then LT is set to 0 in CR Field 0.

Compare Bytes X-form

cmpb    RA,RS,RB

    31    RS    RA    RB    508    /
    0     6     11    16    21     31

    do n = 0 to 7
       if (RS)_{8×n:8×n+7} = (RB)_{8×n:8×n+7} then
          RA_{8×n:8×n+7} ← ⁸1
       else
          RA_{8×n:8×n+7} ← ⁸0

Each byte of the contents of register RS is compared to each corresponding byte of the contents in register RB. If they are equal, the corresponding byte in RA is set to 0xFF. Otherwise the corresponding byte in RA is set to 0x00.

Special Registers Altered:
    None

Parity Doubleword X-form [Category: 64-bit]

prtyd   RA,RS

    31    RS    RA    ///    186    /
    0     6     11    16     21     31

    s ← 0
    do i = 0 to 7
       s ← s ⊕ (RS)_{i×8+7}
    RA ← ⁶³0 || s

The least significant bit in each byte of the contents of register RS is examined. If there is an odd number of one bits the value 1 is placed into register RA; otherwise the value 0 is placed into register RA.

Special Registers Altered:
    None

Parity Word X-form

prtyw   RA,RS

    31    RS    RA    ///    154    /
    0     6     11    16     21     31

    s ← 0
    t ← 0
    do i = 0 to 3
       s ← s ⊕ (RS)_{i×8+7}
    do i = 4 to 7
       t ← t ⊕ (RS)_{i×8+7}
    RA_{0:31}  ← ³¹0 || s
    RA_{32:63} ← ³¹0 || t

The least significant bit in each byte of (RS)_{0:31} is examined. If there is an odd number of one bits the value 1 is placed into RA_{0:31}; otherwise the value 0 is placed into RA_{0:31}. The least significant bit in each byte of (RS)_{32:63} is examined. If there is an odd number of one bits the value 1 is placed into RA_{32:63}; otherwise the value 0 is placed into RA_{32:63}.

Special Registers Altered:
    None

Programming Note
The Parity instructions are designed to be used in conjunction with the Population Count instruction to compute the parity of words or a doubleword. The parity of the upper and lower words in (RS) can be computed as follows.

    popcntb RA, RS
    prtyw   RA, RA

The parity of (RS) can be computed as follows.

    popcntb RA, RS
    prtyd   RA, RA

3.3.12.1 64-bit Fixed-Point Logical Instructions [Category: 64-Bit]

Extend Sign Word X-form

extsw   RA,RS    (Rc=0)
extsw.  RA,RS    (Rc=1)

    31    RS    RA    ///    986    Rc
    0     6     11    16     21     31

    s ← (RS)_32
    RA_{32:63} ← (RS)_{32:63}
    RA_{0:31}  ← ³²s

(RS)_{32:63} are placed into RA_{32:63}. RA_{0:31} are filled with a copy of (RS)_32.

Special Registers Altered:
    CR0      (if Rc=1)

Count Leading Zeros Doubleword X-form

cntlzd  RA,RS    (Rc=0)
cntlzd. RA,RS    (Rc=1)

    31    RS    RA    ///    58    Rc
    0     6     11    16     21    31

    n ← 0
    do while n < 64
       if (RS)_n = 1 then leave
       n ← n + 1
    RA ← n

A count of the number of consecutive zero bits starting at bit 0 of register RS is placed into register RA. This number ranges from 0 to 64, inclusive.

If Rc=1, CR Field 0 is set to reflect the result.

3.3.12.2 Phased-In Fixed-Point Logical Instructions [Category: Phased-In (sV2.05)]

Population Count Bytes X-form

popcntb RA, RS

    31    RS    RA    ///    122    /
    0     6     11    16     21     31

    do i = 0 to 7
       n ← 0
       do j = 0 to 7
          if (RS)_{(i×8)+j} = 1 then
             n ← n+1
       RA_{(i×8):(i×8)+7} ← n

A count of the number of one bits in each byte of register RS is placed into the corresponding byte of register RA. This number ranges from 0 to 8, inclusive.

Special Registers Altered:
    None

Programming Note
The total number of one bits in register RS can be computed as follows. In this example it is assumed that register RB contains the value 0x0101_0101_0101_0101.

    popcntb RA,RS
    mulld   RT,RA,RB
    srdi    RT,RT,56    # RT = population count
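The two Programming Notes on popcntb rely on simple byte arithmetic: multiplying the per-byte counts by 0x0101_0101_0101_0101 accumulates their sum in the most significant byte, and XORing the least significant bit of each count gives the overall parity. The Python sketch below models these sequences on 64-bit values. The function names follow the mnemonics for readability; population_count and its in-line constant standing in for register RB are illustrative assumptions.

```python
MASK64 = (1 << 64) - 1

def popcntb(rs):
    """Place the number of one bits of each byte into the corresponding byte."""
    result = 0
    for shift in range(0, 64, 8):
        byte = (rs >> shift) & 0xFF
        result |= bin(byte).count("1") << shift
    return result

def prtyd(rs):
    """XOR of the least significant bit of all eight bytes."""
    parity = 0
    for shift in range(0, 64, 8):
        parity ^= (rs >> shift) & 1
    return parity

def prtyw(rs):
    """Per-word parity: one 0/1 result in each 32-bit half of the value."""
    high = low = 0
    for shift in range(0, 32, 8):
        low ^= (rs >> shift) & 1
        high ^= (rs >> (shift + 32)) & 1
    return (high << 32) | low

def population_count(rs):
    """Model of the popcntb / mulld / srdi sequence."""
    rb = 0x0101_0101_0101_0101           # assumed contents of register RB
    rt = (popcntb(rs) * rb) & MASK64     # mulld keeps the low-order 64 bits
    return rt >> 56                      # srdi RT,RT,56
```

The multiply cannot overflow the top byte because the largest possible sum of the eight byte counts is 64.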
Special Registers Altered: CR0 (if Rc=1) Chapter 3. Fixed-Point Processor 81 Version 2.05 3.3.13 Fixed-Point Rotate and Shift Instructions The Fixed-Point Processor performs rotation opera- There is no way to specify an all-zero mask. tions on data from a GPR and returns the result, or a For instructions that use the rotate32 operation, the portion of the result, to a GPR. mask start and stop positions are always in the The rotation operations rotate a 64-bit quantity left by a low-order 32 bits of the mask. specified number of bit positions. Bits that exit from The use of the mask is described in following sections. position 0 enter at position 63. The Rotate and Shift instructions with Rc=1 set the first Two types of rotation operation are supported. three bits of CR field 0 as described in Section 3.3.7, For the first type, denoted rotate64 or ROTL64, the value "Other Fixed-Point Instructions" on page 61. Rotate rotated is the given 64-bit value. The rotate64 operation and Shift instructions do not change the OV and SO is used to rotate a given 64-bit quantity. bits. Rotate and Shift instructions, except algebraic right shifts, do not change the CA bit. For the second type, denoted rotate32 or ROTL32, the value rotated consists of two copies of bits 32:63 of the given 64-bit value, one copy in bits 0:31 and the other Extended mnemonics for rotates and in bits 32:63. The rotate32 operation is used to rotate a shifts given 32-bit quantity. The Rotate and Shift instructions, while powerful, can The Rotate and Shift instructions employ a mask gen- be complicated to code (they have up to five operands). erator. The mask is 64 bits long, and consists of 1-bits A set of extended mnemonics is provided that allow from a start bit, mstart, through and including a stop bit, simpler coding of often-used functions such as clearing mstop, and 0-bits elsewhere. The values of mstart and the leftmost or rightmost bits of a register, left justifying mstop range from 0 to 63. 
If mstart > mstop, the 1-bits or right justifying an arbitrary field, and performing sim- wrap around from position 63 to position 0. Thus the ple rotates and shifts. Some of these are shown as mask is formed as follows: examples with the Rotate instructions. See Appendix D, "Assembler Extended Mnemonics" on if mstart mstop then page 383 for additional extended mnemonics. maskmstart:mstop = ones maskall other bits = zeros else maskmstart:63 = ones mask0:mstop = ones maskall other bits = zeros 3.3.13.1 Fixed-Point Rotate Instructions These instructions rotate the contents of a register. Rotate Left Word Immediate then AND The result of the rotation is with Mask M-form 1 inserted into the target register under control of a rlwinm RA,RS,SH,MB,ME (Rc=0) mask (if a mask bit is 1 the associated bit of the rlwinm. RA,RS,SH,MB,ME (Rc=1) rotated data is placed into the target register, and if the mask bit is 0 the associated bit in the target register remains unchanged); or 21 RS RA SH MB ME Rc 0 6 11 16 21 26 31 1 ANDed with a mask before being placed into the target register. n 1 SH The Rotate Left instructions allow right-rotation of the r 1 ROTL32((RS)32:63, n) contents of a register to be performed (in concept) by a m 1 MASK(MB+32, ME+32) left-rotation of 64-n, where n is the number of bits by RA 1 r & m which to rotate right. They allow right-rotation of the The contents of register RS are rotated32 left SH bits. contents of the low-order 32 bits of a register to be per- A mask is generated having 1-bits from bit MB+32 formed (in concept) by a left-rotation of 32-n, where n through bit ME+32 and 0-bits elsewhere. The rotated is the number of bits by which to rotate right. data are ANDed with the generated mask and the result is placed into register RA. 
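The rotate32 operation, the mask generator, and the rlwinm RTL described above can be modeled in a few lines of Python. The following is an illustrative sketch only, not part of the architecture: the helper names (rotl64, rotl32, mask, rlwinm) are chosen here for illustration, registers are modeled as unsigned 64-bit integers, bit 0 is the most significant bit as in the text, and the Rc=1 update of CR Field 0 is not modeled.

```python
MASK64 = (1 << 64) - 1  # all 64 bits set


def rotl64(x, n):
    """ROTL64: rotate a 64-bit value left by n bit positions."""
    n &= 63
    return ((x << n) | (x >> (64 - n))) & MASK64


def rotl32(x, n):
    """ROTL32: copy bits 32:63 into both halves, then rotate64 left."""
    lo = x & 0xFFFFFFFF
    return rotl64((lo << 32) | lo, n)


def _ones_from(bit):
    """1-bits in positions bit..63 (bit 0 is the most significant bit)."""
    return MASK64 >> bit


def mask(mstart, mstop):
    """MASK(mstart, mstop), including the wrap-around case mstart > mstop."""
    upto_stop = MASK64 & ~_ones_from(mstop + 1)   # 1-bits in positions 0..mstop
    if mstart <= mstop:
        return _ones_from(mstart) & upto_stop
    return _ones_from(mstart) | upto_stop


def rlwinm(rs, sh, mb, me):
    """RA <- ROTL32((RS)32:63, SH) & MASK(MB+32, ME+32)."""
    return rotl32(rs, sh) & mask(mb + 32, me + 32)


if __name__ == "__main__":
    # A right-rotation by 8 performed as a left-rotation by 32-8.
    print(hex(rlwinm(0x12345678, 32 - 8, 0, 31)))   # 0x78123456
    # Same rotation, but masking off the 8 high-order bits of the word
    # turns it into a logical right shift by 8.
    print(hex(rlwinm(0x12345678, 32 - 8, 8, 31)))   # 0x123456
    # A wrap-around mask (mstart > mstop).
    print(hex(mask(60, 3)))                         # 0xf00000000000000f
```

Because MB+32 and ME+32 always lie in the low-order 32 bits of the mask, the high-order 32 bits of the modeled result are zero whenever MB ≤ ME.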
Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Examples of extended mnemonics for Rotate Left Word Immediate then AND with Mask:

    Extended:             Equivalent to:
    extlwi  Rx,Ry,n,b     rlwinm  Rx,Ry,b,0,n-1
    srwi    Rx,Ry,n       rlwinm  Rx,Ry,32-n,n,31
    clrrwi  Rx,Ry,n       rlwinm  Rx,Ry,0,0,31-n

Programming Note
Let RSL represent the low-order 32 bits of register RS, with the bits numbered from 0 through 31.

rlwinm can be used to extract an n-bit field that starts at bit position b in RSL, right-justified into the low-order 32 bits of register RA (clearing the remaining 32-n bits of the low-order 32 bits of RA), by setting SH=b+n, MB=32-n, and ME=31. It can be used to extract an n-bit field that starts at bit position b in RSL, left-justified into the low-order 32 bits of register RA (clearing the remaining 32-n bits of the low-order 32 bits of RA), by setting SH=b, MB=0, and ME=n-1. It can be used to rotate the contents of the low-order 32 bits of a register left (right) by n bits, by setting SH=n (32-n), MB=0, and ME=31. It can be used to shift the contents of the low-order 32 bits of a register right by n bits, by setting SH=32-n, MB=n, and ME=31. It can be used to clear the high-order b bits of the low-order 32 bits of the contents of a register and then shift the result left by n bits, by setting SH=n, MB=b-n, and ME=31-n. It can be used to clear the low-order n bits of the low-order 32 bits of a register, by setting SH=0, MB=0, and ME=31-n.

For all the uses given above, the high-order 32 bits of register RA are cleared.

Extended mnemonics are provided for all of these uses; see Appendix D, "Assembler Extended Mnemonics" on page 383.

Rotate Left Word then AND with Mask M-form

rlwnm    RA,RS,RB,MB,ME    (Rc=0)
rlwnm.   RA,RS,RB,MB,ME    (Rc=1)

    23 | RS | RA | RB | MB | ME | Rc
    0    6    11   16   21   26   31

    n ← (RB)59:63
    r ← ROTL32((RS)32:63, n)
    m ← MASK(MB+32, ME+32)
    RA ← r & m

The contents of register RS are rotated32 left the number of bits specified by (RB)59:63. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for Rotate Left Word then AND with Mask:

    Extended:             Equivalent to:
    rotlw   Rx,Ry,Rz      rlwnm   Rx,Ry,Rz,0,31

Programming Note
Let RSL represent the low-order 32 bits of register RS, with the bits numbered from 0 through 31.

rlwnm can be used to extract an n-bit field that starts at variable bit position b in RSL, right-justified into the low-order 32 bits of register RA (clearing the remaining 32-n bits of the low-order 32 bits of RA), by setting RB59:63=b+n, MB=32-n, and ME=31. It can be used to extract an n-bit field that starts at variable bit position b in RSL, left-justified into the low-order 32 bits of register RA (clearing the remaining 32-n bits of the low-order 32 bits of RA), by setting RB59:63=b, MB=0, and ME=n-1. It can be used to rotate the contents of the low-order 32 bits of a register left (right) by variable n bits, by setting RB59:63=n (32-n), MB=0, and ME=31.

For all the uses given above, the high-order 32 bits of register RA are cleared.

Extended mnemonics are provided for some of these uses; see Appendix D, "Assembler Extended Mnemonics" on page 383.

Rotate Left Word Immediate then Mask Insert M-form

rlwimi   RA,RS,SH,MB,ME    (Rc=0)
rlwimi.  RA,RS,SH,MB,ME    (Rc=1)

    20 | RS | RA | SH | MB | ME | Rc
    0    6    11   16   21   26   31

    n ← SH
    r ← ROTL32((RS)32:63, n)
    m ← MASK(MB+32, ME+32)
    RA ← r&m | (RA)&¬m

The contents of register RS are rotated32 left SH bits. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere.
The rotated data are inserted into register RA under control of the generated mask.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for Rotate Left Word Immediate then Mask Insert:

    Extended:             Equivalent to:
    inslwi  Rx,Ry,n,b     rlwimi  Rx,Ry,32-b,b,b+n-1

Programming Note
Let RAL represent the low-order 32 bits of register RA, with the bits numbered from 0 through 31.

rlwimi can be used to insert an n-bit field that is left-justified in the low-order 32 bits of register RS, into RAL starting at bit position b, by setting SH=32-b, MB=b, and ME=(b+n)-1. It can be used to insert an n-bit field that is right-justified in the low-order 32 bits of register RS, into RAL starting at bit position b, by setting SH=32-(b+n), MB=b, and ME=(b+n)-1.

Extended mnemonics are provided for both of these uses; see Appendix D, "Assembler Extended Mnemonics" on page 383.

3.3.13.1.1 64-bit Fixed-Point Rotate Instructions [Category: 64-Bit]

Rotate Left Doubleword Immediate then Clear Left MD-form

rldicl   RA,RS,SH,MB    (Rc=0)
rldicl.  RA,RS,SH,MB    (Rc=1)

    30 | RS | RA | sh | mb | 0 | sh | Rc
    0    6    11   16   21   27  30   31

    n ← sh5 || sh0:4
    r ← ROTL64((RS), n)
    b ← mb5 || mb0:4
    m ← MASK(b, 63)
    RA ← r & m

The contents of register RS are rotated64 left SH bits. A mask is generated having 1-bits from bit MB through bit 63 and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Examples of extended mnemonics for Rotate Left Doubleword Immediate then Clear Left:

    Extended:             Equivalent to:
    extrdi  Rx,Ry,n,b     rldicl  Rx,Ry,b+n,64-n
    srdi    Rx,Ry,n       rldicl  Rx,Ry,64-n,n
    clrldi  Rx,Ry,n       rldicl  Rx,Ry,0,n

Programming Note
rldicl can be used to extract an n-bit field that starts at bit position b in register RS, right-justified into register RA (clearing the remaining 64-n bits of RA), by setting SH=b+n and MB=64-n. It can be used to rotate the contents of a register left (right) by n bits, by setting SH=n (64-n) and MB=0. It can be used to shift the contents of a register right by n bits, by setting SH=64-n and MB=n. It can be used to clear the high-order n bits of a register, by setting SH=0 and MB=n.

Extended mnemonics are provided for all of these uses; see Appendix D, "Assembler Extended Mnemonics" on page 383.

Rotate Left Doubleword Immediate then Clear Right MD-form

rldicr   RA,RS,SH,ME    (Rc=0)
rldicr.  RA,RS,SH,ME    (Rc=1)

    30 | RS | RA | sh | me | 1 | sh | Rc
    0    6    11   16   21   27  30   31

    n ← sh5 || sh0:4
    r ← ROTL64((RS), n)
    e ← me5 || me0:4
    m ← MASK(0, e)
    RA ← r & m

The contents of register RS are rotated64 left SH bits. A mask is generated having 1-bits from bit 0 through bit ME and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Examples of extended mnemonics for Rotate Left Doubleword Immediate then Clear Right:

    Extended:             Equivalent to:
    extldi  Rx,Ry,n,b     rldicr  Rx,Ry,b,n-1
    sldi    Rx,Ry,n       rldicr  Rx,Ry,n,63-n
    clrrdi  Rx,Ry,n       rldicr  Rx,Ry,0,63-n

Programming Note
rldicr can be used to extract an n-bit field that starts at bit position b in register RS, left-justified into register RA (clearing the remaining 64-n bits of RA), by setting SH=b and ME=n-1. It can be used to rotate the contents of a register left (right) by n bits, by setting SH=n (64-n) and ME=63. It can be used to shift the contents of a register left by n bits, by setting SH=n and ME=63-n. It can be used to clear the low-order n bits of a register, by setting SH=0 and ME=63-n.

Extended mnemonics are provided for all of these uses (some devolve to rldicl); see Appendix D, "Assembler Extended Mnemonics" on page 383.
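The equivalences listed for rldicl and rldicr can be checked against a small model. The sketch below is illustrative only: the Python function names mirror the mnemonics but are otherwise invented here, registers are modeled as unsigned 64-bit integers with bit 0 as the most significant bit, only the mstart ≤ mstop case of the mask generator is needed, and CR0 (Rc=1) is not modeled. The shift count is reduced modulo 64, which is an assumption standing in for the 6-bit sh field.

```python
MASK64 = (1 << 64) - 1  # all 64 bits set


def rotl64(x, n):
    """ROTL64: rotate a 64-bit value left by n bit positions."""
    n &= 63
    return ((x << n) | (x >> (64 - n))) & MASK64


def mask(mstart, mstop):
    """MASK(mstart, mstop) for mstart <= mstop, bit 0 = most significant bit."""
    from_start = MASK64 >> mstart                  # 1-bits in mstart..63
    upto_stop = MASK64 & ~(MASK64 >> (mstop + 1))  # 1-bits in 0..mstop
    return from_start & upto_stop


def rldicl(rs, sh, mb):
    """RA <- ROTL64((RS), SH) & MASK(MB, 63)."""
    return rotl64(rs, sh) & mask(mb, 63)


def rldicr(rs, sh, me):
    """RA <- ROTL64((RS), SH) & MASK(0, ME)."""
    return rotl64(rs, sh) & mask(0, me)


# Extended mnemonics, written exactly as in the equivalence tables.
def extrdi(x, n, b): return rldicl(x, b + n, 64 - n)   # extract, right-justify
def srdi(x, n):      return rldicl(x, 64 - n, n)       # shift right logical
def clrldi(x, n):    return rldicl(x, 0, n)            # clear high-order n bits
def extldi(x, n, b): return rldicr(x, b, n - 1)        # extract, left-justify
def sldi(x, n):      return rldicr(x, n, 63 - n)       # shift left
def clrrdi(x, n):    return rldicr(x, 0, 63 - n)       # clear low-order n bits


if __name__ == "__main__":
    x = 0x0123456789ABCDEF
    # The shift mnemonics agree with ordinary 64-bit shifts for 1 <= n <= 63.
    for n in range(1, 64):
        assert srdi(x, n) == x >> n
        assert sldi(x, n) == (x << n) & MASK64
    # A 16-bit field starting at bit position 8 (hex digits 2..5 of x).
    print(hex(extrdi(x, 16, 8)))   # 0x2345
    print(hex(extldi(x, 16, 8)))   # 0x2345000000000000
```

Modeling the mnemonics as thin wrappers keeps the operand arithmetic (b+n, 64-n, 63-n) visible, which is where hand-coded rotate operands most often go wrong.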
Rotate Left Doubleword Immediate then Clear MD-form

rldic    RA,RS,SH,MB    (Rc=0)
rldic.   RA,RS,SH,MB    (Rc=1)

    30 | RS | RA | sh | mb | 2 | sh | Rc
    0    6    11   16   21   27  30   31

    n ← sh5 || sh0:4
    r ← ROTL64((RS), n)
    b ← mb5 || mb0:4
    m ← MASK(b, ¬n)
    RA ← r & m

The contents of register RS are rotated64 left SH bits. A mask is generated having 1-bits from bit MB through bit 63-SH and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for Rotate Left Doubleword Immediate then Clear:

    Extended:              Equivalent to:
    clrlsldi Rx,Ry,b,n     rldic   Rx,Ry,n,b-n

Programming Note
rldic can be used to clear the high-order b bits of the contents of a register and then shift the result left by n bits, by setting SH=n and MB=b-n. It can be used to clear the high-order n bits of a register, by setting SH=0 and MB=n.

Extended mnemonics are provided for both of these uses (the second devolves to rldicl); see Appendix D, "Assembler Extended Mnemonics" on page 383.

Rotate Left Doubleword then Clear Left MDS-form

rldcl    RA,RS,RB,MB    (Rc=0)
rldcl.   RA,RS,RB,MB    (Rc=1)

    30 | RS | RA | RB | mb | 8 | Rc
    0    6    11   16   21   27  31

    n ← (RB)58:63
    r ← ROTL64((RS), n)
    b ← mb5 || mb0:4
    m ← MASK(b, 63)
    RA ← r & m

The contents of register RS are rotated64 left the number of bits specified by (RB)58:63. A mask is generated having 1-bits from bit MB through bit 63 and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for Rotate Left Doubleword then Clear Left:

    Extended:              Equivalent to:
    rotld   Rx,Ry,Rz       rldcl   Rx,Ry,Rz,0

Programming Note
rldcl can be used to extract an n-bit field that starts at variable bit position b in register RS, right-justified into register RA (clearing the remaining 64-n bits of RA), by setting RB58:63=b+n and MB=64-n. It can be used to rotate the contents of a register left (right) by variable n bits, by setting RB58:63=n (64-n) and MB=0.

Extended mnemonics are provided for some of these uses; see Appendix D, "Assembler Extended Mnemonics" on page 383.

Rotate Left Doubleword then Clear Right MDS-form

rldcr    RA,RS,RB,ME    (Rc=0)
rldcr.   RA,RS,RB,ME    (Rc=1)

    30 | RS | RA | RB | me | 9 | Rc
    0    6    11   16   21   27  31

    n ← (RB)58:63
    r ← ROTL64((RS), n)
    e ← me5 || me0:4
    m ← MASK(0, e)
    RA ← r & m

The contents of register RS are rotated64 left the number of bits specified by (RB)58:63. A mask is generated having 1-bits from bit 0 through bit ME and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Programming Note
rldcr can be used to extract an n-bit field that starts at variable bit position b in register RS, left-justified into register RA (clearing the remaining 64-n bits of RA), by setting RB58:63=b and ME=n-1. It can be used to rotate the contents of a register left (right) by variable n bits, by setting RB58:63=n (64-n) and ME=63.

Extended mnemonics are provided for some of these uses (some devolve to rldcl); see Appendix D, "Assembler Extended Mnemonics" on page 383.

Rotate Left Doubleword Immediate then Mask Insert MD-form

rldimi   RA,RS,SH,MB    (Rc=0)
rldimi.  RA,RS,SH,MB    (Rc=1)

    30 | RS | RA | sh | mb | 3 | sh | Rc
    0    6    11   16   21   27  30   31

    n ← sh5 || sh0:4
    r ← ROTL64((RS), n)
    b ← mb5 || mb0:4
    m ← MASK(b, ¬n)
    RA ← r&m | (RA)&¬m

The contents of register RS are rotated64 left SH bits. A mask is generated having 1-bits from bit MB through bit 63-SH and 0-bits elsewhere. The rotated data are inserted into register RA under control of the generated mask.

Special Registers Altered:
    CR0 (if Rc=1)

Extended Mnemonics:

Example of extended mnemonics for Rotate Left Doubleword Immediate then Mask Insert:

    Extended:              Equivalent to:
    insrdi  Rx,Ry,n,b      rldimi  Rx,Ry,64-(b+n),b

Programming Note
rldimi can be used to insert an n-bit field that is right-justified in register RS, into register RA starting at bit position b, by setting SH=64-(b+n) and MB=b.

An extended mnemonic is provided for this use; see Appendix D, "Assembler Extended Mnemonics" on page 383.

3.3.13.2 Fixed-Point Shift Instructions

The instructions in this section perform left and right shifts.

Extended mnemonics for shifts

Immediate-form logical (unsigned) shift operations are obtained by specifying appropriate masks and shift values for certain Rotate instructions. A set of extended mnemonics is provided to make coding of such shifts simpler and easier to understand. Some of these are shown as examples with the Rotate instructions. See Appendix D, "Assembler Extended Mnemonics" on page 383 for additional extended mnemonics.

Programming Note
Any Shift Right Algebraic instruction, followed by addze, can be used to divide quickly by 2ⁿ. The setting of the CA bit by the Shift Right Algebraic instructions is independent of mode.

Programming Note
Multiple-precision shifts can be programmed as shown in Section E.1, "Multiple-Precision Shifts" on page 397.

Shift Left Word X-form

slw      RA,RS,RB    (Rc=0)
slw.     RA,RS,RB    (Rc=1)

    31 | RS | RA | RB | 24 | Rc
    0    6    11   16   21   31

    n ← (RB)59:63
    r ← ROTL32((RS)32:63, n)
    if (RB)58 = 0 then
        m ← MASK(32, 63-n)
    else m ← ⁶⁴0
    RA ← r & m

The contents of the low-order 32 bits of register RS are shifted left the number of bits specified by (RB)58:63. Bits shifted out of position 32 are lost. Zeros are supplied to the vacated positions on the right. The 32-bit result is placed into RA32:63. RA0:31 are set to zero. Shift amounts from 32 to 63 give a zero result.

Special Registers Altered:
    CR0 (if Rc=1)

Shift Right Word X-form

srw      RA,RS,RB    (Rc=0)
srw.     RA,RS,RB    (Rc=1)

    31 | RS | RA | RB | 536 | Rc
    0    6    11   16   21    31

    n ← (RB)59:63
    r ← ROTL32((RS)32:63, 64-n)
    if (RB)58 = 0 then
        m ← MASK(n+32, 63)
    else m ← ⁶⁴0
    RA ← r & m

The contents of the low-order 32 bits of register RS are shifted right the number of bits specified by (RB)58:63. Bits shifted out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The 32-bit result is placed into RA32:63. RA0:31 are set to zero. Shift amounts from 32 to 63 give a zero result.

Special Registers Altered:
    CR0 (if Rc=1)

Shift Right Algebraic Word Immediate X-form

srawi    RA,RS,SH    (Rc=0)
srawi.   RA,RS,SH    (Rc=1)

    31 | RS | RA | SH | 824 | Rc
    0    6    11   16   21    31

    n ← SH
    r ← ROTL32((RS)32:63, 64-n)
    m ← MASK(n+32, 63)
    s ← (RS)32
    RA ← r&m | (⁶⁴s)&¬m
    CA ← s & ((r&¬m)32:63 ≠ 0)

The contents of the low-order 32 bits of register RS are shifted right SH bits. Bits shifted out of position 63 are lost. Bit 32 of RS is replicated to fill the vacated positions on the left. The 32-bit result is placed into RA32:63. Bit 32 of RS is replicated to fill RA0:31. CA is set to 1 if the low-order 32 bits of (RS) contain a negative number and any 1-bits are shifted out of position 63; otherwise CA is set to 0. A shift amount of zero causes RA to receive EXTS((RS)32:63), and CA to be set to 0.

Special Registers Altered:
    CA
    CR0 (if Rc=1)

Shift Right Algebraic Word X-form

sraw     RA,RS,RB    (Rc=0)
sraw.    RA,RS,RB    (Rc=1)

    31 | RS | RA | RB | 792 | Rc
    0    6    11   16   21    31

    n ← (RB)59:63
    r ← ROTL32((RS)32:63, 64-n)
    if (RB)58 = 0 then
        m ← MASK(n+32, 63)
    else m ← ⁶⁴0
    s ← (RS)32
    RA ← r&m | (⁶⁴s)&¬m
    CA ← s & ((r&¬m)32:63 ≠ 0)

The contents of the low-order 32 bits of register RS are shifted right the number of bits specified by (RB)58:63. Bits shifted out of position 63 are lost. Bit 32 of RS is replicated to fill the vacated positions on the left. The 32-bit result is placed into RA32:63. Bit 32 of RS is replicated to fill RA0:31. CA is set to 1 if the low-order 32 bits of (RS) contain a negative number and any 1-bits are shifted out of position 63; otherwise CA is set to 0. A shift amount of zero causes RA to receive EXTS((RS)32:63), and CA to be set to 0. Shift amounts from 32 to 63 give a result of 64 sign bits, and cause CA to receive the sign bit of (RS)32:63.

Special Registers Altered:
    CA
    CR0 (if Rc=1)

3.3.13.2.1 64-bit Fixed-Point Shift Instructions [Category: 64-Bit]

Shift Left Doubleword X-form

sld      RA,RS,RB    (Rc=0)
sld.     RA,RS,RB    (Rc=1)

    31 | RS | RA | RB | 27 | Rc
    0    6    11   16   21   31

    n ← (RB)58:63
    r ← ROTL64((RS), n)
    if (RB)57 = 0 then
        m ← MASK(0, 63-n)
    else m ← ⁶⁴0
    RA ← r & m

The contents of register RS are shifted left the number of bits specified by (RB)57:63. Bits shifted out of position 0 are lost. Zeros are supplied to the vacated positions on the right. The result is placed into register RA. Shift amounts from 64 to 127 give a zero result.

Special Registers Altered:
    CR0 (if Rc=1)

Shift Right Doubleword X-form

srd      RA,RS,RB    (Rc=0)
srd.     RA,RS,RB    (Rc=1)

    31 | RS | RA | RB | 539 | Rc
    0    6    11   16   21    31

    n ← (RB)58:63
    r ← ROTL64((RS), 64-n)
    if (RB)57 = 0 then
        m ← MASK(n, 63)
    else m ← ⁶⁴0
    RA ← r & m

The contents of register RS are shifted right the number of bits specified by (RB)57:63. Bits shifted out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The result is placed into register RA. Shift amounts from 64 to 127 give a zero result.

Special Registers Altered:
    CR0 (if Rc=1)

Shift Right Algebraic Doubleword Immediate XS-form

sradi    RA,RS,SH    (Rc=0)
sradi.   RA,RS,SH    (Rc=1)

    31 | RS | RA | sh | 413 | sh | Rc
    0    6    11   16   21    30   31

    n ← sh5 || sh0:4
    r ← ROTL64((RS), 64-n)
    m ← MASK(n, 63)
    s ← (RS)0
    RA ← r&m | (⁶⁴s)&¬m
    CA ← s & ((r&¬m) ≠ 0)

The contents of register RS are shifted right SH bits. Bits shifted out of position 63 are lost. Bit 0 of RS is replicated to fill the vacated positions on the left. The result is placed into register RA. CA is set to 1 if (RS) is negative and any 1-bits are shifted out of position 63; otherwise CA is set to 0. A shift amount of zero causes RA to be set equal to (RS), and CA to be set to 0.

Special Registers Altered:
    CA
    CR0 (if Rc=1)

Shift Right Algebraic Doubleword X-form

srad     RA,RS,RB    (Rc=0)
srad.    RA,RS,RB    (Rc=1)

    31 | RS | RA | RB | 794 | Rc
    0    6    11   16   21    31

    n ← (RB)58:63
    r ← ROTL64((RS), 64-n)
    if (RB)57 = 0 then
        m ← MASK(n, 63)
    else m ← ⁶⁴0
    s ← (RS)0
    RA ← r&m | (⁶⁴s)&¬m
    CA ← s & ((r&¬m) ≠ 0)

The contents of register RS are shifted right the number of bits specified by (RB)57:63. Bits shifted out of position 63 are lost. Bit 0 of RS is replicated to fill the vacated positions on the left. The result is placed into register RA. CA is set to 1 if (RS) is negative and any 1-bits are shifted out of position 63; otherwise CA is set to 0. A shift amount of zero causes RA to be set equal to (RS), and CA to be set to 0. Shift amounts from 64 to 127 give a result of 64 sign bits in RA, and cause CA to receive the sign bit of (RS).

Special Registers Altered:
    CA
    CR0 (if Rc=1)

3.3.14 Move To/From System Register Instructions

The Move To Condition Register Fields instruction has a preferred form; see Section 1.8.1, "Preferred Instruction Forms" on page 21. In the preferred form, the FXM field satisfies the following rule.

- Exactly one bit of the FXM field is set to 1.

Extended mnemonics

Extended mnemonics are provided for the mtspr and mfspr instructions so that they can be coded with the SPR name as part of the mnemonic rather than as a numeric operand. An extended mnemonic is provided for the mtcrf instruction for compatibility with old software (written for a version of the architecture that precedes Version 2.00) that uses it to set the entire Condition Register. Some of these extended mnemonics are shown as examples with the relevant instructions. See Appendix D, "Assembler Extended Mnemonics" on page 383 for additional extended mnemonics.

Move To Special Purpose Register XFX-form

mtspr    SPR,RS

    31 | RS | spr | 467 | /
    0    6    11    21    31

    n ← spr5:9 || spr0:4
    if length(SPR(n)) = 64 then
        SPR(n) ← (RS)
    else
        SPR(n) ← (RS)32:63

The SPR field denotes a Special Purpose Register, encoded as shown in the table below. The contents of register RS are placed into the designated Special Purpose Register. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RS are placed into the SPR.

    decimal   SPR(1)             Register
              spr5:9   spr0:4    Name
    1         00000    00001     XER
    8         00000    01000     LR
    9         00000    01001     CTR
    256       01000    00000     VRSAVE(2)
    512       10000    00000     SPEFSCR(3)
    896       11100    00000     PPR(4)

    (1) Note that the order of the two 5-bit halves of the SPR number is reversed.
    (2) Category: Embedded and Vector (see Programming Note in Section 3.2.4).
    (3) Category: SPE.
    (4) Category: Server.

If execution of this instruction specifying an SPR number other than one of the values shown above is attempted, then one of the following occurs.

- If spr0 = 0, the illegal instruction error handler is invoked.
- If spr0 = 1, the system privileged instruction error handler is invoked.

A complete description of this instruction can be found in Book III.

Compiler and Assembler Note
For the mtspr and mfspr instructions, the SPR number coded in assembler language does not appear directly as a 10-bit binary number in the instruction. The number coded is split into two 5-bit halves that are reversed in the instruction, with the high-order 5 bits appearing in bits 16:20 of the instruction and the low-order 5 bits in bits 11:15.
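The reversal of the two 5-bit halves of the SPR number can be illustrated with a short sketch. It is an assumption-laden illustration rather than a description of any particular assembler: the function names spr_field, spr_number, and encode_mtspr are hypothetical, the instruction is modeled as a 32-bit integer with bit 0 as the most significant bit, and the field positions and the extended opcode 467 are taken from the mtspr instruction layout.

```python
def spr_field(sprn):
    """Return the 10-bit spr field for SPR number sprn.

    The low-order 5 bits of the SPR number go into instruction bits 11:15
    (spr0:4) and the high-order 5 bits into bits 16:20 (spr5:9).
    """
    hi = (sprn >> 5) & 0x1F
    lo = sprn & 0x1F
    return (lo << 5) | hi


def spr_number(field):
    """Inverse mapping: n <- spr5:9 || spr0:4."""
    spr0_4 = (field >> 5) & 0x1F
    spr5_9 = field & 0x1F
    return (spr5_9 << 5) | spr0_4


def encode_mtspr(sprn, rs):
    """Assemble mtspr SPR,RS as a 32-bit word (XFX-form, primary opcode 31).

    Bits 0:5 opcode, 6:10 RS, 11:20 spr, 21:30 extended opcode 467, bit 31 unused.
    """
    return (31 << 26) | (rs << 21) | (spr_field(sprn) << 11) | (467 << 1)


if __name__ == "__main__":
    # SPR numbers taken from the table: XER, LR, CTR, VRSAVE, SPEFSCR, PPR.
    for sprn in (1, 8, 9, 256, 512, 896):
        field = spr_field(sprn)
        assert spr_number(field) == sprn
        print(sprn, format(field >> 5, "05b"), format(field & 0x1F, "05b"))
    # mtspr 8,R0 (that is, a move of R0 to the Link Register).
    print(hex(encode_mtspr(8, 0)))   # 0x7c0803a6
```

The printed bit patterns show the instruction-order halves (bits 11:15 first, then bits 16:20), which is the reverse of the column order spr5:9, spr0:4 used in the table.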
Special Registers Altered: See above Extended Mnemonics: Examples of extended mnemonics for Move To Special Purpose Register: Extended: Equivalent to: mtxer Rx mtspr 1,Rx mtlr Rx mtspr 8,Rx mtctr Rx mtspr 9,Rx Chapter 3. Fixed-Point Processor 93 Version 2.05 Move From Special Purpose Register Special Registers Altered: XFX-form None Extended Mnemonics: mfspr RT,SPR Examples of extended mnemonics for Move From Spe- 31 RT spr 339 / cial Purpose Register: 0 6 11 21 31 Extended: Equivalent to: mfxer Rx mfspr Rx,1 n 1 spr5:9 || spr0:4 if length(SPR(n)) = 64 then mflr Rx mfspr Rx,8 RT 1 SPR(n) mfctr Rx mfspr Rx,9 else RT 1 320 || SPR(n) . Note The SPR field denotes a Special Purpose Register, See the Notes that appear with mtspr. encoded as shown in the table below. The contents of the designated Special Purpose Register are placed into register RT. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RT receive the contents of the Special Purpose Register and the high-order 32 bits of RT are set to zero. SPR1 Register decimal spr5:9 spr0:4 Name 1 00000 00001 XER 8 00000 01000 LR 9 00000 01001 CTR 136 00100 01000 CTRL 256 01000 00000 VRSAVE2 259 01000 00011 SPRG3 260 01000 00100 SPRG43 261 01000 00101 SPRG53 262 01000 00110 SPRG63 263 01000 00111 SPRG73 268 01000 01100 TB4 269 01000 01101 TBU4 512 10000 00000 SPEFSCR5 526 10000 01110 ATB4,6 527 10000 01111 ATBU4,6 896 11100 00000 PPR7 1 Note that the order of the two 5-bit halves of the SPR number is reversed. 2 Category: Embedded and Vector ( see Programming Note in Section 3.2.4). 3 Category: Embedded. 4 See Chapter 4 of Book II. 5 Category: SPE. 6 Category: Alternate Time Base. 7 Category: Server. If execution of this instruction specifying an SPR num- ber other than one of the values shown above is attempted, then one of the following occurs. 1 If spr0 = 0, the illegal instruction error handler is invoked. 1 If spr0 = 1, the system privileged instruction error handler is invoked. 
A complete description of this instruction can be found in Book III. 94 Power ISATM I Version 2.05 Move To Condition Register Fields Move From Condition Register XFX-form XFX-form mfcr RT mtcrf FXM,RS 31 RT 0 /// 19 / 0 6 11 12 21 31 31 RS 0 FXM / 144 / 0 6 11 12 20 21 31 RT 1 320 || CR The contents of the Condition Register are placed into mask 1 4(FXM0) || 4(FXM1) || ... 4(FXM7) RT32:63. RT0:31 are set to 0. CR 1 ((RS)32:63 & mask) | (CR & ¬mask) Special Registers Altered: The contents of bits 32:63 of register RS are placed None into the Condition Register under control of the field mask specified by FXM. The field mask identifies the 4-bit fields affected. Let i be an integer in the range 0-7. If FXMi=1 then CR field i (CR bits 4×i+32:4×i+35) is set to the contents of the corresponding field of the low-order 32 bits of RS. Special Registers Altered: CR fields selected by mask Extended Mnemonics: Example of extended mnemonics for Move To Condi- tion Register Fields: Extended: Equivalent to: mtcr Rx mtcrf 0xFF,Rx Programming Note In the preferred form of this instruction (mtocrf), only one Condition Register field is updated. Chapter 3. Fixed-Point Processor 95 Version 2.05 3.3.14.1 Move to/From One Condition Register Field Instructions [Category: Phased-In (sV2.05)] Move To One Condition Register Field Move From One Condition Register Field XFX-form XFX-form mtocrf FXM,RS mfocrf RT,FXM [Category: Phased-In] [Category: Phased-In] 31 RS 1 FXM / 144 / 31 RT 1 FXM / 19 / 0 6 11 12 20 21 31 0 6 11 12 20 21 31 count 1 0 RT 1 undefined do i = 0 to 7 count 1 0 if FXMi = 1 then do i = 0 to 7 n 1 i if FXMi = 1 then count 1 count + 1 n 1 i if count = 1 then count 1 count + 1 CR4×n+32:4×n+35 1 (RS)4×n+32:4×n+35 if count = 1 then else CR 1 undefined RT4×n+32:4×n+35 1 CR4×n+32:4×n+35 If exactly one bit of the FXM field is set to 1, let n be the If exactly one bit of the FXM field is set to 1, let n be the position of that bit in the field (0 n 7). 
position of that bit in the field (0 ≤ n ≤ 7). The contents of bits 4×n+32:4×n+35 of register RS are placed into CR field n (CR bits 4×n+32:4×n+35). Otherwise, the contents of the Condition Register are undefined.

Special Registers Altered:
    CR field selected by FXM

The contents of CR field n (CR bits 4×n+32:4×n+35) are placed into bits 4×n+32:4×n+35 of register RT and the contents of the remaining bits of register RT are undefined. Otherwise, the contents of register RT are undefined.

Special Registers Altered:
    None

Programming Note
These forms of the mtcrf and mfcr instructions are intended to replace the old forms of the instructions (the forms shown in page 95), which will eventually be phased out of the architecture. The new forms are backward compatible with most processors that comply with versions of the architecture that precede Version 2.00. On those processors, the new forms are treated as the old forms.

However, on some processors that comply with versions of the architecture that precede Version 2.00 the new forms may be treated as follows:
    mtocrf: may cause the system illegal instruction error handler to be invoked
    mfocrf: may place an undefined value into register RT

3.3.14.2 Move To/From System Registers [Category: Embedded]

Move to Condition Register from XER X-form

mcrxr BF

    0    6    9    11   16   21    31
    31   BF   //   ///  ///  512   /

CR[4×BF+32:4×BF+35] ← XER[32:35]
XER[32:35] ← 0b0000

The contents of XER[32:35] are copied to Condition Register field BF. XER[32:35] are set to zero.

Special Registers Altered:
    CR field BF    XER[32:35]

Move From APID Indirect X-form

mfapidi RT,RA

    0    6    11   16   21    31
    31   RT   RA   ///  275   /

RT ← implementation-dependent value based on (RA)

The contents of RA are provided to any auxiliary processors that may be present. A value, that is implementation-dependent, is placed in RT.

Special Registers Altered:
    None

Programming Note
This instruction is provided as a mechanism for software to query the presence and configuration of one or more auxiliary processors. See the implementation for details on the behavior of this instruction.

Move To Device Control Register User-mode Indexed X-form

mtdcrux RS,RA

    0    6    11   16   21    31
    31   RS   RA   ///  419   /

DCRN ← (RA)
DCR(DCRN) ← RS

Let the contents of register RA denote a Device Control Register. (The supported Device Control Registers are implementation-dependent.)

The contents of RS are placed into the designated Device Control Register. For 32-bit Device Control Registers, the contents of bits 32:63 of RS are placed into the Device Control Register.

See "Move To Device Control Register Indexed X-form" on page 624 in Book III for more information on this instruction.

Special Registers Altered:
    Implementation-dependent

Move From Device Control Register User-mode Indexed X-form

mfdcrux RT,RA

    0    6    11   16   21    31
    31   RT   RA   ///  291   /

DCRN ← (RA)
RT ← DCR(DCRN)

Let the contents of register RA denote a Device Control Register. (The supported Device Control Registers are implementation-dependent.)

The contents of the designated Device Control Register are placed into RT. For 32-bit Device Control Registers, the contents of bits 32:63 of the designated Device Control Register are placed into RT.

See "Move From Device Control Register Indexed X-form" on page 625 in Book III for more information on this instruction.

Special Registers Altered:
    Implementation-dependent

Chapter 4. Floating-Point Processor [Category: Floating-Point]

4.1 Floating-Point Processor Overview
4.2 Floating-Point Processor Registers
    4.2.1 Floating-Point Registers
    4.2.2 Floating-Point Status and Control Register
4.3 Floating-Point Data
    4.3.1 Data Format
    4.3.2 Value Representation
    4.3.3 Sign of Result
    4.3.4 Normalization and Denormalization
    4.3.5 Data Handling and Precision
        4.3.5.1 Single-Precision Operands
        4.3.5.2 Integer-Valued Operands
    4.3.6 Rounding
4.4 Floating-Point Exceptions
    4.4.1 Invalid Operation Exception
        4.4.1.1 Definition
        4.4.1.2 Action
    4.4.2 Zero Divide Exception
        4.4.2.1 Definition
        4.4.2.2 Action
    4.4.3 Overflow Exception
        4.4.3.1 Definition
        4.4.3.2 Action
    4.4.4 Underflow Exception
        4.4.4.1 Definition
        4.4.4.2 Action
    4.4.5 Inexact Exception
        4.4.5.1 Definition
        4.4.5.2 Action
4.5 Floating-Point Execution Models
    4.5.1 Execution Model for IEEE Operations
    4.5.2 Execution Model for Multiply-Add Type Instructions
4.6 Floating-Point Processor Instructions
    4.6.1 Floating-Point Storage Access Instructions
        4.6.1.1 Storage Access Exceptions
    4.6.2 Floating-Point Load Instructions
    4.6.3 Floating-Point Store Instructions
    4.6.4 Floating-Point Load Store Doubleword Pair Instructions [Category: Floating-Point.Phased-Out]
    4.6.5 Floating-Point Move Instructions
    4.6.6 Floating-Point Arithmetic Instructions
        4.6.6.1 Floating-Point Elementary Arithmetic Instructions
        4.6.6.2 Floating-Point Multiply-Add Instructions
    4.6.7 Floating-Point Rounding and Conversion Instructions
        4.6.7.1 Floating-Point Rounding Instruction
        4.6.7.2 Floating-Point Convert To/From Integer Instructions
        4.6.7.3 Floating Round to Integer Instructions [Category: Floating-Point.Phased-In (sV2.05)]
    4.6.8 Floating-Point Compare Instructions
    4.6.9 Floating-Point Select Instruction
    4.6.10 Floating-Point Status and Control Register Instructions
4.1 Floating-Point Processor Overview

This chapter describes the registers and instructions that make up the Floating-Point Processor facility.

The processor (augmented by appropriate software support, where required) implements a floating-point system compliant with the ANSI/IEEE Standard 754-1985, "IEEE Standard for Binary Floating-Point Arithmetic" (hereafter referred to as "the IEEE standard"). That standard defines certain required "operations" (addition, subtraction, etc.). Herein, the term "floating-point operation" is used to refer to one of these required operations and to additional operations defined (e.g., those performed by Multiply-Add or Reciprocal Estimate instructions). A Non-IEEE mode is also provided. This mode, which may produce results not in strict compliance with the IEEE standard, allows shorter latency.

Instructions are provided to perform arithmetic, rounding, conversion, comparison, and other operations in floating-point registers; to move floating-point data between storage and these registers; and to manipulate the Floating-Point Status and Control Register explicitly.

These instructions are divided into two categories.

- computational instructions

  The computational instructions are those that perform addition, subtraction, multiplication, division, extracting the square root, rounding, conversion, comparison, and combinations of these operations. These instructions provide the floating-point operations. They place status information into the Floating-Point Status and Control Register. They are the instructions described in Sections 4.6.6 through 4.6.8.

- non-computational instructions

  The non-computational instructions are those that perform loads and stores, move the contents of a floating-point register to another floating-point register possibly altering the sign, manipulate the Floating-Point Status and Control Register explicitly, and select the value from one of two floating-point registers based on the value in a third floating-point register. The operations performed by these instructions are not considered floating-point operations. With the exception of the instructions that manipulate the Floating-Point Status and Control Register explicitly, they do not alter the Floating-Point Status and Control Register. They are the instructions described in Sections 4.6.2 through 4.6.5, and 4.6.10.

A floating-point number consists of a signed exponent and a signed significand. The quantity expressed by this number is the product of the significand and the number 2^exponent. Encodings are provided in the data format to represent finite numeric values, ±Infinity, and values that are "Not a Number" (NaN). Operations involving infinities produce results obeying traditional mathematical conventions. NaNs have no mathematical interpretation. Their encoding permits a variable diagnostic information field. They may be used to indicate such things as uninitialized variables and can be produced by certain invalid operations.

There is one class of exceptional events that occur during instruction execution that is unique to the Floating-Point Processor: the Floating-Point Exception. Floating-point exceptions are signaled with bits set in the Floating-Point Status and Control Register (FPSCR). They can cause the system floating-point enabled exception error handler to be invoked, precisely or imprecisely, if the proper control bits are set.

Floating-Point Exceptions

The following floating-point exceptions are detected by the processor:

- Invalid Operation Exception (VX)
      SNaN (VXSNAN)
      Infinity-Infinity (VXISI)
      Infinity÷Infinity (VXIDI)
      Zero÷Zero (VXZDZ)
      Infinity×Zero (VXIMZ)
      Invalid Compare (VXVC)
      Software-Defined Condition (VXSOFT)
      Invalid Square Root (VXSQRT)
      Invalid Integer Convert (VXCVI)
- Zero Divide Exception (ZX)
- Overflow Exception (OX)
- Underflow Exception (UX)
- Inexact Exception (XX)

Each floating-point exception, and each category of Invalid Operation Exception, has an exception bit in the FPSCR. In addition, each floating-point exception has a corresponding enable bit in the FPSCR. See Section 4.2.2, "Floating-Point Status and Control Register" on page 101 for a description of these exception and enable bits, and Section 4.4, "Floating-Point Exceptions" on page 108 for a detailed discussion of floating-point exceptions, including the effects of the enable bits.

4.2 Floating-Point Processor Registers

4.2.1 Floating-Point Registers

Implementations of this architecture provide 32 floating-point registers (FPRs). The floating-point instruction formats provide 5-bit fields for specifying the FPRs to be used in the execution of the instruction. The FPRs are numbered 0-31. See Figure 46 on page 101.

Each FPR contains 64 bits that support the floating-point double format. Every instruction that interprets the contents of an FPR as a floating-point value uses the floating-point double format for this interpretation.

The computational instructions, and the Move and Select instructions, operate on data located in FPRs and, with the exception of the Compare instructions, place the result value into an FPR and optionally (when Rc=1) place status information into the Condition Register. Instruction forms with Rc=1 are part of Category: Floating-Point.Record.

Load Double and Store Double instructions are provided that transfer 64 bits of data between storage and the FPRs with no conversion. Load Single instructions are provided to transfer and convert floating-point values in floating-point single format from storage to the same value in floating-point double format in the FPRs. Store Single instructions are provided to transfer and convert floating-point values in floating-point double format from the FPRs to the same value in floating-point single format in storage.

Instructions are provided that manipulate the Floating-Point Status and Control Register and the Condition Register explicitly. Some of these instructions copy data from an FPR to the Floating-Point Status and Control Register or vice versa.

The computational instructions and the Select instruction accept values from the FPRs in double format. For single-precision arithmetic instructions, all input values must be representable in single format; if they are not, the result placed into the target FPR, and the setting of status bits in the FPSCR and in the Condition Register (if Rc=1), are undefined.

    | FPR 0  |
    | FPR 1  |
    |  ...   |
    | FPR 30 |
    | FPR 31 |
    0       63

Figure 46. Floating-Point Registers

4.2.2 Floating-Point Status and Control Register

The Floating-Point Status and Control Register (FPSCR) controls the handling of floating-point exceptions and records status resulting from the floating-point operations. Bits 32:55 are status bits. Bits 56:63 are control bits.

The exception bits in the FPSCR (bits 35:44, 53:55) are sticky; that is, once set to 1 they remain set to 1 until they are set to 0 by an mcrfs, mtfsfi, mtfsf, or mtfsb0 instruction. The exception summary bits in the FPSCR (FX, FEX, and VX, which are bits 32:34) are not considered to be "exception bits", and only FX is sticky.

FEX and VX are simply the ORs of other FPSCR bits. Therefore these two bits are not listed among the FPSCR bits affected by the various instructions.

    | FPSCR |
    0      63

Figure 47. Floating-Point Status and Control Register

The bit definitions for the FPSCR are as follows.

Bit(s)  Description

0:31    Reserved

32      Floating-Point Exception Summary (FX)
        Every floating-point instruction, except mtfsfi and mtfsf, implicitly sets FPSCR[FX] to 1 if that instruction causes any of the floating-point exception bits in the FPSCR to change from 0 to 1. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 can alter FPSCR[FX] explicitly.

        Programming Note
        FPSCR[FX] is defined not to be altered implicitly by mtfsfi and mtfsf because permitting these instructions to alter FPSCR[FX] implicitly could cause a paradox. An example is an mtfsfi or mtfsf instruction that supplies 0 for FPSCR[FX] and 1 for FPSCR[OX], and is executed when FPSCR[OX]=0. See also the Programming Notes with the definition of these two instructions.

33      Floating-Point Enabled Exception Summary (FEX)
        This bit is the OR of all the floating-point exception bits masked by their respective enable bits. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 cannot alter FPSCR[FEX] explicitly.

34      Floating-Point Invalid Operation Exception Summary (VX)
        This bit is the OR of all the Invalid Operation exception bits. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 cannot alter FPSCR[VX] explicitly.

35      Floating-Point Overflow Exception (OX)
        See Section 4.4.3, "Overflow Exception" on page 111.

36      Floating-Point Underflow Exception (UX)
        See Section 4.4.4, "Underflow Exception" on page 112.

37      Floating-Point Zero Divide Exception (ZX)
        See Section 4.4.2, "Zero Divide Exception" on page 111.

38      Floating-Point Inexact Exception (XX)
        See Section 4.4.5, "Inexact Exception" on page 113.

        FPSCR[XX] is a sticky version of FPSCR[FI] (see below). Thus the following rules completely describe how FPSCR[XX] is set by a given instruction.
        - If the instruction affects FPSCR[FI], the new value of FPSCR[XX] is obtained by ORing the old value of FPSCR[XX] with the new value of FPSCR[FI].
        - If the instruction does not affect FPSCR[FI], the value of FPSCR[XX] is unchanged.

39      Floating-Point Invalid Operation Exception (SNaN) (VXSNAN)
        See Section 4.4.1, "Invalid Operation Exception" on page 110.

40      Floating-Point Invalid Operation Exception (∞ - ∞) (VXISI)
        See Section 4.4.1.

41      Floating-Point Invalid Operation Exception (∞ ÷ ∞) (VXIDI)
        See Section 4.4.1.

42      Floating-Point Invalid Operation Exception (0 ÷ 0) (VXZDZ)
        See Section 4.4.1.

43      Floating-Point Invalid Operation Exception (∞ × 0) (VXIMZ)
        See Section 4.4.1.

44      Floating-Point Invalid Operation Exception (Invalid Compare) (VXVC)
        See Section 4.4.1.

45      Floating-Point Fraction Rounded (FR)
        The last Arithmetic or Rounding and Conversion instruction incremented the fraction during rounding. See Section 4.3.6, "Rounding" on page 107. This bit is not sticky.

46      Floating-Point Fraction Inexact (FI)
        The last Arithmetic or Rounding and Conversion instruction either produced an inexact result during rounding or caused a disabled Overflow Exception. See Section 4.3.6. This bit is not sticky.

        See the definition of FPSCR[XX], above, regarding the relationship between FPSCR[FI] and FPSCR[XX].

47:51   Floating-Point Result Flags (FPRF)
        Arithmetic, rounding, and Convert From Integer instructions set this field based on the result placed into the target register and on the target precision, except that if any portion of the result is undefined then the value placed into FPRF is undefined. Floating-point Compare instructions set this field based on the relative values of the operands being compared. For Convert To Integer instructions, the value placed into FPRF is undefined. Additional details are given below.

        Programming Note
        A single-precision operation that produces a denormalized result sets FPRF to indicate a denormalized number. When possible, single-precision denormalized numbers are represented in normalized double format in the target register.

47      Floating-Point Result Class Descriptor (C)
        Arithmetic, rounding, and Convert From Integer instructions may set this bit with the FPCC bits, to indicate the class of the result as shown in Figure 48 on page 103.

48:51   Floating-Point Condition Code (FPCC)
        Floating-point Compare instructions set one of the FPCC bits to 1 and the other three FPCC bits to 0. Arithmetic, rounding, and Convert From Integer instructions may set the FPCC bits with the C bit, to indicate the class of the result as shown in Figure 48 on page 103. Note that in this case the high-order three bits of the FPCC retain their relational significance indicating that the value is less than, greater than, or equal to zero.

48      Floating-Point Less Than or Negative (FL or <)

49      Floating-Point Greater Than or Positive (FG or >)

50      Floating-Point Equal or Zero (FE or =)

51      Floating-Point Unordered or NaN (FU or ?)

52      Reserved

53      Floating-Point Invalid Operation Exception (Software-Defined Condition) (VXSOFT)
        This bit can be altered only by mcrfs, mtfsfi, mtfsf, mtfsb0, or mtfsb1. See Section 4.4.1.

        Programming Note
        FPSCR[VXSOFT] can be used by software to indicate the occurrence of an arbitrary, software-defined, condition that is to be treated as an Invalid Operation Exception. For example, the bit could be set by a program that computes a base 10 logarithm if the supplied input is negative.

54      Floating-Point Invalid Operation Exception (Invalid Square Root) (VXSQRT)
        See Section 4.4.1.

55      Floating-Point Invalid Operation Exception (Invalid Integer Convert) (VXCVI)
        See Section 4.4.1.

56      Floating-Point Invalid Operation Exception Enable (VE)
        See Section 4.4.1.

57      Floating-Point Overflow Exception Enable (OE)
        See Section 4.4.3, "Overflow Exception" on page 111.

58      Floating-Point Underflow Exception Enable (UE)
        See Section 4.4.4, "Underflow Exception" on page 112.

59      Floating-Point Zero Divide Exception Enable (ZE)
        See Section 4.4.2, "Zero Divide Exception" on page 111.

60      Floating-Point Inexact Exception Enable (XE)
        See Section 4.4.5, "Inexact Exception" on page 113.

61      Floating-Point Non-IEEE Mode (NI)
        Floating-point non-IEEE mode is optional. If floating-point non-IEEE mode is not implemented, this bit is treated as reserved, and the remainder of the definition of this bit does not apply.

        If floating-point non-IEEE mode is implemented, this bit has the following meaning.

        0   The processor is not in floating-point non-IEEE mode (i.e., all floating-point operations conform to the IEEE standard).
        1   The processor is in floating-point non-IEEE mode.

        When the processor is in floating-point non-IEEE mode, the remaining FPSCR bits may have meanings different from those given in this document, and floating-point operations need not conform to the IEEE standard. The effects of executing a given floating-point instruction with FPSCR[NI]=1, and any additional requirements for using non-IEEE mode, are implementation-dependent. The results of executing a given instruction in non-IEEE mode may vary between implementations, and between different executions on the same implementation.

        Programming Note
        When the processor is in floating-point non-IEEE mode, the results of floating-point operations may be approximate, and performance for these operations may be better, more predictable, or less data-dependent than when the processor is not in non-IEEE mode. For example, in non-IEEE mode an implementation may return 0 instead of a denormalized number, and may return a large number instead of an infinity.

62:63   Floating-Point Rounding Control (RN)
        See Section 4.3.6, "Rounding" on page 107.

        00  Round to Nearest
        01  Round toward Zero
        10  Round toward +Infinity
        11  Round toward -Infinity

    Result Flags     Result Value Class
    C  <  >  =  ?
    1  0  0  0  1    Quiet NaN
    0  1  0  0  1    - Infinity
    0  1  0  0  0    - Normalized Number
    1  1  0  0  0    - Denormalized Number
    1  0  0  1  0    - Zero
    0  0  0  1  0    + Zero
    1  0  1  0  0    + Denormalized Number
    0  0  1  0  0    + Normalized Number
    0  0  1  0  1    + Infinity

Figure 48. Floating-Point Result Flags

4.3 Floating-Point Data

4.3.1 Data Format

This architecture defines the representation of a floating-point value in two different binary fixed-length formats. The format may be a 32-bit single format for a single-precision value or a 64-bit double format for a double-precision value. The single format may be used for data in storage. The double format may be used for data in storage and for data in floating-point registers.

The lengths of the exponent and the fraction fields differ between these two formats. The structure of the single and double formats is shown below.

    | S | EXP | FRACTION |
    32  33    41        63

Figure 49. Floating-point single format
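The single-format layout in Figure 49 can be illustrated with a minimal sketch. Python is used only for illustration, and the function name `single_fields` is hypothetical, not part of the architecture. The sketch assumes the 32-bit single-format word corresponds to bits 32:63 as numbered in Figure 49, so that bit 32 is the most significant bit of the word.

```python
import struct


def single_fields(value):
    """Return (S, EXP, FRACTION) for a value encoded in single format.

    Hypothetical helper for illustration only. The value is first
    converted to IEEE single format, then split using the bit
    positions of Figure 49 (S = bit 32, EXP = bits 33:40,
    FRACTION = bits 41:63, with bit 32 the most significant bit).
    """
    # Pack as a big-endian single and reread the same 4 bytes as an integer.
    (word,) = struct.unpack(">I", struct.pack(">f", value))
    s = (word >> 31) & 0x1        # bit 32: sign
    exp = (word >> 23) & 0xFF     # bits 33:40: biased exponent (8 bits)
    fraction = word & 0x7FFFFF    # bits 41:63: fraction (23 bits)
    return s, exp, fraction


if __name__ == "__main__":
    # 1.0 has an unbiased exponent of 0, so EXP holds the bias.
    print(single_fields(1.0))
    print(single_fields(-2.5))
```

The field widths used here (1, 8, and 23 bits) are the single-format widths the architecture specifies; only the shifting and masking are a choice of this sketch.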
    | S | EXP | FRACTION |
    0   1     12        63

Figure 50. Floating-point double format

Values in floating-point format are composed of three fields:

    S          sign bit
    EXP        exponent+bias
    FRACTION   fraction

Representation of numeric values in the floating-point formats consists of a sign bit (S), a biased exponent (EXP), and the fraction portion (FRACTION) of the significand. The significand consists of a leading implied bit concatenated on the right with the FRACTION. This leading implied bit is 1 for normalized numbers and 0 for denormalized numbers and is located in the unit bit position (i.e., the first bit to the left of the binary point). Values representable within the two floating-point formats can be specified by the parameters listed in Figure 51.

                          Format
                          Single    Double
    Exponent Bias         +127      +1023
    Maximum Exponent      +127      +1023
    Minimum Exponent      -126      -1022
    Widths (bits)
      Format              32        64
      Sign                1         1
      Exponent            8         11
      Fraction            23        52
      Significand         24        53

Figure 51. IEEE floating-point fields

The architecture requires that the FPRs of the Floating-Point Processor support the floating-point double format only.

4.3.2 Value Representation

This architecture defines numeric and non-numeric values representable within each of the two supported formats. The numeric values are approximations to the real numbers and include the normalized numbers, denormalized numbers, and zero values. The non-numeric values representable are the infinities and the Not a Numbers (NaNs). The infinities are adjoined to the real numbers, but are not numbers themselves, and the standard rules of arithmetic do not hold when they are used in an operation. They are related to the real numbers by order alone. It is possible however to define restricted operations among numbers and infinities as defined below. The relative location on the real number line for each of the defined entities is shown in Figure 52.

    -INF   -NOR   -DEN   -0   +0   +DEN   +NOR   +INF

Figure 52. Approximation to real numbers

The NaNs are not related to the numeric values or infinities by order or value but are encodings used to convey diagnostic information such as the representation of uninitialized variables.

The following is a description of the different floating-point values defined in the architecture:

Binary floating-point numbers
Machine representable values used as approximations to real numbers. Three categories of numbers are supported: normalized numbers, denormalized numbers, and zero values.

Normalized numbers (± NOR)
These are values that have a biased exponent value in the range:
    1 to 254 in single format
    1 to 2046 in double format
They are values in which the implied unit bit is 1. Normalized numbers are interpreted as follows:

    NOR = (-1)^s × 2^E × (1.fraction)

where s is the sign, E is the unbiased exponent, and 1.fraction is the significand, which is composed of a leading unit bit (implied bit) and a fraction part.

The ranges covered by the magnitude (M) of a normalized floating-point number are approximately equal to:

    Single Format:  1.2×10^-38 ≤ M ≤ 3.4×10^38
    Double Format:  2.2×10^-308 ≤ M ≤ 1.8×10^308

Zero values (± 0)
These are values that have a biased exponent value of zero and a fraction value of zero. Zeros can have a positive or negative sign. The sign of zero is ignored by comparison operations (i.e., comparison regards +0 as equal to -0).

Denormalized numbers (± DEN)
These are values that have a biased exponent value of zero and a nonzero fraction value. They are nonzero numbers smaller in magnitude than the representable normalized numbers. They are values in which the implied unit bit is 0. Denormalized numbers are interpreted as follows:

    DEN = (-1)^s × 2^Emin × (0.fraction)

where Emin is the minimum representable exponent value (-126 for single-precision, -1022 for double-precision).

Infinities (±∞)
These are values that have the maximum biased exponent value:
    255 in single format
    2047 in double format
and a zero fraction value. They are used to approximate values greater in magnitude than the maximum normalized value.

Infinity arithmetic is defined as the limiting case of real arithmetic, with restricted operations defined among numbers and infinities. Infinities and the real numbers can be related by ordering in the affine sense:

    -∞ < every finite number < +∞

Arithmetic on infinities is always exact and does not signal any exception, except when an exception occurs due to the invalid operations as described in Section 4.4.1, "Invalid Operation Exception" on page 110.

For comparison operations, +Infinity compares equal to +Infinity and -Infinity compares equal to -Infinity.

Not a Numbers (NaNs)
These are values that have the maximum biased exponent value and a nonzero fraction value. The sign bit is ignored (i.e., NaNs are neither positive nor negative). If the high-order bit of the fraction field is 0 then the NaN is a Signaling NaN; otherwise it is a Quiet NaN.

Signaling NaNs are used to signal exceptions when they appear as operands of computational instructions.

Quiet NaNs are used to represent the results of certain invalid operations, such as invalid arithmetic operations on infinities or on NaNs, when Invalid Operation Exception is disabled (FPSCR[VE]=0). Quiet NaNs propagate through all floating-point operations except ordered comparison, Floating Round to Single-Precision, and conversion to integer. Quiet NaNs do not signal exceptions, except for ordered comparison and conversion to integer operations. Specific encodings in QNaNs can thus be preserved through a sequence of floating-point operations, and used to convey diagnostic information to help identify results from invalid operations.

When a QNaN is the result of a floating-point operation because one of the operands is a NaN or because a QNaN was generated due to a disabled Invalid Operation Exception, then the following rule is applied to determine the NaN with the high-order fraction bit set to 1 that is to be stored as the result.

    if (FRA) is a NaN
        then FRT ← (FRA)
    else if (FRB) is a NaN
        then if instruction is frsp
            then FRT ← (FRB)[0:34] || ²⁹0
            else FRT ← (FRB)
    else if (FRC) is a NaN
        then FRT ← (FRC)
    else if generated QNaN
        then FRT ← generated QNaN

If the operand specified by FRA is a NaN, then that NaN is stored as the result. Otherwise, if the operand specified by FRB is a NaN (if the instruction specifies an FRB operand), then that NaN is stored as the result, with the low-order 29 bits of the result set to 0 if the instruction is frsp. Otherwise, if the operand specified by FRC is a NaN (if the instruction specifies an FRC operand), then that NaN is stored as the result. Otherwise, if a QNaN was generated due to a disabled Invalid Operation Exception, then that QNaN is stored as the result. If a QNaN is to be generated as a result, then the QNaN generated has a sign bit of 0, an exponent field of all 1s, and a high-order fraction bit of 1 with all other fraction bits 0. Any instruction that generates a QNaN as the result of a disabled Invalid Operation Exception generates this QNaN (i.e., 0x7FF8_0000_0000_0000).

A double-precision NaN is considered to be representable in single format if and only if the low-order 29 bits of the double-precision NaN's fraction are zero.

4.3.3 Sign of Result

The following rules govern the sign of the result of an arithmetic, rounding, or conversion operation, when the operation does not yield an exception. They apply even when the operands or results are zeros or infinities.

- The sign of the result of an add operation is the sign of the operand having the larger absolute value. If both operands have the same sign, the sign of the result of an add operation is the same as the sign of the operands. The sign of the result of the subtract operation x-y is the same as the sign of the result of the add operation x+(-y).

  When the sum of two operands with opposite sign, or the difference of two operands with the same sign, is exactly zero, the sign of the result is positive in all rounding modes except Round toward -Infinity, in which mode the sign is negative.

- The sign of the result of a multiply or divide operation is the Exclusive OR of the signs of the operands.

- The sign of the result of a Square Root or Reciprocal Square Root Estimate operation is always positive, except that the square root of -0 is -0 and the reciprocal square root of -0 is -Infinity.

- The sign of the result of a Round to Single-Precision, or Convert From Integer, or Round to Integer operation is the sign of the operand being converted.

For the Multiply-Add instructions, the rules given above are applied first to the multiply operation and then to the add or subtract operation (one of the inputs to the add or subtract operation is the result of the multiply operation).

4.3.4 Normalization and Denormalization

The intermediate result of an arithmetic or frsp instruction may require normalization and/or denormalization as described below. Normalization and denormalization do not affect the sign of the result.

When an arithmetic or rounding instruction produces an intermediate result which carries out of the significand, or in which the significand is nonzero but has a leading zero bit, it is not a normalized number and must be normalized before it is stored. For the carry-out case, the significand is shifted right one bit, with a one shifted into the leading significand bit, and the exponent is incremented by one. For the leading-zero case, the significand is shifted left while decrementing its exponent by one for each bit shifted, until the leading significand bit becomes one. The Guard bit and the Round bit (see Section 4.5.1, "Execution Model for IEEE Operations" on page 113) participate in the shift with zeros shifted into the Round bit. The exponent is regarded as if its range were unlimited.

After normalization, or if normalization was not required, the intermediate result may have a nonzero significand and an exponent value that is less than the minimum value that can be represented in the format specified for the result. In this case, the intermediate result is said to be "Tiny" and the stored result is determined by the rules described in Section 4.4.4, "Underflow Exception". These rules may require denormalization.

A number is denormalized by shifting its significand right while incrementing its exponent by 1 for each bit shifted, until the exponent is equal to the format's minimum value. If any significant bits are lost in this shifting process then "Loss of Accuracy" has occurred (See Section 4.4.4, "Underflow Exception" on page 112) and Underflow Exception is signaled.

4.3.5 Data Handling and Precision

Most of the Floating-Point Processor Architecture, including all computational, Move, and Select instructions, use the floating-point double format to represent data in the FPRs. Single-precision and integer-valued operands may be manipulated using double-precision operations. Instructions are provided to coerce these values from a double format operand. Instructions are also provided for manipulations which do not require double-precision. In addition, instructions are provided to access a true single-precision representation in storage, and a fixed-point integer representation in GPRs.

4.3.5.1 Single-Precision Operands

For single format data, a format conversion from single to double is performed when loading from storage into an FPR and a format conversion from double to single is performed when storing from an FPR to storage. No floating-point exceptions are caused by these instructions. An instruction is provided to explicitly convert a double format operand in an FPR to single-precision. Floating-point single-precision is enabled with four types of instruction.

1. Load Floating-Point Single

   This form of instruction accesses a single-precision operand in single format in storage, converts it to double format, and loads it into an FPR. No floating-point exceptions are caused by these instructions.

2. Round to Floating-Point Single-Precision

   The Floating Round to Single-Precision instruction rounds a double-precision operand to single-precision, checking the exponent for single-precision range and handling any exceptions according to respective enable bits, and places that operand into an FPR in double format. For results produced by single-precision arithmetic instructions, single-precision loads, and other instances of the Floating Round to Single-Precision instruction, this operation does not alter the value.

3. Single-Precision Arithmetic Instructions

   This form of instruction takes operands from the FPRs in double format, performs the operation as if it produced an intermediate result having infinite precision and unbounded exponent range, and then coerces this intermediate result to fit in single format. Status bits, in the FPSCR and optionally in the Condition Register, are set to reflect the single-precision result. The result is then converted to double format and placed into an FPR. The result lies in the range supported by the single format.

   All input values must be representable in single format; if they are not, the result placed into the target FPR, and the setting of status bits in the FPSCR and in the Condition Register (if Rc=1), are undefined.

4. Store Floating-Point Single

   This form of instruction converts a double-precision operand to single format and stores that operand into storage. No floating-point exceptions are caused by these instructions. (The value being stored is effectively assumed to be the result of an instruction of one of the preceding three types.)

When the result of a Load Floating-Point Single, Floating Round to Single-Precision, or single-precision arithmetic instruction is stored in an FPR, the low-order 29 FRACTION bits are zero.

Programming Note
The Floating Round to Single-Precision instruction is provided to allow value conversion from double-precision to single-precision with appropriate exception checking and rounding. This instruction should be used to convert double-precision floating-point values (produced by double-precision load and arithmetic instructions and by fcfid) to single-precision values prior to storing them into single format storage elements or using them as operands for single-precision arithmetic instructions. Values produced by single-precision load and arithmetic instructions are already single-precision values and can be stored directly into single format storage elements, or used directly as operands for single-precision arithmetic instructions, without preceding the store, or the arithmetic instruction, by a Floating Round to Single-Precision instruction.

Programming Note
A single-precision value can be used in double-precision arithmetic operations. The reverse is true only if the double-precision value is representable in single format.

Some implementations may execute single-precision arithmetic instructions faster than double-precision arithmetic instructions. Therefore, if double-precision accuracy is not required, single-precision data and instructions should be used.

4.3.5.2 Integer-Valued Operands

Instructions are provided to round floating-point operands to integer values in floating-point format. To facilitate exchange of data between the floating-point and fixed-point processors, instructions are provided to

The Floating Convert To Integer instructions convert a double-precision operand to a 32-bit or 64-bit signed fixed-point integer format. Variants are provided both to perform rounding based on the value of FPSCR[RN] and to round toward zero. These instructions may cause Invalid Operation (VXSNaN, VXCVI) and Inexact exceptions. The Floating Convert From Integer instruction converts a 64-bit signed fixed-point integer to a double-precision floating-point integer. Because of the limitations of the source format, only an Inexact exception may be generated.

4.3.6 Rounding

The material in this section applies to operations that have numeric operands (i.e., operands that are not infinities or NaNs). Rounding the intermediate result of such an operation may cause an Overflow Exception, an Underflow Exception, or an Inexact Exception. The remainder of this section assumes that the operation causes no exceptions and that the result is numeric. See Section 4.3.2, "Value Representation" and Section 4.4, "Floating-Point Exceptions" for the cases not covered here.

The Arithmetic and Rounding and Conversion instructions round their intermediate results. With the exception of the Estimate instructions, these instructions produce an intermediate result that can be regarded as having infinite precision and unbounded exponent range. All but two groups of these instructions normalize or denormalize the intermediate result prior to rounding and then place the final result into the target FPR in double format. The Floating Round to Integer and Floating Convert To Integer instructions with biased exponents ranging from 1022 through 1074 are prepared for rounding by repetitively shifting the significand right one position and incrementing the biased exponent until it reaches a value of 1075. (Intermediate results with biased exponents 1075 or larger are already integers, and with biased exponents 1021 or less round to zero.)
After rounding, the final result for convert between floating-point double format and Floating Round to Integer is normalized and put in dou- fixed-point integer format in an FPR. Computation on ble format, and for Floating Convert To Integer is con- integer-valued operands may be performed using arith- verted to a signed fixed-point integer. metic instructions of the required precision. (The results FPSCR bits FR and FI generally indicate the results of may not be integer values.) The two groups of instruc- rounding. Each of the instructions which rounds its tions provided specifically to support integer-valued intermediate result sets these bits. If the fraction is operands are described below. incremented during rounding then FR is set to 1, other- 1. Floating Round to Integer wise FR is set to 0. If the result is inexact then FI is set to 1, otherwise FI is set to zero. The Round to Integer The Floating Round to Integer instructions round a instructions are exceptions to this rule, setting FR and double-precision operand to an integer value in FI to 0. The Estimate instructions set FR and FI to floating-point double format. These instructions undefined values. The remaining floating-point instruc- may cause Invalid Operation (VXSNAN) excep- tions do not alter FR and FI. tions. See Sections 4.3.6 and 4.5.1 for more infor- mation about rounding. Four user-selectable rounding modes are provided through the Floating-Point Rounding Control field in the 2. Floating Convert To/From Integer Chapter 4. Floating-Point Processor [Category: Floating-Point] 107 Version 2.05 FPSCR. See Section 4.2.2, "Floating-Point Status and Control Register". These are encoded as follows. 
    RN   Rounding Mode
    00   Round to Nearest
    01   Round toward Zero
    10   Round toward +Infinity
    11   Round toward -Infinity

Let Z be the intermediate arithmetic result or the operand of a convert operation. If Z can be represented exactly in the target format, then the result in all rounding modes is Z as represented in the target format. If Z cannot be represented exactly in the target format, let Z1 and Z2 bound Z as the next larger and next smaller numbers representable in the target format. Then Z1 or Z2 can be used to approximate the result in the target format.

Figure 53 shows the relation of Z, Z1, and Z2 in this case. The following rules specify the rounding in the four modes. "LSB" means "least significant bit".

[Figure: number line with 0 at the center, showing the infinitely precise value Z lying between Z2 and Z1, once for negative values and once for positive values; the bounding values are labeled "By Incrementing LSB of Z" and "By Truncating after LSB".]

Figure 53. Selection of Z1 and Z2

Round to Nearest
    Choose the value that is closer to Z (Z1 or Z2). In case of a tie, choose the one that is even (least significant bit 0).
Round toward Zero
    Choose the smaller in magnitude (Z1 or Z2).
Round toward +Infinity
    Choose Z1.
Round toward -Infinity
    Choose Z2.

See Section 4.5.1, "Execution Model for IEEE Operations" on page 113 for a detailed explanation of rounding.

4.4 Floating-Point Exceptions

This architecture defines the following floating-point exceptions:

• Invalid Operation Exception
      SNaN
      Infinity-Infinity
      Infinity÷Infinity
      Zero÷Zero
      Infinity×Zero
      Invalid Compare
      Software-Defined Condition
      Invalid Square Root
      Invalid Integer Convert
• Zero Divide Exception
• Overflow Exception
• Underflow Exception
• Inexact Exception

These exceptions, other than Invalid Operation Exception due to Software-Defined Condition, may occur during execution of computational instructions. An Invalid Operation Exception due to Software-Defined Condition occurs when a Move To FPSCR instruction sets FPSCR[VXSOFT] to 1.

Each floating-point exception, and each category of Invalid Operation Exception, has an exception bit in the FPSCR. In addition, each floating-point exception has a corresponding enable bit in the FPSCR. The exception bit indicates occurrence of the corresponding exception. If an exception occurs, the corresponding enable bit governs the result produced by the instruction and, in conjunction with the FE0 and FE1 bits (see page 109), whether and how the system floating-point enabled exception error handler is invoked. (In general, the enabling specified by the enable bit is of invoking the system error handler, not of permitting the exception to occur. The occurrence of an exception depends only on the instruction and its inputs, not on the setting of any control bits. The only deviation from this general rule is that the occurrence of an Underflow Exception may depend on the setting of the enable bit.)

A single instruction, other than mtfsfi or mtfsf, may set more than one exception bit only in the following cases:

• Inexact Exception may be set with Overflow Exception.
• Inexact Exception may be set with Underflow Exception.
• Invalid Operation Exception (SNaN) is set with Invalid Operation Exception (∞ × 0) for Multiply-Add instructions for which the values being multiplied are infinity and zero and the value being added is an SNaN.
• Invalid Operation Exception (SNaN) may be set with Invalid Operation Exception (Invalid Compare) for Compare Ordered instructions.
• Invalid Operation Exception (SNaN) may be set with Invalid Operation Exception (Invalid Integer Convert) for Convert To Integer instructions.

When an exception occurs the writing of a result to the target register may be suppressed or a result may be delivered, depending on the exception.

The writing of a result to the target register is suppressed for the following kinds of exception, so that there is no possibility that one of the operands is lost:

• Enabled Invalid Operation
• Enabled Zero Divide

For the remaining kinds of exception, a result is generated and written to the destination specified by the instruction causing the exception. The result may be a different value for the enabled and disabled conditions for some of these exceptions. The kinds of exception that deliver a result are the following:

• Disabled Invalid Operation
• Disabled Zero Divide
• Disabled Overflow
• Disabled Underflow
• Disabled Inexact
• Enabled Overflow
• Enabled Underflow
• Enabled Inexact

Subsequent sections define each of the floating-point exceptions and specify the action that is taken when they are detected.

The IEEE standard specifies the handling of exceptional conditions in terms of "traps" and "trap handlers". In this architecture, an FPSCR exception enable bit of 1 causes generation of the result value specified in the IEEE standard for the "trap enabled" case; the expectation is that the exception will be detected by software, which will revise the result. An FPSCR exception enable bit of 0 causes generation of the "default result" value specified for the "trap disabled" (or "no trap occurs" or "trap is not implemented") case; the expectation is that the exception will not be detected by software, which will simply use the default result. The result to be delivered in each case for each exception is described in the sections below.

The IEEE default behavior when an exception occurs is to generate a default value and not to notify software. In this architecture, if the IEEE default behavior when an exception occurs is desired for all exceptions, all FPSCR exception enable bits should be set to 0 and Ignore Exceptions Mode (see below) should be used. In this case the system floating-point enabled exception error handler is not invoked, even if floating-point exceptions occur: software can inspect the FPSCR exception bits if necessary, to determine whether exceptions have occurred.

In this architecture, if software is to be notified that a given kind of exception has occurred, the corresponding FPSCR exception enable bit must be set to 1 and a mode other than Ignore Exceptions Mode must be used. In this case the system floating-point enabled exception error handler is invoked if an enabled floating-point exception occurs. The system floating-point enabled exception error handler is also invoked if a Move To FPSCR instruction causes an exception bit and the corresponding enable bit both to be 1; the Move To FPSCR instruction is considered to cause the enabled exception.

The FE0 and FE1 bits control whether and how the system floating-point enabled exception error handler is invoked if an enabled floating-point exception occurs. The location of these bits and the requirements for altering them are described in Book III. (The system floating-point enabled exception error handler is never invoked because of a disabled floating-point exception.) The effects of the four possible settings of these bits are as follows.

    FE0 FE1  Description
    0   0    Ignore Exceptions Mode
             Floating-point exceptions do not cause the system floating-point enabled exception error handler to be invoked.
    0   1    Imprecise Nonrecoverable Mode
             The system floating-point enabled exception error handler is invoked at some point at or beyond the instruction that caused the enabled exception. It may not be possible to identify the excepting instruction or the data that caused the exception. Results produced by the excepting instruction may have been used by or may have affected subsequent instructions that are executed before the error handler is invoked.
    1   0    Imprecise Recoverable Mode
             The system floating-point enabled exception error handler is invoked at some point at or beyond the instruction that caused the enabled exception. Sufficient information is provided to the error handler that it can identify the excepting instruction and the operands, and correct the result. No results produced by the excepting instruction have been used by or have affected subsequent instructions that are executed before the error handler is invoked.
    1   1    Precise Mode
             The system floating-point enabled exception error handler is invoked precisely at the instruction that caused the enabled exception.

In all cases, the question of whether a floating-point result is stored, and what value is stored, is governed by the FPSCR exception enable bits, as described in subsequent sections, and is not affected by the value of the FE0 and FE1 bits.

In all cases in which the system floating-point enabled exception error handler is invoked, all instructions before the instruction at which the system floating-point enabled exception error handler is invoked have completed, and no instruction after the instruction at which the system floating-point enabled exception error handler is invoked has begun execution. The instruction at which the system floating-point enabled exception error handler is invoked has completed if it is the excepting instruction and there is only one such instruction. Otherwise it has not begun execution (or may have been partially executed in some cases, as described in Book III).

Programming Note

    In any of the three non-Precise modes, a Floating-Point Status and Control Register instruction can be used to force any exceptions, due to instructions initiated before the Floating-Point Status and Control Register instruction, to be recorded in the FPSCR. (This forcing is superfluous for Precise Mode.)

    In either of the Imprecise modes, a Floating-Point Status and Control Register instruction can be used to force any invocations of the system floating-point enabled exception error handler, due to instructions initiated before the Floating-Point Status and Control Register instruction, to occur. (This forcing has no effect in Ignore Exceptions Mode, and is superfluous for Precise Mode.)

    The last sentence of the paragraph preceding this Programming Note can apply only in the Imprecise modes, or if the mode has just been changed from Ignore Exceptions Mode to some other mode. (It always applies in the latter case.)

In order to obtain the best performance across the widest range of implementations, the programmer should obey the following guidelines.

• If the IEEE default results are acceptable to the application, Ignore Exceptions Mode should be used with all FPSCR exception enable bits set to 0.
• If the IEEE default results are not acceptable to the application, Imprecise Nonrecoverable Mode should be used, or Imprecise Recoverable Mode if recoverability is needed, with FPSCR exception enable bits set to 1 for those exceptions for which the system floating-point enabled exception error handler is to be invoked.
• Ignore Exceptions Mode should not, in general, be used when any FPSCR exception enable bits are set to 1.
• Precise Mode may degrade performance in some implementations, perhaps substantially, and therefore should be used only for debugging and other specialized applications.

4.4.1 Invalid Operation Exception

4.4.1.1 Definition

An Invalid Operation Exception occurs when an operand is invalid for the specified operation. The invalid operations are:

• Any floating-point operation on a Signaling NaN (SNaN)
• For add or subtract operations, magnitude subtraction of infinities (∞ - ∞)
• Division of infinity by infinity (∞ ÷ ∞)
• Division of zero by zero (0 ÷ 0)
• Multiplication of infinity by zero (∞ × 0)
• Ordered comparison involving a NaN (Invalid Compare)
• Square root or reciprocal square root of a negative (and nonzero) number (Invalid Square Root)
• Integer convert involving a number too large in magnitude to be represented in the target format, or involving an infinity or a NaN (Invalid Integer Convert)

An Invalid Operation Exception also occurs when an mtfsfi, mtfsf, or mtfsb1 instruction is executed that sets FPSCR[VXSOFT] to 1 (Software-Defined Condition).

4.4.1.2 Action

The action to be taken depends on the setting of the Invalid Operation Exception Enable bit of the FPSCR.

When Invalid Operation Exception is enabled (FPSCR[VE]=1) and an Invalid Operation Exception occurs, the following actions are taken:

1. One or two Invalid Operation Exceptions are set
       FPSCR[VXSNAN]   (if SNaN)
       FPSCR[VXISI]    (if ∞ - ∞)
       FPSCR[VXIDI]    (if ∞ ÷ ∞)
       FPSCR[VXZDZ]    (if 0 ÷ 0)
       FPSCR[VXIMZ]    (if ∞ × 0)
       FPSCR[VXVC]     (if invalid comp)
       FPSCR[VXSOFT]   (if sfw-def cond)
       FPSCR[VXSQRT]   (if invalid sqrt)
       FPSCR[VXCVI]    (if invalid int cvrt)
2. If the operation is an arithmetic, Floating Round to Single-Precision, Floating Round to Integer, or convert to integer operation,
       the target FPR is unchanged
       FPSCR[FR FI] are set to zero
       FPSCR[FPRF] is unchanged
3. If the operation is a compare,
       FPSCR[FR FI C] are unchanged
       FPSCR[FPCC] is set to reflect unordered
4. If an mtfsfi, mtfsf, or mtfsb1 instruction is executed that sets FPSCR[VXSOFT] to 1,
       The FPSCR is set as specified in the instruction description.

When Invalid Operation Exception is disabled (FPSCR[VE]=0) and an Invalid Operation Exception occurs, the following actions are taken:

1. One or two Invalid Operation Exceptions are set
       FPSCR[VXSNAN]   (if SNaN)
       FPSCR[VXISI]    (if ∞ - ∞)
       FPSCR[VXIDI]    (if ∞ ÷ ∞)
       FPSCR[VXZDZ]    (if 0 ÷ 0)
       FPSCR[VXIMZ]    (if ∞ × 0)
       FPSCR[VXVC]     (if invalid comp)
       FPSCR[VXSOFT]   (if sfw-def cond)
       FPSCR[VXSQRT]   (if invalid sqrt)
       FPSCR[VXCVI]    (if invalid int cvrt)
2. If the operation is an arithmetic or Floating Round to Single-Precision operation,
       the target FPR is set to a Quiet NaN
       FPSCR[FR FI] are set to zero
       FPSCR[FPRF] is set to indicate the class of the result (Quiet NaN)
3. If the operation is a convert to 64-bit integer operation,
       the target FPR is set as follows:
           FRT is set to the most positive 64-bit integer if the operand in FRB is a positive number or +∞, and to the most negative 64-bit integer if the operand in FRB is a negative number, -∞, or NaN
       FPSCR[FR FI] are set to zero
       FPSCR[FPRF] is undefined
4. If the operation is a convert to 32-bit integer operation,
       the target FPR is set as follows:
           FRT[0:31] ← undefined
           FRT[32:63] are set to the most positive 32-bit integer if the operand in FRB is a positive number or +infinity, and to the most negative 32-bit integer if the operand in FRB is a negative number, -infinity, or NaN
       FPSCR[FR FI] are set to zero
       FPSCR[FPRF] is undefined
5. If the operation is a compare,
       FPSCR[FR FI C] are unchanged
       FPSCR[FPCC] is set to reflect unordered
6. If an mtfsfi, mtfsf, or mtfsb1 instruction is executed that sets FPSCR[VXSOFT] to 1,
       The FPSCR is set as specified in the instruction description.

4.4.2 Zero Divide Exception

4.4.2.1 Definition

A Zero Divide Exception occurs when a Divide instruction is executed with a zero divisor value and a finite nonzero dividend value. It also occurs when a Reciprocal Estimate instruction (fre[s] or frsqrte[s]) is executed with an operand value of zero.

4.4.2.2 Action

The action to be taken depends on the setting of the Zero Divide Exception Enable bit of the FPSCR.

When Zero Divide Exception is enabled (FPSCR[ZE]=1) and a Zero Divide Exception occurs, the following actions are taken:

1. Zero Divide Exception is set
       FPSCR[ZX] ← 1
2. The target FPR is unchanged
3. FPSCR[FR FI] are set to zero
4. FPSCR[FPRF] is unchanged

When Zero Divide Exception is disabled (FPSCR[ZE]=0) and a Zero Divide Exception occurs, the following actions are taken:

1. Zero Divide Exception is set
       FPSCR[ZX] ← 1
2. The target FPR is set to ± Infinity, where the sign is determined by the XOR of the signs of the operands
3. FPSCR[FR FI] are set to zero
4. FPSCR[FPRF] is set to indicate the class and sign of the result (± Infinity)

4.4.3 Overflow Exception

4.4.3.1 Definition

An Overflow Exception occurs when the magnitude of what would have been the rounded result if the exponent range were unbounded exceeds that of the largest finite number of the specified result precision.

4.4.3.2 Action

The action to be taken depends on the setting of the Overflow Exception Enable bit of the FPSCR.

When Overflow Exception is enabled (FPSCR[OE]=1) and an Overflow Exception occurs, the following actions are taken:

1. Overflow Exception is set
       FPSCR[OX] ← 1
2. For double-precision arithmetic instructions, the exponent of the normalized intermediate result is adjusted by subtracting 1536
3. For single-precision arithmetic instructions and the Floating Round to Single-Precision instruction, the exponent of the normalized intermediate result is adjusted by subtracting 192
4. The adjusted rounded result is placed into the target FPR
5. FPSCR[FPRF] is set to indicate the class and sign of the result (± Normal Number)

When Overflow Exception is disabled (FPSCR[OE]=0) and an Overflow Exception occurs, the following actions are taken:

1. Overflow Exception is set
       FPSCR[OX] ← 1
2. Inexact Exception is set
       FPSCR[XX] ← 1
3. The result is determined by the rounding mode (FPSCR[RN]) and the sign of the intermediate result as follows:
       - Round to Nearest
             Store ± Infinity, where the sign is the sign of the intermediate result
       - Round toward Zero
             Store the format's largest finite number with the sign of the intermediate result
       - Round toward +Infinity
             For negative overflow, store the format's most negative finite number; for positive overflow, store +Infinity
       - Round toward -Infinity
             For negative overflow, store -Infinity; for positive overflow, store the format's largest finite number
4. The result is placed into the target FPR
5. FPSCR[FR] is undefined
6. FPSCR[FI] is set to 1
7. FPSCR[FPRF] is set to indicate the class and sign of the result (± Infinity or ± Normal Number)

4.4.4 Underflow Exception

4.4.4.1 Definition

Underflow Exception is defined separately for the enabled and disabled states:

• Enabled:
      Underflow occurs when the intermediate result is "Tiny".
• Disabled:
      Underflow occurs when the intermediate result is "Tiny" and there is "Loss of Accuracy".

A "Tiny" result is detected before rounding, when a nonzero intermediate result computed as though both the precision and the exponent range were unbounded would be less in magnitude than the smallest normalized number.

If the intermediate result is "Tiny" and Underflow Exception is disabled (FPSCR[UE]=0) then the intermediate result is denormalized (see Section 4.3.4, "Normalization and Denormalization" on page 106) and rounded (see Section 4.3.6, "Rounding" on page 107) before being placed into the target FPR.

"Loss of Accuracy" is detected when the delivered result value differs from what would have been computed were both the precision and the exponent range unbounded.

4.4.4.2 Action

The action to be taken depends on the setting of the Underflow Exception Enable bit of the FPSCR.

When Underflow Exception is enabled (FPSCR[UE]=1) and an Underflow Exception occurs, the following actions are taken:

1. Underflow Exception is set
       FPSCR[UX] ← 1
2. For double-precision arithmetic instructions, the exponent of the normalized intermediate result is adjusted by adding 1536
3. For single-precision arithmetic instructions and the Floating Round to Single-Precision instruction, the exponent of the normalized intermediate result is adjusted by adding 192
4. The adjusted rounded result is placed into the target FPR
5. FPSCR[FPRF] is set to indicate the class and sign of the result (± Normalized Number)

Programming Note

    The FR and FI bits are provided to allow the system floating-point enabled exception error handler, when invoked because of an Underflow Exception, to simulate a "trap disabled" environment. That is, the FR and FI bits allow the system floating-point enabled exception error handler to unround the result, thus allowing the result to be denormalized.

When Underflow Exception is disabled (FPSCR[UE]=0) and an Underflow Exception occurs, the following actions are taken:

1. Underflow Exception is set
       FPSCR[UX] ← 1
2. The rounded result is placed into the target FPR
3. FPSCR[FPRF] is set to indicate the class and sign of the result (± Normalized Number, ± Denormalized Number, or ± Zero)

4.4.5 Inexact Exception

4.4.5.1 Definition

An Inexact Exception occurs when one of two conditions occurs during rounding:

1. The rounded result differs from the intermediate result assuming both the precision and the exponent range of the intermediate result to be unbounded. In this case the result is said to be inexact. (If the rounding causes an enabled Overflow Exception or an enabled Underflow Exception, an Inexact Exception also occurs only if the significands of the rounded result and the intermediate result differ.)
2. The rounded result overflows and Overflow Exception is disabled.

4.4.5.2 Action

The action to be taken does not depend on the setting of the Inexact Exception Enable bit of the FPSCR.

When an Inexact Exception occurs, the following actions are taken:

1. Inexact Exception is set
       FPSCR[XX] ← 1
2. The rounded or overflowed result is placed into the target FPR
3. FPSCR[FPRF] is set to indicate the class and sign of the result

Programming Note

    In some implementations, enabling Inexact Exceptions may degrade performance more than does enabling other types of floating-point exception.

4.5 Floating-Point Execution Models

All implementations of this architecture must provide the equivalent of the following execution models to ensure that identical results are obtained.

Special rules are provided in the definition of the computational instructions for the infinities, denormalized numbers and NaNs. The material in the remainder of this section applies to instructions that have numeric operands and a numeric result (i.e., operands and result that are not infinities or NaNs), and that cause no exceptions. See Section 4.3.2 and Section 4.4 for the cases not covered here.

Although the double format specifies an 11-bit exponent, exponent arithmetic makes use of two additional bits to avoid potential transient overflow conditions. One extra bit is required when denormalized double-precision numbers are prenormalized. The second bit is required to permit the computation of the adjusted exponent value in the following cases when the corresponding exception enable bit is 1:

• Underflow during multiplication using a denormalized operand.
• Overflow during division using a denormalized divisor.

The IEEE standard includes 32-bit and 64-bit arithmetic. The standard requires that single-precision arithmetic be provided for single-precision operands. The standard permits double-precision floating-point operations to have either (or both) single-precision or double-precision operands, but states that single-precision floating-point operations should not accept double-precision operands. The Power ISA follows these guidelines; double-precision arithmetic instructions can have operands of either or both precisions, while single-precision arithmetic instructions require all operands to be single-precision. Double-precision arithmetic instructions and fcfid produce double-precision values, while single-precision arithmetic instructions produce single-precision values.

For arithmetic instructions, conversions from double-precision to single-precision must be done explicitly by software, while conversions from single-precision to double-precision are done implicitly.

4.5.1 Execution Model for IEEE Operations

The following description uses 64-bit arithmetic as an example. 32-bit arithmetic is similar except that the FRACTION is a 23-bit field, and the single-precision Guard, Round, and Sticky bits (described in this section) are logically adjacent to the 23-bit FRACTION field.

IEEE-conforming significand arithmetic is considered to be performed with a floating-point accumulator having the following format, where bits 0:55 comprise the significand of the intermediate result.

    S | C | L | FRACTION | G | R | X
    (bit positions marked in the figure: 0, 1, 53, 54, 55)

Figure 54. IEEE 64-bit execution model

The S bit is the sign bit.

The C bit is the carry bit, which captures the carry out of the significand.

The L bit is the leading unit bit of the significand, which receives the implicit bit from the operand.

The FRACTION is a 52-bit field that accepts the fraction of the operand.

The Guard (G), Round (R), and Sticky (X) bits are extensions to the low-order bits of the accumulator. The G and R bits are required for postnormalization of the result. The G, R, and X bits are required during rounding to determine if the intermediate result is equally near the two nearest representable values. The X bit serves as an extension to the G and R bits by representing the logical OR of all bits that may appear to the low-order side of the R bit, due either to shifting the accumulator right or to other generation of low-order result bits. The G and R bits participate in the left shifts with zeros being shifted into the R bit. Figure 55 shows the significance of the G, R, and X bits with respect to the intermediate result (IR), the representable number next lower in magnitude (NL), and the representable number next higher in magnitude (NH).

    G R X          Interpretation
    000            IR is exact
    001, 010, 011  IR closer to NL
    100            IR midway between NL and NH
    101, 110, 111  IR closer to NH

Figure 55. Interpretation of G, R, and X bits

Figure 56 shows the positions of the Guard, Round, and Sticky bits for double-precision and single-precision floating-point numbers relative to the accumulator illustrated in Figure 54.

    Format   Guard   Round   Sticky
    Double   G bit   R bit   X bit
    Single   24      25      OR of 26:52, G, R, X

Figure 56. Location of the Guard, Round, and Sticky bits in the IEEE execution model

The significand of the intermediate result is prepared for rounding by shifting its contents right, if required, until the least significant bit to be retained is in the low-order bit position of the fraction. Four user-selectable rounding modes are provided through FPSCR[RN] as described in Section 4.3.6, "Rounding" on page 107. Using Z1 and Z2 as defined on page 107, the rules for rounding in each mode are as follows.

• Round to Nearest
      Guard bit = 0
          The result is truncated. (Result exact (GRX=000) or closest to next lower value in magnitude (GRX=001, 010, or 011))
      Guard bit = 1
          Depends on Round and Sticky bits:
          Case a
              If the Round or Sticky bit is 1 (inclusive), the result is incremented. (Result closest to next higher value in magnitude (GRX=101, 110, or 111))
          Case b
              If the Round and Sticky bits are 0 (result midway between closest representable values), then if the low-order bit of the result is 1 the result is incremented. Otherwise (the low-order bit of the result is 0) the result is truncated (this is the case of a tie rounded to even).
• Round toward Zero
      Choose the smaller in magnitude of Z1 or Z2. If the Guard, Round, or Sticky bit is nonzero, the result is inexact.
• Round toward +Infinity
      Choose Z1.
• Round toward -Infinity
      Choose Z2.

If rounding results in a carry into C, the significand is shifted right one position and the exponent is incremented by one. This yields an inexact result, and possibly also exponent overflow. If any of the Guard, Round, or Sticky bits is nonzero, then the result is also inexact. Fraction bits are stored to the target FPR. For Floating Round to Integer, Floating Round to Single-Precision, and single-precision arithmetic instructions, low-order zeros must be appended as appropriate to fill out the double-precision fraction.

4.5.2 Execution Model for Multiply-Add Type Instructions

The Power ISA provides a special form of instruction that performs up to three operations in one instruction (a multiplication, an addition, and a negation). With this added capability comes the special ability to produce a more exact intermediate result as input to the rounder. 32-bit arithmetic is similar except that the FRACTION field is smaller.

If the instruction is Floating Negative Multiply-Add or Floating Negative Multiply-Subtract, the final result is negated.

Multiply-add significand arithmetic is considered to be performed with a floating-point accumulator having the following format, where bits 0:106 comprise the significand of the intermediate result.

    S | C | L | FRACTION | X'
    (bit positions marked in the figure: 0, 1, 2, 3, 106)

Figure 57. Multiply-add 64-bit execution model

The first part of the operation is a multiplication.
The multiplication has two 53-bit significands as inputs, which are assumed to be prenormalized, and produces a result conforming to the above model. If there is a carry out of the significand (into the C bit), then the sig- nificand is shifted right one position, shifting the L bit (leading unit bit) into the most significant bit of the FRACTION and shifting the C bit (carry out) into the L bit. All 106 bits (L bit, the FRACTION) of the product take part in the add operation. If the exponents of the two inputs to the adder are not equal, the significand of the operand with the smaller exponent is aligned (shifted) to the right by an amount that is added to that exponent to make it equal to the other input's exponent. Zeros are shifted into the left of the significand as it is aligned and bits shifted out of bit 105 of the significand are ORed into the X' bit. The add operation also pro- duces a result conforming to the above model with the X' bit taking part in the add operation. The result of the addition is then normalized, with all bits of the addition result, except the X' bit, participating in the shift. The normalized result serves as the inter- mediate result that is input to the rounder. For rounding, the conceptual Guard, Round, and Sticky bits are defined in terms of accumulator bits. Figure 58 shows the positions of the Guard, Round, and Sticky bits for double-precision and single-precision float- ing-point numbers in the multiply-add execution model. Format Guard Round Sticky Double 53 54 OR of 55:105, X' Single 24 25 OR of 26:105, X' Figure 58. Location of the Guard, Round, and Sticky bits in the multiply-add execution model The rules for rounding the intermediate result are the same as those given in Section 4.5.1. Chapter 4. 
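The rounding rules of Section 4.5.1 can be sketched in a few lines of Python. This is an illustrative model only, not part of the architecture: the function name, the mode strings, and the `width` parameter are hypothetical, and the sketch assumes (from Section 4.3.6, which is outside this section) that Z1 is the bounding value toward +Infinity and Z2 the bounding value toward -Infinity. The significand is treated as a magnitude, with the sign passed separately.

```python
def round_significand(sig, g, r, x, mode, negative=False, width=53):
    """Round a retained significand using the Guard, Round and Sticky bits.

    sig      -- retained bits (L bit and FRACTION) as a non-negative integer
    g, r, x  -- Guard, Round and Sticky bits (0 or 1)
    mode     -- "nearest", "zero", "+inf" or "-inf" (hypothetical names)
    negative -- True if the S bit of the intermediate result is 1
    width    -- number of retained bits (53 for double, 24 for single)

    Returns (significand, exponent_adjust, inexact).
    """
    inexact = bool(g or r or x)
    if mode == "nearest":
        # G=0: truncate. G=1: increment if R or X is set (case a), or on a
        # tie (R=X=0) when the low-order retained bit is 1 (case b).
        increment = g == 1 and (r == 1 or x == 1 or (sig & 1) == 1)
    elif mode == "zero":
        increment = False                    # smaller magnitude: truncate
    elif mode == "+inf":
        increment = inexact and not negative  # assumed meaning of "choose Z1"
    elif mode == "-inf":
        increment = inexact and negative      # assumed meaning of "choose Z2"
    else:
        raise ValueError("unknown rounding mode: %r" % (mode,))

    sig += int(increment)
    exponent_adjust = 0
    if sig >> width:          # carry into C: shift right, bump the exponent
        sig >>= 1
        exponent_adjust = 1
    return sig, exponent_adjust, inexact
```

For example, with a 4-bit significand, `round_significand(0b1011, 1, 0, 0, "nearest", width=4)` is a tie with an odd low-order bit, so the result is incremented to `0b1100`; the same call with `0b1010` truncates.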
4.6 Floating-Point Processor Instructions

For each instruction in this section that defines the use of an Rc bit, the behavior defined for the instruction corresponding to Rc=1 is considered part of the Floating-Point.Record category.

4.6.1 Floating-Point Storage Access Instructions

The Storage Access instructions compute the effective address (EA) of the storage to be accessed as described in Section 1.10.3, "Effective Address Calculation".

Programming Note
The la extended mnemonic permits computing an effective address as a Load or Store instruction would, but loads the address itself into a GPR rather than loading the value that is in storage at that address. This extended mnemonic is described in Section D.9, "Miscellaneous Mnemonics".

4.6.1.1 Storage Access Exceptions

Storage accesses will cause the system data storage error handler to be invoked if the program is not allowed to modify the target storage (Store only), or if the program attempts to access storage that is unavailable.

4.6.2 Floating-Point Load Instructions

There are three basic forms of load instruction: single-precision, double-precision, and integer. The integer form is provided by the Load Floating-Point as Integer Word Algebraic instruction, described later in this section. Because the FPRs support only floating-point double format, single-precision Load Floating-Point instructions convert single-precision data to double format prior to loading the operand into the target FPR. The conversion and loading steps are as follows.

Let WORD[0:31] be the floating-point single-precision operand accessed from storage.

Normalized Operand
    if WORD[1:8] > 0 and WORD[1:8] < 255 then
        FRT[0:1]  ← WORD[0:1]
        FRT[2]    ← ¬WORD[1]
        FRT[3]    ← ¬WORD[1]
        FRT[4]    ← ¬WORD[1]
        FRT[5:63] ← WORD[2:31] || ²⁹0

Denormalized Operand
    if WORD[1:8] = 0 and WORD[9:31] ≠ 0 then
        sign ← WORD[0]
        exp ← -126
        frac[0:52] ← 0b0 || WORD[9:31] || ²⁹0
        normalize the operand
            do while frac[0] = 0
                frac[0:52] ← frac[1:52] || 0b0
                exp ← exp - 1
        FRT[0]     ← sign
        FRT[1:11]  ← exp + 1023
        FRT[12:63] ← frac[1:52]

Zero / Infinity / NaN
    if WORD[1:8] = 255 or WORD[1:31] = 0 then
        FRT[0:1]  ← WORD[0:1]
        FRT[2]    ← WORD[1]
        FRT[3]    ← WORD[1]
        FRT[4]    ← WORD[1]
        FRT[5:63] ← WORD[2:31] || ²⁹0

For double-precision Load Floating-Point instructions and for the Load Floating-Point as Integer Word Algebraic instruction no conversion is required, as the data from storage are copied directly into the FPR.

Many of the Load Floating-Point instructions have an "update" form, in which register RA is updated with the effective address. For these forms, if RA≠0, the effective address is placed into register RA and the storage element (word or doubleword) addressed by EA is loaded into FRT.

Note: Recall that RA and RB denote General Purpose Registers, while FRT denotes a Floating-Point Register.

Load Floating-Point Single D-form

lfs FRT,D(RA)

Encoding (field, starting bit): 48 (0) | FRT (6) | RA (11) | D (16:31)

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(D)
    FRT ← DOUBLE(MEM(EA, 4))

Let the effective address (EA) be the sum (RA|0)+D.

The word in storage addressed by EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double format (see the conversion steps in Section 4.6.2) and placed into register FRT.

Special Registers Altered:
    None

Load Floating-Point Single Indexed X-form

lfsx FRT,RA,RB

Encoding (field, starting bit): 31 (0) | FRT (6) | RA (11) | RB (16) | 535 (21) | / (31)

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    FRT ← DOUBLE(MEM(EA, 4))

Let the effective address (EA) be the sum (RA|0)+(RB).

The word in storage addressed by EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double format (see the conversion steps in Section 4.6.2) and placed into register FRT.

Special Registers Altered:
    None

Load Floating-Point Single with Update D-form

lfsu FRT,D(RA)

Encoding (field, starting bit): 49 (0) | FRT (6) | RA (11) | D (16:31)

    EA ← (RA) + EXTS(D)
    FRT ← DOUBLE(MEM(EA, 4))
    RA ← EA

Let the effective address (EA) be the sum (RA)+D.

The word in storage addressed by EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double format (see the conversion steps in Section 4.6.2) and placed into register FRT.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Load Floating-Point Single with Update Indexed X-form

lfsux FRT,RA,RB

Encoding (field, starting bit): 31 (0) | FRT (6) | RA (11) | RB (16) | 567 (21) | / (31)

    EA ← (RA) + (RB)
    FRT ← DOUBLE(MEM(EA, 4))
    RA ← EA

Let the effective address (EA) be the sum (RA)+(RB).

The word in storage addressed by EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double format (see the conversion steps in Section 4.6.2) and placed into register FRT.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Load Floating-Point Double D-form

lfd FRT,D(RA)

Encoding (field, starting bit): 50 (0) | FRT (6) | RA (11) | D (16:31)

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(D)
    FRT ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0)+D.

The doubleword in storage addressed by EA is loaded into register FRT.

Special Registers Altered:
    None

Load Floating-Point Double Indexed X-form

lfdx FRT,RA,RB

Encoding (field, starting bit): 31 (0) | FRT (6) | RA (11) | RB (16) | 599 (21) | / (31)

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    FRT ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0)+(RB).

The doubleword in storage addressed by EA is loaded into register FRT.

Special Registers Altered:
    None

Load Floating-Point Double with Update D-form

lfdu FRT,D(RA)

Encoding (field, starting bit): 51 (0) | FRT (6) | RA (11) | D (16:31)

    EA ← (RA) + EXTS(D)
    FRT ← MEM(EA, 8)
    RA ← EA

Let the effective address (EA) be the sum (RA)+D.

The doubleword in storage addressed by EA is loaded into register FRT.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Load Floating-Point Double with Update Indexed X-form

lfdux FRT,RA,RB

Encoding (field, starting bit): 31 (0) | FRT (6) | RA (11) | RB (16) | 631 (21) | / (31)

    EA ← (RA) + (RB)
    FRT ← MEM(EA, 8)
    RA ← EA

Let the effective address (EA) be the sum (RA)+(RB).

The doubleword in storage addressed by EA is loaded into register FRT.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Load Floating-Point as Integer Word Algebraic Indexed X-form

lfiwax FRT,RA,RB

Encoding (field, starting bit): 31 (0) | FRT (6) | RA (11) | RB (16) | 855 (21) | / (31)

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    FRT ← EXTS(MEM(EA, 4))

Let the effective address (EA) be the sum (RA|0)+(RB).

The word in storage addressed by EA is loaded into FRT[32:63]. FRT[0:31] are filled with a copy of bit 0 of the loaded word.

Special Registers Altered:
    None

4.6.3 Floating-Point Store Instructions

There are three basic forms of store instruction: single-precision, double-precision, and integer. The integer form is provided by the Store Floating-Point as Integer Word instruction, described later in this section. Because the FPRs support only floating-point double format for floating-point data, single-precision Store Floating-Point instructions convert double-precision data to single format prior to storing the operand into storage. The conversion steps are as follows.

Let WORD[0:31] be the word in storage written to.

No Denormalization Required (includes Zero / Infinity / NaN)
    if FRS[1:11] > 896 or FRS[1:63] = 0 then
        WORD[0:1]  ← FRS[0:1]
        WORD[2:31] ← FRS[5:34]

Denormalization Required
    if 874 ≤ FRS[1:11] ≤ 896 then
        sign ← FRS[0]
        exp ← FRS[1:11] - 1023
        frac[0:52] ← 0b1 || FRS[12:63]
        denormalize operand
            do while exp < -126
                frac[0:52] ← 0b0 || frac[0:51]
                exp ← exp + 1
        WORD[0]    ← sign
        WORD[1:8]  ← 0x00
        WORD[9:31] ← frac[1:23]
    else WORD ← undefined

Notice that if the value to be stored by a single-precision Store Floating-Point instruction is larger in magnitude than the maximum number representable in single format, the first case above (No Denormalization Required) applies. The result stored in WORD is then a well-defined value, but is not numerically equal to the value in the source register (i.e., the result of a single-precision Load Floating-Point from WORD will not compare equal to the contents of the original source register).

For double-precision Store Floating-Point instructions and for the Store Floating-Point as Integer Word instruction no conversion is required, as the data from the FPR are copied directly into storage.

Many of the Store Floating-Point instructions have an "update" form, in which register RA is updated with the effective address. For these forms, if RA≠0, the effective address is placed into register RA.

Note: Recall that RA and RB denote General Purpose Registers, while FRS denotes a Floating-Point Register.
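The single-to-double conversion of Section 4.6.2 and the double-to-single conversion of Section 4.6.3 can be modelled directly on integer bit patterns. The following Python sketch follows the three load cases and the two store cases step by step. The function names and the `bits` helper are hypothetical, operands are passed as unsigned integers (a 32-bit word and a 64-bit FPR image), and the "undefined" store result is represented by `None`; none of these choices come from the architecture.

```python
import struct


def bits(value, width, first, last):
    """Return bits first:last of a width-bit value, where bit 0 is the MSB."""
    return (value >> (width - 1 - last)) & ((1 << (last - first + 1)) - 1)


def double_from_single(word):
    """Model of DOUBLE(WORD): 32-bit single pattern -> 64-bit FPR pattern."""
    exp = bits(word, 32, 1, 8)
    if exp == 0 and bits(word, 32, 9, 31) != 0:
        # Denormalized operand: normalize, then rebias the exponent.
        sign = bits(word, 32, 0, 0)
        e = -126
        frac = bits(word, 32, 9, 31) << 29           # frac[0:52], frac[0] = 0
        while bits(frac, 53, 0, 0) == 0:
            frac = (frac << 1) & ((1 << 53) - 1)
            e -= 1
        return (sign << 63) | ((e + 1023) << 52) | bits(frac, 53, 1, 52)
    w1 = bits(word, 32, 1, 1)
    # Normalized operands replicate NOT WORD[1] into FRT[2:4];
    # zero, infinity and NaN replicate WORD[1] itself.
    fill = w1 ^ 1 if 0 < exp < 255 else w1
    top = (bits(word, 32, 0, 1) << 3) | (fill * 0b111)   # FRT[0:4]
    return (top << 59) | (bits(word, 32, 2, 31) << 29)   # FRT[5:63]


def single_from_double(frs):
    """Model of SINGLE((FRS)): 64-bit FPR pattern -> 32-bit word, or None."""
    exp_field = bits(frs, 64, 1, 11)
    if exp_field > 896 or bits(frs, 64, 1, 63) == 0:
        # No denormalization required (includes zero, infinity, NaN).
        return (bits(frs, 64, 0, 1) << 30) | bits(frs, 64, 5, 34)
    if 874 <= exp_field <= 896:
        sign = bits(frs, 64, 0, 0)
        exp = exp_field - 1023
        frac = (1 << 52) | bits(frs, 64, 12, 63)     # 0b1 || FRS[12:63]
        while exp < -126:
            frac >>= 1                               # 0b0 || frac[0:51]
            exp += 1
        return (sign << 31) | bits(frac, 53, 1, 23)  # WORD[1:8] = 0x00
    return None                                      # WORD is undefined


def single_pattern(value):
    """Bit pattern of a Python float packed as an IEEE single (helper)."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def double_pattern(value):
    """Bit pattern of a Python float packed as an IEEE double (helper)."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]
```

As a usage example, `double_from_single(single_pattern(2.0 ** -149))` yields the double pattern with biased exponent 874, which is why 874 is the lower bound of the "Denormalization Required" range on the store side. Values whose exponent field lies below 874 (and that are not zero) fall into the undefined case.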
Store Floating-Point Single D-form

stfs FRS,D(RA)

Encoding (field, starting bit): 52 (0) | FRS (6) | RA (11) | D (16:31)

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(D)
    MEM(EA, 4) ← SINGLE((FRS))

Let the effective address (EA) be the sum (RA|0)+D.

The contents of register FRS are converted to single format (see the conversion steps in Section 4.6.3) and stored into the word in storage addressed by EA.

Special Registers Altered:
    None

Store Floating-Point Single Indexed X-form

stfsx FRS,RA,RB

Encoding (field, starting bit): 31 (0) | FRS (6) | RA (11) | RB (16) | 663 (21) | / (31)

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 4) ← SINGLE((FRS))

Let the effective address (EA) be the sum (RA|0)+(RB).

The contents of register FRS are converted to single format (see the conversion steps in Section 4.6.3) and stored into the word in storage addressed by EA.

Special Registers Altered:
    None

Store Floating-Point Single with Update D-form

stfsu FRS,D(RA)

Encoding (field, starting bit): 53 (0) | FRS (6) | RA (11) | D (16:31)

    EA ← (RA) + EXTS(D)
    MEM(EA, 4) ← SINGLE((FRS))
    RA ← EA

Let the effective address (EA) be the sum (RA)+D.

The contents of register FRS are converted to single format (see the conversion steps in Section 4.6.3) and stored into the word in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Floating-Point Single with Update Indexed X-form

stfsux FRS,RA,RB

Encoding (field, starting bit): 31 (0) | FRS (6) | RA (11) | RB (16) | 695 (21) | / (31)

    EA ← (RA) + (RB)
    MEM(EA, 4) ← SINGLE((FRS))
    RA ← EA

Let the effective address (EA) be the sum (RA)+(RB).

The contents of register FRS are converted to single format (see the conversion steps in Section 4.6.3) and stored into the word in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Floating-Point Double D-form

stfd FRS,D(RA)

Encoding (field, starting bit): 54 (0) | FRS (6) | RA (11) | D (16:31)

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(D)
    MEM(EA, 8) ← (FRS)

Let the effective address (EA) be the sum (RA|0)+D.

The contents of register FRS are stored into the doubleword in storage addressed by EA.

Special Registers Altered:
    None

Store Floating-Point Double Indexed X-form

stfdx FRS,RA,RB

Encoding (field, starting bit): 31 (0) | FRS (6) | RA (11) | RB (16) | 727 (21) | / (31)

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 8) ← (FRS)

Let the effective address (EA) be the sum (RA|0)+(RB).

The contents of register FRS are stored into the doubleword in storage addressed by EA.

Special Registers Altered:
    None

Store Floating-Point Double with Update D-form

stfdu FRS,D(RA)

Encoding (field, starting bit): 55 (0) | FRS (6) | RA (11) | D (16:31)

    EA ← (RA) + EXTS(D)
    MEM(EA, 8) ← (FRS)
    RA ← EA

Let the effective address (EA) be the sum (RA)+D.

The contents of register FRS are stored into the doubleword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Floating-Point Double with Update Indexed X-form

stfdux FRS,RA,RB

Encoding (field, starting bit): 31 (0) | FRS (6) | RA (11) | RB (16) | 759 (21) | / (31)

    EA ← (RA) + (RB)
    MEM(EA, 8) ← (FRS)
    RA ← EA

Let the effective address (EA) be the sum (RA)+(RB).

The contents of register FRS are stored into the doubleword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Floating-Point as Integer Word Indexed X-form

stfiwx FRS,RA,RB

Encoding (field, starting bit): 31 (0) | FRS (6) | RA (11) | RB (16) | 983 (21) | / (31)

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 4) ← (FRS)[32:63]

Let the effective address (EA) be the sum (RA|0)+(RB).

(FRS)[32:63] are stored, without conversion, into the word in storage addressed by EA.

If the contents of register FRS were produced, either directly or indirectly, by a Load Floating-Point Single instruction, a single-precision Arithmetic instruction, or frsp, then the value stored is undefined. (The contents of register FRS are produced directly by such an instruction if FRS is the target register for the instruction. The contents of register FRS are produced indirectly by such an instruction if FRS is the final target register of a sequence of one or more Floating-Point Move instructions, with the input to the sequence having been produced directly by such an instruction.)

Special Registers Altered:
    None

4.6.4 Floating-Point Load Store Doubleword Pair Instructions [Category: Floating-Point.Phased-Out]

For lfdp[x], the doubleword-pair in storage addressed by EA is loaded into an even-odd pair of FPRs with the even-numbered FPR being loaded with the leftmost doubleword from storage and the odd-numbered FPR being loaded with the rightmost doubleword.

For stfdp[x], the content of an even-odd pair of FPRs is stored into the doubleword-pair in storage addressed by EA, with the even-numbered FPR being stored into the leftmost doubleword in storage and the odd-numbered FPR being stored into the rightmost doubleword.

Programming Note
The instructions described in this section should not be used to access an operand in DFP128 format when MSR[LE]=1.

Load Floating-Point Double Pair DS-form

lfdp FRTp,DS(RA)

Encoding (field, starting bit): 57 (0) | FRTp (6) | RA (11) | DS (16) | 00 (30:31)

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(DS||0b00)
    FRTp ← MEM(EA, 16)

Let the effective address (EA) be the sum (RA|0) + (DS||0b00). The doubleword-pair in storage addressed by EA is placed into register-pair FRTp.

If FRTp is odd, the instruction form is invalid.

Special Registers Altered:
    None

Store Floating-Point Double Pair DS-form

stfdp FRSp,DS(RA)

Encoding (field, starting bit): 61 (0) | FRSp (6) | RA (11) | DS (16) | 00 (30:31)

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(DS||0b00)
    MEM(EA, 16) ← FRSp

Let the effective address (EA) be the sum (RA|0) + (DS||0b00). The contents of register-pair FRSp are stored into the doubleword-pair in storage addressed by EA.

If FRSp is odd, the instruction form is invalid.

Special Registers Altered:
    None

Load Floating-Point Double Pair Indexed X-form

lfdpx FRTp,RA,RB

Encoding (field, starting bit): 31 (0) | FRTp (6) | RA (11) | RB (16) | 791 (21) | / (31)

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    FRTp ← MEM(EA, 16)

Let the effective address (EA) be the sum (RA|0) + (RB). The doubleword-pair in storage addressed by EA is placed into register-pair FRTp.

If FRTp is odd, the instruction form is invalid.

Special Registers Altered:
    None

Store Floating-Point Double Pair Indexed X-form

stfdpx FRSp,RA,RB

Encoding (field, starting bit): 31 (0) | FRSp (6) | RA (11) | RB (16) | 919 (21) | / (31)

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 16) ← FRSp

Let the effective address (EA) be the sum (RA|0) + (RB). The contents of register-pair FRSp are stored into the doubleword-pair in storage addressed by EA.

If FRSp is odd, the instruction form is invalid.

Special Registers Altered:
    None

4.6.5 Floating-Point Move Instructions

These instructions copy data from one floating-point register to another, altering the sign bit (bit 0) as described below for fneg, fabs, fnabs, and fcpsgn. These instructions treat NaNs just like any other kind of value (e.g., the sign bit of a NaN may be altered by fneg, fabs, fnabs, and fcpsgn). These instructions do not alter the FPSCR.

Floating Move Register X-form

fmr  FRT,FRB (Rc=0)
fmr. FRT,FRB (Rc=1)

Encoding (field, starting bit): 63 (0) | FRT (6) | /// (11) | FRB (16) | 72 (21) | Rc (31)

The contents of register FRB are placed into register FRT.

Special Registers Altered:
    CR1 (if Rc=1)

Floating Negate X-form

fneg  FRT,FRB (Rc=0)
fneg. FRT,FRB (Rc=1)

Encoding (field, starting bit): 63 (0) | FRT (6) | /// (11) | FRB (16) | 40 (21) | Rc (31)

The contents of register FRB with bit 0 inverted are placed into register FRT.

Special Registers Altered:
    CR1 (if Rc=1)

Floating Absolute Value X-form

fabs  FRT,FRB (Rc=0)
fabs. FRT,FRB (Rc=1)

Encoding (field, starting bit): 63 (0) | FRT (6) | /// (11) | FRB (16) | 264 (21) | Rc (31)

The contents of register FRB with bit 0 set to zero are placed into register FRT.

Special Registers Altered:
    CR1 (if Rc=1)

Floating Copy Sign X-form

fcpsgn  FRT,FRA,FRB (Rc=0)
fcpsgn. FRT,FRA,FRB (Rc=1)

Encoding (field, starting bit): 63 (0) | FRT (6) | FRA (11) | FRB (16) | 8 (21) | Rc (31)

The contents of register FRB with bit 0 set to the value of bit 0 of register FRA are placed into register FRT.

Special Registers Altered:
    CR1 (if Rc=1)

Floating Negative Absolute Value X-form

fnabs  FRT,FRB (Rc=0)
fnabs. FRT,FRB (Rc=1)

Encoding (field, starting bit): 63 (0) | FRT (6) | /// (11) | FRB (16) | 136 (21) | Rc (31)

The contents of register FRB with bit 0 set to one are placed into register FRT.

Special Registers Altered:
    CR1 (if Rc=1)

4.6.6 Floating-Point Arithmetic Instructions

4.6.6.1 Floating-Point Elementary Arithmetic Instructions

Floating Add [Single] A-form

fadd  FRT,FRA,FRB (Rc=0)
fadd. FRT,FRA,FRB (Rc=1)

Encoding (field, starting bit): 63 (0) | FRT (6) | FRA (11) | FRB (16) | /// (21) | 21 (26) | Rc (31)

fadds  FRT,FRA,FRB (Rc=0)
fadds. FRT,FRA,FRB (Rc=1)

Encoding (field, starting bit): 59 (0) | FRT (6) | FRA (11) | FRB (16) | /// (21) | 21 (26) | Rc (31)

The floating-point operand in register FRA is added to the floating-point operand in register FRB.

If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT.

Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation.

If a carry occurs, the sum's significand is shifted right one bit position and the exponent is increased by one.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1.

Special Registers Altered:
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI
    CR1 (if Rc=1)

Floating Subtract [Single] A-form

fsub  FRT,FRA,FRB (Rc=0)
fsub. FRT,FRA,FRB (Rc=1)

Encoding (field, starting bit): 63 (0) | FRT (6) | FRA (11) | FRB (16) | /// (21) | 20 (26) | Rc (31)

fsubs  FRT,FRA,FRB (Rc=0)
fsubs. FRT,FRA,FRB (Rc=1)

Encoding (field, starting bit): 59 (0) | FRT (6) | FRA (11) | FRB (16) | /// (21) | 20 (26) | Rc (31)

The floating-point operand in register FRB is subtracted from the floating-point operand in register FRA.

If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT.

The execution of the Floating Subtract instruction is identical to that of Floating Add, except that the contents of FRB participate in the operation with the sign bit (bit 0) inverted.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1.

Special Registers Altered:
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI
    CR1 (if Rc=1)

Floating Multiply [Single] A-form

fmul  FRT,FRA,FRC (Rc=0)
fmul. FRT,FRA,FRC (Rc=1)

Encoding (field, starting bit): 63 (0) | FRT (6) | FRA (11) | /// (16) | FRC (21) | 25 (26) | Rc (31)

fmuls  FRT,FRA,FRC (Rc=0)
fmuls. FRT,FRA,FRC (Rc=1)

Encoding (field, starting bit): 59 (0) | FRT (6) | FRA (11) | /// (16) | FRC (21) | 25 (26) | Rc (31)

The floating-point operand in register FRA is multiplied by the floating-point operand in register FRC.

If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT.

Floating-point multiplication is based on exponent addition and multiplication of the significands.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1.

Special Registers Altered:
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXIMZ
    CR1 (if Rc=1)

Floating Divide [Single] A-form

fdiv  FRT,FRA,FRB (Rc=0)
fdiv. FRT,FRA,FRB (Rc=1)

Encoding (field, starting bit): 63 (0) | FRT (6) | FRA (11) | FRB (16) | /// (21) | 18 (26) | Rc (31)

fdivs  FRT,FRA,FRB (Rc=0)
fdivs. FRT,FRA,FRB (Rc=1)

Encoding (field, starting bit): 59 (0) | FRT (6) | FRA (11) | FRB (16) | /// (21) | 18 (26) | Rc (31)

The floating-point operand in register FRA is divided by the floating-point operand in register FRB. The remainder is not supplied as a result.

If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT.

Floating-point division is based on exponent subtraction and division of the significands.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1 and Zero Divide Exceptions when FPSCR[ZE]=1.

Special Registers Altered:
    FPRF FR FI
    FX OX UX ZX XX
    VXSNAN VXIDI VXZDZ
    CR1 (if Rc=1)

Floating Square Root [Single] A-form

fsqrt  FRT,FRB (Rc=0)
fsqrt. FRT,FRB (Rc=1)

Encoding (field, starting bit): 63 (0) | FRT (6) | /// (11) | FRB (16) | /// (21) | 22 (26) | Rc (31)

fsqrts  FRT,FRB (Rc=0)
fsqrts. FRT,FRB (Rc=1)

Encoding (field, starting bit): 59 (0) | FRT (6) | /// (11) | FRB (16) | /// (21) | 22 (26) | Rc (31)

The square root of the floating-point operand in register FRB is placed into register FRT.

If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT.

Operation with various special values of the operand is summarized below.

    Operand   Result   Exception
    -∞        QNaN¹    VXSQRT
    <0        QNaN¹    VXSQRT
    -0        -0       None
    +∞        +∞       None
    SNaN      QNaN¹    VXSNAN
    QNaN      QNaN     None

    ¹ No result if FPSCR[VE] = 1

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1.

Special Registers Altered:
    FPRF FR FI
    FX XX
    VXSNAN VXSQRT
    CR1 (if Rc=1)

Floating Reciprocal Estimate [Single] A-form

fre  FRT,FRB,L (Rc=0)
fre. FRT,FRB,L (Rc=1)
[Category: Floating-Point.Phased-In (sV2.05)]

Encoding (field, starting bit): 63 (0) | FRT (6) | /// (11) | L (15) | FRB (16) | /// (21) | 24 (26) | Rc (31)

fres  FRT,FRB,L (Rc=0)
fres. FRT,FRB,L (Rc=1)
[Category: Floating-Point.Phased-In (sV2.05)]

Encoding (field, starting bit): 59 (0) | FRT (6) | /// (11) | L (15) | FRB (16) | /// (21) | 24 (26) | Rc (31)

An estimate of the reciprocal of the floating-point operand in register FRB is placed into register FRT. The estimate placed into register FRT is correct to a precision of one part in 256 of the reciprocal of (FRB), i.e.,

    \mathrm{ABS}\left( \frac{\mathit{estimate} - 1/x}{1/x} \right) \le \frac{1}{256}

where x is the initial value in FRB.

Operation with various special values of the operand is summarized below.

    Operand   Result   Exception
    -∞        -0       None
    -0        -∞¹      ZX
    +0        +∞¹      ZX
    +∞        +0       None
    SNaN      QNaN²    VXSNAN
    QNaN      QNaN     None

    ¹ No result if FPSCR[ZE] = 1.
    ² No result if FPSCR[VE] = 1.

If L=1 [Category: Phased-Out], an operand may be treated as if it were zero having the same sign as the operand in the following cases.

- The operand is a denormalized number.
- The operand would be a denormalized number in single format and was produced by a Load Floating-Point Single instruction, a single-precision arithmetic instruction, or frsp, or by a sequence of one or more Floating-Point Move instructions for which the input to the sequence was produced by such an instruction.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1 and Zero Divide Exceptions when FPSCR[ZE]=1.

The results of executing this instruction may vary between implementations, and between different executions on the same implementation.

Special Registers Altered:
    FPRF FR (undefined) FI (undefined)
    FX OX UX ZX XX (undefined)
    VXSNAN
    CR1 (if Rc=1)

Programming Note
fre and fres serve as both basic and extended mnemonics. The Assembler will recognize a fre or fres mnemonic with three operands as the basic form, and a fre or fres mnemonic with two operands as the extended form. In the extended form the L operand is omitted and assumed to be 0.

Programming Note
For the Floating-Point Estimate instructions, some implementations might implement a precision higher than the minimum architected precision. Thus, a program may take advantage of the higher precision instructions to increase performance by decreasing the iterations needed for software emulation of floating-point instructions. However, there is no guarantee given about the precision, which may vary (up or down) between implementations. Only programs targeted at a specific implementation (i.e., the program will not be migrated to another implementation) should take advantage of the higher precision of the instructions. All other programs should rely on the minimum architected precision, which will guarantee the program to run properly across different implementations.

Programming Note
In some implementations execution of fre[s][.] with L=1 may have a shorter latency than execution with L=0.

Floating Reciprocal Square Root Estimate [Single] A-form

frsqrte  FRT,FRB,L (Rc=0)
frsqrte. FRT,FRB,L (Rc=1)
[Category: Floating-Point.Phased-In (sV2.05)]

Encoding (field, starting bit): 63 (0) | FRT (6) | /// (11) | L (15) | FRB (16) | /// (21) | 26 (26) | Rc (31)

frsqrtes  FRT,FRB,L (Rc=0)
frsqrtes. FRT,FRB,L (Rc=1)
[Category: Floating-Point.Phased-In (sV2.05)]

Encoding (field, starting bit): 59 (0) | FRT (6) | /// (11) | L (15) | FRB (16) | /// (21) | 26 (26) | Rc (31)

An estimate of the reciprocal of the square root of the floating-point operand in register FRB is placed into register FRT. The estimate placed into register FRT is correct to a precision of one part in 32 of the reciprocal of the square root of (FRB), i.e.,

    \mathrm{ABS}\left( \frac{\mathit{estimate} - 1/\sqrt{x}}{1/\sqrt{x}} \right) \le \frac{1}{32}

where x is the initial value in FRB.

Operation with various special values of the operand is summarized below.

    Operand   Result   Exception
    -∞        QNaN²    VXSQRT
    <0        QNaN²    VXSQRT
    -0        -∞¹      ZX
    +0        +∞¹      ZX
    +∞        +0       None
    SNaN      QNaN²    VXSNAN
    QNaN      QNaN     None

    ¹ No result if FPSCR[ZE] = 1.
    ² No result if FPSCR[VE] = 1.

If L=1 [Category: Phased-Out], an operand may be treated as if it were zero having the same sign as the operand in the following cases.

- The operand is a denormalized number.
- The operand would be a denormalized number in single format and was produced by a Load Floating-Point Single instruction, a single-precision arithmetic instruction, or frsp, or by a sequence of one or more Floating-Point Move instructions for which the input to the sequence was produced by such an instruction.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1 and Zero Divide Exceptions when FPSCR[ZE]=1.

The results of executing this instruction may vary between implementations, and between different executions on the same implementation.

Special Registers Altered:
    FPRF FR (undefined) FI (undefined)
    FX ZX XX (undefined)
    VXSNAN VXSQRT
    CR1 (if Rc=1)

Note
See the Notes that appear with fre[s].
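The estimate instructions are typically used as a seed for software refinement, which is the "iterations" that the Programming Note refers to. The following Python sketch is only an illustration of that idea and is not taken from the architecture: it uses the standard Newton-Raphson recurrences for 1/x and 1/√x, the function names are hypothetical, and the seed is simulated in ordinary double arithmetic rather than produced by fre or frsqrte. `within_bound` restates the architected accuracy conditions (one part in 256 for the reciprocal estimate, one part in 32 for the reciprocal square root estimate).

```python
import math


def within_bound(estimate, exact, parts):
    """True if ABS((estimate - exact) / exact) <= 1/parts."""
    return abs((estimate - exact) / exact) <= 1.0 / parts


def refine_reciprocal(x, estimate, steps):
    """Newton-Raphson refinement of an estimate of 1/x.

    Each step roughly squares the relative error, so a seed that is
    good to 1 part in 256 needs about three steps for double precision.
    """
    y = estimate
    for _ in range(steps):
        y = y * (2.0 - x * y)
    return y


def refine_rsqrt(x, estimate, steps):
    """Newton-Raphson refinement of an estimate of 1/sqrt(x)."""
    y = estimate
    for _ in range(steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y


# Simulated seeds that satisfy the minimum architected precision.
seed_recip = (1.0 / 3.0) * (1.0 + 1.0 / 300.0)
seed_rsqrt = (1.0 / math.sqrt(2.0)) * (1.0 + 1.0 / 40.0)
```

Because the step count is derived from the minimum architected precision, a program written this way does not depend on an implementation providing a better estimate, in line with the advice in the Programming Note.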
Floating-Point Processor [Category: Floating-Point] 131 Version 2.05 4.6.6.2 Floating-Point Multiply-Add Instructions These instructions combine a multiply and an add oper- based on the final result of the operation, and not ation without an intermediate rounding operation. The on the result of the multiplication. fraction part of the intermediate product is 106 bits wide 1 Invalid Operation Exception bits are set as if the (L bit, FRACTION), and all 106 bits take part in the add/ multiplication and the addition were performed subtract portion of the instruction. using two separate instructions (fmul[s], followed Status bits are set as follows. by fadd[s] or fsub[s]). That is, multiplication of infinity by 0 or of anything by an SNaN, and/or 1 Overflow, Underflow, and Inexact Exception bits, addition of an SNaN, cause the corresponding the FR and FI bits, and the FPRF field are set exception bits to be set. Floating Multiply-Add [Single] A-form Floating Multiply-Subtract [Single] A-form fmadd FRT,FRA,FRC,FRB (Rc=0) fmsub FRT,FRA,FRC,FRB (Rc=0) fmadd. FRT,FRA,FRC,FRB (Rc=1) fmsub. FRT,FRA,FRC,FRB (Rc=1) 63 FRT FRA FRB FRC 29 Rc 63 FRT FRA FRB FRC 28 Rc 0 6 11 16 21 26 31 0 6 11 16 21 26 31 fmadds FRT,FRA,FRC,FRB (Rc=0) fmsubs FRT,FRA,FRC,FRB (Rc=0) fmadds. FRT,FRA,FRC,FRB (Rc=1) fmsubs. FRT,FRA,FRC,FRB (Rc=1) 59 FRT FRA FRB FRC 29 Rc 59 FRT FRA FRB FRC 28 Rc 0 6 11 16 21 26 31 0 6 11 16 21 26 31 The operation The operation FRT 1 [(FRA)×(FRC)] + (FRB) FRT 1 [(FRA)×(FRC)] - (FRB) is performed. is performed. The floating-point operand in register FRA is multiplied The floating-point operand in register FRA is multiplied by the floating-point operand in register FRC. The by the floating-point operand in register FRC. The floating-point operand in register FRB is added to this floating-point operand in register FRB is subtracted intermediate result. from this intermediate result. 
The following applies to both fmadd[s] and fmsub[s].

If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1.

Special Registers Altered:
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
    CR1 (if Rc=1)

Floating Negative Multiply-Add [Single] A-form

    fnmadd    FRT,FRA,FRC,FRB    (Rc=0)
    fnmadd.   FRT,FRA,FRC,FRB    (Rc=1)

    Field:  63   FRT   FRA   FRB   FRC   31   Rc
    Bit:    0    6     11    16    21    26   31

    fnmadds   FRT,FRA,FRC,FRB    (Rc=0)
    fnmadds.  FRT,FRA,FRC,FRB    (Rc=1)

    Field:  59   FRT   FRA   FRB   FRC   31   Rc
    Bit:    0    6     11    16    21    26   31

The operation

    FRT ← - ( [(FRA)×(FRC)] + (FRB) )

is performed.

The floating-point operand in register FRA is multiplied by the floating-point operand in register FRC. The floating-point operand in register FRB is added to this intermediate result.

Floating Negative Multiply-Subtract [Single] A-form

    fnmsub    FRT,FRA,FRC,FRB    (Rc=0)
    fnmsub.   FRT,FRA,FRC,FRB    (Rc=1)

    Field:  63   FRT   FRA   FRB   FRC   30   Rc
    Bit:    0    6     11    16    21    26   31

    fnmsubs   FRT,FRA,FRC,FRB    (Rc=0)
    fnmsubs.  FRT,FRA,FRC,FRB    (Rc=1)

    Field:  59   FRT   FRA   FRB   FRC   30   Rc
    Bit:    0    6     11    16    21    26   31

The operation

    FRT ← - ( [(FRA)×(FRC)] - (FRB) )

is performed.

The floating-point operand in register FRA is multiplied by the floating-point operand in register FRC. The floating-point operand in register FRB is subtracted from this intermediate result.
The following applies to both fnmadd[s] and fnmsub[s].

If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR, then negated and placed into register FRT.

This instruction produces the same result as would be obtained by using the Floating Multiply-Add instruction (for fnmadd[s]) or the Floating Multiply-Subtract instruction (for fnmsub[s]) and then negating the result, with the following exceptions.

- QNaNs propagate with no effect on their "sign" bit.
- QNaNs that are generated as the result of a disabled Invalid Operation Exception have a "sign" bit of 0.
- SNaNs that are converted to QNaNs as the result of a disabled Invalid Operation Exception retain the "sign" bit of the SNaN.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1.

Special Registers Altered:
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
    CR1 (if Rc=1)
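The effect of omitting the intermediate rounding can be seen by comparing a single-rounding multiply-add against a multiply followed by a separate add. The Python sketch below is an illustration under stated assumptions, not the architected algorithm: the function names are hypothetical, exact rational arithmetic stands in for the wide intermediate product, rounding is round-to-nearest only, and NaNs, infinities, the sign of zero, and the status bits are not modeled.

```python
from fractions import Fraction


def fused_multiply_add(a: float, c: float, b: float) -> float:
    """Model [(a x c) + b] with one rounding at the end (finite operands only)."""
    exact = Fraction(a) * Fraction(c) + Fraction(b)  # no intermediate rounding
    return float(exact)  # single rounding to double, round to nearest


def separate_multiply_add(a: float, c: float, b: float) -> float:
    """Model a multiply followed by a separate add: two roundings."""
    return (a * c) + b


def negative_multiply_add(a: float, c: float, b: float) -> float:
    """Model -([(a x c) + b]): round first, then negate."""
    return -fused_multiply_add(a, c, b)


# Operands chosen so that the product, 1 - 2**-60, is not representable.
a = 1.0 + 2.0**-30
c = 1.0 - 2.0**-30
b = -1.0
fused = fused_multiply_add(a, c, b)        # keeps the low-order product bits
separate = separate_multiply_add(a, c, b)  # product rounds to 1.0 first
```

With these operands the separately rounded product is exactly 1.0, so the subsequent add loses the small term that the single-rounding form retains.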
4.6.7 Floating-Point Rounding and Conversion Instructions

Programming Note
Examples of uses of these instructions to perform various conversions can be found in Section E.2, "Floating-Point Conversions [Category: Floating-Point]" on page 400.

4.6.7.1 Floating-Point Rounding Instruction

Floating Round to Single-Precision X-form

    frsp    FRT,FRB    (Rc=0)
    frsp.   FRT,FRB    (Rc=1)

    Field:  63   FRT   ///   FRB   12   Rc
    Bit:    0    6     11    16    21   31

The floating-point operand in register FRB is rounded to single-precision, using the rounding mode specified by FPSCR[RN], and placed into register FRT.

The rounding is described fully in Section A.1, "Floating-Point Round to Single-Precision Model" on page 361.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE]=1.

Special Registers Altered:
    FPRF FR FI
    FX OX UX XX
    VXSNAN
    CR1 (if Rc=1)

4.6.7.2 Floating-Point Convert To/From Integer Instructions

Floating Convert To Integer Doubleword X-form

    fctid    FRT,FRB    (Rc=0)
    fctid.   FRT,FRB    (Rc=1)

    Field:  63   FRT   ///   FRB   814   Rc
    Bit:    0    6     11    16    21    31

The floating-point operand in register FRB is converted to a 64-bit signed fixed-point integer, using the rounding mode specified by FPSCR[RN], and placed into register FRT.

If the operand in FRB is greater than 2^63 - 1, then FRT is set to 0x7FFF_FFFF_FFFF_FFFF. If the operand in FRB is less than -2^63, then FRT is set to 0x8000_0000_0000_0000.

The conversion is described fully in Section A.2, "Floating-Point Convert to Integer Model" on page 365.

Except for enabled Invalid Operation Exceptions, FPSCR[FPRF] is undefined. FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result is inexact.
Special Registers Altered:
    FPRF (undefined) FR FI
    FX XX
    VXSNAN VXCVI
    CR1 (if Rc=1)

Programming Note
The Floating Convert From Integer Word function can be performed by loading the desired word into an FPR using lfiwax (see Section 4.6.2), and then converting the contents of that FPR to a floating-point integer using fcfid.

Floating Convert To Integer Word X-form

    fctiw    FRT,FRB    (Rc=0)
    fctiw.   FRT,FRB    (Rc=1)

    Field:  63   FRT   ///   FRB   14   Rc
    Bit:    0    6     11    16    21   31

The floating-point operand in register FRB is converted to a 32-bit signed fixed-point integer, using the rounding mode specified by FPSCR[RN], and placed into FRT[32:63]. The contents of FRT[0:31] are undefined.

If the operand in FRB is greater than 2^31 - 1, then bits 32:63 of FRT are set to 0x7FFF_FFFF. If the operand in FRB is less than -2^31, then bits 32:63 of FRT are set to 0x8000_0000.

The conversion is described fully in Section A.2, "Floating-Point Convert to Integer Model" on page 365.

Except for enabled Invalid Operation Exceptions, FPSCR[FPRF] is undefined. FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result is inexact.

Floating Convert To Integer Doubleword with round toward Zero X-form

    fctidz    FRT,FRB    (Rc=0)
    fctidz.   FRT,FRB    (Rc=1)

    Field:  63   FRT   ///   FRB   815   Rc
    Bit:    0    6     11    16    21    31

The floating-point operand in register FRB is converted to a 64-bit signed fixed-point integer, using the rounding mode Round toward Zero, and placed into register FRT.

If the operand in FRB is greater than 2^63 - 1, then FRT is set to 0x7FFF_FFFF_FFFF_FFFF. If the operand in FRB is less than -2^63, then FRT is set to 0x8000_0000_0000_0000.

The conversion is described fully in Section A.2, "Floating-Point Convert to Integer Model" on page 365.

Except for enabled Invalid Operation Exceptions, FPSCR[FPRF] is undefined. FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result is inexact.
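The saturation behavior described for the convert-to-integer instructions can be sketched for the doubleword, round-toward-zero case. This Python model is a simplified illustration: the function name is hypothetical, the result is returned as an unsigned 64-bit pattern, NaN operands are deliberately left out, and none of the FPSCR updates are modeled.

```python
def fctidz_model(x: float) -> int:
    """Return the 64-bit result pattern of a truncating, saturating conversion.

    Hypothetical helper covering only the bounds quoted in the text.
    """
    if x != x:
        raise ValueError("NaN operands are outside the scope of this sketch")
    if x > 2**63 - 1:
        return 0x7FFF_FFFF_FFFF_FFFF  # operand too large: most positive value
    if x < -(2**63):
        return 0x8000_0000_0000_0000  # operand too small: most negative value
    # int() truncates toward zero; the mask yields the two's-complement pattern.
    return int(x) & 0xFFFF_FFFF_FFFF_FFFF
```

The word-sized conversions follow the same pattern with the bounds 2^31 - 1 and -2^31 and the results 0x7FFF_FFFF and 0x8000_0000.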
Special Registers Altered (identical for fctiw[.] and fctidz[.]):
    FPRF (undefined) FR FI
    FX XX
    VXSNAN VXCVI
    CR1 (if Rc=1)

Floating Convert To Integer Word with round toward Zero X-form

    fctiwz    FRT,FRB    (Rc=0)
    fctiwz.   FRT,FRB    (Rc=1)

    Field:  63   FRT   ///   FRB   15   Rc
    Bit:    0    6     11    16    21   31

The floating-point operand in register FRB is converted to a 32-bit signed fixed-point integer, using the rounding mode Round toward Zero, and placed into FRT[32:63]. The contents of FRT[0:31] are undefined.

If the operand in FRB is greater than 2^31 - 1, then bits 32:63 of FRT are set to 0x7FFF_FFFF. If the operand in FRB is less than -2^31, then bits 32:63 of FRT are set to 0x8000_0000.

The conversion is described fully in Section A.2, "Floating-Point Convert to Integer Model".

Except for enabled Invalid Operation Exceptions, FPSCR[FPRF] is undefined. FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result is inexact.

Special Registers Altered:
    FPRF (undefined) FR FI
    FX XX
    VXSNAN VXCVI
    CR1 (if Rc=1)

Floating Convert From Integer Doubleword X-form

    fcfid    FRT,FRB    (Rc=0)
    fcfid.   FRT,FRB    (Rc=1)

    Field:  63   FRT   ///   FRB   846   Rc
    Bit:    0    6     11    16    21    31

The 64-bit signed fixed-point operand in register FRB is converted to an infinitely precise floating-point integer. The result of the conversion is rounded to double-precision, using the rounding mode specified by FPSCR[RN], and placed into register FRT.

The conversion is described fully in Section A.3, "Floating-Point Convert from Integer Model".

FPSCR[FPRF] is set to the class and sign of the result. FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result is inexact.

Special Registers Altered:
    FPRF FR FI
    FX XX
    CR1 (if Rc=1)

4.6.7.3 Floating Round to Integer Instructions [Category: Floating-Point.Phased-In (sV2.05)]

The Floating Round to Integer instructions provide direct support for rounding functions found in high level languages. For example, frin, friz, frip, and frim implement C++ round(), trunc(), ceil(), and floor(), respectively. Note that frin does not implement the IEEE Round to Nearest function, which is often further described as "ties to even." The rounding performed by these instructions is described fully in Section A.4, "Floating-Point Round to Integer Model" on page 369.

Programming Note
These instructions set FPSCR[FR FI] to 0b00 regardless of whether the result is inexact or rounded because there is a desire to preserve the value of FPSCR[XX]. Furthermore, it is believed that most programs do not need to know whether these rounding operations produce inexact or rounded results. If it is necessary to determine whether the result is inexact or rounded, software must compare the result with the original source operand.

Floating Round to Integer Nearest X-form

    frin    FRT,FRB    (Rc=0)
    frin.   FRT,FRB    (Rc=1)

    Field:  63   FRT   ///   FRB   392   Rc
    Bit:    0    6     11    16    21    31

The floating-point operand in register FRB is rounded to an integral value as follows, with the result placed into register FRT. If the sign of the operand is positive, (FRB) + 0.5 is truncated to an integral value, otherwise (FRB) - 0.5 is truncated to an integral value.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE] = 1.

Floating Round to Integer Plus X-form

    frip    FRT,FRB    (Rc=0)
    frip.   FRT,FRB    (Rc=1)

    Field:  63   FRT   ///   FRB   456   Rc
    Bit:    0    6     11    16    21    31

The floating-point operand in register FRB is rounded to an integral value using the rounding mode round toward +infinity, and the result is placed into register FRT.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE] = 1.
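The four Floating Round to Integer behaviors named in Section 4.6.7.3 can be sketched in a few lines. The Python model below is a rough illustration only: the function names are hypothetical, exact rational arithmetic is assumed for the "add 0.5, then truncate" step so that the addition itself cannot round, NaNs and infinities are passed through unchanged, and the FPSCR is not modeled.

```python
import math
from fractions import Fraction


def _finish(n: int, x: float) -> float:
    # Convert back to floating point, keeping the operand's sign on a zero result.
    return math.copysign(float(n), x)


def frin_model(x: float) -> float:
    """Round to nearest integral value, ties away from zero (like C++ round())."""
    if not math.isfinite(x):
        return x
    half = Fraction(1, 2)
    exact = Fraction(x) + half if math.copysign(1.0, x) > 0 else Fraction(x) - half
    return _finish(math.trunc(exact), x)


def friz_model(x: float) -> float:
    """Round toward zero (like C++ trunc())."""
    return x if not math.isfinite(x) else _finish(math.trunc(x), x)


def frip_model(x: float) -> float:
    """Round toward +infinity (like C++ ceil())."""
    return x if not math.isfinite(x) else _finish(math.ceil(x), x)


def frim_model(x: float) -> float:
    """Round toward -infinity (like C++ floor())."""
    return x if not math.isfinite(x) else _finish(math.floor(x), x)
```

Python's built-in round() resolves ties to even, so round(2.5) gives 2 while frin_model(2.5) gives 3.0; this is the distinction the text draws between frin and the IEEE Round to Nearest function.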
Special Registers Altered (identical for frin[.] and frip[.]):
    FPRF FR (set to 0) FI (set to 0)
    FX
    VXSNAN
    CR1 (if Rc = 1)

Floating Round to Integer Toward Zero X-form

    friz    FRT,FRB    (Rc=0)
    friz.   FRT,FRB    (Rc=1)

    Field:  63   FRT   ///   FRB   424   Rc
    Bit:    0    6     11    16    21    31

The floating-point operand in register FRB is rounded to an integral value using the rounding mode round toward zero, and the result is placed into register FRT.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE] = 1.

Special Registers Altered:
    FPRF FR (set to 0) FI (set to 0)
    FX
    VXSNAN
    CR1 (if Rc = 1)

Floating Round to Integer Minus X-form

    frim    FRT,FRB    (Rc=0)
    frim.   FRT,FRB    (Rc=1)

    Field:  63   FRT   ///   FRB   488   Rc
    Bit:    0    6     11    16    21    31

The floating-point operand in register FRB is rounded to an integral value using the rounding mode round toward -infinity, and the result is placed into register FRT.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCR[VE] = 1.

Special Registers Altered:
    FPRF FR (set to 0) FI (set to 0)
    FX
    VXSNAN
    CR1 (if Rc = 1)

4.6.8 Floating-Point Compare Instructions

The floating-point Compare instructions compare the contents of two floating-point registers. Comparison ignores the sign of zero (i.e., regards +0 as equal to -0). The comparison can be ordered or unordered.

The comparison sets one bit in the designated CR field to 1 and the other three to 0. The FPCC is set in the same way.

The CR field and the FPCC are set as follows.

    Bit   Name   Description
    0     FL     (FRA) < (FRB)
    1     FG     (FRA) > (FRB)
    2     FE     (FRA) = (FRB)
    3     FU     (FRA) ? (FRB) (unordered)
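The way a comparison selects exactly one of the four bits can be sketched as follows. This Python fragment is a minimal illustration with a hypothetical function name; it assumes the document's numbering in which bit 0 (FL) is the leftmost bit of the 4-bit field, and it does not model the exception bits or the difference between the ordered and unordered forms.

```python
import math


def compare_field(fra: float, frb: float) -> int:
    """Return the 4-bit value placed in the CR field and the FPCC.

    Bit 0 (FL) is the most significant bit of the returned value.
    """
    if math.isnan(fra) or math.isnan(frb):
        return 0b0001  # FU: unordered
    if fra < frb:
        return 0b1000  # FL: less than
    if fra > frb:
        return 0b0100  # FG: greater than
    return 0b0010      # FE: equal (+0 and -0 compare equal)
```

Because host floating-point comparison already treats +0 and -0 as equal, the sign-of-zero rule needs no special case in this sketch.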
Floating Compare Unordered X-form

    fcmpu    BF,FRA,FRB

    Field:  63   BF   //   FRA   FRB   0    /
    Bit:    0    6    9    11    16    21   31

    if (FRA) is a NaN or
       (FRB) is a NaN then c ← 0b0001
    else if (FRA) < (FRB) then c ← 0b1000
    else if (FRA) > (FRB) then c ← 0b0100
    else c ← 0b0010
    FPCC ← c
    CR[4×BF:4×BF+3] ← c
    if (FRA) is an SNaN or
       (FRB) is an SNaN then
          VXSNAN ← 1

The floating-point operand in register FRA is compared to the floating-point operand in register FRB. The result of the compare is placed into CR field BF and the FPCC.

If either of the operands is a NaN, either quiet or signaling, then CR field BF and the FPCC are set to reflect unordered. If either of the operands is a Signaling NaN, then VXSNAN is set.

Special Registers Altered:
    CR field BF
    FPCC
    FX
    VXSNAN

Floating Compare Ordered X-form

    fcmpo    BF,FRA,FRB

    Field:  63   BF   //   FRA   FRB   32   /
    Bit:    0    6    9    11    16    21   31

    if (FRA) is a NaN or
       (FRB) is a NaN then c ← 0b0001
    else if (FRA) < (FRB) then c ← 0b1000
    else if (FRA) > (FRB) then c ← 0b0100
    else c ← 0b0010
    FPCC ← c
    CR[4×BF:4×BF+3] ← c
    if (FRA) is an SNaN or
       (FRB) is an SNaN then
          VXSNAN ← 1
          if VE = 0 then VXVC ← 1
    else if (FRA) is a QNaN or
            (FRB) is a QNaN then VXVC ← 1

The floating-point operand in register FRA is compared to the floating-point operand in register FRB. The result of the compare is placed into CR field BF and the FPCC.

If either of the operands is a NaN, either quiet or signaling, then CR field BF and the FPCC are set to reflect unordered. If either of the operands is a Signaling NaN, then VXSNAN is set and, if Invalid Operation is disabled (VE=0), VXVC is set. If neither operand is a Signaling NaN but at least one operand is a Quiet NaN, then VXVC is set.

Special Registers Altered:
    CR field BF
    FPCC
    FX
    VXSNAN VXVC

4.6.9 Floating-Point Select Instruction

Floating Select A-form

    fsel    FRT,FRA,FRC,FRB    (Rc=0)
    fsel.   FRT,FRA,FRC,FRB    (Rc=1)

    Field:  63   FRT   FRA   FRB   FRC   23   Rc
    Bit:    0    6     11    16    21    26   31

    if (FRA) ≥ 0.0 then FRT ← (FRC)
    else FRT ← (FRB)

The floating-point operand in register FRA is compared to the value zero. If the operand is greater than or equal to zero, register FRT is set to the contents of register FRC. If the operand is less than zero or is a NaN, register FRT is set to the contents of register FRB. The comparison ignores the sign of zero (i.e., regards +0 as equal to -0).

Special Registers Altered:
    CR1 (if Rc=1)

Programming Note
Examples of uses of this instruction can be found in Sections E.2, "Floating-Point Conversions [Category: Floating-Point]" on page 400 and E.3, "Floating-Point Selection [Category: Floating-Point]" on page 402.

Warning: Care must be taken in using fsel if IEEE compatibility is required, or if the values being tested can be NaNs or infinities; see Section E.3.4, "Notes" on page 402.

4.6.10 Floating-Point Status and Control Register Instructions

Every Floating-Point Status and Control Register instruction synchronizes the effects of all floating-point instructions executed by a given processor. Executing a Floating-Point Status and Control Register instruction ensures that all floating-point instructions previously initiated by the given processor have completed before the Floating-Point Status and Control Register instruction is initiated, and that no subsequent floating-point instructions are initiated by the given processor until the Floating-Point Status and Control Register instruction has completed. In particular:

- All exceptions that will be caused by the previously initiated instructions are recorded in the FPSCR before the Floating-Point Status and Control Register instruction is initiated.
- All invocations of the system floating-point enabled exception error handler that will be caused by the previously initiated instructions have occurred before the Floating-Point Status and Control Register instruction is initiated.
- No subsequent floating-point instruction that depends on or alters the settings of any FPSCR bits is initiated until the Floating-Point Status and Control Register instruction has completed.

(Floating-point Storage Access instructions are not affected.)

The instruction descriptions in this section refer to "FPSCR fields," where FPSCR field k is FPSCR bits 4×k:4×k+3.

Move From FPSCR X-form

    mffs    FRT    (Rc=0)
    mffs.   FRT    (Rc=1)

    Field:  63   FRT   ///   ///   583   Rc
    Bit:    0    6     11    16    21    31

The contents of the FPSCR are placed into register FRT.

Special Registers Altered:
    CR1 (if Rc=1)

Move to Condition Register from FPSCR X-form

    mcrfs    BF,BFA

    Field:  63   BF   //   BFA   //   ///   64   /
    Bit:    0    6    9    11    14   16    21   31

The contents of FPSCR[32:63] field BFA are copied to Condition Register field BF. All exception bits copied are set to 0 in the FPSCR. If the FX bit is copied, it is set to 0 in the FPSCR.

Special Registers Altered:
    CR field BF
    FX OX                       (if BFA=0)
    UX ZX XX VXSNAN             (if BFA=1)
    VXISI VXIDI VXZDZ VXIMZ     (if BFA=2)
    VXVC                        (if BFA=3)
    VXSOFT VXSQRT VXCVI         (if BFA=5)

Move To FPSCR Field Immediate X-form

    mtfsfi    BF,U,W    (Rc=0)
    mtfsfi.   BF,U,W    (Rc=1)

    Field:  63   BF   //   ///   W    U    /    134   Rc
    Bit:    0    6    9    11    15   16   20   21    31

The value of the U field is placed into FPSCR field BF+8×(1-W).

FPSCR[FX] is altered only if BF = 0 and W = 0.

Special Registers Altered:
    FPSCR field BF + 8×(1-W)
    CR1 (if Rc=1)

Programming Note
mtfsfi serves as both a basic and an extended mnemonic. The Assembler will recognize a mtfsfi mnemonic with three operands as the basic form, and a mtfsfi mnemonic with two operands as the extended form. In the extended form the W operand is omitted and assumed to be 0.

Programming Note
When FPSCR[32:35] is specified, bits 32 (FX) and 35 (OX) are set to the values of U[0] and U[3] (i.e., even if this instruction causes OX to change from 0 to 1, FX is set from U[0] and not by the usual rule that FX is set to 1 when an exception bit changes from 0 to 1). Bits 33 and 34 (FEX and VX) are set according to the usual rule, given on page 101, and not from U[1:2].

Move To FPSCR Fields XFL-form

    mtfsf    FLM,FRB,L,W    (Rc=0)
    mtfsf.   FLM,FRB,L,W    (Rc=1)

    Field:  63   L    FLM   W    FRB   711   Rc
    Bit:    0    6    7     15   16    21    31

The FPSCR is modified as specified by the FLM, L, and W fields.

L=0
    The contents of register FRB are placed into the FPSCR under control of the W field and the field mask specified by FLM. W and the field mask identify the 4-bit fields affected. Let i be an integer in the range 0-7. If FLM[i]=1 then FPSCR field k is set to the contents of the corresponding field of register FRB, where k = i+8×(1-W).

L=1
    The contents of register FRB are placed into the FPSCR.

FPSCR[FX] is not altered implicitly by this instruction.

Special Registers Altered:
    FPSCR fields selected by mask, L, and W
    CR1 (if Rc=1)

Programming Note
mtfsf serves as both a basic and an extended mnemonic. The Assembler will recognize a mtfsf mnemonic with four operands as the basic form, and a mtfsf mnemonic with two operands as the extended form. In the extended form the W and L operands are omitted and both are assumed to be 0.

Programming Note
Updating fewer than eight fields of the FPSCR may have substantially poorer performance on some implementations than updating eight fields or all of the fields.

Programming Note
If L=1 or if L=0 and FPSCR[32:35] is specified, bits 32 (FX) and 35 (OX) are set to the values of (FRB)[32] and (FRB)[35] (i.e., even if this instruction causes OX to change from 0 to 1, FX is set from (FRB)[32] and not by the usual rule that FX is set to 1 when an exception bit changes from 0 to 1). Bits 33 and 34 (FEX and VX) are set according to the usual rule, given on page 101, and not from (FRB)[33:34].
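The field arithmetic used by mtfsfi and by mtfsf with L=0 can be illustrated with a small model of a 64-bit FPSCR. This Python sketch is a simplification under stated assumptions: the function names are hypothetical, bit 0 is taken to be the most significant bit as in the rest of the document, and field 8 (bits 32:35) is rejected because the special handling of FX, FEX, and VX described in the Programming Notes is not modeled.

```python
MASK64 = (1 << 64) - 1


def field_index(bf_or_i: int, w: int) -> int:
    """FPSCR field number k = BF + 8 x (1 - W), or i + 8 x (1 - W) for mtfsf."""
    return bf_or_i + 8 * (1 - w)


def _field_shift(k: int) -> int:
    if k == 8:
        raise ValueError("field 8 (FX, FEX, VX, OX) is not modeled in this sketch")
    # Field k occupies bits 4k:4k+3, counting bit 0 as the most significant bit.
    return 64 - 4 * (k + 1)


def mtfsfi_model(fpscr: int, bf: int, u: int, w: int = 0) -> int:
    """Place the 4-bit value U into FPSCR field BF + 8 x (1 - W)."""
    shift = _field_shift(field_index(bf, w))
    return (fpscr & ~(0xF << shift) & MASK64) | ((u & 0xF) << shift)


def mtfsf_model(fpscr: int, flm: int, frb: int, w: int = 0) -> int:
    """Copy the fields of FRB selected by the 8-bit mask FLM (L=0 case only)."""
    for i in range(8):
        if (flm >> (7 - i)) & 1:  # FLM[i], with FLM[0] as the leftmost mask bit
            shift = _field_shift(field_index(i, w))
            fpscr = (fpscr & ~(0xF << shift) & MASK64) | (frb & (0xF << shift))
    return fpscr
```

For example, with W=0 the operand BF=7 selects field 15 (bits 60:63), while with W=1 the same BF selects field 7 (bits 28:31).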
Move To FPSCR Bit 0 X-form

    mtfsb0    BT    (Rc=0)
    mtfsb0.   BT    (Rc=1)

    Field:  63   BT   ///   ///   70   Rc
    Bit:    0    6    11    16    21   31

Bit BT+32 of the FPSCR is set to 0.

Special Registers Altered:
    FPSCR bit BT+32
    CR1 (if Rc=1)

Programming Note
Bits 33 and 34 (FEX and VX) cannot be explicitly reset.

Move To FPSCR Bit 1 X-form

    mtfsb1    BT    (Rc=0)
    mtfsb1.   BT    (Rc=1)

    Field:  63   BT   ///   ///   38   Rc
    Bit:    0    6    11    16    21   31

Bit BT+32 of the FPSCR is set to 1.

Special Registers Altered:
    FPSCR bits BT+32 and FX
    CR1 (if Rc=1)

Programming Note
Bits 33 and 34 (FEX and VX) cannot be explicitly set.

Chapter 5. Decimal Floating-Point [Category: Decimal Floating-Point]

    5.1 Decimal Floating-Point (DFP) Processor Overview  143
    5.2 DFP Register Handling  144
    5.2.1 DFP Usage of Floating-Point Registers  144
    5.3 DFP Support for Non-DFP Data Types  146
    5.4 DFP Number Representation  147
    5.4.1 DFP Data Format  148
    5.4.1.1 Fields Within the Data Format  148
    5.4.1.2 Summary of DFP Data Formats  149
    5.4.1.3 Preferred DPD Encoding  149
    5.4.2 Classes of DFP Data  149
    5.5 DFP Execution Model  150
    5.5.1 Rounding  150
    5.5.2 Rounding Mode Specification  151
    5.5.3 Formation of Final Result  152
    5.5.3.1 Use of Ideal Exponent  152
    5.5.4 Arithmetic Operations  152
    5.5.4.1 Sign of Arithmetic Result  152
    5.5.5 Compare Operations  153
    5.5.6 Test Operations  153
    5.5.7 Quantum Adjustment Operations  153
    5.5.8 Conversion Operations  153
    5.5.8.1 Data-Format Conversion  153
    5.5.8.2 Data-Type Conversion  154
    5.5.9 Format Operations  154
    5.5.10 DFP Exceptions  154
    5.5.10.1 Invalid Operation Exception  156
    5.5.10.2 Zero Divide Exception  157
    5.5.10.3 Overflow Exception  157
    5.5.10.4 Underflow Exception  158
    5.5.10.5 Inexact Exception  159
    5.5.11 Summary of Normal Rounding And Range Actions  160
    5.6 DFP Instruction Descriptions  162
    5.6.1 DFP Arithmetic Instructions  163
    5.6.2 DFP Compare Instructions  167
    5.6.3 DFP Test Instructions  170
    5.6.4 DFP Quantum Adjustment Instructions  173
    5.6.5 DFP Conversion Instructions  182
    5.6.5.1 DFP Data-Format Conversion Instructions  182
    5.6.5.2 DFP Data-Type Conversion Instructions  185
    5.6.6 DFP Format Instructions  187
    5.6.7 DFP Instruction Summary  191

5.1 Decimal Floating-Point (DFP) Processor Overview

This chapter describes the behavior of the decimal floating-point processor, the supported data types, formats, and classes, and the usage of registers. Also included are the execution model, exceptions, and instructions supported by the decimal floating-point processor.

The decimal floating-point (DFP) processor shares the 32 floating-point registers (FPRs) and the Floating-Point Status and Control Register (FPSCR) with the binary floating-point (BFP) processor. However, the interpretation of data formats in the FPRs, and the meaning of some control and status bits in the FPSCR are different between the BFP and DFP processors.

The DFP processor also shares the Condition Register (CR) with the fixed-point processor, the BFP processor, and the vector processor.

The DFP processor supports three DFP data formats: DFP Short (single precision), DFP Long (double precision), and DFP Extended (quad precision). Most operations are performed on DFP Long or DFP Extended format directly. Support for DFP Short is limited to conversion to and from DFP Long. Some DFP instructions operate on other data types, including signed or unsigned binary fixed-point data, and signed or unsigned decimal data.

DFP instructions are provided to perform arithmetic, compare, test, quantum-adjustment, conversion, and format operations on operands held in FPRs or FPR pairs.

- Arithmetic instructions
  These instructions perform addition, subtraction, multiplication, and division operations.
- Compare instructions
  These instructions perform a comparison operation on the numerical value of two DFP operands.
- Test instructions
  These instructions test the data class, the data group, the exponent, or the number of significant digits of a DFP operand.
- Quantum-adjustment instructions
  These instructions convert a DFP number to a result in the form that has the designated exponent, which may be explicitly or implicitly specified.
- Conversion instructions
  These instructions perform conversion between different data formats or data types.
- Format instructions
  These instructions facilitate composing or decomposing a DFP operand.

These instructions are described in Section 5.6 "DFP Instruction Descriptions" on page 162.

The three DFP data formats allow finite numbers to be represented with different precision and ranges. Special codes are also provided to represent +Infinity, -Infinity, Quiet NaN (Not-a-Number), and Signaling NaN. Operations involving infinities produce results obeying traditional mathematical conventions. NaNs have no mathematical interpretation. The encoding of NaNs provides a diagnostic information field. This diagnostic field may be used to indicate such things as the source of an uninitialized variable or the reason an invalid result was produced.

The DFP processor recognizes a set of DFP exceptions which are indicated via bits set in the FPSCR. Additionally, the DFP exception actions depend on the setting of the various exception enable bits in the FPSCR.

The following DFP exceptions are detected by the DFP processor. The exception status bits in the FPSCR are indicated in parentheses.

- Invalid Operation Exception (VX)
      SNaN                  (VXSNAN)
      ∞ - ∞                 (VXISI)
      ∞ ÷ ∞                 (VXIDI)
      0 ÷ 0                 (VXZDZ)
      ∞ × 0                 (VXIMZ)
      Invalid Compare       (VXVC)
      Invalid conversion    (VXCVI)
- Zero Divide Exception (ZX)
- Overflow Exception (OX)
- Underflow Exception (UX)
- Inexact Exception (XX)

Each DFP exception and each category of Invalid Operation Exception has an exception status bit in the FPSCR. In addition, each of the five DFP exceptions has a corresponding enable bit in the FPSCR. These enable bits enable or disable the invocation of the system floating-point enabled exception error handler, and may affect the setting of some exception status bits in the FPSCR.

The usage of these bits by the DFP processor differs from the usage by the BFP processor. Section 5.5.10 "DFP Exceptions" on page 154 provides a detailed discussion of DFP exceptions, including the effects of the enable bits.

5.2 DFP Register Handling

The following sections describe first how the floating-point registers are utilized by the DFP processor. The subsequent section covers the DFP usage of CR and FPSCR.

5.2.1 DFP Usage of Floating-Point Registers

The DFP processor shares the same 32 64-bit FPRs with the BFP processor. Like the BFP instructions, DFP instructions also use 5-bit fields for designating the FPRs to hold the source or target operands.

When data in DFP Short format is held in a FPR, it occupies the rightmost 32 bits of the FPR. The Load Floating-Point as Integer Word Algebraic instruction is provided to load the rightmost 32 bits of a FPR with a single-word data from storage. The Store Floating-Point as Integer Word instruction is available to store the rightmost 32 bits of a FPR to a storage location.

Data in DFP Long format, 64-bit binary fixed-point values, or 64-bit BCD values is held in a FPR using all 64 bits. Data of 64 bits may be loaded from storage via any of the Load Floating-Point Double instructions and stored via any of the Store Floating-Point Double instructions.

Data in DFP Extended format or 128-bit BCD values is held in an even-odd FPR pair using all 128 bits. Data of 128 bits must be loaded into the desired even-odd pair of floating-point registers using an appropriate sequence of the Load Floating-Point Double instructions and stored using an appropriate sequence of the Store Floating-Point Double instructions.

Data used as a source operand by any Decimal Floating-Point instruction that was produced, either directly or indirectly, by a Load Floating-Point Single instruction, a Floating Round to Single-Precision instruction, or a binary floating-point single-precision arithmetic instruction is boundedly undefined.

When an even-odd FPR pair is used to hold a 128-bit operand, the even-numbered FPR is used to hold the leftmost doubleword of the operand and the next higher-numbered FPR is used to hold the rightmost doubleword. A DFP instruction designating an odd-numbered FPR for a 128-bit operand is an invalid instruction form.

Programming Note
The Floating-Point Move instructions can be used to move operands between FPRs.

The bit definitions for the FPSCR are as follows.

Bit(s)  Description

0:28    Reserved

29:31   DFP Rounding Control (DRN)
        See Section 5.5.2, "Rounding Mode Specification" on page 151.

        000  Round to Nearest, Ties to Even
        001  Round toward Zero
        010  Round toward +Infinity
        011  Round toward -Infinity
        100  Round to Nearest, Ties away from 0
        101  Round to Nearest, Ties toward 0
        110  Round to away from Zero
        111  Round to Prepare for Shorter Precision

        Programming Note
        FPSCR[28] is reserved for extension of the DRN field, therefore DRN may be set using the mtfsfi instruction to set the rounding mode.

32      Floating-Point Exception Summary (FX)
        Every floating-point instruction, except mtfsfi and mtfsf, implicitly sets FPSCR[FX] to 1 if that instruction causes any of the floating-point exception bits in the FPSCR to change from 0 to 1. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 can alter FPSCR[FX] explicitly.

33      Floating-Point Enabled Exception Summary (FEX)
        This bit is the OR of all the floating-point exception bits masked by their respective enable bits. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 cannot alter FPSCR[FEX] explicitly.

34      Floating-Point Invalid Operation Exception Summary (VX)
        This bit is the OR of all the Invalid Operation exception bits. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 cannot alter FPSCR[VX] explicitly.

35      Floating-Point Overflow Exception (OX)
        See Section 5.5.10.3, "Overflow Exception" on page 157.

36      Floating-Point Underflow Exception (UX)
        See Section 5.5.10.4, "Underflow Exception" on page 158.

37      Floating-Point Zero Divide Exception (ZX)
        See Section 5.5.10.2, "Zero Divide Exception" on page 157.

38      Floating-Point Inexact Exception (XX)
        See Section 5.5.10.5, "Inexact Exception" on page 159.

        FPSCR[XX] is a sticky version of FPSCR[FI] (see below). Thus the following rules completely describe how FPSCR[XX] is set by a given instruction.

        - If the instruction affects FPSCR[FI], the new value of FPSCR[XX] is obtained by ORing the old value of FPSCR[XX] with the new value of FPSCR[FI].
        - If the instruction does not affect FPSCR[FI], the value of FPSCR[XX] is unchanged.

39      Floating-Point Invalid Operation Exception (SNaN) (VXSNAN)
        See Section 5.5.10.1, "Invalid Operation Exception" on page 156.

40      Floating-Point Invalid Operation Exception (∞ - ∞) (VXISI)
        See Section 5.5.10.1.

41      Floating-Point Invalid Operation Exception (∞ ÷ ∞) (VXIDI)
        See Section 5.5.10.1.

42      Floating-Point Invalid Operation Exception (0 ÷ 0) (VXZDZ)
        See Section 5.5.10.1.

43      Floating-Point Invalid Operation Exception (∞ × 0) (VXIMZ)
        See Section 5.5.10.1.

44      Floating-Point Invalid Operation Exception (Invalid Compare) (VXVC)
        See Section 5.5.10.1.

45      Floating-Point Fraction Rounded (FR)
        The last Arithmetic or Rounding and Conversion instruction incremented the fraction during rounding. See Section 5.5.1, "Rounding" on page 150. This bit is not sticky.

46      Floating-Point Fraction Inexact (FI)
        The last Arithmetic or Rounding and Conversion instruction either produced an inexact result during rounding or caused a disabled Overflow Exception. See Section 5.5.1. This bit is not sticky.

        See the definition of FPSCR[XX], above, regarding the relationship between FPSCR[FI] and FPSCR[XX].

47:51   Floating-Point Result Flags (FPRF)
        This field is set as described below. For arithmetic, rounding, and conversion instructions, the field is set based on the result placed into the target register, except that if any portion of the result is undefined then the value placed into FPRF is undefined.

47      Floating-Point Result Class Descriptor (C)
        Arithmetic, rounding, and conversion instructions may set this bit with the FPCC bits, to indicate the class of the result as shown in Figure 59 on page 146.

48:51   Floating-Point Condition Code (FPCC)
        Floating-point Compare and DFP Test instructions set one of the FPCC bits to 1 and the other three FPCC bits to 0. Arithmetic, rounding, and conversion instructions may set the FPCC bits with the C bit, to indicate the class of the result as shown in Figure 59 on page 146. Note that in this case the high-order three bits of the FPCC retain their relational significance indicating that the value is less than, greater than, or equal to zero.

48      Floating-Point Less Than or Negative (FL or <)

49      Floating-Point Greater Than or Positive (FG or >)

50      Floating-Point Equal or Zero (FE or =)

51      Floating-Point Unordered or NaN (FU or ?)

52      Reserved

53      Floating-Point Invalid Operation Exception (Software Request) (VXSOFT)
        This bit can be altered only by mcrfs, mtfsfi, mtfsf, mtfsb0, or mtfsb1. See Section 5.5.10.1, "Invalid Operation Exception" on page 156.

54      Neither used nor changed by DFP.

        Programming Note
        Although the architecture does not provide a DFP square root instruction, if software simulates such an instruction, it should set bit 54 whenever the source operand of the square root function is invalid.

57      Floating-Point Overflow Exception Enable (OE)
        See Section 5.5.10.3, "Overflow Exception" on page 157.

58      Floating-Point Underflow Exception Enable (UE)
        See Section 5.5.10.4, "Underflow Exception" on page 158.

59      Floating-Point Zero Divide Exception Enable (ZE)
        See Section 5.5.10.2, "Zero Divide Exception" on page 157.

60      Floating-Point Inexact Exception Enable (XE)
        See Section 5.5.10.5, "Inexact Exception" on page 159.

61      Reserved (not used by DFP)

62:63   Binary Floating-Point Rounding Control (RN)
        See Section 5.5.1, "Rounding" on page 150.

        00  Round to Nearest
        01  Round toward Zero
        10  Round toward +Infinity
        11  Round toward -Infinity

    Result Flags
    C  <  >  =  ?    Result Value Class
    0  0  0  0  1    Signaling NaN (DFP only)
    1  0  0  0  1    Quiet NaN
    0  1  0  0  1    - Infinity
    0  1  0  0  0    - Normal Number
    1  1  0  0  0    - Subnormal Number
    1  0  0  1  0    - Zero
    0  0  0  1  0    + Zero
    1  0  1  0  0    + Subnormal Number
    0  0  1  0  0    + Normal Number
    0  0  1  0  1    + Infinity

Figure 59. Floating-Point Result Flags

5.3 DFP Support for Non-DFP Data Types

In addition to the DFP data types, the DFP processor provides limited support for the following non-DFP data types: signed or unsigned binary fixed-point data, and signed or unsigned decimal data.
55 Floating-Point Invalid Operation Excep- In unsigned binary fixed-point data, all bits are used to tion (Invalid Conversion) (VXCVI) express the absolute value of the number. For signed See Section 5.5.10.1. binary fixed-point data, the leftmost bit represents the sign, which is followed by the numeric field. Positive 56 Floating-Point Invalid Operation Excep- numbers are represented in true binary notation with tion Enable (VE) the sign bit set to zero. When the value is zero, all bits See Section 5.5.10.1. 146 Power ISATM I - III Version 2.05 are zeros, including the sign bit. Negative numbers are summary of digit and sign codes are provided in represented in two's complement binary notation with a Figure 62. one in the sign-bit position. Binary Recognized As For decimal data, each byte contains a pair of four-bit nibbles; each four-bit nibble contains a binary-coded- Code Digit Sign decimal (BCD) code. There are two kinds of BCD 0000 0 Invalid codes: digit code and sign code. For unsigned decimal 0001 1 Invalid data, all nibbles contain a digit code (D) as shown in Figure 60 0010 2 Invalid 0011 3 Invalid D D D D ... D D D D 0100 4 Invalid Figure 60. Format for Unsigned Decimal Data 0101 5 Invalid For signed decimal data, the rightmost nibble contains 0110 6 Invalid a sign code (S) and all other nibbles contain a digit 0111 7 Invalid code as shown in Figure 61. 1000 8 Invalid 1001 9 Invalid D D D D ... D D D S 1010 Invalid Plus Figure 61. Format for Signed Decimal Data 1011 Invalid Minus The decimal digits 0-9 have the binary encoding 0000- 1100 Invalid Plus (preferred; option 1) 1001. The preferred plus-sign codes are 1100 and 1101 Invalid Minus (preferred) 1111. The preferred minus sign code is 1101. These are the sign codes generated for the results of the Decode 1110 Invalid Plus DPD To BCD instruction. 
A selection is provided by this 1111 Invalid Plus (preferred; option 2) instruction to specify which of the two preferred plus sign codes is to be generated. Alternate sign codes are Figure 62. Summary of BCD Digit and Sign Codes also recognized as valid in the sign position: 1010 and 1110 are alternate sign codes for plus, and 1011 is an alternate sign code for minus. Alternate sign codes are 5.4 DFP Number Representation accepted for any source operand, but are not gener- A DFP finite number consists of three components: a ated as a result by the instruction. When an invalid digit sign bit, a signed exponent, and a significand. The or sign code is detected by the Encode BCD To DPD signed exponent is a signed binary integer. The signifi- instruction, an invalid-operation exception occurs. A cand consists of a number of decimal digits, which are to the left of the implied decimal point. The rightmost digit of the significand is called the units digit. The numerical value of a DFP finite number is represented as (-1)sign 2 significand 2 10exponent and the unit value of this number is (1 1 10exponent), which is called the quantum. DFP finite numbers are not normalized. This allows leading zeros and trailing zeros to exist in the signifi- cand. This unnormalized DFP number representation allows some values to have redundant forms; each form represents the DFP number with a different com- bination of the significand value and the exponent value. For example, 1000000 2 105 and 10 2 1010 are two different forms of the same numerical value. A form of this number representation carries information about both the numerical value and the quantum of a DFP finite number. The significant digits of a DFP finite number are the digits in the significand beginning with the leftmost non- zero digit and ending with the units digit. Chapter 5. 
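The value, quantum, and redundant-form ideas of Section 5.4 can be illustrated with a small sketch. It uses Python's `decimal.Decimal`, which also stores a sign, a digit string, and an exponent without normalizing; this is only an analogy chosen for illustration, not the DFP register encoding, and the helper names `dfp_form`, `quantum`, and `significant_digits` are hypothetical.

```python
from decimal import Decimal


def dfp_form(sign, significand, exponent):
    """Build (-1)^sign x significand x 10^exponent as an unnormalized Decimal."""
    digits = tuple(int(d) for d in str(significand))
    return Decimal((sign, digits, exponent))


def quantum(x):
    """Unit value of the form: 1 x 10^exponent."""
    return Decimal((0, (1,), x.as_tuple().exponent))


def significant_digits(significand):
    """Digits from the leftmost nonzero digit through the units digit."""
    return len(str(significand)) if significand != 0 else 0


# The two redundant forms used as the example in the text.
a = dfp_form(0, 1000000, 5)
b = dfp_form(0, 10, 10)
```

Here `a` and `b` compare equal as values but keep different exponents, so they have different quanta, which mirrors the statement that a form carries both the numerical value and the quantum.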
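The signed decimal layout of Section 5.3 (Figures 61 and 62) can be sketched as a decoder over a list of nibbles. This is a minimal illustration under stated assumptions: the function name `decode_signed_bcd` and the use of `ValueError` to stand in for the invalid-operation exception are hypothetical choices, not part of the architecture.

```python
# Sign codes recognized in the sign position (Figure 62).
PLUS_CODES = {0b1010, 0b1100, 0b1110, 0b1111}
MINUS_CODES = {0b1011, 0b1101}


def decode_signed_bcd(nibbles):
    """Decode [D, D, ..., D, S] into a Python int.

    Raises ValueError for an invalid digit or sign code; in the
    architecture the corresponding condition is an invalid-operation
    exception (this mapping to ValueError is an assumption).
    """
    *digits, sign_code = nibbles
    if any(not 0 <= d <= 9 for d in digits):
        raise ValueError("invalid digit code")
    if sign_code in PLUS_CODES:
        sign = 1
    elif sign_code in MINUS_CODES:
        sign = -1
    else:
        raise ValueError("invalid sign code")
    magnitude = 0
    for d in digits:
        magnitude = magnitude * 10 + d
    return sign * magnitude
```

Both preferred and alternate sign codes are accepted on input, matching the rule that alternate codes are valid in source operands even though they are never generated.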
5.4.1 DFP Data Format

DFP numbers and NaNs may be represented in FPRs in any of the three data formats: DFP Short, DFP Long, or DFP Extended. The contents of each data format represent encoded information. Special codes are assigned to NaNs and infinities. Different formats support different sizes in both significand and exponent. Arithmetic, compare, test, quantum-adjustment, and format instructions are provided for DFP Long and DFP Extended formats only.

The sign is encoded as a one-bit binary value. The significand is encoded as an unsigned decimal integer in two distinct parts. The leftmost digit (LMD) of the significand is encoded as part of the combination field; the remaining digits of the significand are encoded in the trailing significand field. The exponent is contained in the combination field in two parts. However, prior to encoding, the exponent is converted to an unsigned binary value called the biased exponent by adding a bias value which is a constant for each format. The two leftmost bits of the biased exponent are encoded with the leftmost digit of the significand in the leftmost bits of the combination field. The rest of the biased exponent occupies the remaining portion of the combination field.

5.4.1.1 Fields Within the Data Format

The DFP data representation comprises three fields, as diagrammed below for each of the three formats:

  | S | G         | T                      |
  0   1           12                     31

Figure 63. DFP Short format

  | S | G           | T                    |
  0   1             14                   63

Figure 64. DFP Long format

  | S | G               | T                |
  0   1                 18               63
  | T (continued)                          |
  64                                    127

Figure 65. DFP Extended format

The fields are defined as follows:

Sign bit (S)
The sign bit is in bit 0 of each format, and is zero for plus and one for minus.

Combination field (G)
As the name implies, this field provides a combination of the exponent and the leftmost digit (LMD) of the significand, for finite numbers, or provides a special code for denoting the value as either a Not-a-Number or an Infinity.

The first 5 bits of the combination field contain the encoding of NaN or infinity, or the two leftmost bits of the biased exponent and the leftmost digit (LMD) of the significand. The following tables show the encoding:

  G0:4          Description
  11111         NaN
  11110         Infinity
  All others    Finite Number (see Figure 67)

Figure 66. Encoding of the G field for Special Symbols

         Leftmost 2 bits of biased exponent
  LMD    00       01       10
  0      00000    01000    10000
  1      00001    01001    10001
  2      00010    01010    10010
  3      00011    01011    10011
  4      00100    01100    10100
  5      00101    01101    10101
  6      00110    01110    10110
  7      00111    01111    10111
  8      11000    11010    11100
  9      11001    11011    11101

Figure 67. Encoding of bits 0:4 of the G field for Finite Numbers

For DFP finite numbers, the rightmost N-5 bits of the N-bit combination field contain the remaining bits of the biased exponent. For NaNs, bit 5 of the combination field is used to distinguish a Quiet NaN from a Signaling NaN; the remaining bits in a source operand are ignored and they are set to zeros in a target operand by most operations. For infinities, the rightmost N-5 bits of the N-bit combination field of a source operand are ignored and they are set to zeros in a target operand by most operations.

Trailing Significand field (T)
For DFP finite numbers, this field contains the remaining significand digits. For NaNs, this field may be used to contain diagnostic information. For infinities, contents in this field of a source operand are ignored and they are set to zeros in a target operand by most operations. The trailing significand field is a multiple of 10-bit blocks. The multiple depends on the format. Each 10-bit block is called a declet and represents three decimal digits, using the Densely Packed Decimal (DPD) encoding defined in Appendix A.

5.4.1.2 Summary of DFP Data Formats

The properties of the three DFP formats are summarized in the following table:

  Format                             DFP Short            DFP Long              DFP Extended
  Widths (bits):
    Format                           32                   64                    128
    Sign (S)                         1                    1                     1
    Combination (G)                  11                   13                    17
    Trailing Significand (T)         20                   50                    110
  Exponent:
    Maximum biased                   191                  767                   12,287
    Maximum (Xmax)                   90                   369                   6111
    Minimum (Xmin)                   -101                 -398                  -6176
    Bias                             101                  398                   6176
  Precision (p) (digits)             7                    16                    34
  Magnitude:
    Maximum normal number (Nmax)     (10^7 - 1) x 10^90   (10^16 - 1) x 10^369  (10^34 - 1) x 10^6111
    Minimum normal number (Nmin)     1 x 10^-95           1 x 10^-383           1 x 10^-6143
    Minimum subnormal number (Dmin)  1 x 10^-101          1 x 10^-398           1 x 10^-6176

Figure 68. Summary of DFP Formats

5.4.1.3 Preferred DPD Encoding

Execution of DFP instructions decodes source operands from DFP data formats to an internal format for processing, and encodes the operation result before the final result is returned as the target operand.

As part of the decoding process, declets in the trailing significand field of source operands are decoded to their corresponding BCD digit codes using the DPD-to-BCD decoding algorithm. As part of the encoding process, BCD digit codes to be stored into the trailing significand field of the target operand are encoded into declets using the BCD-to-DPD encoding algorithm. Both the decoding and encoding algorithms are defined in Appendix A.

As explained in Appendix A, there are eight 3-digit decimal values that have redundant DPD codes and one preferred DPD code. All redundant DPD codes are recognized in source operands for the associated 3-digit decimal number. DFP operations will always generate the preferred DPD codes for the trailing significand field of the target operand.

5.4.2 Classes of DFP Data

There are six classes of DFP data, which include numerical and nonnumeric entities. The numerical entities include zero, subnormal number, normal number, and infinity data classes. The nonnumeric entities include quiet and signaling NaN data classes. The value of a DFP finite number, including zero, subnormal number, and normal number, is a quantization of the real number based on the data format. The Test Data Class instruction may be used to determine the class of a DFP operand. In general, an operation that returns a DFP result sets the FPSCR_FPRF field to indicate the data class of the result.

The following tables show the value ranges for finite-number data classes, and the codes for NaNs and infinities.

  Data Class    Sign    Magnitude
  Zero          ±       0*
  Subnormal     ±       Dmin ≤ |X| < Nmin
  Normal        ±       Nmin ≤ |Y| ≤ Nmax

  * The significand is zero and the exponent is any representable value

Figure 69. Value Ranges for Finite Number Data Classes

  Data Class       S    G                    T
  +Infinity        0    11110xxx . . . xxx   xxx . . . xxx
  -Infinity        1    11110xxx . . . xxx   xxx . . . xxx
  Quiet NaN        x    111110xx . . . xxx   xxx . . . xxx
  Signaling NaN    x    111111xx . . . xxx   xxx . . . xxx

  x  Don't care

Figure 70. Encoding of NaN and Infinity Data Classes

Zeros
Zeros have a zero significand and any representable value in the exponent. A +0 is distinct from -0, and zeros with different exponents are distinct, except that comparison treats them as equal.

Subnormal Numbers
Subnormal numbers have values that are smaller than Nmin and greater than zero in magnitude.

Normal Numbers
Normal numbers are nonzero finite numbers whose magnitude is between Nmin and Nmax inclusively.

Infinities
Infinities are represented by 0b11110 in the leftmost 5 bits of the combination field. When an operation is defined to generate an infinity as the result, a default infinity is sometimes supplied. A default infinity has all remaining bits in the combination field and trailing significand field set to zeros.

When infinities are used as source operands, only the leftmost 5 bits of the combination field are interpreted (i.e., 0b11110 indicates the value is an infinity). The trailing significand field of infinities is usually ignored. For generated infinities, the leftmost 5 bits of the combination field are set to 0b11110 and all remaining combination bits are set to zero.

Infinities can participate in most arithmetic operations and give a consistent result. In comparisons, any +Infinity compares greater than any finite number, and any -Infinity compares less than any finite number. All +Infinity values compare equal and all -Infinity values compare equal.

Signaling and Quiet NaNs
There are two types of Not-a-Numbers (NaNs), Signaling (SNaN) and Quiet (QNaN).

0b111110 in the leftmost 6 bits of the combination field indicates a Quiet NaN, whereas 0b111111 indicates a Signaling NaN.

A special QNaN is sometimes supplied as the default QNaN for a disabled invalid-operation exception; it has a plus sign, the leftmost 6 bits of the combination field set to 0b111110, and the remaining bits in the combination field and the trailing significand field set to zero.

Normally, source QNaNs are propagated during operations so that they will remain visible at the end. When a QNaN is propagated, the sign is preserved, the decimal value of the trailing significand field is preserved but reencoded using the preferred DPD codes, and the contents in the rightmost N-6 bits of the combination field are set to zero, where N is the width of the combination field for the format.

A source SNaN generally causes an invalid-operation exception. If the exception is disabled, the SNaN is converted to the corresponding QNaN and propagated. The primary encoding difference between an SNaN and a QNaN is that bit 5 of an SNaN is 1 and bit 5 of a QNaN is 0. When an SNaN is propagated as a QNaN, bit 5 is set to 0, and, just as with QNaN propagation, the sign is preserved, the decimal value of the trailing significand field is preserved but reencoded using the preferred DPD codes, and the contents in the rightmost N-6 bits of the combination field are set to zero, where N is the width of the combination field for the format. For some format-conversion instructions, a source SNaN does not cause an invalid-operation exception, and an SNaN is returned as the target operand.

For instructions with two source NaNs where a NaN is to be propagated as the result, do the following.
- If there is a QNaN in FRA and an SNaN in FRB, the SNaN in FRB is propagated.
- Otherwise, propagate the NaN in FRA.

5.5 DFP Execution Model

DFP operations are performed as if they first produce an intermediate result correct to infinite precision and with unbounded range. The intermediate result is then rounded to the destination's precision according to one of the eight DFP rounding modes. If the rounded result has only one form, it is delivered as the final result; if the rounded result has redundant forms, then an ideal exponent is used to select the form of the final result. The ideal exponent determines the form, not the value, of the final result. (See Section 5.5.3, "Formation of Final Result" on page 152.)

5.5.1 Rounding

Rounding takes a number regarded as infinitely precise and, if necessary, modifies it to fit the destination's precision. The destination's precision of an operation defines the set of permissible resultant values. For most operations, the destination's precision is the target-format precision and the permissible resultant values are those values representable in the target format. For some special operations, the destination precision is constrained by both the target format and some additional restrictions, and the permissible resultant values are a subset of the values representable in the target format.

Rounding sets FPSCR bits FR and FI. When an inexact exception occurs, FI is set to one; otherwise, FI is set to zero. When an inexact exception occurs and if the rounded result is greater in magnitude than the intermediate result, then FR is set to one; otherwise, FR is set to zero. The exception is the Round to FP Integer Without Inexact instruction, which always sets FR and FI to zero. Rounding may cause an overflow exception or underflow exception; it may also cause an inexact exception.

Refer to Figure 71 below for rounding. Let Z be the intermediate result of a DFP operation. Z may or may not fit in the destination's precision. If Z is exactly one of the permissible representable resultant values, then the final result in all rounding modes is Z. Otherwise, either Z1 or Z2 is chosen to approximate the result, where Z1 and Z2 are the next larger and smaller permissible resultant values, respectively.

  [Number line: the infinitely precise value Z lies between the permissible values Z2 and Z1, shown for negative values and for positive values on either side of 0, with arrows labeled "By increasing |Z|" and "By decreasing |Z|".]

Figure 71. Rounding

Round to Nearest, Ties to Even
Choose the value that is closer to Z (Z1 or Z2). In case of a tie, choose the one whose units digit would have been even in the form with the largest common quantum of the two permissible resultant values. However, an infinitely precise result with magnitude at least (Nmax + 0.5Q(Nmax)) is rounded to infinity with no change in sign, where Q(Nmax) is the quantum of Nmax.

Round toward 0
Choose the smaller in magnitude (Z1 or Z2).

Round toward +∞
Choose Z1.

Round toward -∞
Choose Z2.

Round to Nearest, Ties away from 0
Choose the value that is closer to Z (Z1 or Z2). In case of a tie, choose the larger in magnitude (Z1 or Z2). However, an infinitely precise result with magnitude at least (Nmax + 0.5Q(Nmax)) is rounded to infinity with no change in sign, where Q(Nmax) is the quantum of Nmax.

Round to Nearest, Ties toward 0
Choose the value that is closer to Z (Z1 or Z2). In case of a tie, choose the smaller in magnitude (Z1 or Z2). However, an infinitely precise result with magnitude greater than (Nmax + 0.5Q(Nmax)) is rounded to infinity with no change in sign, where Q(Nmax) is the quantum of Nmax.

Round away from 0
Choose the larger in magnitude (Z1 or Z2).

Round to prepare for shorter precision
Choose the smaller in magnitude (Z1 or Z2). If the selected value is inexact and the units digit of the selected value is either 0 or 5, then the digit is incremented by one and the incremented result is delivered. In all other cases, the selected value is delivered. When a value has redundant forms, the units digit is determined by using the form that has the smallest exponent.

5.5.2 Rounding Mode Specification

Unless otherwise specified in the instruction definition, the rounding mode used by an operation is specified in the DFP rounding control (DRN) field of the FPSCR. The eight DFP rounding modes are encoded in the DRN field as specified in the table below.

  DRN    Rounding Mode
  000    Round to Nearest, Ties to Even
  001    Round toward 0
  010    Round toward +Infinity
  011    Round toward -Infinity
  100    Round to Nearest, Ties away from 0
  101    Round to Nearest, Ties toward 0
  110    Round away from 0
  111    Round to Prepare for Shorter Precision

Figure 72. Encoding of DFP Rounding-Mode Control (DRN)

For the quantum-adjustment instructions, a 2-bit immediate field, called RMC (Rounding Mode Control), in the instruction specifies the rounding mode used. The RMC field may contain a primary encoding or a secondary encoding. For Quantize, Quantize Immediate, and Reround, the RMC field contains the primary encoding. For Round to FP Integer, the field contains either encoding, depending on the setting of an RMC-encoding-selection bit.
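The eight rounding modes of Sections 5.5.1 and 5.5.2 can be tried out with Python's `decimal` module, whose eight rounding constants appear to line up one-to-one with the DRN encodings. That correspondence is an assumption made for this sketch, not something the architecture states, and the helper names are hypothetical. The sketch rounds to a chosen quantum only; the round-to-infinity rule near Nmax and the FR and FI bits are not modeled.

```python
from decimal import (Decimal, ROUND_HALF_EVEN, ROUND_DOWN, ROUND_CEILING,
                     ROUND_FLOOR, ROUND_HALF_UP, ROUND_HALF_DOWN, ROUND_UP,
                     ROUND_05UP)

# Assumed mapping from DRN encodings (Figure 72) to Python rounding constants.
DRN_TO_PYTHON = {
    0b000: ROUND_HALF_EVEN,   # Round to Nearest, Ties to Even
    0b001: ROUND_DOWN,        # Round toward 0
    0b010: ROUND_CEILING,     # Round toward +Infinity
    0b011: ROUND_FLOOR,       # Round toward -Infinity
    0b100: ROUND_HALF_UP,     # Round to Nearest, Ties away from 0
    0b101: ROUND_HALF_DOWN,   # Round to Nearest, Ties toward 0
    0b110: ROUND_UP,          # Round away from 0
    0b111: ROUND_05UP,        # Round to Prepare for Shorter Precision
}


def round_to_quantum(value, quantum, drn):
    """Round a decimal string to the given quantum under a DRN mode."""
    return Decimal(value).quantize(Decimal(quantum),
                                   rounding=DRN_TO_PYTHON[drn])


def reround_demo(value="1.2501"):
    """Compare direct rounding with two-step rounding to one fewer digit."""
    direct = round_to_quantum(value, "0.1", 0b000)
    prepared = round_to_quantum(value, "0.01", 0b111)
    via_prepare = prepared.quantize(Decimal("0.1"), rounding=ROUND_HALF_EVEN)
    truncated = round_to_quantum(value, "0.01", 0b001)
    via_truncate = truncated.quantize(Decimal("0.1"), rounding=ROUND_HALF_EVEN)
    return direct, via_prepare, via_truncate
```

In `reround_demo`, the intermediate result produced with mode 111 rounds to the same final value as direct rounding, while a truncated intermediate does not. This suggests why a two-step rounding benefits from the "prepare for shorter precision" mode in the first step.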
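The field layout of Section 5.4.1.1 can be sketched as a decoder for a 64-bit DFP Long image, using Figures 64, 66, and 67 and the DFP Long bias from Figure 68. It is a minimal illustration, not a full decoder: the trailing significand is returned as a raw 50-bit integer because the DPD algorithm lives in Appendix A, and the names `decode_g5` and `decode_dfp_long` are hypothetical. Bit 0 is taken to be the most significant bit, as in the figures.

```python
DFP_LONG_BIAS = 398  # from Figure 68


def decode_g5(g5):
    """Decode bits 0:4 of the combination field (Figures 66 and 67).

    Returns (kind, exponent_high_bits, leftmost_digit).
    """
    if g5 == 0b11111:
        return ("nan", None, None)
    if g5 == 0b11110:
        return ("infinity", None, None)
    if g5 >> 3 != 0b11:
        # LMD 0-7: two exponent bits, then a three-bit digit.
        return ("finite", g5 >> 3, g5 & 0b111)
    # LMD 8-9: prefix 11, two exponent bits, then one digit bit.
    return ("finite", (g5 >> 1) & 0b11, 8 + (g5 & 1))


def decode_dfp_long(word):
    """Split a 64-bit DFP Long image into its fields (T is left undecoded)."""
    sign = (word >> 63) & 1
    g = (word >> 50) & 0x1FFF          # 13-bit combination field, bits 1:13
    t = word & ((1 << 50) - 1)         # 50-bit trailing significand, bits 14:63
    kind, exp_high, lmd = decode_g5(g >> 8)
    result = {"sign": sign, "kind": kind, "trailing": t}
    if kind == "nan":
        # Bit 5 of the combination field: 0 = quiet, 1 = signaling.
        result["signaling"] = bool((g >> 7) & 1)
    elif kind == "finite":
        biased = (exp_high << 8) | (g & 0xFF)
        result["exponent"] = biased - DFP_LONG_BIAS
        result["lmd"] = lmd
    return result
```

For example, the image `0x2238000000000001` decodes to a finite number with sign 0, leftmost digit 0, exponent 0, and a trailing field of 1.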
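The entries of Figure 68 can be cross-checked against one another. The relationships tested below are observations that happen to hold for the tabulated values; the text does not state them as defining formulas, so they should be read as a consistency sketch rather than as the architecture's definitions. The dictionary layout and the name `consistent` are hypothetical.

```python
# Values copied from Figure 68; nmin_exp is the power of ten in Nmin.
FORMATS = {
    "DFP Short":    dict(width=32,  g=11, t=20,  max_biased=191,   xmax=90,
                         xmin=-101,  bias=101,  p=7,  nmin_exp=-95),
    "DFP Long":     dict(width=64,  g=13, t=50,  max_biased=767,   xmax=369,
                         xmin=-398,  bias=398,  p=16, nmin_exp=-383),
    "DFP Extended": dict(width=128, g=17, t=110, max_biased=12287, xmax=6111,
                         xmin=-6176, bias=6176, p=34, nmin_exp=-6143),
}


def consistent(f):
    """Check observed relationships among the Figure 68 entries."""
    return all([
        f["width"] == 1 + f["g"] + f["t"],               # S + G + T
        f["p"] == 1 + 3 * (f["t"] // 10),                # LMD + 3 digits per declet
        f["max_biased"] == 3 * 2 ** (f["g"] - 5) - 1,    # high bits 00, 01, 10 only
        f["xmax"] == f["max_biased"] - f["bias"],
        f["xmin"] == -f["bias"],
        f["nmin_exp"] == f["xmin"] + f["p"] - 1,         # Nmin = 10^(p-1) x Dmin
    ])


results = {name: consistent(f) for name, f in FORMATS.items()}
```

The factor of 3 in the maximum biased exponent reflects Figure 67, where the two leftmost exponent bits take only the values 00, 01, and 10.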
The following tables define the primary encoding and the secondary encoding.

  Primary
  RMC      Rounding Mode
  00       Round to nearest, ties to even
  01       Round toward 0
  10       Round to nearest, ties away from 0
  11       Round according to FPSCR_DRN

Figure 73. Primary Encoding of Rounding-Mode Control

  Secondary
  RMC        Rounding Mode
  00         Round to +∞
  01         Round to -∞
  10         Round away from 0
  11         Round to nearest, ties toward 0

Figure 74. Secondary Encoding of Rounding-Mode Control

5.5.3 Formation of Final Result

An ideal exponent is defined for each DFP instruction that returns a DFP data operand.

5.5.3.1 Use of Ideal Exponent

For all DFP operations,
- if the rounded intermediate result has only one form, then that form is delivered as the final result.
- if the rounded intermediate result has redundant forms and is exact, then the form with the exponent closest to the ideal exponent is delivered.
- if the rounded intermediate result has redundant forms and is inexact, then the form with the smallest exponent is delivered.

The following table specifies the ideal exponent for each instruction.

  Operations                 Ideal Exponent
  Add                        min(E(FRA), E(FRB))
  Subtract                   min(E(FRA), E(FRB))
  Multiply                   E(FRA) + E(FRB)
  Divide                     E(FRA) - E(FRB)
  Quantize-Immediate         See Instruction Description
  Quantize                   E(FRA)
  Reround                    See Instruction Description
  Round to FP Integer        max(0, E(FRA))
  Convert to DFP Long        E(FRA)
  Convert to DFP Extended    E(FRA)
  Round to DFP Short         E(FRA)
  Round to DFP Long          E(FRA)
  Convert from Fixed         0
  Encode BCD to DPD          0
  Insert Biased Exponent     E(FRA)

  Notes:
  E(x) - exponent of the DFP operand in register x.

Figure 75. Summary of Ideal Exponents

5.5.4 Arithmetic Operations

Four arithmetic operations are provided: Add, Subtract, Multiply, and Divide.

5.5.4.1 Sign of Arithmetic Result

The following rules govern the sign of an arithmetic operation when the operation does not yield an exception. They apply even when the operands or results are zeros or infinities.

- The sign of the result of an add operation is the sign of the source operand having the larger absolute value. If both source operands have the same sign, the sign of the result of an add operation is the same as the sign of the source operands. When the sum of two operands with opposite signs is exactly zero, the sign of the result is positive in all rounding modes except Round toward -∞, in which case the sign is negative.
- The sign of the result of the subtract operation x - y is the same as the sign of the result of the add operation x + (-y).
- The sign of the result of a multiply or divide operation is the exclusive-OR of the signs of the source operands.

5.5.5 Compare Operations

Two sets of instructions are provided for comparing numerical values: Compare Ordered and Compare Unordered. In the absence of NaNs, these instructions work the same. These instructions work differently when either of the following is true:

1. At least one source operand of the instruction is an SNaN and the invalid-operation exception is disabled.
2. When there is no SNaN in any source operand, at least one source operand of the instruction is a QNaN.

In case 1, Compare Unordered recognizes an invalid-operation exception and sets the FPSCR_VXSNAN flag, but Compare Ordered recognizes the exception and sets both the FPSCR_VXSNAN and FPSCR_VXVC flags. In case 2, Compare Unordered does not recognize an exception, but Compare Ordered recognizes an invalid-operation exception and sets the FPSCR_VXVC flag.

For finite numbers, comparisons are performed on values, that is, all redundant forms of a DFP number are treated as equal.

Comparisons are always exact and cannot cause an inexact exception.

Comparison ignores the sign of zero, that is, +0 equals -0.

Infinities with like sign compare equal, that is, +∞ equals +∞, and -∞ equals -∞.

A NaN compares as unordered with any other operand, whether a finite number, an infinity, or another NaN, including itself.

Execution of a compare instruction always completes, regardless of whether any DFP exception occurs or not, and whether the exception is enabled or not.

5.5.6 Test Operations

Four kinds of test operations are provided: Test Data Class, Test Data Group, Test Exponent, and Test Significance.

The Test Data Class instruction examines the contents of a source operand and determines if the operand is one of the specified data classes. The test result and the sign of the source operand are indicated in the FPSCR_FPCC field and CR field BF.

The Test Data Group instruction examines the contents of a source operand and determines if the operand is one of the specified data groups. The test result and the sign of the source operand are indicated in the FPSCR_FPCC field and CR field BF.

The Test Exponent instruction compares the exponents of the two source operands. The test operation ignores the sign and significand of operands. Infinities compare equal, and NaNs compare equal. The test result is indicated in the FPSCR_FPCC field and CR field BF.

The Test Significance instruction compares the number of significant digits of one source operand with the referenced number of significant digits in another source operand. The test result is indicated in the FPSCR_FPCC field and CR field BF.

Execution of a test instruction does not cause any DFP exception.

5.5.7 Quantum Adjustment Operations

Four kinds of quantum-adjustment operations are provided: Quantize, Quantize Immediate, Reround, and Round To FP Integer. Each of them has an immediate field which specifies whether the rounding mode in FPSCR or a different one is to be used.

The Quantize instruction is used to adjust a DFP number to the form that has the specified target exponent. The Quantize Immediate instruction is similar to the Quantize instruction, except that the target exponent is specified in a 5-bit immediate field as a signed binary integer and has a limited range.

The Reround instruction is used to simulate a DFP operation of a precision other than that of DFP Long or DFP Extended. For the Reround instruction to produce a result which accurately reflects that which would have resulted from a DFP operation of the desired precision d in the range {1: 33} inclusively, the following conditions must be met:

- The precision of the preceding DFP operation must be at least one digit larger than d.
- The rounding mode used by the preceding DFP operation must be round-to-prepare-for-shorter-precision.

The Round To FP Integer instruction is used to round a DFP number to an integer value of the same format. The target exponent is implicitly specified, and is greater than or equal to zero.

5.5.8 Conversion Operations

There are two kinds of conversion operations: data-format conversion and data-type conversion.

5.5.8.1 Data-Format Conversion

The instructions Convert To DFP Long and Convert To DFP Extended convert DFP operands to wider formats; the instructions Round To DFP Short and Round To DFP Long convert DFP operands to narrower formats.

When converting a finite number to a wider format, the result is exact. When converting a finite number to a narrower format, the source operand is rounded to the target-format precision, which is specified by the instruction, not by the target register size.

When converting a finite number, the ideal exponent of the result is the source exponent.

Conversion of an infinity or NaN to a different format does not preserve the source combination field. Let N be the width of the target format's combination field.

- When the result is an infinity or a QNaN, the contents of the rightmost N-5 bits of the N-bit target combination field are set to zero.
- When the result is an SNaN, bit 5 of the target format's combination field is set to one and the rightmost N-6 bits of the N-bit target combination field are set to zero.

When converting a NaN to a wider format or when converting an infinity from DFP Short to DFP Long, digits in the source trailing significand field are reencoded using the preferred DPD codes with sufficient zeros appended on the left to form the target trailing significand field. When converting a NaN to a narrower format or when converting an infinity from DFP Long to DFP Short, the appropriate number of leftmost digits of the source trailing significand field are removed and the remaining digits of the field are reencoded using the preferred DPD codes to form the target trailing significand field.

When converting an infinity between DFP Long and DFP Extended, a default infinity with the same sign is produced.

When converting an SNaN between DFP Short and DFP Long, it is converted to an SNaN without causing an invalid-operation exception. When converting an SNaN between DFP Long and DFP Extended, the invalid-operation exception occurs; if the invalid-operation exception is disabled, the result is converted to the corresponding QNaN.

5.5.8.2 Data-Type Conversion

The instructions Convert From Fixed and Convert To Fixed are provided to convert a number between the DFP data type and the signed 64-bit binary-integer data type.

Conversion of a signed 64-bit binary integer to a DFP Extended number is always exact.

Conversion of a DFP number to a signed 64-bit binary integer results in an invalid-operation exception when the converted value does not fit into the target format, or when the source operand is an infinity or NaN. When the exception is disabled, the most positive integer is returned if the source operand is a positive number or +∞, and the most negative integer is returned if the source operand is a negative number, -∞, or NaN.

5.5.9 Format Operations

The format instructions are provided to facilitate composing or decomposing a DFP number, and consist of Encode BCD To DPD, Decode DPD To BCD, Extract Biased Exponent, Insert Biased Exponent, Shift Significand Left Immediate, and Shift Significand Right Immediate. A source operand of SNaN does not cause an invalid-operation exception, and an SNaN may be produced as the target operand.

5.5.10 DFP Exceptions

This architecture defines the following DFP exceptions:

- Invalid Operation Exception
    SNaN
    ∞ - ∞
    ∞ ÷ ∞
    0 ÷ 0
    ∞ × 0
    Invalid Compare
    Invalid Conversion
- Zero Divide Exception
- Overflow Exception
- Underflow Exception
- Inexact Exception

These exceptions may occur during execution of a DFP instruction.

Each DFP exception, and each category of the Invalid Operation Exception, has an exception status bit in the FPSCR. In addition, each DFP exception has a corresponding enable bit in the FPSCR. The exception status bit indicates occurrence of the corresponding exception. If an exception occurs, the corresponding enable bit governs the result produced by the instruction and, in conjunction with the FE0 and FE1 bits (see the discussion of FE0 and FE1 below), whether and how the system floating-point enabled exception error handler is invoked. (In general, the enabling specified by the enable bit is of invoking the system error handler, not of permitting the exception to occur. The occurrence of an exception depends only on the instruction and its source operands, not on the setting of any control bits. The only deviation from this general rule is that the occurrence of an Underflow Exception may depend on the setting of the enable bit.)

A single instruction, other than mtfsfi or mtfsf, may set more than one exception bit only in the following cases:

- Inexact Exception may be set with Overflow Exception.
- Inexact Exception may be set with Underflow Exception.
- Invalid Operation Exception (SNaN) may be set with Invalid Operation Exception (Invalid Compare) for Compare Ordered instructions.
- Invalid Operation Exception (SNaN) may be set with Invalid Operation Exception (Invalid Conversion) for Convert To Fixed instructions.

When an exception occurs the instruction execution may be completed or partially completed, depending on the exception and the operation.

For all instructions, except for the Compare and Test instructions, the following exceptions cause the instruction execution to be partially completed. That is, setting of CR field 1 (when Rc=1) and exception status flags is performed, but no result is stored into the target FPR or FPR pair.

In this case the system floating-point enabled exception error handler is not invoked, even if DFP exceptions occur: software can inspect the FPSCR exception bits if necessary, to determine whether exceptions have occurred.

In this architecture, if software is to be notified that a given kind of exception has occurred, the corresponding FPSCR exception enable bit must be set to one and a mode other than Ignore Exceptions Mode must be used. In this case the system floating-point enabled exception error handler is invoked if an enabled DFP exception occurs. The system floating-point enabled
For Compare and Test instructions, instruc- exception error handler is also invoked if a Move To tion execution is always completed, regardless of FPSCR instruction causes an exception bit and the cor- whether any DFP exception occurs or not, and whether responding enable bit both to be 1; the Move To the exception is enabled or not. FPSCR instruction is considered to cause the enabled 1 Enabled Invalid Operation exception. 1 Enabled Zero Divide The FE0 and FE1 bits control whether and how the For the remaining kinds of exceptions, instruction exe- system floating-point enabled exception error handler cution is completed, a result, if specified by the instruc- is invoked if an enabled DFP exception occurs. The tion, is generated and stored into the target FPR or location of these bits and the requirements for altering FPR pair, and appropriate status flags are set. The them are described in Book III, PowerPC AS Operating result may be a different value for the enabled and dis- Environment Architecture. (The system floating-point abled conditions for some of these exceptions. The enabled exception error handler is never invoked kinds of exceptions that deliver a result in target FPR because of a disabled DFP exception.) The effects of are the following: the four possible settings of these bits are as follows. 1 Disabled Invalid Operation 1 Disabled Zero Divide FE0 FE1 Description 1 Disabled Overflow 0 0 Ignore Exceptions Mode 1 Disabled Underflow DFP exceptions do not cause the system 1 Disabled Inexact floating-point enabled exception error 1 Enabled Overflow handler to be invoked. 1 Enabled Underflow 0 1 Imprecise Nonrecoverable Mode 1 Enabled Inexact The system floating-point enabled excep- Subsequent sections define each of the DFP excep- tion error handler is invoked at some point tions and specify the action that is taken when they are at or beyond the instruction that caused detected. the enabled exception. 
It may not be pos- sible to identify the excepting instruction The IEEE standard specifies the handling of excep- or the data that caused the exception. tional conditions in terms of "traps" and "trap handlers". Results produced by the excepting In this architecture, a FPSCR exception enable bit of 1 instruction may have been used by or may causes generation of the result value specified in the have affected subsequent instructions IEEE standard for the "trap enabled" case: the expecta- that are executed before the error handler tion is that the exception will be detected by software, is invoked. which will revise the result. A FPSCR exception enable 1 0 Imprecise Recoverable Mode bit of 0 causes generation of the "default result" value The system floating-point enabled excep- specified for the "trap disabled" (or "no trap occurs" or tion error handler is invoked at some point "trap is not implemented") case: the expectation is that at or beyond the instruction that caused the exception will not be detected by software, which the enabled exception. Sufficient informa- will simply use the default result. The result to be deliv- tion is provided to the error handler that it ered in each case for each exception is described in can identify the excepting instruction and the sections below. the operands, and correct the result. No The IEEE default behavior when an exception occurs is results produced by the excepting instruc- to generate a default value and not to notify software. tion have been used by or have affected In this architecture, if the IEEE default behavior when subsequent instructions that are executed an exception occurs is desired for all exceptions, all before the error handler is invoked. FPSCR exception enable bits should be set to zero and Ignore Exceptions Mode (see below) should be used. Chapter 5. 
FE0  FE1  Description
1    1    Precise Mode
          The system floating-point enabled exception error handler is invoked precisely at the instruction that caused the enabled exception.

In all cases, the question of whether a DFP result is stored, and what value is stored, is governed by the FPSCR exception enable bits, as described in subsequent sections, and is not affected by the value of the FE0 and FE1 bits.

In all cases in which the system floating-point enabled exception error handler is invoked, all instructions before the instruction at which the system floating-point enabled exception error handler is invoked have completed, and no instruction after the instruction at which the system floating-point enabled exception error handler is invoked has begun execution. (Recall that, for the two Imprecise modes, the instruction at which the system floating-point enabled exception error handler is invoked need not be the instruction that caused the exception.) The instruction at which the system floating-point enabled exception error handler is invoked has not been executed unless it is the excepting instruction, in which case it has been executed if the exception is not among those listed on page 154 as suppressed.

Programming Note
In the ignore and both imprecise modes, a Floating-Point Status and Control Register instruction can be used to force any exceptions, due to instructions initiated before the Floating-Point Status and Control Register instruction, to be recorded in the FPSCR. (This forcing is superfluous for Precise Mode.)

In either of the Imprecise modes, a Floating-Point Status and Control Register instruction can be used to force any invocations of the system floating-point enabled exception error handler, due to instructions initiated before the Floating-Point Status and Control Register instruction, to occur. (This forcing has no effect in Ignore Exceptions Mode, and is superfluous for Precise Mode.)

In order to obtain the best performance across the widest range of implementations, the programmer should obey the following guidelines.

• If the IEEE default results are acceptable to the application, Ignore Exceptions Mode should be used with all FPSCR exception enable bits set to zero.
• If the IEEE default results are not acceptable to the application, Imprecise Nonrecoverable Mode should be used, or Imprecise Recoverable Mode if recoverability is needed, with FPSCR exception enable bits set to one for those exceptions for which the system floating-point enabled exception error handler is to be invoked.
• Ignore Exceptions Mode should not, in general, be used when any FPSCR exception enable bits are set to one.
• Precise Mode may degrade performance in some implementations, perhaps substantially, and therefore should be used only for debugging and other specialized applications.

5.5.10.1 Invalid Operation Exception

Definition

An Invalid Operation Exception occurs when an operand is invalid for the specified DFP operation. The invalid DFP operations are:

• Any DFP operation on a signaling NaN (SNaN), except for Test, Round To DFP Short, Convert To DFP Long, Decode DPD To BCD, Extract Biased Exponent, Insert Biased Exponent, Shift Significand Left Immediate, and Shift Significand Right Immediate
• For add or subtract operations, magnitude subtraction of infinities (+∞) + (-∞)
• Division of infinity by infinity (∞ ÷ ∞)
• Division of zero by zero (0 ÷ 0)
• Multiplication of infinity by zero (∞ × 0)
• Ordered comparison involving a NaN (Invalid Compare)
• The Quantize operation detects that the significand associated with the specified target exponent would have more significant digits than the target-format precision
• For the Quantize operation, when one source operand specifies an infinity and the other specifies a finite number
• The Reround operation detects that the target exponent associated with the specified target significance would be greater than Xmax
• The Encode BCD To DPD operation detects an invalid BCD digit or sign code
• The Convert To Fixed operation involving a number too large in magnitude to be represented in the target format, or involving a NaN.

Programming Note
In addition, an Invalid Operation Exception occurs if software explicitly requests this by executing an mtfsfi, mtfsf, or mtfsb1 instruction that sets FPSCR_VXSOFT to 1 (Software Request). The purpose of FPSCR_VXSOFT is to allow software to cause an Invalid Operation Exception for a condition that is not necessarily associated with the execution of a DFP instruction. For example, it might be set by a program that computes a square root, if the source operand is negative.

Action

The action to be taken depends on the setting of the Invalid Operation Exception Enable bit of the FPSCR.

When Invalid Operation Exception is enabled (FPSCR_VE=1) and Invalid Operation occurs, the following actions are taken:

1. One or two Invalid Operation Exceptions are set:
     FPSCR_VXSNAN (if SNaN)
     FPSCR_VXISI  (if ∞ - ∞)
     FPSCR_VXIDI  (if ∞ ÷ ∞)
     FPSCR_VXZDZ  (if 0 ÷ 0)
     FPSCR_VXIMZ  (if ∞ × 0)
     FPSCR_VXVC   (if invalid comp)
     FPSCR_VXCVI  (if invalid conversion)
2. If the operation is an arithmetic, quantum-adjustment, conversion, or format,
     the target FPR is unchanged,
     FPSCR_FR and FPSCR_FI are set to zero, and
     FPSCR_FPRF is unchanged.
3. If the operation is a compare,
     FPSCR_FR, FPSCR_FI, and FPSCR_C are unchanged, and
     FPSCR_FPCC is set to reflect unordered.

When Invalid Operation Exception is disabled (FPSCR_VE=0) and Invalid Operation occurs, the following actions are taken:

1. One or two Invalid Operation Exceptions are set:
     FPSCR_VXSNAN (if SNaN)
     FPSCR_VXISI  (if ∞ - ∞)
     FPSCR_VXIDI  (if ∞ ÷ ∞)
     FPSCR_VXZDZ  (if 0 ÷ 0)
     FPSCR_VXIMZ  (if ∞ × 0)
     FPSCR_VXVC   (if invalid comp)
     FPSCR_VXCVI  (if invalid conversion)
2. If the operation is an arithmetic, quantum-adjustment, Round to DFP Long, Convert to DFP Extended, or format,
     the target FPR is set to a Quiet NaN,
     FPSCR_FR and FPSCR_FI are set to zero, and
     FPSCR_FPRF is set to indicate the class of the result (Quiet NaN).
3. If the operation is a Convert To Fixed, the target FPR is set as follows:
     FRT is set to the most positive 64-bit binary integer if the operand in FRB is a positive number or +∞, and to the most negative 64-bit binary integer if the operand in FRB is a negative number, -∞, or NaN.
     FPSCR_FR and FPSCR_FI are set to zero.
     FPSCR_FPRF is unchanged.
4. If the operation is a compare,
     FPSCR_FR, FPSCR_FI, and FPSCR_C are unchanged, and
     FPSCR_FPCC is set to reflect unordered.

5.5.10.2 Zero Divide Exception

Definition

A Zero Divide Exception occurs when a Divide instruction is executed with a zero divisor value and a finite nonzero dividend value.

Action

The action to be taken depends on the setting of the Zero Divide Exception Enable bit of the FPSCR.

When Zero Divide Exception is enabled (FPSCR_ZE=1) and Zero Divide occurs, the following actions are taken:

1. Zero Divide Exception is set
     FPSCR_ZX ← 1
2. The target FPR is unchanged
3. FPSCR_FR and FPSCR_FI are set to zero
4. FPSCR_FPRF is unchanged

When Zero Divide Exception is disabled (FPSCR_ZE=0) and Zero Divide occurs, the following actions are taken:

1. Zero Divide Exception is set
     FPSCR_ZX ← 1
2. The target FPR is set to ±∞, where the sign is determined by the XOR of the signs of the operands
3. FPSCR_FR and FPSCR_FI are set to zero
4. FPSCR_FPRF is set to indicate the class and sign of the result (±∞)

5.5.10.3 Overflow Exception

Definition

An overflow exception occurs whenever the target format's largest finite number is exceeded in magnitude by what would have been the rounded result if the exponent range were unbounded.

Action

Except for Reround, the following describes the handling of the IEEE overflow exception condition. The Reround operation does not recognize an overflow exception condition.

The action to be taken depends on the setting of the Overflow Exception Enable bit of the FPSCR.

When Overflow Exception is enabled (FPSCR_OE=1) and overflow occurs, the following actions are taken:

1. Overflow Exception is set
     FPSCR_OX ← 1
2. The infinitely precise result is divided by 10^α. That is, the exponent adjustment α is subtracted from the exponent. This is called the wrapped result. The exponent adjustment for all operations, except for Round To DFP Short and Round To DFP Long, is 576 for DFP Long and 9216 for DFP Extended. For Round To DFP Short and Round To DFP Long, the exponent adjustment is 192 for the source format of DFP Long and 3072 for the source format of DFP Extended.
3. The wrapped result is rounded to the target-format precision. This is called the wrapped rounded result.
4. If the wrapped rounded result has only one form, it is the delivered result. If the wrapped rounded result has redundant forms and is exact, the result of the form that has the exponent closest to the wrapped ideal exponent is returned. If the wrapped rounded result has redundant forms and is inexact, the result of the form that has the smallest exponent is returned. The wrapped ideal exponent is the result of subtracting the exponent adjustment from the ideal exponent.
5. FPSCR_FPRF is set to indicate the class and sign of the result (± Normal Number)

When Overflow Exception is disabled (FPSCR_OE=0) and overflow occurs, the following actions are taken:

1. Overflow Exception is set
     FPSCR_OX ← 1
2. Inexact Exception is set
     FPSCR_XX ← 1
3. The result is determined by the rounding mode and the sign of the intermediate result as follows.

                                            Sign of intermediate result
   Rounding Mode                            Plus      Minus
   Round to Nearest, Ties to Even           +∞        -∞
   Round toward 0                           +Nmax     -Nmax
   Round toward +∞                          +∞        -Nmax
   Round toward -∞                          +Nmax     -∞
   Round to Nearest, Ties away from 0       +∞        -∞
   Round to Nearest, Ties toward 0          +∞        -∞
   Round away from 0                        +∞        -∞
   Round to prepare for shorter precision   +Nmax     -Nmax

   Figure 76. Overflow Results When Exception Is Disabled

4. The result is placed into the target FPR
5. FPSCR_FR is set to one if the returned result is ±∞, and is set to zero if the returned result is ±Nmax
6. FPSCR_FI is set to one
7. FPSCR_FPRF is set to indicate the class and sign of the result (±∞ or ± Normal number)

5.5.10.4 Underflow Exception

Definition

Except for Reround, the following describes the handling of the IEEE underflow exception condition. The Reround operation does not recognize an underflow exception condition.

The Underflow Exception is defined differently for the enabled and disabled states. However, a tininess condition is recognized in both states when a result computed as though both the precision and exponent range were unbounded would be nonzero and less than the target format's smallest normal number, Nmin, in magnitude.

Unless otherwise defined in the instruction description, an underflow exception occurs as follows:

• Enabled:
  When the tininess condition is recognized.
• Disabled:
  When the tininess condition is recognized and when the delivered result value differs from what would have been computed were both the precision and the exponent range unbounded.

Action

The action to be taken depends on the setting of the Underflow Exception Enable bit of the FPSCR.

When Underflow Exception is enabled (FPSCR_UE=1) and underflow occurs, the following actions are taken:

1. Underflow Exception is set
     FPSCR_UX ← 1
2. The infinitely precise result is multiplied by 10^α. That is, the exponent adjustment α is added to the exponent. This is called the wrapped result. The exponent adjustment for all operations, except for Round To DFP Short and Round To DFP Long, is 576 for DFP Long and 9216 for DFP Extended. For Round To DFP Short and Round To DFP Long, the exponent adjustment is 192 for the source format of DFP Long and 3072 for the source format of DFP Extended.
3. The wrapped result is rounded to the target-format precision. This is called the wrapped rounded result.
4. If the wrapped rounded result has only one form, it is the delivered result. If the wrapped rounded result has redundant forms and is exact, the result of the form that has the exponent closest to the wrapped ideal exponent is returned. If the wrapped rounded result has redundant forms and is inexact, the result of the form that has the smallest exponent is returned. The wrapped ideal exponent is the result of adding the exponent adjustment to the ideal exponent.
5. FPSCR_FPRF is set to indicate the class and sign of the result (± Normal number)

When Underflow Exception is disabled (FPSCR_UE=0) and underflow occurs, the following actions are taken:

1. Underflow Exception is set
     FPSCR_UX ← 1
2. The infinitely precise result is rounded to the target-format precision.
3. The rounded result is returned. If this result has redundant forms, the result of the form that is closest to the ideal exponent is returned.
4. FPSCR_FPRF is set to indicate the class and sign of the result (± Normal number, ± Subnormal Number, or ± Zero)

5.5.10.5 Inexact Exception

Definition

Except for Round to FP Integer Without Inexact, the following describes the handling of the IEEE inexact exception condition. The Round to FP Integer Without Inexact does not recognize an inexact exception condition.

An Inexact Exception occurs when either of two conditions occur during rounding:

1.
The delivered result differs from what would have been computed were both the precision and exponent range unbounded.
2. The rounded result overflows and Overflow Exception is disabled.

Action

The action to be taken does not depend on the setting of the Inexact Exception Enable bit of the FPSCR.

When Inexact Exception occurs, the following actions are taken:

1. Inexact Exception is set
     FPSCR_XX ← 1
2. The rounded or overflowed result is placed into the target FPR
3. FPSCR_FPRF is set to indicate the class and sign of the result

Programming Note
In some implementations, enabling Inexact Exceptions may degrade performance more than does enabling other types of floating-point exception.

5.5.11 Summary of Normal Rounding And Range Actions

Figure 77 and Figure 78 summarize rounding and range actions, with the following exceptions:

• The Reround operation recognizes neither an underflow nor an overflow exception.
• The Round to FP Integer Without Inexact operation does not recognize the inexact operation exception.
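The overflow actions of Section 5.5.10.3 can be sketched as a small software model. This is only an illustration, not part of the architecture: the function names, the mode abbreviations used as dictionary keys, and the string results are hypothetical, and Python's decimal module stands in for DFP values. The lookup table transcribes Figure 76, and the wrapped result models only the exponent adjustment, not the subsequent rounding to target-format precision.

```python
from decimal import Decimal

# Exponent adjustment for all operations except Round To DFP Short/Long,
# keyed by target format (values from Section 5.5.10.3; the names are made up).
EXPONENT_ADJUSTMENT = {"long": 576, "extended": 9216}

# Figure 76 as a lookup: rounding mode -> (result for plus, result for minus).
DISABLED_OVERFLOW_RESULT = {
    "RNE":  ("+inf",  "-inf"),   # Round to Nearest, Ties to Even
    "RTZ":  ("+Nmax", "-Nmax"),  # Round toward 0
    "RTPI": ("+inf",  "-Nmax"),  # Round toward +infinity
    "RTMI": ("+Nmax", "-inf"),   # Round toward -infinity
    "RNAZ": ("+inf",  "-inf"),   # Round to Nearest, Ties away from 0
    "RNTZ": ("+inf",  "-inf"),   # Round to Nearest, Ties toward 0
    "RAFZ": ("+inf",  "-inf"),   # Round away from 0
    "RFSP": ("+Nmax", "-Nmax"),  # Round to prepare for shorter precision
}


def disabled_overflow(rounding_mode, negative):
    """Return (result, FR, FI) delivered when overflow occurs with OE=0."""
    result = DISABLED_OVERFLOW_RESULT[rounding_mode][1 if negative else 0]
    fr = 1 if result.endswith("inf") else 0  # FR=1 only for an infinity result
    fi = 1                                   # FI is always set to one
    return result, fr, fi


def wrapped_result(precise_result, target_format):
    """Enabled overflow (OE=1): subtract the exponent adjustment.

    Rounding to the target-format precision is not modeled here.
    """
    return precise_result.scaleb(-EXPONENT_ADJUSTMENT[target_format])
```

For example, under these assumptions `disabled_overflow("RTPI", negative=True)` yields `("-Nmax", 0, 1)`, and a DFP Long intermediate result of 9E385 wraps to 9E-191 when the exception is enabled.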
                                  Result (r) when Rounding Mode Is
Range of v              Case      RNE     RNTZ    RNAZ    RAFZ    RTMI    RFSP    RTPI    RTZ
v < -Nmax, q < -Nmax    Overflow  -∞¹     -∞¹     -∞¹     -∞¹     -∞¹     -Nmax   -Nmax   -Nmax
v < -Nmax, q = -Nmax    Normal    -Nmax   -Nmax   -Nmax   --      --      -Nmax   -Nmax   -Nmax
-Nmax ≤ v ≤ -Nmin       Normal    b       b       b       b       b       b       b       b
-Nmin < v ≤ -Dmin       Tiny      b*      b*      b*      b*      b*      b*      b       b
-Dmin < v < -Dmin/2     Tiny      -Dmin   -Dmin   -Dmin   -Dmin   -Dmin   -Dmin   -0      -0
v = -Dmin/2             Tiny      -0      -0      -Dmin   -Dmin   -Dmin   -Dmin   -0      -0
-Dmin/2 < v < 0         Tiny      -0      -0      -0      -Dmin   -Dmin   -Dmin   -0      -0
v = 0                   EZD       +0      +0      +0      +0      -0      +0      +0      +0
0 < v < +Dmin/2         Tiny      +0      +0      +0      +Dmin   +0      +Dmin   +Dmin   +0
v = +Dmin/2             Tiny      +0      +0      +Dmin   +Dmin   +0      +Dmin   +Dmin   +0
+Dmin/2 < v < +Dmin     Tiny      +Dmin   +Dmin   +Dmin   +Dmin   +0      +Dmin   +Dmin   +0
+Dmin ≤ v < +Nmin       Tiny      b*      b*      b*      b*      b       b*      b*      b
+Nmin ≤ v ≤ +Nmax       Normal    b       b       b       b       b       b       b       b
+Nmax < v, q = +Nmax    Normal    +Nmax   +Nmax   +Nmax   --      +Nmax   +Nmax   --      +Nmax
+Nmax < v, q > +Nmax    Overflow  +∞¹     +∞¹     +∞¹     +∞¹     +Nmax   +Nmax   +∞¹     +Nmax

Explanation:
--    This situation cannot occur.
¹     The normal result r is considered to have been incremented.
*     The rounded value, in the extreme case, may be Nmin. In this case, the exception conditions are underflow, inexact, and incremented.
b     The value derived when the precise result v is rounded to the destination's precision, including both bounded precision and bounded exponent range.
q     The value derived when the precise result v is rounded to the destination's precision, but assuming an unbounded exponent range.
r     This is the returned value when neither overflow nor underflow is enabled.
v     Precise result before rounding, assuming unbounded precision and an unbounded exponent range. For data-format conversion operations, v is the source value.
Dmin  Smallest (in magnitude) representable subnormal number in the target format.
EZD   The result r of the exact-zero-difference case applies only to ADD and SUBTRACT with both source operands having opposite signs. (For ADD and SUBTRACT, when both source operands have the same sign, the sign of the zero result is the same sign as the sign of the source operands.)
Nmax  Largest (in magnitude) representable finite number in the target format.
Nmin  Smallest (in magnitude) representable normalized number in the target format.
RAFZ  Round away from 0.
RFSP  Round to Prepare for Shorter Precision.
RNAZ  Round to Nearest, Ties away from 0.
RNE   Round to Nearest, Ties to even.
RNTZ  Round to Nearest, Ties toward 0.
RTPI  Round toward +∞.
RTMI  Round toward -∞.
RTZ   Round toward 0.

Figure 77. Rounding and Range Actions (Part 1)

Column conditions: r≠v (Is r inexact), |r|>|v| (Is r incremented), q≠v (Is q inexact), |q|>|v| (Is q incremented).

Case      r≠v   OE=1  UE=1  XE=1  |r|>|v|  q≠v   |q|>|v|  Returned Results and Status Setting*
Overflow  Yes¹  No    --    No    No       --    --       T(r), OX←1, FI←1, FR←0, XX←1
Overflow  Yes¹  No    --    No    Yes      --    --       T(r), OX←1, FI←1, FR←1, XX←1
Overflow  Yes¹  No    --    Yes   No       --    --       T(r), OX←1, FI←1, FR←0, XX←1, TX
Overflow  Yes¹  No    --    Yes   Yes      --    --       T(r), OX←1, FI←1, FR←1, XX←1, TX
Overflow  Yes¹  Yes   --    --    --       No    No¹      Tw(q÷β), OX←1, FI←0, FR←0, TO
Overflow  Yes¹  Yes   --    --    --       Yes   No       Tw(q÷β), OX←1, FI←1, FR←0, XX←1, TO
Overflow  Yes¹  Yes   --    --    --       Yes   Yes      Tw(q÷β), OX←1, FI←1, FR←1, XX←1, TO
Normal    No    --    --    --    --       --    --       T(r), FI←0, FR←0
Normal    Yes   --    --    No    No       --    --       T(r), FI←1, FR←0, XX←1
Normal    Yes   --    --    No    Yes      --    --       T(r), FI←1, FR←1, XX←1
Normal    Yes   --    --    Yes   No       --    --       T(r), FI←1, FR←0, XX←1, TX
Normal    Yes   --    --    Yes   Yes      --    --       T(r), FI←1, FR←1, XX←1, TX
Tiny      No    --    No    --    --       --    --       T(r), FI←0, FR←0
Tiny      No    --    Yes   --    --       No¹   No¹      Tw(q·β), UX←1, FI←0, FR←0, TU
Tiny      Yes   --    No    No    No       --    --       T(r), UX←1, FI←1, FR←0, XX←1
Tiny      Yes   --    No    No    Yes      --    --       T(r), UX←1, FI←1, FR←1, XX←1
Tiny      Yes   --    No    Yes   No       --    --       T(r), UX←1, FI←1, FR←0, XX←1, TX
Tiny      Yes   --    No    Yes   Yes      --    --       T(r), UX←1, FI←1, FR←1, XX←1, TX
Tiny      Yes   --    Yes   --    --       No    No¹      Tw(q·β), UX←1, FI←0, FR←0, TU
Tiny      Yes   --    Yes   --    --       Yes   No       Tw(q·β), UX←1, FI←1, FR←0, XX←1, TU
Tiny      Yes   --    Yes   --    --       Yes   Yes      Tw(q·β), UX←1, FI←1, FR←1, XX←1, TU

Explanation:
--    The results do not depend on this condition.
¹     This condition is true by virtue of the state of some condition to the left of this column.
*     Rounding sets only the FI and FR status flags. Setting of the OX, XX, or UX flag is part of the exception actions. They are listed here for reference.
β     Wrap adjust, which depends on the type of operation and operand format. For all operations except Round to DFP Short and Round to DFP Long, the wrap adjust depends on the target format: β = 10^α, where α is 576 for DFP Long, and 9216 for DFP Extended. For Round to DFP Short and Round to DFP Long, the wrap adjust depends on the source format: β = 10^α, where α is 192 for DFP Long and 3072 for DFP Extended.
q     The value derived when the precise result v is rounded to destination's precision, but assuming an unbounded exponent range.
r     The result as defined in Part 1 of this figure.
v     Precise result before rounding, assuming unbounded precision and unbounded exponent range.
FI    Floating-Point-Fraction-Inexact status flag, FPSCR_FI. This status flag is non-sticky.
FR    Floating-Point-Fraction-Rounded status flag, FPSCR_FR.
OX    Floating-Point Overflow Exception status flag, FPSCR_OX.
TO    The system floating-point enabled exception error handler is invoked for the overflow exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode.
TU    The system floating-point enabled exception error handler is invoked for the underflow exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode.
TX    The system floating-point enabled exception error handler is invoked for the inexact exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode.
T(x)  The value x is placed at the target operand location.
Tw(x) The wrapped rounded result x is placed at the target operand location. For all operations except data format conversions, the wrapped rounded result is in the same format and length as normal results at the target location. For data format conversions, the wrapped rounded result is in the same format and length as the source, but rounded to the target-format precision.
UX    Floating-Point-Underflow-Exception status flag, FPSCR_UX.
XX    Floating-Point-Inexact-Exception status flag, FPSCR_XX. The flag is a sticky version of FPSCR_FI. When FPSCR_FI is set to a new value, the new value of FPSCR_XX is set to the result of ORing the old value of FPSCR_XX with the new value of FPSCR_FI.

Figure 78. Rounding and Range Actions (Part 2)

5.6 DFP Instruction Descriptions

The following sections describe the DFP instructions. When a 128-bit operand is used, it is held in a FPR pair and the instruction mnemonic uses a letter "q" to mean the quad-precision operation. Note that in the following descriptions, FPXp denotes a FPR pair and must address an even-odd pair. If the FPXp field specifies an odd-numbered register, then the instruction form is invalid. The notation FPX[p] means either a FPR, FPX, or a FPR pair, FPXp.

For DFP instructions, if a DFP operand is returned, the trailing significand field of the target operand is encoded using preferred DPD codes.

5.6.1 DFP Arithmetic Instructions

All DFP arithmetic instructions are X-form instructions. They all set the FI and FR status flags, and also set the FPSCR_FPRF field. Furthermore, they all have an ideal exponent assigned and employ the record bit (Rc). The arithmetic instructions consist of Add, Divide, Multiply, and Subtract.

DFP Add [Quad] X-form

dadd     FRT,FRA,FRB      (Rc=0)
dadd.    FRT,FRA,FRB      (Rc=1)

  59    FRT    FRA    FRB    2      Rc
  0     6      11     16     21     31

daddq    FRTp,FRAp,FRBp   (Rc=0)
daddq.   FRTp,FRAp,FRBp   (Rc=1)

  63    FRTp   FRAp   FRBp   2      Rc
  0     6      11     16     21     31

The DFP operand in FRA[p] is added to the DFP operand in FRB[p].

The result is rounded to the target-format precision under control of the DRN (bits 29:31) of the FPSCR. An appropriate form of the rounded result is selected based on the ideal exponent and is placed in FRT[p].

The ideal exponent is the smaller exponent of the two source operands.

Figure 79 summarizes the actions for Add. Figure 79 does not include the setting of the FPSCR_FPRF field. The FPSCR_FPRF field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged.

Special Registers Altered:
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI
    CR1 (if Rc=1)

DFP Subtract [Quad] X-form

dsub     FRT,FRA,FRB      (Rc=0)
dsub.    FRT,FRA,FRB      (Rc=1)

  59    FRT    FRA    FRB    514    Rc
  0     6      11     16     21     31

dsubq    FRTp,FRAp,FRBp   (Rc=0)
dsubq.   FRTp,FRAp,FRBp   (Rc=1)

  63    FRTp   FRAp   FRBp   514    Rc
  0     6      11     16     21     31

The DFP operand in FRB[p] is subtracted from the DFP operand in FRA[p].

The result is rounded to the target-format precision under control of the DRN (bits 29:31) of the FPSCR. An appropriate form of the rounded result is selected based on the ideal exponent and is placed in FRT[p].

The ideal exponent is the smaller exponent of the two source operands.

The execution of Subtract is identical to that of Add, except that the operand in FRB participates in the operation with its sign bit inverted. See Figure 79. The table does not include the setting of the FPSCR_FPRF field. The FPSCR_FPRF field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged.

Special Registers Altered:
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI
    CR1 (if Rc=1)
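The ideal-exponent and sign rules for Add and Subtract can be tried out in software. The sketch below uses Python's decimal module, which implements IEEE-style decimal arithmetic and, to the best of our knowledge, applies the same rules; it is a software model rather than the DFP hardware. The helper `dfp_long_context` is hypothetical, with 16 digits of precision and the IEEE 754 decimal64 exponent limits assumed as a stand-in for DFP Long.

```python
from decimal import Context, Decimal, ROUND_FLOOR, ROUND_HALF_EVEN


def dfp_long_context(rounding=ROUND_HALF_EVEN):
    """Hypothetical stand-in for DFP Long: 16 digits, decimal64 exponent range."""
    return Context(prec=16, Emax=384, Emin=-383, rounding=rounding)


ctx = dfp_long_context()
a, b = Decimal("1.20"), Decimal("3.4")

# Add: the ideal exponent is the smaller of the two source exponents (-2 here),
# so the exact sum keeps its trailing zero.
total = ctx.add(a, b)

# Subtract behaves like Add with the sign of the second operand inverted.
diff = ctx.subtract(a, b)
same = ctx.add(a, b.copy_negate())

# Exact zero difference of operands with opposite effective signs:
# +0 in every rounding mode except round toward -infinity, which gives -0.
zero_default = dfp_long_context().subtract(Decimal("1"), Decimal("1"))
zero_floor = dfp_long_context(ROUND_FLOOR).subtract(Decimal("1"), Decimal("1"))

print(total, diff, zero_default, zero_floor)
```

Here `total` has exponent -2, matching the smaller source exponent, and only the round-toward-minus-infinity context produces a negative zero.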
Operand a        Actions for Add (a + b) when operand b in FRB[p] is
in FRA[p] is     -∞               F                +∞               QNaN     SNaN
-∞               T(-dINF)         T(-dINF)         VXISI: T(dNaN)   P(b)     VXSNAN: U(b)
F                T(-dINF)         S(a + b)         T(+dINF)         P(b)     VXSNAN: U(b)
+∞               VXISI: T(dNaN)   T(+dINF)         T(+dINF)         P(b)     VXSNAN: U(b)
QNaN             P(a)             P(a)             P(a)             P(a)     VXSNAN: U(b)
SNaN             VXSNAN: U(a)     VXSNAN: U(a)     VXSNAN: U(a)     VXSNAN: U(a)   VXSNAN: U(a)

Explanation:
a + b   The value a added to b, rounded to the target-format precision and returned in the appropriate form. (See Section 5.5.11 on page 160)
+dINF   Default plus infinity.
-dINF   Default minus infinity.
dNaN    Default quiet NaN.
F       All finite numbers, including zeros.
P(x)    The QNaN of operand x is propagated and placed in FRT[p].
S(x)    The value x is placed in FRT[p] with the sign set by the rules of algebra. When the source operands have the same sign, the sign of the result is the same as the sign of the operands, including the case when the result is zero. When the operands have opposite signs, the sign of a zero result is positive in all rounding modes, except round toward -∞, in which case, the sign is minus.
T(x)    The value x is placed in FRT[p].
U(x)    The SNaN of operand x is converted to the corresponding QNaN and placed in FRT[p].
VXISI   The Invalid-Operation Exception (VXISI) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 "Invalid Operation Exception" on page 156 for the exception actions.)
VXSNAN  The Invalid-Operation Exception (VXSNAN) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 "Invalid Operation Exception" on page 156 for the exception actions.)

Figure 79. Actions: Add

DFP Multiply [Quad] X-form

dmul     FRT,FRA,FRB      (Rc=0)
dmul.    FRT,FRA,FRB      (Rc=1)

  59    FRT    FRA    FRB    34     Rc
  0     6      11     16     21     31

dmulq    FRTp,FRAp,FRBp   (Rc=0)
dmulq.   FRTp,FRAp,FRBp   (Rc=1)

  63    FRTp   FRAp   FRBp   34     Rc
  0     6      11     16     21     31

The DFP operand in FRA[p] is multiplied by the DFP operand in FRB[p].

The result is rounded to the target-format precision under control of the DRN (bits 29:31) of the FPSCR. An appropriate form of the rounded result is selected based on the ideal exponent and is placed in FRT[p].

The ideal exponent is the sum of the two exponents of the source operands.

Figure 80 summarizes the actions for Multiply. Figure 80 does not include the setting of the FPSCR_FPRF field. The FPSCR_FPRF field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged.

Special Registers Altered:
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXIMZ
    CR1 (if Rc=1)

Operand a        Actions for Multiply (a*b) when operand b in FRB[p] is
in FRA[p] is     0                Fn               ∞                QNaN     SNaN
0                S(a * b)         S(a * b)         VXIMZ: T(dNaN)   P(b)     VXSNAN: U(b)
Fn               S(a * b)         S(a * b)         S(dINF)          P(b)     VXSNAN: U(b)
∞                VXIMZ: T(dNaN)   S(dINF)          S(dINF)          P(b)     VXSNAN: U(b)
QNaN             P(a)             P(a)             P(a)             P(a)     VXSNAN: U(b)
SNaN             VXSNAN: U(a)     VXSNAN: U(a)     VXSNAN: U(a)     VXSNAN: U(a)   VXSNAN: U(a)

Explanation:
a*b     The value a multiplied by b, rounded to the target-format precision and returned in the appropriate form. (See Section 5.5.11 on page 160)
dINF    Default infinity.
dNaN    Default quiet NaN.
Fn      Finite nonzero number (includes both normal and subnormal numbers).
P(x)    The QNaN of operand x is propagated and placed in FRT[p].
S(x)    The value x is placed in FRT[p] with the sign set to the exclusive-OR of the source-operand signs.
T(x)    The value x is placed in FRT[p].
U(x)    The SNaN of operand x is converted to the corresponding QNaN and placed in FRT[p].
VXIMZ:  The Invalid-Operation Exception (VXIMZ) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 "Invalid Operation Exception" on page 156 for the exception actions.)
VXSNAN: The Invalid-Operation Exception (VXSNAN) occurs. The result is produced only when the exception is disabled.
(See Section 5.5.10.1 "Invalid Operation Exception" on page 156 for the exception actions.) Figure 80. Actions: Multiply Chapter 5. Decimal Floating-Point [Category: Decimal Floating-Point] 165 Version 2.05 DFP Divide [Quad] X-form Figure 81 summarizes the actions for Divide. Figure 81 does not include the setting of the FPSCRFPRF field. ddiv FRT,FRA,FRB (Rc=0) The FPSCRFPRF field is always set to the class and ddiv. FRT,FRA,FRB (Rc=1) sign of the result, except for an enabled invalid-opera- tion and enabled zero-divide exceptions, in which 59 FRT FRA FRB 546 Rc cases the field remains unchanged. 0 6 11 16 21 31 Special Registers Altered: FPRF FR FI ddivq FRTp,FRAp,FRBp (Rc=0) FX OX UX ZX XX ddivq. FRTp,FRAp,FRBp (Rc=1) VXSNAN VXIDI VXZDZ CR1 (if Rc=1) 63 FRTp FRAp FRBp 546 Rc 0 6 11 16 21 31 The DFP operand in FRA[p] is divided by the DFP operand in FRB[p]. The result is rounded to the target-format precision under control of the DRN (bits 29:31) of the FPSCR. An appropriate form of the rounded result is selected based on the ideal exponent and is placed in FRT[p]. The ideal exponent is the difference of subtracting the exponent of the divisor from the exponent of the divi- dend. Operand a Actions for Divide (a ÷ b) when operand b in FRB[p] is in FRA[p] is 0 Fn QNaN SNaN 0 VXZDZ: T(dNaN) S(a ÷ b) S(zt) P(b) VXSNAN: U(b) Fn Zx: S(dINF) S(a ÷ b) S(zt) P(b) VXSNAN: U(b) S(dINF) S(dINF) VXIDI: T(dNaN) P(b) VXSNAN: U(b) QNaN P(a) P(a) P(a) P(a) VXSNAN: U(b) SNaN VXSNAN: U(a) VXSNAN: U(a) VXSNAN: U(a) VXSNAN: U(a) VXSNAN: U(a) Explanation: a ÷ b The value a divided by b, rounded to the target-format precision and returned in the appropriate form. (See Section 5.5.11 on page 160.) dINF Default infinity. dNaN Default quiet NaN. Fn Finite nonzero number (includes both normal and subnormal numbers). P(x) The QNaN of operand x is propagated and placed in FRT[p]. S(x) The value x is placed in FRT[p] with the sign set to the exclusive-OR of the source-operand signs. 
T(x) The value x is placed in FRT[p]. U(x) The SNaN of operand x is converted to the corresponding QNaN and placed in FRT[p]. VXIDI: The Invalid-Operation Exception (VXIDI) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 "Invalid Operation Exception" on page 156 for the exception actions.) VXSNAN: The Invalid-Operation Exception (VXSNAN) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 "Invalid Operation Exception" on page 156 for the exception actions.) VXZDZ: The Invalid-Operation Exception (VXZDZ) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 "Invalid Operation Exception" on page 156 for the exception actions.) zt True zero (zero significand and most negative exponent). Zx The Zero-Divide Exception occurs. The result is produced only when the exception is disabled (See Section 5.5.10.2 "Zero Divide Exception" on page 157 for the exception actions.) Figure 81. Actions: Divide 166 Power ISATM I - III Version 2.05 5.6.2 DFP Compare Instructions The DFP compare instructions consist of the Compare The codes in the CR field BF and FPSCRFPCC are Ordered and Compare Unordered instructions. The defined for the DFP compare operations as follows. compare instructions do not provide the record bit. Bit Name Description The comparison sets the designated CR field to indi- 0 FL (FRA[p]) < (FRB[p]) cate the result. The FPSCRFPCC is set in the same 1 FG (FRA[p]) > (FRB[p]) way. 2 FE (FRA[p]) = (FRB[p]) 3 FU (FRA[p]) ? (FRB[p]) Chapter 5. Decimal Floating-Point [Category: Decimal Floating-Point] 167 Version 2.05 DFP Compare Unordered [Quad] X-form dcmpu BF,FRA,FRB 59 BF // FRA FRB 642 / 0 6 9 11 16 21 31 dcmpuq BF,FRAp,FRBp 63 BF // FRAp FRBp 642 / 0 6 9 11 16 21 31 The DFP operand in FRA[p] is compared to the DFP operand in FRB[p]. The result of the compare is placed into CR field BF and the FPSCRFPCC. 
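The 4-bit code that a compare places in CR field BF and FPSCR_FPCC can be sketched in software. This is an illustration under stated assumptions, not the instruction's definition: Python's decimal module stands in for the DFP operands, the function name dcmpu_field is hypothetical, and the exception side effects of signaling NaNs are deliberately not modelled. Bit 0 (FL) is the most significant bit of the field.

```python
from decimal import Decimal

# Bit 0 of the field is the leftmost bit, so FL is 0b1000 and FU is 0b0001.
FL, FG, FE, FU = 0b1000, 0b0100, 0b0010, 0b0001


def dcmpu_field(a: Decimal, b: Decimal) -> int:
    """Return the 4-bit code for comparing a with b (hypothetical helper).

    Only the code is modelled; exception flags for signaling NaNs are not.
    """
    if a.is_nan() or b.is_nan():   # any NaN operand gives "unordered"
        return FU
    if a < b:
        return FL
    if a > b:
        return FG
    return FE                      # equal values, whatever their exponents


print(format(dcmpu_field(Decimal("1.0"), Decimal("1.00")), "04b"))
print(format(dcmpu_field(Decimal("-Infinity"), Decimal("0")), "04b"))
print(format(dcmpu_field(Decimal("NaN"), Decimal("1")), "04b"))
```

The comparison is by numerical value, so operands that differ only in exponent (1.0 and 1.00) compare equal.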
Special Registers Altered:
    CR field BF
    FPCC
    FX
    VXSNAN

Operand a in  Actions for Compare Unordered (a:b) when operand b in FRB[p] is
FRA[p] is     -∞            F             +∞            QNaN          SNaN
-∞            AeqB          AltB          AltB          AuoB          Fu, VXSNAN
F             AgtB          C(a:b)        AltB          AuoB          Fu, VXSNAN
+∞            AgtB          AgtB          AeqB          AuoB          Fu, VXSNAN
QNaN          AuoB          AuoB          AuoB          AuoB          Fu, VXSNAN
SNaN          Fu, VXSNAN    Fu, VXSNAN    Fu, VXSNAN    Fu, VXSNAN    Fu, VXSNAN

Explanation:
C(a:b)    Algebraic comparison. See the table below.
F         All finite numbers, including zeros.
AeqB      CR field BF and FPSCR_FPCC are set to 0b0010.
AgtB      CR field BF and FPSCR_FPCC are set to 0b0100.
AltB      CR field BF and FPSCR_FPCC are set to 0b1000.
AuoB      CR field BF and FPSCR_FPCC are set to 0b0001.
VXSNAN    The invalid-operation exception (VXSNAN) occurs. See Section 5.5.10.1 for actions.

Relation of Value a to Value b    Action for C(a:b)
a = b                             AeqB
a < b                             AltB
a > b                             AgtB

Figure 82. Actions: Compare Unordered


DFP Compare Ordered [Quad] X-form

dcmpo    BF,FRA,FRB

dcmpo        59     BF   //   FRA     FRB     130    /
bit          0      6    9    11      16      21     31

dcmpoq   BF,FRAp,FRBp

dcmpoq       63     BF   //   FRAp    FRBp    130    /
bit          0      6    9    11      16      21     31

The DFP operand in FRA[p] is compared to the DFP operand in FRB[p]. The result of the compare is placed into CR field BF and the FPSCR_FPCC.

Special Registers Altered:
    CR field BF
    FPCC
    FX
    VXSNAN VXVC

Operand a in  Actions for Compare Ordered (a:b) when operand b in FRB[p] is
FRA[p] is     -∞            F             +∞            QNaN          SNaN
-∞            AeqB          AltB          AltB          AuoB, VXVC    AuoB, VXSV
F             AgtB          C(a:b)        AltB          AuoB, VXVC    AuoB, VXSV
+∞            AgtB          AgtB          AeqB          AuoB, VXVC    AuoB, VXSV
QNaN          AuoB, VXVC    AuoB, VXVC    AuoB, VXVC    AuoB, VXVC    AuoB, VXSV
SNaN          AuoB, VXSV    AuoB, VXSV    AuoB, VXSV    AuoB, VXSV    AuoB, VXSV

Explanation:
C(a:b)    Algebraic comparison. See the table below.
F         All finite numbers, including zeros.
AeqB      CR field BF and FPSCR_FPCC are set to 0b0010.
AgtB      CR field BF and FPSCR_FPCC are set to 0b0100.
AltB      CR field BF and FPSCR_FPCC are set to 0b1000.
AuoB      CR field BF and FPSCR_FPCC are set to 0b0001.
VXSV      The invalid-operation exception (VXSNAN) occurs. Additionally, if the exception is disabled (FPSCR_VE=0), then FPSCR_VXVC is also set to one. See Section 5.5.10.1 for actions.
VXVC      The invalid-operation exception (VXVC) occurs. See Section 5.5.10.1 for actions.

Relation of Value a to Value b    Action for C(a:b)
a = b                             AeqB
a < b                             AltB
a > b                             AgtB

Figure 83. Actions: Compare Ordered


5.6.3 DFP Test Instructions

The DFP test instructions consist of the Test Data Class, Test Data Group, Test Exponent, and Test Significance instructions, and they do not provide the record bit.

The test instructions set the designated CR field to indicate the result. The FPSCR_FPCC is set in the same way.


DFP Test Data Class [Quad] Z22-form

dtstdc   BF,FRA,DCM

dtstdc       59     BF   //   FRA     DCM     194    /
bit          0      6    9    11      16      22     31

dtstdcq  BF,FRAp,DCM

dtstdcq      63     BF   //   FRAp    DCM     194    /
bit          0      6    9    11      16      22     31

Let the DCM (Data Class Mask) field specify one or more of the 6 possible data classes, where each bit corresponds to a specific data class.

DCM Bit   Data Class
0         Zero
1         Subnormal
2         Normal
3         Infinity
4         Quiet NaN
5         Signaling NaN

CR field BF and FPSCR_FPCC are set to indicate the sign of the DFP operand in FRA[p] and whether the data class of the DFP operand in FRA[p] matches any of the data classes specified by DCM.

Field   Meaning
0000    Operand positive with no match
0010    Operand positive with match
1000    Operand negative with no match
1010    Operand negative with match

Special Registers Altered:
    CR field BF
    FPCC


DFP Test Data Group [Quad] Z22-form

dtstdg   BF,FRA,DGM

dtstdg       59     BF   //   FRA     DGM     226    /
bit          0      6    9    11      16      22     31

dtstdgq  BF,FRAp,DGM

dtstdgq      63     BF   //   FRAp    DGM     226    /
bit          0      6    9    11      16      22     31

Let the DGM (Data Group Mask) field specify one or more of the 6 possible data groups, where each bit corresponds to a specific data group.

The term extreme exponent means either the maximum exponent, Xmax, or the minimum exponent, Xmin.

DGM Bit   Data Group
0         Zero with non-extreme exponent
1         Zero with extreme exponent
2         Subnormal or (Normal with extreme exponent)
3         Normal with non-extreme exponent and leftmost zero digit in significand
4         Normal with non-extreme exponent and leftmost nonzero digit in significand
5         Special symbol (Infinity, QNaN, or SNaN)

CR field BF and FPSCR_FPCC are set to indicate the sign of the DFP operand in FRA[p] and whether the data group of the DFP operand in FRA[p] matches any of the data groups specified by DGM.

Field   Meaning
0000    Operand positive with no match
0010    Operand positive with match
1000    Operand negative with no match
1010    Operand negative with match

Special Registers Altered:
    CR field BF
    FPCC


DFP Test Exponent [Quad] X-form

dtstex   BF,FRA,FRB

dtstex       59     BF   //   FRA     FRB     162    /
bit          0      6    9    11      16      21     31

dtstexq  BF,FRAp,FRBp

dtstexq      63     BF   //   FRAp    FRBp    162    /
bit          0      6    9    11      16      21     31

The exponent value (Ea) of the DFP operand in FRA[p] is compared to the exponent value (Eb) of the DFP operand in FRB[p]. The result of the compare is placed into CR field BF and the FPSCR_FPCC.

The codes in the CR field BF and FPSCR_FPCC are defined for the DFP Test Exponent operations as follows.

Bit   Description
0     Ea < Eb
1     Ea > Eb
2     Ea = Eb
3     Ea ? Eb

Special Registers Altered:
    CR field BF
    FPCC

Operand a in  Actions for Test Exponent (Ea:Eb) when operand b in FRB[p] is
FRA[p] is     F           ∞       QNaN    SNaN
F             C(Ea:Eb)    AuoB    AuoB    AuoB
∞             AuoB        AeqB    AuoB    AuoB
QNaN          AuoB        AuoB    AeqB    AeqB
SNaN          AuoB        AuoB    AeqB    AeqB

Explanation:
C(Ea:Eb)  Algebraic comparison. See the table below.
F         All finite numbers, including zeros.
AeqB      CR field BF and FPSCR_FPCC are set to 0b0010.
AgtB      CR field BF and FPSCR_FPCC are set to 0b0100.
AltB      CR field BF and FPSCR_FPCC are set to 0b1000.
AuoB      CR field BF and FPSCR_FPCC are set to 0b0001.

Relation of Value Ea to Value Eb    Action for C(Ea:Eb)
Ea = Eb                             AeqB
Ea < Eb                             AltB
Ea > Eb                             AgtB

Figure 84. Actions: Test Exponent
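The action table for Test Exponent (Figure 84) can be expressed as a small decision function. The sketch below is illustrative only: it assumes Python's decimal module as a stand-in for DFP operands, compares unbiased exponents (which order the same way as biased ones), and uses a hypothetical function name, dtstex_field.

```python
from decimal import Decimal

AltB, AgtB, AeqB, AuoB = 0b1000, 0b0100, 0b0010, 0b0001


def dtstex_field(a: Decimal, b: Decimal) -> int:
    """Return the 4-bit code for Test Exponent on a and b (hypothetical helper)."""
    # NaN against NaN (quiet or signaling) is "equal"; NaN against anything
    # else is "unordered".
    if a.is_nan() or b.is_nan():
        return AeqB if (a.is_nan() and b.is_nan()) else AuoB
    # Infinity against infinity is "equal"; infinity against finite is "unordered".
    if a.is_infinite() or b.is_infinite():
        return AeqB if (a.is_infinite() and b.is_infinite()) else AuoB
    # Both operands finite: algebraic comparison of the exponents.
    ea, eb = a.as_tuple().exponent, b.as_tuple().exponent
    if ea < eb:
        return AltB
    if ea > eb:
        return AgtB
    return AeqB


# 1.00 and 1 have the same value but different exponents (-2 and 0).
print(format(dtstex_field(Decimal("1.00"), Decimal("1")), "04b"))
```

Unlike the compare instructions, this test looks only at the form of the operands: two numerically equal values can report "less than" or "greater than".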
DFP Test Significance [Quad] X-form

dtstsf   BF,FRA,FRB

dtstsf       59     BF   //   FRA     FRB     674    /
bit          0      6    9    11      16      21     31

dtstsfq  BF,FRA,FRBp

dtstsfq      63     BF   //   FRA     FRBp    674    /
bit          0      6    9    11      16      21     31

Let k be the contents of bits 58:63 of FRA that specifies the reference significance.

The number of significant digits of the DFP operand in FRB[p], NSDb, is compared to the reference significance, k. For this instruction, the number of significant digits of the value 0 is considered to be zero. The result of the compare is placed into CR field BF and the FPSCR_FPCC as follows.

Bit   Description
0     k ≠ 0 and k < NSDb
1     k ≠ 0 and k > NSDb, or k = 0
2     k ≠ 0 and k = NSDb
3     k ? NSDb

Special Registers Altered:
    CR field BF
    FPCC

Programming Note
The reference significance can be loaded into an FPR using a Load Float as Integer Word Algebraic instruction.

Actions for Test Significance when the operand in FRB[p] is
F             ∞       QNaN    SNaN
C(k:NSDb)     AuoB    AuoB    AuoB

Explanation:
C(k:NSDb) Algebraic comparison. See the table below.
F         All finite numbers, including zeros.
AeqB      CR field BF and FPSCR_FPCC are set to 0b0010.
AgtB      CR field BF and FPSCR_FPCC are set to 0b0100.
AltB      CR field BF and FPSCR_FPCC are set to 0b1000.
AuoB      CR field BF and FPSCR_FPCC are set to 0b0001.

Relation of Value NSDb to Value k    Action for C(k:NSDb)
k ≠ 0 and k = NSDb                   AeqB
k ≠ 0 and k < NSDb                   AltB
k ≠ 0 and k > NSDb, or k = 0         AgtB

Figure 85. Actions: Test Significance


5.6.4 DFP Quantum Adjustment Instructions

The Quantum Adjustment operations consist of the Quantize, Quantize Immediate, Reround, and Round To FP Integer operations.

The Quantum Adjustment instructions are Z23-form instructions and have an immediate RMC (Rounding-Mode-Control) field, which specifies the rounding mode used. For Quantize, Quantize Immediate, and Reround, the RMC field contains the primary encoding. For Round To FP Integer, the field contains either the primary or the secondary encoding, depending on the setting of an RMC-encoding-selection bit. See Section 5.5.2 "Rounding Mode Specification" on page 151 for the definition of RMC encoding.

All Quantum Adjustment instructions set the FI and FR status flags, and also set the FPSCR_FPRF field. The record bit is provided to each of these instructions. They return the target operand in a form with the ideal exponent.


DFP Quantize Immediate [Quad] Z23-form

dquai    TE,FRT,FRB,RMC       (Rc=0)
dquai.   TE,FRT,FRB,RMC       (Rc=1)

dquai[.]     59     FRT     TE     FRB     RMC    67     Rc
bit          0      6       11     16      21     23     31

dquaiq   TE,FRTp,FRBp,RMC     (Rc=0)
dquaiq.  TE,FRTp,FRBp,RMC     (Rc=1)

dquaiq[.]    63     FRTp    TE     FRBp    RMC    67     Rc
bit          0      6       11     16      21     23     31

The DFP operand in FRB[p] is converted and rounded to the form with the exponent specified by TE based on the rounding mode specified in the RMC field. TE is a 5-bit signed binary integer. The result of that form is placed in FRT[p]. The sign of the result is the same as the sign of the operand in FRB[p]. The ideal exponent is the exponent specified by TE.

When the value of the operand in FRB[p] is greater than (10^p - 1) × 10^TE, where p is the format precision, an invalid operation exception is recognized.

When the delivered result differs in value from the operand in FRB[p], an inexact exception is recognized. No underflow exception is recognized by this operation, regardless of the value of the operand in FRB[p].

The FPSCR_FPRF field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged.

Special Registers Altered:
    FPRF FR FI
    FX XX
    VXSNAN VXCVI
    CR1 (if Rc=1)

Programming Note
DFP Quantize Immediate can be used to adjust values to a form having the specified exponent in the range -16 to 15. If the adjustment requires the significand to be shifted left, then:
- if the result would cause overflow from the most significant digit, the result is a default QNaN;
- otherwise the result is the adjusted value (left shifted with matching exponent).
If the adjustment requires the significand to be shifted right, the result is rounded based on the value of the RMC field.

DFP Quantize Immediate can round a value to a specific number of fractional digits. Consider the computation of sales tax. Values expressed in U.S. dollars have 2 fractional digits, and sales tax rates typically have 3 fractional digits. The product of value and rate will yield 5 fractional digits. For example:

    39.95 * 0.075 = 2.99625

This result needs to be rounded to the penny to compute the correct tax of $3.00.

The following sequence computes the sales tax assuming the pre-tax total is in FRA and the tax rate is in FRB. The DFP Quantize Immediate instruction rounds the product (FRA * FRB) to 2 fractional digits (TE field = -2) using Round to nearest, ties away from 0 (RMC field = 2). The quantized and rounded result is placed in FRT.

    dmul   f0,FRA,FRB
    dquai  -2,FRT,f0,2


DFP Quantize [Quad] Z23-form

dqua     FRT,FRA,FRB,RMC        (Rc=0)
dqua.    FRT,FRA,FRB,RMC        (Rc=1)

dqua[.]      59     FRT     FRA     FRB     RMC    3      Rc
bit          0      6       11      16      21     23     31

dquaq    FRTp,FRAp,FRBp,RMC     (Rc=0)
dquaq.   FRTp,FRAp,FRBp,RMC     (Rc=1)

dquaq[.]     63     FRTp    FRAp    FRBp    RMC    3      Rc
bit          0      6       11      16      21     23     31

The DFP operand in register FRB[p] is converted and rounded to the form with the same exponent as that of the DFP operand in FRA[p] based on the rounding mode specified in the RMC field. The result of that form is placed in FRT[p]. The sign of the result is the same as the sign of the operand in FRB[p]. The ideal exponent is the exponent specified in FRA[p].

When the value of the operand in FRB[p] is greater than (10^p - 1) × 10^Ea, where p is the format precision and Ea is the exponent of the operand in FRA[p], an invalid operation exception is recognized.

When the delivered result differs in value from the operand in FRB[p], an inexact exception is recognized. No underflow exception is recognized by this operation, regardless of the value of the operand in FRB[p].

Figure 87 and Figure 88 summarize the actions. The tables do not include the setting of the FPSCR_FPRF field. The FPSCR_FPRF field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged.

Special Registers Altered:
    FPRF FR FI
    FX XX
    VXSNAN VXCVI
    CR1 (if Rc=1)

Programming Note
DFP Quantize can be used to adjust one DFP value (FRB[p]) to a form having the same exponent as a second DFP value (FRA[p]). If the adjustment requires the significand to be shifted left, then:
- if the result would cause overflow from the most significant digit, the result is a default QNaN;
- otherwise the result is the adjusted value (left shifted with matching exponent).
If the adjustment requires the significand to be shifted right, the result is rounded based on the value of the RMC field. Figure 86 shows examples of these adjustments.

FRA                  FRB                          FRT when RMC=1               FRT when RMC=2
1 (1 × 10^0)         9. (9 × 10^0)                9 (9 × 10^0)                 9 (9 × 10^0)
1.00 (100 × 10^-2)   9. (9 × 10^0)                9.00 (900 × 10^-2)           9.00 (900 × 10^-2)
1 (1 × 10^0)         49.1234 (491234 × 10^-4)     49 (49 × 10^0)               49 (49 × 10^0)
1.00 (100 × 10^-2)   49.1234 (491234 × 10^-4)     49.12 (4912 × 10^-2)         49.12 (4912 × 10^-2)
1 (1 × 10^0)         49.9876 (499876 × 10^-4)     49 (49 × 10^0)               50 (50 × 10^0)
1.00 (100 × 10^-2)   49.9876 (499876 × 10^-4)     49.98 (4998 × 10^-2)         49.99 (4999 × 10^-2)
0.01 (1 × 10^-2)     49.9876 (499876 × 10^-4)     49.98 (4998 × 10^-2)         49.99 (4999 × 10^-2)
1 (1 × 10^0)         9999999999999999             9999999999999999             9999999999999999
                     (9999999999999999 × 10^0)    (9999999999999999 × 10^0)    (9999999999999999 × 10^0)
1.0 (10 × 10^-1)     9999999999999999             QNaN                         QNaN
                     (9999999999999999 × 10^0)

Figure 86. DFP Quantize examples

Operand a     Actions for Quantize when operand b in FRB[p] is
in FRA[p] is  0                 Fn                ∞                 QNaN    SNaN
0             *                 *                 VXCVI: T(dNaN)    P(b)    VXSNAN: U(b)
Fn            *                 *                 VXCVI: T(dNaN)    P(b)    VXSNAN: U(b)
∞             VXCVI: T(dNaN)    VXCVI: T(dNaN)    T(dINF)           P(b)    VXSNAN: U(b)
QNaN          P(a)              P(a)              P(a)              P(a)    VXSNAN: U(b)
SNaN          VXSNAN: U(a)      VXSNAN: U(a)      VXSNAN: U(a)      VXSNAN: U(a)  VXSNAN: U(a)

Explanation:
*         See next table.
dINF      Default infinity.
dNaN      Default quiet NaN.
Fn        Finite nonzero numbers (includes both subnormal and normal numbers).
P(x)      The QNaN of operand x is propagated and placed in FRT[p].
T(x)      The value x is placed in FRT[p].
U(x)      The SNaN of operand x is converted to the corresponding QNaN and placed in FRT[p].
VXCVI     The Invalid-Operation Exception (VXCVI) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 for actions.)
VXSNAN    The Invalid-Operation Exception (VXSNAN) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 for actions.)

Figure 87. Actions (part 1): Quantize

                                         Actions for Quantize when operand b in FRB[p] is
                                         0        Fn
Te < Se, Vb > (10^p - 1) × 10^Te         E(0)     VXCVI: T(dNaN)
Te < Se, Vb ≤ (10^p - 1) × 10^Te         E(0)     L(b)
Te = Se                                  E(0)     W(b)
Te > Se                                  E(0)     QR(b)

Explanation:
dNaN      Default quiet NaN.
E(0)      The value of zero with the exponent value Te is placed in FRT[p].
L(x)      The operand x is converted to the form with the exponent value Te.
p         The precision of the format.
QR(x)     The operand x is rounded to the result of the form with the exponent value Te based on the specified rounding mode. The result of that form is placed in FRT[p].
Se        The exponent of the operand in FRB[p].
Te        The target exponent: the exponent of the operand in FRA[p] for dqua[q], or TE, a 5-bit signed binary integer, for dquai[q].
T(x)      The value x is placed in FRT[p].
Vb        The value of the operand in FRB[p].
W(x)      The value and the form of operand x are placed in FRT[p].
VXCVI     The Invalid-Operation Exception (VXCVI) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 for actions.)

Figure 88. Actions (part 2): Quantize


DFP Reround [Quad] Z23-form

drrnd    FRT,FRA,FRB,RMC       (Rc=0)
drrnd.   FRT,FRA,FRB,RMC       (Rc=1)

drrnd[.]     59     FRT     FRA    FRB     RMC    35     Rc
bit          0      6       11     16      21     23     31

drrndq   FRTp,FRA,FRBp,RMC     (Rc=0)
drrndq.  FRTp,FRA,FRBp,RMC     (Rc=1)

drrndq[.]    63     FRTp    FRA    FRBp    RMC    35     Rc
bit          0      6       11     16      21     23     31

Let k be the contents of bits 58:63 of FRA that specifies the reference significance.

When the DFP operand in FRB[p] is a finite number, and if the reference significance is zero, or if the reference significance is nonzero and the number of significant digits of the source operand is less than or equal to the reference significance, then the value and the form of the source operand are placed in FRT[p]. If the reference significance is nonzero and the number of significant digits of the source operand is greater than the reference significance, then the source operand is converted and rounded to the number of significant digits specified in the reference significance based on the rounding mode specified in the RMC field. The result of the form with the specified number of significant digits is placed in FRT[p]. The sign of the result is the same as the sign of the operand in FRB[p].

For this instruction, the number of significant digits of the value 0 is considered to be zero. The ideal exponent is the greater value of the exponent of the operand in FRB[p] and the referenced exponent. The referenced exponent is the resultant exponent if the operand in FRB[p] would have been converted and rounded to the number of significant digits specified in the reference significance based on the rounding mode specified in the RMC field.

If the exponent of the rounded result of the form that has the specified number of significant digits would be greater than Xmax, an invalid operation exception (VXCVI) occurs. When the invalid-operation exception occurs, and if the exception is disabled, a default QNaN is returned. When an invalid-operation exception occurs, no inexact exception is recognized.

In the absence of an invalid-operation exception, if the result differs in value from the operand in FRB[p], an inexact exception is recognized.

This operation causes neither an overflow nor an underflow exception.

Figure 90 summarizes the actions for Reround. The table does not include the setting of the FPSCR_FPRF field. The FPSCR_FPRF field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged.

Special Registers Altered:
    FPRF FR FI
    FX XX
    VXSNAN VXCVI
    CR1 (if Rc=1)

Programming Note
DFP Reround can be used to adjust a DFP value (FRB[p]) to have no more than a specified number (bits 58:63 of FRA) of significant digits. The result (FRT[p]) is right-justified leaving the specified number of digits and rounded as specified by the RMC field. If rounding increases the number of significant digits, the result is adjusted again (the significand is shifted right 1 digit and the exponent is incremented by 1). Figure 89 has example results from DFP Reround for 1, 2, and 10 significant digits.

Programming Note
DFP Reround is primarily used to round a DFP value to a specific number of digits before conversion to string format for printing or display. Another use for DFP Reround is to obtain the effective exponent of the most significant digit by specifying a reference significance of 1. The exponent can be extracted and used to compute the number of significant digits or to left-justify a value.

For example, the following sequence computes the number of significant digits and returns it as an integer. FRB is the DFP value for which we want the number of significant digits; f13 contains the reference significance value 0x0000000000000001; and r1 is the stack pointer, with free space for doublewords at offsets -8 and -16. These doublewords are used to transfer the biased exponents from the FPRs to GPRs for integer computation. R3 contains the result of E(reround(1,FRB)) - E(FRB) + 1, where E(x) represents the biased exponent of x.

    dxex   f0,FRB
    stfd   f0,-16(r1)
    drrnd  f1,f13,FRB,1      # reround 1 digit toward 0
    dxex   f1,f1
    stfd   f1,-8(r1)
    ld     r11,-16(r1)       # biased exponent of FRB
    ld     r3,-8(r1)         # biased exponent of the rerounded value
    subf   r3,r11,r3
    addi   r3,r3,1

Given the value 412.34 the result is E(4 × 10^2) - E(41234 × 10^-2) + 1 = (398+2) - (398-2) + 1 = 400 - 396 + 1 = 5. Additional code is required to detect and handle special values like Subnormal, Infinity, and NaN.

FRA_58:63
(binary)   FRB                          FRT when RMC=1               FRT when RMC=2
1          0.41234 (41234 × 10^-5)      0.4 (4 × 10^-1)              0.4 (4 × 10^-1)
1          4.1234 (41234 × 10^-4)       4 (4 × 10^0)                 4 (4 × 10^0)
1          41.234 (41234 × 10^-3)       4 (4 × 10^1)                 4 (4 × 10^1)
1          412.34 (41234 × 10^-2)       4 (4 × 10^2)                 4 (4 × 10^2)
2          0.491234 (491234 × 10^-6)    0.49 (49 × 10^-2)            0.49 (49 × 10^-2)
2          0.499876 (499876 × 10^-6)    0.49 (49 × 10^-2)            0.50 (50 × 10^-2)
2          0.999876 (999876 × 10^-6)    0.99 (99 × 10^-2)            1.0 (10 × 10^-1)
10         0.491234 (491234 × 10^-6)    0.491234 (491234 × 10^-6)    0.491234 (491234 × 10^-6)
10         999.999 (999999 × 10^-3)     999.999 (999999 × 10^-3)     999.999 (999999 × 10^-3)
10         9999999999999999             9.999999999E+14              1.000000000E+15
           (9999999999999999 × 10^0)    (9999999999 × 10^5)          (1000000000 × 10^6)

Figure 89. DFP Reround examples

Programming Note
DFP Reround combined with DFP Quantize can be used to left justify a value (as needed by the frexp function). FRB is the DFP value that we want to left justify; f13 contains the reference significance value 0x0000000000000001; and r1 is the stack pointer, with free space for a doubleword at offset -8. This doubleword is used to transfer the biased exponent from the FPR to a GPR for integer computation. The adjusted biased exponent (biased exponent - (format precision - 1)) is transferred back into an FPR so it can be inserted into the rerounded value. The adjusted rerounded value becomes the quantize reference value. The quantize instruction returns the left justified result in FRT.

    drrnd  f1,f13,FRB,1      # reround 1 digit toward 0
    dxex   f0,f1
    stfd   f0,-8(r1)
    ld     r11,-8(r1)
    addi   r11,r11,-15       # biased exp - (precision - 1)
    std    r11,-8(r1)
    lfd    f0,-8(r1)
    diex   f1,f0,f1          # adjust exponent
    dqua   FRT,f1,FRB,1      # quantize to adjusted exponent
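The Quantize Immediate and Reround behaviour described in the programming notes can be tried out in software. The sketch below is a model under stated assumptions, not the instructions themselves: it uses Python's decimal module, a 16-digit untrapped context as a stand-in for the DFP long format with the invalid-operation exception disabled, and hypothetical helper names (quantize_imm, reround). Only finite operands are handled, and only the two rounding modes named in the notes are mapped (RMC=1 toward 0, RMC=2 to nearest with ties away from 0).

```python
from decimal import Decimal, Context, ROUND_DOWN, ROUND_HALF_UP

# Assumption: 16 digits, no traps, so an invalid operation yields a quiet NaN.
DFP_LONG = Context(prec=16, traps=[])

# RMC=1: round toward 0; RMC=2: round to nearest, ties away from 0.
RMC = {1: ROUND_DOWN, 2: ROUND_HALF_UP}


def quantize_imm(te: int, b: Decimal, rmc: int) -> Decimal:
    """Give b the exponent te, rounding if digits are dropped (dquai-like)."""
    reference = Decimal((0, (1,), te))            # 1 x 10^te
    return b.quantize(reference, rounding=RMC[rmc], context=DFP_LONG)


def reround(k: int, b: Decimal, rmc: int) -> Decimal:
    """Keep at most k significant digits of a finite b (drrnd-like)."""
    if not b.is_finite():
        raise ValueError("this sketch covers finite operands only")
    _, digits, exponent = b.as_tuple()
    m = 0 if b.is_zero() else len(digits)         # zero has 0 significant digits
    if k == 0 or m <= k:
        return b                                  # value and form unchanged
    target = Decimal((0, (1,), exponent + m - k))
    result = b.quantize(target, rounding=RMC[rmc], context=DFP_LONG)
    if len(result.as_tuple().digits) > k:         # carry-out, e.g. 99 -> 100
        bumped = Decimal((0, (1,), result.as_tuple().exponent + 1))
        result = result.quantize(bumped, context=DFP_LONG)
    return result


# Sales tax from the Quantize Immediate note: 39.95 * 0.075, rounded to cents.
tax = quantize_imm(-2, DFP_LONG.multiply(Decimal("39.95"), Decimal("0.075")), 2)
print(tax)
print(reround(1, Decimal("412.34"), 1), reround(2, Decimal("0.999876"), 2))
```

The second quantize call in reround corresponds to the adjustment described in the first Reround note: when rounding adds a digit, the significand is shifted right one digit and the exponent is incremented.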
Decimal Floating-Point [Category: Decimal Floating-Point] 177 Version 2.05 Actions for Reround when operand b in FRB[p] is 0* Fn QNaN SNaN - RR(b) or T(dINF) P(b) VXSNAN: U(b) k 120, k < m VXCVI: T(dNaN) k 120, k = m - W(b) T(dINF) P(b) VXSNAN: U(b) k 120 and k > m, W(b) W(b) T(dINF) P(b) VXSNAN: U(b) or k = 0 Explanation: * The number of significant digits of the value 0 is considered to be zero for this instruction. - Not applicable. dINF Default infinity. Fn Finite nonzero numbers (includes both subnormal and normal numbers). k Reference significance, which specifies the number of significant digits in the target operand. m Number of significant digits in the operand in FRB[p]. P(x) The QNaN of operand x is propagated and placed in FRT[p]. RR(x) The value x is rounded to the form that has the specified number of significant digits. If RR(x) 5 (10k-1) 2 10Xmax, then RR(x) is returned; otherwise an invalid-operation excep- tion is recognized. T(x) The value x is placed in FRT[p]. U(x) The SNaN of operand x is converted to the corresponding QNaN and placed in FRT[p]. VXCVI The Invalid-Operation Exception (VXCVI) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 for actions.) VXSNAN: The Invalid-Operation Exception (VXSNAN) occurs. The result is produced only when the exception is disabled. See Section 5.5.10.1 for actions. W(x) The value and the form of x is placed in FRT[p]. Figure 90. Actions: Reround 178 Power ISATM I - III Version 2.05 DFP Round To FP Integer With Inexact Programming Note [Quad] Z23-form The DFP Round To FP Integer With Inexact and drintx R,FRT,FRB,RMC (Rc=0) DFP Round To FP Integer With Inexact Quad drintx. R,FRT,FRB,RMC (Rc=1) instructions can be used to implement the decimal equivalent of the C99 rint function by specifying the 59 FRT /// R FRB RMC 99 Rc primary RMC encoding for round according to FPSCRDRN (R=0, RMC=11). 
0 6 11 15 16 21 23 31

The specification for rint requires the inexact exception be raised if detected.

drintxq R,FRTp,FRBp,RMC (Rc=0)
drintxq. R,FRTp,FRBp,RMC (Rc=1)

63 | FRTp | /// | R | FRBp | RMC | 99 | Rc
0 | 6 | 11 | 15 | 16 | 21 | 23 | 31

The DFP operand in FRB[p] is rounded to a floating-point integer and placed into FRT[p]. The sign of the result is the same as the sign of the operand in FRB[p]. The ideal exponent is the larger value of zero and the exponent of the operand in FRB[p].

The rounding mode used is specified in the RMC field. When the RMC-encoding-selection (R) bit is zero, the RMC field contains the primary encoding; when the bit is one, the field contains the secondary encoding.

In addition to coercion of the converted value to fit the target format, the special rounding used by Round To FP Integer also coerces the target exponent to the ideal exponent.

When the operand in FRB[p] is a finite number and the exponent is less than zero, the operand is rounded to the result with an exponent of zero. When the exponent is greater than or equal to zero, the result is set to the numerical value and the form of the operand in FRB[p].

When the result differs in value from the operand in FRB[p], an inexact exception is recognized. No underflow exception is recognized by this operation, regardless of the value of the operand in FRB[p].

Figure 91 summarizes the actions for Round To FP Integer With Inexact. The table does not include the setting of the FPSCR[FPRF] field. The FPSCR[FPRF] field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged.

Special Registers Altered:
FPRF FR FI FX XX VXSNAN CR1 (if Rc=1)

Chapter 5. Decimal Floating-Point [Category: Decimal Floating-Point] 179 Version 2.05

Operand b in FRB[p] is | Is n not precise (n ≠ b) | Inv.-Op. Exception Enabled | Inexact Exception Enabled | Is n Incremented (|n| > |b|) | Actions*
-∞ | No¹ | - | - | - | T(-dINF), FI ← 0, FR ← 0
F | No | - | - | - | W(n), FI ← 0, FR ← 0
F | Yes | - | No | No | W(n), FI ← 1, FR ← 0, XX ← 1
F | Yes | - | No | Yes | W(n), FI ← 1, FR ← 1, XX ← 1
F | Yes | - | Yes | No | W(n), FI ← 1, FR ← 0, XX ← 1, TX
F | Yes | - | Yes | Yes | W(n), FI ← 1, FR ← 1, XX ← 1, TX
+∞ | No¹ | - | - | - | T(+dINF), FI ← 0, FR ← 0
QNaN | No¹ | - | - | - | P(b), FI ← 0, FR ← 0
SNaN | No¹ | No | - | - | U(b), FI ← 0, FR ← 0, VXSNAN ← 1
SNaN | No¹ | Yes | - | - | VXSNAN ← 1, TV

Explanation:
*  Setting of XX and VXSNAN is part of the corresponding exception actions. Also, when an invalid-operation exception occurs, setting of FI and FR is part of the exception actions. (See the sections "Inexact Exception" and "Invalid Operation Exception" for more details.)
-  The actions do not depend on this condition.
¹  This condition is true by virtue of the state of some condition to the left of this column.
dINF  Default infinity.
F  All finite numbers, including zeros.
FI  Floating-Point-Fraction-Inexact status flag, FPSCR[FI].
FR  Floating-Point-Fraction-Rounded status flag, FPSCR[FR].
n  The value derived when the source operand, b, is rounded to an integer using the special rounding for Round To FP Integer.
P(x)  The QNaN of operand x is propagated and placed in FRT[p].
T(x)  The value x is placed in FRT[p].
TV  The system floating-point enabled exception error handler is invoked for the invalid-operation exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode.
TX  The system floating-point enabled exception error handler is invoked for the inexact exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode.
U(x)  The SNaN of operand x is converted to the corresponding QNaN and placed in FRT[p].
W(x)  The value x in the form of zero exponent or the source exponent is placed in FRT[p].
XX  Floating-Point-Inexact-Exception status flag, FPSCR[XX].

Figure 91. Actions: Round to FP Integer With Inexact

DFP Round To FP Integer Without Inexact [Quad] Z23-form

drintn R,FRT,FRB,RMC (Rc=0)
drintn. R,FRT,FRB,RMC (Rc=1)

59 | FRT | /// | R | FRB | RMC | 227 | Rc
0 | 6 | 11 | 15 | 16 | 21 | 23 | 31

drintnq R,FRTp,FRBp,RMC (Rc=0)
drintnq. R,FRTp,FRBp,RMC (Rc=1)

63 | FRTp | /// | R | FRBp | RMC | 227 | Rc
0 | 6 | 11 | 15 | 16 | 21 | 23 | 31

This operation is the same as the Round To FP Integer With Inexact operation, except that this operation does not recognize an inexact exception.

Figure 92 summarizes the actions for Round To FP Integer Without Inexact. The table does not include the setting of the FPSCR[FPRF] field. The FPSCR[FPRF] field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged.

Special Registers Altered:
FPRF FR (set to 0) FI (set to 0) FX VXSNAN CR1 (if Rc=1)

Programming Note
The DFP Round To FP Integer Without Inexact and DFP Round To FP Integer Without Inexact Quad instructions can be used to implement decimal equivalents of several C99 rounding functions by specifying the appropriate R and RMC field values.

Function | R | RMC
Ceil | 1 | 0b00
Floor | 1 | 0b01
Nearbyint | 0 | 0b11
Round | 0 | 0b10
Trunc | 0 | 0b01

Note that nearbyint is similar to the rint function but without raising the inexact exception. Similarly, ceil, floor, round, and trunc do not require the inexact exception.

Operand b in FRB[p] is | Inv.-Op. Exception Enabled | Actions*
-∞ | - | T(-dINF), FI ← 0, FR ← 0
F | - | W(n), FI ← 0, FR ← 0
+∞ | - | T(+dINF), FI ← 0, FR ← 0
QNaN | - | P(b), FI ← 0, FR ← 0
SNaN | No | U(b), FI ← 0, FR ← 0, VXSNAN ← 1
SNaN | Yes | VXSNAN ← 1, TV

Explanation:
*  Setting of VXSNAN is part of the corresponding exception actions. Also, when an invalid-operation exception occurs, setting of FI and FR bits is part of the exception actions. (See the section "Invalid Operation Exception" for more details.)
-  The actions do not depend on this condition.
dINF  Default infinity.
F  All finite numbers, including zeros.
FI  Floating-Point-Fraction-Inexact status flag, FPSCR[FI].
FR  Floating-Point-Fraction-Rounded status flag, FPSCR[FR].
n  The value derived when the source operand, b, is rounded to an integer using the special rounding for Round To FP Integer.
P(x)  The QNaN of operand x is propagated and placed in FRT[p].
T(x)  The value x is placed in FRT[p].
TV  The system floating-point enabled exception error handler is invoked for the invalid-operation exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode.
U(x)  The SNaN of operand x is converted to the corresponding QNaN and placed in FRT[p].
W(x)  The value x in the form of zero exponent or the source exponent is placed in FRT[p].

Figure 92. Actions: Round to FP Integer Without Inexact

5.6.5 DFP Conversion Instructions

The DFP conversion instructions consist of data-format conversion instructions and data-type conversion instructions. They are all X-form instructions and employ the record bit (Rc).

5.6.5.1 DFP Data-Format Conversion Instructions

The data-format conversion instructions consist of Convert To DFP Long, Convert To DFP Extended, Round To DFP Short, and Round To DFP Long. Figure 93 summarizes the actions for these instructions.

Programming Note
DFP does not provide operations on short operands, so they must be converted to long format, and then converted back to be stored. Preserving correct signaling NaN semantics requires that signaling NaNs be propagated from the source to the result without recognizing an exception during widening from short to long or narrowing from long to short. Because DFP does not provide equivalents to the BFP Load Floating-Point Single and Store Floating-Point Single functions, the widening is performed by loading the DFP short value with a Load Floating as Integer Word Indexed followed by a DFP Convert to DFP Long, and narrowing is performed by a DFP Round to DFP Short followed by a Store Floating-Point as Integer Word Indexed.

If the SNaN or infinity in DFP short format uses the preferred DPD encoding, then converting this operand to DFP long format and back to DFP short will result in the original bit pattern.

Actions when operand b in FRB[p] is:
Instruction | F | ∞ | QNaN | SNaN
Convert To DFP Long | T(b)¹ | P(b)²,⁴ | P(b)²,⁴ | P(b)³,⁴
Convert To DFP Extended | T(b)¹ | T(dINF) | P(b)²,⁴ | VXSNAN: U(b)²,⁴
Round To DFP Short | R(b)¹ | P(b)²,⁵ | P(b)²,⁵ | P(b)³,⁵
Round To DFP Long | R(b)¹ | T(dINF) | P(b)²,⁵ | VXSNAN: U(b)²,⁵

Explanation:
¹  The ideal exponent is the exponent of the source operand.
²  Bits 5:N-1 of the N-bit combination field are set to zero.
³  Bit 5 of the N-bit combination field is set to one. Bits 6:N-1 of the combination field are set to zero.
⁴  The trailing significand field is padded on the left with zeros.
⁵  Leftmost digits in the trailing significand field are removed.
dINF  Default infinity.
F  All finite numbers, including zeros.
P(x)  The special symbol in operand x is propagated into FRT[p].
R(x)  The value x is rounded to the target-format precision; see Section 5.5.11.
T(x)  The value x is placed in FRT[p].
U(x)  The SNaN of operand x is converted to the corresponding QNaN.
VXSNAN  The Invalid-Operation Exception (VXSNAN) occurs. The result is produced only when the exception is disabled. See Section 5.5.10.1 for actions.

Figure 93. Actions: Data-Format Conversion Instructions

DFP Convert To DFP Long X-form

dctdp FRT,FRB (Rc=0)
dctdp. FRT,FRB (Rc=1)

59 | FRT | /// | FRB | 258 | Rc
0 | 6 | 11 | 16 | 21 | 31

The DFP short operand in bits 32-63 of FRB is converted to DFP long format and the converted result is placed into FRT. The sign of the result is the same as the sign of the source operand. The ideal exponent is the exponent of the source operand.

If the operand in FRB is an SNaN, it is converted to an SNaN in DFP long format and does not cause an invalid-operation exception.

Special Registers Altered:
FPRF FR (undefined) FI (undefined) CR1 (if Rc=1)

Programming Note
Note that DFP short format is a storage-only format. Therefore, conversion of a short SNaN to long format will not cause an exception and the SNaN is preserved. A subsequent operation on that SNaN in long format will cause an exception.

DFP Convert To DFP Extended X-form

dctqpq FRTp,FRB (Rc=0)
dctqpq. FRTp,FRB (Rc=1)

63 | FRTp | /// | FRB | 258 | Rc
0 | 6 | 11 | 16 | 21 | 31

The DFP long operand in FRB is converted to DFP extended format and placed into FRTp. The sign of the result is the same as the sign of the operand in FRB. The ideal exponent is the exponent of the operand in FRB.

If the operand in FRB is an SNaN, an invalid-operation exception is recognized. If the exception is disabled, the SNaN is converted to the corresponding QNaN in DFP extended format.

Special Registers Altered:
FPRF FR (set to 0) FI (set to 0) FX VXSNAN CR1 (if Rc=1)

DFP Round To DFP Short X-form

drsp FRT,FRB (Rc=0)
drsp. FRT,FRB (Rc=1)

59 | FRT | /// | FRB | 770 | Rc
0 | 6 | 11 | 16 | 21 | 31

The DFP long operand in FRB is converted and rounded to DFP short format. The DFP short value is extended on the left with zeros to form a 64-bit entity and placed into FRT. The sign of the result is the same as the sign of the source operand. The ideal exponent is the exponent of the source operand.

If the operand in FRB is an SNaN, it is converted to an SNaN in DFP short format and does not cause an invalid-operation exception.

Normally, the result is in the format and length of the target. However, when an overflow or underflow exception occurs and the exception is enabled, the operation is completed by producing a wrapped rounded result in the same format and length as the source but rounded to the target-format precision.

Special Registers Altered:
FPRF FR FI FX OX UX XX CR1 (if Rc=1)

Programming Note
Note that DFP short format is a storage-only format. Therefore, conversion of a long SNaN to short format will not cause an exception. Converting a long format SNaN to short format is an implied move operation.

DFP Round To DFP Long X-form

drdpq FRTp,FRBp (Rc=0)
drdpq. FRTp,FRBp (Rc=1)

63 | FRTp | /// | FRBp | 770 | Rc
0 | 6 | 11 | 16 | 21 | 31

The DFP extended operand in FRBp is converted and rounded to DFP long format. The result concatenated with 64 0s is placed in FRTp. The sign of the result is the same as the sign of the source operand. The ideal exponent is the exponent of the operand in FRBp.

If the operand in FRBp is an SNaN, an invalid-operation exception is recognized. If the exception is disabled, the SNaN is converted to the corresponding QNaN in DFP long format.

Normally, the result is in the format and length of the target. However, when an overflow or underflow exception occurs and the exception is enabled, the operation is completed by producing a wrapped rounded result in the same format and length as the source but rounded to the target-format precision.

Special Registers Altered:
FPRF FR FI FX OX UX XX VXSNAN CR1 (if Rc=1)

Programming Note
Note that DFP Round to DFP Long, while producing a result in DFP long format, actually targets a register pair, writing 64 0s in FRTp+1.

5.6.5.2 DFP Data-Type Conversion Instructions

The DFP data-type conversion instructions are used to convert data type between DFP and fixed. The data-type conversion instructions consist of Convert From Fixed and Convert To Fixed.
DFP Convert From Fixed Quad X-form

dcffixq FRTp,FRB (Rc=0)
dcffixq. FRTp,FRB (Rc=1)

63 | FRTp | /// | FRB | 802 | Rc
0 | 6 | 11 | 16 | 21 | 31

The 64-bit signed binary integer in FRB is converted and rounded to a DFP Extended value and placed into FRTp. The sign of the result is the same as the sign of the source operand. The ideal exponent is zero.

If the source operand is a zero, then a plus zero with a zero exponent is returned.

The following table summarizes the actions for Convert From Fixed. The table does not include the setting of the FPSCR[FPRF] field. The FPSCR[FPRF] field is always set to the class and sign of the result.

Special Registers Altered:
FPRF FR (undefined) FI (undefined) CR1 (if Rc=1)

DFP Convert To Fixed [Quad] X-form

dctfix FRT,FRB (Rc=0)
dctfix. FRT,FRB (Rc=1)

59 | FRT | /// | FRB | 290 | Rc
0 | 6 | 11 | 16 | 21 | 31

dctfixq FRT,FRBp (Rc=0)
dctfixq. FRT,FRBp (Rc=1)

63 | FRT | /// | FRBp | 290 | Rc
0 | 6 | 11 | 16 | 21 | 31

The DFP operand in FRB[p] is rounded to an integer value and is placed into FRT in the 64-bit signed binary integer format. The sign of the result is the same as the sign of the source operand, except when the source operand is a NaN or a zero.

Figure 94 summarizes the actions for Convert To Fixed.

Special Registers Altered:
FPRF (undefined) FR FI FX XX VXSNAN VXCVI CR1 (if Rc=1)

Programming Note
It is recommended that software pre-round the operand to a floating-point integral value using drintx[q] or drintn[q] if a rounding mode other than the current rounding mode specified by FPSCR[DRN] is needed. Saving, modifying, and restoring the FPSCR just to temporarily change the rounding mode is less efficient than employing drintx[q] or drintn[q], which override the current rounding mode using an immediate control field.

For example, if the desired rounding is Round to Nearest, Ties away from 0, but the default rounding (from FPSCR[DRN]) is Round to Nearest, Ties to Even, then the following is preferred.

    drintn 0,f1,f1,2
    dctfix f1,f1
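The pre-round-then-convert pattern in the Programming Note can be illustrated outside the processor. The following is a minimal sketch in Python, assuming the standard `decimal` module as a stand-in for DFP arithmetic; the function name `round_then_convert` is hypothetical, exception flags (FI, FR, XX, VXCVI) are not modeled, and the saturating results for NaN, infinity, and out-of-range values follow the disabled-exception rows of Figure 94.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

MN = -(2**63)      # most negative 64-bit signed binary integer
MP = 2**63 - 1     # most positive 64-bit signed binary integer


def round_then_convert(b: Decimal, rounding: str) -> int:
    """Sketch of 'drintn' with an explicit rounding mode followed by 'dctfix'.

    Only the delivered integer is modeled; status flags are ignored.
    """
    if b.is_nan():
        # Figure 94: a NaN operand delivers MN when invalid-operation is disabled.
        return MN
    if b.is_infinite():
        return MN if b < 0 else MP
    # Explicit rounding mode, independent of any ambient context setting.
    n = int(b.to_integral_value(rounding=rounding))
    # Results outside the 64-bit range saturate to MN or MP.
    return max(MN, min(MP, n))


# ROUND_HALF_UP in the decimal module rounds ties away from zero.
print(round_then_convert(Decimal("2.5"), ROUND_HALF_UP))
print(round_then_convert(Decimal("2.5"), ROUND_HALF_EVEN))
```

Passing the rounding mode as an argument mirrors the immediate RMC field: the caller never has to save, modify, and restore a global rounding setting.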
Operand b in FRB[p] is | q is | Is n not precise (n ≠ b) | Inv.-Op. Except. Enabled | Inexact Except. Enabled | Is n Incremented (|n| > |b|) | Actions*
-∞ ≤ b < MN | < MN | - | No | - | - | T(MN), FI ← 0, FR ← 0, VXCVI ← 1
-∞ ≤ b < MN | < MN | - | Yes | - | - | VXCVI ← 1, TV
-∞ < b < MN | = MN | - | - | No | - | T(MN), FI ← 1, FR ← 0, XX ← 1
-∞ < b < MN | = MN | - | - | Yes | - | T(MN), FI ← 1, FR ← 0, XX ← 1, TX
MN ≤ b < 0 | - | No | - | - | - | T(n), FI ← 0, FR ← 0
MN ≤ b < 0 | - | Yes | - | No | No | T(n), FI ← 1, FR ← 0, XX ← 1
MN ≤ b < 0 | - | Yes | - | No | Yes | T(n), FI ← 1, FR ← 1, XX ← 1
MN ≤ b < 0 | - | Yes | - | Yes | No | T(n), FI ← 1, FR ← 0, XX ← 1, TX
MN ≤ b < 0 | - | Yes | - | Yes | Yes | T(n), FI ← 1, FR ← 1, XX ← 1, TX
±0 | - | No | - | - | - | T(0), FI ← 0, FR ← 0
0 < b ≤ MP | - | No | - | - | - | T(n), FI ← 0, FR ← 0
0 < b ≤ MP | - | Yes | - | No | No | T(n), FI ← 1, FR ← 0, XX ← 1
0 < b ≤ MP | - | Yes | - | No | Yes | T(n), FI ← 1, FR ← 1, XX ← 1
0 < b ≤ MP | - | Yes | - | Yes | No | T(n), FI ← 1, FR ← 0, XX ← 1, TX
0 < b ≤ MP | - | Yes | - | Yes | Yes | T(n), FI ← 1, FR ← 1, XX ← 1, TX
MP < b < +∞ | = MP | - | - | No | - | T(MP), FI ← 1, FR ← 0, XX ← 1
MP < b < +∞ | = MP | - | - | Yes | - | T(MP), FI ← 1, FR ← 0, XX ← 1, TX
MP < b ≤ +∞ | > MP | - | No | - | - | T(MP), FI ← 0, FR ← 0, VXCVI ← 1
MP < b ≤ +∞ | > MP | - | Yes | - | - | VXCVI ← 1, TV
QNaN | - | - | No | - | - | T(MN), FI ← 0, FR ← 0, VXCVI ← 1
QNaN | - | - | Yes | - | - | VXCVI ← 1, TV
SNaN | - | - | No | - | - | T(MN), FI ← 0, FR ← 0, VXCVI ← 1, VXSNAN ← 1
SNaN | - | - | Yes | - | - | VXCVI ← 1, VXSNAN ← 1, TV

Explanation:
*  Setting of XX, VXCVI, and VXSNAN is part of the corresponding exception actions. Also, when an invalid-operation exception occurs, setting of FI and FR bits is part of the exception actions. (See the sections "Inexact Exception" and "Invalid Operation Exception" for more details.)
-  The actions do not depend on this condition.
FI  Floating-Point-Fraction-Inexact status flag, FPSCR[FI].
FR  Floating-Point-Fraction-Rounded status flag, FPSCR[FR].
MN  Maximum negative number representable by the 64-bit binary integer format.
MP  Maximum positive number representable by the 64-bit binary integer format.
n  The value q converted to a fixed-point result.
q  The value derived when the source value b is rounded to an integer using the specified rounding mode.
T(x)  The value x is placed in FRT[p].
TV  The system floating-point enabled exception error handler is invoked for the invalid-operation exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode.
TX  The system floating-point enabled exception error handler is invoked for the inexact exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode.
VXCVI  The FPSCR[VXCVI] invalid operation exception status bit.
VXSNAN  The FPSCR[VXSNAN] invalid operation exception status bit.
XX  Floating-Point-Inexact-Exception status flag, FPSCR[XX].

Figure 94. Actions: Convert To Fixed

5.6.6 DFP Format Instructions

The DFP format instructions are used to compose or decompose a DFP operand. A source operand of SNaN does not cause an invalid-operation exception. All format instructions employ the record bit (Rc).

The format instructions consist of Decode DPD To BCD, Encode BCD To DPD, Extract Biased Exponent, Insert Biased Exponent, Shift Significand Left Immediate, and Shift Significand Right Immediate.

DFP Decode DPD To BCD [Quad] X-form

ddedpd SP,FRT,FRB (Rc=0)
ddedpd. SP,FRT,FRB (Rc=1)

59 | FRT | SP | /// | FRB | 322 | Rc
0 | 6 | 11 | 13 | 16 | 21 | 31

ddedpdq SP,FRTp,FRBp (Rc=0)
ddedpdq. SP,FRTp,FRBp (Rc=1)

63 | FRTp | SP | /// | FRBp | 322 | Rc
0 | 6 | 11 | 13 | 16 | 21 | 31

A portion of the significand of the DFP operand in FRB[p] is converted to a signed or unsigned BCD number depending on the SP field. For infinity and NaN, the significand is considered to be the contents in the trailing significand field padded on the left by a zero digit.

SP[0] = 0 (unsigned conversion)
The rightmost 16 digits of the significand (32 digits for ddedpdq) are converted to an unsigned BCD number and the result is placed into FRT[p].

SP[0] = 1 (signed conversion)
The rightmost 15 digits of the significand (31 digits for ddedpdq) are converted to a signed BCD number with the same sign as the DFP operand, and the result is placed into FRT[p]. If the DFP operand is negative, the sign is encoded as 0b1101. If the DFP operand is positive, SP[1] indicates which preferred plus sign encoding is used. If SP[1] = 0, the plus sign is encoded as 0b1100 (the option-1 preferred sign code); otherwise the plus sign is encoded as 0b1111 (the option-2 preferred sign code).

Special Registers Altered:
CR1 (if Rc=1)

DFP Encode BCD To DPD [Quad] X-form

denbcd S,FRT,FRB (Rc=0)
denbcd. S,FRT,FRB (Rc=1)

59 | FRT | S | /// | FRB | 834 | Rc
0 | 6 | 11 | 12 | 16 | 21 | 31

denbcdq S,FRTp,FRBp (Rc=0)
denbcdq. S,FRTp,FRBp (Rc=1)

63 | FRTp | S | /// | FRBp | 834 | Rc
0 | 6 | 11 | 12 | 16 | 21 | 31

The signed or unsigned BCD operand, depending on the S field, in FRB[p] is converted to a DFP number. The ideal exponent is zero.

S = 0 (unsigned BCD operand)
The unsigned BCD operand in FRB[p] is converted to a positive DFP number of the same magnitude and the result is placed into FRT[p].

S = 1 (signed BCD operand)
The signed BCD operand in FRB[p] is converted to the corresponding DFP number and the result is placed into FRT[p].

If an invalid BCD digit or sign code is detected in the source operand, an invalid-operation exception (VXCVI) occurs.

FPSCR[FPRF] is set to the class and sign of the result, except for Invalid Operation Exception when FPSCR[VE]=1.

Special Registers Altered:
FPRF FR (set to 0) FI (set to 0) FX VXCVI CR1 (if Rc=1)

DFP Extract Biased Exponent [Quad] X-form

dxex FRT,FRB (Rc=0)
dxex. FRT,FRB (Rc=1)

59 | FRT | /// | FRB | 354 | Rc
0 | 6 | 11 | 16 | 21 | 31

dxexq FRT,FRBp (Rc=0)
dxexq. FRT,FRBp (Rc=1)

63 | FRT | /// | FRBp | 354 | Rc
0 | 6 | 11 | 16 | 21 | 31

The biased exponent of the operand in FRB[p] is extracted and placed into FRT in the 64-bit signed binary integer format. When the operand in FRB[p] is an infinity, QNaN, or SNaN, a special code is returned.

Operand | Result
Finite Number | biased exponent value
Infinity | -1
QNaN | -2
SNaN | -3

Special Registers Altered:
CR1 (if Rc=1)

Programming Note
The exponent bias value is 101 for DFP Short, 398 for DFP Long, and 6176 for DFP Extended.

DFP Insert Biased Exponent [Quad] X-form

diex FRT,FRA,FRB (Rc=0)
diex. FRT,FRA,FRB (Rc=1)

59 | FRT | FRA | FRB | 866 | Rc
0 | 6 | 11 | 16 | 21 | 31

diexq FRTp,FRA,FRBp (Rc=0)
diexq. FRTp,FRA,FRBp (Rc=1)

63 | FRTp | FRA | FRBp | 866 | Rc
0 | 6 | 11 | 16 | 21 | 31

Let a be the value of the 64-bit signed binary integer in FRA.

a | Result
a > MBE¹ | QNaN
MBE ≥ a ≥ 0 | Finite number with biased exponent a
a = -1 | Infinity
a = -2 | QNaN
a = -3 | SNaN
a < -3 | QNaN
¹ Maximum biased exponent for the target format.

When 0 ≤ a ≤ MBE, a is the biased target exponent that is combined with the sign bit and the significand value of the DFP operand in FRB[p] to form the DFP result in FRT[p]. The ideal exponent is the specified target exponent.

When a specifies a special code (a < 0 or a > MBE), an infinity, QNaN, or SNaN is formed in FRT[p] with the trailing significand field containing the value from the trailing significand field of the source operand in FRB[p], and with an N-bit combination field set as follows.

- For an Infinity result,
  - the leftmost 5 bits are set to 0b11110, and
  - the rightmost N-5 bits are set to zero.
- For a QNaN result,
  - the leftmost 5 bits are set to 0b11111,
  - bit 5 is set to zero, and
  - the rightmost N-6 bits are set to zero.
- For an SNaN result,
  - the leftmost 5 bits are set to 0b11111,
  - bit 5 is set to one, and
  - the rightmost N-6 bits are set to zero.

Special Registers Altered:
CR1 (if Rc=1)

Programming Note
The exponent bias value is 101 for DFP Short, 398 for DFP Long, and 6176 for DFP Extended.
Actions for Insert Biased Exponent when operand b in FRB[p] specifies:
Operand a in FRA[p] specifies | F | ∞ | QNaN | SNaN
F | N, Rb | Z, Rb | Z, Rb | Z, Rb
∞ | I, Rb | I, Rb | I, Rb | I, Rb
QNaN | Q, Rb | Q, Rb | Q, Rb | Q, Rb
SNaN | S, Rb | S, Rb | S, Rb | S, Rb

Explanation:
F  All finite numbers, including zeros.
I  The combination field in FRT[p] is set to indicate a default Infinity.
N  The combination field in FRT[p] is set to the specified biased exponent in FRA and the leftmost significand digit in FRB[p].
Q  The combination field in FRT[p] is set to indicate a default QNaN.
S  The combination field in FRT[p] is set to indicate a default SNaN.
Z  The combination field in FRT[p] is set to indicate the specific biased exponent in FRA and a leftmost coefficient digit of zero.
Rb  The contents of the trailing significand field in FRB[p] are reencoded using preferred DPD encodings and the reencoded result is placed in the same field in FRT[p]. The sign bit of FRB[p] is copied into the sign bit in FRT[p].

Figure 95. Actions: Insert Biased Exponent

DFP Shift Significand Left Immediate [Quad] Z22-form

dscli FRT,FRA,SH (Rc=0)
dscli. FRT,FRA,SH (Rc=1)

59 | FRT | FRA | SH | 66 | Rc
0 | 6 | 11 | 16 | 22 | 31

dscliq FRTp,FRAp,SH (Rc=0)
dscliq. FRTp,FRAp,SH (Rc=1)

63 | FRTp | FRAp | SH | 66 | Rc
0 | 6 | 11 | 16 | 22 | 31

The significand of the DFP operand in FRA[p] is shifted left SH digits. For a NaN or infinity, all significand digits are in the trailing significand field. SH is a 6-bit unsigned binary integer. Digits shifted out of the leftmost digit are lost. Zeros are supplied to the vacated positions on the right. The result is placed into FRT[p]. The sign of the result is the same as the sign of the source operand in FRA[p].

DFP Shift Significand Right Immediate [Quad] Z22-form

dscri FRT,FRA,SH (Rc=0)
dscri. FRT,FRA,SH (Rc=1)

59 | FRT | FRA | SH | 98 | Rc
0 | 6 | 11 | 16 | 22 | 31

dscriq FRTp,FRAp,SH (Rc=0)
dscriq. FRTp,FRAp,SH (Rc=1)

63 | FRTp | FRAp | SH | 98 | Rc
0 | 6 | 11 | 16 | 22 | 31

The significand of the DFP operand in FRA[p] is shifted right SH digits. For a NaN or infinity, all significand digits are in the trailing significand field. SH is a 6-bit unsigned binary integer. Digits shifted out of the units digit are lost. Zeros are supplied to the vacated positions on the left. The result is placed into FRT[p]. The sign of the result is the same as the sign of the source operand in FRA[p].

The following applies to both the left and the right shift instructions.

If the source operand in FRA[p] is a finite number, the exponent of the result is the same as the exponent of the source operand.

For an Infinity, QNaN or SNaN result, the target format's N-bit combination field is set as follows.

- For an Infinity result,
  - the leftmost 5 bits are set to 0b11110, and
  - the rightmost N-5 bits are set to zero.
- For a QNaN result,
  - the leftmost 5 bits are set to 0b11111,
  - bit 5 is set to zero, and
  - the rightmost N-6 bits are set to zero.
- For an SNaN result,
  - the leftmost 5 bits are set to 0b11111,
  - bit 5 is set to one, and
  - the rightmost N-6 bits are set to zero.
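The digit movement performed by the two shift instructions can be sketched as follows. This is a minimal Python illustration, not a model of the DPD encoding: the significand is assumed to be given as a string of decimal digits whose length equals the format precision (16 for DFP Long in the example), and the function name `shift_significand` is hypothetical. Sign, exponent, and combination-field handling are left out.

```python
def shift_significand(digits: str, sh: int, left: bool) -> str:
    """Shift a fixed-width decimal digit string by sh digit positions.

    Digits shifted out are lost and zeros fill the vacated positions,
    so the result always has the same width as the input.
    """
    if not 0 <= sh <= 63:
        raise ValueError("SH is a 6-bit unsigned integer (0..63)")
    width = len(digits)
    if left:
        # Append zeros on the right, then keep the rightmost 'width' digits.
        return (digits + "0" * sh)[-width:]
    # Prepend zeros on the left, then keep the leftmost 'width' digits.
    return ("0" * sh + digits)[:width]


significand = "0" * 11 + "12345"          # 16 digits, as in DFP Long
print(shift_significand(significand, 2, left=True))
print(shift_significand(significand, 2, left=False))
```

A shift count at or beyond the precision clears the significand entirely, which matches the rule that digits shifted out are lost.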
Special Registers Altered: Special Registers Altered: CR1 (if Rc=1) CR1 (if Rc=1) 190 Power ISATM I - III Version 2.05 5.6.7 DFP Instruction Summary Mnemonic FPRF Encoding FORM FP FPCC FR\FI SNaN Exception Rc Full Name Operands Vs G V Z O U X IE C dadd DFP Add X FRT, FRA, FRB Y N RE Y Y V O U X Y Y Y daddq DFP Add Quad X FRTp, FRAp, FRBp Y N RE Y Y V O U X Y Y Y dsub DFP Subtract X FRT, FRA, FRB Y N RE Y Y V O U X Y Y Y dsubq DFP Subtract Quad X FRTp, FRAp, FRBp Y N RE Y Y V O U X Y Y Y dmul DFP Multiply X FRT, FRA, FRB Y N RE Y Y V O U X Y Y Y dmulq DFP Multiply Quad X FRTp, FRAp, FRBp Y N RE Y Y V O U X Y Y Y ddiv DFP Divide X FRT, FRA, FRB Y N RE Y Y V Z O U X Y Y Y ddivq DFP Divide Quad X FRTp, FRAp, FRBp Y N RE Y Y V Z O U X Y Y Y dcmpo DFP Compare Ordered X BF, FRA, FRB Y - - N Y V - - N dcmpoq DFP Compare Ordered Quad X BF, FRAp, FRBp Y - - N Y V - - N dcmpu DFP Compare Unordered X BF, FRA, FRB Y - - N Y V - - N dcmpuq DFP Compare Unordered Quad X BF, FRAp, FRBp Y - - N Y V - - N dtstdc DFP Test Data Class Z22 BF, FRA, DCM N - - N Y 1 - - N dtstdcq DFP Test Data Class Quad Z22 BF, FRAp, DCM N - - N Y 1 - - N dtstdg DFP Test Data Group Z22 BF, FRA,DGM N - - N Y1 - - N dtstdgq DFP Test Data Group Quad Z22 BF, FRAp, DGM N - - N Y1 - - N dtstex DFP Test Exponent X BF, FRA, FRB N - - N Y - - N dtstexq DFP Test Exponent Quad X BF, FRAp, FRBp N - - N Y - - N dtstsf DFP Test Significance X BF, FRA(FIX), FRB N - - N Y - - N dtstsfq DFP Test Significance Quad X BF, FRA(FIX), FRBp N - - N Y - - N dquai DFP Quantize Immediate Z23 TE, FRT, FRB, RMC Y N RE Y Y V X Y Y Y dquaiq DFP Quantize Immediate Quad Z23 TE, FRTp, FRBp, RMC Y N RE Y Y V X Y Y Y dqua DFP Quantize Z23 FRT,FRA,FRB,RMC Y N RE Y Y V X Y Y Y dquaq DFP Quantize Quad Z23 FRTp,FRAp,FRBp, RMC Y N RE Y Y V X Y Y Y drrnd DFP Reround Z23 FRT,FRA(FIX),FRB,RMC Y N RE Y Y V X Y Y Y drrndq DFP Reround Quad Z23 FRTp, FRA(FIX), FRBp, RMC Y N RE Y Y V X Y Y Y DFP Round To FP Integer With Y drintx Z23 R,FRT, FRB,RMC Y N 
RE Y Y V X Y Y Inexact DFP Round To FP Integer With Y drintxq Z23 R,FRTp,FRBp,RMC Y N RE Y Y V X Y Y Inexact Quad DFP Round To FP Integer Without Y drintn Z23 R,FRT, FRB,RMC Y N RE Y Y V Y# Y Inexact DFP Round To FP Integer Without Y drintnq Z23 R,FRTp, FRBp,RMC Y N RE Y Y V Y# Y Inexact Quad dctdp DFP Convert To DFP Long X FRT, FRB (DFP Short) N Y RE Y Y2 U Y Y dctqpq DFP Convert To DFP Extended X FRTp, FRB Y N RE Y Y V Y# Y Y drsp DFP Round To DFP Short X FRT (DFP Short), FRB N Y RE Y Y 2 O UX Y Y Y drdpq DFP Round To DFP Long X FRTp, FRBp Y N RE Y Y V O U X Y Y Y dcffixq DFP Convert From Fixed Quad X FRTp, FRB (FIX) - N RE Y Y U Y Y dctfix DFP Convert To Fixed X FRT (FIX), FRB Y N - U U V X Y - Y dctfixq DFP Convert To Fixed Quad X FRT (FIX), FRBp Y N - U U V X Y - Y ddedpd DFP Decode DPD To BCD X SP, FRT(BCD), FRB N - - N N - - Y ddedpdq DFP Decode DPD To BCD Quad X SP, FRTp(BCD), FRBp N - - N N - - Y Figure 96. Decimal Floating-Point Instructions Summary Chapter 5. Decimal Floating-Point [Category: Decimal Floating-Point] 191 Version 2.05 Mnemonic FPRF Encoding FORM FP FPCC FR\FI SNaN Exception Rc Full Name Operands Vs G V Z O U X IE C denbcd DFP Encode BCD To DPD X S, FRT, FRB (BCD) - N RE Y Y V Y # Y Y denbcdq DFP Encode BCD To DPD Quad X S, FRTp, FRBp (BCD) - N RE Y Y V Y# Y Y dxex DFP Extract Biased Exponent X FRT (FIX), FRB N N - N N - - Y dxexq DFP Extract Biased Exponent Quad X FRT (FIX), FRBp N N - N N - - Y diex DFP Insert Biased Exponent X FRT, FRA(FIX), FRB N Y RE N N - Y Y diexq DFP Insert Biased Exponent Quad X FRTp, FRA(FIX), FRBp N Y RE N N - Y Y DFP Shift Significand Left Immedi- Y dscli Z22 FRT,FRA,SH N Y RE N N - - ate DFP Shift Significand Left Immedi- Y dscliq Z22 FRTp,FRAp,SH N Y RE N N - - ate Quad DFP Shift Significand Right Immedi- Y dscri Z22 FRT,FRA,SH N Y RE N N - - ate DFP Shift Significand Right Immedi- Y dscriq Z22 FRTp,FRAp,SH N Y RE N N - - ate Quad Explanation: # FI and FR are set to zeros for these instructions. 
-  Not applicable.
1  A unique definition of the FPSCR[FPCC] field is provided for the instruction.
2  These are the only instructions that may generate an SNaN and also set the FPSCR[FPRF] field. Since the BFP FPSCR[FPRF] field does not include a code for SNaN, these instructions cause the need for redefining the FPSCR[FPRF] field for DFP.
DCM  A 6-bit immediate operand specifying the data-class mask.
DGM  A 6-bit immediate operand specifying the data-group mask.
G  An SNaN can be generated as the target operand.
IE  An ideal exponent is defined for the instruction.
FI  Setting of the FPSCR[FI] flag.
FR  Setting of the FPSCR[FR] flag.
N  No.
O  An overflow exception may be recognized.
Rc  The record bit, Rc, is provided to record FPSCR[0:3] in CR field 1.
RE  The trailing significand field is reencoded using preferred DPD encodings. The preferred DPD encodings are also used for propagated NaNs, or converted NaNs and infinities.
RMC  A 2-bit immediate operand specifying the rounding-mode control.
S  A one-bit immediate operand specifying if the operation is signed or unsigned.
SP  A two-bit immediate operand: one bit specifies if the operation is signed or unsigned and, for signed operations, another bit specifies which preferred plus sign code is generated.
U  An underflow exception may be recognized.
V  An invalid-operation exception may be recognized.
Vs  An input operand of SNaN causes an invalid-operation exception.
X  An inexact exception may be recognized.
Y  Yes.
U  Undefined.
Z  A zero-divide exception may be recognized.

Figure 96. Decimal Floating-Point Instructions Summary (Continued)

Chapter 6. Vector Processor [Category: Vector]

6.1 Vector Processor Overview . . . 194
6.2 Chapter Conventions . . . 194
6.2.1 Description of Instruction Operation . . . 194
6.3 Vector Processor Registers . . . 195
6.3.1 Vector Registers . . . 195
6.3.2 Vector Status and Control Register . . . 195
6.3.3 VR Save Register . . . 196
6.4 Vector Storage Access Operations . . . 196
6.4.1 Accessing Unaligned Storage Operands . . . 198
6.5 Vector Integer Operations . . . 199
6.5.1 Integer Saturation . . . 199
6.6 Vector Floating-Point Operations . . . 200
6.6.1 Floating-Point Overview . . . 200
6.6.2 Floating-Point Exceptions . . . 200
6.6.2.1 NaN Operand Exception . . . 201
6.6.2.2 Invalid Operation Exception . . . 201
6.6.2.3 Zero Divide Exception . . . 201
6.6.2.4 Log of Zero Exception . . . 201
6.6.2.5 Overflow Exception . . . 201
6.6.2.6 Underflow Exception . . . 202
6.7 Vector Storage Access Instructions . . . 202
6.7.1 Storage Access Exceptions . . . 202
6.7.2 Vector Load Instructions . . . 203
6.7.3 Vector Store Instructions . . . 206
6.7.4 Vector Alignment Support Instructions . . . 208
6.8 Vector Permute and Formatting Instructions . . . 209
6.8.1 Vector Pack and Unpack Instructions . . . 209
6.8.2 Vector Merge Instructions . . . 214
6.8.3 Vector Splat Instructions . . . 216
6.8.4 Vector Permute Instruction . . . 217
6.8.5 Vector Select Instruction . . . 217
6.8.6 Vector Shift Instructions . . . 218
6.9 Vector Integer Instructions . . . 220
6.9.1 Vector Integer Arithmetic Instructions . . . 220
6.9.1.1 Vector Integer Add Instructions . . . 220
6.9.1.2 Vector Integer Subtract Instructions . . . 223
6.9.1.3 Vector Integer Multiply Instructions . . . 226
6.9.1.4 Vector Integer Multiply-Add/Sum Instructions . . . 228
6.9.1.5 Vector Integer Sum-Across Instructions . . . 233
6.9.1.6 Vector Integer Average Instructions . . . 235
6.9.1.7 Vector Integer Maximum and Minimum Instructions . . . 237
6.9.2 Vector Integer Compare Instructions . . . 241
6.9.3 Vector Logical Instructions . . . 244
6.9.4 Vector Integer Rotate and Shift Instructions . . . 245
6.10 Vector Floating-Point Instruction Set . . . 249
6.10.1 Vector Floating-Point Arithmetic Instructions . . . 249
6.10.2 Vector Floating-Point Maximum and Minimum Instructions . . . 251
6.10.3 Vector Floating-Point Rounding and Conversion Instructions . . . 252
6.10.4 Vector Floating-Point Compare Instructions . . . 255
6.10.5 Vector Floating-Point Estimate Instructions . . . 257
6.11 Vector Status and Control Register Instructions . . . 259

6.1 Vector Processor Overview

This chapter describes the registers and instructions that make up the Vector Processor facility.

6.2 Chapter Conventions

6.2.1 Description of Instruction Operation

The following notation, in addition to that described in Section 1.3.2, is used in this chapter. Additional RTL functions are described in Appendix B.

Notation  Meaning

x?y:z
  If the value of x is true, then the value of y, otherwise the value z.
+int
  Integer addition.
+fp
  Floating-point addition.
-fp
  Floating-point subtraction.
×sui
  Multiplication of a signed-integer (first operand) by an unsigned-integer (second operand).
×fp
  Floating-point multiplication.
=int
  Integer equals relation.
=fp
  Floating-point equals relation.
<ui, ≤ui, >ui, ≥ui
  Unsigned-integer comparison relations.
<si, ≤si, >si, ≥si
  Signed-integer comparison relations.
<fp, ≤fp, >fp, ≥fp
  Floating-point comparison relations.
LENGTH( x )
  Length of x, in bits. If x is the word "element", LENGTH( x ) is the length, in bits, of the element implied by the instruction mnemonic.
x << y
  Result of shifting x left by y bits, filling vacated bits with zeros.
    b ← LENGTH(x)
    result ← (y < b) ? (x[y:b-1] || ʸ0) : ᵇ0
x >>ui y
  Result of shifting x right by y bits, filling vacated bits with zeros.
    b ← LENGTH(x)
    result ← (y < b) ? (ʸ0 || x[0:(b-y)-1]) : ᵇ0
x >> y
  Result of shifting x right by y bits, filling vacated bits with copies of bit 0 (sign bit) of x.
Clamp(x, y, z)
  x is interpreted as a signed integer. If the value of x is less than y, then the value y is returned, else if the value of x is greater than z, the value z is returned, else the value x is returned.
    if (x < y) then
      result ← y
      VSCR[SAT] ← 1
    else if (x > z) then
      result ← z
      VSCR[SAT] ← 1
    else result ← x
RoundToSPIntCeil(x)
  The value x if x is a single-precision floating-point integer; otherwise the smallest single-precision floating-point integer that is greater than x.
RoundToSPIntFloor(x)
  The value x if x is a single-precision floating-point integer; otherwise the largest single-precision floating-point integer that is less than x.
RoundToSPIntNear(x)
  The value x if x is a single-precision floating-point integer; otherwise the single-precision floating-point integer that is nearest in value to x (in case of a tie, the even single-precision floating-point integer is used).
RoundToSPIntTrunc(x)
  The value x if x is a single-precision floating-point integer; otherwise the largest single-precision floating-point integer that is less than x if x>0, or the smallest single-precision floating-point integer that is greater than x if x<0.
RoundToNearSP(x)
  The single-precision floating-point number that is nearest in value to the infinitely-precise floating-point intermediate result x (in case of a tie, the single-precision floating-point value with the least-significant bit equal to 0 is used).
ReciprocalEstimateSP(x)
  A single-precision floating-point estimate of the reciprocal of the single-precision floating-point number x.
ReciprocalSquareRootEstimateSP(x) b 1 LENGTH(x) A single-precision floating-point estimate result 1 (y>ui ( shb || 0b000 ) do i=0 to 127 by 8 t 1 t & ((VRB)i+5:i+7=sh) The contents of VRA are shifted right by the number of if t=1 then VRT 1 (VRA) >>ui sh bytes specified in (VRB)121:124. else VRT 1 undefined - Bytes shifted out of byte 15 are lost. - Zeros are supplied to the vacated bytes on the The contents of VRA are shifted right by the number of left. bits specified in (VRB)125:127. - Bits shifted out of bit 127 are lost. The result is placed into VRT. - Zeros are supplied to the vacated bits on the Special Registers Altered: left. None The result is place into VRT, except if, for any byte ele- ment in register VRB, the low-order 3 bits are not equal to the shift amount, then VRT is undefined. Special Registers Altered: None Programming Note A double-register shift by a dynamically specified number of bits (0-127) can be performed in six instructions. The following example shifts Vw || Vx left by the number of bits specified in Vy and places the high-order 128 bits of the result into Vz. vslo Vt1,Vw,Vy #shift high-order reg left vsl Vt1,Vt1,Vy vsububm Vt3,V0,Vy #adjust shift count ((V0)=0) vsro Vt2,Vx,Vt3 #shift low-order reg right vsr Vt2,Vt2,Vt3 vor Vz,Vt1,Vt2 #merge to get final result Chapter 6. Vector Processor [Category: Vector] 219 Version 2.05 6.9 Vector Integer Instructions 6.9.1 Vector Integer Arithmetic Instructions 6.9.1.1 Vector Integer Add Instructions Vector Add and Write Carry-Out Unsigned Vector Add Signed Byte Saturate VX-form Word VX-form vaddsbs VRT,VRA,VRB vaddcuw VRT,VRA,VRB 4 VRT VRA VRB 768 4 VRT VRA VRB 384 0 6 11 16 21 31 0 6 11 16 21 31 do i=0 to 127 by 8 do i=0 to 127 by 32 aop 1 EXTS(VRAi:i+7) aop 1 EXTZ((VRA)i:i+31) bop 1 EXTS(VRBi:i+7) bop 1 EXTZ((VRB)i:i+31) VRTi:i+7 1 Clamp( aop +int bop, -128, 127 )24:31 VRTi:i+31 1 Chop( ( aop +int bop ) >>ui 32,1) For each vector element i from 0 to 15, do the following. 
For each vector element i from 0 to 3, do the following. Signed-integer byte element i in VRA is added to Unsigned-integer word element i in VRA is added signed-integer byte element i in VRB. to unsigned-integer word element i in VRB. The - If the sum is greater than 127 the result carry out of the 32-bit sum is zero-extended to 32 saturates to 127. bits and placed into word element i of VRT. - If the sum is less than -128 the result sat- urates to -128. Special Registers Altered: None The low-order 8 bits of the result are placed into byte element i of VRT. Special Registers Altered: SAT Vector Add Signed Halfword Saturate Vector Add Signed Word Saturate VX-form VX-form vaddshs VRT,VRA,VRB vaddsws VRT,VRA,VRB 4 VRT VRA VRB 832 4 VRT VRA VRB 896 0 6 11 16 21 31 0 6 11 16 21 31 do i=0 to 127 by 16 do i=0 to 127 by 32 aop 1 EXTS((VRA)i:i+15) aop 1 EXTS((VRA)i:i+31) bop 1 EXTS((VRB)i:i+15) bop 1 EXTS((VRB)i:i+31) VRTi:i+15 VRTi:i+31 1 Clamp(aop +int bop, -231, 231-1) 21 Clamp(aop +int bop, -215, 215-1)16:31 For each vector element i from 0 to 3, do the following. For each vector element i from 0 to 7, do the following. Signed-integer word element i in VRA is added to Signed-integer halfword element i in VRA is added signed-integer word element i in VRB. to signed-integer halfword element i in VRB. - If the sum is greater than 231-1 the result - If the sum is greater than 215-1 the result saturates to 231-1. saturates to 215-1 - If the sum is less than -231 the result satu- - If the sum is less than -215 the result satu- rates to -231. rates to -215. The low-order 32 bits of the result are placed into The low-order 16 bits of the result are placed into word element i of VRT. halfword element i of VRT. 
Special Registers Altered: Special Registers Altered: SAT SAT 220 Power ISATM I Version 2.05 Vector Add Unsigned Byte Modulo Vector Add Unsigned Halfword Modulo VX-form VX-form vaddubm VRT,VRA,VRB vadduhm VRT,VRA,VRB 4 VRT VRA VRB 0 4 VRT VRA VRB 64 0 6 11 16 21 31 0 6 11 16 21 31 do i=0 to 127 by 8 do i=0 to 127 by 16 aop 1 EXTZ((VRA)i:i+7) aop 1 EXTZ((VRA)i:i+15) bop 1 EXTZ((VRB)i:i+7) bop 1 EXTZ((VRB)i:i+15) VRTi:i+7 1 Chop( aop +int bop, 8 ) VRTi:i+15 1 Chop( aop +int bop, 16 ) For each vector element i from 0 to 15, do the following. For each vector element i from 0 to 7, do the following. Unsigned-integer byte element i in VRA is added Unsigned-integer halfword element i in VRA is to unsigned-integer byte element i in VRB. added to unsigned-integer halfword element i in VRB. The low-order 8 bits of the result are placed into byte element i of VRT. The low-order 16 bits of the result are placed into halfword element i of VRT. Special Registers Altered: None Special Registers Altered: None Programming Note vaddubm can be used for unsigned or signed-inte- Programming Note gers. vadduhm can be used for unsigned or signed-inte- gers. Vector Add Unsigned Word Modulo VX-form vadduwm VRT,VRA,VRB 4 VRT VRA VRB 128 0 6 11 16 21 31 do i=0 to 127 by 32 aop 1 EXTZ((VRA)i:i+31) bop 1 EXTZ((VRB)i:i+31) temp 1 aop +int bop VRTi:i+31 1 Chop( aop +int bop, 32 ) For each vector element i from 0 to 3, do the following. Unsigned-integer word element i in VRA is added to unsigned-integer word element i in VRB. The low-order 32 bits of the result are placed into word element i of VRT. Special Registers Altered: None Programming Note vadduwm can be used for unsigned or signed-inte- gers. Chapter 6. 
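The difference between the modulo, saturating, and carry-out forms of the add instructions described above can be illustrated with a small software model. The following Python sketch is not part of the architecture; the helper names (clamp, to_signed) and the use of a dictionary to stand in for the SAT bit of the VSCR are assumptions made only for illustration, and vectors are modeled as plain lists of unsigned element values.

```python
# Illustrative model only: element values are unsigned Python ints,
# and state["SAT"] stands in for VSCR[SAT] (an assumption of this sketch).

def clamp(x, lo, hi, state):
    """Model of Clamp(x, y, z): bound x to [lo, hi], setting SAT on saturation."""
    if x < lo:
        state["SAT"] = 1
        return lo
    if x > hi:
        state["SAT"] = 1
        return hi
    return x

def to_signed(value, bits):
    """Interpret the low-order `bits` bits of value as two's complement (EXTS)."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def vaddubm(vra, vrb):
    """Modulo add of byte elements: Chop(aop + bop, 8). SAT is not altered."""
    return [(a + b) & 0xFF for a, b in zip(vra, vrb)]

def vaddsbs(vra, vrb, state):
    """Signed saturating add of byte elements; the low-order 8 bits are kept."""
    out = []
    for a, b in zip(vra, vrb):
        total = to_signed(a, 8) + to_signed(b, 8)
        out.append(clamp(total, -128, 127, state) & 0xFF)
    return out

def vaddcuw(vra, vrb):
    """Carry out of each 32-bit unsigned add, zero-extended to a word."""
    return [((a + b) >> 32) & 1 for a, b in zip(vra, vrb)]
```

With byte elements of 100 in both operands, the modulo form yields 200 (which reads as -56 if interpreted as signed), whereas the saturating form yields 127 and records the saturation.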
Vector Add Unsigned Byte Saturate VX-form
vaddubs VRT,VRA,VRB
    4    VRT  VRA  VRB  512
    0    6    11   16   21   31
do i=0 to 127 by 8
    aop ← EXTZ((VRA)_{i:i+7})
    bop ← EXTZ((VRB)_{i:i+7})
    VRT_{i:i+7} ← Clamp( aop +int bop, 0, 255 )_{24:31}
For each vector element i from 0 to 15, do the following.
    Unsigned-integer byte element i in VRA is added to unsigned-integer byte element i in VRB.
    - If the sum is greater than 255 the result saturates to 255.
    The low-order 8 bits of the result are placed into byte element i of VRT.
Special Registers Altered: SAT

Vector Add Unsigned Halfword Saturate VX-form
vadduhs VRT,VRA,VRB
    4    VRT  VRA  VRB  576
    0    6    11   16   21   31
do i=0 to 127 by 16
    aop ← EXTZ((VRA)_{i:i+15})
    bop ← EXTZ((VRB)_{i:i+15})
    VRT_{i:i+15} ← Clamp(aop +int bop, 0, 2^16-1)_{16:31}
For each vector element i from 0 to 7, do the following.
    Unsigned-integer halfword element i in VRA is added to unsigned-integer halfword element i in VRB.
    - If the sum is greater than 2^16-1 the result saturates to 2^16-1.
    The low-order 16 bits of the result are placed into halfword element i of VRT.
Special Registers Altered: SAT

Vector Add Unsigned Word Saturate VX-form
vadduws VRT,VRA,VRB
    4    VRT  VRA  VRB  640
    0    6    11   16   21   31
do i=0 to 127 by 32
    aop ← EXTZ((VRA)_{i:i+31})
    bop ← EXTZ((VRB)_{i:i+31})
    VRT_{i:i+31} ← Clamp(aop +int bop, 0, 2^32-1)
For each vector element i from 0 to 3, do the following.
    Unsigned-integer word element i in VRA is added to unsigned-integer word element i in VRB.
    - If the sum is greater than 2^32-1 the result saturates to 2^32-1.
    The low-order 32 bits of the result are placed into word element i of VRT.
Special Registers Altered: SAT

6.9.1.2 Vector Integer Subtract Instructions

Vector Subtract and Write Carry-Out Unsigned Word VX-form
vsubcuw VRT,VRA,VRB
    4    VRT  VRA  VRB  1408
    0    6    11   16   21   31
do i=0 to 127 by 32
    aop ← (VRA)_{i:i+31}
    bop ← (VRB)_{i:i+31}
    temp ← (EXTZ(aop) +int EXTZ(¬bop) +int 1) >> 32
    VRT_{i:i+31} ← temp & 0x0000_0001
For each vector element i from 0 to 3, do the following.
    Unsigned-integer word element i in VRB is subtracted from unsigned-integer word element i in VRA. The complement of the borrow out of bit 0 of the 32-bit difference is zero-extended to 32 bits and placed into word element i of VRT.
Special Registers Altered: None

Vector Subtract Signed Byte Saturate VX-form
vsubsbs VRT,VRA,VRB
    4    VRT  VRA  VRB  1792
    0    6    11   16   21   31
do i=0 to 127 by 8
    aop ← EXTS((VRA)_{i:i+7})
    bop ← EXTS((VRB)_{i:i+7})
    VRT_{i:i+7} ← Clamp(aop +int ¬bop +int 1, -128, 127)_{24:31}
For each vector element i from 0 to 15, do the following.
    Signed-integer byte element i in VRB is subtracted from signed-integer byte element i in VRA.
    - If the intermediate result is greater than 127 the result saturates to 127.
    - If the intermediate result is less than -128 the result saturates to -128.
    The low-order 8 bits of the result are placed into byte element i of VRT.
Special Registers Altered: SAT

Vector Subtract Signed Halfword Saturate VX-form
vsubshs VRT,VRA,VRB
    4    VRT  VRA  VRB  1856
    0    6    11   16   21   31
do i=0 to 127 by 16
    aop ← EXTS((VRA)_{i:i+15})
    bop ← EXTS((VRB)_{i:i+15})
    VRT_{i:i+15} ← Clamp(aop +int ¬bop +int 1, -2^15, 2^15-1)_{16:31}
For each vector element i from 0 to 7, do the following.
    Signed-integer halfword element i in VRB is subtracted from signed-integer halfword element i in VRA.
    - If the intermediate result is greater than 2^15-1 the result saturates to 2^15-1.
    - If the intermediate result is less than -2^15 the result saturates to -2^15.
    The low-order 16 bits of the result are placed into halfword element i of VRT.
Special Registers Altered: SAT

Vector Subtract Signed Word Saturate VX-form
vsubsws VRT,VRA,VRB
    4    VRT  VRA  VRB  1920
    0    6    11   16   21   31
do i=0 to 127 by 32
    aop ← EXTS((VRA)_{i:i+31})
    bop ← EXTS((VRB)_{i:i+31})
    VRT_{i:i+31} ← Clamp(aop +int ¬bop +int 1, -2^31, 2^31-1)
For each vector element i from 0 to 3, do the following.
    Signed-integer word element i in VRB is subtracted from signed-integer word element i in VRA.
    - If the intermediate result is greater than 2^31-1 the result saturates to 2^31-1.
    - If the intermediate result is less than -2^31 the result saturates to -2^31.
    The low-order 32 bits of the result are placed into word element i of VRT.
Special Registers Altered: SAT

Vector Subtract Unsigned Byte Modulo VX-form
vsububm VRT,VRA,VRB
    4    VRT  VRA  VRB  1024
    0    6    11   16   21   31
do i=0 to 127 by 8
    aop ← EXTZ((VRA)_{i:i+7})
    bop ← EXTZ((VRB)_{i:i+7})
    VRT_{i:i+7} ← Chop( aop +int ¬bop +int 1, 8 )
For each vector element i from 0 to 15, do the following.
    Unsigned-integer byte element i in VRB is subtracted from unsigned-integer byte element i in VRA. The low-order 8 bits of the result are placed into byte element i of VRT.
Special Registers Altered: None

Vector Subtract Unsigned Halfword Modulo VX-form
vsubuhm VRT,VRA,VRB
    4    VRT  VRA  VRB  1088
    0    6    11   16   21   31
do i=0 to 127 by 16
    aop ← EXTZ((VRA)_{i:i+15})
    bop ← EXTZ((VRB)_{i:i+15})
    VRT_{i:i+15} ← Chop( aop +int ¬bop +int 1, 16 )
For each vector element i from 0 to 7, do the following.
    Unsigned-integer halfword element i in VRB is subtracted from unsigned-integer halfword element i in VRA. The low-order 16 bits of the result are placed into halfword element i of VRT.
Special Registers Altered: None

Vector Subtract Unsigned Word Modulo VX-form
vsubuwm VRT,VRA,VRB
    4    VRT  VRA  VRB  1152
    0    6    11   16   21   31
do i=0 to 127 by 32
    aop ← EXTZ((VRA)_{i:i+31})
    bop ← EXTZ((VRB)_{i:i+31})
    VRT_{i:i+31} ← Chop( aop +int ¬bop +int 1, 32 )
For each vector element i from 0 to 3, do the following.
    Unsigned-integer word element i in VRB is subtracted from unsigned-integer word element i in VRA. The low-order 32 bits of the result are placed into word element i of VRT.
Special Registers Altered: None

Vector Subtract Unsigned Byte Saturate VX-form
vsububs VRT,VRA,VRB
    4    VRT  VRA  VRB  1536
    0    6    11   16   21   31
do i=0 to 127 by 8
    aop ← EXTZ((VRA)_{i:i+7})
    bop ← EXTZ((VRB)_{i:i+7})
    VRT_{i:i+7} ← Clamp(aop +int ¬bop +int 1, 0, 255)_{24:31}
For each vector element i from 0 to 15, do the following.
    Unsigned-integer byte element i in VRB is subtracted from unsigned-integer byte element i in VRA. If the intermediate result is less than 0 the result saturates to 0. The low-order 8 bits of the result are placed into byte element i of VRT.
Special Registers Altered: SAT

Vector Subtract Unsigned Halfword Saturate VX-form
vsubuhs VRT,VRA,VRB
    4    VRT  VRA  VRB  1600
    0    6    11   16   21   31
do i=0 to 127 by 16
    aop ← EXTZ((VRA)_{i:i+15})
    bop ← EXTZ((VRB)_{i:i+15})
    VRT_{i:i+15} ← Clamp(aop +int ¬bop +int 1, 0, 2^16-1)_{16:31}
For each vector element i from 0 to 7, do the following.
    Unsigned-integer halfword element i in VRB is subtracted from unsigned-integer halfword element i in VRA. If the intermediate result is less than 0 the result saturates to 0. The low-order 16 bits of the result are placed into halfword element i of VRT.
Special Registers Altered: SAT

Vector Subtract Unsigned Word Saturate VX-form
vsubuws VRT,VRA,VRB
    4    VRT  VRA  VRB  1664
    0    6    11   16   21   31
do i=0 to 127 by 32
    aop ← EXTZ((VRA)_{i:i+31})
    bop ← EXTZ((VRB)_{i:i+31})
    VRT_{i:i+31} ← Clamp(aop +int ¬bop +int 1, 0, 2^32-1)
For each vector element i from 0 to 3, do the following.
    Unsigned-integer word element i in VRB is subtracted from unsigned-integer word element i in VRA.
    - If the intermediate result is less than 0 the result saturates to 0.
    The low-order 32 bits of the result are placed into word element i of VRT.
Special Registers Altered: SAT

6.9.1.3 Vector Integer Multiply Instructions

Vector Multiply Even Signed Byte VX-form
vmulesb VRT,VRA,VRB
    4    VRT  VRA  VRB  776
    0    6    11   16   21   31
do i=0 to 127 by 16
    prod ← EXTS((VRA)_{i:i+7}) ×si EXTS((VRB)_{i:i+7})
    VRT_{i:i+15} ← Chop( prod, 16 )
For each vector element i from 0 to 7, do the following.
    Signed-integer byte element i×2 in VRA is multiplied by signed-integer byte element i×2 in VRB. The low-order 16 bits of the product are placed into halfword element i of VRT.
Special Registers Altered: None

Vector Multiply Even Signed Halfword VX-form
vmulesh VRT,VRA,VRB
    4    VRT  VRA  VRB  840
    0    6    11   16   21   31
do i=0 to 127 by 32
    prod ← EXTS((VRA)_{i:i+15}) ×si EXTS((VRB)_{i:i+15})
    VRT_{i:i+31} ← Chop( prod, 32 )
For each vector element i from 0 to 3, do the following.
    Signed-integer halfword element i×2 in VRA is multiplied by signed-integer halfword element i×2 in VRB. The low-order 32 bits of the product are placed into word element i of VRT.
Special Registers Altered: None

Vector Multiply Even Unsigned Byte VX-form
vmuleub VRT,VRA,VRB
    4    VRT  VRA  VRB  520
    0    6    11   16   21   31
do i=0 to 127 by 16
    prod ← EXTZ((VRA)_{i:i+7}) ×ui EXTZ((VRB)_{i:i+7})
    VRT_{i:i+15} ← Chop(prod, 16)
For each vector element i from 0 to 7, do the following.
    Unsigned-integer byte element i×2 in VRA is multiplied by unsigned-integer byte element i×2 in VRB. The low-order 16 bits of the product are placed into halfword element i of VRT.
Special Registers Altered: None

Vector Multiply Even Unsigned Halfword VX-form
vmuleuh VRT,VRA,VRB
    4    VRT  VRA  VRB  584
    0    6    11   16   21   31
do i=0 to 127 by 32
    prod ← EXTZ((VRA)_{i:i+15}) ×ui EXTZ((VRB)_{i:i+15})
    VRT_{i:i+31} ← Chop(prod, 32)
For each vector element i from 0 to 3, do the following.
    Unsigned-integer halfword element i×2 in VRA is multiplied by unsigned-integer halfword element i×2 in VRB. The low-order 32 bits of the product are placed into word element i of VRT.
Special Registers Altered: None

Vector Multiply Odd Signed Byte VX-form
vmulosb VRT,VRA,VRB
    4    VRT  VRA  VRB  264
    0    6    11   16   21   31
do i=0 to 127 by 16
    prod ← EXTS((VRA)_{i+8:i+15}) ×si EXTS((VRB)_{i+8:i+15})
    VRT_{i:i+15} ← Chop( prod, 16 )
For each vector element i from 0 to 7, do the following.
    Signed-integer byte element i×2+1 in VRA is multiplied by signed-integer byte element i×2+1 in VRB. The low-order 16 bits of the product are placed into halfword element i of VRT.
Special Registers Altered: None

Vector Multiply Odd Signed Halfword VX-form
vmulosh VRT,VRA,VRB
    4    VRT  VRA  VRB  328
    0    6    11   16   21   31
do i=0 to 127 by 32
    prod ← EXTS((VRA)_{i+16:i+31}) ×si EXTS((VRB)_{i+16:i+31})
    VRT_{i:i+31} ← Chop( prod, 32 )
For each vector element i from 0 to 3, do the following.
    Signed-integer halfword element i×2+1 in VRA is multiplied by signed-integer halfword element i×2+1 in VRB. The low-order 32 bits of the product are placed into word element i of VRT.
Special Registers Altered: None

Vector Multiply Odd Unsigned Byte VX-form
vmuloub VRT,VRA,VRB
    4    VRT  VRA  VRB  8
    0    6    11   16   21   31
do i=0 to 127 by 16
    prod ← EXTZ((VRA)_{i+8:i+15}) ×ui EXTZ((VRB)_{i+8:i+15})
    VRT_{i:i+15} ← Chop( prod, 16 )
For each vector element i from 0 to 7, do the following.
    Unsigned-integer byte element i×2+1 in VRA is multiplied by unsigned-integer byte element i×2+1 in VRB. The low-order 16 bits of the product are placed into halfword element i of VRT.
Special Registers Altered: None

Vector Multiply Odd Unsigned Halfword VX-form
vmulouh VRT,VRA,VRB
    4    VRT  VRA  VRB  72
    0    6    11   16   21   31
do i=0 to 127 by 32
    prod ← EXTZ((VRA)_{i+16:i+31}) ×ui EXTZ((VRB)_{i+16:i+31})
    VRT_{i:i+31} ← Chop( prod, 32 )
For each vector element i from 0 to 3, do the following.
    Unsigned-integer halfword element i×2+1 in VRA is multiplied by unsigned-integer halfword element i×2+1 in VRB. The low-order 32 bits of the product are placed into word element i of VRT.
Special Registers Altered: None

6.9.1.4 Vector Integer Multiply-Add/Sum Instructions

Vector Multiply-High-Add Signed Halfword Saturate VA-form
vmhaddshs VRT,VRA,VRB,VRC
    4    VRT  VRA  VRB  VRC  32
    0    6    11   16   21   26   31
do i=0 to 127 by 16
    prod ← EXTS((VRA)_{i:i+15}) ×si EXTS((VRB)_{i:i+15})
    sum ← (prod >>si 15) +int EXTS((VRC)_{i:i+15})
    VRT_{i:i+15} ← Clamp(sum, -2^15, 2^15-1)_{16:31}
For each vector element i from 0 to 7, do the following.
    Signed-integer halfword element i in VRA is multiplied by signed-integer halfword element i in VRB, producing a 32-bit signed-integer product. Bits 0:16 of the product are added to signed-integer halfword element i in VRC.
    - If the intermediate result is greater than 2^15-1 the result saturates to 2^15-1.
    - If the intermediate result is less than -2^15 the result saturates to -2^15.
    The low-order 16 bits of the result are placed into halfword element i of VRT.
Special Registers Altered: SAT

Vector Multiply-High-Round-Add Signed Halfword Saturate VA-form
vmhraddshs VRT,VRA,VRB,VRC
    4    VRT  VRA  VRB  VRC  33
    0    6    11   16   21   26   31
do i=0 to 127 by 16
    prod ← EXTS((VRA)_{i:i+15}) ×si EXTS((VRB)_{i:i+15})
    sum ← ((prod +int 0x0000_4000) >>si 15) +int EXTS((VRC)_{i:i+15})
    VRT_{i:i+15} ← Clamp(sum, -2^15, 2^15-1)_{16:31}
For each vector element i from 0 to 7, do the following.
    Signed-integer halfword element i in VRA is multiplied by signed-integer halfword element i in VRB, producing a 32-bit signed-integer product. The value 0x0000_4000 is added to the product, producing a 32-bit signed-integer sum. Bits 0:16 of the sum are added to signed-integer halfword element i in VRC.
    - If the intermediate result is greater than 2^15-1 the result saturates to 2^15-1.
    - If the intermediate result is less than -2^15 the result saturates to -2^15.
    The low-order 16 bits of the result are placed into halfword element i of VRT.
Special Registers Altered: SAT

Vector Multiply-Low-Add Unsigned Halfword Modulo VA-form
vmladduhm VRT,VRA,VRB,VRC
    4    VRT  VRA  VRB  VRC  34
    0    6    11   16   21   26   31
do i=0 to 127 by 16
    prod ← EXTZ((VRA)_{i:i+15}) ×ui EXTZ((VRB)_{i:i+15})
    sum ← Chop( prod, 16 ) +int (VRC)_{i:i+15}
    VRT_{i:i+15} ← Chop( sum, 16 )
For each vector element i from 0 to 7, do the following.
    Unsigned-integer halfword element i in VRA is multiplied by unsigned-integer halfword element i in VRB, producing a 32-bit unsigned-integer product. The low-order 16 bits of the product are added to unsigned-integer halfword element i in VRC.
    The low-order 16 bits of the sum are placed into halfword element i of VRT.
Special Registers Altered: None
Programming Note
vmladduhm can be used for unsigned or signed-integers.

Vector Multiply-Sum Unsigned Byte Modulo VA-form
vmsumubm VRT,VRA,VRB,VRC
    4    VRT  VRA  VRB  VRC  36
    0    6    11   16   21   26   31
do i=0 to 127 by 32
    temp ← EXTZ((VRC)_{i:i+31})
    do j=0 to 31 by 8
        prod ← EXTZ((VRA)_{i+j:i+j+7}) ×ui EXTZ((VRB)_{i+j:i+j+7})
        temp ← temp +int prod
    VRT_{i:i+31} ← Chop( temp, 32 )
For each word element in VRT the following operations are performed, in the order shown.
    - Each of the four unsigned-integer byte elements contained in the corresponding word element of VRA is multiplied by the corresponding unsigned-integer byte element in VRB, producing an unsigned-integer halfword product.
    - The sum of these four unsigned-integer halfword products is added to the unsigned-integer word element in VRC.
    - The unsigned-integer word result is placed into the corresponding word element of VRT.
Special Registers Altered: None
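The effect of the rounding constant 0x0000_4000 in vmhraddshs, compared with the truncating vmhaddshs, is easiest to see on small values. The following Python sketch models one possible reading of the RTL for the two multiply-high-add instructions given earlier in this section; it is an illustration under stated assumptions rather than a reference implementation. The function name mhadd_shs, its rounding parameter, and the dictionary standing in for the SAT bit are hypothetical.

```python
# Illustrative model only. Halfword elements are unsigned 16-bit Python ints;
# state["SAT"] stands in for VSCR[SAT] (an assumption of this sketch).

def to_signed(value, bits):
    """Interpret the low-order `bits` bits of value as two's complement (EXTS)."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def clamp(x, lo, hi, state):
    """Model of Clamp(x, y, z): bound x to [lo, hi], setting SAT on saturation."""
    if x < lo:
        state["SAT"] = 1
        return lo
    if x > hi:
        state["SAT"] = 1
        return hi
    return x

def mhadd_shs(vra, vrb, vrc, state, rounding=False):
    """vmhaddshs (rounding=False) or vmhraddshs (rounding=True) on 8 halfwords."""
    out = []
    for a, b, c in zip(vra, vrb, vrc):
        prod = to_signed(a, 16) * to_signed(b, 16)
        if rounding:
            prod += 0x4000              # add one half of the discarded range
        # Python's >> on ints is an arithmetic shift, matching >>si.
        total = (prod >> 15) + to_signed(c, 16)
        out.append(clamp(total, -2**15, 2**15 - 1, state) & 0xFFFF)
    return out
```

For example, multiplying 1 by 0x4000 gives a product of 0x4000; the truncating form discards it entirely (result 0), while the rounding form produces 1. Multiplying -2^15 by -2^15 gives 2^30, whose high part 2^15 exceeds the halfword range, so the result saturates to 0x7FFF and the SAT indicator is set.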
Vector Multiply-Sum Mixed Byte Modulo VA-form
vmsummbm VRT,VRA,VRB,VRC
    4    VRT  VRA  VRB  VRC  37
    0    6    11   16   21   26   31
do i=0 to 127 by 32
    temp ← (VRC)_{i:i+31}
    do j=0 to 31 by 8
        prod_{0:15} ← (VRA)_{i+j:i+j+7} ×sui (VRB)_{i+j:i+j+7}
        temp ← temp +int EXTS(prod)
    VRT_{i:i+31} ← temp
For each word element in VRT the following operations are performed, in the order shown.
    - Each of the four signed-integer byte elements contained in the corresponding word element of VRA is multiplied by the corresponding unsigned-integer byte element in VRB, producing a signed-integer product.
    - The sum of these four signed-integer halfword products is added to the signed-integer word element in VRC.
    - The signed-integer result is placed into the corresponding word element of VRT.
Special Registers Altered: None

Vector Multiply-Sum Signed Halfword Modulo VA-form
vmsumshm VRT,VRA,VRB,VRC
    4    VRT  VRA  VRB  VRC  40
    0    6    11   16   21   26   31
do i=0 to 127 by 32
    temp ← (VRC)_{i:i+31}
    do j=0 to 31 by 16
        prod_{0:31} ← (VRA)_{i+j:i+j+15} ×si (VRB)_{i+j:i+j+15}
        temp ← temp +int prod
    VRT_{i:i+31} ← temp
For each word element in VRT the following operations are performed, in the order shown.
    - Each of the two signed-integer halfword elements contained in the corresponding word element of VRA is multiplied by the corresponding signed-integer halfword element in VRB, producing a signed-integer product.
    - The sum of these two signed-integer word products is added to the signed-integer word element in VRC.
    - The signed-integer word result is placed into the corresponding word element of VRT.
Special Registers Altered: None

Vector Multiply-Sum Signed Halfword Saturate VA-form
vmsumshs VRT,VRA,VRB,VRC
    4    VRT  VRA  VRB  VRC  41
    0    6    11   16   21   26   31
do i=0 to 127 by 32
    temp ← EXTS((VRC)_{i:i+31})
    do j=0 to 31 by 16
        prod ← EXTS((VRA)_{i+j:i+j+15}) ×si EXTS((VRB)_{i+j:i+j+15})
        temp ← temp +int prod
    VRT_{i:i+31} ← Clamp(temp, -2^31, 2^31-1)
For each word element in VRT the following operations are performed, in the order shown.
    - Each of the two signed-integer halfword elements contained in the corresponding word element of VRA is multiplied by the corresponding signed-integer halfword element in VRB, producing a signed-integer product.
    - The sum of these two signed-integer word products is added to the signed-integer word element in VRC.
    - If the intermediate result is greater than 2^31-1 the result saturates to 2^31-1 and if it is less than -2^31 it saturates to -2^31.
    - The result is placed into the corresponding word element of VRT.
Special Registers Altered: SAT

Vector Multiply-Sum Unsigned Halfword Modulo VA-form
vmsumuhm VRT,VRA,VRB,VRC
    4    VRT  VRA  VRB  VRC  38
    0    6    11   16   21   26   31
do i=0 to 127 by 32
    temp ← EXTZ((VRC)_{i:i+31})
    do j=0 to 31 by 16
        prod ← EXTZ((VRA)_{i+j:i+j+15}) ×ui EXTZ((VRB)_{i+j:i+j+15})
        temp ← temp +int prod
    VRT_{i:i+31} ← Chop( temp, 32 )
For each word element in VRT the following operations are performed, in the order shown.
    - Each of the two unsigned-integer halfword elements contained in the corresponding word element of VRA is multiplied by the corresponding unsigned-integer halfword element in VRB, producing an unsigned-integer word product.
    - The sum of these two unsigned-integer word products is added to the unsigned-integer word element in VRC.
    - The unsigned-integer result is placed into the corresponding word element of VRT.
Special Registers Altered: None
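The modulo and saturating forms of the signed halfword multiply-sum (vmsumshm and vmsumshs) differ only in how the final word is formed from the intermediate sum. The Python sketch below models a single word lane under the assumption that each operand word is passed as an unsigned 32-bit integer with halfword element 0 in the high-order 16 bits; the function name and the returned saturation flag are hypothetical conveniences of this sketch, not architected interfaces.

```python
# Illustrative model of one word lane of vmsumshm / vmsumshs.

def to_signed(value, bits):
    """Interpret the low-order `bits` bits of value as two's complement (EXTS)."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def msum_signed_halfwords(a_word, b_word, c_word, saturate):
    """Return (result_word, sat) for one lane.

    saturate=False models the modulo form (vmsumshm);
    saturate=True models the saturating form (vmsumshs).
    """
    temp = to_signed(c_word, 32)
    for shift in (16, 0):               # halfword 0 (high), then halfword 1 (low)
        a = to_signed(a_word >> shift, 16)
        b = to_signed(b_word >> shift, 16)
        temp += a * b                   # exact intermediate sum, no wraparound
    if saturate:
        clamped = max(-2**31, min(2**31 - 1, temp))
        return clamped & 0xFFFFFFFF, int(clamped != temp)
    return temp & 0xFFFFFFFF, 0
```

The only case in which two signed halfword products alone overflow a word is when all four halfwords are -2^15: the sum is then 2^31, which the modulo form wraps to 0x8000_0000 and the saturating form clamps to 0x7FFF_FFFF.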
Vector Multiply-Sum Unsigned Halfword Saturate VA-form
vmsumuhs VRT,VRA,VRB,VRC
    4    VRT  VRA  VRB  VRC  39
    0    6    11   16   21   26   31
do i=0 to 127 by 32
    temp ← EXTZ((VRC)_{i:i+31})
    do j=0 to 31 by 16
        prod ← EXTZ((VRA)_{i+j:i+j+15}) ×ui EXTZ((VRB)_{i+j:i+j+15})
        temp ← temp +int prod
    VRT_{i:i+31} ← Clamp(temp, 0, 2^32-1)
For each word element in VRT the following operations are performed, in the order shown.
    - Each of the two unsigned-integer halfword elements contained in the corresponding word element of VRA is multiplied by the corresponding unsigned-integer halfword element in VRB, producing an unsigned-integer product.
    - The sum of these two unsigned-integer word products is added to the unsigned-integer word element in VRC.
    - If the intermediate result is greater than 2^32-1 the result saturates to 2^32-1.
    - The result is placed into the corresponding word element of VRT.
Special Registers Altered: SAT

6.9.1.5 Vector Integer Sum-Across Instructions

Vector Sum across Signed Word Saturate VX-form
vsumsws VRT,VRA,VRB
    4    VRT  VRA  VRB  1928
    0    6    11   16   21   31
temp ← EXTS((VRB)_{96:127})
do i=0 to 127 by 32
    temp ← temp +int EXTS((VRA)_{i:i+31})
VRT_{0:31} ← 0x0000_0000
VRT_{32:63} ← 0x0000_0000
VRT_{64:95} ← 0x0000_0000
VRT_{96:127} ← Clamp(temp, -2^31, 2^31-1)
The sum of the four signed-integer word elements in VRA is added to signed-integer word element 3 of VRB.
    - If the intermediate result is greater than 2^31-1 the result saturates to 2^31-1.
    - If the intermediate result is less than -2^31 the result saturates to -2^31.
The low-order 32 bits of the result are placed into word element 3 of VRT.
Word elements 0 to 2 of VRT are set to 0.
Special Registers Altered: SAT

Vector Sum across Half Signed Word Saturate VX-form
vsum2sws VRT,VRA,VRB
    4    VRT  VRA  VRB  1672
    0    6    11   16   21   31
do i=0 to 127 by 64
    temp ← EXTS((VRB)_{i+32:i+63})
    do j=0 to 63 by 32
        temp ← temp +int EXTS((VRA)_{i+j:i+j+31})
    VRT_{i:i+63} ← 0x0000_0000 || Clamp(temp, -2^31, 2^31-1)
Word elements 0 and 2 of VRT are set to 0.
The sum of the signed-integer word elements 0 and 1 in VRA is added to the signed-integer word element in bits 32:63 of VRB.
    - If the intermediate result is greater than 2^31-1 the result saturates to 2^31-1.
    - If the intermediate result is less than -2^31 the result saturates to -2^31.
The low-order 32 bits of the result are placed into word element 1 of VRT.
The sum of signed-integer word elements 2 and 3 in VRA is added to the signed-integer word element in bits 96:127 of VRB.
    - If the intermediate result is greater than 2^31-1 the result saturates to 2^31-1.
    - If the intermediate result is less than -2^31 the result saturates to -2^31.
The low-order 32 bits of the result are placed into word element 3 of VRT.
Special Registers Altered: SAT

Vector Sum across Quarter Signed Byte Saturate VX-form
vsum4sbs VRT,VRA,VRB
    4    VRT  VRA  VRB  1800
    0    6    11   16   21   31
do i=0 to 127 by 32
    temp ← EXTS((VRB)_{i:i+31})
    do j=0 to 31 by 8
        temp ← temp +int EXTS((VRA)_{i+j:i+j+7})
    VRT_{i:i+31} ← Clamp(temp, -2^31, 2^31-1)
For each vector element i from 0 to 3, do the following.
    The sum of the four signed-integer byte elements contained in word element i of VRA is added to signed-integer word element i in VRB.
    - If the intermediate result is greater than 2^31-1 the result saturates to 2^31-1.
    - If the intermediate result is less than -2^31 the result saturates to -2^31.
    The low-order 32 bits of the result are placed into word element i of VRT.
Special Registers Altered: SAT

Vector Sum across Quarter Signed Halfword Saturate VX-form
vsum4shs VRT,VRA,VRB
    4    VRT  VRA  VRB  1608
    0    6    11   16   21   31
do i=0 to 127 by 32
    temp ← EXTS((VRB)_{i:i+31})
    do j=0 to 31 by 16
        temp ← temp +int EXTS((VRA)_{i+j:i+j+15})
    VRT_{i:i+31} ← Clamp(temp, -2^31, 2^31-1)
For each vector element i from 0 to 3, do the following.
    The sum of the two signed-integer halfword elements contained in word element i of VRA is added to signed-integer word element i in VRB.
    - If the intermediate result is greater than 2^31-1 the result saturates to 2^31-1.
    - If the intermediate result is less than -2^31 the result saturates to -2^31.
    The low-order 32 bits of the result are placed into the corresponding word element of VRT.
Special Registers Altered: SAT

Vector Sum across Quarter Unsigned Byte Saturate VX-form
vsum4ubs VRT,VRA,VRB
    4    VRT  VRA  VRB  1544
    0    6    11   16   21   31
do i=0 to 127 by 32
    temp ← EXTZ((VRB)_{i:i+31})
    do j=0 to 31 by 8
        temp ← temp +int EXTZ((VRA)_{i+j:i+j+7})
    VRT_{i:i+31} ← Clamp( temp, 0, 2^32-1 )
For each vector element i from 0 to 3, do the following.
    The sum of the four unsigned-integer byte elements contained in word element i of VRA is added to unsigned-integer word element i in VRB.
    - If the intermediate result is greater than 2^32-1 it saturates to 2^32-1.
    The low-order 32 bits of the result are placed into word element i of VRT.
Special Registers Altered: SAT

6.9.1.6 Vector Integer Average Instructions

Vector Average Signed Byte VX-form
vavgsb VRT,VRA,VRB
    4    VRT  VRA  VRB  1282
    0    6    11   16   21   31
do i=0 to 127 by 8
    aop ← EXTS((VRA)_{i:i+7})
    bop ← EXTS((VRB)_{i:i+7})
    VRT_{i:i+7} ← Chop(( aop +int bop +int 1 ) >> 1, 8)
For each vector element i from 0 to 15, do the following.
    Signed-integer byte element i in VRA is added to signed-integer byte element i in VRB. The sum is incremented by 1 and then shifted right 1 bit.
    The low-order 8 bits of the result are placed into byte element i of VRT.
Special Registers Altered: None

Vector Average Signed Halfword VX-form
vavgsh VRT,VRA,VRB
    4    VRT  VRA  VRB  1346
    0    6    11   16   21   31
do i=0 to 127 by 16
    aop ← EXTS((VRA)_{i:i+15})
    bop ← EXTS((VRB)_{i:i+15})
    VRT_{i:i+15} ← Chop(( aop +int bop +int 1 ) >> 1, 16)
For each vector element i from 0 to 7, do the following.
    Signed-integer halfword element i in VRA is added to signed-integer halfword element i in VRB. The sum is incremented by 1 and then shifted right 1 bit.
    The low-order 16 bits of the result are placed into halfword element i of VRT.
Special Registers Altered: None

Vector Average Signed Word VX-form
vavgsw VRT,VRA,VRB
    4    VRT  VRA  VRB  1410
    0    6    11   16   21   31
do i=0 to 127 by 32
    aop ← EXTS((VRA)_{i:i+31})
    bop ← EXTS((VRB)_{i:i+31})
    VRT_{i:i+31} ← Chop(( aop +int bop +int 1 ) >> 1, 32)
For each vector element i from 0 to 3, do the following.
    Signed-integer word element i in VRA is added to signed-integer word element i in VRB. The sum is incremented by 1 and then shifted right 1 bit.
    The low-order 32 bits of the result are placed into word element i of VRT.
Special Registers Altered: None

Vector Average Unsigned Byte VX-form
vavgub VRT,VRA,VRB
    4    VRT  VRA  VRB  1026
    0    6    11   16   21   31
do i=0 to 127 by 8
    aop ← EXTZ((VRA)_{i:i+7})
    bop ← EXTZ((VRB)_{i:i+7})
    VRT_{i:i+7} ← Chop((aop +int bop +int 1) >>ui 1, 8)
For each vector element i from 0 to 15, do the following.
    Unsigned-integer byte element i in VRA is added to unsigned-integer byte element i in VRB. The sum is incremented by 1 and then shifted right 1 bit.
    The low-order 8 bits of the result are placed into byte element i of VRT.
Special Registers Altered: None

Vector Average Unsigned Halfword VX-form
vavguh VRT,VRA,VRB
    4    VRT  VRA  VRB  1090
    0    6    11   16   21   31
do i=0 to 127 by 16
    aop ← EXTZ((VRA)_{i:i+15})
    bop ← EXTZ((VRB)_{i:i+15})
    VRT_{i:i+15} ← Chop((aop +int bop +int 1) >>ui 1, 16)
For each vector element i from 0 to 7, do the following.
    Unsigned-integer halfword element i in VRA is added to unsigned-integer halfword element i in VRB. The sum is incremented by 1 and then shifted right 1 bit.
    The low-order 16 bits of the result are placed into halfword element i of VRT.
Special Registers Altered: None

Vector Average Unsigned Word VX-form
vavguw VRT,VRA,VRB
    4    VRT  VRA  VRB  1154
    0    6    11   16   21   31
do i=0 to 127 by 32
    aop ← EXTZ((VRA)_{i:i+31})
    bop ← EXTZ((VRB)_{i:i+31})
    VRT_{i:i+31} ← Chop((aop +int bop +int 1) >>ui 1, 32)
For each vector element i from 0 to 3, do the following.
    Unsigned-integer word element i in VRA is added to unsigned-integer word element i in VRB. The sum is incremented by 1 and then shifted right 1 bit.
    The low-order 32 bits of the result are placed into word element i of VRT.
Special Registers Altered: None

6.9.1.7 Vector Integer Maximum and Minimum Instructions

Vector Maximum Signed Byte VX-form
vmaxsb VRT,VRA,VRB
    4    VRT  VRA  VRB  258
    0    6    11   16   21   31
do i=0 to 127 by 8
    aop ← EXTS((VRA)_{i:i+7})
    bop ← EXTS((VRB)_{i:i+7})
    VRT_{i:i+7} ← ( aop >si bop ) ? (VRA)_{i:i+7} : (VRB)_{i:i+7}
For each vector element i from 0 to 15, do the following.
    Signed-integer byte element i in VRA is compared to signed-integer byte element i in VRB. The larger of the two values is placed into byte element i of VRT.
Special Registers Altered: None

Vector Maximum Signed Halfword VX-form
vmaxsh VRT,VRA,VRB
    4    VRT  VRA  VRB  322
    0    6    11   16   21   31
do i=0 to 127 by 16
    aop ← EXTS((VRA)_{i:i+15})
    bop ← EXTS((VRB)_{i:i+15})
    VRT_{i:i+15} ← ( aop >si bop ) ? (VRA)_{i:i+15} : (VRB)_{i:i+15}
For each vector element i from 0 to 7, do the following.
    Signed-integer halfword element i in VRA is compared to signed-integer halfword element i in VRB. The larger of the two values is placed into halfword element i of VRT.
Special Registers Altered: None

Vector Maximum Signed Word VX-form
vmaxsw VRT,VRA,VRB
    4    VRT  VRA  VRB  386
    0    6    11   16   21   31
do i=0 to 127 by 32
    aop ← EXTS((VRA)_{i:i+31})
    bop ← EXTS((VRB)_{i:i+31})
    VRT_{i:i+31} ← ( aop >si bop ) ? (VRA)_{i:i+31} : (VRB)_{i:i+31}
For each vector element i from 0 to 3, do the following.
    Signed-integer word element i in VRA is compared to signed-integer word element i in VRB.
The larger of the two values is placed into word element i of VRT.

Special Registers Altered: None

Vector Maximum Unsigned Byte VX-form
vmaxub VRT,VRA,VRB

4 | VRT | VRA | VRB | 2
0 6 11 16 21 31

    do i=0 to 127 by 8
      aop ← EXTZ((VRA)_{i:i+7})
      bop ← EXTZ((VRB)_{i:i+7})
      VRT_{i:i+7} ← (aop >ui bop) ? (VRA)_{i:i+7} : (VRB)_{i:i+7}

For each vector element i from 0 to 15, do the following.
Unsigned-integer byte element i in VRA is compared to unsigned-integer byte element i in VRB. The larger of the two values is placed into byte element i of VRT.

Special Registers Altered: None

Vector Maximum Unsigned Halfword VX-form
vmaxuh VRT,VRA,VRB

4 | VRT | VRA | VRB | 66
0 6 11 16 21 31

    do i=0 to 127 by 16
      aop ← EXTZ((VRA)_{i:i+15})
      bop ← EXTZ((VRB)_{i:i+15})
      VRT_{i:i+15} ← (aop >ui bop) ? (VRA)_{i:i+15} : (VRB)_{i:i+15}

For each vector element i from 0 to 7, do the following.
Unsigned-integer halfword element i in VRA is compared to unsigned-integer halfword element i in VRB. The larger of the two values is placed into halfword element i of VRT.

Special Registers Altered: None

Vector Maximum Unsigned Word VX-form
vmaxuw VRT,VRA,VRB

4 | VRT | VRA | VRB | 130
0 6 11 16 21 31

    do i=0 to 127 by 32
      aop ← EXTZ((VRA)_{i:i+31})
      bop ← EXTZ((VRB)_{i:i+31})
      VRT_{i:i+31} ← (aop >ui bop) ? (VRA)_{i:i+31} : (VRB)_{i:i+31}

For each vector element i from 0 to 3, do the following.
Unsigned-integer word element i in VRA is compared to unsigned-integer word element i in VRB. The larger of the two values is placed into word element i of VRT.

Special Registers Altered: None

Vector Minimum Signed Byte VX-form
vminsb VRT,VRA,VRB

4 | VRT | VRA | VRB | 770
0 6 11 16 21 31

    do i=0 to 127 by 8
      aop ← EXTS((VRA)_{i:i+7})
      bop ← EXTS((VRB)_{i:i+7})
      VRT_{i:i+7} ← (aop [...]

Vector Minimum Signed Halfword VX-form
vminsh VRT,VRA,VRB

4 | VRT | VRA | VRB | 834
0 6 11 16 21 31

    do i=0 to 127 by 16
      aop ← EXTS((VRA)_{i:i+15})
      [...]

[...]

    if Rc=1 then do
      t ← (VRT=¹²⁸1)
      f ← (VRT=¹²⁸0)
      CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 3, do the following.
Unsigned-integer word element i in VRA is compared to unsigned-integer word element i in VRB. Word element i in VRT is set to all 1s if unsigned-integer word element i in VRA is equal to unsigned-integer word element i in VRB, and is set to all 0s otherwise.

Special Registers Altered: CR6 (if Rc=1)

    [...] si (VRB)_{i:i+7}) ? ⁸1 : ⁸0
    if Rc=1 then do
      t ← (VRT=¹²⁸1)
      f ← (VRT=¹²⁸0)
      CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 15, do the following.
Signed-integer byte element i in VRA is compared to signed-integer byte element i in VRB. Byte element i in VRT is set to all 1s if signed-integer byte element i in VRA is greater than signed-integer byte element i in VRB, and is set to all 0s otherwise.

Special Registers Altered: CR6 (if Rc=1)

Vector Compare Greater Than Signed Halfword VC-form
vcmpgtsh VRT,VRA,VRB (Rc=0)
vcmpgtsh. VRT,VRA,VRB (Rc=1)

4 | VRT | VRA | VRB | Rc | 838
0 6 11 16 21 22 31

    do i=0 to 127 by 16
      VRT_{i:i+15} ← ((VRA)_{i:i+15} >si (VRB)_{i:i+15}) ? ¹⁶1 : ¹⁶0
    if Rc=1 then do
      t ← (VRT=¹²⁸1)
      f ← (VRT=¹²⁸0)
      CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 7, do the following.
Signed-integer halfword element i in VRA is compared to signed-integer halfword element i in VRB. Halfword element i in VRT is set to all 1s if signed-integer halfword element i in VRA is greater than signed-integer halfword element i in VRB, and is set to all 0s otherwise.

Special Registers Altered: CR6 (if Rc=1)

Vector Compare Greater Than Signed Word VC-form
vcmpgtsw VRT,VRA,VRB (Rc=0)
vcmpgtsw. VRT,VRA,VRB (Rc=1)

4 | VRT | VRA | VRB | Rc | 902
0 6 11 16 21 22 31

    do i=0 to 127 by 32
      VRT_{i:i+31} ← ((VRA)_{i:i+31} >si (VRB)_{i:i+31}) ? ³²1 : ³²0
    if Rc=1 then do
      t ← (VRT=¹²⁸1)
      f ← (VRT=¹²⁸0)
      CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 3, do the following.
Signed-integer word element i in VRA is compared to signed-integer word element i in VRB. Word element i in VRT is set to all 1s if signed-integer word element i in VRA is greater than signed-integer word element i in VRB, and is set to all 0s otherwise.

Special Registers Altered: CR6 (if Rc=1)

Vector Compare Greater Than Unsigned Byte VC-form
vcmpgtub VRT,VRA,VRB (Rc=0)
vcmpgtub. VRT,VRA,VRB (Rc=1)

4 | VRT | VRA | VRB | Rc | 518
0 6 11 16 21 22 31

    do i=0 to 127 by 8
      VRT_{i:i+7} ← ((VRA)_{i:i+7} >ui (VRB)_{i:i+7}) ? ⁸1 : ⁸0
    if Rc=1 then do
      t ← (VRT=¹²⁸1)
      f ← (VRT=¹²⁸0)
      CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 15, do the following.
Unsigned-integer byte element i in VRA is compared to unsigned-integer byte element i in VRB. Byte element i in VRT is set to all 1s if unsigned-integer byte element i in VRA is greater than unsigned-integer byte element i in VRB, and is set to all 0s otherwise.

Special Registers Altered: CR6 (if Rc=1)

Vector Compare Greater Than Unsigned Halfword VC-form
vcmpgtuh VRT,VRA,VRB (Rc=0)
vcmpgtuh. VRT,VRA,VRB (Rc=1)

4 | VRT | VRA | VRB | Rc | 582
0 6 11 16 21 22 31

    do i=0 to 127 by 16
      VRT_{i:i+15} ← ((VRA)_{i:i+15} >ui (VRB)_{i:i+15}) ? ¹⁶1 : ¹⁶0
    if Rc=1 then do
      t ← (VRT=¹²⁸1)
      f ← (VRT=¹²⁸0)
      CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 7, do the following.
Unsigned-integer halfword element i in VRA is compared to unsigned-integer halfword element i in VRB. Halfword element i in VRT is set to all 1s if unsigned-integer halfword element i in VRA is greater than unsigned-integer halfword element i in VRB, and is set to all 0s otherwise.

Special Registers Altered: CR6 (if Rc=1)

Vector Compare Greater Than Unsigned Word VC-form
vcmpgtuw VRT,VRA,VRB (Rc=0)
vcmpgtuw.
VRT,VRA,VRB (Rc=1)

4 | VRT | VRA | VRB | Rc | 646
0 6 11 16 21 22 31

    do i=0 to 127 by 32
      VRT_{i:i+31} ← ((VRA)_{i:i+31} >ui (VRB)_{i:i+31}) ? ³²1 : ³²0
    if Rc=1 then do
      t ← (VRT=¹²⁸1)
      f ← (VRT=¹²⁸0)
      CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 3, do the following.
Unsigned-integer word element i in VRA is compared to unsigned-integer word element i in VRB. Word element i in VRT is set to all 1s if unsigned-integer word element i in VRA is greater than unsigned-integer word element i in VRB, and is set to all 0s otherwise.

Special Registers Altered: CR6 (if Rc=1)

6.9.3 Vector Logical Instructions

Extended mnemonics for vector logical operations

Extended mnemonics are provided that use the Vector OR and Vector NOR instructions to copy the contents of one Vector Register to another, with and without complementing. These are shown as examples with the two instructions.

Vector Move Register
Several vector instructions can be coded in a way such that they simply copy the contents of one Vector Register to another. An extended mnemonic is provided to convey the idea that no computation is being performed but merely data movement (from one register to another).
The following instruction copies the contents of register Vy to register Vx.

    vmr Vx,Vy    (equivalent to: vor Vx,Vy,Vy)

Vector Complement Register
The Vector NOR instruction can be coded in a way such that it complements the contents of one Vector Register and places the result into another Vector Register. An extended mnemonic is provided that allows this operation to be coded easily.
The following instruction complements the contents of register Vy and places the result into register Vx.

    vnot Vx,Vy    (equivalent to: vnor Vx,Vy,Vy)

Vector Logical AND VX-form
vand VRT,VRA,VRB

4 | VRT | VRA | VRB | 1028
0 6 11 16 21 31

    VRT ← (VRA) & (VRB)

The contents of VRA are ANDed with the contents of VRB and the result is placed into VRT.

Special Registers Altered: None

Vector Logical AND with Complement VX-form
vandc VRT,VRA,VRB

4 | VRT | VRA | VRB | 1092
0 6 11 16 21 31

    VRT ← (VRA) & ¬(VRB)

The contents of VRA are ANDed with the complement of the contents of VRB and the result is placed into VRT.

Special Registers Altered: None

Vector Logical NOR VX-form
vnor VRT,VRA,VRB

4 | VRT | VRA | VRB | 1284
0 6 11 16 21 31

    VRT ← ¬( (VRA) | (VRB) )

The contents of VRA are ORed with the contents of VRB and the complemented result is placed into VRT.

Special Registers Altered: None

Vector Logical OR VX-form
vor VRT,VRA,VRB

4 | VRT | VRA | VRB | 1156
0 6 11 16 21 31

    VRT ← (VRA) | (VRB)

The contents of VRA are ORed with the contents of VRB and the result is placed into VRT.

Special Registers Altered: None

Vector Logical XOR VX-form
vxor VRT,VRA,VRB

4 | VRT | VRA | VRB | 1220
0 6 11 16 21 31

    VRT ← (VRA) ⊕ (VRB)

The contents of VRA are XORed with the contents of VRB and the result is placed into VRT.

Special Registers Altered: None

6.9.4 Vector Integer Rotate and Shift Instructions

Vector Rotate Left Byte VX-form
vrlb VRT,VRA,VRB

4 | VRT | VRA | VRB | 4
0 6 11 16 21 31

    do i=0 to 127 by 8
      sh ← (VRB)_{i+5:i+7}
      VRT_{i:i+7} ← (VRA)_{i:i+7} <<< sh

For each vector element i from 0 to 15, do the following.
Byte element i in VRA is rotated left by the number of bits specified in the low-order 3 bits of the corresponding byte element i in VRB.
The result is placed into byte element i in VRT.

Special Registers Altered: None

Vector Rotate Left Halfword VX-form
vrlh VRT,VRA,VRB

4 | VRT | VRA | VRB | 68
0 6 11 16 21 31

    do i=0 to 127 by 16
      sh ← (VRB)_{i+12:i+15}
      VRT_{i:i+15} ← (VRA)_{i:i+15} <<< sh

For each vector element i from 0 to 7, do the following.
Halfword element i in VRA is rotated left by the number of bits specified in the low-order 4 bits of the corresponding halfword element i in VRB.
The result is placed into halfword element i in VRT.

Special Registers Altered: None

Vector Rotate Left Word VX-form
vrlw VRT,VRA,VRB

4 | VRT | VRA | VRB | 132
0 6 11 16 21 31

    do i=0 to 127 by 32
      sh ← (VRB)_{i+27:i+31}
      VRT_{i:i+31} ← (VRA)_{i:i+31} <<< sh

For each vector element i from 0 to 3, do the following.
Word element i in VRA is rotated left by the number of bits specified in the low-order 5 bits of the corresponding word element i in VRB.
The result is placed into word element i in VRT.

Special Registers Altered: None

Vector Shift Left Byte VX-form
vslb VRT,VRA,VRB

4 | VRT | VRA | VRB | 260
0 6 11 16 21 31

    do i=0 to 127 by 8
      sh ← (VRB)_{i+5:i+7}
      VRT_{i:i+7} ← (VRA)_{i:i+7} << sh

For each vector element i from 0 to 15, do the following.
Byte element i in VRA is shifted left by the number of bits specified in the low-order 3 bits of byte element i in VRB.
- Bits shifted out of bit 0 are lost.
- Zeros are supplied to the vacated bits on the right.
The result is placed into byte element i of VRT.

Special Registers Altered: None

Vector Shift Left Halfword VX-form
vslh VRT,VRA,VRB

4 | VRT | VRA | VRB | 324
0 6 11 16 21 31

    do i=0 to 127 by 16
      sh ← (VRB)_{i+12:i+15}
      VRT_{i:i+15} ← (VRA)_{i:i+15} << sh

For each vector element i from 0 to 7, do the following.
Halfword element i in VRA is shifted left by the number of bits specified in the low-order 4 bits of halfword element i in VRB.
- Bits shifted out of bit 0 are lost.
- Zeros are supplied to the vacated bits on the right.
The result is placed into halfword element i of VRT.

Special Registers Altered: None

Vector Shift Left Word VX-form
vslw VRT,VRA,VRB

4 | VRT | VRA | VRB | 388
0 6 11 16 21 31

    do i=0 to 127 by 32
      sh ← (VRB)_{i+27:i+31}
      VRT_{i:i+31} ← (VRA)_{i:i+31} << sh

For each vector element i from 0 to 3, do the following.
Word element i in VRA is shifted left by the number of bits specified in the low-order 5 bits of word element i in VRB.
- Bits shifted out of bit 0 are lost.
- Zeros are supplied to the vacated bits on the right.
The result is placed into word element i of VRT.

Special Registers Altered: None

Vector Shift Right Byte VX-form
vsrb VRT,VRA,VRB

4 | VRT | VRA | VRB | 516
0 6 11 16 21 31

    do i=0 to 127 by 8
      sh ← (VRB)_{i+5:i+7}
      VRT_{i:i+7} ← (VRA)_{i:i+7} >>ui sh

For each vector element i from 0 to 15, do the following.
Byte element i in VRA is shifted right by the number of bits specified in the low-order 3 bits of byte element i in VRB. Bits shifted out of the least-significant bit are lost. Zeros are supplied to the vacated bits on the left. The result is placed into byte element i of VRT.

Special Registers Altered: None

Vector Shift Right Halfword VX-form
vsrh VRT,VRA,VRB

4 | VRT | VRA | VRB | 580
0 6 11 16 21 31

    do i=0 to 127 by 16
      sh ← (VRB)_{i+12:i+15}
      VRT_{i:i+15} ← (VRA)_{i:i+15} >>ui sh

For each vector element i from 0 to 7, do the following.
Halfword element i in VRA is shifted right by the number of bits specified in the low-order 4 bits of halfword element i in VRB. Bits shifted out of the least-significant bit are lost. Zeros are supplied to the vacated bits on the left. The result is placed into halfword element i of VRT.

Special Registers Altered: None

Vector Shift Right Word VX-form
vsrw VRT,VRA,VRB

4 | VRT | VRA | VRB | 644
0 6 11 16 21 31

    do i=0 to 127 by 32
      sh ← (VRB)_{i+27:i+31}
      VRT_{i:i+31} ← (VRA)_{i:i+31} >>ui sh

For each vector element i from 0 to 3, do the following.
Word element i in VRA is shifted right by the number of bits specified in the low-order 5 bits of word element i in VRB. Bits shifted out of the least-significant bit are lost. Zeros are supplied to the vacated bits on the left. The result is placed into word element i of VRT.

Special Registers Altered: None
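The rotate, shift-left, and logical shift-right definitions above share one pattern: each element of VRA is processed independently, and only the low-order 3, 4, or 5 bits of the matching VRB element are used as the count. The following Python sketch models that pattern on 128-bit integers. It is an illustration only, not part of the architecture; the function names and the op strings 'rl', 'sl', and 'sr' are hypothetical, and element 0 is taken to be the most significant element, matching the bit numbering used in the RTL.

```python
def split_elements(value, width):
    """Split a 128-bit integer into elements of `width` bits, element 0 first."""
    mask = (1 << width) - 1
    return [(value >> (128 - width * (k + 1))) & mask for k in range(128 // width)]


def join_elements(elements, width):
    """Inverse of split_elements: pack the elements back into one integer."""
    value = 0
    for e in elements:
        value = (value << width) | (e & ((1 << width) - 1))
    return value


def vector_shift(vra, vrb, width, op):
    """Apply op ('rl', 'sl' or 'sr') to each element of vra, counts taken from vrb."""
    mask = (1 << width) - 1
    result = []
    for a, b in zip(split_elements(vra, width), split_elements(vrb, width)):
        sh = b & (width - 1)  # only the low-order 3, 4 or 5 bits are used
        if op == "rl":    # rotate left: bits leaving the top re-enter at the bottom
            r = ((a << sh) | (a >> (width - sh))) & mask
        elif op == "sl":  # shift left: zeros enter on the right
            r = (a << sh) & mask
        elif op == "sr":  # logical shift right: zeros enter on the left
            r = a >> sh
        else:
            raise ValueError(f"unknown op {op!r}")
        result.append(r)
    return join_elements(result, width)
```

With width 8, 16, or 32, the 'sl' case corresponds to the behavior described for vslb, vslh, and vslw; 'sr' and 'rl' correspond likewise to the vsr and vrl families. A count of 9 in a byte element therefore acts as a count of 1.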
Vector Shift Right Algebraic Byte VX-form
vsrab VRT,VRA,VRB

4 | VRT | VRA | VRB | 772
0 6 11 16 21 31

    do i=0 to 127 by 8
      sh ← (VRB)_{i+5:i+7}
      VRT_{i:i+7} ← (VRA)_{i:i+7} >>si sh

For each vector element i from 0 to 15, do the following.
Byte element i in VRA is shifted right by the number of bits specified in the low-order 3 bits of the corresponding byte element i in VRB. Bits shifted out of bit 7 of the byte element are lost. Bit 0 of the byte element is replicated to fill the vacated bits on the left. The result is placed into byte element i of VRT.

Special Registers Altered: None

Vector Shift Right Algebraic Halfword VX-form
vsrah VRT,VRA,VRB

4 | VRT | VRA | VRB | 836
0 6 11 16 21 31

    do i=0 to 127 by 16
      sh ← (VRB)_{i+12:i+15}
      VRT_{i:i+15} ← (VRA)_{i:i+15} >>si sh

For each vector element i from 0 to 7, do the following.
Halfword element i in VRA is shifted right by the number of bits specified in the low-order 4 bits of the corresponding halfword element i in VRB. Bits shifted out of bit 15 of the halfword are lost. Bit 0 of the halfword is replicated to fill the vacated bits on the left. The result is placed into halfword element i of VRT.

Special Registers Altered: None

Vector Shift Right Algebraic Word VX-form
vsraw VRT,VRA,VRB

4 | VRT | VRA | VRB | 900
0 6 11 16 21 31

    do i=0 to 127 by 32
      sh ← (VRB)_{i+27:i+31}
      VRT_{i:i+31} ← (VRA)_{i:i+31} >>si sh

For each vector element i from 0 to 3, do the following.
Word element i in VRA is shifted right by the number of bits specified in the low-order 5 bits of the corresponding word element i in VRB. Bits shifted out of bit 31 of the word are lost. Bit 0 of the word is replicated to fill the vacated bits on the left. The result is placed into word element i of VRT.
Special Registers Altered: None

6.10 Vector Floating-Point Instruction Set

6.10.1 Vector Floating-Point Arithmetic Instructions

Vector Add Single-Precision VX-form
vaddfp VRT,VRA,VRB

4 | VRT | VRA | VRB | 10
0 6 11 16 21 31

    do i=0 to 127 by 32
      VRT_{i:i+31} ← RoundToNearSP((VRA)_{i:i+31} +fp (VRB)_{i:i+31})

For each vector element i from 0 to 3, do the following.
Single-precision floating-point element i in VRA is added to single-precision floating-point element i in VRB. The intermediate result is rounded to the nearest single-precision floating-point number and placed into word element i of VRT.

Special Registers Altered: None

Vector Subtract Single-Precision VX-form
vsubfp VRT,VRA,VRB

4 | VRT | VRA | VRB | 74
0 6 11 16 21 31

    do i=0 to 127 by 32
      VRT_{i:i+31} ← RoundToNearSP((VRA)_{i:i+31} -fp (VRB)_{i:i+31})

For each vector element i from 0 to 3, do the following.
Single-precision floating-point element i in VRB is subtracted from single-precision floating-point element i in VRA. The intermediate result is rounded to the nearest single-precision floating-point number and placed into word element i of VRT.

Special Registers Altered: None

Vector Multiply-Add Single-Precision VA-form
vmaddfp VRT,VRA,VRC,VRB

4 | VRT | VRA | VRB | VRC | 46
0 6 11 16 21 26 31

    do i=0 to 127 by 32
      prod ← (VRA)_{i:i+31} ×fp (VRC)_{i:i+31}
      VRT_{i:i+31} ← RoundToNearSP( prod +fp (VRB)_{i:i+31} )

For each vector element i from 0 to 3, do the following.
Single-precision floating-point element i in VRA is multiplied by single-precision floating-point element i in VRC. Single-precision floating-point element i in VRB is added to the infinitely-precise product. The intermediate result is rounded to the nearest single-precision floating-point number and placed into word element i of VRT.

Special Registers Altered: None

Programming Note
To use a multiply-add to perform an IEEE or Java compliant multiply, the addend must be -0.0. This is necessary to ensure that the sign of a zero result will be correct when the product is -0.0 (+0.0 + -0.0 ⇒ +0.0, and -0.0 + -0.0 ⇒ -0.0). When the sign of a resulting 0.0 is not important, then +0.0 can be used as an addend which may, in some cases, avoid the need for a second register to hold a -0.0 in addition to the integer 0/floating-point +0.0 that may already be available.

Vector Negative Multiply-Subtract Single-Precision VA-form
vnmsubfp VRT,VRA,VRC,VRB

4 | VRT | VRA | VRB | VRC | 47
0 6 11 16 21 26 31

    do i=0 to 127 by 32
      prod_{0:inf} ← (VRA)_{i:i+31} ×fp (VRC)_{i:i+31}
      VRT_{i:i+31} ← -RoundToNearSP(prod_{0:inf} -fp (VRB)_{i:i+31})

For each vector element i from 0 to 3, do the following.
Single-precision floating-point element i in VRA is multiplied by single-precision floating-point element i in VRC. Single-precision floating-point element i in VRB is subtracted from the infinitely-precise product. The intermediate result is rounded to the nearest single-precision floating-point number, then negated and placed into word element i of VRT.

Special Registers Altered: None

6.10.2 Vector Floating-Point Maximum and Minimum Instructions

Vector Maximum Single-Precision VX-form
vmaxfp VRT,VRA,VRB

4 | VRT | VRA | VRB | 1034
0 6 11 16 21 31

    do i=0 to 127 by 32
      VRT_{i:i+31} ← ( (VRA)_{i:i+31} >fp (VRB)_{i:i+31} ) [...]

Vector Minimum Single-Precision VX-form
vminfp VRT,VRA,VRB

4 | VRT | VRA | VRB | 1098
0 6 11 16 21 31

    do i=0 to 127 by 32
      VRT_{i:i+31} ← ( (VRA)_{i:i+31} [...]

[...]

    if Rc=1 then do
      t ← ( VRT=¹²⁸1 )
      f ← ( VRT=¹²⁸0 )
      CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 3, do the following.
Single-precision floating-point element i in VRA is compared to single-precision floating-point element i in VRB. Word element i in VRT is set to all 1s if single-precision floating-point element i in VRA is greater than or equal to single-precision floating-point element i in VRB, and is set to all 0s otherwise.
If the source element i in VRA or the source element i in VRB is a NaN, VRT is set to all 0s, indicating "not greater than or equal to". If the source element i in VRA and the source element i in VRB are both infinity with the same sign, VRT is set to all 1s, indicating "greater than or equal to".

Special Registers Altered: CR6 (if Rc=1)

    [...] fp (VRB)_{i:i+31}) ? ³²1 : ³²0
    if Rc=1 then do
      t ← ( VRT=¹²⁸1 )
      f ← ( VRT=¹²⁸0 )
      CR6 ← t || 0b0 || f || 0b0

For each vector element i from 0 to 3, do the following.
Single-precision floating-point element i in VRA is compared to single-precision floating-point element i in VRB. Word element i in VRT is set to all 1s if single-precision floating-point element i in VRA is greater than single-precision floating-point element i in VRB, and is set to all 0s otherwise.
If the source element i in VRA or the source element i in VRB is a NaN, VRT is set to all 0s, indicating "not greater than". If the source element i in VRA and the source element i in VRB are both infinity with the same sign, VRT is set to all 0s, indicating "not greater than".

Special Registers Altered: CR6 (if Rc=1)

6.10.5 Vector Floating-Point Estimate Instructions

Vector 2 Raised to the Exponent Estimate Floating-Point VX-form
vexptefp VRT,VRB

4 | VRT | /// | VRB | 394
0 6 11 16 21 31

    do i=0 to 127 by 32
      VRT_{i:i+31} ← Power2EstimateSP( (VRB)_{i:i+31} )

For each vector element i from 0 to 3, do the following.
The single-precision floating-point estimate of 2 raised to the power of single-precision floating-point element i in VRB is placed into word element i of VRT.

Let x be any single-precision floating-point input value. Unless x < -146 or the single-precision floating-point result of computing 2 raised to the power x would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 16. The most significant 12 bits of the estimate's significand are monotonic. An integral input value returns an integral value when the result is representable.

The result for various special cases of the source value is given below.

    Value        Result
    -Infinity    +0
    -0           +1
    +0           +1
    +Infinity    +Infinity
    NaN          QNaN

Special Registers Altered: None

Vector Log Base 2 Estimate Floating-Point VX-form
vlogefp VRT,VRB

4 | VRT | /// | VRB | 458
0 6 11 16 21 31

    do i=0 to 127 by 32
      VRT_{i:i+31} ← LogBase2EstimateSP((VRB)_{i:i+31})

For each vector element i from 0 to 3, do the following.
The single-precision floating-point estimate of the base 2 logarithm of single-precision floating-point element i in VRB is placed into the corresponding word element of VRT.

Let x be any single-precision floating-point input value. Unless | x-1 | is less than or equal to 0.125 or the single-precision floating-point result of computing the base 2 logarithm of x would be an infinity or a QNaN, the estimate has an absolute error in precision (absolute value of the difference between the estimate and the infinitely precise value) no greater than 2⁻⁵. Under the same conditions, the estimate has a relative error in precision no greater than one part in 8.

The most significant 12 bits of the estimate's significand are monotonic. The estimate is exact if x=2^y, where y is an integer between -149 and +127 inclusive. Otherwise the value placed into the element of register VRT may vary between implementations, and between different executions on the same implementation.

The result for various special cases of the source value is given below.

    Value        Result
    -Infinity    QNaN
    <0           QNaN
    -0           -Infinity
    +0           -Infinity
    +Infinity    +Infinity
    NaN          QNaN

Special Registers Altered: None
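The special-case tables for the two estimate instructions above can be written out as a small reference model. The Python sketch below is an assumption-laden illustration: the function names are hypothetical, it works in Python's double precision rather than single precision, and for ordinary inputs it returns the mathematically computed value instead of an estimate with the stated error bounds. Only the handling of the tabulated special values is taken from the text.

```python
import math


def exp2_estimate_model(x):
    """Special-case behavior described for the 2-raised-to-the-exponent estimate."""
    if math.isnan(x):
        return math.nan          # NaN       -> QNaN
    if x == -math.inf:
        return 0.0               # -Infinity -> +0
    if x == math.inf:
        return math.inf          # +Infinity -> +Infinity
    try:
        return 2.0 ** x          # -0 and +0 both give +1
    except OverflowError:
        return math.inf          # result too large for a Python float


def log2_estimate_model(x):
    """Special-case behavior described for the log base 2 estimate."""
    if math.isnan(x):
        return math.nan          # NaN       -> QNaN
    if x == 0.0:
        return -math.inf         # -0 and +0 -> -Infinity
    if x < 0.0:
        return math.nan          # <0 and -Infinity -> QNaN
    if x == math.inf:
        return math.inf          # +Infinity -> +Infinity
    return math.log2(x)
```

The model makes the asymmetry between the tables easy to see: a negative zero input yields +1 for the exponent estimate but -Infinity for the logarithm estimate, and any negative input to the logarithm estimate yields a QNaN.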
Vector Reciprocal Estimate Single-Precision VX-form
vrefp VRT,VRB

4 | VRT | /// | VRB | 266
0 6 11 16 21 31

    do i=0 to 127 by 32
      VRT_{i:i+31} ← ReciprocalEstimateSP( (VRB)_{i:i+31} )

For each vector element i from 0 to 3, do the following.
The single-precision floating-point estimate of the reciprocal of single-precision floating-point element i in VRB is placed into word element i of VRT.

Unless the single-precision floating-point result of computing the reciprocal of a value would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 4096.

Note that results may vary between implementations, and between different executions on the same implementation.

The result for various special cases of the source value is given below.

    Value        Result
    -Infinity    -0
    -0           -Infinity
    +0           +Infinity
    +Infinity    +0
    NaN          QNaN

Special Registers Altered: None

Vector Reciprocal Square Root Estimate Single-Precision VX-form
vrsqrtefp VRT,VRB

4 | VRT | /// | VRB | 330
0 6 11 16 21 31

    do i=0 to 127 by 32
      VRT_{i:i+31} ← ReciprocalSquareRootEstimateSP( (VRB)_{i:i+31} )

For each vector element i from 0 to 3, do the following.
The single-precision floating-point estimate of the reciprocal of the square root of single-precision floating-point element i in VRB is placed into word element i of VRT.

Let x be any single-precision floating-point value. Unless the single-precision floating-point result of computing the reciprocal of the square root of x would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 4096.

Note that results may vary between implementations, and between different executions on the same implementation.

The result for various special cases of the source value is given below.

    Value        Result
    -Infinity    QNaN
    <0           QNaN
    -0           -Infinity
    +0           +Infinity
    +Infinity    +0
    NaN          QNaN

Special Registers Altered: None

6.11 Vector Status and Control Register Instructions

Move To Vector Status and Control Register VX-form
mtvscr VRB

4 | /// | VRB | 1604
0 6 16 21 31

    VSCR ← (VRB)_{96:127}

The contents of word element 3 of VRB are placed into the VSCR.

Special Registers Altered: None

Move From Vector Status and Control Register VX-form
mfvscr VRT

4 | VRT | /// | 1540
0 6 11 21 31

    VRT ← ⁹⁶0 || (VSCR)

The contents of the VSCR are placed into word element 3 of VRT.
The remaining word elements in VRT are set to 0.

Special Registers Altered: None

Chapter 7. Signal Processing Engine (SPE) [Category: Signal Processing Engine]

7.1 Overview
7.2 Nomenclature and Conventions
7.3 Programming Model
7.3.1 General Operation
7.3.2 GPR Registers
7.3.3 Accumulator Register
7.3.4 Signal Processing Embedded Floating-Point Status and Control Register (SPEFSCR)
7.3.5 Data Formats
7.3.5.1 Integer Format
7.3.5.2 Fractional Format
7.3.6 Computational Operations
7.3.7 SPE Instructions
7.3.8 Saturation, Shift, and Bit Reverse Models
7.3.8.1 Saturation
7.3.8.2 Shift Left
7.3.8.3 Bit Reverse
7.3.9 SPE Instruction Set
7.1 Overview

The Signal Processing Engine (SPE) accelerates signal processing applications normally suited to DSP operation. This is accomplished using short vectors (two element) within 64-bit GPRs and using single instruction multiple data (SIMD) operations to perform the requisite computations. SPE also architects an Accumulator register to allow for back to back operations without loop unrolling.

7.2 Nomenclature and Conventions

Several conventions regarding nomenclature are used for SPE:
- The Signal Processing Engine category is abbreviated as SPE.
- Bits 0 to 31 of a 64-bit register are referenced as upper word, even word or high word element of the register. Bits 32:63 are referred to as lower word, odd word or low word element of the register. Each half is an element of a 64-bit GPR.
- Bits 0 to 15 and bits 32 to 47 are referenced as even halfwords. Bits 16 to 31 and bits 48 to 63 are referenced as odd halfwords.
- Mnemonics for SPE instructions generally begin with the letters `ev' (embedded vector).

The RTL conventions described below are used in addition to those described in Section 1.3. Additional RTL functions are described in Appendix C.

Notation  Meaning

×sf       Signed fractional multiplication. Result of multiplying 2 signed fractional quantities having bit length n taking the least significant 2n-1 bits of the sign extended product and concatenating a 0 to the least significant bit forming a signed fractional result of 2n bits. Two 16-bit signed fractional quantities, a and b are multiplied, as shown below:

              ea_{0:31} = EXTS(a)
              eb_{0:31} = EXTS(b)
              prod_{0:63} = ea X eb
              eprod_{0:63} = EXTS(prod_{32:63})
              result_{0:31} = eprod_{33:63} || 0b0

×gsf      Guarded signed fractional multiplication. Result of multiplying 2 signed fractional quantities having bit length 16 taking the least significant 31 bits of the sign extended product and concatenating a 0 to the least significant bit forming a guarded signed fractional result of 64 bits. Since guarded signed fractional multiplication produces a 64-bit result, fractional input quantities of -1 and -1 can produce +1 in the intermediate product. Two 16-bit fractional quantities, a and b are multiplied, as shown below:

              ea_{0:31} = EXTS(a)
              eb_{0:31} = EXTS(b)
              prod_{0:63} = ea X eb
              eprod_{0:63} = EXTS(prod_{32:63})
              result_{0:63} = eprod_{1:63} || 0b0

<<        Logical shift left. x << y shifts value x left by y bits, leaving zeros in the vacated bits.

>>        Logical shift right. x >> y shifts value x right by y bits, leaving zeros in the vacated bits.

7.3 Programming Model

7.3.1 General Operation

SPE instructions generally take elements from one source register and operate on them with the corresponding elements of a second source register (and/or the accumulator) to produce results. Results are placed in the destination register and/or the accumulator. Instructions that are vector in nature (i.e. produce results of more than one element) provide results for each element that are independent of the computation of the other elements. These instructions can also be used to perform scalar DSP operations by ignoring the results of the upper 32-bit half of the register file.

There are no record forms of SPE instructions. As a result, the meaning of bits in the CR is different than for other categories. SPE Compare instructions specify a CR field, two source registers, and the type of compare: greater than, less than, or equal. Two bits of the CR field are written with the result of the vector compare, one for each element. The remaining two bits reflect the ANDing and ORing of the vector compare results.

7.3.2 GPR Registers

The SPE requires a GPR register file with thirty-two 64-bit registers. For 32-bit implementations, instructions that normally operate on a 32-bit register file access and change only the least significant 32-bits of the GPRs leaving the most significant 32-bits unchanged. For 64-bit implementations, operation of these instructions is unchanged, i.e. those instructions continue to operate on the 64-bit registers as they would if the SPE was not implemented. Most SPE instructions view the 64-bit register as being composed of a vector of two elements, each of which is 32 bits wide (some instructions read or write 16-bit elements). The most significant 32-bits are called the upper word, high word or even word. The least significant 32-bits are called the lower word, low word or odd word.

Unless otherwise specified, SPE instructions write all 64-bits of the destination register.

GPR Upper Word | GPR Lower Word
0 32 63

Figure 105. GPR

7.3.3 Accumulator Register

A partially visible accumulator register (ACC) is provided for some SPE instructions. The accumulator is a 64-bit register that holds the results of the Multiply Accumulate (MAC) forms of SPE Fixed-Point instructions. The accumulator allows the back-to-back execution of dependent MAC instructions, something that is found in the inner loops of DSP code such as FIR and FFT filters. The accumulator is partially visible to the programmer in the sense that its results do not have to be explicitly read to use them. Instead they are always copied into a 64-bit destination GPR which is specified as part of the instruction. Based upon the type of instruction, the accumulator can hold either a single 64-bit value or a vector of two 32-bit elements.

ACC Upper Word | ACC Lower Word
0 32 63

Figure 106. Accumulator

7.3.4 Signal Processing Embedded Floating-Point Status and Control Register (SPEFSCR)

Status and control for SPE uses the SPEFSCR register. This register is also used by the SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, and SPE.Embedded Float Vector categories. Status and control bits are shared with these categories. The SPEFSCR register is implemented as special purpose register (SPR) number 512 and is read and written by the mfspr and mtspr instructions. SPE instructions affect both the high element (bits 32:33) and low element status flags (bits 48:49) of the SPEFSCR.

SPEFSCR
32 63

Figure 107. Signal Processing and Embedded Floating-Point Status and Control Register

The SPEFSCR bits are defined as shown below.

Bit    Description

32     Summary Integer Overflow High (SOVH)
       SOVH is set to 1 when an SPE instruction sets OVH. This is a sticky bit.

33     Integer Overflow High (OVH)
       OVH is set to 1 to indicate that an overflow has occurred in the upper element during execution of an SPE instruction. The bit is set to 1 if a result of an operation performed by the instruction cannot be represented in the number of bits into which the result is to be placed, and is set to 0 otherwise. The OVH bit is not altered by Modulo instructions, nor by other instructions that cannot overflow.

34     Embedded Floating-Point Guard Bit High (FGH) [Category: SP.FV]
       FGH is supplied for use by the Embedded Floating-Point Round interrupt handler. FGH is an extension of the low-order bits of the fractional result produced from an SPE.Embedded Float Vector instruction on the high word. FGH is zeroed if an overflow, underflow, or invalid input error is detected on the high element of an SPE.Embedded Float Vector instruction.
       Execution of an SPE.Embedded Float Scalar instruction leaves FGH undefined.

35     Embedded Floating-Point Inexact Bit High (FXH) [Category: SP.FV]
       FXH is supplied for use by the Embedded Floating-Point Round interrupt handler. FXH is an extension of the low-order bits of the fractional result produced from an SPE.Embedded Float Vector instruction on the high word. FXH represents the logical `or' of all the bits shifted right from the Guard bit when the fractional result is normalized. FXH is zeroed if an overflow, underflow, or invalid input error is detected on the high element of an SPE.Embedded Float Vector instruction.
       Execution of an SPE.Embedded Float Scalar instruction leaves FXH undefined.

       [...] Execution of an SPE.Embedded Float Scalar instruction leaves FDBZH undefined.

38     Embedded Floating-Point Underflow High (FUNFH) [Category: SP.FV]
       The FUNFH bit is set to 1 when the execution of an SPE.Embedded Float Vector instruction results in an underflow on the high word operation.
       Execution of an SPE.Embedded Float Scalar instruction leaves FUNFH undefined.

39     Embedded Floating-Point Overflow High (FOVFH) [Category: SP.FV]
       The FOVFH bit is set to 1 when the execution of an SPE.Embedded Float Vector instruction results in an overflow on the high word operation.
       Execution of an SPE.Embedded Float Scalar instruction leaves FOVFH undefined.

40:41  Reserved

42     Embedded Floating-Point Inexact Sticky Flag (FINXS) [Categories: SP.FV, SP.FD, SP.FS]
       The FINXS bit is set to 1 whenever the execution of an Embedded Floating-Point instruction delivers an inexact result for either the low or high element and no Embedded Floating-Point Data interrupt is taken for either element, or if an Embedded Floating-Point instruction results in overflow (FOVF=1 or FOVFH=1), but Embedded Floating-Point Overflow exceptions are disabled (FOVFE=0), or if an Embedded Floating-Point instruction results in underflow (FUNF=1 or FUNFH=1), but Embedded Floating-Point Underflow exceptions are disabled (FUNFE=0), and no Embedded Floating-Point Data interrupt occurs. This is a sticky bit.
36 Embedded Floating-Point Invalid Opera- 43 Embedded Floating-Point Invalid Opera- tion/Input Error High (FINVH) [Category: tion/Input Sticky Flag (FINVS) [Categories: SP.FV] SP.FV, SP.FD, SP.FS] The FINVH bit is set to 1 if any high word The FINVS bit is defined to be the sticky result operand of an SPE.Embedded Float Vector of any Embedded Floating-Point instruction instruction is infinity, NaN, or a denormalized that causes FINVH or FINV to be set to 1. value, or if the instruction is a divide and the That is, FINVS 1 FINVS | FINV | FINVH. This dividend and divisor are both 0, or if a conver- is a sticky bit. sion to integer or fractional value overflows. 44 Embedded Floating-Point Divide By Zero Execution of an SPE.Embedded Float Scalar Sticky Flag (FDBZS) [Categories: SP.FV, instruction leaves FINVH undefined. SP.FD, SP.FS] The FDBZS bit is set to 1 when an Embedded 37 Embedded Floating-Point Divide By Zero Floating-Point Divide instruction sets FDBZH High (FDBZH) [Category: SP.FV] or FDBZ to 1. That is, FDBZS 1 FDBZS | The FDBZH bit is set to 1 when an FDBZ | FDBZH. This is a sticky bit. SPE.Embedded Vector Floating-Point Divide instruction is executed with a divisor of 0 in the 45 Embedded Floating-Point Underflow high word operand, and the dividend is a finite Sticky Flag (FUNFS) [Categories: SP.FV, nonzero number. SP.FD, SP.FS] The FUNFS bit is defined to be the sticky Chapter 7. Signal Processing Engine (SPE) 263 Version 2.05 result of any Embedded Floating-Point or if the operation is a divide and the dividend instruction that causes FUNFH or FUNF to be and divisor are both 0, or if a conversion to set to 1. That is, FUNFS 1 FUNFS | FUNF | integer or fractional value overflows. FUNFH. This is a sticky bit. 
53 Embedded Floating-Point Divide By Zero 46 Embedded Floating-Point Overflow Sticky (Low/scalar) (FDBZ) [Categories: SP.FV, Flag (FOVFS) [Categories: SP.FV, SP.FD, SP.FD, SP.FS] SP.FS] The FDBZ bit is set to 1 when an Embedded The FOVFS bit is defined to be the sticky Floating-Point Divide instruction is executed result of any Embedded Floating-Point with a divisor of 0 in the low word operand, instruction that causes FOVH or FOVF to be and the dividend is a finite nonzero number. set to 1. That is, FOVFS 1 FOVFS | FOVF | 54 Embedded Floating-Point Underflow (Low/ FOVFH. This is a sticky bit. scalar) (FUNF) [Categories: SP.FV, SP.FD, 47 Reserved SP.FS] The FUNF bit is set to 1 when the execution of 48 Summary Integer Overflow (SOV) an Embedded Floating-Point instruction SOV is set to 1 when an SPE instruction sets results in an underflow on the low word opera- OV to 1. This is a sticky bit. tion. 49 Integer Overflow (OV) 55 Embedded Floating-Point Overflow (Low/ OV is set to 1 to indicate that an overflow has scalar) (FOVF) [Categories: SP.FV, SP.FD, occurred in the lower element during execu- SP.FS] tion of an SPE instruction. The bit is set to 1 if The FOVF bit is set to 1 when the execution of a result of an operation performed by the an Embedded Floating-Point instruction instruction cannot be represented in the num- results in an overflow on the low word opera- ber of bits into which the result is to be placed, tion. and is set to 0 otherwise. The OV bit is not altered by Modulo instructions, nor by other 56 Reserved instructions that cannot overflow. 57 Embedded Floating-Point Round (Inexact) 50 Embedded Floating-Point Guard Bit (Low/ Exception Enable (FINXE) [Categories: scalar) (FG) [Categories: SP.FV, SP.FD, SP.FV, SP.FD, SP.FS] SP.FS] 0 Exception disabled FG is supplied for use by the Embedded 1 Exception enabled Floating-Point Round interrupt handler. 
FG is an extension of the low-order bits of the frac- The Embedded Floating-Point Round interrupt tional result produced from an Embedded is taken if the exception is enabled and if FG | Floating-Point instruction on the low word. FG FGH | FX | FXH (signifying an inexact result) is zeroed if an overflow, underflow, or invalid is set to 1 as a result of an Embedded Float- input error is detected on the low element of ing-Point instruction. an Embedded Floating-Point instruction. If an Embedded Floating-Point instruction 51 Embedded Floating-Point Inexact Bit (Low/ results in overflow or underflow and the corre- scalar) (FX) [Categories: SP.FV, SP.FD, sponding Embedded Floating-Point Underflow SP.FS] or Embedded Floating-Point Overflow excep- FX is supplied for use by the Embedded Float- tion is disabled then the Embedded Float- ing-Point Round interrupt handler. FX is an ing-Point Round interrupt is taken. extension of the low-order bits of the fractional 58 Embedded Floating-Point Invalid Opera- result produced from an Embedded Float- tion/Input Error Exception Enable (FINVE) ing-Point instruction on the low word. FX rep- [Categories: SP.FV, SP.FD, SP.FS] resents the logical `or' of all the bits shifted right from the Guard bit when the fractional 0 Exception disabled result is normalized. FX is zeroed if an over- 1 Exception enabled flow, underflow, or invalid input error is If the exception is enabled, an Embedded detected on Embedded Floating-Point instruc- Floating-Point Data interrupt is taken if the tion FINV or FINVH bit is set to 1 by an Embedded 52 Embedded Floating-Point Invalid Opera- Floating-Point instruction. 
tion/Input Error (Low/scalar) (FINV) [Cate- 59 Embedded Floating-Point Divide By Zero gories: SP.FV, SP.FD, SP.FS] Exception Enable (FDBZE) [Categories: The FINV bit is set to 1 if any low word oper- SP.FV, SP.FD, SP.FS] and of an Embedded Floating-Point instruc- tion is infinity, NaN, or a denormalized value, 0 Exception disabled 264 Power ISATM I Version 2.05 1 Exception enabled produce values larger than 2n-1 or smaller than 0 may set OV or OVH in the SPEFSCR. If the exception is enabled, an Embedded Floating-Point Data interrupt is taken if the Signed integers consist of 16, 32, or 64-bit binary val- FDBZ or FDBZH bit is set to 1 by an Embed- ues in two's complement form. The largest represent- ded Floating-Point instruction. able value is 2n-1-1 where n represents the number of 60 Embedded Floating-Point Underflow bits in the value. The smallest representable value is Exception Enable (FUNFE) [Categories: -2n-1. Computations that produce values larger than SP.FV, SP.FD, SP.FS] 2n-1-1 or smaller than -2n-1 may set OV or OVH in the SPEFSCR. 0 Exception disabled 1 Exception enabled 7.3.5.2 Fractional Format If the exception is enabled, an Embedded Floating-Point Data interrupt is taken if the Fractional data format is conventionally used for DSP FUNF or FUNFH bit is set to 1 by an Embed- fractional arithmetic. Fractional data is useful for repre- ded Floating-Point instruction. senting data converted from analog devices. 61 Embedded Floating-Point Overflow Excep- Unsigned fractions consist of 16, 32, or 64-bit binary tion Enable (FOVFE) [Categories: SP.FV, fractional values that range from 0 to less than 1. SP.FD, SP.FS] Unsigned fractions place the radix point immediately to the left of the most significant bit. The most significant 0 Exception disabled bit of the value represents the value 2-1, the next most 1 Exception enabled significant bit represents the value 2-2 and so on. 
The If the exception is enabled, an Embedded largest representable value is 1-2-n where n represents Floating-Point Data interrupt is taken if the the number of bits in the value. The smallest represent- FOVF or FOVFH bit is set to 1 by an Embed- able value is 0. Computations that produce values ded Floating-Point instruction. larger than 1-2-n or smaller than 0 may set OV or OVH in the SPEFSCR. The SPE category does not define 62:63 Embedded Floating-Point Rounding Mode unsigned fractional forms of instructions to manipulate Control (FRMC) [Categories: SP.FV, SP.FD, unsigned fractional data since the unsigned integer SP.FS] forms of the instructions produce the same results as 00 Round to Nearest would the unsigned fractional forms. 01 Round toward Zero Guarded unsigned fractions are 64-bit binary fractional 10 Round toward +Infinity values. Guarded unsigned fractions place the decimal 11 Round toward -Infinity point immediately to the left of bit 32. The largest repre- sentable value is 232-2-32. The smallest representable Programming Note value is 0. Guarded unsigned fractional computations Rounding modes 0b10 (+Infinity) and are always modulo and do not set OV or OVH in the 0b11 (-Infinity) may not be supported by SPEFSCR. some implementations. If an implementa- tion does not support these, Embedded Signed fractions consist of 16, 32, or 64-bit binary frac- Floating-Point Round interrupts are gener- tional values in two's-complement form that range from ated for every Embedded Floating-Point -1 to less than 1. Signed fractions place the decimal instruction for which rounding is required point immediately to the right of the most significant bit. when +Infinity or -Infinity modes are set The largest representable value is 1-2-(n-1) where n and software is required to produce the represents the number of bits in the value. The smallest correctly rounded result representable value is -1. 
Computations that produce values larger than 1-2-(n-1)or smaller than -1 may set OV or OVH in the SPEFSCR. Multiplication of two 7.3.5 Data Formats signed fractional values causes the result to be shifted left one bit to remove the resultant redundant sign bit in The SPE provides two different data formats, integer the product. In this case, a 0 bit is concatenated as the and fractional. Both data formats can be treated as least significant bit of the shifted result. signed or unsigned quantities. Guarded signed fractions are 64-bit binary fractional values. Guarded signed fractions place the decimal 7.3.5.1 Integer Format point immediately to the left of bit 33. The largest repre- sentable value is 232-2-31. The smallest representable Unsigned integers consist of 16, 32, or 64-bit binary value is -232-1+2-31. Guarded signed fractional compu- integer values. The largest representable value is 2n-1 tations are always modulo and do not set OV or OVH in where n represents the number of bits in the value. The the SPEFSCR. smallest representable value is 0. Computations that Chapter 7. Signal Processing Engine (SPE) 265 Version 2.05 7.3.6 Computational Operations 1 Multiply and Accumulate instructions. These instructions perform multiply operations, optionally The SPE category supports several different computa- add the result to the accumulator, and place the tional capabilities. Both modulo and saturation results result into the destination register and optionally can be performed. Modulo results produce truncation of into the accumulator. These instructions are com- the overflow bits in a calculation, therefore overflow posed of different multiply forms, data formats and does not occur and no saturation is performed. For data accumulate options. The mnemonics for instructions for which overflow occurs, saturation pro- these instructions indicate their various character- vides a maximum or minimum representable value (for istics. These are shown in Table 2. 
the data type) in the case of overflow. Instructions are 1 Load and Store instructions. These instructions provided for a wide range of computational capability. provide load and store capabilities for moving data The operation types can be divided into 4 basic catego- to and from memory. A variety of forms are pro- ries: vided that position data for efficient computation. 1 Compare and miscellaneous instructions. These 1 Simple Vector instructions. These instructions use instructions perform miscellaneous functions such the corresponding low and high word elements of as field manipulation, bit reversed incrementing, the operands to produce a vector result that is and vector compares. placed in the destination register, the accumulator, or both. Table 2: Mnemonic Extensions for Multiply Accumulate Instructions Extension Meaning Comments Multiply Form he halfword even 16 X 16 32 heg halfword even guarded 16 X 16 32, 64-bit final accumulate result ho halfword odd 16 X 16 32 hog halfword odd guarded 16 X 16 32, 64-bit final accumulate result w word 32 X 32 64 wh word high 32 X 32 32 (high-order 32 bits of product) wl word low 32 X 32 32 (low-order 32 bits of product) Data Format smf signed modulo fractional modulo, no saturation or overflow smi signed modulo integer modulo, no saturation or overflow ssf signed saturate fractional saturation on product and accumulate ssi signed saturate integer saturation on product and accumulate umi unsigned modulo integer modulo, no saturation or overflow usi unsigned saturate integer saturation on product and accumulate Accumulate Option a place in accumulator result accumulator aa add to accumulator accumulator + result accumulator aaw add to accumulator as word elements accumulator0:31 + result0:31 accumulator0:31 accumulator32:63 + result32:63 accumulator32:63 an add negated to accumulator accumulator - result accumulator anw add negated to accumulator as word accumulator0:31 - result0:31 accumulator0:31 elements 
                                                         accumulator32:63 - result32:63 → accumulator32:63

7.3.7 SPE Instructions

7.3.8 Saturation, Shift, and Bit Reverse Models

For saturation, left shifts, and bit reversal, the pseudo RTL is provided here to more accurately describe those functions that are referenced in the instruction pseudo RTL.

7.3.8.1 Saturation

    SATURATE(ov, carry, sat_ovn, sat_ov, val)
    if ov then
        if carry then
            return sat_ovn
        else
            return sat_ov
    else
        return val

7.3.8.2 Shift Left

    SL(value, cnt)
    if cnt > 31 then
        return 0
    else
        return (value << cnt)

7.3.8.3 Bit Reverse

    BITREVERSE(value)
    result ← 0
    mask ← 1
    shift ← 31
    cnt ← 32
    while cnt > 0 then do
        t ← value & mask
        if shift >= 0 then
            result ← (t << shift) | result
        else
            result ← (t >> -shift) | result
        cnt ← cnt - 1
        shift ← shift - 2
        mask ← mask << 1
    return result

7.3.9 SPE Instruction Set

Bit Reversed Increment EVX-form

brinc RT,RA,RB

    4      RT     RA     RB     527
    0      6      11     16     21   31

    n ← implementation-dependent number of mask bits
    mask ← (RB)64-n:63
    a ← (RA)64-n:63
    d ← BITREVERSE(1 + BITREVERSE(a | (¬ mask)))
    RT ← (RA)0:63-n || (d & mask)

brinc computes a bit-reverse index based on the contents of RA and a mask specified in RB. The new index is written to RT.

The number of bits in the mask is implementation-dependent but may not exceed 32.

Special Registers Altered:
    None

Programming Note
brinc provides a way for software to access FFT data in a bit-reversed manner. RA contains the index into a buffer that contains data on which FFT is to be performed. RB contains a mask that allows the index to be updated with bit-reversed addressing. Typically this instruction precedes a load with index instruction; for example,

    brinc r2, r3, r4
    lhax r8, r5, r2

RB contains a bit-mask that is based on the number of points in an FFT. To access a buffer containing n byte sized data that is to be accessed with bit-reversed addressing, the mask has log2 n 1s in the least significant bit positions and 0s in the remaining most significant bit positions. If, however, the data size is a multiple of a halfword or a word, the mask is constructed so that the 1s are shifted left by log2 (size of the data) and 0s are placed in the least significant bit positions.

Programming Note
Architecture Note
This instruction only modifies the lower 32 bits of the destination register in 32-bit implementations. For 64-bit implementations in 32-bit mode, the contents of the upper 32 bits of the destination register are undefined.

Programming Note
Execution of brinc does not cause SPE Unavailable exceptions regardless of MSR_SPV.

Vector Absolute Value EVX-form

evabs RT,RA

    4      RT     RA     ///    520
    0      6      11     16     21   31

    RT0:31 ← ABS((RA)0:31)
    RT32:63 ← ABS((RA)32:63)

The absolute value of each element of RA is placed in the corresponding elements of RT. An absolute value of 0x8000_0000 (most negative number) returns 0x8000_0000.

Special Registers Altered:
    None

Vector Add Immediate Word EVX-form

evaddiw RT,RB,UI

    4      RT     UI     RB     514
    0      6      11     16     21   31

    RT0:31 ← (RB)0:31 + EXTZ(UI)
    RT32:63 ← (RB)32:63 + EXTZ(UI)

UI is zero-extended and added to both the high and low elements of RB and the results are placed in RT. Note that the same value is added to both elements of the register.

Special Registers Altered:
    None

Vector Add Signed, Modulo, Integer to Accumulator Word EVX-form

evaddsmiaaw RT,RA

    4      RT     RA     ///    1225
    0      6      11     16     21   31

    RT0:31 ← (ACC)0:31 + (RA)0:31
    RT32:63 ← (ACC)32:63 + (RA)32:63
    ACC0:63 ← (RT)0:63

Each word element in RA is added to the corresponding element in the accumulator and the results are placed in RT and into the accumulator.

Special Registers Altered:
    ACC
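The BITREVERSE model and the brinc definition given earlier in this section can be sketched in Python as follows. This is an illustrative sketch only, not part of the architecture: the function names are hypothetical, registers are modeled as plain 64-bit integers, and the implementation-dependent mask width n is assumed to be 32 (the maximum the text allows) so that it matches the 32-bit width of the BITREVERSE model.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def bitreverse(value):
    """Port of the BITREVERSE model: bit i of a 32-bit value moves to bit 31 - i."""
    result, mask, shift = 0, 1, 31
    for _ in range(32):
        t = value & mask
        if shift >= 0:
            result |= t << shift
        else:
            result |= t >> -shift
        shift -= 2
        mask <<= 1
    return result


def brinc(ra, rb):
    """Sketch of brinc, assuming n = 32 mask bits (an assumption, not a spec value)."""
    mask = rb & MASK32                  # mask taken from the low 32 bits of RB
    a = ra & MASK32                     # current index from the low 32 bits of RA
    inverted_mask = ~mask & MASK32      # "not mask", kept to 32 bits
    # Bits above bit 31 of the sum are ignored by bitreverse, as in the model.
    d = bitreverse(1 + bitreverse(a | inverted_mask))
    return (ra & ~MASK32 & MASK64) | (d & mask)


def bit_reversed_indices(points):
    """Index sequence for a buffer of `points` byte-sized entries (power of two)."""
    mask = points - 1                   # log2(points) ones in the low bits
    index, order = 0, []
    for _ in range(points):
        order.append(index)
        index = brinc(index, mask)
    return order
```

Under these assumptions, `bit_reversed_indices(8)` walks an 8-point buffer in the order 0, 4, 2, 6, 1, 5, 3, 7, which is the usual bit-reversed FFT ordering; the upper 32 bits of the RA value pass through unchanged.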
Vector Add Signed, Saturate, Integer to Accumulator Word EVX-form

evaddssiaaw RT,RA

    4      RT     RA     ///    1217
    0      6      11     16     21   31

    temp0:63 ← EXTS((ACC)0:31) + EXTS((RA)0:31)
    ovh ← temp31 ⊕ temp32
    RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    temp0:63 ← EXTS((ACC)32:63) + EXTS((RA)32:63)
    ovl ← temp31 ⊕ temp32
    RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    ACC0:63 ← (RT)0:63
    SPEFSCR_OVH ← ovh
    SPEFSCR_OV ← ovl
    SPEFSCR_SOVH ← SPEFSCR_SOVH | ovh
    SPEFSCR_SOV ← SPEFSCR_SOV | ovl

Each signed-integer word element in RA is sign-extended and added to the corresponding sign-extended element in the accumulator, saturating if overflow occurs, and the results are placed in RT and the accumulator.

Special Registers Altered:
    ACC OV OVH SOV SOVH

Vector Add Unsigned, Saturate, Integer to Accumulator Word EVX-form

evaddusiaaw RT,RA

    4      RT     RA     ///    1216
    0      6      11     16     21   31

    temp0:63 ← EXTZ((ACC)0:31) + EXTZ((RA)0:31)
    ovh ← temp31
    RT0:31 ← SATURATE(ovh, temp31, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
    temp0:63 ← EXTZ((ACC)32:63) + EXTZ((RA)32:63)
    ovl ← temp31
    RT32:63 ← SATURATE(ovl, temp31, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
    ACC0:63 ← (RT)0:63
    SPEFSCR_OVH ← ovh
    SPEFSCR_OV ← ovl
    SPEFSCR_SOVH ← SPEFSCR_SOVH | ovh
    SPEFSCR_SOV ← SPEFSCR_SOV | ovl

Each unsigned-integer word element in RA is zero-extended and added to the corresponding zero-extended element in the accumulator, saturating if overflow occurs, and the results are placed in RT and the accumulator.

Special Registers Altered:
    ACC OV OVH SOV SOVH

Vector Add Unsigned, Modulo, Integer to Accumulator Word EVX-form

evaddumiaaw RT,RA

    4      RT     RA     ///    1224
    0      6      11     16     21   31

    RT0:31 ← (ACC)0:31 + (RA)0:31
    RT32:63 ← (ACC)32:63 + (RA)32:63
    ACC0:63 ← (RT)0:63

Each unsigned-integer word element in RA is added to the corresponding element in the accumulator and the results are placed in RT and the accumulator.

Special Registers Altered:
    ACC

Vector Add Word EVX-form

evaddw RT,RA,RB

    4      RT     RA     RB     512
    0      6      11     16     21   31

    RT0:31 ← (RA)0:31 + (RB)0:31
    RT32:63 ← (RA)32:63 + (RB)32:63

The corresponding elements of RA and RB are added and the results are placed in RT. The sum is a modulo sum.

Special Registers Altered:
    None

Vector AND EVX-form

evand RT,RA,RB

    4      RT     RA     RB     529
    0      6      11     16     21   31

    RT0:31 ← (RA)0:31 & (RB)0:31
    RT32:63 ← (RA)32:63 & (RB)32:63

The corresponding elements of RA and RB are ANDed bitwise and the results are placed in the corresponding element of RT.

Special Registers Altered:
    None

Vector AND with Complement EVX-form

evandc RT,RA,RB

    4      RT     RA     RB     530
    0      6      11     16     21   31

    RT0:31 ← (RA)0:31 & (¬(RB)0:31)
    RT32:63 ← (RA)32:63 & (¬(RB)32:63)

The word elements of RA are ANDed bitwise with the complement of the corresponding elements of RB. The results are placed in the corresponding element of RT.

Special Registers Altered:
    None

Vector Compare Equal EVX-form

evcmpeq BF,RA,RB

    4      BF  //   RA     RB     564
    0      6   9    11     16     21   31

    ah ← (RA)0:31
    al ← (RA)32:63
    bh ← (RB)0:31
    bl ← (RB)32:63
    if (ah = bh) then ch ← 1
    else ch ← 0
    if (al = bl) then cl ← 1
    else cl ← 0
    CR4×BF+32:4×BF+35 ← ch || cl || (ch | cl) || (ch & cl)

The most significant bit in BF is set if the high-order element of RA is equal to the high-order element of RB; it is cleared otherwise. The next bit in BF is set if the low-order element of RA is equal to the low-order element of RB and cleared otherwise. The last two bits of BF are set to the OR and AND of the result of the compare of the high and low elements.

Special Registers Altered:
    CR field BF

Vector Compare Greater Than Signed EVX-form

evcmpgts BF,RA,RB

    4      BF  //   RA     RB     561
    0      6   9    11     16     21   31

    ah ← (RA)0:31
    al ← (RA)32:63
    bh ← (RB)0:31
    bl ← (RB)32:63
    if (ah > bh) then ch ← 1
    else ch ← 0
    if (al > bl) then cl ← 1
    else cl ← 0
    CR4×BF+32:4×BF+35 ← ch || cl || (ch | cl) || (ch & cl)

The most significant bit in BF is set if the high-order element of RA is greater than the high-order element of RB; it is cleared otherwise. The next bit in BF is set if the low-order element of RA is greater than the low-order element of RB and cleared otherwise. The last two bits of BF are set to the OR and AND of the result of the compare of the high and low elements.

Special Registers Altered:
    CR field BF

Vector Compare Greater Than Unsigned EVX-form

evcmpgtu BF,RA,RB

    4      BF  //   RA     RB     560
    0      6   9    11     16     21   31

    ah ← (RA)0:31
    al ← (RA)32:63
    bh ← (RB)0:31
    bl ← (RB)32:63
    if (ah >u bh) then ch ← 1
    else ch ← 0
    if (al >u bl) then cl ← 1
    else cl ← 0
    CR4×BF+32:4×BF+35 ← ch || cl || (ch | cl) || (ch & cl)

The most significant bit in BF is set if the high-order element of RA is greater than the high-order element of RB; it is cleared otherwise. The next bit in BF is set if the low-order element of RA is greater than the low-order element of RB and cleared otherwise. The last two bits of BF are set to the OR and AND of the result of the compare of the high and low elements.

Special Registers Altered:
    CR field BF

Vector Compare Less Than Signed EVX-form

evcmplts BF,RA,RB

    4      BF  //   RA     RB     563
    0      6   9    11     16     21   31

    ah ← (RA)0:31
    al ← (RA)32:63
    bh ← (RB)0:31
    bl ← (RB)32:63
    if (ah < bh) then ch ← 1
    else ch ← 0
    if (al < bl) then cl ← 1
    else cl ← 0
    CR4×BF+32:4×BF+35 ← ch || cl || (ch | cl) || (ch & cl)

The most significant bit in BF is set if the high-order element of RA is less than the high-order element of RB; it is cleared otherwise. The next bit in BF is set if the low-order element of RA is less than the low-order element of RB and cleared otherwise. The last two bits of BF are set to the OR and AND of the result of the compare of the high and low elements.
Special Registers Altered:
    CR field BF

Vector Compare Less Than Unsigned EVX-form

evcmpltu BF,RA,RB

    4      BF  //   RA     RB     562
    0      6   9    11     16     21   31

    ah ← (RA)0:31
    al ← (RA)32:63
    bh ← (RB)0:31
    bl ← (RB)32:63
    if (ah …

    …
    RT32:63 ← n

The leading sign bits in each element of RA are counted, and the respective count is placed into each element of RT.

Special Registers Altered:
    None

Programming Note
evcntlzw is used for unsigned operands; evcntlsw is used for signed operands.

Vector Count Leading Zeros Word EVX-form

evcntlzw RT,RA

    4      RT     RA     ///    525
    0      6      11     16     21   31

    n ← 0
    do while n < 32
        if (RA)n = 1 then leave
        n ← n + 1
    RT0:31 ← n
    n ← 0
    do while n < 32
        if (RA)n+32 = 1 then leave
        n ← n + 1
    RT32:63 ← n

The leading zero bits in each element of RA are counted, and the respective count is placed into each element of RT.

Special Registers Altered:
    None

    … = 0) & (dvh = 0)) then
        RT0:31 ← 0x7FFFFFFF
        ovh ← 1
    else if (ddh = 0x8000_0000) & (dvh = 0xFFFF_FFFF) then
        RT0:31 ← 0x7FFFFFFF
        ovh ← 1
    if ((ddl < 0) & (dvl = 0)) then
        RT32:63 ← 0x8000_0000
        ovl ← 1
    else if ((ddl >= 0) & (dvl = 0)) then
        RT32:63 ← 0x7FFFFFFF
        ovl ← 1
    else if (ddl = 0x8000_0000) & (dvl = 0xFFFF_FFFF) then
        RT32:63 ← 0x7FFFFFFF
        ovl ← 1
    SPEFSCR_OVH ← ovh
    SPEFSCR_OV ← ovl
    SPEFSCR_SOVH ← SPEFSCR_SOVH | ovh
    SPEFSCR_SOV ← SPEFSCR_SOV | ovl

The two dividends are the two elements of the contents of RA. The two divisors are the two elements of the contents of RB. The resulting two 32-bit quotients on each element are placed into RT. The remainders are not supplied. The operands and quotients are interpreted as signed integers.

Special Registers Altered:
    OV OVH SOV SOVH

Programming Note
Note that any overflow indication is always set as a side effect of this instruction. No form is defined that disables the setting of the overflow bits. In case of overflow, a saturated value is delivered into the destination register.

Vector Divide Word Unsigned EVX-form

evdivwu RT,RA,RB

    4      RT     RA     RB     1223
    0      6      11     16     21   31

    ddh ← (RA)0:31
    ddl ← (RA)32:63
    dvh ← (RB)0:31
    dvl ← (RB)32:63
    RT0:31 ← ddh ÷ dvh
    RT32:63 ← ddl ÷ dvl
    ovh ← 0
    ovl ← 0
    if (dvh = 0) then
        RT0:31 ← 0xFFFFFFFF
        ovh ← 1
    if (dvl = 0) then
        RT32:63 ← 0xFFFFFFFF
        ovl ← 1
    SPEFSCR_OVH ← ovh
    SPEFSCR_OV ← ovl
    SPEFSCR_SOVH ← SPEFSCR_SOVH | ovh
    SPEFSCR_SOV ← SPEFSCR_SOV | ovl

The two dividends are the two elements of the contents of RA. The two divisors are the two elements of the contents of RB. Two 32-bit quotients are formed as a result of the division on each of the high and low elements and the quotients are placed into RT. Remainders are not supplied. Operands and quotients are interpreted as unsigned integers.

Special Registers Altered:
    OV OVH SOV SOVH

Programming Note
Note that any overflow indication is always set as a side effect of this instruction. No form is defined that disables the setting of the overflow bits. In case of overflow, a saturated value is delivered into the destination register.

Vector Equivalent EVX-form

eveqv RT,RA,RB

    4      RT     RA     RB     537
    0      6      11     16     21   31

    RT0:31 ← (RA)0:31 ≡ (RB)0:31
    RT32:63 ← (RA)32:63 ≡ (RB)32:63

The corresponding elements of RA and RB are XORed bitwise, and the complemented results are placed in RT.

Special Registers Altered:
    None

Vector Extend Sign Byte EVX-form

evextsb RT,RA

    4      RT     RA     ///    522
    0      6      11     16     21   31

    RT0:31 ← EXTS((RA)24:31)
    RT32:63 ← EXTS((RA)56:63)

The signs of the low-order byte in each of the elements in RA are extended, and the results are placed in RT.

Special Registers Altered:
    None

Vector Extend Sign Halfword EVX-form

evextsh RT,RA

    4      RT     RA     ///    523
    0      6      11     16     21   31

    RT0:31 ← EXTS((RA)16:31)
    RT32:63 ← EXTS((RA)48:63)

The signs of the odd halfwords in each of the elements in RA are extended, and the results are placed in RT.

Special Registers Altered:
    None
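The element-wise behavior of evdivwu and evextsh described in this section can be sketched in Python as follows. This is a hedged illustration rather than a reference model: the helper names `split`, `join` and `exts16` are hypothetical, registers are modeled as 64-bit integers with the upper word in the high 32 bits, and the sticky SOV/SOVH updates are left to the caller.

```python
MASK32 = 0xFFFFFFFF


def split(reg):
    """Return (upper word, lower word) of a 64-bit register value."""
    return (reg >> 32) & MASK32, reg & MASK32


def join(hi, lo):
    """Build a 64-bit register value from two 32-bit elements."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def evdivwu(ra, rb):
    """Unsigned divide per element; returns (RT, ovh, ovl)."""
    (ddh, ddl), (dvh, dvl) = split(ra), split(rb)

    def divide(dividend, divisor):
        if divisor == 0:
            return MASK32, 1        # saturate to 0xFFFFFFFF and flag overflow
        return dividend // divisor, 0

    qh, ovh = divide(ddh, dvh)
    ql, ovl = divide(ddl, dvl)
    return join(qh, ql), ovh, ovl


def exts16(halfword):
    """Sign-extend a 16-bit value to 32 bits (EXTS)."""
    halfword &= 0xFFFF
    return halfword | 0xFFFF0000 if halfword & 0x8000 else halfword


def evextsh(ra):
    """Sign-extend the odd (low-order) halfword of each element."""
    hi, lo = split(ra)
    return join(exts16(hi), exts16(lo))
```

For instance, dividing the elements (10, 7) by (2, 0) gives the quotient 5 in the upper element and the saturated value 0xFFFFFFFF with ovl = 1 in the lower element, matching the divide-by-zero branch of the RTL.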
Vector Load Double Word into Double Word EVX-form

evldd RT,D(RA)

    4      RT     RA     UI     769
    0      6      11     16     21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×8)
    RT ← MEM(EA, 8)

D in the instruction mnemonic is UI × 8. The doubleword addressed by EA is loaded from memory and placed in RT.

Special Registers Altered:
    None

Vector Load Double Word into Double Word Indexed EVX-form

evlddx RT,RA,RB

    4      RT     RA     RB     768
    0      6      11     16     21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT ← MEM(EA, 8)

The doubleword addressed by EA is loaded from memory and placed in RT.

Special Registers Altered:
    None

Vector Load Double into Four Halfwords EVX-form

evldh RT,D(RA)

    4      RT     RA     UI     773
    0      6      11     16     21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×8)
    RT0:15 ← MEM(EA, 2)
    RT16:31 ← MEM(EA+2, 2)
    RT32:47 ← MEM(EA+4, 2)
    RT48:63 ← MEM(EA+6, 2)

D in the instruction mnemonic is UI × 8. The doubleword addressed by EA is loaded from memory and placed in RT.

Special Registers Altered:
    None

Vector Load Double into Four Halfwords Indexed EVX-form

evldhx RT,RA,RB

    4      RT     RA     RB     772
    0      6      11     16     21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT0:15 ← MEM(EA, 2)
    RT16:31 ← MEM(EA+2, 2)
    RT32:47 ← MEM(EA+4, 2)
    RT48:63 ← MEM(EA+6, 2)

The doubleword addressed by EA is loaded from memory and placed in RT.

Special Registers Altered:
    None

Vector Load Double into Two Words EVX-form

evldw RT,D(RA)

    4      RT     RA     UI     771
    0      6      11     16     21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×8)
    RT0:31 ← MEM(EA, 4)
    RT32:63 ← MEM(EA+4, 4)

D in the instruction mnemonic is UI × 8. The doubleword addressed by EA is loaded from memory and placed in RT.

Special Registers Altered:
    None

Vector Load Double into Two Words Indexed EVX-form

evldwx RT,RA,RB

    4      RT     RA     RB     770
    0      6      11     16     21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT0:31 ← MEM(EA, 4)
    RT32:63 ← MEM(EA+4, 4)

The doubleword addressed by EA is loaded from memory and placed in RT.
Special Registers Altered:
    None

Vector Load Halfword into Halfwords Even and Splat EVX-form

evlhhesplat RT,D(RA)

    4      RT     RA     UI     777
    0      6      11     16     21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×2)
    RT0:15 ← MEM(EA, 2)
    RT16:31 ← 0x0000
    RT32:47 ← MEM(EA, 2)
    RT48:63 ← 0x0000

D in the instruction mnemonic is UI × 2. The halfword addressed by EA is loaded from memory and placed in the even halfwords of each element of RT. The odd halfwords of each element of RT are set to 0.

Special Registers Altered:
    None

Vector Load Halfword into Halfwords Even and Splat Indexed EVX-form

evlhhesplatx RT,RA,RB

    4      RT     RA     RB     776
    0      6      11     16     21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT0:15 ← MEM(EA, 2)
    RT16:31 ← 0x0000
    RT32:47 ← MEM(EA, 2)
    RT48:63 ← 0x0000

The halfword addressed by EA is loaded from memory and placed in the even halfwords of each element of RT. The odd halfwords of each element of RT are set to 0.

Special Registers Altered:
    None

Vector Load Halfword into Halfword Odd Signed and Splat EVX-form

evlhhossplat RT,D(RA)

    4      RT     RA     UI     783
    0      6      11     16     21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×2)
    RT0:31 ← EXTS(MEM(EA, 2))
    RT32:63 ← EXTS(MEM(EA, 2))

D in the instruction mnemonic is UI × 2. The halfword addressed by EA is loaded from memory and placed in the odd halfwords sign extended in each element of RT.

Special Registers Altered:
    None

Vector Load Halfword into Halfword Odd Signed and Splat Indexed EVX-form

evlhhossplatx RT,RA,RB

    4      RT     RA     RB     782
    0      6      11     16     21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT0:31 ← EXTS(MEM(EA, 2))
    RT32:63 ← EXTS(MEM(EA, 2))

The halfword addressed by EA is loaded from memory and placed in the odd halfwords sign extended in each element of RT.
Special Registers Altered:
    None

Vector Load Halfword into Halfword Odd Unsigned and Splat EVX-form

evlhhousplat RT,D(RA)

    | 4 | RT | RA | UI | 781 |
    0   6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×2)
    RT0:31 ← EXTZ(MEM(EA,2))
    RT32:63 ← EXTZ(MEM(EA,2))

D in the instruction mnemonic is UI × 2. The halfword addressed by EA is loaded from memory, zero-extended, and placed in the odd halfword of each element of RT.

Special Registers Altered:
    None

Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed EVX-form

evlhhousplatx RT,RA,RB

    | 4 | RT | RA | RB | 780 |
    0   6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT0:31 ← EXTZ(MEM(EA,2))
    RT32:63 ← EXTZ(MEM(EA,2))

The halfword addressed by EA is loaded from memory, zero-extended, and placed in the odd halfword of each element of RT.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords Even EVX-form

evlwhe RT,D(RA)

    | 4 | RT | RA | UI | 785 |
    0   6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×4)
    RT0:15 ← MEM(EA,2)
    RT16:31 ← 0x0000
    RT32:47 ← MEM(EA+2,2)
    RT48:63 ← 0x0000

D in the instruction mnemonic is UI × 4. The word addressed by EA is loaded from memory and placed in the even halfwords of each element of RT. The odd halfwords of each element of RT are set to 0.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords Even Indexed EVX-form

evlwhex RT,RA,RB

    | 4 | RT | RA | RB | 784 |
    0   6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT0:15 ← MEM(EA,2)
    RT16:31 ← 0x0000
    RT32:47 ← MEM(EA+2,2)
    RT48:63 ← 0x0000

The word addressed by EA is loaded from memory and placed in the even halfwords of each element of RT. The odd halfwords of each element of RT are set to 0.
Special Registers Altered:
    None

Vector Load Word into Two Halfwords Odd Signed (with sign extension) EVX-form

evlwhos RT,D(RA)

    | 4 | RT | RA | UI | 791 |
    0   6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×4)
    RT0:31 ← EXTS(MEM(EA,2))
    RT32:63 ← EXTS(MEM(EA+2,2))

D in the instruction mnemonic is UI × 4. The word addressed by EA is loaded from memory; each of its halfwords is sign-extended and placed in the odd halfword of the corresponding element of RT.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension) EVX-form

evlwhosx RT,RA,RB

    | 4 | RT | RA | RB | 790 |
    0   6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT0:31 ← EXTS(MEM(EA,2))
    RT32:63 ← EXTS(MEM(EA+2,2))

The word addressed by EA is loaded from memory; each of its halfwords is sign-extended and placed in the odd halfword of the corresponding element of RT.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords Odd Unsigned (zero-extended) EVX-form

evlwhou RT,D(RA)

    | 4 | RT | RA | UI | 789 |
    0   6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×4)
    RT0:31 ← EXTZ(MEM(EA,2))
    RT32:63 ← EXTZ(MEM(EA+2,2))

D in the instruction mnemonic is UI × 4. The word addressed by EA is loaded from memory; each of its halfwords is zero-extended and placed in the odd halfword of the corresponding element of RT.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended) EVX-form

evlwhoux RT,RA,RB

    | 4 | RT | RA | RB | 788 |
    0   6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT0:31 ← EXTZ(MEM(EA,2))
    RT32:63 ← EXTZ(MEM(EA+2,2))

The word addressed by EA is loaded from memory; each of its halfwords is zero-extended and placed in the odd halfword of the corresponding element of RT.
Special Registers Altered:
    None

Vector Load Word into Two Halfwords and Splat EVX-form

evlwhsplat RT,D(RA)

    | 4 | RT | RA | UI | 797 |
    0   6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×4)
    RT0:15 ← MEM(EA,2)
    RT16:31 ← MEM(EA,2)
    RT32:47 ← MEM(EA+2,2)
    RT48:63 ← MEM(EA+2,2)

D in the instruction mnemonic is UI × 4. The word addressed by EA is loaded from memory and placed in both the even and odd halfwords in each element of RT.

Special Registers Altered:
    None

Vector Load Word into Two Halfwords and Splat Indexed EVX-form

evlwhsplatx RT,RA,RB

    | 4 | RT | RA | RB | 796 |
    0   6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT0:15 ← MEM(EA,2)
    RT16:31 ← MEM(EA,2)
    RT32:47 ← MEM(EA+2,2)
    RT48:63 ← MEM(EA+2,2)

The word addressed by EA is loaded from memory and placed in both the even and odd halfwords in each element of RT.

Special Registers Altered:
    None

Vector Load Word into Word and Splat EVX-form

evlwwsplat RT,D(RA)

    | 4 | RT | RA | UI | 793 |
    0   6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + EXTZ(UI×4)
    RT0:31 ← MEM(EA,4)
    RT32:63 ← MEM(EA,4)

D in the instruction mnemonic is UI × 4. The word addressed by EA is loaded from memory and placed in both elements of RT.

Special Registers Altered:
    None

Vector Load Word into Word and Splat Indexed EVX-form

evlwwsplatx RT,RA,RB

    | 4 | RT | RA | RB | 792 |
    0   6    11   16   21   31

    if (RA = 0) then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    RT0:31 ← MEM(EA,4)
    RT32:63 ← MEM(EA,4)

The word addressed by EA is loaded from memory and placed in both elements of RT.

Special Registers Altered:
    None

Vector Merge High EVX-form

evmergehi RT,RA,RB

    | 4 | RT | RA | RB | 556 |
    0   6    11   16   21   31

    RT0:31 ← (RA)0:31
    RT32:63 ← (RB)0:31

The high-order elements of RA and RB are merged and placed in RT.

Special Registers Altered:
    None

Programming Note
    A vector splat high can be performed by specifying the same register in RA and RB.

Vector Merge Low EVX-form

evmergelo RT,RA,RB

    | 4 | RT | RA | RB | 557 |
    0   6    11   16   21   31

    RT0:31 ← (RA)32:63
    RT32:63 ← (RB)32:63

The low-order elements of RA and RB are merged and placed in RT.

Special Registers Altered:
    None

Programming Note
    A vector splat low can be performed by specifying the same register in RA and RB.

Vector Merge High/Low EVX-form

evmergehilo RT,RA,RB

    | 4 | RT | RA | RB | 558 |
    0   6    11   16   21   31

    RT0:31 ← (RA)0:31
    RT32:63 ← (RB)32:63

The high-order element of RA and the low-order element of RB are merged and placed in RT.

Special Registers Altered:
    None

Programming Note
    With appropriate specification of RA and RB, evmergehi, evmergelo, evmergehilo, and evmergelohi provide a full 32-bit permute of two source operands.

Vector Merge Low/High EVX-form

evmergelohi RT,RA,RB

    | 4 | RT | RA | RB | 559 |
    0   6    11   16   21   31

    RT0:31 ← (RA)32:63
    RT32:63 ← (RB)0:31

The low-order element of RA and the high-order element of RB are merged and placed in RT.

Special Registers Altered:
    None

Programming Note
    A vector swap can be performed by specifying the same register in RA and RB.

Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate EVX-form

evmhegsmfaa RT,RA,RB

    | 4 | RT | RA | RB | 1323 |
    0   6    11   16   21    31

    temp0:63 ← (RA)32:47 ×gsf (RB)32:47
    RT0:63 ← (ACC)0:63 + temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low even-numbered, halfword signed fractional elements in RA and RB are multiplied using guarded signed fractional multiplication producing a sign extended 64-bit fractional product with the decimal between bits 32 and 33. The product is added to the contents of the 64-bit accumulator and the result is placed in RT and the accumulator.

Special Registers Altered:
    ACC

Note
    If the two input operands are both -1.0, the intermediate product is represented as +1.0.

Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative EVX-form

evmhegsmfan RT,RA,RB

    | 4 | RT | RA | RB | 1451 |
    0   6    11   16   21    31

    temp0:63 ← (RA)32:47 ×gsf (RB)32:47
    RT0:63 ← (ACC)0:63 - temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low even-numbered, halfword signed fractional elements in RA and RB are multiplied using guarded signed fractional multiplication producing a sign extended 64-bit fractional product with the decimal between bits 32 and 33. The product is subtracted from the contents of the 64-bit accumulator and the result is placed in RT and the accumulator.

Special Registers Altered:
    ACC

Note
    If the two input operands are both -1.0, the intermediate product is represented as +1.0.

Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate EVX-form

evmhegsmiaa RT,RA,RB

    | 4 | RT | RA | RB | 1321 |
    0   6    11   16   21    31

    temp0:31 ← (RA)32:47 ×si (RB)32:47
    temp0:63 ← EXTS(temp0:31)
    RT0:63 ← (ACC)0:63 + temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low even-numbered halfword signed-integer elements in RA and RB are multiplied. The intermediate product is sign-extended and added to the contents of the 64-bit accumulator, and the resulting sum is placed in RT and into the accumulator.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative EVX-form

evmhegsmian RT,RA,RB

    | 4 | RT | RA | RB | 1449 |
    0   6    11   16   21    31

    temp0:31 ← (RA)32:47 ×si (RB)32:47
    temp0:63 ← EXTS(temp0:31)
    RT0:63 ← (ACC)0:63 - temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low even-numbered halfword signed-integer elements in RA and RB are multiplied. The intermediate product is sign-extended and subtracted from the contents of the 64-bit accumulator, and the result is placed in RT and into the accumulator.
Special Registers Altered:
    ACC

Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate EVX-form

evmhegumiaa RT,RA,RB

    | 4 | RT | RA | RB | 1320 |
    0   6    11   16   21    31

    temp0:31 ← (RA)32:47 ×ui (RB)32:47
    temp0:63 ← EXTZ(temp0:31)
    RT0:63 ← (ACC)0:63 + temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low even-numbered halfword unsigned-integer elements in RA and RB are multiplied. The intermediate product is zero-extended and added to the contents of the 64-bit accumulator. The resulting sum is placed in RT and into the accumulator.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative EVX-form

evmhegumian RT,RA,RB

    | 4 | RT | RA | RB | 1448 |
    0   6    11   16   21    31

    temp0:31 ← (RA)32:47 ×ui (RB)32:47
    temp0:63 ← EXTZ(temp0:31)
    RT0:63 ← (ACC)0:63 - temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low even-numbered halfword unsigned-integer elements in RA and RB are multiplied. The intermediate product is zero-extended and subtracted from the contents of the 64-bit accumulator. The result is placed in RT and into the accumulator.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Even, Signed, Modulo, Fractional EVX-form

evmhesmf RT,RA,RB

    | 4 | RT | RA | RB | 1035 |
    0   6    11   16   21    31

    RT0:31 ← (RA)0:15 ×sf (RB)0:15
    RT32:63 ← (RA)32:47 ×sf (RB)32:47

The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied then placed into the corresponding words of RT.

Special Registers Altered:
    None

Vector Multiply Halfwords, Even, Signed, Modulo, Fractional to Accumulator EVX-form

evmhesmfa RT,RA,RB

    | 4 | RT | RA | RB | 1067 |
    0   6    11   16   21    31

    RT0:31 ← (RA)0:15 ×sf (RB)0:15
    RT32:63 ← (RA)32:47 ×sf (RB)32:47
    ACC0:63 ← (RT)0:63

The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied then placed into the corresponding words of RT and into the accumulator.
Special Registers Altered:
    ACC

Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words EVX-form

evmhesmfaaw RT,RA,RB

    | 4 | RT | RA | RB | 1291 |
    0   6    11   16   21    31

    temp0:31 ← (RA)0:15 ×sf (RB)0:15
    RT0:31 ← (ACC)0:31 + temp0:31
    temp0:31 ← (RA)32:47 ×sf (RB)32:47
    RT32:63 ← (ACC)32:63 + temp0:31
    ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each intermediate product are added to the contents of the accumulator words to form intermediate sums, which are placed into the corresponding RT words and into the accumulator.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words EVX-form

evmhesmfanw RT,RA,RB

    | 4 | RT | RA | RB | 1419 |
    0   6    11   16   21    31

    temp0:31 ← (RA)0:15 ×sf (RB)0:15
    RT0:31 ← (ACC)0:31 - temp0:31
    temp0:31 ← (RA)32:47 ×sf (RB)32:47
    RT32:63 ← (ACC)32:63 - temp0:31
    ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied. The 32-bit intermediate products are subtracted from the contents of the accumulator words to form intermediate differences, which are placed into the corresponding RT words and into the accumulator.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Even, Signed, Modulo, Integer EVX-form

evmhesmi RT,RA,RB

    | 4 | RT | RA | RB | 1033 |
    0   6    11   16   21    31

    RT0:31 ← (RA)0:15 ×si (RB)0:15
    RT32:63 ← (RA)32:47 ×si (RB)32:47

The corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT.

Special Registers Altered:
    None

Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator EVX-form

evmhesmia RT,RA,RB

    | 4 | RT | RA | RB | 1065 |
    0   6    11   16   21    31

    RT0:31 ← (RA)0:15 ×si (RB)0:15
    RT32:63 ← (RA)32:47 ×si (RB)32:47
    ACC0:63 ← (RT)0:63

The corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT and into the accumulator.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words EVX-form

evmhesmiaaw RT,RA,RB

    | 4 | RT | RA | RB | 1289 |
    0   6    11   16   21    31

    temp0:31 ← (RA)0:15 ×si (RB)0:15
    RT0:31 ← (ACC)0:31 + temp0:31
    temp0:31 ← (RA)32:47 ×si (RB)32:47
    RT32:63 ← (ACC)32:63 + temp0:31
    ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied. Each intermediate 32-bit product is added to the contents of the accumulator words to form intermediate sums, which are placed into the corresponding RT words and into the accumulator.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words EVX-form

evmhesmianw RT,RA,RB

    | 4 | RT | RA | RB | 1417 |
    0   6    11   16   21    31

    temp0:31 ← (RA)0:15 ×si (RB)0:15
    RT0:31 ← (ACC)0:31 - temp0:31
    temp0:31 ← (RA)32:47 ×si (RB)32:47
    RT32:63 ← (ACC)32:63 - temp0:31
    ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied. Each intermediate 32-bit product is subtracted from the contents of the accumulator words to form intermediate differences, which are placed into the corresponding RT words and into the accumulator.

Special Registers Altered:
    ACC
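The modulo accumulate-into-words behaviour of evmhesmiaaw and evmhesmianw can be sketched in software. The following is a minimal model, not an implementation taken from the architecture: the helper names field, to_signed and evmhesmi_acc are hypothetical, registers are modelled as 64-bit Python integers, and bit 0 is taken to be the most significant bit, as in the RTL above.

```python
MASK32 = 0xFFFF_FFFF

def field(reg, first, last):
    """Bits first..last of a 64-bit value, big-endian numbering (bit 0 = MSB)."""
    width = last - first + 1
    return (reg >> (63 - last)) & ((1 << width) - 1)

def to_signed(value, width):
    """Interpret a width-bit pattern as a two's-complement integer."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign

def evmhesmi_acc(ra, rb, acc, negate=False):
    """Model evmhesmiaaw (negate=False) or evmhesmianw (negate=True).

    Returns the 64-bit value written to both RT and ACC.
    """
    result = 0
    for first in (0, 32):  # even halfword of the high word, then of the low word
        a = to_signed(field(ra, first, first + 15), 16)
        b = to_signed(field(rb, first, first + 15), 16)
        product = a * b  # signed 16 x 16 product always fits in 32 bits
        acc_word = field(acc, first, first + 31)
        if negate:
            word = (acc_word - product) & MASK32  # modulo: wraps silently
        else:
            word = (acc_word + product) & MASK32
        result = (result << 32) | word
    return result
```

With even halfwords 3 and -2 in RA, 4 and 5 in RB, and both accumulator words equal to 1, the accumulate form yields words 13 and -9, and the negative form yields -11 and 11. No SPEFSCR bits are modelled because the modulo forms alter only ACC.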
Vector Multiply Halfwords, Even, Signed, Saturate, Fractional EVX-form

evmhessf RT,RA,RB

    | 4 | RT | RA | RB | 1027 |
    0   6    11   16   21    31

    temp0:31 ← (RA)0:15 ×sf (RB)0:15
    if ((RA)0:15 = 0x8000) & ((RB)0:15 = 0x8000) then
        RT0:31 ← 0x7FFF_FFFF
        movh ← 1
    else
        RT0:31 ← temp0:31
        movh ← 0
    temp0:31 ← (RA)32:47 ×sf (RB)32:47
    if ((RA)32:47 = 0x8000) & ((RB)32:47 = 0x8000) then
        RT32:63 ← 0x7FFF_FFFF
        movl ← 1
    else
        RT32:63 ← temp0:31
        movl ← 0
    SPEFSCROVH ← movh
    SPEFSCROV ← movl
    SPEFSCRSOVH ← SPEFSCRSOVH | movh
    SPEFSCRSOV ← SPEFSCRSOV | movl

The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each product are placed into the corresponding words of RT. If both inputs are -1.0, the result saturates to the largest positive signed fraction.

Special Registers Altered:
    OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator EVX-form

evmhessfa RT,RA,RB

    | 4 | RT | RA | RB | 1059 |
    0   6    11   16   21    31

    temp0:31 ← (RA)0:15 ×sf (RB)0:15
    if ((RA)0:15 = 0x8000) & ((RB)0:15 = 0x8000) then
        RT0:31 ← 0x7FFF_FFFF
        movh ← 1
    else
        RT0:31 ← temp0:31
        movh ← 0
    temp0:31 ← (RA)32:47 ×sf (RB)32:47
    if ((RA)32:47 = 0x8000) & ((RB)32:47 = 0x8000) then
        RT32:63 ← 0x7FFF_FFFF
        movl ← 1
    else
        RT32:63 ← temp0:31
        movl ← 0
    ACC0:63 ← (RT)0:63
    SPEFSCROVH ← movh
    SPEFSCROV ← movl
    SPEFSCRSOVH ← SPEFSCRSOVH | movh
    SPEFSCRSOV ← SPEFSCRSOV | movl

The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each product are placed into the corresponding words of RT and into the accumulator. If both inputs are -1.0, the result saturates to the largest positive signed fraction.
Special Registers Altered:
    ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate into Words EVX-form

evmhessfaaw RT,RA,RB

    | 4 | RT | RA | RB | 1283 |
    0   6    11   16   21    31

    temp0:31 ← (RA)0:15 ×sf (RB)0:15
    if ((RA)0:15 = 0x8000) & ((RB)0:15 = 0x8000) then
        temp0:31 ← 0x7FFF_FFFF
        movh ← 1
    else
        movh ← 0
    temp0:63 ← EXTS((ACC)0:31) + EXTS(temp0:31)
    ovh ← (temp31 ⊕ temp32)
    RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    temp0:31 ← (RA)32:47 ×sf (RB)32:47
    if ((RA)32:47 = 0x8000) & ((RB)32:47 = 0x8000) then
        temp0:31 ← 0x7FFF_FFFF
        movl ← 1
    else
        movl ← 0
    temp0:63 ← EXTS((ACC)32:63) + EXTS(temp0:31)
    ovl ← (temp31 ⊕ temp32)
    RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    ACC0:63 ← (RT)0:63
    SPEFSCROVH ← ovh | movh
    SPEFSCROV ← ovl | movl
    SPEFSCRSOVH ← SPEFSCRSOVH | ovh | movh
    SPEFSCRSOV ← SPEFSCRSOV | ovl | movl

The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied producing a 32-bit product. If both inputs are -1.0, the result saturates to 0x7FFF_FFFF. Each 32-bit product is then added to the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered:
    ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words EVX-form

evmhessfanw RT,RA,RB

    | 4 | RT | RA | RB | 1411 |
    0   6    11   16   21    31

    temp0:31 ← (RA)0:15 ×sf (RB)0:15
    if ((RA)0:15 = 0x8000) & ((RB)0:15 = 0x8000) then
        temp0:31 ← 0x7FFF_FFFF
        movh ← 1
    else
        movh ← 0
    temp0:63 ← EXTS((ACC)0:31) - EXTS(temp0:31)
    ovh ← (temp31 ⊕ temp32)
    RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    temp0:31 ← (RA)32:47 ×sf (RB)32:47
    if ((RA)32:47 = 0x8000) & ((RB)32:47 = 0x8000) then
        temp0:31 ← 0x7FFF_FFFF
        movl ← 1
    else
        movl ← 0
    temp0:63 ← EXTS((ACC)32:63) - EXTS(temp0:31)
    ovl ← (temp31 ⊕ temp32)
    RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    ACC0:63 ← (RT)0:63
    SPEFSCROVH ← ovh | movh
    SPEFSCROV ← ovl | movl
    SPEFSCRSOVH ← SPEFSCRSOVH | ovh | movh
    SPEFSCRSOV ← SPEFSCRSOV | ovl | movl

The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied producing a 32-bit product. If both inputs are -1.0, the result saturates to 0x7FFF_FFFF. Each 32-bit product is then subtracted from the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered:
    ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words EVX-form

evmhessiaaw RT,RA,RB

    | 4 | RT | RA | RB | 1281 |
    0   6    11   16   21    31

    temp0:31 ← (RA)0:15 ×si (RB)0:15
    temp0:63 ← EXTS((ACC)0:31) + EXTS(temp0:31)
    ovh ← (temp31 ⊕ temp32)
    RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    temp0:31 ← (RA)32:47 ×si (RB)32:47
    temp0:63 ← EXTS((ACC)32:63) + EXTS(temp0:31)
    ovl ← (temp31 ⊕ temp32)
    RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    ACC0:63 ← (RT)0:63
    SPEFSCROVH ← ovh
    SPEFSCROV ← ovl
    SPEFSCRSOVH ← SPEFSCRSOVH | ovh
    SPEFSCRSOV ← SPEFSCRSOV | ovl

The corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then added to the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered:
    ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words EVX-form

evmhessianw RT,RA,RB

    | 4 | RT | RA | RB | 1409 |
    0   6    11   16   21    31

    temp0:31 ← (RA)0:15 ×si (RB)0:15
    temp0:63 ← EXTS((ACC)0:31) - EXTS(temp0:31)
    ovh ← (temp31 ⊕ temp32)
    RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    temp0:31 ← (RA)32:47 ×si (RB)32:47
    temp0:63 ← EXTS((ACC)32:63) - EXTS(temp0:31)
    ovl ← (temp31 ⊕ temp32)
    RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
    ACC0:63 ← (RT)0:63
    SPEFSCROVH ← ovh
    SPEFSCROV ← ovl
    SPEFSCRSOVH ← SPEFSCRSOVH | ovh
    SPEFSCRSOV ← SPEFSCRSOV | ovl

The corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then subtracted from the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered:
    ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer EVX-form

evmheumi RT,RA,RB

    | 4 | RT | RA | RB | 1032 |
    0   6    11   16   21    31

    RT0:31 ← (RA)0:15 ×ui (RB)0:15
    RT32:63 ← (RA)32:47 ×ui (RB)32:47

The corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT.

Special Registers Altered:
    None

Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator EVX-form

evmheumia RT,RA,RB

    | 4 | RT | RA | RB | 1064 |
    0   6    11   16   21    31

    RT0:31 ← (RA)0:15 ×ui (RB)0:15
    RT32:63 ← (RA)32:47 ×ui (RB)32:47
    ACC0:63 ← (RT)0:63

The corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied. The two 32-bit products are placed into RT and into the accumulator.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words EVX-form

evmheumiaaw RT,RA,RB

    | 4 | RT | RA | RB | 1288 |
    0   6    11   16   21    31

    temp0:31 ← (RA)0:15 ×ui (RB)0:15
    RT0:31 ← (ACC)0:31 + temp0:31
    temp0:31 ← (RA)32:47 ×ui (RB)32:47
    RT32:63 ← (ACC)32:63 + temp0:31
    ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied. Each intermediate product is added to the contents of the corresponding accumulator words and the sums are placed into the corresponding RT and accumulator words.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words EVX-form

evmheumianw RT,RA,RB

    | 4 | RT | RA | RB | 1416 |
    0   6    11   16   21    31

    temp0:31 ← (RA)0:15 ×ui (RB)0:15
    RT0:31 ← (ACC)0:31 - temp0:31
    temp0:31 ← (RA)32:47 ×ui (RB)32:47
    RT32:63 ← (ACC)32:63 - temp0:31
    ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied. Each intermediate product is subtracted from the contents of the corresponding accumulator words. The differences are placed into the corresponding RT and accumulator words.

Special Registers Altered:
    ACC
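In contrast to the modulo forms, the signed saturating accumulate forms (evmhessiaaw, evmhessianw) detect overflow by comparing bits 31 and 32 of the 64-bit intermediate value. The sketch below models one word of that computation. It rests on an assumption: SATURATE(ov, carry, neg_sat, pos_sat, value) is taken to return value when ov is 0, and otherwise neg_sat when carry is 1 and pos_sat when carry is 0; that definition is not given in this section. The function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def to_signed(value, width):
    """Interpret a width-bit pattern as a two's-complement integer."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign

def saturate(ov, carry, neg_sat, pos_sat, value):
    """Assumed meaning of SATURATE: pick a limit on overflow, else the value."""
    if ov:
        return neg_sat if carry else pos_sat
    return value

def sat_accumulate_word(acc_word, product, negate=False):
    """One word of evmhessiaaw (negate=False) or evmhessianw (negate=True).

    acc_word is a 32-bit pattern; product is the signed 32-bit product.
    Returns (result_word, ov) where ov is the overflow bit for this word.
    """
    a = to_signed(acc_word, 32)                    # EXTS((ACC)word)
    temp = (a - product if negate else a + product) & MASK64
    temp31 = (temp >> 32) & 1                      # bit 31, big-endian numbering
    temp32 = (temp >> 31) & 1                      # bit 32: MSB of the low word
    ov = temp31 ^ temp32
    word = saturate(ov, temp31, 0x8000_0000, 0x7FFF_FFFF, temp & MASK32)
    return word, ov
```

For example, adding a product of 6 to an accumulator word of 0x7FFF_FFFF overflows and clamps to 0x7FFF_FFFF, while subtracting 1 from 0x8000_0000 clamps to 0x8000_0000. In the instruction, the returned ov would feed OVH or OV and be ORed into the sticky SOVH or SOV bit.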
Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate into Words EVX-form

evmheusiaaw RT,RA,RB

    | 4 | RT | RA | RB | 1280 |
    0   6    11   16   21    31

    temp0:31 ← (RA)0:15 ×ui (RB)0:15
    temp0:63 ← EXTZ((ACC)0:31) + EXTZ(temp0:31)
    ovh ← temp31
    RT0:31 ← SATURATE(ovh, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
    temp0:31 ← (RA)32:47 ×ui (RB)32:47
    temp0:63 ← EXTZ((ACC)32:63) + EXTZ(temp0:31)
    ovl ← temp31
    RT32:63 ← SATURATE(ovl, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
    ACC0:63 ← (RT)0:63
    SPEFSCROVH ← ovh
    SPEFSCROV ← ovl
    SPEFSCRSOVH ← SPEFSCRSOVH | ovh
    SPEFSCRSOV ← SPEFSCRSOV | ovl

For each word element in the accumulator, corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then added to the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered:
    ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words EVX-form

evmheusianw RT,RA,RB

    | 4 | RT | RA | RB | 1408 |
    0   6    11   16   21    31

    temp0:31 ← (RA)0:15 ×ui (RB)0:15
    temp0:63 ← EXTZ((ACC)0:31) - EXTZ(temp0:31)
    ovh ← temp31
    RT0:31 ← SATURATE(ovh, 0, 0x0000_0000, 0x0000_0000, temp32:63)
    temp0:31 ← (RA)32:47 ×ui (RB)32:47
    temp0:63 ← EXTZ((ACC)32:63) - EXTZ(temp0:31)
    ovl ← temp31
    RT32:63 ← SATURATE(ovl, 0, 0x0000_0000, 0x0000_0000, temp32:63)
    ACC0:63 ← (RT)0:63
    SPEFSCROVH ← ovh
    SPEFSCROV ← ovl
    SPEFSCRSOVH ← SPEFSCRSOVH | ovh
    SPEFSCRSOV ← SPEFSCRSOV | ovl

For each word element in the accumulator, corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then subtracted from the corresponding word in the accumulator, saturating if overflow occurs, and the result is placed in RT and the accumulator.
Special Registers Altered:
    ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate EVX-form

evmhogsmfaa RT,RA,RB

    | 4 | RT | RA | RB | 1327 |
    0   6    11   16   21    31

    temp0:63 ← (RA)48:63 ×gsf (RB)48:63
    RT0:63 ← (ACC)0:63 + temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low odd-numbered, halfword signed fractional elements in RA and RB are multiplied using guarded signed fractional multiplication producing a sign extended 64-bit fractional product with the decimal between bits 32 and 33. The product is added to the contents of the 64-bit accumulator and the result is placed in RT and the accumulator.

Special Registers Altered:
    ACC

Note
    If the two input operands are both -1.0, the intermediate product is represented as +1.0.

Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative EVX-form

evmhogsmfan RT,RA,RB

    | 4 | RT | RA | RB | 1455 |
    0   6    11   16   21    31

    temp0:63 ← (RA)48:63 ×gsf (RB)48:63
    RT0:63 ← (ACC)0:63 - temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low odd-numbered, halfword signed fractional elements in RA and RB are multiplied using guarded signed fractional multiplication producing a sign extended 64-bit fractional product with the decimal between bits 32 and 33. The product is subtracted from the contents of the 64-bit accumulator and the result is placed in RT and the accumulator.

Special Registers Altered:
    ACC

Note
    If the two input operands are both -1.0, the intermediate product is represented as +1.0.
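The guarded fractional forms can be illustrated with a small model. It assumes that ×gsf yields the signed 16 × 16 product shifted left by one bit and sign-extended to 64 bits, which leaves 31 fraction bits (a point between bits 32 and 33) and lets -1.0 × -1.0 come out as +1.0. This reading is inferred from the description and Note above rather than quoted from the operator's definition, and the function names are hypothetical.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def field(reg, first, last):
    """Bits first..last of a 64-bit value, big-endian numbering (bit 0 = MSB)."""
    width = last - first + 1
    return (reg >> (63 - last)) & ((1 << width) - 1)

def to_signed(value, width):
    """Interpret a width-bit pattern as a two's-complement integer."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign

def gsf_multiply(a16, b16):
    """Assumed ×gsf: signed product, shifted left 1, kept as a 64-bit pattern."""
    product = to_signed(a16, 16) * to_signed(b16, 16)
    return (product << 1) & MASK64

def evmhogsmf_acc(ra, rb, acc, negate=False):
    """Model evmhogsmfaa (negate=False) or evmhogsmfan (negate=True)."""
    temp = gsf_multiply(field(ra, 48, 63), field(rb, 48, 63))
    return (acc - temp if negate else acc + temp) & MASK64

def as_fraction(value64):
    """Read a 64-bit result as a signed number with 31 fraction bits."""
    return to_signed(value64, 64) / 2**31
```

Under these assumptions, two operands of 0x8000 (-1.0) with a zero accumulator give 0x0000_0000_8000_0000, which as_fraction reads as +1.0. The upper 33 bits act as guard bits, so no saturation is needed.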
Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate EVX-form

evmhogsmiaa RT,RA,RB

    | 4 | RT | RA | RB | 1325 |
    0   6    11   16   21    31

    temp0:31 ← (RA)48:63 ×si (RB)48:63
    temp0:63 ← EXTS(temp0:31)
    RT0:63 ← (ACC)0:63 + temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low odd-numbered halfword signed-integer elements in RA and RB are multiplied. The intermediate product is sign-extended to 64 bits then added to the contents of the 64-bit accumulator, and the result is placed in RT and into the accumulator.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative EVX-form

evmhogsmian RT,RA,RB

    | 4 | RT | RA | RB | 1453 |
    0   6    11   16   21    31

    temp0:31 ← (RA)48:63 ×si (RB)48:63
    temp0:63 ← EXTS(temp0:31)
    RT0:63 ← (ACC)0:63 - temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low odd-numbered halfword signed-integer elements in RA and RB are multiplied. The intermediate product is sign-extended to 64 bits then subtracted from the contents of the 64-bit accumulator, and the result is placed in RT and into the accumulator.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate EVX-form

evmhogumiaa RT,RA,RB

    | 4 | RT | RA | RB | 1324 |
    0   6    11   16   21    31

    temp0:31 ← (RA)48:63 ×ui (RB)48:63
    temp0:63 ← EXTZ(temp0:31)
    RT0:63 ← (ACC)0:63 + temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. The intermediate product is zero-extended to 64 bits then added to the contents of the 64-bit accumulator, and the result is placed in RT and into the accumulator.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative EVX-form

evmhogumian RT,RA,RB

    | 4 | RT | RA | RB | 1452 |
    0   6    11   16   21    31

    temp0:31 ← (RA)48:63 ×ui (RB)48:63
    temp0:63 ← EXTZ(temp0:31)
    RT0:63 ← (ACC)0:63 - temp0:63
    ACC0:63 ← (RT)0:63

The corresponding low odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. The intermediate product is zero-extended to 64 bits then subtracted from the contents of the 64-bit accumulator, and the result is placed in RT and into the accumulator.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional EVX-form

evmhosmf RT,RA,RB

    | 4 | RT | RA | RB | 1039 |
    0   6    11   16   21    31

    RT0:31 ← (RA)16:31 ×sf (RB)16:31
    RT32:63 ← (RA)48:63 ×sf (RB)48:63

The corresponding odd-numbered, halfword signed fractional elements in RA and RB are multiplied. Each product is placed into the corresponding words of RT.

Special Registers Altered:
    None

Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional to Accumulator EVX-form

evmhosmfa RT,RA,RB

    | 4 | RT | RA | RB | 1071 |
    0   6    11   16   21    31

    RT0:31 ← (RA)16:31 ×sf (RB)16:31
    RT32:63 ← (RA)48:63 ×sf (RB)48:63
    ACC0:63 ← (RT)0:63

The corresponding odd-numbered, halfword signed fractional elements in RA and RB are multiplied. Each product is placed into the corresponding words of RT and into the accumulator.
Special Registers Altered:
    ACC

Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words EVX-form

evmhosmfaaw RT,RA,RB

    | 4 | RT | RA | RB | 1295 |
    0   6    11   16   21    31

    temp0:31 ← (RA)16:31 ×sf (RB)16:31
    RT0:31 ← (ACC)0:31 + temp0:31
    temp0:31 ← (RA)48:63 ×sf (RB)48:63
    RT32:63 ← (ACC)32:63 + temp0:31
    ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each intermediate product are added to the contents of the corresponding accumulator word and the results are placed into the corresponding RT words and into the accumulator.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words EVX-form

evmhosmfanw RT,RA,RB

    | 4 | RT | RA | RB | 1423 |
    0   6    11   16   21    31

    temp0:31 ← (RA)16:31 ×sf (RB)16:31
    RT0:31 ← (ACC)0:31 - temp0:31
    temp0:31 ← (RA)48:63 ×sf (RB)48:63
    RT32:63 ← (ACC)32:63 - temp0:31
    ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each intermediate product are subtracted from the contents of the corresponding accumulator word and the results are placed into the corresponding RT words and into the accumulator.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Odd, Signed, Modulo, Integer EVX-form

evmhosmi RT,RA,RB

    | 4 | RT | RA | RB | 1037 |
    0   6    11   16   21    31

    RT0:31 ← (RA)16:31 ×si (RB)16:31
    RT32:63 ← (RA)48:63 ×si (RB)48:63

The corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT.

Special Registers Altered:
    None

Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator EVX-form

evmhosmia RT,RA,RB

    | 4 | RT | RA | RB | 1069 |
    0   6    11   16   21    31

    RT0:31 ← (RA)16:31 ×si (RB)16:31
    RT32:63 ← (RA)48:63 ×si (RB)48:63
    ACC0:63 ← (RT)0:63

The corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT and into the accumulator.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words EVX-form

evmhosmiaaw RT,RA,RB

    | 4 | RT | RA | RB | 1293 |
    0   6    11   16   21    31

    temp0:31 ← (RA)16:31 ×si (RB)16:31
    RT0:31 ← (ACC)0:31 + temp0:31
    temp0:31 ← (RA)48:63 ×si (RB)48:63
    RT32:63 ← (ACC)32:63 + temp0:31
    ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied. Each intermediate 32-bit product is added to the contents of the corresponding accumulator word and the results are placed into the corresponding RT words and into the accumulator.

Special Registers Altered:
    ACC

Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words EVX-form

evmhosmianw RT,RA,RB

    | 4 | RT | RA | RB | 1421 |
    0   6    11   16   21    31

    temp0:31 ← (RA)16:31 ×si (RB)16:31
    RT0:31 ← (ACC)0:31 - temp0:31
    temp0:31 ← (RA)48:63 ×si (RB)48:63
    RT32:63 ← (ACC)32:63 - temp0:31
    ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied. Each intermediate 32-bit product is subtracted from the contents of the corresponding accumulator word and the results are placed into the corresponding RT words and into the accumulator.
Special Registers Altered: Special Registers Altered: ACC ACC 292 Power ISATM I Version 2.05 Vector Multiply Halfwords, Odd, Signed, Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional EVX-form Saturate, Fractional to Accumulator EVX-form evmhossf RT,RA,RB evmhossfa RT,RA,RB 4 RT RA RB 1031 0 6 11 16 21 31 4 RT RA RB 1063 0 6 11 16 21 31 temp0:31 1 (RA)16:31 ×sf (RB)16:31 if ((RA)16:31 = 0x8000) & ((RB)16:31 = 0x8000) then temp0:31 1 (RA)16:31 ×sf (RB)16:31 RT0:31 1 0x7FFF_FFFF if ((RA)16:31 = 0x8000) & ((RB)16:31 = 0x8000) then movh 1 1 RT0:31 1 0x7FFF_FFFF else movh 1 1 RT0:31 1 temp0:31 else movh 1 0 RT0:31 1 temp0:31 temp0:31 1 (RA)48:63 ×sf (RB)48:63 movh 1 0 if ((RA)48:63 = 0x8000) & ((RB)48:63 = 0x8000) then temp0:31 1 (RA)48:63 ×sf (RB)48:63 RT32:63 1 0x7FFF_FFFF if ((RA)48:63 = 0x8000) & ((RB)48:63 = 0x8000) then movl 1 1 RT32:63 1 0x7FFF_FFFF else movl 1 1 RT32:63 1 temp0:31 else movl 1 0 RT32:63 1 temp0:31 SPEFSCROVH 1 movh movl 1 0 SPEFSCROV 1 movl ACC0:63 1 (RT)0:63 SPEFSCRSOVH 1 SPEFSCRSOVH | movh SPEFSCROVH 1 movh SPEFSCRSOV 1 SPEFSCRSOV | movl SPEFSCROV 1 movl SPEFSCRSOVH 1 SPEFSCRSOVH | movh The corresponding odd-numbered halfword signed SPEFSCRSOV 1 SPEFSCRSOV | movl fractional elements in RA and RB are multiplied. The 32 bits of each product are placed into the corresponding The corresponding odd-numbered halfword signed words of RT. If both inputs are -1.0, the result saturates fractional elements in RA and RB are multiplied. The 32 to the largest positive signed fraction. bits of each product are placed into the corresponding words of RT and into the accumulator. If both inputs are Special Registers Altered: -1.0, the result saturates to the largest positive signed OV OVH SOV SOVH fraction. Special Registers Altered: ACC OV OVH SOV SOVH Chapter 7. 
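The saturation rule given above for evmhossf can be illustrated with a small Python model. This is a sketch only, not part of the architecture. It assumes that ×sf on two 16-bit signed fractions yields the signed integer product shifted left by one bit, it models a 64-bit register as a Python integer whose bit 0 is the most significant bit, and the helper names (to_signed16, mul_sf16_sat, evmhossf) are hypothetical.

```python
MASK32 = 0xFFFF_FFFF

def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement halfword."""
    x &= 0xFFFF
    return x - 0x1_0000 if x & 0x8000 else x

def mul_sf16_sat(a, b):
    """Model one halfword pair: return (32-bit result, mov flag)."""
    # -1.0 x -1.0 is the only case that cannot be represented.
    if (a & 0xFFFF) == 0x8000 and (b & 0xFFFF) == 0x8000:
        return 0x7FFF_FFFF, 1
    # Assumption: the fractional product is the integer product << 1.
    prod = (to_signed16(a) * to_signed16(b)) << 1
    return prod & MASK32, 0

def evmhossf(ra, rb):
    """Return (RT, movh, movl) for 64-bit register values ra and rb."""
    # ra >> 32 exposes the high word; its low halfword is bits 16:31.
    hi, movh = mul_sf16_sat(ra >> 32, rb >> 32)
    # The low halfword of the register is bits 48:63.
    lo, movl = mul_sf16_sat(ra, rb)
    return (hi << 32) | lo, movh, movl

rt, movh, movl = evmhossf(0x0000_8000_0000_4000, 0x0000_8000_0000_4000)
print(hex(rt), movh, movl)
```

In this model the high element multiplies -1.0 by -1.0 and saturates (movh = 1), while the low element multiplies 0.5 by 0.5 and produces 0.25 without saturation (movl = 0).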
Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate into Words EVX-form

evmhossfaaw RT,RA,RB

4 RT RA RB 1287
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×sf (RB)16:31
if ((RA)16:31 = 0x8000) & ((RB)16:31 = 0x8000) then
    temp0:31 ← 0x7FFF_FFFF
    movh ← 1
else
    movh ← 0
temp0:63 ← EXTS((ACC)0:31) + EXTS(temp0:31)
ovh ← (temp31 ⊕ temp32)
RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
temp0:31 ← (RA)48:63 ×sf (RB)48:63
if ((RA)48:63 = 0x8000) & ((RB)48:63 = 0x8000) then
    temp0:31 ← 0x7FFF_FFFF
    movl ← 1
else
    movl ← 0
temp0:63 ← EXTS((ACC)32:63) + EXTS(temp0:31)
ovl ← (temp31 ⊕ temp32)
RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh | movh
SPEFSCROV ← ovl | movl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh | movh
SPEFSCRSOV ← SPEFSCRSOV | ovl | movl

The corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied producing a 32-bit product. If both inputs are -1.0, the result saturates to 0x7FFF_FFFF. Each 32-bit product is then added to the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words EVX-form

evmhossfanw RT,RA,RB

4 RT RA RB 1415
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×sf (RB)16:31
if ((RA)16:31 = 0x8000) & ((RB)16:31 = 0x8000) then
    temp0:31 ← 0x7FFF_FFFF
    movh ← 1
else
    movh ← 0
temp0:63 ← EXTS((ACC)0:31) - EXTS(temp0:31)
ovh ← (temp31 ⊕ temp32)
RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
temp0:31 ← (RA)48:63 ×sf (RB)48:63
if ((RA)48:63 = 0x8000) & ((RB)48:63 = 0x8000) then
    temp0:31 ← 0x7FFF_FFFF
    movl ← 1
else
    movl ← 0
temp0:63 ← EXTS((ACC)32:63) - EXTS(temp0:31)
ovl ← (temp31 ⊕ temp32)
RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh | movh
SPEFSCROV ← ovl | movl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh | movh
SPEFSCRSOV ← SPEFSCRSOV | ovl | movl

The corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied producing a 32-bit product. If both inputs are -1.0, the result saturates to 0x7FFF_FFFF. Each 32-bit product is then subtracted from the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words EVX-form

evmhossiaaw RT,RA,RB

4 RT RA RB 1285
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×si (RB)16:31
temp0:63 ← EXTS((ACC)0:31) + EXTS(temp0:31)
ovh ← (temp31 ⊕ temp32)
RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
temp0:31 ← (RA)48:63 ×si (RB)48:63
temp0:63 ← EXTS((ACC)32:63) + EXTS(temp0:31)
ovl ← (temp31 ⊕ temp32)
RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

The corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then added to the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words EVX-form

evmhossianw RT,RA,RB

4 RT RA RB 1413
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×si (RB)16:31
temp0:63 ← EXTS((ACC)0:31) - EXTS(temp0:31)
ovh ← (temp31 ⊕ temp32)
RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
temp0:31 ← (RA)48:63 ×si (RB)48:63
temp0:63 ← EXTS((ACC)32:63) - EXTS(temp0:31)
ovl ← (temp31 ⊕ temp32)
RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

The corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then subtracted from the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer EVX-form

evmhoumi RT,RA,RB

4 RT RA RB 1036
0 6 11 16 21 31

RT0:31 ← (RA)16:31 ×ui (RB)16:31
RT32:63 ← (RA)48:63 ×ui (RB)48:63

The corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT.

Special Registers Altered: None

Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator EVX-form

evmhoumia RT,RA,RB

4 RT RA RB 1068
0 6 11 16 21 31

RT0:31 ← (RA)16:31 ×ui (RB)16:31
RT32:63 ← (RA)48:63 ×ui (RB)48:63
ACC0:63 ← (RT)0:63

The corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. The two 32-bit products are placed into RT and into the accumulator.

Special Registers Altered: ACC
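The signed saturating accumulate step used by the evmhossiaaw and evmhossianw forms above can be modelled bit for bit in Python. This is a sketch under stated assumptions: SATURATE(ov, carry, sat_ovn, sat_ov, val) is taken to return sat_ovn when ov and carry are both set, sat_ov when only ov is set, and val otherwise, which is how this editor reads the helper used in the RTL. The names exts32, saturate and sat_acc_word are hypothetical.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def exts32(x):
    """Sign-extend the low 32 bits of x to a Python integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x

def saturate(ov, carry, sat_ovn, sat_ov, val):
    """Assumed behaviour of the SATURATE helper used in the RTL."""
    if ov:
        return sat_ovn if carry else sat_ov
    return val

def sat_acc_word(acc_word, product, negate=False):
    """Accumulate one 32-bit product into one accumulator word.

    Returns (new word, overflow flag). negate=True models the
    'Accumulate Negative' forms.
    """
    p = exts32(product)
    temp = (exts32(acc_word) + (-p if negate else p)) & MASK64
    t31 = (temp >> 32) & 1  # temp31: lowest bit of the upper word
    t32 = (temp >> 31) & 1  # temp32: sign bit of the lower word
    ov = t31 ^ t32          # the two bits differ exactly on overflow
    word = saturate(ov, t31, 0x8000_0000, 0x7FFF_FFFF, temp & MASK32)
    return word, ov

print(sat_acc_word(0x7FFF_FFFF, 1))               # positive overflow
print(sat_acc_word(0x8000_0000, 1, negate=True))  # negative overflow
```

Because both operands are sign-extended to 64 bits before the addition, bit 31 of the sum carries the true sign, so comparing it with bit 32 detects overflow and also selects the saturation value.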
Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words EVX-form

evmhoumiaaw RT,RA,RB

4 RT RA RB 1292
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×ui (RB)16:31
RT0:31 ← (ACC)0:31 + temp0:31
temp0:31 ← (RA)48:63 ×ui (RB)48:63
RT32:63 ← (ACC)32:63 + temp0:31
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. Each intermediate product is added to the contents of the corresponding accumulator word. The sums are placed into the corresponding RT and accumulator words.

Special Registers Altered: ACC

Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words EVX-form

evmhoumianw RT,RA,RB

4 RT RA RB 1420
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×ui (RB)16:31
RT0:31 ← (ACC)0:31 - temp0:31
temp0:31 ← (RA)48:63 ×ui (RB)48:63
RT32:63 ← (ACC)32:63 - temp0:31
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. Each intermediate product is subtracted from the contents of the corresponding accumulator word. The results are placed into the corresponding RT and accumulator words.

Special Registers Altered: ACC

Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words EVX-form

evmhousiaaw RT,RA,RB

4 RT RA RB 1284
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×ui (RB)16:31
temp0:63 ← EXTZ((ACC)0:31) + EXTZ(temp0:31)
ovh ← temp31
RT0:31 ← SATURATE(ovh, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
temp0:31 ← (RA)48:63 ×ui (RB)48:63
temp0:63 ← EXTZ((ACC)32:63) + EXTZ(temp0:31)
ovl ← temp31
RT32:63 ← SATURATE(ovl, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

For each word element in the accumulator, corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then added to the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words EVX-form

evmhousianw RT,RA,RB

4 RT RA RB 1412
0 6 11 16 21 31

temp0:31 ← (RA)16:31 ×ui (RB)16:31
temp0:63 ← EXTZ((ACC)0:31) - EXTZ(temp0:31)
ovh ← temp31
RT0:31 ← SATURATE(ovh, 0, 0x0000_0000, 0x0000_0000, temp32:63)
temp0:31 ← (RA)48:63 ×ui (RB)48:63
temp0:63 ← EXTZ((ACC)32:63) - EXTZ(temp0:31)
ovl ← temp31
RT32:63 ← SATURATE(ovl, 0, 0x0000_0000, 0x0000_0000, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

For each word element in the accumulator, corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then subtracted from the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Initialize Accumulator EVX-form

evmra RT,RA

4 RT RA /// 1220
0 6 11 16 21 31

ACC0:63 ← (RA)0:63
RT0:63 ← (RA)0:63

The contents of RA are placed into the accumulator and RT. This is the method for initializing the accumulator.

Special Registers Altered: ACC

Vector Multiply Word High Signed, Modulo, Fractional EVX-form

evmwhsmf RT,RA,RB

4 RT RA RB 1103
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×sf (RB)0:31
RT0:31 ← temp0:31
temp0:63 ← (RA)32:63 ×sf (RB)32:63
RT32:63 ← temp0:31

The corresponding word signed fractional elements in RA and RB are multiplied and bits 0:31 of the two products are placed into the two corresponding words of RT.

Special Registers Altered: None

Vector Multiply Word High Signed, Modulo, Fractional to Accumulator EVX-form

evmwhsmfa RT,RA,RB

4 RT RA RB 1135
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×sf (RB)0:31
RT0:31 ← temp0:31
temp0:63 ← (RA)32:63 ×sf (RB)32:63
RT32:63 ← temp0:31
ACC0:63 ← (RT)0:63

The corresponding word signed fractional elements in RA and RB are multiplied and bits 0:31 of the two products are placed into the two corresponding words of RT and into the accumulator.

Special Registers Altered: ACC

Vector Multiply Word High Signed, Modulo, Integer EVX-form

evmwhsmi RT,RA,RB

4 RT RA RB 1101
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×si (RB)0:31
RT0:31 ← temp0:31
temp0:63 ← (RA)32:63 ×si (RB)32:63
RT32:63 ← temp0:31

The corresponding word signed-integer elements in RA and RB are multiplied. Bits 0:31 of the two 64-bit products are placed into the two corresponding words of RT.

Special Registers Altered: None

Vector Multiply Word High Signed, Modulo, Integer to Accumulator EVX-form

evmwhsmia RT,RA,RB

4 RT RA RB 1133
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×si (RB)0:31
RT0:31 ← temp0:31
temp0:63 ← (RA)32:63 ×si (RB)32:63
RT32:63 ← temp0:31
ACC0:63 ← (RT)0:63

The corresponding word signed-integer elements in RA and RB are multiplied. Bits 0:31 of the two 64-bit products are placed into the two corresponding words of RT and into the accumulator.

Special Registers Altered: ACC

Vector Multiply Word High Signed, Saturate, Fractional EVX-form

evmwhssf RT,RA,RB

4 RT RA RB 1095
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×sf (RB)0:31
if ((RA)0:31 = 0x8000_0000) & ((RB)0:31 = 0x8000_0000) then
    RT0:31 ← 0x7FFF_FFFF
    movh ← 1
else
    RT0:31 ← temp0:31
    movh ← 0
temp0:63 ← (RA)32:63 ×sf (RB)32:63
if ((RA)32:63 = 0x8000_0000) & ((RB)32:63 = 0x8000_0000) then
    RT32:63 ← 0x7FFF_FFFF
    movl ← 1
else
    RT32:63 ← temp0:31
    movl ← 0
SPEFSCROVH ← movh
SPEFSCROV ← movl
SPEFSCRSOVH ← SPEFSCRSOVH | movh
SPEFSCRSOV ← SPEFSCRSOV | movl

The corresponding word signed fractional elements in RA and RB are multiplied. Bits 0:31 of each product are placed into the corresponding words of RT. If both inputs are -1.0, the result saturates to the largest positive signed fraction.

Special Registers Altered: OV OVH SOV SOVH

Vector Multiply Word High Signed, Saturate, Fractional to Accumulator EVX-form

evmwhssfa RT,RA,RB

4 RT RA RB 1127
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×sf (RB)0:31
if ((RA)0:31 = 0x8000_0000) & ((RB)0:31 = 0x8000_0000) then
    RT0:31 ← 0x7FFF_FFFF
    movh ← 1
else
    RT0:31 ← temp0:31
    movh ← 0
temp0:63 ← (RA)32:63 ×sf (RB)32:63
if ((RA)32:63 = 0x8000_0000) & ((RB)32:63 = 0x8000_0000) then
    RT32:63 ← 0x7FFF_FFFF
    movl ← 1
else
    RT32:63 ← temp0:31
    movl ← 0
ACC0:63 ← (RT)0:63
SPEFSCROVH ← movh
SPEFSCROV ← movl
SPEFSCRSOVH ← SPEFSCRSOVH | movh
SPEFSCRSOV ← SPEFSCRSOV | movl

The corresponding word signed fractional elements in RA and RB are multiplied. Bits 0:31 of each product are placed into the corresponding words of RT and into the accumulator. If both inputs are -1.0, the result saturates to the largest positive signed fraction.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Word High Unsigned, Modulo, Integer EVX-form

evmwhumi RT,RA,RB

4 RT RA RB 1100
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
RT0:31 ← temp0:31
temp0:63 ← (RA)32:63 ×ui (RB)32:63
RT32:63 ← temp0:31

The corresponding word unsigned-integer elements in RA and RB are multiplied. Bits 0:31 of the two products are placed into the two corresponding words of RT.

Special Registers Altered: None

Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator EVX-form

evmwhumia RT,RA,RB

4 RT RA RB 1132
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
RT0:31 ← temp0:31
temp0:63 ← (RA)32:63 ×ui (RB)32:63
RT32:63 ← temp0:31
ACC0:63 ← (RT)0:63

The corresponding word unsigned-integer elements in RA and RB are multiplied. Bits 0:31 of the two products are placed into the two corresponding words of RT and into the accumulator.

Special Registers Altered: ACC

Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words EVX-form

evmwlsmiaaw RT,RA,RB

4 RT RA RB 1353
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×si (RB)0:31
RT0:31 ← (ACC)0:31 + temp32:63
temp0:63 ← (RA)32:63 ×si (RB)32:63
RT32:63 ← (ACC)32:63 + temp32:63
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding word signed-integer elements in RA and RB are multiplied. The least significant 32 bits of each intermediate product are added to the contents of the corresponding accumulator words, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC

Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words EVX-form

evmwlsmianw RT,RA,RB

4 RT RA RB 1481
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×si (RB)0:31
RT0:31 ← (ACC)0:31 - temp32:63
temp0:63 ← (RA)32:63 ×si (RB)32:63
RT32:63 ← (ACC)32:63 - temp32:63
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding word elements in RA and RB are multiplied. The least significant 32 bits of each intermediate product are subtracted from the contents of the corresponding accumulator words and the result is placed in RT and the accumulator.

Special Registers Altered: ACC

Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words EVX-form

evmwlssiaaw RT,RA,RB

4 RT RA RB 1345
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×si (RB)0:31
temp0:63 ← EXTS((ACC)0:31) + EXTS(temp32:63)
ovh ← (temp31 ⊕ temp32)
RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
temp0:63 ← (RA)32:63 ×si (RB)32:63
temp0:63 ← EXTS((ACC)32:63) + EXTS(temp32:63)
ovl ← (temp31 ⊕ temp32)
RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

The corresponding word signed-integer elements in RA and RB are multiplied producing a 64-bit product. The least significant 32 bits of each product are then added to the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words EVX-form

evmwlssianw RT,RA,RB

4 RT RA RB 1473
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×si (RB)0:31
temp0:63 ← EXTS((ACC)0:31) - EXTS(temp32:63)
ovh ← (temp31 ⊕ temp32)
RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
temp0:63 ← (RA)32:63 ×si (RB)32:63
temp0:63 ← EXTS((ACC)32:63) - EXTS(temp32:63)
ovl ← (temp31 ⊕ temp32)
RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

The corresponding word signed-integer elements in RA and RB are multiplied producing a 64-bit product. The least significant 32 bits of each product are then subtracted from the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Word Low Unsigned, Modulo, Integer EVX-form

evmwlumi RT,RA,RB

4 RT RA RB 1096
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
RT0:31 ← temp32:63
temp0:63 ← (RA)32:63 ×ui (RB)32:63
RT32:63 ← temp32:63

The corresponding word unsigned-integer elements in RA and RB are multiplied. The least significant 32 bits of each product are placed into the two corresponding words of RT.

Special Registers Altered: None

Programming Note
The least significant 32 bits of the product are independent of whether the word elements in RA and RB are treated as signed or unsigned 32-bit integers.

Note that evmwlumi can be used for signed or unsigned integers.

Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator EVX-form

evmwlumia RT,RA,RB

4 RT RA RB 1128
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
RT0:31 ← temp32:63
temp0:63 ← (RA)32:63 ×ui (RB)32:63
RT32:63 ← temp32:63
ACC0:63 ← (RT)0:63

The corresponding word unsigned-integer elements in RA and RB are multiplied. The least significant 32 bits of each product are placed into the two corresponding words of RT and into the accumulator.

Special Registers Altered: ACC

Programming Note
The least significant 32 bits of the product are independent of whether the word elements in RA and RB are treated as signed or unsigned 32-bit integers.

Note that evmwlumia can be used for signed or unsigned integers.
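The Programming Note above can be checked numerically. The following Python sketch (helper names exts32 and product_words are hypothetical) computes the 64-bit product of two 32-bit words under both the signed and the unsigned interpretation and compares the two halves. It relies only on two's-complement arithmetic, not on any SPE-specific behaviour.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def exts32(x):
    """Sign-extend the low 32 bits of x to a Python integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x

def product_words(a, b, signed):
    """Return (high word, low word) of the 64-bit product of a and b."""
    if signed:
        full = exts32(a) * exts32(b)
    else:
        full = (a & MASK32) * (b & MASK32)
    full &= MASK64  # wrap to 64 bits, two's complement
    return full >> 32, full & MASK32

samples = [0, 1, 2, 0x7FFF_FFFF, 0x8000_0000, 0xFFFF_FFFF, 0x1234_5678]
for a in samples:
    for b in samples:
        # The low words always agree; only the high words may differ.
        assert product_words(a, b, True)[1] == product_words(a, b, False)[1]

print(product_words(0xFFFF_FFFF, 2, signed=True))
print(product_words(0xFFFF_FFFF, 2, signed=False))
```

For 0xFFFF_FFFF times 2 the low word is 0xFFFF_FFFE in both cases, while the high word is 0xFFFF_FFFF when the operand is read as -1 and 1 when it is read as unsigned. This is why a single "low" multiply serves both interpretations, whereas the "high" multiplies come in signed and unsigned variants.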
Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words EVX-form

evmwlumiaaw RT,RA,RB

4 RT RA RB 1352
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
RT0:31 ← (ACC)0:31 + temp32:63
temp0:63 ← (RA)32:63 ×ui (RB)32:63
RT32:63 ← (ACC)32:63 + temp32:63
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding word unsigned-integer elements in RA and RB are multiplied. The least significant 32 bits of each product are added to the contents of the corresponding accumulator word and the result is placed in RT and the accumulator.

Special Registers Altered: ACC

Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words EVX-form

evmwlumianw RT,RA,RB

4 RT RA RB 1480
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
RT0:31 ← (ACC)0:31 - temp32:63
temp0:63 ← (RA)32:63 ×ui (RB)32:63
RT32:63 ← (ACC)32:63 - temp32:63
ACC0:63 ← (RT)0:63

For each word element in the accumulator, the corresponding word unsigned-integer elements in RA and RB are multiplied. The least significant 32 bits of each product are subtracted from the contents of the corresponding accumulator word and the result is placed in RT and the accumulator.

Special Registers Altered: ACC

Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate into Words EVX-form

evmwlusiaaw RT,RA,RB

4 RT RA RB 1344
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
temp0:63 ← EXTZ((ACC)0:31) + EXTZ(temp32:63)
ovh ← temp31
RT0:31 ← SATURATE(ovh, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
temp0:63 ← (RA)32:63 ×ui (RB)32:63
temp0:63 ← EXTZ((ACC)32:63) + EXTZ(temp32:63)
ovl ← temp31
RT32:63 ← SATURATE(ovl, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

For each word element in the accumulator, corresponding word unsigned-integer elements in RA and RB are multiplied producing a 64-bit product. The least significant 32 bits of each product are then added to the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words EVX-form

evmwlusianw RT,RA,RB

4 RT RA RB 1472
0 6 11 16 21 31

temp0:63 ← (RA)0:31 ×ui (RB)0:31
temp0:63 ← EXTZ((ACC)0:31) - EXTZ(temp32:63)
ovh ← temp31
RT0:31 ← SATURATE(ovh, 0, 0x0000_0000, 0x0000_0000, temp32:63)
temp0:63 ← (RA)32:63 ×ui (RB)32:63
temp0:63 ← EXTZ((ACC)32:63) - EXTZ(temp32:63)
ovl ← temp31
RT32:63 ← SATURATE(ovl, 0, 0x0000_0000, 0x0000_0000, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCROVH ← ovh
SPEFSCROV ← ovl
SPEFSCRSOVH ← SPEFSCRSOVH | ovh
SPEFSCRSOV ← SPEFSCRSOV | ovl

For each word element in the accumulator, corresponding word unsigned-integer elements in RA and RB are multiplied producing a 64-bit product. The least significant 32 bits of each product are then subtracted from the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Multiply Word Signed, Modulo, Fractional EVX-form

evmwsmf RT,RA,RB

4 RT RA RB 1115
0 6 11 16 21 31

RT0:63 ← (RA)32:63 ×sf (RB)32:63

The corresponding low word signed fractional elements in RA and RB are multiplied. The product is placed in RT.

Special Registers Altered: None

Vector Multiply Word Signed, Modulo, Fractional to Accumulator EVX-form

evmwsmfa RT,RA,RB

4 RT RA RB 1147
0 6 11 16 21 31

RT0:63 ← (RA)32:63 ×sf (RB)32:63
ACC0:63 ← (RT)0:63

The corresponding low word signed fractional elements in RA and RB are multiplied. The product is placed in RT and into the accumulator.

Special Registers Altered: ACC

Vector Multiply Word Signed, Modulo, Fractional and Accumulate EVX-form

evmwsmfaa RT,RA,RB

4 RT RA RB 1371
0 6 11 16 21 31

temp0:63 ← (RA)32:63 ×sf (RB)32:63
RT0:63 ← (ACC)0:63 + temp0:63
ACC0:63 ← (RT)0:63

The corresponding low word signed fractional elements in RA and RB are multiplied. The intermediate product is added to the contents of the 64-bit accumulator and the result is placed in RT and the accumulator.

Special Registers Altered: ACC

Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative EVX-form

evmwsmfan RT,RA,RB

4 RT RA RB 1499
0 6 11 16 21 31

temp0:63 ← (RA)32:63 ×sf (RB)32:63
RT0:63 ← (ACC)0:63 - temp0:63
ACC0:63 ← (RT)0:63

The corresponding low word signed fractional elements in RA and RB are multiplied. The intermediate product is subtracted from the contents of the accumulator and the result is placed in RT and the accumulator.

Special Registers Altered: ACC

Vector Multiply Word Signed, Modulo, Integer EVX-form

evmwsmi RT,RA,RB

4 RT RA RB 1113
0 6 11 16 21 31

RT0:63 ← (RA)32:63 ×si (RB)32:63

The low word signed-integer elements in RA and RB are multiplied. The product is placed in RT.

Special Registers Altered: None

Vector Multiply Word Signed, Modulo, Integer to Accumulator EVX-form

evmwsmia RT,RA,RB

4 RT RA RB 1145
0 6 11 16 21 31

RT0:63 ← (RA)32:63 ×si (RB)32:63
ACC0:63 ← (RT)0:63

The low word signed-integer elements in RA and RB are multiplied. The product is placed in RT and the accumulator.

Special Registers Altered: ACC

Vector Multiply Word Signed, Modulo, Integer and Accumulate EVX-form

evmwsmiaa RT,RA,RB

4 RT RA RB 1369
0 6 11 16 21 31

temp0:63 ← (RA)32:63 ×si (RB)32:63
RT0:63 ← (ACC)0:63 + temp0:63
ACC0:63 ← (RT)0:63

The low word signed-integer elements in RA and RB are multiplied. The intermediate product is added to the contents of the 64-bit accumulator and the result is placed in RT and the accumulator.

Special Registers Altered: ACC

Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative EVX-form

evmwsmian RT,RA,RB

4 RT RA RB 1497
0 6 11 16 21 31

temp0:63 ← (RA)32:63 ×si (RB)32:63
RT0:63 ← (ACC)0:63 - temp0:63
ACC0:63 ← (RT)0:63

The low word signed-integer elements in RA and RB are multiplied. The intermediate product is subtracted from the contents of the 64-bit accumulator and the result is placed in RT and the accumulator.

Special Registers Altered: ACC

Vector Multiply Word Signed, Saturate, Fractional EVX-form

evmwssf RT,RA,RB

4 RT RA RB 1107
0 6 11 16 21 31

temp0:63 ← (RA)32:63 ×sf (RB)32:63
if ((RA)32:63 = 0x8000_0000) & ((RB)32:63 = 0x8000_0000) then
    RT0:63 ← 0x7FFF_FFFF_FFFF_FFFF
    mov ← 1
else
    RT0:63 ← temp0:63
    mov ← 0
SPEFSCROVH ← 0
SPEFSCROV ← mov
SPEFSCRSOV ← SPEFSCRSOV | mov

The low word signed fractional elements in RA and RB are multiplied. The 64-bit product is placed in RT. If both inputs are -1.0, the result saturates to the largest positive signed fraction.

Special Registers Altered: OV OVH SOV

Vector Multiply Word Signed, Saturate, Fractional to Accumulator EVX-form

evmwssfa RT,RA,RB

4 RT RA RB 1139
0 6 11 16 21 31

temp0:63 ← (RA)32:63 ×sf (RB)32:63
if ((RA)32:63 = 0x8000_0000) & ((RB)32:63 = 0x8000_0000) then
    RT0:63 ← 0x7FFF_FFFF_FFFF_FFFF
    mov ← 1
else
    RT0:63 ← temp0:63
    mov ← 0
ACC0:63 ← (RT)0:63
SPEFSCROVH ← 0
SPEFSCROV ← mov
SPEFSCRSOV ← SPEFSCRSOV | mov

The low word signed fractional elements in RA and RB are multiplied. The 64-bit product is placed in RT and into the accumulator. If both inputs are -1.0, the result saturates to the largest positive signed fraction.

Special Registers Altered: ACC OV OVH SOV
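The unsigned saturating accumulate step of evmwlusiaaw and evmwlusianw, described earlier in this group, clamps in opposite directions for the two forms: to 0xFFFF_FFFF when an addition overflows and to 0x0000_0000 when a subtraction underflows. The Python sketch below models one accumulator word. The helper name usat_acc_word is hypothetical, and SATURATE is folded into a conditional on the assumption that it returns the saturation value whenever the overflow bit is set.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def usat_acc_word(acc_word, product_low32, negate=False):
    """Model one word of an unsigned saturating accumulate.

    Returns (new word, overflow flag). negate=True models the
    'Accumulate Negative' form.
    """
    a = acc_word & MASK32        # EXTZ((ACC) word)
    p = product_low32 & MASK32   # EXTZ(temp32:63)
    temp = (a - p if negate else a + p) & MASK64
    # temp31 is the lowest bit of the upper word: it is the carry out of
    # a 32-bit addition, or the borrow of a 32-bit subtraction.
    ov = (temp >> 32) & 1
    sat = 0x0000_0000 if negate else 0xFFFF_FFFF
    return (sat if ov else temp & MASK32), ov

print(usat_acc_word(0xFFFF_FFF0, 0x20))        # clamps high
print(usat_acc_word(0x10, 0x20, negate=True))  # clamps to zero
```

Because both operands are zero-extended, a single bit of the 64-bit intermediate is enough to detect overflow in either direction, which is why the unsigned RTL uses ovh ← temp31 rather than the two-bit comparison of the signed forms.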
Vector Multiply Word Signed, Saturate, Fractional and Accumulate EVX-form

evmwssfaa RT,RA,RB

4 RT RA RB 1363
0 6 11 16 21 31

temp0:63 ← (RA)32:63 ×sf (RB)32:63
if ((RA)32:63 = 0x8000_0000) & ((RB)32:63 = 0x8000_0000) then
    temp0:63 ← 0x7FFF_FFFF_FFFF_FFFF
    mov ← 1
else
    mov ← 0
temp0:64 ← EXTS((ACC)0:63) + EXTS(temp0:63)
ov ← (temp0 ⊕ temp1)
RT0:63 ← temp1:64
ACC0:63 ← (RT)0:63
SPEFSCROVH ← 0
SPEFSCROV ← ov | mov
SPEFSCRSOV ← SPEFSCRSOV | ov | mov

The low word signed fractional elements in RA and RB are multiplied producing a 64-bit product. If both inputs are -1.0, the product saturates to the largest positive signed fraction. The 64-bit product is then added to the accumulator and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV

Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative EVX-form

evmwssfan RT,RA,RB

4 RT RA RB 1491
0 6 11 16 21 31

temp0:63 ← (RA)32:63 ×sf (RB)32:63
if ((RA)32:63 = 0x8000_0000) & ((RB)32:63 = 0x8000_0000) then
    temp0:63 ← 0x7FFF_FFFF_FFFF_FFFF
    mov ← 1
else
    mov ← 0
temp0:64 ← EXTS((ACC)0:63) - EXTS(temp0:63)
ov ← (temp0 ⊕ temp1)
RT0:63 ← temp1:64
ACC0:63 ← (RT)0:63
SPEFSCROVH ← 0
SPEFSCROV ← ov | mov
SPEFSCRSOV ← SPEFSCRSOV | ov | mov

The low word signed fractional elements in RA and RB are multiplied producing a 64-bit product. If both inputs are -1.0, the product saturates to the largest positive signed fraction. The 64-bit product is then subtracted from the accumulator and the result is placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV

Vector Multiply Word Unsigned, Modulo, Integer EVX-form

evmwumi RT,RA,RB

4 RT RA RB 1112
0 6 11 16 21 31

RT0:63 ← (RA)32:63 ×ui (RB)32:63

The low word unsigned-integer elements in RA and RB are multiplied to form a 64-bit product that is placed in RT.

Special Registers Altered: None

Vector Multiply Word Unsigned, Modulo, Integer to Accumulator EVX-form

evmwumia RT,RA,RB

4 RT RA RB 1144
0 6 11 16 21 31

RT0:63 ← (RA)32:63 ×ui (RB)32:63
ACC0:63 ← (RT)0:63

The low word unsigned-integer elements in RA and RB are multiplied to form a 64-bit product that is placed in RT and into the accumulator.

Special Registers Altered: ACC

Vector Multiply Word Unsigned, Modulo, Integer and Accumulate EVX-form

evmwumiaa RT,RA,RB

4 RT RA RB 1368
0 6 11 16 21 31

temp0:63 ← (RA)32:63 ×ui (RB)32:63
RT0:63 ← (ACC)0:63 + temp0:63
ACC0:63 ← (RT)0:63

The low word unsigned-integer elements in RA and RB are multiplied. The intermediate product is added to the contents of the 64-bit accumulator, and the resulting value is placed into the accumulator and in RT.

Special Registers Altered: ACC

Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative EVX-form

evmwumian RT,RA,RB

4 RT RA RB 1496
0 6 11 16 21 31

temp0:63 ← (RA)32:63 ×ui (RB)32:63
RT0:63 ← (ACC)0:63 - temp0:63
ACC0:63 ← (RT)0:63

The low word unsigned-integer elements in RA and RB are multiplied. The intermediate product is subtracted from the contents of the 64-bit accumulator, and the resulting value is placed into the accumulator and in RT.

Special Registers Altered: ACC

Vector NAND EVX-form

evnand RT,RA,RB

4 RT RA RB 542
0 6 11 16 21 31

RT0:31 ← ¬((RA)0:31 & (RB)0:31)
RT32:63 ← ¬((RA)32:63 & (RB)32:63)

Each element of RA and RB is bitwise NANDed. The result is placed in the corresponding element of RT.

Special Registers Altered: None

Vector Negate EVX-form

evneg RT,RA

4 RT RA /// 521
0 6 11 16 21 31

RT0:31 ← NEG((RA)0:31)
RT32:63 ← NEG((RA)32:63)

The negative of each element of RA is placed in RT. The negative of 0x8000_0000 (most negative number) returns 0x8000_0000.

Special Registers Altered: None

Vector NOR EVX-form

evnor RT,RA,RB

4 RT RA RB 536
0 6 11 16 21 31

RT0:31 ← ¬((RA)0:31 | (RB)0:31)
RT32:63 ← ¬((RA)32:63 | (RB)32:63)

Each element of RA and RB is bitwise NORed. The result is placed in the corresponding element of RT.

Special Registers Altered: None

Extended Mnemonics:
Extended mnemonics are provided for the Vector NOR instruction to produce a vector bitwise complement operation.

Extended:        Equivalent to:
evnot RT,RA      evnor RT,RA,RA

Vector OR EVX-form

evor RT,RA,RB

4 RT RA RB 535
0 6 11 16 21 31

RT0:31 ← (RA)0:31 | (RB)0:31
RT32:63 ← (RA)32:63 | (RB)32:63

Each element of RA and RB is bitwise ORed. The result is placed in the corresponding element of RT.

Special Registers Altered: None

Extended Mnemonics:
Extended mnemonics are provided for the Vector OR instruction to provide a 64-bit vector move instruction.

Extended:        Equivalent to:
evmr RT,RA       evor RT,RA,RA

Vector OR with Complement EVX-form

evorc RT,RA,RB

4 RT RA RB 539
0 6 11 16 21 31

RT0:31 ← (RA)0:31 | (¬(RB)0:31)
RT32:63 ← (RA)32:63 | (¬(RB)32:63)

Each element of RA is bitwise ORed with the complement of RB. The result is placed in the corresponding element of RT.

Special Registers Altered: None

Vector Rotate Left Word EVX-form

evrlw RT,RA,RB

4 RT RA RB 552
0 6 11 16 21 31

nh ← (RB)27:31
nl ← (RB)59:63
RT0:31 ← ROTL((RA)0:31, nh)
RT32:63 ← ROTL((RA)32:63, nl)

Each of the high and low elements of RA is rotated left by an amount specified in RB. The result is placed in RT. Rotate values for each element of RA are found in bit positions RB27:31 and RB59:63.
Special Registers Altered: None

Vector Rotate Left Word Immediate   EVX-form
evrlwi RT,RA,UI
4 | RT | RA | UI | 554   (bits 0, 6, 11, 16, 21, 31)

n ← UI
RT0:31 ← ROTL((RA)0:31, n)
RT32:63 ← ROTL((RA)32:63, n)

Both the high and low elements of RA are rotated left by an amount specified by UI.

Special Registers Altered: None

Vector Round Word   EVX-form
evrndw RT,RA
4 | RT | RA | /// | 524   (bits 0, 6, 11, 16, 21, 31)

RT0:31 ← ((RA)0:31 + 0x00008000) & 0xFFFF0000
RT32:63 ← ((RA)32:63 + 0x00008000) & 0xFFFF0000

The 32-bit elements of RA are rounded into 16 bits. The result is placed in RT. The resulting 16 bits are placed in the most significant 16 bits of each element of RT, zeroing out the low-order 16 bits of each element.

Special Registers Altered: None

Vector Select   EVS-form
evsel RT,RA,RB,BFA
4 | RT | RA | RB | 79 | BFA   (bits 0, 6, 11, 16, 21, 29, 31)

ch ← CR[BFA×4]
cl ← CR[BFA×4+1]
if (ch = 1) then RT0:31 ← (RA)0:31
else RT0:31 ← (RB)0:31
if (cl = 1) then RT32:63 ← (RA)32:63
else RT32:63 ← (RB)32:63

If the most significant bit in the BFA field of CR is set to 1, the high-order element of RA is placed in the high-order element of RT; otherwise, the high-order element of RB is placed into the high-order element of RT. If the next most significant bit in the BFA field of CR is set to 1, the low-order element of RA is placed in the low-order element of RT; otherwise, the low-order element of RB is placed into the low-order element of RT.

Special Registers Altered: None

Vector Shift Left Word   EVX-form
evslw RT,RA,RB
4 | RT | RA | RB | 548   (bits 0, 6, 11, 16, 21, 31)

nh ← (RB)26:31
nl ← (RB)58:63
RT0:31 ← SL((RA)0:31, nh)
RT32:63 ← SL((RA)32:63, nl)

Each of the high and low elements of RA is shifted left by an amount specified in RB. The result is placed in RT. The separate shift amounts for each element are specified by 6 bits in RB that lie in bit positions 26:31 and 58:63.

Shift amounts from 32 to 63 give a zero result.

Special Registers Altered: None

Vector Shift Left Word Immediate   EVX-form
evslwi RT,RA,UI
4 | RT | RA | UI | 550   (bits 0, 6, 11, 16, 21, 31)

n ← UI
RT0:31 ← SL((RA)0:31, n)
RT32:63 ← SL((RA)32:63, n)

Both high and low elements of RA are shifted left by the 5-bit UI value and the results are placed in RT.

Special Registers Altered: None

Vector Splat Fractional Immediate   EVX-form
evsplatfi RT,SI
4 | RT | SI | /// | 555   (bits 0, 6, 11, 16, 21, 31)

RT0:31 ← SI || ²⁷0
RT32:63 ← SI || ²⁷0

The value specified by SI is padded with trailing zeros and placed in both elements of RT. The SI ends up in bit positions RT0:4 and RT32:36.

Special Registers Altered: None

Vector Splat Immediate   EVX-form
evsplati RT,SI
4 | RT | SI | /// | 553   (bits 0, 6, 11, 16, 21, 31)

RT0:31 ← EXTS(SI)
RT32:63 ← EXTS(SI)

The value specified by SI is sign extended and placed in both elements of RT.

Special Registers Altered: None

Vector Shift Right Word Immediate Signed   EVX-form
evsrwis RT,RA,UI
4 | RT | RA | UI | 547   (bits 0, 6, 11, 16, 21, 31)

n ← UI
RT0:31 ← EXTS((RA)0:31-n)
RT32:63 ← EXTS((RA)32:63-n)

Both high and low elements of RA are shifted right by the 5-bit UI value. Bits in the most significant positions vacated by the shift are filled with a copy of the sign bit.

Special Registers Altered: None

Vector Shift Right Word Immediate Unsigned   EVX-form
evsrwiu RT,RA,UI
4 | RT | RA | UI | 546   (bits 0, 6, 11, 16, 21, 31)

n ← UI
RT0:31 ← EXTZ((RA)0:31-n)
RT32:63 ← EXTZ((RA)32:63-n)

Both high and low elements of RA are shifted right by the 5-bit UI value; zeros are shifted into the most significant position.

Special Registers Altered: None

Vector Shift Right Word Signed   EVX-form
evsrws RT,RA,RB
4 | RT | RA | RB | 545   (bits 0, 6, 11, 16, 21, 31)

nh ← (RB)26:31
nl ← (RB)58:63
RT0:31 ← EXTS((RA)0:31-nh)
RT32:63 ← EXTS((RA)32:63-nl)

Both the high and low elements of RA are shifted right by an amount specified in RB. The result is placed in RT. The separate shift amounts for each element are specified by 6 bits in RB that lie in bit positions 26:31 and 58:63. The sign bits are shifted into the most significant position.

Shift amounts from 32 to 63 give a result of 32 sign bits.

Special Registers Altered: None

Vector Shift Right Word Unsigned   EVX-form
evsrwu RT,RA,RB
4 | RT | RA | RB | 544   (bits 0, 6, 11, 16, 21, 31)

nh ← (RB)26:31
nl ← (RB)58:63
RT0:31 ← EXTZ((RA)0:31-nh)
RT32:63 ← EXTZ((RA)32:63-nl)

Both the high and low elements of RA are shifted right by an amount specified in RB. The result is placed in RT. The separate shift amounts for each element are specified by 6 bits in RB that lie in bit positions 26:31 and 58:63. Zeros are shifted into the most significant position.

Shift amounts from 32 to 63 give a zero result.

Special Registers Altered: None

Vector Store Double of Double   EVX-form
evstdd RS,D(RA)
4 | RS | RA | UI | 801   (bits 0, 6, 11, 16, 21, 31)

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×8)
MEM(EA,8) ← (RS)0:63

D in the instruction mnemonic is UI × 8. The contents of RS are stored as a doubleword in storage addressed by EA.

Special Registers Altered: None

Vector Store Double of Double Indexed   EVX-form
evstddx RS,RA,RB
4 | RS | RA | RB | 800   (bits 0, 6, 11, 16, 21, 31)

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,8) ← (RS)0:63

The contents of RS are stored as a doubleword in storage addressed by EA.

Special Registers Altered: None

Vector Store Double of Four Halfwords   EVX-form
evstdh RS,D(RA)
4 | RS | RA | UI | 805   (bits 0, 6, 11, 16, 21, 31)

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×8)
MEM(EA,2) ← (RS)0:15
MEM(EA+2,2) ← (RS)16:31
MEM(EA+4,2) ← (RS)32:47
MEM(EA+6,2) ← (RS)48:63

D in the instruction mnemonic is UI × 8. The contents of RS are stored as four halfwords in storage addressed by EA.

Vector Store Double of Four Halfwords Indexed   EVX-form
evstdhx RS,RA,RB
4 | RS | RA | RB | 804   (bits 0, 6, 11, 16, 21, 31)

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,2) ← (RS)0:15
MEM(EA+2,2) ← (RS)16:31
MEM(EA+4,2) ← (RS)32:47
MEM(EA+6,2) ← (RS)48:63

The contents of RS are stored as four halfwords in storage addressed by EA.
Special Registers Altered (evstdh and evstdhx): None

Vector Store Double of Two Words   EVX-form
evstdw RS,D(RA)
4 | RS | RA | UI | 803   (bits 0, 6, 11, 16, 21, 31)

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×8)
MEM(EA,4) ← (RS)0:31
MEM(EA+4,4) ← (RS)32:63

D in the instruction mnemonic is UI × 8. The contents of RS are stored as two words in storage addressed by EA.

Special Registers Altered: None

Vector Store Double of Two Words Indexed   EVX-form
evstdwx RS,RA,RB
4 | RS | RA | RB | 802   (bits 0, 6, 11, 16, 21, 31)

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,4) ← (RS)0:31
MEM(EA+4,4) ← (RS)32:63

The contents of RS are stored as two words in storage addressed by EA.

Special Registers Altered: None

Vector Store Word of Two Halfwords from Even   EVX-form
evstwhe RS,D(RA)
4 | RS | RA | UI | 817   (bits 0, 6, 11, 16, 21, 31)

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×4)
MEM(EA,2) ← (RS)0:15
MEM(EA+2,2) ← (RS)32:47

D in the instruction mnemonic is UI × 4. The even halfwords from each element of RS are stored as two halfwords in storage addressed by EA.

Special Registers Altered: None

Vector Store Word of Two Halfwords from Even Indexed   EVX-form
evstwhex RS,RA,RB
4 | RS | RA | RB | 816   (bits 0, 6, 11, 16, 21, 31)

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,2) ← (RS)0:15
MEM(EA+2,2) ← (RS)32:47

The even halfwords from each element of RS are stored as two halfwords in storage addressed by EA.

Special Registers Altered: None

Vector Store Word of Two Halfwords from Odd   EVX-form
evstwho RS,D(RA)
4 | RS | RA | UI | 821   (bits 0, 6, 11, 16, 21, 31)

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×4)
MEM(EA,2) ← (RS)16:31
MEM(EA+2,2) ← (RS)48:63

D in the instruction mnemonic is UI × 4. The odd halfwords from each element of RS are stored as two halfwords in storage addressed by EA.

Special Registers Altered: None

Vector Store Word of Two Halfwords from Odd Indexed   EVX-form
evstwhox RS,RA,RB
4 | RS | RA | RB | 820   (bits 0, 6, 11, 16, 21, 31)

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,2) ← (RS)16:31
MEM(EA+2,2) ← (RS)48:63

The odd halfwords from each element of RS are stored as two halfwords in storage addressed by EA.

Special Registers Altered: None

Vector Store Word of Word from Even   EVX-form
evstwwe RS,D(RA)
4 | RS | RA | UI | 825   (bits 0, 6, 11, 16, 21, 31)

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×4)
MEM(EA,4) ← (RS)0:31

D in the instruction mnemonic is UI × 4. The even word of RS is stored in storage addressed by EA.

Special Registers Altered: None

Vector Store Word of Word from Even Indexed   EVX-form
evstwwex RS,RA,RB
4 | RS | RA | RB | 824   (bits 0, 6, 11, 16, 21, 31)

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,4) ← (RS)0:31

The even word of RS is stored in storage addressed by EA.

Special Registers Altered: None

Vector Store Word of Word from Odd   EVX-form
evstwwo RS,D(RA)
4 | RS | RA | UI | 829   (bits 0, 6, 11, 16, 21, 31)

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + EXTZ(UI×4)
MEM(EA,4) ← (RS)32:63

D in the instruction mnemonic is UI × 4. The odd word of RS is stored in storage addressed by EA.

Vector Store Word of Word from Odd Indexed   EVX-form
evstwwox RS,RA,RB
4 | RS | RA | RB | 828   (bits 0, 6, 11, 16, 21, 31)

if (RA = 0) then b ← 0
else b ← (RA)
EA ← b + (RB)
MEM(EA,4) ← (RS)32:63

The odd word of RS is stored in storage addressed by EA.
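As a rough illustration of the store forms described above, the following Python sketch models the effective-address calculation (UI scaled by 8 or 4) and the halfword selection for evstdh, evstwhe, and evstwho. The helper names, the bytearray standing in for storage, and the big-endian byte order are assumptions made for this example; they are not taken from the architecture text.

```python
def bits(value, first, last, width=64):
    """Return bits first:last of a width-bit value; bit 0 is the most significant."""
    length = last - first + 1
    return (value >> (width - 1 - last)) & ((1 << length) - 1)


def effective_address(ra_field, ra_value, ui, scale):
    """EA = b + EXTZ(UI x scale), where b is 0 when the RA field is 0."""
    base = 0 if ra_field == 0 else ra_value
    return base + ui * scale


def store_halfwords(mem, ea, rs, ranges):
    """Store the selected 16-bit fields of rs at consecutive halfword addresses."""
    for i, (first, last) in enumerate(ranges):
        # Assumption: big-endian byte order within each halfword.
        mem[ea + 2 * i: ea + 2 * i + 2] = bits(rs, first, last).to_bytes(2, "big")


# Bit ranges of RS stored by each form, in storage order (hypothetical table names).
EVSTDH = [(0, 15), (16, 31), (32, 47), (48, 63)]   # UI scaled by 8
EVSTWHE = [(0, 15), (32, 47)]                      # even halfwords, UI scaled by 4
EVSTWHO = [(16, 31), (48, 63)]                     # odd halfwords, UI scaled by 4
```

For instance, with RS = 0x1122334455667788, the even-halfword form stores 0x1122 followed by 0x5566, and the odd-halfword form stores 0x3344 followed by 0x7788.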
Special Registers Altered (evstwwo and evstwwox): None

Vector Subtract Signed, Modulo, Integer to Accumulator Word   EVX-form
evsubfsmiaaw RT,RA
4 | RT | RA | /// | 1227   (bits 0, 6, 11, 16, 21, 31)

RT0:31 ← (ACC)0:31 - (RA)0:31
RT32:63 ← (ACC)32:63 - (RA)32:63
ACC0:63 ← (RT)0:63

Each word element in RA is subtracted from the corresponding element in the accumulator and the difference is placed into the corresponding RT word and into the accumulator.

Special Registers Altered: ACC

Vector Subtract Signed, Saturate, Integer to Accumulator Word   EVX-form
evsubfssiaaw RT,RA
4 | RT | RA | /// | 1219   (bits 0, 6, 11, 16, 21, 31)

temp0:63 ← EXTS((ACC)0:31) - EXTS((RA)0:31)
ovh ← temp31 ⊕ temp32
RT0:31 ← SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
temp0:63 ← EXTS((ACC)32:63) - EXTS((RA)32:63)
ovl ← temp31 ⊕ temp32
RT32:63 ← SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCR[OVH] ← ovh
SPEFSCR[OV] ← ovl
SPEFSCR[SOVH] ← SPEFSCR[SOVH] | ovh
SPEFSCR[SOV] ← SPEFSCR[SOV] | ovl

Each signed-integer word element in RA is sign-extended and subtracted from the corresponding sign-extended element in the accumulator, saturating if overflow occurs, and the results are placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Subtract from Word   EVX-form
evsubfw RT,RA,RB
4 | RT | RA | RB | 516   (bits 0, 6, 11, 16, 21, 31)

RT0:31 ← (RB)0:31 - (RA)0:31
RT32:63 ← (RB)32:63 - (RA)32:63

Each signed-integer element of RA is subtracted from the corresponding element of RB and the results are placed in RT.

Special Registers Altered: None

Vector Subtract Unsigned, Modulo, Integer to Accumulator Word   EVX-form
evsubfumiaaw RT,RA
4 | RT | RA | /// | 1226   (bits 0, 6, 11, 16, 21, 31)

RT0:31 ← (ACC)0:31 - (RA)0:31
RT32:63 ← (ACC)32:63 - (RA)32:63
ACC0:63 ← (RT)0:63

Each unsigned-integer word element in RA is subtracted from the corresponding element in the accumulator and the results are placed in RT and into the accumulator.
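The difference between the modulo and saturating accumulator subtracts described above can be sketched in Python as follows. This is only a model under stated assumptions: the function names are hypothetical, Python integers replace the 64-bit temp value of the instruction description, and the sticky SOV/SOVH bits are left out. Saturation is decided by range-checking the exact difference, which should give the same result as selecting on the overflow and sign bits.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(word):
    """Interpret the low 32 bits of word as a two's-complement integer."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word


def sat_sub_signed(acc_word, ra_word):
    """Signed saturating subtract of two 32-bit words; returns (result, overflow)."""
    diff = to_signed32(acc_word) - to_signed32(ra_word)
    if diff > 0x7FFFFFFF:
        return 0x7FFFFFFF, 1          # positive overflow saturates to 0x7FFF_FFFF
    if diff < -0x80000000:
        return 0x80000000, 1          # negative overflow saturates to 0x8000_0000
    return diff & MASK32, 0


def model_evsubfssiaaw(acc, ra):
    """Return (RT, ovh, ovl) for 64-bit acc and ra; the caller copies RT to ACC."""
    hi, ovh = sat_sub_signed(acc >> 32, ra >> 32)
    lo, ovl = sat_sub_signed(acc, ra)
    return (hi << 32) | lo, ovh, ovl


def model_evsubfsmiaaw(acc, ra):
    """Modulo form: each word wraps around instead of saturating."""
    hi = ((acc >> 32) - (ra >> 32)) & MASK32
    lo = (acc - ra) & MASK32
    return (hi << 32) | lo
```

In this model, subtracting -1 from 0x7FFF_FFFF in the high word saturates to 0x7FFF_FFFF and reports ovh = 1, whereas the modulo form wraps to 0x8000_0000.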
Special Registers Altered (evsubfumiaaw): ACC

Vector Subtract Unsigned, Saturate, Integer to Accumulator Word   EVX-form
evsubfusiaaw RT,RA
4 | RT | RA | /// | 1218   (bits 0, 6, 11, 16, 21, 31)

temp0:63 ← EXTZ((ACC)0:31) - EXTZ((RA)0:31)
ovh ← temp31
RT0:31 ← SATURATE(ovh, temp31, 0x0000_0000, 0x0000_0000, temp32:63)
temp0:63 ← EXTZ((ACC)32:63) - EXTZ((RA)32:63)
ovl ← temp31
RT32:63 ← SATURATE(ovl, temp31, 0x0000_0000, 0x0000_0000, temp32:63)
ACC0:63 ← (RT)0:63
SPEFSCR[OVH] ← ovh
SPEFSCR[OV] ← ovl
SPEFSCR[SOVH] ← SPEFSCR[SOVH] | ovh
SPEFSCR[SOV] ← SPEFSCR[SOV] | ovl

Each unsigned-integer word element in RA is zero-extended and subtracted from the corresponding zero-extended element in the accumulator, saturating if overflow occurs, and the results are placed in RT and the accumulator.

Special Registers Altered: ACC OV OVH SOV SOVH

Vector Subtract Immediate from Word   EVX-form
evsubifw RT,UI,RB
4 | RT | UI | RB | 518   (bits 0, 6, 11, 16, 21, 31)

RT0:31 ← (RB)0:31 - EXTZ(UI)
RT32:63 ← (RB)32:63 - EXTZ(UI)

UI is zero-extended and subtracted from both the high and low elements of RB. Note that the same value is subtracted from both elements of the register.

Special Registers Altered: None

Vector XOR   EVX-form
evxor RT,RA,RB
4 | RT | RA | RB | 534   (bits 0, 6, 11, 16, 21, 31)

RT0:31 ← (RA)0:31 ⊕ (RB)0:31
RT32:63 ← (RA)32:63 ⊕ (RB)32:63

Each element of RA and RB is exclusive-ORed. The results are placed in RT.

Special Registers Altered: None

Chapter 8. Embedded Floating-Point
[Category: SPE.Embedded Float Scalar Double]
[Category: SPE.Embedded Float Scalar Single]
[Category: SPE.Embedded Float Vector]

8.1 Overview
8.2 Programming Model
8.2.1 Signal Processing Embedded Floating-Point Status and Control Register (SPEFSCR)
8.2.2 Floating-Point Data Formats
8.2.3 Exception Conditions
8.2.3.1 Denormalized Values on Input
8.2.3.2 Embedded Floating-Point Overflow and Underflow
8.2.3.3 Embedded Floating-Point Invalid Operation/Input Errors
8.2.3.4 Embedded Floating-Point Round (Inexact)
8.2.3.5 Embedded Floating-Point Divide by Zero
8.2.3.6 Default Results
8.2.4 IEEE 754 Compliance
8.2.4.1 Sticky Bit Handling For Exception Conditions
8.3 Embedded Floating-Point Instructions
8.3.1 Load/Store Instructions
8.3.2 SPE.Embedded Float Vector Instructions [Category: SPE.Embedded Float Vector]
8.3.3 SPE.Embedded Float Scalar Single Instructions [Category: SPE.Embedded Float Scalar Single]
8.3.4 SPE.Embedded Float Scalar Double Instructions [Category: SPE.Embedded Float Scalar Double]
8.4 Embedded Floating-Point Results Summary

8.1 Overview

The Embedded Floating-Point categories require the implementation of the Signal Processing Engine (SPE) category and consist of three distinct categories:

• Embedded vector single-precision floating-point (SPE.Embedded Float Vector [SP.FV])
• Embedded scalar single-precision floating-point (SPE.Embedded Float Scalar Single [SP.FS])
• Embedded scalar double-precision floating-point (SPE.Embedded Float Scalar Double [SP.FD])

Although each of these may be implemented independently, they are defined in a single chapter because it is likely that they may be implemented together.

References to Embedded Floating-Point categories, Embedded Floating-Point instructions, or Embedded Floating-Point operations apply to all 3 categories.

Single-precision floating-point is handled by the SPE.Embedded Float Vector and SPE.Embedded Float Scalar Single categories; double-precision floating-point is handled by the SPE.Embedded Float Scalar Double category.

8.2 Programming Model

Embedded floating-point operations are performed in the GPRs of the processor.

The SPE.Embedded Float Vector and SPE.Embedded Float Scalar Double categories require a GPR register file with thirty-two 64-bit registers as required by the Signal Processing Engine category.

The SPE.Embedded Float Scalar Single category requires a GPR register file with thirty-two 32-bit registers. When implemented with a 64-bit register file on a 32-bit implementation, instructions in this category only use and modify bits 32:63 of the GPR. In this case, bits 0:31 of the GPR are left unchanged by the operation. For 64-bit implementations, bits 0:31 are unchanged after the operation.

Instructions in the SPE.Embedded Float Scalar Double category operate on the entire 64 bits of the GPRs.

Instructions in the SPE.Embedded Float Vector category operate on the entire 64 bits of the GPRs as well, but contain two 32-bit data items that are operated on independently of each other in a SIMD fashion. The format of both data items is the same as the format of a data item in the SPE.Embedded Float Scalar Single category. The data item contained in bits 0:31 is called the 'high word'. The data item contained in bits 32:63 is called the 'low word'.

There are no record forms of Embedded Floating-Point instructions. Embedded Floating-Point Compare instructions treat NaNs, Infinity, and Denorm as normalized numbers for the comparison calculation when default results are provided.

8.2.1 Signal Processing Embedded Floating-Point Status and Control Register (SPEFSCR)

Status and control for the Embedded Floating-Point categories uses the SPEFSCR. This register is defined by the Signal Processing Engine category in Section 7.3.4. Status and control bits are shared for Embedded Floating-Point and SPE operations. Instructions in the SPE.Embedded Float Vector category affect both the high element (bits 34:39) and low element floating-point status flags (bits 50:55). Instructions in the SPE.Embedded Float Scalar Double and SPE.Embedded Float Scalar Single categories affect only the low element floating-point status flags and leave the high element floating-point status flags undefined.

8.2.2 Floating-Point Data Formats

Single-precision floating-point data elements are 32 bits wide with 1 sign bit (s), 8 bits of biased exponent (e) and 23 bits of fraction (f). Double-precision floating-point data elements are 64 bits wide with 1 sign bit (s), 11 bits of biased exponent (e) and 52 bits of fraction (f).

In the IEEE 754 specification, floating-point values are represented in a format consisting of three explicit fields (sign field, biased exponent field, and fraction field) and an implicit hidden bit.

Format             s      exp         fraction
Single-precision   bit 0  bits 1:8    bits 9:31   (element in bits 0:31, or 32:63)
Double-precision   bit 0  bits 1:11   bits 12:63

The hidden bit lies between the exp and fraction fields.
s - sign bit; 0 = positive; 1 = negative
exp - biased exponent field
fraction - fractional portion of number

Figure 108. Floating-Point Data Format

For single-precision normalized numbers, the biased exponent value e lies in the range of 1 to 254, corresponding to an actual exponent value E in the range -126 to +127. For double-precision normalized numbers, the biased exponent value e lies in the range of 1 to 2046, corresponding to an actual exponent value E in the range -1022 to +1023. With the hidden bit implied to be '1' (for normalized numbers), the value of the number is interpreted as follows:

(-1)^s × 2^E × (1.fraction)

where E is the unbiased exponent and 1.fraction is the mantissa (or significand) consisting of a leading '1' (the hidden bit) and a fractional part (fraction field). For the single-precision format, the maximum positive normalized number (pmax) is represented by the encoding 0x7F7FFFFF, which is approximately 3.4E+38 (2^128), and the minimum positive normalized value (pmin) is represented by the encoding 0x00800000, which is approximately 1.2E-38 (2^-126). For the double-precision format, the maximum positive normalized number (pmax) is represented by the encoding 0x7FEFFFFF_FFFFFFFF, which is approximately 1.8E+308 (2^1024), and the minimum positive normalized value (pmin) is represented by the encoding 0x00100000_00000000, which is approximately 2.2E-308 (2^-1022).

Two specific values of the biased exponent are reserved (0 and 255 for single-precision; 0 and 2047 for double-precision) for encoding special values of +0, -0, +infinity, -infinity, and NaNs.

Zeros of both positive and negative sign are represented by a biased exponent value e of 0 and a fraction f which is 0.

Infinities of both positive and negative sign are represented by a maximum exponent field value (255 for single-precision, 2047 for double-precision) and a fraction which is 0.

Denormalized numbers of both positive and negative sign are represented by a biased exponent value e of 0 and a fraction f which is nonzero. For these numbers, the hidden bit is defined by the IEEE 754 standard to be 0. This number type is not directly supported in hardware. Instead, either a software interrupt handler is invoked, or a default value is defined.

Not-a-Numbers (NaNs) are represented by a maximum exponent field value (255 for single-precision, 2047 for double-precision) and a fraction f which is nonzero.

8.2.3 Exception Conditions

8.2.3.1 Denormalized Values on Input

Any denormalized value used as an operand may be truncated by the implementation to a properly signed zero value.

8.2.3.2 Embedded Floating-Point Overflow and Underflow

Defining pmax to be the most positive normalized value (farthest from zero), pmin the smallest positive normalized value (closest to zero), nmax the most negative normalized value (farthest from zero) and nmin the smallest normalized negative value (closest to zero), an overflow is said to have occurred if the numerically correct result (r) of an instruction is such that r > pmax or r [...]

Programming Note
On some implementations, operations that result in overflow or underflow are likely to take significantly longer than operations that do not. For example, these operations may cause a system error handler to be invoked; on such implementations, the system error handler updates the overflow bits appropriately.

8.2.3.3 Embedded Floating-Point Invalid Operation/Input Errors

Embedded Floating-Point Invalid Operation/Input errors occur when an operand to an operation contains an invalid input value. If any of the input values are Infinity, Denorm, or NaN, or for an Embedded Floating-Point Divide instruction both operands are +/-0, SPEFSCR[FINV, FINVH] are set to 1 appropriately, and SPEFSCR[FGH, FXH, FG, FX] are set to 0 appropriately. If SPEFSCR[FINVE]=1, an Embedded Floating-Point Data interrupt is taken and the destination register is not updated.

8.2.3.4 Embedded Floating-Point Round (Inexact)

If any result element of an Embedded Floating-Point [...]

[...]

if (ah > bh) then ch ← 1
else ch ← 0
if (al > bl) then cl ← 1
else cl ← 0
CR[4×BF : 4×BF+3] ← ch || cl || (ch | cl) || (ch & cl)

Each element of register RA is compared against the corresponding element of register RB. The results of the comparisons are placed into CR field BF. If RA0:31 is greater than RB0:31, bit 0 of CR field BF is set to 1, otherwise it is set to 0. If RA32:63 is greater than RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bit 2 of CR field BF is set to the OR of both result bits and Bit 3 of CR field BF is set to the AND of both result bits. Comparison ignores the sign of 0 (+0 = -0).

[...]

if (ah < bh) then ch ← 1
else ch ← 0
if (al < bl) then cl ← 1
else cl ← 0
CR[4×BF : 4×BF+3] ← ch || cl || (ch | cl) || (ch & cl)

Each element of register RA is compared against the corresponding element of register RB. The results of the comparisons are placed into CR field BF. If RA0:31 is less than RB0:31, bit 0 of CR field BF is set to 1, otherwise it is set to 0. If RA32:63 is less than RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bit 2 of CR field BF is set to the OR of both result bits and Bit 3 of CR field BF is set to the AND of both result bits. Comparison ignores the sign of 0 (+0 = -0).
If an input error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of 'e' and 'f' directly. (This applies to both of the preceding compare instructions.)

Special Registers Altered (both of the preceding compare instructions): FINV FINVH FINVS FGH FXH FG FX CR field BF

Vector Floating-Point Single-Precision Compare Equal   EVX-form
evfscmpeq BF,RA,RB
4 | BF | // | RA | RB | 654   (bits 0, 6, 9, 11, 16, 21, 31)

ah ← (RA)0:31
al ← (RA)32:63
bh ← (RB)0:31
bl ← (RB)32:63
if (ah = bh) then ch ← 1
else ch ← 0
if (al = bl) then cl ← 1
else cl ← 0
CR[4×BF : 4×BF+3] ← ch || cl || (ch | cl) || (ch & cl)

Each element of register RA is compared against the corresponding element of register RB. The results of the comparisons are placed into CR field BF. If RA0:31 is equal to RB0:31, bit 0 of CR field BF is set to 1, otherwise it is set to 0. If RA32:63 is equal to RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bit 2 of CR field BF is set to the OR of both result bits and Bit 3 of CR field BF is set to the AND of both result bits. Comparison ignores the sign of 0 (+0 = -0).

If an input error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of 'e' and 'f' directly.

Special Registers Altered: FINV FINVH FINVS FGH FXH FG FX CR field BF

Vector Floating-Point Single-Precision Test Greater Than   EVX-form
evfststgt BF,RA,RB
4 | BF | // | RA | RB | 668   (bits 0, 6, 9, 11, 16, 21, 31)

ah ← (RA)0:31
al ← (RA)32:63
bh ← (RB)0:31
bl ← (RB)32:63
if (ah > bh) then ch ← 1
else ch ← 0
if (al > bl) then cl ← 1
else cl ← 0
CR[4×BF : 4×BF+3] ← ch || cl || (ch | cl) || (ch & cl)

Each element of register RA is compared against the corresponding element of register RB. The results of the comparisons are placed into CR field BF. If RA0:31 is greater than RB0:31, bit 0 of CR field BF is set to 1, otherwise it is set to 0. If RA32:63 is greater than RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bit 2 of CR field BF is set to the OR of both result bits and Bit 3 of CR field BF is set to the AND of both result bits. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly.

No exceptions are taken during the execution of evfststgt.

Special Registers Altered: CR field BF

Programming Note
In an implementation, the execution of evfststgt is likely to be faster than the execution of evfscmpgt; however, if strict IEEE 754 compliance is required, the program should use evfscmpgt.

Vector Floating-Point Single-Precision Test Less Than   EVX-form
evfststlt BF,RA,RB
4 | BF | // | RA | RB | 669   (bits 0, 6, 9, 11, 16, 21, 31)

ah ← (RA)0:31
al ← (RA)32:63
bh ← (RB)0:31
bl ← (RB)32:63
if (ah < bh) then ch ← 1
else ch ← 0
if (al < bl) then cl ← 1
else cl ← 0
CR[4×BF : 4×BF+3] ← ch || cl || (ch | cl) || (ch & cl)

Each element of register RA is compared with the corresponding element of register RB. The results of the comparisons are placed into CR field BF. If RA0:31 is less than RB0:31, bit 0 of CR field BF is set to 1, otherwise it is set to 0. If RA32:63 is less than RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bit 2 of CR field BF is set to the OR of both result bits and Bit 3 of CR field BF is set to the AND of both result bits. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly.

No exceptions are taken during the execution of evfststlt.

Special Registers Altered: CR field BF

Programming Note
In an implementation, the execution of evfststlt is likely to be faster than the execution of evfscmplt; however, if strict IEEE 754 compliance is required, the program should use evfscmplt.

Vector Floating-Point Single-Precision Test Equal   EVX-form
evfststeq BF,RA,RB
4 | BF | // | RA | RB | 670   (bits 0, 6, 9, 11, 16, 21, 31)

ah ← (RA)0:31
al ← (RA)32:63
bh ← (RB)0:31
bl ← (RB)32:63
if (ah = bh) then ch ← 1
else ch ← 0
if (al = bl) then cl ← 1
else cl ← 0
CR[4×BF : 4×BF+3] ← ch || cl || (ch | cl) || (ch & cl)

Each element of register RA is compared against the corresponding element of register RB. The results of the comparisons are placed into CR field BF. If RA0:31 is equal to RB0:31, bit 0 of CR field BF is set to 1, otherwise it is set to 0. If RA32:63 is equal to RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bit 2 of CR field BF is set to the OR of both result bits and Bit 3 of CR field BF is set to the AND of both result bits. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly.

No exceptions are taken during the execution of evfststeq.

Special Registers Altered: CR field BF

Programming Note
In an implementation, the execution of evfststeq is likely to be faster than the execution of evfscmpeq; however, if strict IEEE 754 compliance is required, the program should use evfscmpeq.
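The way the test instructions form the 4-bit CR field can be sketched in Python as below. This is a minimal model under stated assumptions, not the architected definition: the function names are hypothetical, and "treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly" is approximated by ordering on the raw exponent and fraction bits with the sign applied, so that +0 and -0 compare equal.

```python
def order_key(word):
    """Map a 32-bit pattern to an integer that orders like a normalized number
    with the same e and f (assumption: the raw e||f bits give the magnitude order)."""
    magnitude = word & 0x7FFFFFFF          # biased exponent e and fraction f, taken as-is
    return -magnitude if word & 0x80000000 else magnitude


def cr_field(ch, cl):
    """Pack ch || cl || (ch | cl) || (ch & cl); ch is bit 0, the most significant bit."""
    return (ch << 3) | (cl << 2) | ((ch | cl) << 1) | (ch & cl)


def model_test_greater_than(ra, rb):
    """Model of a two-element greater-than test on 64-bit register values."""
    ah, al = ra >> 32, ra & 0xFFFFFFFF
    bh, bl = rb >> 32, rb & 0xFFFFFFFF
    ch = int(order_key(ah) > order_key(bh))
    cl = int(order_key(al) > order_key(bl))
    return cr_field(ch, cl)
```

With RA holding (1.0, 2.0) and RB holding (0.5, 3.0), only the high comparison succeeds, so the field is 0b1010: bit 0 set, bit 1 clear, the OR bit set, and the AND bit clear.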
Vector Convert Floating-Point Single-Precision from Signed Integer   EVX-form
evfscfsi RT,RB
4 | RT | /// | RB | 657   (bits 0, 6, 11, 16, 21, 31)

RT0:31 ← CnvtI32ToFP32((RB)0:31, S, HI, I)
RT32:63 ← CnvtI32ToFP32((RB)32:63, S, LO, I)

Each signed integer element of register RB is converted to the nearest single-precision floating-point value using the current rounding mode and the results are placed into the corresponding elements of register RT.

Special Registers Altered: FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision from Unsigned Integer   EVX-form
evfscfui RT,RB
4 | RT | /// | RB | 656   (bits 0, 6, 11, 16, 21, 31)

RT0:31 ← CnvtI32ToFP32((RB)0:31, U, HI, I)
RT32:63 ← CnvtI32ToFP32((RB)32:63, U, LO, I)

Each unsigned integer element of register RB is converted to the nearest single-precision floating-point value using the current rounding mode and the results are placed into the corresponding elements of register RT.

Special Registers Altered: FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision from Signed Fraction   EVX-form
evfscfsf RT,RB
4 | RT | /// | RB | 659   (bits 0, 6, 11, 16, 21, 31)

RT0:31 ← CnvtI32ToFP32((RB)0:31, S, HI, F)
RT32:63 ← CnvtI32ToFP32((RB)32:63, S, LO, F)

Each signed fractional element of register RB is converted to a single-precision floating-point value using the current rounding mode and the results are placed into the corresponding elements of register RT.

Special Registers Altered: FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision from Unsigned Fraction   EVX-form
evfscfuf RT,RB
4 | RT | /// | RB | 658   (bits 0, 6, 11, 16, 21, 31)

RT0:31 ← CnvtI32ToFP32((RB)0:31, U, HI, F)
RT32:63 ← CnvtI32ToFP32((RB)32:63, U, LO, F)

Each unsigned fractional element of register RB is converted to a single-precision floating-point value using the current rounding mode and the results are placed into the corresponding elements of register RT.

Special Registers Altered: FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision to Signed Integer   EVX-form
evfsctsi RT,RB
4 | RT | /// | RB | 661   (bits 0, 6, 11, 16, 21, 31)

RT0:31 ← CnvtFP32ToI32Sat((RB)0:31, S, HI, RND, I)
RT32:63 ← CnvtFP32ToI32Sat((RB)32:63, S, LO, RND, I)

Each single-precision floating-point element in register RB is converted to a signed integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered: FINV FINVH FINVS FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero   EVX-form
evfsctsiz RT,RB
4 | RT | /// | RB | 666   (bits 0, 6, 11, 16, 21, 31)

RT0:31 ← CnvtFP32ToI32Sat((RB)0:31, S, HI, ZER, I)
RT32:63 ← CnvtFP32ToI32Sat((RB)32:63, S, LO, ZER, I)

Each single-precision floating-point element in register RB is converted to a signed integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered: FINV FINVH FINVS FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision to Unsigned Integer   EVX-form
evfsctui RT,RB
4 | RT | /// | RB | 660   (bits 0, 6, 11, 16, 21, 31)

RT0:31 ← CnvtFP32ToI32Sat((RB)0:31, U, HI, RND, I)
RT32:63 ← CnvtFP32ToI32Sat((RB)32:63, U, LO, RND, I)

Each single-precision floating-point element in register RB is converted to an unsigned integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered: FINV FINVH FINVS FGH FXH FG FX FINXS

Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero   EVX-form
evfsctuiz RT,RB
4 | RT | /// | RB | 664   (bits 0, 6, 11, 16, 21, 31)

RT0:31 ← CnvtFP32ToI32Sat((RB)0:31, U, HI, ZER, I)
RT32:63 ← CnvtFP32ToI32Sat((RB)32:63, U, LO, ZER, I)

Each single-precision floating-point element in register RB is converted to an unsigned integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered: FINV FINVH FINVS FGH FXH FG FX FINXS
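The saturating behavior of the round-toward-zero integer conversions above can be illustrated with the Python sketch below. It is not the CnvtFP32ToI32Sat routine itself: the function name is hypothetical, status flags are not modelled, and saturating on infinities is an assumption drawn from the statement that unrepresentable results saturate.

```python
import math
import struct


def fp32_to_int_sat_rz(word, signed):
    """Convert a single-precision bit pattern to a 32-bit integer pattern,
    truncating toward zero and saturating at the limits of the target type."""
    if signed:
        lo, hi = -(1 << 31), (1 << 31) - 1
    else:
        lo, hi = 0, (1 << 32) - 1
    # Reinterpret the 32-bit pattern as an IEEE 754 single-precision value.
    value = struct.unpack(">f", struct.pack(">I", word & 0xFFFFFFFF))[0]
    if math.isnan(value):
        return 0                              # NaNs are converted as though they were zero
    if math.isinf(value):
        result = hi if value > 0 else lo      # assumption: infinities saturate
    else:
        result = max(lo, min(hi, math.trunc(value)))
    return result & 0xFFFFFFFF


def model_convert_vector(rb, signed):
    """Apply the conversion to the high and low 32-bit elements independently."""
    high = fp32_to_int_sat_rz(rb >> 32, signed)
    low = fp32_to_int_sat_rz(rb, signed)
    return (high << 32) | low
```

For example, 2.5 (0x40200000) truncates to 2, while 2^32 (0x4F800000) saturates to 0x7FFF_FFFF in the signed case and to 0xFFFF_FFFF in the unsigned case.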
Embedded Floating-Point 325 Version 2.05 Vector Convert Floating-Point Vector Convert Floating-Point Single-Precision to Signed Fraction Single-Precision to Unsigned Fraction EVX-form EVX-form evfsctsf RT,RB evfsctuf RT,RB 4 RT /// RB 663 4 RT /// RB 662 0 6 11 16 21 31 0 6 11 16 21 31 RT0:31 1 CnvtFP32ToI32Sat((RB)0:31, S, HI, RND ,F) RT0:31 1 CnvtFP32ToI32Sat((RB)0:31, U, HI, RND, F) RT32:63 1 CnvtFP32ToI32Sat((RB)32:63, S, LO, RND, F) RT32:63 1 CnvtFP32ToI32Sat((RB)32:63, U, LO, RND, F) Each single-precision floating-point element in register Each single-precision floating-point element in register RB is converted to a signed fraction using the current RB is converted to an unsigned fraction using the cur- rounding mode and the result is saturated if it cannot rent rounding mode and the result is saturated if it can- be represented in a 32-bit signed fraction. NaNs are not be represented in a 32-bit fraction. NaNs are converted as though they were zero. converted as though they were zero. Special Registers Altered: Special Registers Altered: FINV FINVH FINVS FINV FINVH FINVS FGH FXH FG FX FINXS FGH FXH FG FX FINXS 326 Power ISATM I Version 2.05 Chapter 8. Embedded Floating-Point 327 Version 2.05 8.3.3 SPE.Embedded Float Scalar Single Instructions [Category: SPE.Embedded Float Scalar Single] Floating-Point Single-Precision Absolute Floating-Point Single-Precision Negative Value EVX-form Absolute Value EVX-form efsabs RT,RA efsnabs RT,RA 4 RT RA /// 708 4 RT RA /// 709 0 6 11 16 21 31 0 6 11 16 21 31 RT32:63 1 0b0 || (RA)33:63 RT32:63 1 0b1 || (RA)33:63 The sign bit of the low element of register RA is set to 0 The sign bit of the low element of register RA is set to 1 and the result is placed into the low element of register and the result is placed into the low element of register RT. RT. Regardless of the value of register RA, no exceptions Regardless of the value of register RA, no exceptions are taken during the execution of this instruction. 
are taken during the execution of this instruction. Special Registers Altered: Special Registers Altered: None None Floating-Point Single-Precision Negate EVX-form efsneg RT,RA 4 RT RA /// 710 0 6 11 16 21 31 RT32:63 1 ¬(RA)32 || (RA)33:63 The sign bit of the low element of register RA is com- plemented and the result is placed into the low element of register RT. Regardless of the value of register RA, no exceptions are taken during the execution of this instruction. Special Registers Altered: None 328 Power ISATM I Version 2.05 Floating-Point Single-Precision Add Floating-Point Single-Precision Subtract EVX-form EVX-form efsadd RT,RA,RB efssub RT,RA,RB 4 RT RA RB 704 4 RT RA RB 705 0 6 11 16 21 31 0 6 11 16 21 31 RT32:63 1 (RA)32:63 +sp (RB)32:63 RT32:63 1 (RA)32:63 -sp (RB)32:63 The low element of register RA is added to the low ele- The low element of register RB is subtracted from the ment of register RB and the result is stored in the low low element of register RA and the result is stored in element of register RT. the low element of register RT. If an underflow occurs, +0 (for rounding modes RN, RZ, If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in register RP) or -0 (for rounding mode RM) is stored in register RT. RT. Special Registers Altered: Special Registers Altered: FINV FINVS FINV FINVS FOVF FOVFS FOVF FOVFS FUNF FUNFS FUNF FUNFS FG FX FINXS FG FX FINXS Floating-Point Single-Precision Multiply Floating-Point Single-Precision Divide EVX-form EVX-form efsmul RT,RA,RB efsdiv RT,RA,RB 4 RT RA RB 712 4 RT RA RB 713 0 6 11 16 21 31 0 6 11 16 21 31 RT32:63 1 (RA)32:63 ×sp (RB)32:63 RT32:63 1 (RA)32:63 ÷sp (RB)32:63 The low element of register RA is multiplied by the low The low element of register RA is divided by the low element of register RB and the result is stored in the element of register RB and the result is stored in the low element of register RT. low element of register RT. 
Special Registers Altered: Special Registers Altered: FINV FINVS FINV FINVS FOVF FOVFS FG FX FINXS FUNF FUNFS FDBZ FDBZS FG FX FINXS FOVF FOVFS FUNF FUNFS Chapter 8. Embedded Floating-Point 329 Version 2.05 Floating-Point Single-Precision Compare Floating-Point Single-Precision Compare Greater Than EVX-form Less Than EVX-form efscmpgt BF,RA,RB efscmplt BF,RA,RB 4 BF // RA RB 716 4 BF // RA RB 717 0 6 9 11 16 21 31 0 6 9 11 16 21 31 al 1 (RA)32:63 al 1 (RA)32:63 bl 1 (RB)32:63 bl 1 (RB)32:63 if (al > bl) then cl 1 1 if (al < bl) then cl 1 1 else cl 1 0 else cl 1 0 CR4×BF:4×BF+3 1 undefined || cl || undefined || undefined CR4×BF:4×BF+3 1 undefined || cl || undefined || undefined The low element of register RA is compared against the The low element of register RA is compared against the low element of register RB. The results of the compari- low element of register RB. If RA32:63 is less than sons are placed into CR field BF. If RA32:63 is greater RB32:63, bit 1 of CR field BF is set to 1, otherwise it is than RB32:63, bit 1 of CR field BF is set to 1, otherwise it set to 0. Bits 0, 2, and 3 of CR field BF are undefined. is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). Comparison ignores the sign of 0 (+0 = -0). If an Input Error occurs and default results are gener- If an Input Error occurs and default results are gener- ated, NaNs, Infinities, and Denorms are treated as nor- ated, NaNs, Infinities, and Denorms are treated as nor- malized numbers, using their values of `e' and `f' malized numbers, using their values of `e' and `f' directly. directly. 
Special Registers Altered: Special Registers Altered: FINV FINVS FINV FINVS FG FX FG FX CR field BF CR field BF 330 Power ISATM I Version 2.05 Floating-Point Single-Precision Compare Floating-Point Single-Precision Test Equal EVX-form Greater Than EVX-form efscmpeq BF,RA,RB efststgt BF,RA,RB 4 BF // RA RB 718 4 BF // RA RB 732 0 6 9 11 16 21 31 0 6 9 11 16 21 31 al 1 (RA)32:63 al 1 (RA)32:63 bl 1 (RB)32:63 bl 1 (RB)32:63 if (al = bl) then cl 1 1 if (al > bl) then cl 1 1 else cl 1 0 else cl 1 0 CR4×BF:4×BF+3 1 undefined || cl || undefined || undefined CR4×BF:4×BF+3 1 undefined || cl || undefined || undefined The low element of register RA is compared against the The low element of register RA is compared against the low element of register RB. If RA32:63 is equal to low element of register RB. If RA32:63 is greater than RB32:63, bit 1 of CR field BF is set to 1, otherwise it is RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). Comparison ignores the sign of 0 (+0 = -0). The com- parison proceeds after treating NaNs, Infinities, and If an Input Error occurs and default results are gener- Denorms as normalized numbers, using their values of ated, NaNs, Infinities, and Denorms are treated as nor- `e' and `f' directly. malized numbers, using their values of `e' and `f' directly. No exceptions are generated during the execution of efststgt. Special Registers Altered: FINV FINVS Special Registers Altered: FG FX CR field BF CR field BF Programming Note In an implementation, the execution of efststgt is likely to be faster than the execution of efscmpgt; however, if strict IEEE 754 compliance is required, the program should use efscmpgt. Chapter 8. 
Embedded Floating-Point 331 Version 2.05 Floating-Point Single-Precision Test Less Floating-Point Single-Precision Test Than EVX-form Equal EVX-form efststlt BF,RA,RB efststeq BF,RA,RB 4 BF // RA RB 733 4 BF // RA RB 734 0 6 9 11 16 21 31 0 6 9 11 16 21 31 al 1 (RA)32:63 al 1 (RA)32:63 bl 1 (RB)32:63 bl 1 (RB)32:63 if (al < bl) then cl 1 1 if (al = bl) then cl 1 1 else cl 1 0 else cl 1 0 CR4×BF:4×BF+3 1 undefined || cl || undefined || undefined CR4×BF:4×BF+3 1 undefined || cl || undefined || undefined The low element of register RA is compared against the The low element of register RA is compared against the low element of register RB. If RA32:63 is less than low element of register RB. If RA32:63 is equal to RB32:63, bit 1 of CR field BF is set to 1, otherwise it is RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The com- Comparison ignores the sign of 0 (+0 = -0). The com- parison proceeds after treating NaNs, Infinities, and parison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of Denorms as normalized numbers, using their values of `e' and `f' directly. `e' and `f' directly. No exceptions are generated during the execution of No exceptions are generated during the execution of efststlt. efststeq. Special Registers Altered: Special Registers Altered: CR field BF CR field BF Programming Note Programming Note In an implementation, the execution of efststlt is In an implementation, the execution of efststeq is likely to be faster than the execution of efscmplt; likely to be faster than the execution of efscmpeq; however, if strict IEEE 754 compliance is required, however, if strict IEEE 754 compliance is required, the program should use efscmplt. the program should use efscmpeq. 
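The behavior of the single-precision test instructions on special values can be sketched in Python. This is a minimal model under one assumed reading of "treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly": every 32-bit pattern is ordered by its sign and by the magnitude of its exponent and fraction fields, with +0 and -0 equal. The helper `_order_key` and the Python function names are illustrative only and are not defined by the architecture; each function returns just the value placed in bit 1 of CR field BF.

```python
def _order_key(bits: int) -> int:
    """Map a 32-bit single-precision pattern to a signed ordering key.

    Assumption: every encoding (including NaN, Infinity, Denorm) is ordered
    by its raw exponent ('e') and fraction ('f') fields, as if normalized.
    """
    bits &= 0xFFFFFFFF
    magnitude = bits & 0x7FFFFFFF  # 'e' and 'f' fields with the sign removed
    # Negative patterns sort below zero; -0 and +0 both map to key 0.
    return -magnitude if bits >> 31 else magnitude


def efststgt(ra_low: int, rb_low: int) -> int:
    """Model of the bit written to CR field bit 1 by efststgt."""
    return int(_order_key(ra_low) > _order_key(rb_low))


def efststlt(ra_low: int, rb_low: int) -> int:
    """Model of the bit written to CR field bit 1 by efststlt."""
    return int(_order_key(ra_low) < _order_key(rb_low))


def efststeq(ra_low: int, rb_low: int) -> int:
    """Model of the bit written to CR field bit 1 by efststeq."""
    return int(_order_key(ra_low) == _order_key(rb_low))
```

Under this model a NaN pattern compares equal to itself, which differs from IEEE 754 comparison rules and is consistent with the Programming Notes recommending the efscmp forms when strict compliance matters.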
Convert Floating-Point Single-Precision from Signed Integer EVX-form

efscfsi RT,RB

    4     RT    ///   RB    721
    0     6     11    16    21    31

RT_32:63 ← CnvtI32ToFP32((RB)_32:63, S, LO, I)

The signed integer low element in register RB is converted to a single-precision floating-point value using the current rounding mode and the result is placed into the low element of register RT.

Special Registers Altered:
    FINXS FG FX

Convert Floating-Point Single-Precision from Unsigned Integer EVX-form

efscfui RT,RB

    4     RT    ///   RB    720
    0     6     11    16    21    31

RT_32:63 ← CnvtI32ToFP32((RB)_32:63, U, LO, I)

The unsigned integer low element in register RB is converted to a single-precision floating-point value using the current rounding mode and the result is placed into the low element of register RT.

Special Registers Altered:
    FINXS FG FX

Convert Floating-Point Single-Precision from Signed Fraction EVX-form

efscfsf RT,RB

    4     RT    ///   RB    723
    0     6     11    16    21    31

RT_32:63 ← CnvtI32ToFP32((RB)_32:63, S, LO, F)

The signed fractional low element in register RB is converted to a single-precision floating-point value using the current rounding mode and the result is placed into the low element of register RT.

Special Registers Altered:
    FINXS FG FX

Convert Floating-Point Single-Precision from Unsigned Fraction EVX-form

efscfuf RT,RB

    4     RT    ///   RB    722
    0     6     11    16    21    31

RT_32:63 ← CnvtI32ToFP32((RB)_32:63, U, LO, F)

The unsigned fractional low element in register RB is converted to a single-precision floating-point value using the current rounding mode and the result is placed into the low element of register RT.

Special Registers Altered:
    FINXS FG FX

Convert Floating-Point Single-Precision to Signed Integer EVX-form

efsctsi RT,RB

    4     RT    ///   RB    725
    0     6     11    16    21    31

RT_32:63 ← CnvtFP32ToI32Sat((RB)_32:63, S, LO, RND, I)

The single-precision floating-point low element in register RB is converted to a signed integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Single-Precision to Unsigned Integer EVX-form

efsctui RT,RB

    4     RT    ///   RB    724
    0     6     11    16    21    31

RT_32:63 ← CnvtFP32ToI32Sat((RB)_32:63, U, LO, RND, I)

The single-precision floating-point low element in register RB is converted to an unsigned integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero EVX-form

efsctsiz RT,RB

    4     RT    ///   RB    730
    0     6     11    16    21    31

RT_32:63 ← CnvtFP32ToI32Sat((RB)_32:63, S, LO, ZER, I)

The single-precision floating-point low element in register RB is converted to a signed integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero EVX-form

efsctuiz RT,RB

    4     RT    ///   RB    728
    0     6     11    16    21    31

RT_32:63 ← CnvtFP32ToI32Sat((RB)_32:63, U, LO, ZER, I)

The single-precision floating-point low element in register RB is converted to an unsigned integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Single-Precision to Signed Fraction EVX-form

efsctsf RT,RB

    4     RT    ///   RB    727
    0     6     11    16    21    31

RT_32:63 ← CnvtFP32ToI32Sat((RB)_32:63, S, LO, RND, F)

The single-precision floating-point low element in register RB is converted to a signed fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit fraction. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Single-Precision to Unsigned Fraction EVX-form

efsctuf RT,RB

    4     RT    ///   RB    726
    0     6     11    16    21    31

RT_32:63 ← CnvtFP32ToI32Sat((RB)_32:63, U, LO, RND, F)

The single-precision floating-point low element in register RB is converted to an unsigned fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit unsigned fraction. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

8.3.4 SPE.Embedded Float Scalar Double Instructions [Category: SPE.Embedded Float Scalar Double]

Floating-Point Double-Precision Absolute Value EVX-form

efdabs RT,RA

    4     RT    RA    ///   740
    0     6     11    16    21    31

RT_0:63 ← 0b0 || (RA)_1:63

The sign bit of register RA is set to 0 and the result is placed in register RT.

Regardless of the value of register RA, no exceptions are taken during the execution of this instruction.

Special Registers Altered:
    None

Floating-Point Double-Precision Negative Absolute Value EVX-form

efdnabs RT,RA

    4     RT    RA    ///   741
    0     6     11    16    21    31

RT_0:63 ← 0b1 || (RA)_1:63

The sign bit of register RA is set to 1 and the result is placed in register RT.

Regardless of the value of register RA, no exceptions are taken during the execution of this instruction.

Special Registers Altered:
    None

Floating-Point Double-Precision Negate EVX-form

efdneg RT,RA

    4     RT    RA    ///   742
    0     6     11    16    21    31

RT_0:63 ← ¬(RA)_0 || (RA)_1:63

The sign bit of register RA is complemented and the result is placed in register RT.

Regardless of the value of register RA, no exceptions are taken during the execution of this instruction.

Special Registers Altered:
    None

Floating-Point Double-Precision Add EVX-form

efdadd RT,RA,RB

    4     RT    RA    RB    736
    0     6     11    16    21    31

RT_0:63 ← (RA)_0:63 +dp (RB)_0:63

RA is added to RB and the result is stored in register RT.

If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in register RT.

Special Registers Altered:
    FINV FINVS
    FOVF FOVFS
    FUNF FUNFS
    FG FX FINXS

Floating-Point Double-Precision Subtract EVX-form

efdsub RT,RA,RB

    4     RT    RA    RB    737
    0     6     11    16    21    31

RT_0:63 ← (RA)_0:63 -dp (RB)_0:63

RB is subtracted from RA and the result is stored in register RT.

If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in register RT.

Special Registers Altered:
    FINV FINVS
    FOVF FOVFS
    FUNF FUNFS
    FG FX FINXS

Floating-Point Double-Precision Multiply EVX-form

efdmul RT,RA,RB

    4     RT    RA    RB    744
    0     6     11    16    21    31

RT_0:63 ← (RA)_0:63 ×dp (RB)_0:63

RA is multiplied by RB and the result is stored in register RT.

Special Registers Altered:
    FINV FINVS
    FOVF FOVFS
    FUNF FUNFS
    FG FX FINXS

Floating-Point Double-Precision Divide EVX-form

efddiv RT,RA,RB

    4     RT    RA    RB    745
    0     6     11    16    21    31

RT_0:63 ← (RA)_0:63 ÷dp (RB)_0:63

RA is divided by RB and the result is stored in register RT.

Special Registers Altered:
    FINV FINVS
    FG FX FINXS
    FDBZ FDBZS
    FOVF FOVFS
    FUNF FUNFS

Floating-Point Double-Precision Compare Greater Than EVX-form

efdcmpgt BF,RA,RB

    4     BF    //    RA    RB    748
    0     6     9     11    16    21    31

al ← (RA)_0:63
bl ← (RB)_0:63
if (al > bl) then cl ← 1
else cl ← 0
CR_4×BF:4×BF+3 ← undefined || cl || undefined || undefined

RA is compared against RB. If RA is greater than RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0).

If an input error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of 'e' and 'f' directly.

Special Registers Altered:
    FINV FINVS
    FG FX
    CR field BF

Floating-Point Double-Precision Compare Less Than EVX-form

efdcmplt BF,RA,RB

    4     BF    //    RA    RB    749
    0     6     9     11    16    21    31

al ← (RA)_0:63
bl ← (RB)_0:63
if (al < bl) then cl ← 1
else cl ← 0
CR_4×BF:4×BF+3 ← undefined || cl || undefined || undefined

RA is compared against RB. If RA is less than RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0).

If an input error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of 'e' and 'f' directly.

Special Registers Altered:
    FINV FINVS
    FG FX
    CR field BF

Floating-Point Double-Precision Compare Equal EVX-form

efdcmpeq BF,RA,RB

    4     BF    //    RA    RB    750
    0     6     9     11    16    21    31

al ← (RA)_0:63
bl ← (RB)_0:63
if (al = bl) then cl ← 1
else cl ← 0
CR_4×BF:4×BF+3 ← undefined || cl || undefined || undefined

RA is compared against RB. If RA is equal to RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0).

If an input error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of 'e' and 'f' directly.

Special Registers Altered:
    FINV FINVS
    FG FX
    CR field BF

Floating-Point Double-Precision Test Greater Than EVX-form

efdtstgt BF,RA,RB

    4     BF    //    RA    RB    764
    0     6     9     11    16    21    31

al ← (RA)_0:63
bl ← (RB)_0:63
if (al > bl) then cl ← 1
else cl ← 0
CR_4×BF:4×BF+3 ← undefined || cl || undefined || undefined

RA is compared against RB. If RA is greater than RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly.

No exceptions are generated during the execution of efdtstgt.

Special Registers Altered:
    CR field BF

Programming Note
    In an implementation, the execution of efdtstgt is likely to be faster than the execution of efdcmpgt; however, if strict IEEE 754 compliance is required, the program should use efdcmpgt.

Floating-Point Double-Precision Test Less Than EVX-form

efdtstlt BF,RA,RB

    4     BF    //    RA    RB    765
    0     6     9     11    16    21    31

al ← (RA)_0:63
bl ← (RB)_0:63
if (al < bl) then cl ← 1
else cl ← 0
CR_4×BF:4×BF+3 ← undefined || cl || undefined || undefined

RA is compared against RB. If RA is less than RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly.

No exceptions are generated during the execution of efdtstlt.

Special Registers Altered:
    CR field BF

Programming Note
    In an implementation, the execution of efdtstlt is likely to be faster than the execution of efdcmplt; however, if strict IEEE 754 compliance is required, the program should use efdcmplt.

Floating-Point Double-Precision Test Equal EVX-form

efdtsteq BF,RA,RB

    4     BF    //    RA    RB    766
    0     6     9     11    16    21    31

al ← (RA)_0:63
bl ← (RB)_0:63
if (al = bl) then cl ← 1
else cl ← 0
CR_4×BF:4×BF+3 ← undefined || cl || undefined || undefined

RA is compared against RB. If RA is equal to RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly.

No exceptions are generated during the execution of efdtsteq.

Special Registers Altered:
    CR field BF

Programming Note
    In an implementation, the execution of efdtsteq is likely to be faster than the execution of efdcmpeq; however, if strict IEEE 754 compliance is required, the program should use efdcmpeq.
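Every compare and test instruction above reports its result in bit 1 of CR field BF, the position that conditional branches conventionally test as "greater than". The following Python sketch shows how that bit could be located in a CR image. It assumes the CR is modeled as a 32-bit integer whose bit 0 is the most significant bit, matching the 4×BF:4×BF+3 numbering in the instruction descriptions; the helper names are hypothetical. Because bits 0, 2, and 3 of the field are undefined after these instructions, the writer below simply leaves them unchanged, which is only one of the permitted outcomes.

```python
CR_WIDTH = 32  # assumption: CR modeled as a 32-bit integer, bit 0 = MSB


def result_bit_index(bf: int) -> int:
    """Return the CR bit number (0 = MSB) that receives the result 'cl'."""
    if not 0 <= bf <= 7:
        raise ValueError("BF must be in the range 0..7")
    return 4 * bf + 1


def read_result(cr: int, bf: int) -> int:
    """Extract bit 1 of CR field BF from a 32-bit CR image."""
    shift = CR_WIDTH - 1 - result_bit_index(bf)
    return (cr >> shift) & 1


def write_result(cr: int, bf: int, cl: int) -> int:
    """Store 'cl' into bit 1 of CR field BF, leaving all other bits as they were."""
    shift = CR_WIDTH - 1 - result_bit_index(bf)
    cleared = cr & ~(1 << shift)
    return (cleared | ((cl & 1) << shift)) & 0xFFFFFFFF
```

A program that needs the other bits of the field should not rely on them after one of these instructions, since a real implementation may write any value there.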
Convert Floating-Point Double-Precision from Signed Integer EVX-form

efdcfsi RT,RB

    4     RT    ///   RB    753
    0     6     11    16    21    31

RT_0:63 ← CnvtI32ToFP64((RB)_32:63, S, I)

The signed integer low element in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT.

Special Registers Altered:
    None

Convert Floating-Point Double-Precision from Unsigned Integer EVX-form

efdcfui RT,RB

    4     RT    ///   RB    752
    0     6     11    16    21    31

RT_0:63 ← CnvtI32ToFP64((RB)_32:63, U, I)

The unsigned integer low element in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT.

Special Registers Altered:
    None

Convert Floating-Point Double-Precision from Signed Integer Doubleword EVX-form

efdcfsid RT,RB

    4     RT    ///   RB    739
    0     6     11    16    21    31

RT_0:63 ← CnvtI64ToFP64((RB)_0:63, S)

The signed integer doubleword in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT.

Corequisite Categories:
    64-Bit

Special Registers Altered:
    FINXS FG FX

Convert Floating-Point Double-Precision from Unsigned Integer Doubleword EVX-form

efdcfuid RT,RB

    4     RT    ///   RB    738
    0     6     11    16    21    31

RT_0:63 ← CnvtI64ToFP64((RB)_0:63, U)

The unsigned integer doubleword in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT.

Corequisite Categories:
    64-Bit

Special Registers Altered:
    FINXS FG FX

Convert Floating-Point Double-Precision from Signed Fraction EVX-form

efdcfsf RT,RB

    4     RT    ///   RB    755
    0     6     11    16    21    31

RT_0:63 ← CnvtI32ToFP64((RB)_32:63, S, F)

The signed fractional low element in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT.

Special Registers Altered:
    None

Convert Floating-Point Double-Precision from Unsigned Fraction EVX-form

efdcfuf RT,RB

    4     RT    ///   RB    754
    0     6     11    16    21    31

RT_0:63 ← CnvtI32ToFP64((RB)_32:63, U, F)

The unsigned fractional low element in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT.

Special Registers Altered:
    None

Convert Floating-Point Double-Precision to Signed Integer EVX-form

efdctsi RT,RB

    4     RT    ///   RB    757
    0     6     11    16    21    31

RT_32:63 ← CnvtFP64ToI32Sat((RB)_0:63, S, RND, I)

The double-precision floating-point value in register RB is converted to a signed integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Double-Precision to Unsigned Integer EVX-form

efdctui RT,RB

    4     RT    ///   RB    756
    0     6     11    16    21    31

RT_32:63 ← CnvtFP64ToI32Sat((RB)_0:63, U, RND, I)

The double-precision floating-point value in register RB is converted to an unsigned integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round toward Zero EVX-form

efdctsidz RT,RB

    4     RT    ///   RB    747
    0     6     11    16    21    31

RT_0:63 ← CnvtFP64ToI64Sat((RB)_0:63, S, ZER)

The double-precision floating-point value in register RB is converted to a signed integer doubleword using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 64-bit integer. NaNs are converted as though they were zero.

Corequisite Categories:
    64-Bit

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round toward Zero EVX-form

efdctuidz RT,RB

    4     RT    ///   RB    746
    0     6     11    16    21    31

RT_0:63 ← CnvtFP64ToI64Sat((RB)_0:63, U, ZER)

The double-precision floating-point value in register RB is converted to an unsigned integer doubleword using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 64-bit integer. NaNs are converted as though they were zero.

Corequisite Categories:
    64-Bit

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Double-Precision to Signed Integer with Round toward Zero EVX-form

efdctsiz RT,RB

    4     RT    ///   RB    762
    0     6     11    16    21    31

RT_32:63 ← CnvtFP64ToI32Sat((RB)_0:63, S, ZER, I)

The double-precision floating-point value in register RB is converted to a signed integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Double-Precision to Unsigned Integer with Round toward Zero EVX-form

efdctuiz RT,RB

    4     RT    ///   RB    760
    0     6     11    16    21    31

RT_32:63 ← CnvtFP64ToI32Sat((RB)_0:63, U, ZER, I)

The double-precision floating-point value in register RB is converted to an unsigned integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Convert Floating-Point Double-Precision to Signed Fraction EVX-form

efdctsf RT,RB

    4     RT    ///   RB    759
    0     6     11    16    21    31

RT_32:63 ← CnvtFP64ToI32Sat((RB)_0:63, S, RND, F)

The double-precision floating-point value in register RB is converted to a signed fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit fraction. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Floating-Point Double-Precision Convert from Single-Precision EVX-form

efdcfs RT,RB

    4     RT    ///   RB    751
    0     6     11    16    21    31

FP32format f;
FP64format result;
f ← (RB)_32:63
if (f_exp = 0) & (f_frac = 0) then
    result ← f_sign || ⁶³0
else if Isa32NaNorInfinity(f) | Isa32Denorm(f) then
    SPEFSCR_FINV ← 1
    result ← f_sign || 0b11111111110 || ⁵²1
else if Isa32Denorm(f) then
    SPEFSCR_FINV ← 1
    result ← f_sign || ⁶³0
else
    result_sign ← f_sign
    result_exp ← f_exp - 127 + 1023
    result_frac ← f_frac || ²⁹0
RT_0:63 ← result

The single-precision floating-point value in the low element of register RB is converted to a double-precision floating-point value and the result is placed in register RT.

Corequisite Categories:
    SPE.Embedded Float Scalar Single or SPE.Embedded Float Vector

Special Registers Altered:
    FINV FINVS
    FG FX

Convert Floating-Point Double-Precision to Unsigned Fraction EVX-form

efdctuf RT,RB

    4     RT    ///   RB    758
    0     6     11    16    21    31

RT_32:63 ← CnvtFP64ToI32Sat((RB)_0:63, U, RND, F)

The double-precision floating-point value in register RB is converted to an unsigned fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit unsigned fraction. NaNs are converted as though they were zero.

Special Registers Altered:
    FINV FINVS
    FINXS FG FX

Floating-Point Single-Precision Convert from Double-Precision EVX-form

efscfd RT,RB

    4     RT    ///   RB    719
    0     6     11    16    21    31

FP64format f;
FP32format result;
f ← (RB)_0:63
if (f_exp = 0) & (f_frac = 0) then
    result ← f_sign || ³¹0
else if Isa64NaNorInfinity(f) then
    SPEFSCR_FINV ← 1
    result ← f_sign || 0b11111110 || ²³1
else if Isa64Denorm(f) then
    SPEFSCR_FINV ← 1
    result ← f_sign || ³¹0
else
    unbias ← f_exp - 1023
    if unbias > 127 then
        result ← f_sign || 0b11111110 || ²³1
        SPEFSCR_FOVF ← 1
    else if unbias < -126 then
        result ← f_sign || 0b00000001 || ²³0
        SPEFSCR_FUNF ← 1
    else
        result_sign ← f_sign
        result_exp ← unbias + 127
        result_frac ← f_frac[0:22]
        guard ← f_frac[23]
        sticky ← (f_frac[24:51] ≠ 0)
        result ← Round32(result, LO, guard, sticky)
        SPEFSCR_FG ← guard
        SPEFSCR_FX ← sticky
        if guard | sticky then
            SPEFSCR_FINXS ← 1
RT_32:63 ← result

The double-precision floating-point value in register RB is converted to a single-precision floating-point value using the current rounding mode and the result is placed into the low element of register RT.

Corequisite Categories:
    SPE.Embedded Float Scalar Scalar

Special Registers Altered:
    FINV FINVS
    FOVF FOVFS
    FUNF FUNFS
    FG FX FINXS

8.4 Embedded Floating-Point Results Summary

The following tables summarize the results of various types of Embedded Floating-Point operations on various combinations of input operands. Flag settings are performed on appropriate element flags. For all the tables the following annotation and general rules apply:

- * denotes that this status flag is set based on the results of the calculation.
- _Calc_ denotes that the result is updated with the results of the computation.
- max denotes the maximum normalized number with the sign set to the computation [sign(operand A) XOR sign(operand B)].
- amax denotes the maximum normalized number with the sign set to the sign of Operand A.
- bmax denotes the maximum normalized number with the sign set to the sign of Operand B.
- pmax denotes the maximum normalized positive number. The encoding for single-precision is: 0x7F7FFFFF. The encoding for double-precision is: 0x7FEFFFFF_FFFFFFFF.
- nmax denotes the maximum normalized negative number. The encoding for single-precision is: 0xFF7FFFFF. The encoding for double-precision is: 0xFFEFFFFF_FFFFFFFF.
- pmin denotes the minimum normalized positive number. The encoding for single-precision is: 0x00800000. The encoding for double-precision is: 0x00100000_00000000.
- nmin denotes the minimum normalized negative number. The encoding for single-precision is: 0x80800000. The encoding for double-precision is: 0x80100000_00000000.
- Calculations that overflow or underflow saturate. Overflow for operations that have a floating-point result forces the result to max. Underflow for operations that have a floating-point result forces the result to zero. Overflow for operations that have a signed integer result forces the result to 0x7FFFFFFF (positive) or 0x80000000 (negative). Overflow for operations that have an unsigned integer result forces the result to 0xFFFFFFFF (positive) or 0x00000000 (negative).
- 1 (superscript) denotes that the sign of the result is positive when the sign of Operand A and the sign of Operand B are different, for all rounding modes except round to -infinity, where the sign of the result is then negative.
- 2 (superscript) denotes that the sign of the result is positive when the sign of Operand A and the sign of Operand B are the same, for all rounding modes except round to -infinity, where the sign of the result is then negative.
- 3 (superscript) denotes that the sign for any multiply or divide is always the result of the operation [sign(Operand A) XOR sign(Operand B)].
- 4 (superscript) denotes that if an overflow is detected, the result may be saturated.
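The encodings and integer saturation limits listed above can be checked with a short Python sketch. It decodes the pmax, nmax, pmin, and nmin bit patterns with the standard `struct` module and applies the signed and unsigned 32-bit saturation limits to a value that is assumed to be already rounded to an integer. The function names are illustrative and are not the CnvtFP32ToI32Sat or CnvtFP64ToI32Sat functions used in the instruction descriptions.

```python
import struct


def single_from_bits(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def double_from_bits(bits: int) -> float:
    """Interpret a 64-bit pattern as an IEEE 754 double-precision value."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def saturate_int32(value: int, signed: bool) -> int:
    """Return the 32-bit result pattern for an already-rounded integer value.

    Out-of-range values are forced to the limits given in the rules above.
    """
    if signed:
        if value > 0x7FFFFFFF:
            return 0x7FFFFFFF          # positive overflow
        if value < -0x80000000:
            return 0x80000000          # negative overflow
        return value & 0xFFFFFFFF      # two's-complement pattern
    if value > 0xFFFFFFFF:
        return 0xFFFFFFFF              # positive overflow
    if value < 0:
        return 0x00000000              # negative input saturates to zero
    return value


# Single-precision limits decoded from the encodings in the list above.
PMAX_SP = single_from_bits(0x7F7FFFFF)
PMIN_SP = single_from_bits(0x00800000)
```

For example, `PMAX_SP` equals (2 - 2^-23) × 2^127 and `PMIN_SP` equals 2^-126, the largest and smallest positive normalized single-precision magnitudes.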
Table 3: Embedded Floating-Point Results Summary--Add, Sub, Mul, Div

Operation  Operand A  Operand B  Result        FINV  FOVF  FUNF  FDBZ  FINX

Add
Add        ∞          ∞          amax          1     0     0     0     0
Add        ∞          NaN        amax          1     0     0     0     0
Add        ∞          denorm     amax          1     0     0     0     0
Add        ∞          zero       amax          1     0     0     0     0
Add        ∞          norm       amax          1     0     0     0     0
Add        NaN        ∞          amax          1     0     0     0     0
Add        NaN        NaN        amax          1     0     0     0     0
Add        NaN        denorm     amax          1     0     0     0     0
Add        NaN        zero       amax          1     0     0     0     0
Add        NaN        norm       amax          1     0     0     0     0
Add        denorm     ∞          bmax          1     0     0     0     0
Add        denorm     NaN        bmax          1     0     0     0     0
Add        denorm     denorm     zero¹         1     0     0     0     0
Add        denorm     zero       zero¹         1     0     0     0     0
Add        denorm     norm       operand_b⁴    1     0     0     0     0
Add        zero       ∞          bmax          1     0     0     0     0
Add        zero       NaN        bmax          1     0     0     0     0
Add        zero       denorm     zero¹         1     0     0     0     0
Add        zero       zero       zero¹         0     0     0     0     0
Add        zero       norm       operand_b⁴    0     0     0     0     0
Add        norm       ∞          bmax          1     0     0     0     0
Add        norm       NaN        bmax          1     0     0     0     0
Add        norm       denorm     operand_a⁴    1     0     0     0     0
Add        norm       zero       operand_a⁴    0     0     0     0     0
Add        norm       norm       _Calc_        0     *     *     0     *

Subtract
Sub        ∞          ∞          amax          1     0     0     0     0
Sub        ∞          NaN        amax          1     0     0     0     0
Sub        ∞          denorm     amax          1     0     0     0     0
Sub        ∞          zero       amax          1     0     0     0     0
Sub        ∞          norm       amax          1     0     0     0     0
Sub        NaN        ∞          amax          1     0     0     0     0
Sub        NaN        NaN        amax          1     0     0     0     0
Sub        NaN        denorm     amax          1     0     0     0     0
Sub        NaN        zero       amax          1     0     0     0     0
Sub        NaN        norm       amax          1     0     0     0     0
Sub        denorm     ∞          -bmax         1     0     0     0     0
Sub        denorm     NaN        -bmax         1     0     0     0     0
Sub        denorm     denorm     zero²         1     0     0     0     0
Sub        denorm     zero       zero²         1     0     0     0     0
Sub        denorm     norm       -operand_b⁴   1     0     0     0     0
Sub        zero       ∞          -bmax         1     0     0     0     0
Sub        zero       NaN        -bmax         1     0     0     0     0
Sub        zero       denorm     zero²         1     0     0     0     0
Sub        zero       zero       zero²         0     0     0     0     0
Sub        zero       norm       -operand_b⁴   0     0     0     0     0
Sub        norm       ∞          -bmax         1     0     0     0     0
Sub        norm       NaN        -bmax         1     0     0     0     0
Sub        norm       denorm     operand_a⁴    1     0     0     0     0
Sub        norm       zero       operand_a⁴    0     0     0     0     0
Sub        norm       norm       _Calc_        0     *     *     0     *

Multiply³
Mul        ∞          ∞          max           1     0     0     0     0
Mul        ∞          NaN        max           1     0     0     0     0
Mul        ∞          denorm     zero          1     0     0     0     0
Mul        ∞          zero       zero          1     0     0     0     0
Mul        ∞          norm       max           1     0     0     0     0
Mul        NaN        ∞          max           1     0     0     0     0
Mul        NaN        NaN        max           1     0     0     0     0
Mul        NaN        denorm     zero          1     0     0     0     0
Mul        NaN        zero       zero          1     0     0     0     0
Mul        NaN        norm       max           1     0     0     0     0
Mul        denorm     ∞          zero          1     0     0     0     0
Mul        denorm     NaN        zero          1     0     0     0     0
Mul        denorm     denorm     zero          1     0     0     0     0
Mul        denorm     zero       zero          1     0     0     0     0
Mul        denorm     norm       zero          1     0     0     0     0
Mul        zero       ∞          zero          1     0     0     0     0
Mul        zero       NaN        zero          1     0     0     0     0
Mul        zero       denorm     zero          1     0     0     0     0
Mul        zero       zero       zero          0     0     0     0     0
Mul        zero       norm       zero          0     0     0     0     0
Mul        norm       ∞          max           1     0     0     0     0
Mul        norm       NaN        max           1     0     0     0     0
Mul        norm       denorm     zero          1     0     0     0     0
Mul        norm       zero       zero          0     0     0     0     0
Mul        norm       norm       _Calc_        0     *     *     0     *

Divide³
Div        ∞          ∞          zero          1     0     0     0     0
Div        ∞          NaN        zero          1     0     0     0     0
Div        ∞          denorm     max           1     0     0     0     0
Div        ∞          zero       max           1     0     0     0     0
Div        ∞          norm       max           1     0     0     0     0
Div        NaN        ∞          zero          1     0     0     0     0
Div        NaN        NaN        zero          1     0     0     0     0
Div        NaN        denorm     max           1     0     0     0     0
Div        NaN        zero       max           1     0     0     0     0
Div        NaN        norm       max           1     0     0     0     0
Div        denorm     ∞          zero          1     0     0     0     0
Div        denorm     NaN        zero          1     0     0     0     0
Div        denorm     denorm     max           1     0     0     0     0
Div        denorm     zero       max           1     0     0     0     0
Div        denorm     norm       zero          1     0     0     0     0
Div        zero       ∞          zero          1     0     0     0     0
Div        zero       NaN        zero          1     0     0     0     0
Div        zero       denorm     max           1     0     0     0     0
Div        zero       zero       max           1     0     0     0     0
Div        zero       norm       zero          0     0     0     0     0
Div        norm       ∞          zero          1     0     0     0     0
Div        norm       NaN        zero          1     0     0     0     0
Div        norm       denorm     max           1     0     0     0     0
Div        norm       zero       max           0     0     0     1     0
Div        norm       norm       _Calc_        0     *     *     0     *

Table 4: Embedded Floating-Point Results Summary--Single Convert from Double

Operand B  efscfd result  FINV  FOVF  FUNF  FDBZ  FINX
+∞         pmax           1     0     0     0     0
-∞         nmax           1     0     0     0     0
+NaN       pmax           1     0     0     0     0
-NaN       nmax           1     0     0     0     0
+denorm    +zero          1     0     0     0     0
-denorm    -zero          1     0     0     0     0
+zero      +zero          0     0     0     0     0
-zero      -zero          0     0     0     0     0
norm       _Calc_         0     *     *     0     *

Table 5: Embedded Floating-Point Results Summary--Double Convert from Single

Operand B  efdcfs result  FINV  FOVF  FUNF  FDBZ  FINX
+∞         pmax           1     0     0     0     0
-∞         nmax           1     0     0     0     0
+NaN       pmax           1     0     0     0     0
-NaN       nmax           1     0     0     0     0
+denorm    +zero          1     0     0     0     0
-denorm    -zero          1     0     0     0     0
+zero      +zero          0     0     0     0     0
-zero      -zero          0     0     0     0     0
norm       _Calc_         0     0     0     0     0

Table 6: Embedded Floating-Point Results Summary--Convert to Unsigned

Operand B  Integer Result         Fractional Result  FINV  FOVF  FUNF  FDBZ  FINX
           ctui[d][z]             ctuf
+∞         0xFFFF_FFFF            0x7FFF_FFFF        1     0     0     0     0
           0xFFFF_FFFF_FFFF_FFFF
-∞         0                      0                  1     0     0     0     0
+NaN       0                      0                  1     0     0     0     0
-NaN       0                      0                  1     0     0     0     0
denorm     0                      0                  1     0     0     0     0
zero       0                      0                  0     0     0     0     0
+norm      _Calc_                 _Calc_             *     0     0     0     *
-norm      _Calc_                 _Calc_             *     0     0     0     *

Table 7: Embedded Floating-Point Results Summary--Convert to Signed

Operand B  Integer Result         Fractional Result  FINV  FOVF  FUNF  FDBZ  FINX
           ctsi[d][z]             ctsf
+∞         0x7FFF_FFFF            0x7FFF_FFFF        1     0     0     0     0
           0x7FFF_FFFF_FFFF_FFFF
-∞         0x8000_0000            0x8000_0000        1     0     0     0     0
           0x8000_0000_0000_0000
+NaN       0                      0                  1     0     0     0     0
-NaN       0                      0                  1     0     0     0     0
denorm     0                      0                  1     0     0     0     0
zero       0                      0                  0     0     0     0     0
+norm      _Calc_                 _Calc_             *     0     0     0     *
-norm      _Calc_                 _Calc_             *     0     0     0     *

Table 8: Embedded Floating-Point Results Summary--Convert from Unsigned

Operand B  Integer Source  Fractional Source  FINV  FOVF  FUNF  FDBZ  FINX
           cfui            cfuf
zero       zero            zero               0     0     0     0     0
norm       _Calc_          _Calc_             0     0     0     0     *

Table 9: Embedded Floating-Point Results Summary--Convert from Signed

Operand B  Integer Source  Fractional Source  FINV  FOVF  FUNF  FDBZ  FINX
           cfsi            cfsf
zero       zero            zero               0     0     0     0     0
norm       _Calc_          _Calc_             0     0     0     0     *

Table 10: Embedded Floating-Point Results Summary--*abs, *nabs, *neg

Operand A  *abs              *nabs             *neg              FINV  FOVF  FUNF  FDBZ  FINX
+∞         pmax | +∞         nmax | -∞         -amax | -∞        1     0     0     0     0
-∞         pmax | +∞         nmax | -∞         -amax | +∞        1     0     0     0     0
+NaN       pmax | NaN        nmax | -NaN       -amax | -NaN      1     0     0     0     0
-NaN       pmax | NaN        nmax | -NaN       -amax | +NaN      1     0     0     0     0
+denorm    +zero | +denorm   -zero | -denorm   -zero | -denorm   1     0     0     0     0
-denorm    +zero | +denorm   -zero | -denorm   +zero | +denorm   1     0     0     0     0
+zero      +zero             -zero             -zero             0     0     0     0     0
-zero      +zero             -zero             +zero             0     0     0     0     0
+norm      +norm             -norm             -norm             0     0     0     0     0
-norm      +norm             -norm             +norm             0     0     0     0     0

Chapter 9.
Legacy Move Assist Instruction [Category: Legacy Move Assist]

Determine Leftmost Zero Byte X-form

dlmzb   RA,RS,RB    (Rc=0)
dlmzb.  RA,RS,RB    (Rc=1)

    31   RS   RA   RB   78   Rc
    0    6    11   16   21   31

d_0:63 ← (RS)_32:63 || (RB)_32:63
i ← 0
x ← 0
y ← 0
do while (x < 8) & (y = 0)
    x ← x + 1
    if d_i+32:i+39 = 0 then
        y ← 1
    else
        i ← i + 8
RA ← x
XER_57:63 ← x
if Rc = 1 then do
    CR_35 ← SO
    if y = 1 then do
        if x < 5 then CR_32:34 ← 0b010
        else CR_32:34 ← 0b100
    else
        CR_32:34 ← 0b001

The contents of bits 32:63 of register RS and the contents of bits 32:63 of register RB are concatenated to form an 8-byte operand. The operand is searched for the leftmost byte in which each bit is 0 (i.e., a null byte).

Bytes in the operand are numbered from left to right starting with 1. If a null byte is found, its byte number is placed into bits 57:63 of the XER and into register RA. Otherwise, the value 0b000_1000 is placed into both bits 57:63 of the XER and register RA.

If Rc is equal to 1, SO is copied into bit 35 of the CR and bits 32:34 of the CR are updated as follows:

• If no null byte is found, bits 32:34 of the CR are set to 0b001.
• If the leftmost null byte is in the first 4 bytes (i.e., from register RS), bits 32:34 of the CR are set to 0b010.
• If the leftmost null byte is in the last 4 bytes (i.e., from register RB), bits 32:34 of the CR are set to 0b100.

Special Registers Altered:
    XER_57:63
    CR0 (if Rc=1)

Chapter 10. Legacy Integer Multiply-Accumulate Instructions [Category: Legacy Integer Multiply-Accumulate]

The Legacy Integer Multiply-Accumulate instructions with Rc=1 set the first three bits of CR Field 0 based on the 32-bit result, as described in Section 3.3.7, "Other Fixed-Point Instructions".

Programming Note
    Notice that CR Field 0 may not reflect the "true" (infinitely precise) result if overflow occurs.
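The Programming Note can be made concrete with a small Python sketch. It assumes, as an interpretation of the Section 3.3.7 reference rather than something stated in this chapter, that the first three bits of CR Field 0 (LT, GT, EQ) come from a signed comparison of the 32-bit result with zero; the function name `cr0_from_word` is hypothetical.

```python
def cr0_from_word(result32: int, so: int = 0) -> int:
    """Return a 4-bit CR Field 0 value (LT, GT, EQ, SO) for a 32-bit result.

    Assumption: LT/GT/EQ are derived from the signed value of the 32-bit
    word that was actually written, not from the infinitely precise result.
    """
    result32 &= 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a two's complement signed value.
    signed = result32 - (1 << 32) if result32 & 0x80000000 else result32
    if signed < 0:
        bits = 0b100  # LT
    elif signed > 0:
        bits = 0b010  # GT
    else:
        bits = 0b001  # EQ
    return (bits << 1) | (so & 1)


# The infinitely precise sum below is positive, but the wrapped 32-bit word
# has its sign bit set, so the modelled CR Field 0 reports "less than".
true_sum = 0x7FFFFFFF + 1
wrapped = true_sum & 0xFFFFFFFF
```

Here `cr0_from_word(wrapped)` reports LT even though `true_sum` is positive, which is the situation the note warns about for the modulo forms.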
The XO-form Legacy Integer Multiply-Accumulate instructions set SO and OV when OE=1 to reflect overflow of the 32-bit result.

Multiply Accumulate Cross Halfword to Word Modulo Signed XO-form

macchw    RT,RA,RB    (OE=0 Rc=0)
macchw.   RT,RA,RB    (OE=0 Rc=1)
macchwo   RT,RA,RB    (OE=1 Rc=0)
macchwo.  RT,RA,RB    (OE=1 Rc=1)

    4    RT   RA   RB   OE   172  Rc
    0    6    11   16   21   22   31

prod_0:31 ← (RA)_48:63 ×si (RB)_32:47
temp_0:32 ← prod_0:31 + (RT)_32:63
RT_32:63 ← temp_1:32
RT_0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB.

The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT.

The low-order 32 bits of the sum are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate Cross Halfword to Word Saturate Signed XO-form

macchws    RT,RA,RB    (OE=0 Rc=0)
macchws.   RT,RA,RB    (OE=0 Rc=1)
macchwso   RT,RA,RB    (OE=1 Rc=0)
macchwso.  RT,RA,RB    (OE=1 Rc=1)

    4    RT   RA   RB   OE   236  Rc
    0    6    11   16   21   22   31

prod_0:31 ← (RA)_48:63 ×si (RB)_32:47
temp_0:32 ← prod_0:31 + (RT)_32:63
if temp < -2³¹ then RT_32:63 ← 0x8000_0000
else if temp > 2³¹-1 then RT_32:63 ← 0x7FFF_FFFF
else RT_32:63 ← temp_1:32
RT_0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB.

The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT.

If the sum is less than -2³¹, then the value 0x8000_0000 is placed into bits 32:63 of register RT.

If the sum is greater than 2³¹-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the sum is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)
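The difference between the modulo and saturate forms described above can be sketched in Python as follows. This is a minimal model, not the architected definition: the names `to_signed` and `macchw_model` are hypothetical, only the low 32 bits of RT are modelled (bits 0:31 are undefined in the specification), and the returned overflow flag is an interpretation of "overflow of the 32-bit result" that an OE=1 form would record in OV.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def macchw_model(ra: int, rb: int, rt: int, saturate: bool = False):
    """Model macchw (saturate=False) or macchws (saturate=True).

    ra, rb, rt hold the low 32 bits (bits 32:63) of each register.
    Returns (new_rt_low32, overflow).
    """
    a = to_signed(ra, 16)            # (RA)48:63, the low halfword
    b = to_signed(rb >> 16, 16)      # (RB)32:47, the high halfword of the word
    acc = to_signed(rt, 32)          # (RT)32:63
    temp = a * b + acc               # exact sum; fits in 33 bits
    overflow = not (-(1 << 31) <= temp <= (1 << 31) - 1)
    if saturate and temp < -(1 << 31):
        return 0x80000000, overflow
    if saturate and temp > (1 << 31) - 1:
        return 0x7FFFFFFF, overflow
    return temp & 0xFFFFFFFF, overflow   # low-order 32 bits of the sum
```

With RA = 0x7FFF, RB = 0x7FFF_0000 and RT = 0x7FFF_FFFF the exact sum exceeds 2³¹-1, so the modulo form wraps while the saturate form clamps to 0x7FFF_FFFF.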
Multiply Accumulate Cross Halfword to Word Modulo Unsigned XO-form

macchwu    RT,RA,RB    (OE=0 Rc=0)
macchwu.   RT,RA,RB    (OE=0 Rc=1)
macchwuo   RT,RA,RB    (OE=1 Rc=0)
macchwuo.  RT,RA,RB    (OE=1 Rc=1)

    4    RT   RA   RB   OE   140  Rc
    0    6    11   16   21   22   31

prod_0:31 ← (RA)_48:63 ×ui (RB)_32:47
temp_0:32 ← prod_0:31 + (RT)_32:63
RT ← temp_1:32

The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB.

The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT.

The low-order 32 bits of the sum are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate Cross Halfword to Word Saturate Unsigned XO-form

macchwsu    RT,RA,RB    (OE=0 Rc=0)
macchwsu.   RT,RA,RB    (OE=0 Rc=1)
macchwsuo   RT,RA,RB    (OE=1 Rc=0)
macchwsuo.  RT,RA,RB    (OE=1 Rc=1)

    4    RT   RA   RB   OE   204  Rc
    0    6    11   16   21   22   31

prod_0:31 ← (RA)_48:63 ×ui (RB)_32:47
temp_0:32 ← prod_0:31 + (RT)_32:63
if temp > 2³²-1 then RT ← 0xFFFF_FFFF
else RT ← temp_1:32

The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB.

The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT.

If the sum is greater than 2³²-1, then the value 0xFFFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the sum is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate High Halfword to Word Modulo Signed XO-form

machhw    RT,RA,RB    (OE=0 Rc=0)
machhw.   RT,RA,RB    (OE=0 Rc=1)
machhwo   RT,RA,RB    (OE=1 Rc=0)
machhwo.  RT,RA,RB    (OE=1 Rc=1)

    4    RT   RA   RB   OE   44   Rc
    0    6    11   16   21   22   31

prod_0:31 ← (RA)_32:47 ×si (RB)_32:47
temp_0:32 ← prod_0:31 + (RT)_32:63
RT_32:63 ← temp_1:32
RT_0:31 ← undefined

The signed-integer halfword in bits 32:47 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB.

The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT.

The low-order 32 bits of the sum are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate High Halfword to Word Saturate Signed XO-form

machhws    RT,RA,RB    (OE=0 Rc=0)
machhws.   RT,RA,RB    (OE=0 Rc=1)
machhwso   RT,RA,RB    (OE=1 Rc=0)
machhwso.  RT,RA,RB    (OE=1 Rc=1)

    4    RT   RA   RB   OE   108  Rc
    0    6    11   16   21   22   31

prod_0:31 ← (RA)_32:47 ×si (RB)_32:47
temp_0:32 ← prod_0:31 + (RT)_32:63
if temp < -2³¹ then RT_32:63 ← 0x8000_0000
else if temp > 2³¹-1 then RT_32:63 ← 0x7FFF_FFFF
else RT_32:63 ← temp_1:32
RT_0:31 ← undefined

The signed-integer halfword in bits 32:47 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB.

The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT.

If the sum is less than -2³¹, then the value 0x8000_0000 is placed into bits 32:63 of register RT.

If the sum is greater than 2³¹-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the sum is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate High Halfword to Word Modulo Unsigned XO-form

machhwu    RT,RA,RB    (OE=0 Rc=0)
machhwu.   RT,RA,RB    (OE=0 Rc=1)
machhwuo   RT,RA,RB    (OE=1 Rc=0)
machhwuo.  RT,RA,RB    (OE=1 Rc=1)

    4    RT   RA   RB   OE   12   Rc
    0    6    11   16   21   22   31

prod_0:31 ← (RA)_32:47 ×ui (RB)_32:47
temp_0:32 ← prod_0:31 + (RT)_32:63
RT_32:63 ← temp_1:32
RT_0:31 ← undefined

The unsigned-integer halfword in bits 32:47 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB.

The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT.

The low-order 32 bits of the sum are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate High Halfword to Word Saturate Unsigned XO-form

machhwsu    RT,RA,RB    (OE=0 Rc=0)
machhwsu.   RT,RA,RB    (OE=0 Rc=1)
machhwsuo   RT,RA,RB    (OE=1 Rc=0)
machhwsuo.  RT,RA,RB    (OE=1 Rc=1)

    4    RT   RA   RB   OE   76   Rc
    0    6    11   16   21   22   31

prod_0:31 ← (RA)_32:47 ×ui (RB)_32:47
temp_0:32 ← prod_0:31 + (RT)_32:63
if temp > 2³²-1 then RT ← 0xFFFF_FFFF
else RT ← temp_1:32

The unsigned-integer halfword in bits 32:47 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB.

The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT.

If the sum is greater than 2³²-1, then the value 0xFFFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the sum is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate Low Halfword to Word Modulo Signed XO-form

maclhw    RT,RA,RB    (OE=0 Rc=0)
maclhw.   RT,RA,RB    (OE=0 Rc=1)
maclhwo   RT,RA,RB    (OE=1 Rc=0)
maclhwo.  RT,RA,RB    (OE=1 Rc=1)

    4    RT   RA   RB   OE   428  Rc
    0    6    11   16   21   22   31

prod_0:31 ← (RA)_48:63 ×si (RB)_48:63
temp_0:32 ← prod_0:31 + (RT)_32:63
RT_32:63 ← temp_1:32
RT_0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 48:63 of register RB.

The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT.

The low-order 32 bits of the sum are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate Low Halfword to Word Saturate Signed XO-form

maclhws    RT,RA,RB    (OE=0 Rc=0)
maclhws.   RT,RA,RB    (OE=0 Rc=1)
maclhwso   RT,RA,RB    (OE=1 Rc=0)
maclhwso.  RT,RA,RB    (OE=1 Rc=1)

    4    RT   RA   RB   OE   492  Rc
    0    6    11   16   21   22   31

prod_0:31 ← (RA)_48:63 ×si (RB)_48:63
temp_0:32 ← prod_0:31 + (RT)_32:63
if temp < -2³¹ then RT_32:63 ← 0x8000_0000
else if temp > 2³¹-1 then RT_32:63 ← 0x7FFF_FFFF
else RT_32:63 ← temp_1:32
RT_0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 48:63 of register RB.

The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT.

If the sum is less than -2³¹, then the value 0x8000_0000 is placed into bits 32:63 of register RT.

If the sum is greater than 2³¹-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the sum is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate Low Halfword to Word Modulo Unsigned XO-form

maclhwu    RT,RA,RB    (OE=0 Rc=0)
maclhwu.   RT,RA,RB    (OE=0 Rc=1)
maclhwuo   RT,RA,RB    (OE=1 Rc=0)
maclhwuo.  RT,RA,RB    (OE=1 Rc=1)

    4    RT   RA   RB   OE   396  Rc
    0    6    11   16   21   22   31

prod_0:31 ← (RA)_48:63 ×ui (RB)_48:63
temp_0:32 ← prod_0:31 + (RT)_32:63
RT_32:63 ← temp_1:32
RT_0:31 ← undefined

The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 48:63 of register RB.

The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT.

The low-order 32 bits of the sum are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Accumulate Low Halfword to Word Saturate Unsigned XO-form

maclhwsu    RT,RA,RB    (OE=0 Rc=0)
maclhwsu.   RT,RA,RB    (OE=0 Rc=1)
maclhwsuo   RT,RA,RB    (OE=1 Rc=0)
maclhwsuo.  RT,RA,RB    (OE=1 Rc=1)

    4    RT   RA   RB   OE   460  Rc
    0    6    11   16   21   22   31

prod_0:31 ← (RA)_48:63 ×ui (RB)_48:63
temp_0:32 ← prod_0:31 + (RT)_32:63
if temp > 2³²-1 then RT ← 0xFFFF_FFFF
else RT ← temp_1:32

The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 48:63 of register RB.

The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT.

If the sum is greater than 2³²-1, then the value 0xFFFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the sum is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Multiply Cross Halfword to Word Signed X-form

mulchw   RT,RA,RB    (Rc=0)
mulchw.  RT,RA,RB    (Rc=1)

    4    RT   RA   RB   168  Rc
    0    6    11   16   21   31

RT_32:63 ← (RA)_48:63 ×si (RB)_32:47
RT_0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB and the signed-integer word result is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    CR0 (if Rc=1)

Multiply Cross Halfword to Word Unsigned X-form

mulchwu   RT,RA,RB    (Rc=0)
mulchwu.  RT,RA,RB    (Rc=1)

    4    RT   RA   RB   136  Rc
    0    6    11   16   21   31

RT_32:63 ← (RA)_48:63 ×ui (RB)_32:47
RT_0:31 ← undefined

The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB and the unsigned-integer word result is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    CR0 (if Rc=1)

Multiply High Halfword to Word Signed X-form

mulhhw   RT,RA,RB    (Rc=0)
mulhhw.  RT,RA,RB    (Rc=1)

    4    RT   RA   RB   40   Rc
    0    6    11   16   21   31

RT_32:63 ← (RA)_32:47 ×si (RB)_32:47
RT_0:31 ← undefined

The signed-integer halfword in bits 32:47 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB and the signed-integer word result is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    CR0 (if Rc=1)

Multiply High Halfword to Word Unsigned X-form

mulhhwu   RT,RA,RB    (Rc=0)
mulhhwu.  RT,RA,RB    (Rc=1)

    4    RT   RA   RB   8    Rc
    0    6    11   16   21   31

RT_32:63 ← (RA)_32:47 ×ui (RB)_32:47
RT_0:31 ← undefined

The unsigned-integer halfword in bits 32:47 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB and the unsigned-integer word result is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    CR0 (if Rc=1)

Multiply Low Halfword to Word Signed X-form

mullhw   RT,RA,RB    (Rc=0)
mullhw.  RT,RA,RB    (Rc=1)

    4    RT   RA   RB   424  Rc
    0    6    11   16   21   31

RT_32:63 ← (RA)_48:63 ×si (RB)_48:63
RT_0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 48:63 of register RB and the signed-integer word result is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    CR0 (if Rc=1)

Multiply Low Halfword to Word Unsigned X-form

mullhwu   RT,RA,RB    (Rc=0)
mullhwu.  RT,RA,RB    (Rc=1)

    4    RT   RA   RB   392  Rc
    0    6    11   16   21   31

RT_32:63 ← (RA)_48:63 ×ui (RB)_48:63
RT_0:31 ← undefined

The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 48:63 of register RB and the unsigned-integer word result is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    CR0 (if Rc=1)

Negative Multiply Accumulate Cross Halfword to Word Modulo Signed XO-form

nmacchw    RT,RA,RB    (OE=0 Rc=0)
nmacchw.   RT,RA,RB    (OE=0 Rc=1)
nmacchwo   RT,RA,RB    (OE=1 Rc=0)
nmacchwo.  RT,RA,RB    (OE=1 Rc=1)

    4    RT   RA   RB   OE   174  Rc
    0    6    11   16   21   22   31

prod_0:31 ← (RA)_48:63 ×si (RB)_32:47
temp_0:32 ← (RT)_32:63 -si prod_0:31
RT_32:63 ← temp_1:32
RT_0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB.

The 32-bit signed-integer product is subtracted from the signed-integer word in bits 32:63 of register RT.

The low-order 32 bits of the difference are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Negative Multiply Accumulate Cross Halfword to Word Saturate Signed XO-form

nmacchws    RT,RA,RB    (OE=0 Rc=0)
nmacchws.   RT,RA,RB    (OE=0 Rc=1)
nmacchwso   RT,RA,RB    (OE=1 Rc=0)
nmacchwso.  RT,RA,RB    (OE=1 Rc=1)

    4    RT   RA   RB   OE   238  Rc
    0    6    11   16   21   22   31

prod_0:31 ← (RA)_48:63 ×si (RB)_32:47
temp_0:32 ← (RT)_32:63 -si prod_0:31
if temp < -2³¹ then RT_32:63 ← 0x8000_0000
else if temp > 2³¹-1 then RT_32:63 ← 0x7FFF_FFFF
else RT_32:63 ← temp_1:32
RT_0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB.

The 32-bit signed-integer product is subtracted from the signed-integer word in bits 32:63 of register RT.

If the difference is less than -2³¹, then the value 0x8000_0000 is placed into bits 32:63 of register RT.

If the difference is greater than 2³¹-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the difference is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Negative Multiply Accumulate High Halfword to Word Modulo Signed XO-form

nmachhw    RT,RA,RB    (OE=0 Rc=0)
nmachhw.   RT,RA,RB    (OE=0 Rc=1)
nmachhwo   RT,RA,RB    (OE=1 Rc=0)
nmachhwo.  RT,RA,RB    (OE=1 Rc=1)

    4    RT   RA   RB   OE   46   Rc
    0    6    11   16   21   22   31

prod_0:31 ← (RA)_32:47 ×si (RB)_32:47
temp_0:32 ← (RT)_32:63 -si prod_0:31
RT_32:63 ← temp_1:32
RT_0:31 ← undefined

The signed-integer halfword in bits 32:47 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB.

The 32-bit signed-integer product is subtracted from the signed-integer word in bits 32:63 of register RT.

The low-order 32 bits of the difference are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Negative Multiply Accumulate High Halfword to Word Saturate Signed XO-form

nmachhws    RT,RA,RB    (OE=0 Rc=0)
nmachhws.   RT,RA,RB    (OE=0 Rc=1)
nmachhwso   RT,RA,RB    (OE=1 Rc=0)
nmachhwso.  RT,RA,RB    (OE=1 Rc=1)

    4    RT   RA   RB   OE   110  Rc
    0    6    11   16   21   22   31

prod_0:31 ← (RA)_32:47 ×si (RB)_32:47
temp_0:32 ← (RT)_32:63 -si prod_0:31
if temp < -2³¹ then RT_32:63 ← 0x8000_0000
else if temp > 2³¹-1 then RT_32:63 ← 0x7FFF_FFFF
else RT_32:63 ← temp_1:32
RT_0:31 ← undefined

The signed-integer halfword in bits 32:47 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB.

The 32-bit signed-integer product is subtracted from the signed-integer word in bits 32:63 of register RT.

If the difference is less than -2³¹, then the value 0x8000_0000 is placed into bits 32:63 of register RT.

If the difference is greater than 2³¹-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the difference is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Negative Multiply Accumulate Low Halfword to Word Modulo Signed XO-form

nmaclhw    RT,RA,RB    (OE=0 Rc=0)
nmaclhw.   RT,RA,RB    (OE=0 Rc=1)
nmaclhwo   RT,RA,RB    (OE=1 Rc=0)
nmaclhwo.  RT,RA,RB    (OE=1 Rc=1)

    4    RT   RA   RB   OE   430  Rc
    0    6    11   16   21   22   31

prod_0:31 ← (RA)_48:63 ×si (RB)_48:63
temp_0:32 ← (RT)_32:63 -si prod_0:31
RT_32:63 ← temp_1:32
RT_0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 48:63 of register RB.

The 32-bit signed-integer product is subtracted from the signed-integer word in bits 32:63 of register RT.

The low-order 32 bits of the difference are placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Negative Multiply Accumulate Low Halfword to Word Saturate Signed XO-form

nmaclhws    RT,RA,RB    (OE=0 Rc=0)
nmaclhws.   RT,RA,RB    (OE=0 Rc=1)
nmaclhwso   RT,RA,RB    (OE=1 Rc=0)
nmaclhwso.  RT,RA,RB    (OE=1 Rc=1)

    4    RT   RA   RB   OE   494  Rc
    0    6    11   16   21   22   31

prod_0:31 ← (RA)_48:63 ×si (RB)_48:63
temp_0:32 ← (RT)_32:63 -si prod_0:31
if temp < -2³¹ then RT_32:63 ← 0x8000_0000
else if temp > 2³¹-1 then RT_32:63 ← 0x7FFF_FFFF
else RT_32:63 ← temp_1:32
RT_0:31 ← undefined

The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 48:63 of register RB.

The 32-bit signed-integer product is subtracted from the signed-integer word in bits 32:63 of register RT.

If the difference is less than -2³¹, then the value 0x8000_0000 is placed into bits 32:63 of register RT.

If the difference is greater than 2³¹-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT.

Otherwise, the difference is placed into bits 32:63 of register RT.

The contents of bits 0:31 of register RT are undefined.

Special Registers Altered:
    SO OV (if OE=1)
    CR0 (if Rc=1)

Appendix A. Suggested Floating-Point Models [Category: Floating-Point]

A.1 Floating-Point Round to Single-Precision Model

The following describes algorithmically the operation of the Floating Round to Single-Precision instruction.

If (FRB)_1:11 < 897 and (FRB)_1:63 > 0 then
    Do
        If FPSCR_UE = 0 then goto Disabled Exponent Underflow
        If FPSCR_UE = 1 then goto Enabled Exponent Underflow
    End
If (FRB)_1:11 > 1150 and (FRB)_1:11 < 2047 then
    Do
        If FPSCR_OE = 0 then goto Disabled Exponent Overflow
        If FPSCR_OE = 1 then goto Enabled Exponent Overflow
    End
If (FRB)_1:11 > 896 and (FRB)_1:11 < 1151 then goto Normal Operand
If (FRB)_1:63 = 0 then goto Zero Operand
If (FRB)_1:11 = 2047 then
    Do
        If (FRB)_12:63 = 0 then goto Infinity Operand
        If (FRB)_12 = 1 then goto QNaN Operand
        If (FRB)_12 = 0 and (FRB)_13:63 > 0 then goto SNaN Operand
    End

Disabled Exponent Underflow:
    sign ← (FRB)_0
    If (FRB)_1:11 = 0 then
        Do
            exp ← -1022
            frac_0:52 ← 0b0 || (FRB)_12:63
        End
    If (FRB)_1:11 > 0 then
        Do
            exp ← (FRB)_1:11 - 1023
            frac_0:52 ← 0b1 || (FRB)_12:63
        End
    Denormalize operand:
        G || R || X ← 0b000
        Do while exp < -126
            exp ← exp + 1
            frac_0:52 || G || R || X ← 0b0 || frac_0:52 || G || (R | X)
        End
    FPSCR_UX ← (frac_24:52 || G || R || X) > 0
    Round Single(sign,exp,frac_0:52,G,R,X)
    FPSCR_XX ← FPSCR_XX | FPSCR_FI
    If frac_0:52 = 0 then
        Do
            FRT_0 ← sign
            FRT_1:63 ← 0
            If sign = 0 then FPSCR_FPRF ← "+ zero"
            If sign = 1 then FPSCR_FPRF ← "- zero"
        End
    If frac_0:52 > 0 then
        Do
            If frac_0 = 1 then
                Do
                    If sign = 0 then FPSCR_FPRF ← "+ normal number"
                    If sign = 1 then FPSCR_FPRF ← "- normal number"
                End
            If frac_0 = 0 then
                Do
                    If sign = 0 then FPSCR_FPRF ← "+ denormalized number"
                    If sign = 1 then FPSCR_FPRF ← "- denormalized number"
                End
            Normalize operand:
                Do while frac_0 = 0
                    exp ← exp - 1
                    frac_0:52 ← frac_1:52 || 0b0
                End
            FRT_0 ← sign
            FRT_1:11 ← exp + 1023
            FRT_12:63 ← frac_1:52
        End
    Done

Enabled Exponent Underflow:
    FPSCR_UX ← 1
    sign ← (FRB)_0
    If (FRB)_1:11 = 0 then
        Do
            exp ← -1022
            frac_0:52 ← 0b0 || (FRB)_12:63
        End
    If (FRB)_1:11 > 0 then
        Do
            exp ← (FRB)_1:11 - 1023
            frac_0:52 ← 0b1 || (FRB)_12:63
        End
    Normalize operand:
        Do while frac_0 = 0
            exp ← exp - 1
            frac_0:52 ← frac_1:52 || 0b0
        End
    Round Single(sign,exp,frac_0:52,0,0,0)
    FPSCR_XX ← FPSCR_XX | FPSCR_FI
    exp ← exp + 192
    FRT_0 ← sign
    FRT_1:11 ← exp + 1023
    FRT_12:63 ← frac_1:52
    If sign = 0 then FPSCR_FPRF ← "+ normal number"
    If sign = 1 then FPSCR_FPRF ← "- normal number"
    Done

Disabled Exponent Overflow:
    FPSCR_OX ← 1
    If FPSCR_RN = 0b00 then /* Round to Nearest */
        Do
            If (FRB)_0 = 0 then FRT ← 0x7FF0_0000_0000_0000
            If (FRB)_0 = 1 then FRT ← 0xFFF0_0000_0000_0000
            If (FRB)_0 = 0 then FPSCR_FPRF ← "+ infinity"
            If (FRB)_0 = 1 then FPSCR_FPRF ← "- infinity"
        End
    If FPSCR_RN = 0b01 then /* Round toward Zero */
        Do
            If (FRB)_0 = 0 then FRT ← 0x47EF_FFFF_E000_0000
            If (FRB)_0 = 1 then FRT ← 0xC7EF_FFFF_E000_0000
            If (FRB)_0 = 0 then FPSCR_FPRF ← "+ normal number"
            If (FRB)_0 = 1 then FPSCR_FPRF ← "- normal number"
        End
    If FPSCR_RN = 0b10 then /* Round toward +Infinity */
        Do
            If (FRB)_0 = 0 then FRT ← 0x7FF0_0000_0000_0000
            If (FRB)_0 = 1 then FRT ← 0xC7EF_FFFF_E000_0000
            If (FRB)_0 = 0 then FPSCR_FPRF ← "+ infinity"
            If (FRB)_0 = 1 then FPSCR_FPRF ← "- normal number"
        End
    If FPSCR_RN = 0b11 then /* Round toward -Infinity */
        Do
            If (FRB)_0 = 0 then FRT ← 0x47EF_FFFF_E000_0000
            If (FRB)_0 = 1 then FRT ← 0xFFF0_0000_0000_0000
            If (FRB)_0 = 0 then FPSCR_FPRF ← "+ normal number"
            If (FRB)_0 = 1 then FPSCR_FPRF ← "- infinity"
        End
    FPSCR_FR ← undefined
    FPSCR_FI ← 1
    FPSCR_XX ← 1
    Done

Enabled Exponent Overflow:
    sign ← (FRB)_0
    exp ← (FRB)_1:11 - 1023
    frac_0:52 ← 0b1 || (FRB)_12:63
    Round Single(sign,exp,frac_0:52,0,0,0)
    FPSCR_XX ← FPSCR_XX | FPSCR_FI
Enabled Overflow:
    FPSCR_OX ← 1
    exp ← exp - 192
    FRT_0 ← sign
    FRT_1:11 ← exp + 1023
    FRT_12:63 ← frac_1:52
    If sign = 0 then FPSCR_FPRF ← "+ normal number"
    If sign = 1 then FPSCR_FPRF ← "- normal number"
    Done

Zero Operand:
    FRT ← (FRB)
    If (FRB)_0 = 0 then FPSCR_FPRF ← "+ zero"
    If (FRB)_0 = 1 then FPSCR_FPRF ← "- zero"
    FPSCR_FR FI ← 0b00
    Done

Infinity Operand:
    FRT ← (FRB)
    If (FRB)_0 = 0 then FPSCR_FPRF ← "+ infinity"
    If (FRB)_0 = 1 then FPSCR_FPRF ← "- infinity"
    FPSCR_FR FI ← 0b00
    Done

QNaN Operand:
    FRT ← (FRB)_0:34 || ²⁹0
    FPSCR_FPRF ← "QNaN"
    FPSCR_FR FI ← 0b00
    Done

SNaN Operand:
    FPSCR_VXSNAN ← 1
    If FPSCR_VE = 0 then
        Do
            FRT_0:11 ← (FRB)_0:11
            FRT_12 ← 1
            FRT_13:63 ← (FRB)_13:34 || ²⁹0
            FPSCR_FPRF ← "QNaN"
        End
    FPSCR_FR FI ← 0b00
    Done

Normal Operand:
    sign ← (FRB)_0
    exp ← (FRB)_1:11 - 1023
    frac_0:52 ← 0b1 || (FRB)_12:63
    Round Single(sign,exp,frac_0:52,0,0,0)
    FPSCR_XX ← FPSCR_XX | FPSCR_FI
    If exp > 127 and FPSCR_OE = 0 then go to Disabled Exponent Overflow
    If exp > 127 and FPSCR_OE = 1 then go to Enabled Overflow
    FRT_0 ← sign
    FRT_1:11 ← exp + 1023
    FRT_12:63 ← frac_1:52
    If sign = 0 then FPSCR_FPRF ← "+ normal number"
    If sign = 1 then FPSCR_FPRF ← "- normal number"
    Done

Round Single(sign,exp,frac_0:52,G,R,X):
    inc ← 0
    lsb ← frac_23
    gbit ← frac_24
    rbit ← frac_25
    xbit ← (frac_26:52 || G || R || X) ≠ 0
    If FPSCR_RN = 0b00 then /* Round to Nearest */
        Do /* comparisons ignore u bits */
            If sign || lsb || gbit || rbit || xbit = 0bu11uu then inc ← 1
            If sign || lsb || gbit || rbit || xbit = 0bu011u then inc ← 1
            If sign || lsb || gbit || rbit || xbit = 0bu01u1 then inc ← 1
        End
   If FPSCR_RN = 0b10 then /* Round toward + Infinity */
      Do /* comparisons ignore u bits */
         If sign || lsb || gbit || rbit || xbit = 0b0u1uu then inc ← 1
         If sign || lsb || gbit || rbit || xbit = 0b0uu1u then inc ← 1
         If sign || lsb || gbit || rbit || xbit = 0b0uuu1 then inc ← 1
      End
   If FPSCR_RN = 0b11 then /* Round toward - Infinity */
      Do /* comparisons ignore u bits */
         If sign || lsb || gbit || rbit || xbit = 0b1u1uu then inc ← 1
         If sign || lsb || gbit || rbit || xbit = 0b1uu1u then inc ← 1
         If sign || lsb || gbit || rbit || xbit = 0b1uuu1 then inc ← 1
      End
   frac_0:23 ← frac_0:23 + inc
   If carry_out = 1 then
      Do
         frac_0:23 ← 0b1 || frac_0:22
         exp ← exp + 1
      End
   frac_24:52 ← ²⁹0
   FPSCR_FR ← inc
   FPSCR_FI ← gbit | rbit | xbit
   Return

A.2 Floating-Point Convert to Integer Model

The following describes algorithmically the operation of the Floating Convert To Integer instructions.

If Floating Convert To Integer Word then
   Do
      round_mode ← FPSCR_RN
      tgt_precision ← "32-bit integer"
   End
If Floating Convert To Integer Word with round toward Zero then
   Do
      round_mode ← 0b01
      tgt_precision ← "32-bit integer"
   End
If Floating Convert To Integer Doubleword then
   Do
      round_mode ← FPSCR_RN
      tgt_precision ← "64-bit integer"
   End
If Floating Convert To Integer Doubleword with round toward Zero then
   Do
      round_mode ← 0b01
      tgt_precision ← "64-bit integer"
   End
sign ← (FRB)_0
If (FRB)_1:11 = 2047 and (FRB)_12:63 = 0 then goto Infinity Operand
If (FRB)_1:11 = 2047 and (FRB)_12 = 0 then goto SNaN Operand
If (FRB)_1:11 = 2047 and (FRB)_12 = 1 then goto QNaN Operand
If (FRB)_1:11 > 1086 then goto Large Operand
If (FRB)_1:11 > 0 then exp ← (FRB)_1:11 - 1023 /* exp - bias */
If (FRB)_1:11 = 0 then exp ← -1022
If (FRB)_1:11 > 0 then frac_0:64 ← 0b01 || (FRB)_12:63 || ¹¹0 /* normal; need leading 0 for later complement */
If (FRB)_1:11 = 0 then frac_0:64 ← 0b00 || (FRB)_12:63 || ¹¹0 /* denormal */
gbit || rbit || xbit ← 0b000
Do i=1,63-exp /* do the loop 0 times if exp = 63 */
   frac_0:64 || gbit || rbit || xbit ← 0b0 || frac_0:64 || gbit || (rbit | xbit)
End
Round Integer(sign,frac_0:64,gbit,rbit,xbit,round_mode)
If sign = 1 then frac_0:64 ← ¬frac_0:64 + 1 /* needed leading 0 for -2⁶⁴ < (FRB) < -2⁶³ */
If tgt_precision = "32-bit integer" and frac_0:64 > 2³¹-1 then goto Large Operand
If tgt_precision = "64-bit integer" and frac_0:64 > 2⁶³-1 then goto Large Operand
If tgt_precision = "32-bit integer" and frac_0:64 < -2³¹ then goto Large Operand
If tgt_precision = "64-bit integer" and frac_0:64 < -2⁶³ then goto Large Operand
FPSCR_XX ← FPSCR_XX | FPSCR_FI
If tgt_precision = "32-bit integer" then FRT ← 0xuuuu_uuuu || frac_33:64 /* u is undefined hex digit */
If tgt_precision = "64-bit integer" then FRT ← frac_1:64
FPSCR_FPRF ← undefined
Done

Round Integer(sign,frac_0:64,gbit,rbit,xbit,round_mode):
   inc ← 0
   If round_mode = 0b00 then /* Round to Nearest */
      Do /* comparisons ignore u bits */
         If sign || frac_64 || gbit || rbit || xbit = 0bu11uu then inc ← 1
         If sign || frac_64 || gbit || rbit || xbit = 0bu011u then inc ← 1
         If sign || frac_64 || gbit || rbit || xbit = 0bu01u1 then inc ← 1
      End
   If round_mode = 0b10 then /* Round toward +Infinity */
      Do /* comparisons ignore u bits */
         If sign || frac_64 || gbit || rbit || xbit = 0b0u1uu then inc ← 1
         If sign || frac_64 || gbit || rbit || xbit = 0b0uu1u then inc ← 1
         If sign || frac_64 || gbit || rbit || xbit = 0b0uuu1 then inc ← 1
      End
   If round_mode = 0b11 then /* Round toward -Infinity */
      Do /* comparisons ignore u bits */
         If sign || frac_64 || gbit || rbit || xbit = 0b1u1uu then inc ← 1
         If sign || frac_64 || gbit || rbit || xbit = 0b1uu1u then inc ← 1
         If sign || frac_64 || gbit || rbit || xbit = 0b1uuu1 then inc ← 1
      End
   frac_0:64 ← frac_0:64 + inc
   FPSCR_FR ← inc
   FPSCR_FI ← gbit | rbit | xbit
   Return

Infinity Operand:
   FPSCR_{FR FI VXCVI} ← 0b001
   If FPSCR_VE = 0 then Do
      If tgt_precision = "32-bit integer" then
         Do
            If sign = 0 then FRT ← 0xuuuu_uuuu_7FFF_FFFF /* u is undefined hex digit */
            If sign = 1 then FRT ← 0xuuuu_uuuu_8000_0000 /* u is undefined hex digit */
         End
      Else
         Do
            If sign = 0 then FRT ← 0x7FFF_FFFF_FFFF_FFFF
            If sign = 1 then FRT ← 0x8000_0000_0000_0000
         End
      FPSCR_FPRF ← undefined
   End
   Done

SNaN Operand:
   FPSCR_{FR FI VXSNAN VXCVI} ← 0b0011
   If FPSCR_VE = 0 then
      Do
         If tgt_precision = "32-bit integer" then FRT ← 0xuuuu_uuuu_8000_0000 /* u is undefined hex digit */
         If tgt_precision = "64-bit integer" then FRT ← 0x8000_0000_0000_0000
         FPSCR_FPRF ← undefined
      End
   Done

QNaN Operand:
   FPSCR_{FR FI VXCVI} ← 0b001
   If FPSCR_VE = 0 then
      Do
         If tgt_precision = "32-bit integer" then FRT ← 0xuuuu_uuuu_8000_0000 /* u is undefined hex digit */
         If tgt_precision = "64-bit integer" then FRT ← 0x8000_0000_0000_0000
         FPSCR_FPRF ← undefined
      End
   Done

Large Operand:
   FPSCR_{FR FI VXCVI} ← 0b001
   If FPSCR_VE = 0 then Do
      If tgt_precision = "32-bit integer" then
         Do
            If sign = 0 then FRT ← 0xuuuu_uuuu_7FFF_FFFF /* u is undefined hex digit */
            If sign = 1 then FRT ← 0xuuuu_uuuu_8000_0000 /* u is undefined hex digit */
         End
      Else
         Do
            If sign = 0 then FRT ← 0x7FFF_FFFF_FFFF_FFFF
            If sign = 1 then FRT ← 0x8000_0000_0000_0000
         End
      FPSCR_FPRF ← undefined
   End
   Done

A.3 Floating-Point Convert from Integer Model

The following describes algorithmically the operation of the Floating Convert From Integer Doubleword instruction.
sign ← (FRB)_0
exp ← 63
frac_0:63 ← (FRB)
If frac_0:63 = 0 then go to Zero Operand
If sign = 1 then frac_0:63 ← ¬frac_0:63 + 1
Do while frac_0 = 0 /* do the loop 0 times if (FRB) = maximum negative integer */
   frac_0:63 ← frac_1:63 || 0b0
   exp ← exp - 1
End
Round Float(sign,exp,frac_0:63,FPSCR_RN)
If sign = 0 then FPSCR_FPRF ← "+normal number"
If sign = 1 then FPSCR_FPRF ← "-normal number"
FRT_0 ← sign
FRT_1:11 ← exp + 1023 /* exp + bias */
FRT_12:63 ← frac_1:52
Done

Zero Operand:
   FPSCR_{FR FI} ← 0b00
   FPSCR_FPRF ← "+ zero"
   FRT ← 0x0000_0000_0000_0000
   Done

Round Float(sign,exp,frac_0:63,round_mode):
   inc ← 0
   lsb ← frac_52
   gbit ← frac_53
   rbit ← frac_54
   xbit ← frac_55:63 > 0
   If round_mode = 0b00 then /* Round to Nearest */
      Do /* comparisons ignore u bits */
         If sign || lsb || gbit || rbit || xbit = 0bu11uu then inc ← 1
         If sign || lsb || gbit || rbit || xbit = 0bu011u then inc ← 1
         If sign || lsb || gbit || rbit || xbit = 0bu01u1 then inc ← 1
      End
   If round_mode = 0b10 then /* Round toward + Infinity */
      Do /* comparisons ignore u bits */
         If sign || lsb || gbit || rbit || xbit = 0b0u1uu then inc ← 1
         If sign || lsb || gbit || rbit || xbit = 0b0uu1u then inc ← 1
         If sign || lsb || gbit || rbit || xbit = 0b0uuu1 then inc ← 1
      End
   If round_mode = 0b11 then /* Round toward - Infinity */
      Do /* comparisons ignore u bits */
         If sign || lsb || gbit || rbit || xbit = 0b1u1uu then inc ← 1
         If sign || lsb || gbit || rbit || xbit = 0b1uu1u then inc ← 1
         If sign || lsb || gbit || rbit || xbit = 0b1uuu1 then inc ← 1
      End
   frac_0:52 ← frac_0:52 + inc
   If carry_out = 1 then exp ← exp + 1
   FPSCR_FR ← inc
   FPSCR_FI ← gbit | rbit | xbit
   FPSCR_XX ← FPSCR_XX | FPSCR_FI
   Return

A.4 Floating-Point Round to Integer Model

The following describes algorithmically the operation of the Floating Round To Integer instructions.
If (FRB)_1:11 = 2047 and (FRB)_12:63 = 0, then goto Infinity Operand
If (FRB)_1:11 = 2047 and (FRB)_12 = 0, then goto SNaN Operand
If (FRB)_1:11 = 2047 and (FRB)_12 = 1, then goto QNaN Operand
If (FRB)_1:63 = 0 then goto Zero Operand
If (FRB)_1:11 < 1023 then goto Small Operand /* exp < 0; |value| < 1 */
If (FRB)_1:11 > 1074 then goto Large Operand /* exp > 51; integral value */
sign ← (FRB)_0
exp ← (FRB)_1:11 - 1023 /* exp - bias */
frac_0:52 ← 0b1 || (FRB)_12:63
gbit || rbit || xbit ← 0b000
Do i = 1, 52 - exp
   frac_0:52 || gbit || rbit || xbit ← 0b0 || frac_0:52 || gbit || (rbit | xbit)
End
Round Integer(sign, frac_0:52, gbit, rbit, xbit)
Do i = 2, 52 - exp
   frac_0:52 ← frac_1:52 || 0b0
End
If frac_0 = 1, then exp ← exp + 1
Else frac_0:52 ← frac_1:52 || 0b0
FRT_0 ← sign
FRT_1:11 ← exp + 1023
FRT_12:63 ← frac_1:52
If (FRT)_0 = 0 then FPSCR_FPRF ← "+ normal number"
Else FPSCR_FPRF ← "- normal number"
FPSCR_{FR FI} ← 0b00
Done

Round Integer(sign, frac_0:52, gbit, rbit, xbit):
   inc ← 0
   If inst = Floating Round to Integer Nearest then /* ties away from zero */
      Do /* comparisons ignore u bits */
         If sign || frac_52 || gbit || rbit || xbit = 0buu1uu then inc ← 1
      End
   If inst = Floating Round to Integer Plus then
      Do /* comparisons ignore u bits */
         If sign || frac_52 || gbit || rbit || xbit = 0b0u1uu then inc ← 1
         If sign || frac_52 || gbit || rbit || xbit = 0b0uu1u then inc ← 1
         If sign || frac_52 || gbit || rbit || xbit = 0b0uuu1 then inc ← 1
      End
   If inst = Floating Round to Integer Minus then
      Do /* comparisons ignore u bits */
         If sign || frac_52 || gbit || rbit || xbit = 0b1u1uu then inc ← 1
         If sign || frac_52 || gbit || rbit || xbit = 0b1uu1u then inc ← 1
         If sign || frac_52 || gbit || rbit || xbit = 0b1uuu1 then inc ← 1
      End
   frac_0:52 ← frac_0:52 + inc
   Return

Infinity Operand:
   FRT ← (FRB)
   If (FRB)_0 = 0 then FPSCR_FPRF ← "+ infinity"
   If (FRB)_0 = 1 then FPSCR_FPRF ← "- infinity"
   FPSCR_{FR FI} ← 0b00
   Done

SNaN Operand:
   FPSCR_VXSNAN ← 1
   If FPSCR_VE = 0 then
      Do
         FRT ← (FRB)
         FRT_12 ← 1
         FPSCR_FPRF ← "QNaN"
      End
   FPSCR_{FR FI} ← 0b00
   Done

QNaN Operand:
   FRT ← (FRB)
   FPSCR_FPRF ← "QNaN"
   FPSCR_{FR FI} ← 0b00
   Done

Zero Operand:
   If (FRB)_0 = 0 then
      Do
         FRT ← 0x0000_0000_0000_0000
         FPSCR_FPRF ← "+ zero"
      End
   Else
      Do
         FRT ← 0x8000_0000_0000_0000
         FPSCR_FPRF ← "- zero"
      End
   FPSCR_{FR FI} ← 0b00
   Done

Small Operand:
   If inst = Floating Round to Integer Nearest and (FRB)_1:11 < 1022 then goto Zero Operand
   If inst = Floating Round to Integer Toward Zero then goto Zero Operand
   If inst = Floating Round to Integer Plus and (FRB)_0 = 1 then goto Zero Operand
   If inst = Floating Round to Integer Minus and (FRB)_0 = 0 then goto Zero Operand
   If (FRB)_0 = 0 then
      Do
         FRT ← 0x3FF0_0000_0000_0000 /* value = 1.0 */
         FPSCR_FPRF ← "+ normal number"
      End
   Else
      Do
         FRT ← 0xBFF0_0000_0000_0000 /* value = -1.0 */
         FPSCR_FPRF ← "- normal number"
      End
   FPSCR_{FR FI} ← 0b00
   Done

Large Operand:
   FRT ← (FRB)
   If FRT_0 = 0 then FPSCR_FPRF ← "+ normal number"
   Else FPSCR_FPRF ← "- normal number"
   FPSCR_{FR FI} ← 0b00
   Done

Appendix A. Densely Packed Decimal

The trailing significand field of the decimal floating-point data format is encoded using Densely Packed Decimal (DPD). DPD encoding is a compression technique which supports the representation of decimal integers of arbitrary length. Translation operates on three Binary Coded Decimal (BCD) digits at a time, compressing the 12 bits into 10 bits with an algorithm that can be applied or reversed using simple Boolean operations. In the following examples, a 3-digit BCD number is represented as (abcd)(efgh)(ijkm), a 10-bit DPD number is represented as (pqr)(stu)(v)(wxy), and the Boolean operations & (AND), | (OR), and ¬ (NOT) are used.

A.1 BCD-to-DPD Translation

The translation from a 3-digit BCD number to a 10-bit DPD can be performed through the following Boolean operations.

p = (f & a & i & ¬e) | (j & a & ¬i) | (b & ¬a)
q = (g & a & i & ¬e) | (k & a & ¬i) | (c & ¬a)
r = d
s = (j & ¬a & e & ¬i) | (f & ¬i & ¬e) | (f & ¬a & ¬e) | (e & i)
t = (k & ¬a & e & ¬i) | (g & ¬i & ¬e) | (g & ¬a & ¬e) | (a & i)
u = h
v = a | e | i
w = (¬e & j & ¬i) | (e & i) | a
x = (¬a & k & ¬i) | (a & i) | e
y = m

Alternatively, the following table can be used to perform the translation. The most significant bit of each of the three BCD digits (left column) is used to select a specific 10-bit encoding (right column) of the DPD.

aei    pqr stu v wxy
000    bcd fgh 0 jkm
001    bcd fgh 1 00m
010    bcd jkh 1 01m
011    bcd 10h 1 11m
100    jkd fgh 1 10m
101    fgd 01h 1 11m
110    jkd 00h 1 11m
111    00d 11h 1 11m

The full translation of a 3-digit BCD number (000 - 999) to a 10-bit DPD is shown in Table 11 on page 373, with the DPD entries shown in hexadecimal format. The BCD number is produced by replacing `_' in the leftmost column with the corresponding digit along the top row. The table is split into two halves, with the right half being a continuation of the left half.

A.2 DPD-to-BCD Translation

The translation from a 10-bit DPD to a 3-digit BCD number can be performed through the following Boolean operations.

a = (¬s & v & w) | (t & v & w & s) | (v & w & ¬x)
b = (p & s & x & ¬t) | (p & ¬w) | (p & ¬v)
c = (q & s & x & ¬t) | (q & ¬w) | (q & ¬v)
d = r
e = (v & ¬w & x) | (s & v & w & x) | (¬t & v & x & w)
f = (p & t & v & w & x & ¬s) | (s & ¬x & v) | (s & ¬v)
g = (q & t & w & v & x & ¬s) | (t & ¬x & v) | (t & ¬v)
h = u
i = (t & v & w & x) | (s & v & w & x) | (v & ¬w & ¬x)
j = (p & ¬s & ¬t & w & v) | (s & v & ¬w & x) | (p & w & ¬x & v) | (w & ¬v)
k = (q & ¬s & ¬t & v & w) | (t & v & ¬w & x) | (q & v & w & ¬x) | (x & ¬v)
m = y

Alternatively, the following table can be used to perform the translation. A combination of five bits in the DPD encoding (leftmost column) is used to specify a translation to the 3-digit BCD encoding. Dashes (-) in the table are don't cares, and can be either one or zero.

vwxst    abcd efgh ijkm
0----    0pqr 0stu 0wxy
100--    0pqr 0stu 100y
101--    0pqr 100u 0sty
110--    100r 0stu 0pqy
11100    100r 100u 0pqy
11101    100r 0pqu 100y
11110    0pqr 100u 100y
11111    100r 100u 100y

The full translation of the 10-bit DPD to a 3-digit BCD number is shown in Table 12 on page 374. The 10-bit DPD index is produced by concatenating the 6-bit value shown in the left column with the 4-bit index along the top row, both represented in hexadecimal. The values in parentheses are non-preferred translations and are explained further in the following section.

A.3 Preferred DPD encoding

Translating from a 3-digit BCD number (1000 numbers) to a 10-bit DPD encoding (1024 combinations) leaves 24 redundant translations. The 24 redundant combinations are evenly assigned to eight BCD numbers and are shown in the following table, with the non-preferred encoding in parentheses. The preferred encoding is produced by translating a 3-digit BCD number with the translation table or Boolean operations shown in Section A.1. The redundant DPD encodings are all valid and will be correctly translated to their respective BCD value through the mechanisms provided in Section A.2. For decimal floating-point operations all DPD encodings are recognized as source operands.

DPD Code   BCD Value      DPD Code   BCD Value
 0x06E                     0x0EE
(0x16E)       888         (0x1EE)       988
(0x26E)                   (0x2EE)
(0x36E)                   (0x3EE)

 0x06F                     0x0EF
(0x16F)       889         (0x1EF)       989
(0x26F)                   (0x2EF)
(0x36F)                   (0x3EF)

 0x07E                     0x0FE
(0x17E)       898         (0x1FE)       998
(0x27E)                   (0x2FE)
(0x37E)                   (0x3FE)

 0x07F                     0x0FF
(0x17F)       899         (0x1FF)       999
(0x27F)                   (0x2FF)
(0x37F)                   (0x3FF)
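The Boolean operations of Sections A.1 and A.2 can be exercised directly in software. The following is a minimal Python sketch, not part of the architecture: the function names bcd_to_dpd and dpd_to_bcd and the helpers _bits and _pack are hypothetical, chosen only for illustration, and the 3-digit BCD number is passed as an ordinary integer in the range 0 to 999.

```python
def _bits(value, width):
    """Return the bits of value as a list, most significant bit first."""
    return [(value >> (width - 1 - n)) & 1 for n in range(width)]


def _pack(bits):
    """Pack a sequence of bits (most significant first) into an integer."""
    out = 0
    for bit in bits:
        out = (out << 1) | bit
    return out


def bcd_to_dpd(number):
    """Encode a decimal value 0..999 as a preferred 10-bit DPD (Section A.1)."""
    if not 0 <= number <= 999:
        raise ValueError("expected a value in 0..999")
    a, b, c, d = _bits(number // 100, 4)        # first BCD digit  (abcd)
    e, f, g, h = _bits((number // 10) % 10, 4)  # second BCD digit (efgh)
    i, j, k, m = _bits(number % 10, 4)          # third BCD digit  (ijkm)
    na, ne, ni = a ^ 1, e ^ 1, i ^ 1            # ¬a, ¬e, ¬i
    p = (f & a & i & ne) | (j & a & ni) | (b & na)
    q = (g & a & i & ne) | (k & a & ni) | (c & na)
    r = d
    s = (j & na & e & ni) | (f & ni & ne) | (f & na & ne) | (e & i)
    t = (k & na & e & ni) | (g & ni & ne) | (g & na & ne) | (a & i)
    u = h
    v = a | e | i
    w = (ne & j & ni) | (e & i) | a
    x = (na & k & ni) | (a & i) | e
    y = m
    return _pack((p, q, r, s, t, u, v, w, x, y))


def dpd_to_bcd(declet):
    """Decode any 10-bit DPD value to a decimal value 0..999 (Section A.2)."""
    if not 0 <= declet <= 0x3FF:
        raise ValueError("expected a 10-bit value")
    p, q, r, s, t, u, v, w, x, y = _bits(declet, 10)
    ns, nt, nv, nw, nx = s ^ 1, t ^ 1, v ^ 1, w ^ 1, x ^ 1
    a = (ns & v & w) | (t & v & w & s) | (v & w & nx)
    b = (p & s & x & nt) | (p & nw) | (p & nv)
    c = (q & s & x & nt) | (q & nw) | (q & nv)
    d = r
    e = (v & nw & x) | (s & v & w & x) | (nt & v & x & w)
    f = (p & t & v & w & x & ns) | (s & nx & v) | (s & nv)
    g = (q & t & w & v & x & ns) | (t & nx & v) | (t & nv)
    h = u
    i = (t & v & w & x) | (s & v & w & x) | (v & nw & nx)
    j = (p & ns & nt & w & v) | (s & v & nw & x) | (p & w & nx & v) | (w & nv)
    k = (q & ns & nt & v & w) | (t & v & nw & x) | (q & v & w & nx) | (x & nv)
    m = y
    return 100 * _pack((a, b, c, d)) + 10 * _pack((e, f, g, h)) + _pack((i, j, k, m))
```

Under these assumptions, bcd_to_dpd(999) yields 0x0FF and bcd_to_dpd(888) yields 0x06E, the preferred encodings listed above, while dpd_to_bcd(0x16E) yields 888, showing that a non-preferred encoding still decodes to its BCD value.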
Table 11: BCD-to-DPD translation

      0   1   2   3   4   5   6   7   8   9          0   1   2   3   4   5   6   7   8   9
00_ 000 001 002 003 004 005 006 007 008 009    50_ 280 281 282 283 284 285 286 287 288 289
01_ 010 011 012 013 014 015 016 017 018 019    51_ 290 291 292 293 294 295 296 297 298 299
02_ 020 021 022 023 024 025 026 027 028 029    52_ 2A0 2A1 2A2 2A3 2A4 2A5 2A6 2A7 2A8 2A9
03_ 030 031 032 033 034 035 036 037 038 039    53_ 2B0 2B1 2B2 2B3 2B4 2B5 2B6 2B7 2B8 2B9
04_ 040 041 042 043 044 045 046 047 048 049    54_ 2C0 2C1 2C2 2C3 2C4 2C5 2C6 2C7 2C8 2C9
05_ 050 051 052 053 054 055 056 057 058 059    55_ 2D0 2D1 2D2 2D3 2D4 2D5 2D6 2D7 2D8 2D9
06_ 060 061 062 063 064 065 066 067 068 069    56_ 2E0 2E1 2E2 2E3 2E4 2E5 2E6 2E7 2E8 2E9
07_ 070 071 072 073 074 075 076 077 078 079    57_ 2F0 2F1 2F2 2F3 2F4 2F5 2F6 2F7 2F8 2F9
08_ 00A 00B 02A 02B 04A 04B 06A 06B 04E 04F    58_ 28A 28B 2AA 2AB 2CA 2CB 2EA 2EB 2CE 2CF
09_ 01A 01B 03A 03B 05A 05B 07A 07B 05E 05F    59_ 29A 29B 2BA 2BB 2DA 2DB 2FA 2FB 2DE 2DF
10_ 080 081 082 083 084 085 086 087 088 089    60_ 300 301 302 303 304 305 306 307 308 309
11_ 090 091 092 093 094 095 096 097 098 099    61_ 310 311 312 313 314 315 316 317 318 319
12_ 0A0 0A1 0A2 0A3 0A4 0A5 0A6 0A7 0A8 0A9    62_ 320 321 322 323 324 325 326 327 328 329
13_ 0B0 0B1 0B2 0B3 0B4 0B5 0B6 0B7 0B8 0B9    63_ 330 331 332 333 334 335 336 337 338 339
14_ 0C0 0C1 0C2 0C3 0C4 0C5 0C6 0C7 0C8 0C9    64_ 340 341 342 343 344 345 346 347 348 349
15_ 0D0 0D1 0D2 0D3 0D4 0D5 0D6 0D7 0D8 0D9    65_ 350 351 352 353 354 355 356 357 358 359
16_ 0E0 0E1 0E2 0E3 0E4 0E5 0E6 0E7 0E8 0E9    66_ 360 361 362 363 364 365 366 367 368 369
17_ 0F0 0F1 0F2 0F3 0F4 0F5 0F6 0F7 0F8 0F9    67_ 370 371 372 373 374 375 376 377 378 379
18_ 08A 08B 0AA 0AB 0CA 0CB 0EA 0EB 0CE 0CF    68_ 30A 30B 32A 32B 34A 34B 36A 36B 34E 34F
19_ 09A 09B 0BA 0BB 0DA 0DB 0FA 0FB 0DE 0DF    69_ 31A 31B 33A 33B 35A 35B 37A 37B 35E 35F
20_ 100 101 102 103 104 105 106 107 108 109    70_ 380 381 382 383 384 385 386 387 388 389
21_ 110 111 112 113 114 115 116 117 118 119    71_ 390 391 392 393 394 395 396 397 398 399
22_ 120 121 122 123 124 125 126 127 128 129    72_ 3A0 3A1 3A2 3A3 3A4 3A5 3A6 3A7 3A8 3A9
23_ 130 131 132 133 134 135 136 137 138 139    73_ 3B0 3B1 3B2 3B3 3B4 3B5 3B6 3B7 3B8 3B9
24_ 140 141 142 143 144 145 146 147 148 149    74_ 3C0 3C1 3C2 3C3 3C4 3C5 3C6 3C7 3C8 3C9
25_ 150 151 152 153 154 155 156 157 158 159    75_ 3D0 3D1 3D2 3D3 3D4 3D5 3D6 3D7 3D8 3D9
26_ 160 161 162 163 164 165 166 167 168 169    76_ 3E0 3E1 3E2 3E3 3E4 3E5 3E6 3E7 3E8 3E9
27_ 170 171 172 173 174 175 176 177 178 179    77_ 3F0 3F1 3F2 3F3 3F4 3F5 3F6 3F7 3F8 3F9
28_ 10A 10B 12A 12B 14A 14B 16A 16B 14E 14F    78_ 38A 38B 3AA 3AB 3CA 3CB 3EA 3EB 3CE 3CF
29_ 11A 11B 13A 13B 15A 15B 17A 17B 15E 15F    79_ 39A 39B 3BA 3BB 3DA 3DB 3FA 3FB 3DE 3DF
30_ 180 181 182 183 184 185 186 187 188 189    80_ 00C 00D 10C 10D 20C 20D 30C 30D 02E 02F
31_ 190 191 192 193 194 195 196 197 198 199    81_ 01C 01D 11C 11D 21C 21D 31C 31D 03E 03F
32_ 1A0 1A1 1A2 1A3 1A4 1A5 1A6 1A7 1A8 1A9    82_ 02C 02D 12C 12D 22C 22D 32C 32D 12E 12F
33_ 1B0 1B1 1B2 1B3 1B4 1B5 1B6 1B7 1B8 1B9    83_ 03C 03D 13C 13D 23C 23D 33C 33D 13E 13F
34_ 1C0 1C1 1C2 1C3 1C4 1C5 1C6 1C7 1C8 1C9    84_ 04C 04D 14C 14D 24C 24D 34C 34D 22E 22F
35_ 1D0 1D1 1D2 1D3 1D4 1D5 1D6 1D7 1D8 1D9    85_ 05C 05D 15C 15D 25C 25D 35C 35D 23E 23F
36_ 1E0 1E1 1E2 1E3 1E4 1E5 1E6 1E7 1E8 1E9    86_ 06C 06D 16C 16D 26C 26D 36C 36D 32E 32F
37_ 1F0 1F1 1F2 1F3 1F4 1F5 1F6 1F7 1F8 1F9    87_ 07C 07D 17C 17D 27C 27D 37C 37D 33E 33F
38_ 18A 18B 1AA 1AB 1CA 1CB 1EA 1EB 1CE 1CF    88_ 00E 00F 10E 10F 20E 20F 30E 30F 06E 06F
39_ 19A 19B 1BA 1BB 1DA 1DB 1FA 1FB 1DE 1DF    89_ 01E 01F 11E 11F 21E 21F 31E 31F 07E 07F
40_ 200 201 202 203 204 205 206 207 208 209    90_ 08C 08D 18C 18D 28C 28D 38C 38D 0AE 0AF
41_ 210 211 212 213 214 215 216 217 218 219    91_ 09C 09D 19C 19D 29C 29D 39C 39D 0BE 0BF
42_ 220 221 222 223 224 225 226 227 228 229    92_ 0AC 0AD 1AC 1AD 2AC 2AD 3AC 3AD 1AE 1AF
43_ 230 231 232 233 234 235 236 237 238 239    93_ 0BC 0BD 1BC 1BD 2BC 2BD 3BC 3BD 1BE 1BF
44_ 240 241 242 243 244 245 246 247 248 249    94_ 0CC 0CD 1CC 1CD 2CC 2CD 3CC 3CD 2AE 2AF
45_ 250 251 252 253 254 255 256 257 258 259    95_ 0DC 0DD 1DC 1DD 2DC 2DD 3DC 3DD 2BE 2BF
46_ 260 261 262 263 264 265 266 267 268 269    96_ 0EC 0ED 1EC 1ED 2EC 2ED 3EC 3ED 3AE 3AF
47_ 270 271 272 273 274 275 276 277 278 279    97_ 0FC 0FD 1FC 1FD 2FC 2FD 3FC 3FD 3BE 3BF
48_ 20A 20B 22A 22B 24A 24B 26A 26B 24E 24F    98_ 08E 08F 18E 18F 28E 28F 38E 38F 0EE 0EF
49_ 21A 21B 23A 23B 25A 25B 27A 27B 25E 25F    99_ 09E 09F 19E 19F 29E 29F 39E 39F 0FE 0FF

Table 12: DPD-to-BCD translation

      0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
00_ 000 001 002 003 004 005 006 007 008 009 080 081 800 801 880 881
01_ 010 011 012 013 014 015 016 017 018 019 090 091 810 811 890 891
02_ 020 021 022 023 024 025 026 027 028 029 082 083 820 821 808 809
03_ 030 031 032 033 034 035 036 037 038 039 092 093 830 831 818 819
04_ 040 041 042 043 044 045 046 047 048 049 084 085 840 841 088 089
05_ 050 051 052 053 054 055 056 057 058 059 094 095 850 851 098 099
06_ 060 061 062 063 064 065 066 067 068 069 086 087 860 861 888 889
07_ 070 071 072 073 074 075 076 077 078 079 096 097 870 871 898 899
08_ 100 101 102 103 104 105 106 107 108 109 180 181 900 901 980 981
09_ 110 111 112 113 114 115 116 117 118 119 190 191 910 911 990 991
0A_ 120 121 122 123 124 125 126 127 128 129 182 183 920 921 908 909
0B_ 130 131 132 133 134 135 136 137 138 139 192 193 930 931 918 919
0C_ 140 141 142 143 144 145 146 147 148 149 184 185 940 941 188 189
0D_ 150 151 152 153 154 155 156 157 158 159 194 195 950 951 198 199
0E_ 160 161 162 163 164 165 166 167 168 169 186 187 960 961 988 989
0F_ 170 171 172 173 174 175 176 177 178 179 196 197 970 971 998 999
10_ 200 201 202 203 204 205 206 207 208 209 280 281 802 803 882 883
11_ 210 211 212 213 214 215 216 217 218 219 290 291 812 813 892 893
12_ 220 221 222 223 224 225 226 227 228 229 282 283 822 823 828 829
13_ 230 231 232 233 234 235 236 237 238 239 292 293 832 833 838 839
14_ 240 241 242 243 244 245 246 247 248 249 284 285 842 843 288 289
15_ 250 251 252 253 254 255 256 257 258 259 294 295 852 853 298 299
16_ 260 261 262 263 264 265 266 267 268 269 286 287 862 863 (888) (889)
17_ 270 271 272 273 274 275 276 277 278 279 296 297 872 873 (898) (899)
18_ 300 301 302 303 304 305 306 307 308 309 380 381 902 903 982 983
19_ 310 311 312 313 314 315 316 317 318 319 390 391 912 913 992 993
1A_ 320 321 322 323 324 325 326 327 328 329 382 383 922 923 928 929
1B_ 330 331 332 333 334 335 336 337 338 339 392 393 932 933 938 939
1C_ 340 341 342 343 344 345 346 347 348 349 384 385 942 943 388 389
1D_ 350 351 352 353 354 355 356 357 358 359 394 395 952 953 398 399
1E_ 360 361 362 363 364 365 366 367 368 369 386 387 962 963 (988) (989)
1F_ 370 371 372 373 374 375 376 377 378 379 396 397 972 973 (998) (999)
20_ 400 401 402 403 404 405 406 407 408 409 480 481 804 805 884 885
21_ 410 411 412 413 414 415 416 417 418 419 490 491 814 815 894 895
22_ 420 421 422 423 424 425 426 427 428 429 482 483 824 825 848 849
23_ 430 431 432 433 434 435 436 437 438 439 492 493 834 835 858 859
24_ 440 441 442 443 444 445 446 447 448 449 484 485 844 845 488 489
25_ 450 451 452 453 454 455 456 457 458 459 494 495 854 855 498 499
26_ 460 461 462 463 464 465 466 467 468 469 486 487 864 865 (888) (889)
27_ 470 471 472 473 474 475 476 477 478 479 496 497 874 875 (898) (899)
28_ 500 501 502 503 504 505 506 507 508 509 580 581 904 905 984 985
29_ 510 511 512 513 514 515 516 517 518 519 590 591 914 915 994 995
2A_ 520 521 522 523 524 525 526 527 528 529 582 583 924 925 948 949
2B_ 530 531 532 533 534 535 536 537 538 539 592 593 934 935 958 959
2C_ 540 541 542 543 544 545 546 547 548 549 584 585 944 945 588 589
2D_ 550 551 552 553 554 555 556 557 558 559 594 595 954 955 598 599
2E_ 560 561 562 563 564 565 566 567 568 569 586 587 964 965 (988) (989)
2F_ 570 571 572 573 574 575 576 577 578 579 596 597 974 975 (998) (999)
30_ 600 601 602 603 604 605 606 607 608 609 680 681 806 807 886 887
31_ 610 611 612 613 614 615 616 617 618 619 690 691 816 817 896 897
32_ 620 621 622 623 624 625 626 627 628 629 682 683 826 827 868 869
33_ 630 631 632 633 634 635 636 637 638 639 692 693 836 837 878 879
34_ 640 641 642 643 644 645 646 647 648 649 684 685 846 847 688 689
35_ 650 651 652 653 654 655 656 657 658 659 694 695 856 857 698 699
36_ 660 661 662 663 664 665 666 667 668 669 686 687 866 867 (888) (889)
37_ 670 671 672 673 674 675 676 677 678 679 696 697 876 877 (898) (899)
38_ 700 701 702 703 704 705 706 707 708 709 780 781 906 907 986 987
39_ 710 711 712 713 714 715 716 717 718 719 790 791 916 917 996 997
3A_ 720 721 722 723 724 725 726 727 728 729 782 783 926 927 968 969
3B_ 730 731 732 733 734 735 736 737 738 739 792 793 936 937 978 979
3C_ 740 741 742 743 744 745 746 747 748 749 784 785 946 947 788 789
3D_ 750 751 752 753 754 755 756 757 758 759 794 795 956 957 798 799
3E_ 760 761 762 763 764 765 766 767 768 769 786 787 966 967 (988) (989)
3F_ 770 771 772 773 774 775 776 777 778 779 796 797 976 977 (998) (999)

Appendix B. Vector RTL Functions [Category: Vector]

ConvertSPtoSXWsaturate( X, Y )
   sign = X_0
   exp_0:7 = X_1:8
   frac_0:30 = X_9:31 || 0b0000_0000
   if((exp==255)&(frac!=0)) then return(0x0000_0000) // NaN operand
   if((exp==255)&(frac==0)) then do // infinity operand
      VSCR_SAT = 1
      return( (sign==1) ? 0x8000_0000 : 0x7FFF_FFFF )
   if((exp+Y-127)>30) then do // large operand
      VSCR_SAT = 1
      return( (sign==1) ? 0x8000_0000 : 0x7FFF_FFFF )
   if((exp+Y-127)<0) then return(0x0000_0000) // -1.0 < value < 1.0 (value rounds to 0)
   significand_0:31 = 0b1 || frac
   do i=1 to 31-(exp+Y-127)
      significand = significand >>_ui 1
   return( (sign==0) ? significand : (¬significand + 1) )

ConvertSPtoUXWsaturate( X, Y )
   sign = X_0
   exp_0:7 = X_1:8
   frac_0:30 = X_9:31 || 0b0000_0000
   if((exp==255)&&(frac!=0)) then return(0x0000_0000) // NaN operand
   if((exp==255)&&(frac==0)) then do // infinity operand
      VSCR_SAT = 1
      return( (sign==1) ? 0x0000_0000 : 0xFFFF_FFFF )
   if((exp+Y-127)>31) then do // large operand
      VSCR_SAT = 1
      return( (sign==1) ? 0x0000_0000 : 0xFFFF_FFFF )
   if((exp+Y-127)<0) then return(0x0000_0000) // -1.0 < value < 1.0
                                               // value rounds to 0
   if( sign==1 ) then do // negative operand
      VSCR_SAT = 1
      return(0x0000_0000)
   significand_0:31 = 0b1 || frac
   do i=1 to 31-(exp+Y-127)
      significand = significand >>_ui 1
   return( significand )

ConvertSXWtoSP( X )
   sign = X_0
   exp_0:7 = 32 + 127
   frac_0:32 = X_0 || X_0:31
   if( frac==0 ) return( 0x0000_0000 ) // Zero operand
   if( sign==1 ) then frac = ¬frac + 1
   do while( frac_0==0 )
      frac = frac << 1
      exp = exp - 1
   lsb = frac_23
   gbit = frac_24
   xbit = frac_25:32!=0
   inc = ( lsb && gbit ) | ( gbit && xbit )
   frac_0:23 = frac_0:23 + inc
   if( carry_out==1 ) exp = exp + 1
   return( sign || exp || frac_1:23 )

ConvertUXWtoSP( X )
   exp_0:7 = 31 + 127
   frac_0:31 = X_0:31
   if( frac==0 ) return( 0x0000_0000 ) // Zero Operand
   do while( frac_0==0 )
      frac = frac << 1
      exp = exp - 1
   lsb = frac_23
   gbit = frac_24
   xbit = frac_25:31!=0
   inc = ( lsb && gbit ) | ( gbit && xbit )
   frac_0:23 = frac_0:23 + inc
   if( carry_out==1 ) exp = exp + 1
   return( 0b0 || exp || frac_1:23 )

Appendix C.
Embedded Floating-Point RTL Functions [Category: SPE.Embedded Float Scalar Double] [Category: SPE.Embedded Float Scalar Single] [Category: SPE.Embedded Float Vector] C.1 Common Functions // Round a 32-bit fp result Round32(fp, guard, sticky) // Check if 32-bit fp value is a NaN or Infinity FP32format fp; Isa32NaNorInfinity(fp) if (SPEFSCRFINXE = 0) then return (fpexp = 255) if (SPEFSCRFRMC = 0b00) then // nearest if (guard) then Isa32NaN(fp) if (sticky | fpfrac[22]) then return ((fpexp = 255) & (fpfrac 0)) v0:23 1 fpfrac + 1 if v0 then // Check if 32-bit fp value is denormalized if (fpexp >= 254) then Isa32Denorm(fp) // overflow return ((fpexp = 0) & (fpfrac 0)) fp 1 fpsign || 0b11111110 || 231 else // Check if 64-bit fp value is a NaN or Infinity fpexp 1 fpexp + 1 Isa64NaNorInfinity(fp) fpfrac 1 v1:23 return (fpexp = 2047) else fpfrac 1 v1:23 Isa64NaN(fp) else if ((SPEFSCRFRMC & 0b10) = 0b10) then return ((fpexp = 2047) & (fpfrac 0)) // infinity modes // implementation dependent // Check if 32-bit fp value is denormalized return fp Isa64Denorm(fp) return ((fpexp = 0) & (fpfrac 0)) // Round a 64-bit fp result Round64(fp, guard, sticky) // Signal an error in the SPEFSCR SignalFPError(upper_lower, bits) FP32format fp; if (upper_lower = HI) then if (SPEFSCRFINXE = 0) then bits 1 bits << 15 if (SPEFSCRFRMC = 0b00) then // nearest SPEFSCR 1 SPEFSCR | bits if (guard) then bits 1 (FG | FX) if (sticky | fpfrac[51]) then if (upper_lower = HI) then v0:52 1 fpfrac + 1 bits 1 bits << 15 if v0 then SPEFSCR 1 SPEFSCR & ¬bits if (fpexp >= 2046) then // overflow fp 1 fpsign || 0b11111111110 || 521 else fpexp 1 fpexp + 1 fpfrac 1 v1:52 else fpfrac 1 v1:52 else if ((SPEFSCRFRMC & 0b10) = 0b10) then // infinity modes // implementation dependent return fp Appendix C. 
Embedded Floating-Point RTL Functions [Category: SPE.Em- 377 Version 2.05 guard 1 result & 0x00000001 C.2 Convert from Single-Preci- result 1 result > 1 sion Embedded Floating-Point // Report sticky and guard bits if (upper_lower = HI) then to Integer Word with Saturation SPEFSCRFGH 1 guard SPEFSCRFXH 1 sticky // Convert 32-bit Floating-Point to 32-bit integer else // or fractional SPEFSCRFG 1 guard // signed = S (signed) or U (unsigned) SPEFSCRFX 1 sticky // upper_lower = HI (high word) or LO (low word) // round = RND (round) or ZER (truncate) if (guard | sticky) then // fractional = F (fractional) or I (integer) SPEFSCRFINXS 1 1 // Round the integer result CnvtFP32ToI32Sat(fp, signed, if ((round = RND) & (SPEFSCRFINXE = 0)) then upper_lower, round, fractional) if (SPEFSCRFRMC = 0b00) then // nearest if (guard) then FP32format fp; if (sticky | (result & 0x00000001)) then if (Isa32NaNorInfinity(fp)) then result 1 result + 1 SignalFPError(upper_lower, FINV) else if ((SPEFSCRFRMC & 0b10) = 0b10) then if (Isa32NaN(fp)) then // infinity modes return 0x00000000 // all NaNs // implementation dependent if (signed = S) then if (signed = S) then if (fpsign = 1) then if (fpsign = 1) then return 0x80000000 result 1 ¬result + 1 else return result return 0x7fffffff else if (fpsign = 1) then return 0x00000000 else return 0xffffffff if (Isa32Denorm(fp)) then SignalFPError(upper_lower, FINV) return 0x00000000 // regardless of sign if ((signed = U) & (fpsign = 1)) then SignalFPError(upper_lower, FOVF) // overflow return 0x00000000 if ((fpexp = 0) & (fpfrac = 0)) then return 0x00000000 // all zero values if (fractional = I) then // convert to integer max_exp 1 158 shift 1 158 - fpexp if (signed = S) then if ((fpexp158)|(fpfrac0)|(fpsign1)) then max_exp 1 max_exp - 1 else // fractional conversion max_exp 1 126 shift 1 126 - fpexp if (signed = S) then shift 1 shift + 1 if (fpexp > max_exp) then SignalFPError(upper_lower, FOVF) // overflow if (signed = S) then if (fpsign = 1) then 
return 0x80000000 else return 0x7fffffff else return 0xffffffff result 1 0b1 || fpfrac || 0b00000000 // add U bit guard 1 0 sticky 1 0 for (n 1 0; n < shift; n 1 n + 1) do sticky 1 sticky | guard 378 Power ISATM I Version 2.05 guard 1 result & 0x00000001 C.3 Convert from Double-Preci- result 1 result > 1 sion Embedded Floating-Point // Report sticky and guard bits to Integer Word with Saturation SPEFSCRFG 1 guard SPEFSCRFX 1 sticky // Convert 64-bit Floating-Point to 32-bit integer // or fractional if (guard | sticky) then // signed = S (signed) or U (unsigned) SPEFSCRFINXS 1 1 // round = RND (round) or ZER (truncate) // Round the result // fractional = F (fractional) or I (integer) if ((round = RND) & (SPEFSCRFINXE = 0)) then if (SPEFSCRFRMC = 0b00) then // nearest CnvtFP64ToI32Sat(fp, signed, round, if (guard) then fractional) if (sticky | (result & 0x00000001)) then FP64format fp; result 1 result + 1 else if ((SPEFSCRFRMC & 0b10) = 0b10) then if (Isa64NaNorInfinity(fp)) then // infinity modes SignalFPError(LO, FINV) // implementation dependent if (Isa64NaN(fp)) then if (signed = S) then return 0x00000000 // all NaNs if (fpsign = 1) then if (signed = S) then result 1 ¬result + 1 if (fpsign = 1) then return result return 0x80000000 else return 0x7fffffff else if (fpsign = 1) then return 0x00000000 else return 0xffffffff if (Isa64Denorm(fp)) then SignalFPError(LO, FINV) return 0x00000000 // regardless of sign if ((signed = U) & (fpsign = 1)) then SignalFPError(LO, FOVF) // overflow return 0x00000000 if ((fpexp = 0) & (fpfrac = 0)) then return 0x00000000 // all zero values if (fractional = I) then // convert to integer max_exp 1 1054 shift 1 1054 - fpexp if (signed 1 S) then if ((fpexp1054)|(fpfrac0)|(fpsign1)) then max_exp 1 max_exp - 1 else // fractional conversion max_exp 1 1022 shift 1 1022 - fpexp if (signed = S) then shift 1 shift + 1 if (fpexp > max_exp) then SignalFPError(LO, FOVF) // overflow if (signed = S) then if (fpsign = 1) then return 0x80000000 else 
            return 0x7fffffff
    else
        return 0xffffffff
result ← 0b1 || fp_frac[0:30] // add U to frac
guard ← fp_frac[31]
sticky ← (fp_frac[32:63] ≠ 0)
for (n ← 0; n < shift; n ← n + 1) do
    sticky ← sticky | guard

C.4 Convert from Double-Precision Embedded Floating-Point to Integer Doubleword with Saturation

// Convert 64-bit Floating-Point to 64-bit integer
// signed = S (signed) or U (unsigned)
// round = RND (round) or ZER (truncate)

CnvtFP64ToI64Sat(fp, signed, round)
FP64format fp;
if (Isa64NaNorInfinity(fp)) then
    SignalFPError(LO, FINV)
    if (Isa64NaN(fp)) then
        return 0x00000000_00000000 // all NaNs
    if (signed = S) then
        if (fp_sign = 1) then
            return 0x80000000_00000000
        else
            return 0x7fffffff_ffffffff
    else
        if (fp_sign = 1) then
            return 0x00000000_00000000
        else
            return 0xffffffff_ffffffff
if (Isa64Denorm(fp)) then
    SignalFPError(LO, FINV)
    return 0x00000000_00000000
if ((signed = U) & (fp_sign = 1)) then
    SignalFPError(LO, FOVF) // overflow
    return 0x00000000_00000000
if ((fp_exp = 0) & (fp_frac = 0)) then
    return 0x00000000_00000000 // all zero values
max_exp ← 1086
shift ← 1086 - fp_exp
if (signed = S) then
    if ((fp_exp ≠ 1086) | (fp_frac ≠ 0) | (fp_sign ≠ 1)) then
        max_exp ← max_exp - 1
if (fp_exp > max_exp) then
    SignalFPError(LO, FOVF) // overflow
    if (signed = S) then
        if (fp_sign = 1) then
            return 0x80000000_00000000
        else
            return 0x7fffffff_ffffffff
    else
        return 0xffffffff_ffffffff
result ← 0b1 || fp_frac || 0b00000000000 // add U bit
guard ← 0
sticky ← 0
for (n ← 0; n < shift; n ← n + 1) do
    sticky ← sticky | guard
    guard ← result & 0x00000000_00000001
    result ← result >> 1
// Report sticky and guard bits
SPEFSCR_FG ← guard
SPEFSCR_FX ← sticky
if (guard | sticky) then
    SPEFSCR_FINXS ← 1
// Round the result
if ((round = RND) & (SPEFSCR_FINXE = 0)) then
    if (SPEFSCR_FRMC = 0b00) then // nearest
        if (guard) then
            if (sticky | (result & 0x00000000_00000001)) then
                result ← result + 1
    else if ((SPEFSCR_FRMC & 0b10) = 0b10) then
        // infinity modes
        // implementation dependent
if (signed = S) then
    if (fp_sign = 1) then
        result ← ¬result + 1
return result

C.5 Convert to Single-Precision Embedded Floating-Point from Integer Word

// Convert from 32-bit integer or fractional to
// 32-bit Floating-Point
// signed = S (signed) or U (unsigned)
// round = RND (round) or ZER (truncate)
// fractional = F (fractional) or I (integer)

CnvtI32ToFP32(v, signed, upper_lower, fractional)
FP32format result;
result_sign ← 0
if (v = 0) then
    result ← 0
    if (upper_lower = HI) then
        SPEFSCR_FGH ← 0
        SPEFSCR_FXH ← 0
    else
        SPEFSCR_FG ← 0
        SPEFSCR_FX ← 0
else
    if (signed = S) then
        if (v[0] = 1) then
            v ← ¬v + 1
            result_sign ← 1
    if (fractional = F) then // frac bit align
        maxexp ← 127
        if (signed = U) then
            maxexp ← maxexp - 1
    else
        maxexp ← 158 // integer bit alignment
    sc ← 0
    while (v[0] = 0)
        v ← v << 1
        sc ← sc + 1
    v[0] ← 0 // clear U bit
    result_exp ← maxexp - sc
    guard ← v[24]
    sticky ← (v[25:31] ≠ 0)
    // Report sticky and guard bits
    if (upper_lower = HI) then
        SPEFSCR_FGH ← guard
        SPEFSCR_FXH ← sticky
    else
        SPEFSCR_FG ← guard
        SPEFSCR_FX ← sticky
    if (guard | sticky) then
        SPEFSCR_FINXS ← 1
    // Round the result
    result_frac ← v[1:23]
    result ← Round32(result, guard, sticky)
return result

C.6 Convert to Double-Precision Embedded Floating-Point from Integer Word

// Convert from integer or fractional to 64 bit
// Floating-Point
// signed = S (signed) or U (unsigned)
// fractional = F (fractional) or I (integer)

CnvtI32ToFP64(v, signed, fractional)
FP64format result;
result_sign ← 0
if (v = 0) then
    result ← 0
    SPEFSCR_FG ← 0
    SPEFSCR_FX ← 0
else
    if (signed = S) then
        if (v[0] = 1) then
            v ← ¬v + 1
            result_sign ← 1
    if (fractional = F) then // frac bit align
        maxexp ← 1023
        if (signed = U) then
            maxexp ← maxexp - 1
    else
        maxexp ← 1054 // integer bit align
    sc ← 0
    while (v[0] = 0)
        v ← v << 1
        sc ← sc + 1
    v[0] ← 0 // clear U bit
    result_exp ← maxexp - sc
    // Report sticky and guard bits
    SPEFSCR_FG ← 0
    SPEFSCR_FX ← 0
    result_frac ← v[1:31] || ²¹0 // 21 zero bits
return result
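The integer path of CnvtI32ToFP64 can be cross-checked with a short Python model. This is only a sketch under stated assumptions: the helper names `cnvt_i32_to_fp64_bits` and `host_double_bits` and the boolean `signed` parameter are illustrative and not part of the architecture, only the fractional = I case is modelled, and the SPEFSCR status bits are not tracked.

```python
import struct


def cnvt_i32_to_fp64_bits(v, signed):
    """Return the 64-bit double-precision bit pattern for a 32-bit integer word.

    Hypothetical helper: models only the integer (fractional = I) path of
    the CnvtI32ToFP64 RTL; SPEFSCR updates are omitted.
    """
    v &= 0xFFFFFFFF                      # treat the input as a 32-bit word
    sign = 0
    if v == 0:
        return 0                         # result <- 0
    if signed and (v & 0x80000000):      # v[0] = 1: negative in signed mode
        v = (-v) & 0xFFFFFFFF            # v <- NOT v + 1
        sign = 1
    maxexp = 1054                        # integer bit alignment (1023 + 31)
    sc = 0
    while not (v & 0x80000000):          # normalize until v[0] = 1
        v = (v << 1) & 0xFFFFFFFF
        sc += 1
    v &= 0x7FFFFFFF                      # clear U bit
    exp = maxexp - sc
    frac = v << 21                       # v[1:31] followed by 21 zero bits
    return (sign << 63) | (exp << 52) | frac


def host_double_bits(x):
    """Bit pattern of float(x) as a big-endian IEEE double, for comparison only."""
    return struct.unpack(">Q", struct.pack(">d", float(x)))[0]
```

Because every 32-bit integer fits in the 52-bit fraction, the conversion is exact, which is why the RTL always reports guard and sticky as 0 on this path.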
C.7 Convert to Double-Precision Embedded Floating-Point from Integer Doubleword

// Convert from 64-bit integer to 64-bit
// floating-point
// signed = S (signed) or U (unsigned)

CnvtI64ToFP64(v, signed)
FP64format result;
result_sign ← 0
if (v = 0) then
    result ← 0
    SPEFSCR_FG ← 0
    SPEFSCR_FX ← 0
else
    if (signed = S) then
        if (v[0] = 1) then
            v ← ¬v + 1
            result_sign ← 1
    maxexp ← 1054
    sc ← 0
    while (v[0] = 0)
        v ← v << 1
        sc ← sc + 1
    v[0] ← 0 // clear U bit
    result_exp ← maxexp - sc
    guard ← v[53]
    sticky ← (v[54:63] ≠ 0)
    // Report sticky and guard bits
    SPEFSCR_FG ← guard
    SPEFSCR_FX ← sticky
    if (guard | sticky) then
        SPEFSCR_FINXS ← 1
    // Round the result
    result_frac ← v[1:52]
    result ← Round64(result, guard, sticky)
return result

Appendix D. Assembler Extended Mnemonics

In order to make assembler language programs simpler to write and easier to understand, a set of extended mnemonics and symbols is provided that defines simple shorthand for the most frequently used forms of Branch Conditional, Compare, Trap, Rotate and Shift, and certain other instructions.

Assemblers should provide the extended mnemonics and symbols listed here, and may provide others.

D.1 Symbols

The following symbols are defined for use in instructions (basic or extended mnemonics) that specify a Condition Register field or a Condition Register bit. The first five (lt, ..., un) identify a bit number within a CR field. The remainder (cr0, ..., cr7) identify a CR field. An expression in which a CR field symbol is multiplied by 4 and then added to a bit-number-within-CR-field symbol and 32 can be used to identify a CR bit.
Symbol  Value  Meaning
lt      0      Less than
gt      1      Greater than
eq      2      Equal
so      3      Summary overflow
un      3      Unordered (after floating-point comparison)
cr0     0      CR Field 0
cr1     1      CR Field 1
cr2     2      CR Field 2
cr3     3      CR Field 3
cr4     4      CR Field 4
cr5     5      CR Field 5
cr6     6      CR Field 6
cr7     7      CR Field 7

The extended mnemonics in Sections D.2.2 and D.3 require identification of a CR bit: if one of the CR field symbols is used, it must be multiplied by 4 and added to a bit-number-within-CR-field (value in the range 0-3, explicit or symbolic) and 32. The extended mnemonics in Sections D.2.3 and D.5 require identification of a CR field: if one of the CR field symbols is used, it must not be multiplied by 4 or added to 32. (For the extended mnemonics in Section D.2.3, the bit number within the CR field is part of the extended mnemonic. The programmer identifies the CR field, and the Assembler does the multiplication and addition required to produce a CR bit number for the BI field of the underlying basic mnemonic.)

D.2 Branch Mnemonics

The mnemonics discussed in this section are variations of the Branch Conditional instructions.

Note: bclr, bclrl, bcctr, and bcctrl each serve as both a basic and an extended mnemonic. The Assembler will recognize a bclr, bclrl, bcctr, or bcctrl mnemonic with three operands as the basic form, and a bclr, bclrl, bcctr, or bcctrl mnemonic with two operands as the extended form. In the extended form the BH operand is omitted and assumed to be 0b00. Similarly, for all the extended mnemonics described in Sections D.2.2 - D.2.4 that devolve to any of these four basic mnemonics the BH operand can either be coded or omitted. If it is omitted it is assumed to be 0b00.

D.2.1 BO and BI Fields

The 5-bit BO and BI fields control whether the branch is taken. Providing an extended mnemonic for every possible combination of these fields would be neither useful nor practical.
The mnemonics described in Sections D.2.2 - D.2.4 include the most useful cases. Other cases can be coded using a basic Branch Conditional mnemonic (bc[l][a], bclr[l], bcctr[l]) with the appropriate operands.

D.2.2 Simple Branch Mnemonics

Instructions using one of the mnemonics in Table 13 that tests a Condition Register bit specify the corresponding bit as the first operand. The symbols defined in Section D.1 can be used in this operand.

Notice that there are no extended mnemonics for relative and absolute unconditional branches. For these the basic mnemonics b, ba, bl, and bla should be used.

Table 13: Simple branch mnemonics

                                                  LR not Set                              LR Set
Branch Semantics                                  bc        bca       bclr      bcctr     bcl       bcla      bclrl     bcctrl
                                                  Relative  Absolute  To LR     To CTR    Relative  Absolute  To LR     To CTR
Branch unconditionally                            -         -         blr       bctr      -         -         blrl      bctrl
Branch if CR_BI=1                                 bt        bta       btlr      btctr     btl       btla      btlrl     btctrl
Branch if CR_BI=0                                 bf        bfa       bflr      bfctr     bfl       bfla      bflrl     bfctrl
Decrement CTR, branch if CTR nonzero              bdnz      bdnza     bdnzlr    -         bdnzl     bdnzla    bdnzlrl   -
Decrement CTR, branch if CTR nonzero and CR_BI=1  bdnzt     bdnzta    bdnztlr   -         bdnztl    bdnztla   bdnztlrl  -
Decrement CTR, branch if CTR nonzero and CR_BI=0  bdnzf     bdnzfa    bdnzflr   -         bdnzfl    bdnzfla   bdnzflrl  -
Decrement CTR, branch if CTR zero                 bdz       bdza      bdzlr     -         bdzl      bdzla     bdzlrl    -
Decrement CTR, branch if CTR zero and CR_BI=1     bdzt      bdzta     bdztlr    -         bdztl     bdztla    bdztlrl   -
Decrement CTR, branch if CTR zero and CR_BI=0     bdzf      bdzfa     bdzflr    -         bdzfl     bdzfla    bdzflrl   -

Examples

1. Decrement CTR and branch if it is still nonzero (closure of a loop controlled by a count loaded into CTR).
   bdnz target                (equivalent to: bc 16,0,target)
2. Same as (1) but branch only if CTR is nonzero and condition in CR0 is "equal".
   bdnzt eq,target            (equivalent to: bc 8,2,target)
3. Same as (2), but "equal" condition is in CR5.
   bdnzt 4×cr5+eq,target      (equivalent to: bc 8,22,target)
4. Branch if bit 59 of CR is 0.
   bf 27,target               (equivalent to: bc 4,27,target)
5.
Same as (4), but set the Link Register. This is a form of conditional "call".
   bfl 27,target              (equivalent to: bcl 4,27,target)

D.2.3 Branch Mnemonics Incorporating Conditions

In the mnemonics defined in Table 14, the test of a bit in a Condition Register field is encoded in the mnemonic.

Instructions using the mnemonics in Table 14 specify the CR field as an optional first operand. One of the CR field symbols defined in Section D.1 can be used for this operand. If the CR field being tested is CR Field 0, this operand need not be specified unless the resulting basic mnemonic is bclr[l] or bcctr[l] and the BH operand is specified.

A standard set of codes has been adopted for the most common combinations of branch conditions.

Code  Meaning
lt    Less than
le    Less than or equal
eq    Equal
ge    Greater than or equal
gt    Greater than
nl    Not less than
ne    Not equal
ng    Not greater than
so    Summary overflow
ns    Not summary overflow
un    Unordered (after floating-point comparison)
nu    Not unordered (after floating-point comparison)

These codes are reflected in the mnemonics shown in Table 14.
Table 14: Branch mnemonics incorporating conditions

                                  LR not Set                              LR Set
Branch Semantics                  bc        bca       bclr      bcctr     bcl       bcla      bclrl     bcctrl
                                  Relative  Absolute  To LR     To CTR    Relative  Absolute  To LR     To CTR
Branch if less than               blt       blta      bltlr     bltctr    bltl      bltla     bltlrl    bltctrl
Branch if less than or equal      ble       blea      blelr     blectr    blel      blela     blelrl    blectrl
Branch if equal                   beq       beqa      beqlr     beqctr    beql      beqla     beqlrl    beqctrl
Branch if greater than or equal   bge       bgea      bgelr     bgectr    bgel      bgela     bgelrl    bgectrl
Branch if greater than            bgt       bgta      bgtlr     bgtctr    bgtl      bgtla     bgtlrl    bgtctrl
Branch if not less than           bnl       bnla      bnllr     bnlctr    bnll      bnlla     bnllrl    bnlctrl
Branch if not equal               bne       bnea      bnelr     bnectr    bnel      bnela     bnelrl    bnectrl
Branch if not greater than        bng       bnga      bnglr     bngctr    bngl      bngla     bnglrl    bngctrl
Branch if summary overflow        bso       bsoa      bsolr     bsoctr    bsol      bsola     bsolrl    bsoctrl
Branch if not summary overflow    bns       bnsa      bnslr     bnsctr    bnsl      bnsla     bnslrl    bnsctrl
Branch if unordered               bun       buna      bunlr     bunctr    bunl      bunla     bunlrl    bunctrl
Branch if not unordered           bnu       bnua      bnulr     bnuctr    bnul      bnula     bnulrl    bnuctrl

Examples

1. Branch if CR0 reflects condition "not equal".
   bne target                 (equivalent to: bc 4,2,target)
2. Same as (1), but condition is in CR3.
   bne cr3,target             (equivalent to: bc 4,14,target)
3. Branch to an absolute target if CR4 specifies "greater than", setting the Link Register. This is a form of conditional "call".
   bgtla cr4,target           (equivalent to: bcla 12,17,target)
4. Same as (3), but target address is in the Count Register.
   bgtctrl cr4                (equivalent to: bcctrl 12,17,0)

D.2.4 Branch Prediction

Software can use the "at" bits of Branch Conditional instructions to provide a hint to the processor about the behavior of the branch. If, for a given such instruction, the branch is almost always taken or almost always not taken, a suffix can be added to the mnemonic indicating the value to be used for the "at" bits.
+    Predict branch to be taken (at=0b11)
-    Predict branch not to be taken (at=0b10)

Such a suffix can be added to any Branch Conditional mnemonic, either basic or extended, that tests either the Count Register or a CR bit (but not both). Assemblers should use 0b00 as the default value for the "at" bits, indicating that software has offered no prediction.

Examples

1. Branch if CR0 reflects condition "less than", specifying that the branch should be predicted to be taken.
   blt+ target
2. Same as (1), but target address is in the Link Register and the branch should be predicted not to be taken.
   bltlr-

D.3 Condition Register Logical Mnemonics

The Condition Register Logical instructions can be used to set (to 1), clear (to 0), copy, or invert a given Condition Register bit. Extended mnemonics are provided that allow these operations to be coded easily.

Table 15: Condition Register logical mnemonics

Operation                 Extended Mnemonic  Equivalent to
Condition Register set    crset bx           creqv bx,bx,bx
Condition Register clear  crclr bx           crxor bx,bx,bx
Condition Register move   crmove bx,by       cror bx,by,by
Condition Register not    crnot bx,by        crnor bx,by,by

The symbols defined in Section D.1 can be used to identify the Condition Register bits.

Examples

1. Set CR bit 57.
   crset 25                   (equivalent to: creqv 25,25,25)
2. Clear the SO bit of CR0.
   crclr so                   (equivalent to: crxor 3,3,3)
3. Same as (2), but SO bit to be cleared is in CR3.
   crclr 4×cr3+so             (equivalent to: crxor 15,15,15)
4. Invert the EQ bit.
   crnot eq,eq                (equivalent to: crnor 2,2,2)
5. Same as (4), but EQ bit to be inverted is in CR4, and the result is to be placed into the EQ bit of CR5.
   crnot 4×cr5+eq,4×cr4+eq    (equivalent to: crnor 22,18,18)

D.4 Subtract Mnemonics

D.4.1 Subtract Immediate

Although there is no "Subtract Immediate" instruction, its effect can be achieved by using an Add Immediate instruction with the immediate operand negated.
Extended mnemonics are provided that include this negation, making the intent of the computation clearer.

   subi Rx,Ry,value           (equivalent to: addi Rx,Ry,-value)
   subis Rx,Ry,value          (equivalent to: addis Rx,Ry,-value)
   subic Rx,Ry,value          (equivalent to: addic Rx,Ry,-value)
   subic. Rx,Ry,value         (equivalent to: addic. Rx,Ry,-value)

D.4.2 Subtract

The Subtract From instructions subtract the second operand (RA) from the third (RB). Extended mnemonics are provided that use the more "normal" order, in which the third operand is subtracted from the second. Both these mnemonics can be coded with a final "o" and/or "." to cause the OE and/or Rc bit to be set in the underlying instruction.

   sub Rx,Ry,Rz               (equivalent to: subf Rx,Rz,Ry)
   subc Rx,Ry,Rz              (equivalent to: subfc Rx,Rz,Ry)

D.5 Compare Mnemonics

The L field in the fixed-point Compare instructions controls whether the operands are treated as 64-bit quantities or as 32-bit quantities. Extended mnemonics are provided that represent the L value in the mnemonic rather than requiring it to be coded as a numeric operand. The BF field can be omitted if the result of the comparison is to be placed into CR Field 0. Otherwise the target CR field must be specified as the first operand. One of the CR field symbols defined in Section D.1 can be used for this operand.

Note: The basic Compare mnemonics of Power ISA are the same as those of POWER, but the POWER instructions have three operands while the Power ISA instructions have four. The Assembler will recognize a basic Compare mnemonic with three operands as the POWER form, and will generate the instruction with L=0. (Thus the Assembler must require that the BF field, which normally can be omitted when CR Field 0 is the target, be specified explicitly if L is.)
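The operand rules just described (L taken from the mnemonic, BF defaulting to CR Field 0) can be illustrated with a small Python sketch of how an assembler might expand an extended compare mnemonic. The names `COMPARE_FORMS`, `CR_FIELDS`, and `expand_compare` are hypothetical and chosen for this illustration only; the mnemonic-to-L mapping is the one given in Tables 16 and 17.

```python
# Extended mnemonic -> (basic mnemonic, L value), as listed in Tables 16 and 17.
COMPARE_FORMS = {
    "cmpdi": ("cmpi", 1), "cmpd": ("cmp", 1),
    "cmpldi": ("cmpli", 1), "cmpld": ("cmpl", 1),
    "cmpwi": ("cmpi", 0), "cmpw": ("cmp", 0),
    "cmplwi": ("cmpli", 0), "cmplw": ("cmpl", 0),
}

# CR field symbols from Section D.1 (cr0..cr7 have the values 0..7).
CR_FIELDS = {f"cr{i}": i for i in range(8)}


def expand_compare(mnemonic, *operands):
    """Return the basic-form text for an extended compare mnemonic.

    Hypothetical helper: accepts either (ra, rb_or_imm) or
    (bf, ra, rb_or_imm); an omitted BF means CR Field 0.
    """
    basic, l_value = COMPARE_FORMS[mnemonic]
    if len(operands) == 2:
        bf = 0                            # BF omitted: result goes to CR Field 0
        a, b = operands
    elif len(operands) == 3:
        bf, a, b = operands
    else:
        raise ValueError("expected 2 or 3 operands")
    bf = CR_FIELDS.get(bf, bf)            # a CR field symbol is not multiplied by 4
    return f"{basic} {bf},{l_value},{a},{b}"
```

For instance, `expand_compare("cmpwi", "cr4", "Rx", 100)` yields `cmpi 4,0,Rx,100`.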
D.5.1 Doubleword Comparisons

Table 16: Doubleword compare mnemonics

Operation                             Extended Mnemonic  Equivalent to
Compare doubleword immediate          cmpdi bf,ra,si     cmpi bf,1,ra,si
Compare doubleword                    cmpd bf,ra,rb      cmp bf,1,ra,rb
Compare logical doubleword immediate  cmpldi bf,ra,ui    cmpli bf,1,ra,ui
Compare logical doubleword            cmpld bf,ra,rb     cmpl bf,1,ra,rb

Examples

1. Compare register Rx and immediate value 100 as unsigned 64-bit integers and place result into CR0.
   cmpldi Rx,100              (equivalent to: cmpli 0,1,Rx,100)
2. Same as (1), but place result into CR4.
   cmpldi cr4,Rx,100          (equivalent to: cmpli 4,1,Rx,100)
3. Compare registers Rx and Ry as signed 64-bit integers and place result into CR0.
   cmpd Rx,Ry                 (equivalent to: cmp 0,1,Rx,Ry)

D.5.2 Word Comparisons

Table 17: Word compare mnemonics

Operation                       Extended Mnemonic  Equivalent to
Compare word immediate          cmpwi bf,ra,si     cmpi bf,0,ra,si
Compare word                    cmpw bf,ra,rb      cmp bf,0,ra,rb
Compare logical word immediate  cmplwi bf,ra,ui    cmpli bf,0,ra,ui
Compare logical word            cmplw bf,ra,rb     cmpl bf,0,ra,rb

Examples

1. Compare bits 32:63 of register Rx and immediate value 100 as signed 32-bit integers and place result into CR0.
   cmpwi Rx,100               (equivalent to: cmpi 0,0,Rx,100)
2. Same as (1), but place result into CR4.
   cmpwi cr4,Rx,100           (equivalent to: cmpi 4,0,Rx,100)
3. Compare bits 32:63 of registers Rx and Ry as unsigned 32-bit integers and place result into CR0.
   cmplw Rx,Ry                (equivalent to: cmpl 0,0,Rx,Ry)

D.6 Trap Mnemonics

The mnemonics defined in Table 18 are variations of the Trap instructions, with the most useful values of TO represented in the mnemonic rather than specified as a numeric operand.

A standard set of codes has been adopted for the most common combinations of trap conditions.
Code    Meaning                               TO   encoding
                                                   <   >   =   <u  >u
lt      Less than                             16   1   0   0   0   0
le      Less than or equal                    20   1   0   1   0   0
eq      Equal                                 4    0   0   1   0   0
ge      Greater than or equal                 12   0   1   1   0   0
gt      Greater than                          8    0   1   0   0   0
nl      Not less than                         12   0   1   1   0   0
ne      Not equal                             24   1   1   0   0   0
ng      Not greater than                      20   1   0   1   0   0
llt     Logically less than                   2    0   0   0   1   0
lle     Logically less than or equal          6    0   0   1   1   0
lge     Logically greater than or equal       5    0   0   1   0   1
lgt     Logically greater than                1    0   0   0   0   1
lnl     Logically not less than               5    0   0   1   0   1
lng     Logically not greater than            6    0   0   1   1   0
u       Unconditionally with parameters       31   1   1   1   1   1
(none)  Unconditional                         31   1   1   1   1   1

These codes are reflected in the mnemonics shown in Table 18.

Table 18: Trap mnemonics

                                              64-bit Comparison     32-bit Comparison
Trap Semantics                                tdi        td         twi        tw
                                              Immediate  Register   Immediate  Register
Trap unconditionally                          -          -          -          trap
Trap unconditionally with parameters          tdui       tdu        twui       twu
Trap if less than                             tdlti      tdlt       twlti      twlt
Trap if less than or equal                    tdlei      tdle       twlei      twle
Trap if equal                                 tdeqi      tdeq       tweqi      tweq
Trap if greater than or equal                 tdgei      tdge       twgei      twge
Trap if greater than                          tdgti      tdgt       twgti      twgt
Trap if not less than                         tdnli      tdnl       twnli      twnl
Trap if not equal                             tdnei      tdne       twnei      twne
Trap if not greater than                      tdngi      tdng       twngi      twng
Trap if logically less than                   tdllti     tdllt      twllti     twllt
Trap if logically less than or equal          tdllei     tdlle      twllei     twlle
Trap if logically greater than or equal       tdlgei     tdlge      twlgei     twlge
Trap if logically greater than                tdlgti     tdlgt      twlgti     twlgt
Trap if logically not less than               tdlnli     tdlnl      twlnli     twlnl
Trap if logically not greater than            tdlngi     tdlng      twlngi     twlng

Examples

1. Trap if register Rx is not 0.
   tdnei Rx,0                 (equivalent to: tdi 24,Rx,0)
2. Same as (1), but comparison is to register Ry.
   tdne Rx,Ry                 (equivalent to: td 24,Rx,Ry)
3. Trap if bits 32:63 of register Rx, considered as a 32-bit quantity, are logically greater than 0x7FF.
   twlgti Rx,0x7FF            (equivalent to: twi 1,Rx,0x7FF)
4. Trap unconditionally.
   trap                       (equivalent to: tw 31,0,0)
5. Trap unconditionally with immediate parameters Rx and Ry.
   tdu Rx,Ry                  (equivalent to: td 31,Rx,Ry)

D.7 Rotate and Shift Mnemonics

The Rotate and Shift instructions provide powerful and general ways to manipulate register contents, but can be difficult to understand. Extended mnemonics are provided that allow some of the simpler operations to be coded easily.

Mnemonics are provided for the following types of operation.

Extract
    Select a field of n bits starting at bit position b in the source register; left or right justify this field in the target register; clear all other bits of the target register to 0.
Insert
    Select a left-justified or right-justified field of n bits in the source register; insert this field starting at bit position b of the target register; leave other bits of the target register unchanged. (No extended mnemonic is provided for insertion of a left-justified field when operating on doublewords, because such an insertion requires more than one instruction.)
Rotate
    Rotate the contents of a register right or left n bits without masking.
Shift
    Shift the contents of a register right or left n bits, clearing vacated bits to 0 (logical shift).
Clear
    Clear the leftmost or rightmost n bits of a register to 0.
Clear left and shift left
    Clear the leftmost b bits of a register, then shift the register left by n bits. This operation can be used to scale a (known nonnegative) array index by the width of an element.

D.7.1 Operations on Doublewords

All these mnemonics can be coded with a final "." to cause the Rc bit to be set in the underlying instruction.
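To make the doubleword operations concrete, the following Python sketch models the rotate-then-mask behavior of rldicl, rldicr, rldic, and rldimi on 64-bit values and expresses several extended mnemonics through the equivalences listed in Table 19. It is an illustration under stated assumptions rather than a reference model: the Python function names are hypothetical, registers are plain integers, bit 0 is the most significant bit, only masks with mb <= me are handled, and the Rc (".") forms are not modelled.

```python
M64 = (1 << 64) - 1


def rotl64(x, n):
    """Rotate a 64-bit value left by n bits (n taken modulo 64)."""
    n %= 64
    return ((x << n) | (x >> (64 - n))) & M64


def mask(mb, me):
    """Ones in bits mb..me inclusive, bit 0 being the most significant bit."""
    if mb > me:
        raise ValueError("wrap-around masks are not modelled in this sketch")
    return ((1 << (me - mb + 1)) - 1) << (63 - me)


def rldicl(rs, sh, mb):
    return rotl64(rs, sh) & mask(mb, 63)


def rldicr(rs, sh, me):
    return rotl64(rs, sh) & mask(0, me)


def rldic(rs, sh, mb):
    sh %= 64
    return rotl64(rs, sh) & mask(mb, 63 - sh)


def rldimi(ra, rs, sh, mb):
    sh %= 64
    m = mask(mb, 63 - sh)
    return (rotl64(rs, sh) & m) | (ra & ~m & M64)   # other bits of ra unchanged


# Extended mnemonics, written via their Table 19 equivalents.
def extrdi(rs, n, b):     return rldicl(rs, b + n, 64 - n)
def insrdi(ra, rs, n, b): return rldimi(ra, rs, 64 - (b + n), b)
def sldi(rs, n):          return rldicr(rs, n, 63 - n)
def srdi(rs, n):          return rldicl(rs, 64 - n, n)
def clrldi(rs, n):        return rldicl(rs, 0, n)
def clrlsldi(rs, b, n):   return rldic(rs, n, b - n)
```

With this model, `extrdi(x, 1, 0)` returns the sign bit of `x` right-justified, and `sldi` and `srdi` agree with ordinary 64-bit logical shifts.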
Table 19: Doubleword rotate and shift mnemonics

Operation                            Extended Mnemonic                  Equivalent to
Extract and left justify immediate   extldi ra,rs,n,b (n > 0)           rldicr ra,rs,b,n-1
Extract and right justify immediate  extrdi ra,rs,n,b (n > 0)           rldicl ra,rs,b+n,64-n
Insert from right immediate          insrdi ra,rs,n,b (n > 0)           rldimi ra,rs,64-(b+n),b
Rotate left immediate                rotldi ra,rs,n                     rldicl ra,rs,n,0
Rotate right immediate               rotrdi ra,rs,n                     rldicl ra,rs,64-n,0
Rotate left                          rotld ra,rs,rb                     rldcl ra,rs,rb,0
Shift left immediate                 sldi ra,rs,n (n < 64)              rldicr ra,rs,n,63-n
Shift right immediate                srdi ra,rs,n (n < 64)              rldicl ra,rs,64-n,n
Clear left immediate                 clrldi ra,rs,n (n < 64)            rldicl ra,rs,0,n
Clear right immediate                clrrdi ra,rs,n (n < 64)            rldicr ra,rs,0,63-n
Clear left and shift left immediate  clrlsldi ra,rs,b,n (n <= b < 64)   rldic ra,rs,n,b-n

Examples

1. Extract the sign bit (bit 0) of register Ry and place the result right-justified into register Rx.
   extrdi Rx,Ry,1,0           (equivalent to: rldicl Rx,Ry,1,63)
2. Insert the bit extracted in (1) into the sign bit (bit 0) of register Rz.
   insrdi Rz,Rx,1,0           (equivalent to: rldimi Rz,Rx,63,0)
3. Shift the contents of register Rx left 8 bits.
   sldi Rx,Rx,8               (equivalent to: rldicr Rx,Rx,8,55)
4. Clear the high-order 32 bits of register Ry and place the result into register Rx.
   clrldi Rx,Ry,32            (equivalent to: rldicl Rx,Ry,0,32)

D.7.2 Operations on Words

All these mnemonics can be coded with a final "." to cause the Rc bit to be set in the underlying instruction.

The operations as described above apply to the low-order 32 bits of the registers, as if the registers were 32-bit registers. The Insert operations either preserve the high-order 32 bits of the target register or place rotated data there; the other operations clear these bits.
Table 20: Word rotate and shift mnemonics

Operation                            Extended Mnemonic                  Equivalent to
Extract and left justify immediate   extlwi ra,rs,n,b (n > 0)           rlwinm ra,rs,b,0,n-1
Extract and right justify immediate  extrwi ra,rs,n,b (n > 0)           rlwinm ra,rs,b+n,32-n,31
Insert from left immediate           inslwi ra,rs,n,b (n > 0)           rlwimi ra,rs,32-b,b,(b+n)-1
Insert from right immediate          insrwi ra,rs,n,b (n > 0)           rlwimi ra,rs,32-(b+n),b,(b+n)-1
Rotate left immediate                rotlwi ra,rs,n                     rlwinm ra,rs,n,0,31
Rotate right immediate               rotrwi ra,rs,n                     rlwinm ra,rs,32-n,0,31
Rotate left                          rotlw ra,rs,rb                     rlwnm ra,rs,rb,0,31
Shift left immediate                 slwi ra,rs,n (n < 32)              rlwinm ra,rs,n,0,31-n
Shift right immediate                srwi ra,rs,n (n < 32)              rlwinm ra,rs,32-n,n,31
Clear left immediate                 clrlwi ra,rs,n (n < 32)            rlwinm ra,rs,0,n,31
Clear right immediate                clrrwi ra,rs,n (n < 32)            rlwinm ra,rs,0,0,31-n
Clear left and shift left immediate  clrlslwi ra,rs,b,n (n <= b < 32)   rlwinm ra,rs,n,b-n,31-n

Examples

1. Extract the sign bit (bit 32) of register Ry and place the result right-justified into register Rx.
   extrwi Rx,Ry,1,0           (equivalent to: rlwinm Rx,Ry,1,31,31)
2. Insert the bit extracted in (1) into the sign bit (bit 32) of register Rz.
   insrwi Rz,Rx,1,0           (equivalent to: rlwimi Rz,Rx,31,0,0)
3. Shift the contents of register Rx left 8 bits, clearing the high-order 32 bits.
   slwi Rx,Rx,8               (equivalent to: rlwinm Rx,Rx,8,0,23)
4. Clear the high-order 16 bits of the low-order 32 bits of register Ry and place the result into register Rx, clearing the high-order 32 bits of register Rx.
   clrlwi Rx,Ry,16            (equivalent to: rlwinm Rx,Ry,0,16,31)

D.8 Move To/From Special Purpose Register Mnemonics

The mtspr and mfspr instructions specify a Special Purpose Register (SPR) as a numeric operand. Extended mnemonics are provided that represent the SPR in the mnemonic rather than requiring it to be coded as an operand.
Table 21: Extended mnemonics for moving to/from an SPR

                                      Move To SPR                 Move From SPR
Special Purpose Register              Extended   Equivalent to    Extended   Equivalent to
Fixed-Point Exception Register (XER)  mtxer Rx   mtspr 1,Rx       mfxer Rx   mfspr Rx,1
Link Register (LR)                    mtlr Rx    mtspr 8,Rx       mflr Rx    mfspr Rx,8
Count Register (CTR)                  mtctr Rx   mtspr 9,Rx       mfctr Rx   mfspr Rx,9
PPR                                   mtppr Rx   mtspr 896,Rx     mfppr Rx   mfspr Rx,896

Examples

1. Copy the contents of register Rx to the XER.
   mtxer Rx                   (equivalent to: mtspr 1,Rx)
2. Copy the contents of the LR to register Rx.
   mflr Rx                    (equivalent to: mfspr Rx,8)
3. Copy the contents of register Rx to the CTR.
   mtctr Rx                   (equivalent to: mtspr 9,Rx)

D.9 Miscellaneous Mnemonics

No-op

Many Power ISA instructions can be coded in a way such that, effectively, no operation is performed. An extended mnemonic is provided for the preferred form of no-op. If an implementation performs any type of run-time optimization related to no-ops, the preferred form is the no-op that will trigger this.

   nop                        (equivalent to: ori 0,0,0)

For some uses of a no-op instruction, optimizations related to no-ops, such as removal from the execution stream, are not desirable. An extended mnemonic is provided for the executed form of no-op. This form of no-op will still consume execution resources.

   xnop                       (equivalent to: xori 0,0,0)

Load Immediate

The addi and addis instructions can be used to load an immediate value into a register. Extended mnemonics are provided to convey the idea that no addition is being performed but merely data movement (from the immediate field of the instruction to a register).

Load a 16-bit signed immediate value into register Rx.

   li Rx,value                (equivalent to: addi Rx,0,value)

Load a 16-bit signed immediate value, shifted left by 16 bits, into register Rx.

   lis Rx,value               (equivalent to: addis Rx,0,value)
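The effect of li and lis on a 64-bit register can be sketched in Python as follows. The function names are hypothetical, and the model assumes a 64-bit GPR with the 16-bit immediate sign-extended as addi and addis do when the RA operand is 0.

```python
M64 = (1 << 64) - 1


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed 16-bit integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def li(value):
    """Register contents after: li Rx,value (addi Rx,0,value)."""
    return sign_extend16(value) & M64


def lis(value):
    """Register contents after: lis Rx,value (addis Rx,0,value)."""
    return (sign_extend16(value) << 16) & M64
```

Note that an immediate with its high bit set is sign-extended, so `lis(0x8000)` gives 0xFFFFFFFF80000000 in this 64-bit model rather than 0x0000000080000000.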
Load Address

This mnemonic permits computing the value of a base-displacement operand, using the addi instruction which normally requires separate register and immediate operands.

   la Rx,D(Ry)                (equivalent to: addi Rx,Ry,D)

The la mnemonic is useful for obtaining the address of a variable specified by name, allowing the Assembler to supply the base register number and compute the displacement. If the variable v is located at offset Dv bytes from the address in register Rv, and the Assembler has been told to use register Rv as a base for references to the data structure containing v, then the following line causes the address of v to be loaded into register Rx.

   la Rx,v                    (equivalent to: addi Rx,Rv,Dv)

Move Register

Several Power ISA instructions can be coded in a way such that they simply copy the contents of one register to another. An extended mnemonic is provided to convey the idea that no computation is being performed but merely data movement (from one register to another).

The following instruction copies the contents of register Ry to register Rx. This mnemonic can be coded with a final "." to cause the Rc bit to be set in the underlying instruction.

   mr Rx,Ry                   (equivalent to: or Rx,Ry,Ry)

Complement Register

Several Power ISA instructions can be coded in a way such that they complement the contents of one register and place the result into another register. An extended mnemonic is provided that allows this operation to be coded easily.

The following instruction complements the contents of register Ry and places the result into register Rx. This mnemonic can be coded with a final "." to cause the Rc bit to be set in the underlying instruction.

   not Rx,Ry                  (equivalent to: nor Rx,Ry,Ry)

Move To/From Condition Register

This mnemonic permits copying the contents of the low-order 32 bits of a GPR to the Condition Register, using the same style as the mfcr instruction.
   mtcr Rx                    (equivalent to: mtcrf 0xFF,Rx)

The following instructions may generate either the (old) mtcrf or mfcr instructions or the (new) mtocrf or mfocrf instruction, respectively, depending on the target machine type assembler parameter.

   mtcrf FXM,Rx
   mfcr Rx

All three extended mnemonics in this subsection are being phased out. In future assemblers the form "mtcr Rx" may not exist, and the mtcrf and mfcr mnemonics may generate the old form instructions (with bit 11 = 0) regardless of the target machine type assembler parameter, or may cease to exist.

Appendix E. Programming Examples

E.1 Multiple-Precision Shifts

This section gives examples of how multiple-precision shifts can be programmed.

A multiple-precision shift is defined to be a shift of an N-doubleword quantity (64-bit mode) or an N-word quantity (32-bit mode), where N>1. The quantity to be shifted is contained in N registers. The shift amount is specified either by an immediate value in the instruction, or by a value in a register.

The examples shown below distinguish between the cases N=2 and N>2. If N=2, the shift amount may be in the range 0 through 127 (64-bit mode) or 0 through 63 (32-bit mode), which are the maximum ranges supported by the Shift instructions used. However if N>2, the shift amount must be in the range 0 through 63 (64-bit mode) or 0 through 31 (32-bit mode), in order for the examples to yield the desired result. The specific instance shown for N>2 is N=3; extending those code sequences to larger N is straightforward, as is reducing them to the case N=2 when the more stringent restriction on shift amount is met. For shifts with immediate shift amounts only the case N=3 is shown, because the more stringent restriction on shift amount is always met.

In the examples it is assumed that GPRs 2 and 3 (and 4) contain the quantity to be shifted, and that the result is to be placed into the same registers, except for the immediate left shifts in 64-bit mode for which the result is placed into GPRs 3, 4, and 5. In all cases, for both input and result, the lowest-numbered register contains the highest-order part of the data and highest-numbered register contains the lowest-order part. For non-immediate shifts, the shift amount is assumed to be in GPR 6. For immediate shifts, the shift amount is assumed to be greater than 0. GPRs 0 and 31 are used as scratch registers.

For N>2, the number of instructions required is 2N-1 (immediate shifts) or 3N-1 (non-immediate shifts).

Multiple-precision shifts in 64-bit mode          Multiple-precision shifts in 32-bit mode
[Category: 64-Bit]

Shift Left Immediate, N = 3 (shift amnt < 64)     Shift Left Immediate, N = 3 (shift amnt < 32)
    rldicr  r5,r4,sh,63-sh                            rlwinm  r2,r2,sh,0,31-sh
    rldimi  r4,r3,0,sh                                rlwimi  r2,r3,sh,32-sh,31
    rldicl  r4,r4,sh,0                                rlwinm  r3,r3,sh,0,31-sh
    rldimi  r3,r2,0,sh                                rlwimi  r3,r4,sh,32-sh,31
    rldicl  r3,r3,sh,0                                rlwinm  r4,r4,sh,0,31-sh

Shift Left, N = 2 (shift amnt < 128)              Shift Left, N = 2 (shift amnt < 64)
    subfic  r31,r6,64                                 subfic  r31,r6,32
    sld     r2,r2,r6                                  slw     r2,r2,r6
    srd     r0,r3,r31                                 srw     r0,r3,r31
    or      r2,r2,r0                                  or      r2,r2,r0
    addi    r31,r6,-64                                addi    r31,r6,-32
    sld     r0,r3,r31                                 slw     r0,r3,r31
    or      r2,r2,r0                                  or      r2,r2,r0
    sld     r3,r3,r6                                  slw     r3,r3,r6

Shift Left, N = 3 (shift amnt < 64)               Shift Left, N = 3 (shift amnt < 32)
    subfic  r31,r6,64                                 subfic  r31,r6,32
    sld     r2,r2,r6                                  slw     r2,r2,r6
    srd     r0,r3,r31                                 srw     r0,r3,r31
    or      r2,r2,r0                                  or      r2,r2,r0
    sld     r3,r3,r6                                  slw     r3,r3,r6
    srd     r0,r4,r31                                 srw     r0,r4,r31
    or      r3,r3,r0                                  or      r3,r3,r0
    sld     r4,r4,r6                                  slw     r4,r4,r6

Shift Right Immediate, N = 3 (shift amnt < 64)    Shift Right Immediate, N = 3 (shift amnt < 32)
    rldimi  r4,r3,0,64-sh                             rlwinm  r4,r4,32-sh,sh,31
    rldicl  r4,r4,64-sh,0                             rlwimi  r4,r3,32-sh,0,sh-1
    rldimi  r3,r2,0,64-sh                             rlwinm  r3,r3,32-sh,sh,31
    rldicl  r3,r3,64-sh,0                             rlwimi  r3,r2,32-sh,0,sh-1
    rldicl  r2,r2,64-sh,sh                            rlwinm  r2,r2,32-sh,sh,31

Shift Right, N = 2 (shift amnt < 128)             Shift Right, N = 2 (shift amnt < 64)
    subfic  r31,r6,64                                 subfic  r31,r6,32
    srd     r3,r3,r6                                  srw     r3,r3,r6
    sld     r0,r2,r31                                 slw     r0,r2,r31
    or      r3,r3,r0                                  or      r3,r3,r0
    addi    r31,r6,-64                                addi    r31,r6,-32
    srd     r0,r2,r31                                 srw     r0,r2,r31
    or      r3,r3,r0                                  or      r3,r3,r0
    srd     r2,r2,r6                                  srw     r2,r2,r6

Shift Right, N = 3 (shift amnt < 64)              Shift Right, N = 3 (shift amnt < 32)
    subfic  r31,r6,64                                 subfic  r31,r6,32
    srd     r4,r4,r6                                  srw     r4,r4,r6
    sld     r0,r3,r31                                 slw     r0,r3,r31
    or      r4,r4,r0                                  or      r4,r4,r0
    srd     r3,r3,r6                                  srw     r3,r3,r6
    sld     r0,r2,r31                                 slw     r0,r2,r31
    or      r3,r3,r0                                  or      r3,r3,r0
    srd     r2,r2,r6                                  srw     r2,r2,r6

Multiple-precision shifts in 64-bit mode,         Multiple-precision shifts in 32-bit mode,
continued [Category: 64-Bit]                      continued

Shift Right Algebraic Immediate, N = 3            Shift Right Algebraic Immediate, N = 3
(shift amnt < 64)                                 (shift amnt < 32)
    rldimi  r4,r3,0,64-sh                             rlwinm  r4,r4,32-sh,sh,31
    rldicl  r4,r4,64-sh,0                             rlwimi  r4,r3,32-sh,0,sh-1
    rldimi  r3,r2,0,64-sh                             rlwinm  r3,r3,32-sh,sh,31
    rldicl  r3,r3,64-sh,0                             rlwimi  r3,r2,32-sh,0,sh-1
    sradi   r2,r2,sh                                  srawi   r2,r2,sh

Shift Right Algebraic, N = 2 (shift amnt < 128)   Shift Right Algebraic, N = 2 (shift amnt < 64)
    subfic  r31,r6,64                                 subfic  r31,r6,32
    srd     r3,r3,r6                                  srw     r3,r3,r6
    sld     r0,r2,r31                                 slw     r0,r2,r31
    or      r3,r3,r0                                  or      r3,r3,r0
    addic.  r31,r6,-64                                addic.  r31,r6,-32
    srad    r0,r2,r31                                 sraw    r0,r2,r31
    ble     $+8                                       ble     $+8
    ori     r3,r0,0                                   ori     r3,r0,0
    srad    r2,r2,r6                                  sraw    r2,r2,r6

Shift Right Algebraic, N = 3 (shift amnt < 64)    Shift Right Algebraic, N = 3 (shift amnt < 32)
    subfic  r31,r6,64                                 subfic  r31,r6,32
    srd     r4,r4,r6                                  srw     r4,r4,r6
    sld     r0,r3,r31                                 slw     r0,r3,r31
    or      r4,r4,r0                                  or      r4,r4,r0
    srd     r3,r3,r6                                  srw     r3,r3,r6
    sld     r0,r2,r31                                 slw     r0,r2,r31
    or      r3,r3,r0                                  or      r3,r3,r0
    srad    r2,r2,r6                                  sraw    r2,r2,r6
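The reason the N = 2 sequences tolerate shift amounts up to 127 is that sld and srd use the low-order 7 bits of the shift register and produce 0 whenever that 7-bit amount is 64 or more. The Python sketch below models the 64-bit "Shift Left, N = 2" sequence from this section on plain integers so that this can be checked against an ordinary 128-bit shift. The function names are hypothetical, and only the three instructions the sequence needs are modelled.

```python
M64 = (1 << 64) - 1


def sld(rs, rb):
    """Shift Left Doubleword: amount is the low 7 bits of rb; 64..127 gives 0."""
    n = rb & 0x7F
    return 0 if n >= 64 else (rs << n) & M64


def srd(rs, rb):
    """Shift Right Doubleword: amount is the low 7 bits of rb; 64..127 gives 0."""
    n = rb & 0x7F
    return 0 if n >= 64 else rs >> n


def shift_left_n2(r2, r3, r6):
    """Model of the 64-bit 'Shift Left, N = 2' sequence (r2 = high, r3 = low)."""
    r31 = (64 - r6) & M64        # subfic r31,r6,64
    r2 = sld(r2, r6)             # sld    r2,r2,r6
    r0 = srd(r3, r31)            # srd    r0,r3,r31
    r2 |= r0                     # or     r2,r2,r0
    r31 = (r6 - 64) & M64        # addi   r31,r6,-64
    r0 = sld(r3, r31)            # sld    r0,r3,r31
    r2 |= r0                     # or     r2,r2,r0
    r3 = sld(r3, r6)             # sld    r3,r3,r6
    return r2, r3
```

For shift amounts below 64 the second sld contributes 0 (its 7-bit amount is at least 64), and for amounts of 64 or more the first sld and the srd contribute 0, so exactly one path supplies the bits that cross from the low doubleword into the high one.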
E.2 Floating-Point Conversions [Category: Floating-Point]

This section gives examples of how the Floating-Point Conversion instructions can be used to perform various conversions.

Warning: Some of the examples use the fsel instruction. Care must be taken in using fsel if IEEE compatibility is required, or if the values being tested can be NaNs or infinities; see Section E.3.4, "Notes".

E.2.1 Conversion from Floating-Point Number to Floating-Point Integer

The full convert to floating-point integer function can be implemented with the sequence shown below, assuming the floating-point value to be converted is in FPR 1 and the result is returned in FPR 3.

    mtfsb0   23             #clear VXCVI
    fctid[z] f3,f1          #convert to fx int
    fcfid    f3,f3          #convert back again
    mcrfs    7,5            #VXCVI to CR
    bf       31,$+8         #skip if VXCVI was 0
    fmr      f3,f1          #input was fp int

E.2.2 Conversion from Floating-Point Number to Signed Fixed-Point Integer Doubleword

The full convert to signed fixed-point integer doubleword function can be implemented with the sequence shown below, assuming the floating-point value to be converted is in FPR 1, the result is returned in GPR 3, and a doubleword at displacement "disp" from the address in GPR 1 can be used as scratch space.

    fctid[z] f2,f1          #convert to dword int
    stfd     f2,disp(r1)    #store float
    ld       r3,disp(r1)    #load dword

E.2.3 Conversion from Floating-Point Number to Unsigned Fixed-Point Integer Doubleword

The full convert to unsigned fixed-point integer doubleword function can be implemented with the sequence shown below, assuming the floating-point value to be converted is in FPR 1, the value 0 is in FPR 0, the value 2^64-2048 is in FPR 3, the value 2^63 is in FPR 4 and GPR 4, the result is returned in GPR 3, and a doubleword at displacement "disp" from the address in GPR 1 can be used as scratch space.

    fsel     f2,f1,f1,f0    #use 0 if < 0
    fsub     f5,f3,f1       #use max if > max
    fsel     f2,f5,f2,f3
    fsub     f5,f2,f4       #subtract 2^63
    fcmpu    cr2,f2,f4      #use diff if >= 2^63
    fsel     f2,f5,f5,f2
    fctid[z] f2,f2          #convert to fx int
    stfd     f2,disp(r1)    #store float
    ld       r3,disp(r1)    #load dword
    blt      cr2,$+8        #add 2^63 if input
    add      r3,r3,r4       # was >= 2^63

E.2.4 Conversion from Floating-Point Number to Signed Fixed-Point Integer Word

The full convert to signed fixed-point integer word function can be implemented with the sequence shown below, assuming the floating-point value to be converted is in FPR 1, the result is returned in GPR 3, and a doubleword at displacement "disp" from the address in GPR 1 can be used as scratch space.

    fctiw[z] f2,f1          #convert to fx int
    stfd     f2,disp(r1)    #store float
    lwa      r3,disp+4(r1)  #load word algebraic

E.2.5 Conversion from Floating-Point Number to Unsigned Fixed-Point Integer Word

The full convert to unsigned fixed-point integer word function can be implemented with the sequence shown below, assuming the floating-point value to be converted is in FPR 1, the value 0 is in FPR 0, the value 2^32-1 is in FPR 3, the result is returned in GPR 3, and a doubleword at displacement "disp" from the address in GPR 1 can be used as scratch space.

    fsel     f2,f1,f1,f0    #use 0 if < 0
    fsub     f4,f3,f1       #use max if > max
    fsel     f2,f4,f2,f3
    fctid[z] f2,f2          #convert to fx int
    stfd     f2,disp(r1)    #store float
    lwz      r3,disp+4(r1)  #load word and zero

E.2.6 Conversion from Signed Fixed-Point Integer Doubleword to Floating-Point Number

The full convert from signed fixed-point integer doubleword function, using the rounding mode specified by FPSCR[RN], can be implemented with the sequence shown below, assuming the fixed-point value to be converted is in GPR 3, the result is returned in FPR 1, and a doubleword at displacement "disp" from the address in GPR 1 can be used as scratch space.

    std      r3,disp(r1)    #store dword
    lfd      f1,disp(r1)    #load float
    fcfid    f1,f1          #convert to fp int

E.2.7 Conversion from Unsigned Fixed-Point Integer Doubleword to Floating-Point Number

The full convert from unsigned fixed-point integer doubleword function, using the rounding mode specified by FPSCR[RN], can be implemented with the sequence shown below, assuming the fixed-point value to be converted is in GPR 3, the value 2^32 is in FPR 4, the result is returned in FPR 1, and two doublewords at displacement "disp" from the address in GPR 1 can be used as scratch space.

    rldicl   r2,r3,32,32    #isolate high half
    rldicl   r0,r3,0,32     #isolate low half
    std      r2,disp(r1)    #store dword both
    std      r0,disp+8(r1)
    lfd      f2,disp(r1)    #load float both
    lfd      f1,disp+8(r1)
    fcfid    f2,f2          #convert each half to
    fcfid    f1,f1          # fp int (exact result)
    fmadd    f1,f4,f2,f1    #(2^32)×high + low

An alternative, shorter, sequence can be used if rounding according to FPSCR[RN] is desired and FPSCR[RN] specifies Round toward +Infinity or Round toward -Infinity, or if it is acceptable for the rounded answer to be either of the two representable floating-point integers nearest to the given fixed-point integer. In this case the full convert from unsigned fixed-point integer doubleword function can be implemented with the sequence shown below, assuming the value 2^64 is in FPR 2.

    std      r3,disp(r1)    #store dword
    lfd      f1,disp(r1)    #load float
    fcfid    f1,f1          #convert to fp int
    fadd     f4,f1,f2       #add 2^64
    fsel     f1,f1,f1,f4    # if r3 < 0

E.2.8 Conversion from Signed Fixed-Point Integer Word to Floating-Point Number

The full convert from signed fixed-point integer word function can be implemented with the sequence shown below, assuming the fixed-point value to be converted is in GPR 3, the result is returned in FPR 1, and a doubleword at displacement "disp" from the address in GPR 1 can be used as scratch space. (The result is exact.)

    extsw    r3,r3          #extend sign
    std      r3,disp(r1)    #store dword
    lfd      f1,disp(r1)    #load float
    fcfid    f1,f1          #convert to fp int

The following sequence can be used, assuming a word at the address in GPR 1 + GPR 2 can be used as scratch space.

    stwx     r3,r1,r2       # store word
    lfiwax   f1,r1,r2       # load float
    fcfid    f1,f1          # convert to fp int

E.3 Floating-Point Selection [Category: Floating-Point]

This section gives examples of how the Floating Select instruction can be used to implement floating-point minimum and maximum functions, and certain simple forms of if-then-else constructions, without branching.

The examples show program fragments in an imaginary, C-like, high-level programming language, and the corresponding program fragment using fsel and other Power ISA instructions. In the examples, a, b, x, y, and z are floating-point variables, which are assumed to be in FPRs fa, fb, fx, fy, and fz. FPR fs is assumed to be available for scratch space.

Additional examples can be found in Section E.2, "Floating-Point Conversions [Category: Floating-Point]".

Warning: Care must be taken in using fsel if IEEE compatibility is required, or if the values being tested can be NaNs or infinities; see Section E.3.4.

E.3.1 Comparison to Zero

    High-level language:        Power ISA:            Notes
    if a ≥ 0.0 then x ← y       fsel fx,fa,fy,fz      (1)
      else x ← z
    if a > 0.0 then x ← y       fneg fs,fa            (1,2)
      else x ← z                fsel fx,fs,fz,fy
    if a = 0.0 then x ← y       fsel fx,fa,fy,fz      (1)
      else x ← z                fneg fs,fa
                                fsel fx,fs,fx,fz

E.3.2 Minimum and Maximum

    High-level language:        Power ISA:            Notes
    x ← min(a,b)                fsub fs,fa,fb         (3,4,5)
                                fsel fx,fs,fb,fa
    x ← max(a,b)                fsub fs,fa,fb         (3,4,5)
                                fsel fx,fs,fa,fb

E.3.3 Simple if-then-else Constructions

    High-level language:        Power ISA:            Notes
    if a ≥ b then x ← y         fsub fs,fa,fb         (4,5)
      else x ← z                fsel fx,fs,fy,fz
    if a > b then x ← y         fsub fs,fb,fa         (3,4,5)
      else x ← z                fsel fx,fs,fz,fy
    if a = b then x ← y         fsub fs,fa,fb         (4,5)
      else x ← z                fsel fx,fs,fy,fz
                                fneg fs,fs
                                fsel fx,fs,fx,fz

E.3.4 Notes

The following Notes apply to the preceding examples and to the corresponding cases using the other three arithmetic relations (<, ≤, and ≠). They should also be considered when any other use of fsel is contemplated.

In these Notes, the "optimized program" is the Power ISA program shown, and the "unoptimized program" (not shown) is the corresponding Power ISA program that uses fcmpu and Branch Conditional instructions instead of fsel.

1. The unoptimized program affects the VXSNAN bit of the FPSCR, and therefore may cause the system error handler to be invoked if the corresponding exception is enabled, while the optimized program does not affect this bit. This property of the optimized program is incompatible with the IEEE standard.

2. The optimized program gives the incorrect result if a is a NaN.

3. The optimized program gives the incorrect result if a and/or b is a NaN (except that it may give the correct result in some cases for the minimum and maximum functions, depending on how those functions are defined to operate on NaNs).

4. The optimized program gives the incorrect result if a and b are infinities of the same sign. (Here it is assumed that Invalid Operation Exceptions are disabled, in which case the result of the subtraction is a NaN. The analysis is more complicated if Invalid Operation Exceptions are enabled, because in that case the target register of the subtraction is unchanged.)

5. The optimized program affects the OX, UX, XX, and VXISI bits of the FPSCR, and therefore may cause the system error handler to be invoked if the corresponding exceptions are enabled, while the unoptimized program does not affect these bits. This property of the optimized program is incompatible with the IEEE standard.

E.4 Vector Unaligned Storage Operations [Category: Vector]

E.4.1 Loading an Unaligned Quadword Using Permute from Big-Endian Storage

The following sequence of instructions copies the unaligned quadword storage operand into VRT.

    # Assumptions:
    # Rb != 0 and contents of Rb = 0xB
    lvx   Vhi,0,Rb          # load MSQ
    lvsl  Vp,0,Rb           # set permute control vector
    addi  Rb,Rb,16          # address of LSQ
    lvx   Vlo,0,Rb          # load LSQ
    vperm Vt,Vhi,Vlo,Vp     # align the data

Book II: Power ISA Virtual Environment Architecture

Chapter 1. Storage Model
    1.1 Definitions
    1.2 Introduction
    1.3 Virtual Storage
    1.4 Single-copy Atomicity
    1.5 Cache Model
    1.6 Storage Control Attributes
        1.6.1 Write Through Required
        1.6.2 Caching Inhibited
        1.6.3 Memory Coherence Required [Category: Memory Coherence]
        1.6.4 Guarded
        1.6.5 Endianness [Category: Embedded.Little-Endian]
        1.6.6 Variable Length Encoded (VLE) Instructions
    1.7 Shared Storage
        1.7.1 Storage Access Ordering
        1.7.2 Storage Ordering of I/O Accesses
        1.7.3 Atomic Update
            1.7.3.1 Reservations
            1.7.3.2 Forward Progress
    1.8 Instruction Storage
        1.8.1 Concurrent Modification and Execution of Instructions

1.1 Definitions

The following definitions, in addition to those specified in Book I, are used in this Book. In these definitions, "Load instruction" includes the Cache Management and other instructions that are stated in the instruction descriptions to be "treated as a Load", and similarly for "Store instruction".

- processor
  A hardware component that executes the instructions specified in a program.

- system
  A combination of processors, storage, and associated mechanisms that is capable of executing programs. Sometimes the reference to system includes services provided by the operating system.

- main storage
  The level of storage hierarchy in which all storage state is visible to all processors and mechanisms in the system.

- primary cache
  The level of cache closest to the processor.

- secondary cache
  After the primary cache, the next closest level of cache to the processor.

- instruction storage
  The view of storage as seen by the mechanism that fetches instructions.

- data storage
  The view of storage as seen by a Load or Store instruction.

- program order
  The execution of instructions in the order required by the sequential execution model. (See the section entitled "Instruction Execution Order" in Book I. A dcbz instruction that modifies storage which contains instructions has the same effect with respect to the sequential execution model as a Store instruction as described there.)

- storage location
  A contiguous sequence of one or more bytes in storage. When used in association with a specific instruction or the instruction fetching mechanism, the length of the sequence of one or more bytes is typically implied by the operation. In other uses, it may refer more abstractly to a group of bytes which share common storage attributes.

- storage access
  An access to a storage location. There are three (mutually exclusive) kinds of storage access.

    - data access
      An access to the storage location specified by a Load or Store instruction, or, if the access is performed "out-of-order" (see Book III), an access to a storage location as if it were the storage location specified by a Load or Store instruction.

    - instruction fetch
      An access for the purpose of fetching an instruction.

    - implicit access
      An access by the processor for the purpose of address translation or reference and change recording (see Book III-S).

- caused by, associated with

    - caused by
      A storage access is said to be caused by an instruction if the instruction is a Load or Store and the access (data access) is to the storage location specified by the instruction.

    - associated with
      A storage access is said to be associated with an instruction if the access is for the purpose of fetching the instruction (instruction fetch), or is a data access caused by the instruction, or is an implicit access that occurs as a side effect of fetching or executing the instruction.

- prefetched instructions
  Instructions for which a copy of the instruction has been fetched from instruction storage, but the instruction has not yet been executed.

- uniprocessor
  A system that contains one processor.

- multiprocessor
  A system that contains two or more processors.

- shared storage multiprocessor
  A multiprocessor that contains some common storage, which all the processors in the system can access.

- performed
  A load or instruction fetch by a processor or mechanism (P1) is performed with respect to any processor or mechanism (P2) when the value to be returned by the load or instruction fetch can no longer be changed by a store by P2. A store by P1 is performed with respect to P2 when a load by P2 from the location accessed by the store will return the value stored (or a value stored subsequently). An instruction cache block invalidation by P1 is performed with respect to P2 when an instruction fetch by P2 will not be satisfied from the copy of the block that existed in its instruction cache when the instruction causing the invalidation was executed, and similarly for a data cache block invalidation.

  The preceding definitions apply regardless of whether P1 and P2 are the same entity.

- page (virtual page)
  2^n contiguous bytes of storage aligned such that the effective address of the first byte in the page is an integral multiple of the page size for which protection and control attributes are independently specifiable and for which reference and change status are independently recorded.

- block
  The aligned unit of storage operated on by the Cache Management instructions. The size of an instruction cache block may differ from the size of a data cache block, and both sizes may vary between implementations. The maximum block size is equal to the minimum page size.

- aligned storage access
  A load or store is aligned if the address of the target storage location is a multiple of the size of the transfer effected by the instruction.

1.2 Introduction

The Power ISA User Instruction Set Architecture, discussed in Book I, defines storage as a linear array of bytes indexed from 0 to a maximum of 2^64-1. Each byte is identified by its index, called its address, and each byte contains a value. This information is sufficient to allow the programming of applications that require no special features of any particular system environment. The Power ISA Virtual Environment Architecture, described herein, expands this simple storage model to include caches, virtual storage, and shared storage multiprocessors. The Power ISA Virtual Environment Architecture, in conjunction with services based on the Power ISA Operating Environment Architecture (see Book III) and provided by the operating system, permits explicit control of this expanded storage model. A simple model for sequential execution allows at most one storage access to be performed at a time and requires that all storage accesses appear to be performed in program order. In contrast to this simple model, the Power ISA specifies a relaxed model of storage consistency. In a multiprocessor system that allows multiple copies of a storage location, aggressive implementations of the architecture can permit intervals of time during which different copies of a storage location have different values. This chapter describes features of the Power ISA that enable programmers to write correct programs for this storage model.

1.3 Virtual Storage

The Power ISA system implements a virtual storage model for applications. This means that a combination of hardware and software can present a storage model that allows applications to exist within a "virtual" address space larger than either the effective address space or the real address space.

Each program can access 2^64 bytes of "effective address" (EA) space, subject to limitations imposed by the operating system. In a typical Power ISA system, each program's EA space is a subset of a larger "virtual address" (VA) space managed by the operating system.

Each effective address is translated to a real address (i.e., to an address of a byte in real storage or on an I/O device) before being used to access storage. The hardware accomplishes this, using the address translation mechanism described in Book III. The operating system manages the real (physical) storage resources of the system, by setting up the tables and other information used by the hardware address translation mechanism.

In general, real storage may not be large enough to map all the virtual pages used by the currently active applications. With support provided by hardware, the operating system can attempt to use the available real pages to map a sufficient set of virtual pages of the applications. If a sufficient set is maintained, "paging" activity is minimized. If not, performance degradation is likely.

The operating system can support restricted access to virtual pages (including read/write, read only, and no access; see Book III), based on system standards (e.g., program code might be read only) and application requests.

1.4 Single-copy Atomicity

An access is single-copy atomic, or simply atomic, if it is always performed in its entirety with no visible fragmentation. Atomic accesses are thus serialized: each happens in its entirety in some order, even when that order is not specified in the program or enforced between processors.

Vector storage accesses are not guaranteed to be atomic. The following other types of single-register accesses are always atomic:

- byte accesses (all bytes are aligned on byte boundaries)
- halfword accesses aligned on halfword boundaries
- word accesses aligned on word boundaries
- doubleword accesses aligned on doubleword boundaries (64-bit implementations only; see Section 1.2 of Book III-E)

No other accesses are guaranteed to be atomic. For example, the access caused by the following instructions is not guaranteed to be atomic.

- any Load or Store instruction for which the operand is unaligned
- lmw, stmw, lswi, lswx, stswi, stswx
- lfdp, lfdpx, stfdp, stfdpx
- any Cache Management instruction

An access that is not atomic is performed as a set of smaller disjoint atomic accesses. The number and alignment of these accesses are implementation-dependent, as is the relative order in which they are performed. Accesses that are aligned on a doubleword boundary for lfdp, lfdpx, stfdp, and stfdpx are performed as a pair of disjoint atomic doubleword accesses.

The results for several combinations of loads and stores to the same or overlapping locations are described below.

1. When two processors execute atomic stores to locations that do not overlap, and no other stores are performed to those locations, the contents of those locations are the same as if the two stores were performed by a single processor.

2. When two processors execute atomic stores to the same storage location, and no other store is performed to that location, the contents of that location are the result stored by one of the processors.

3. When two processors execute stores that have the same target location and are not guaranteed to be atomic, and no other store is performed to that location, the result is some combination of the bytes stored by both processors.

4. When two processors execute stores to overlapping locations, and no other store is performed to those locations, the result is some combination of the bytes stored by the processors to the overlapping bytes. The portions of the locations that do not overlap contain the bytes stored by the processor storing to the location.

5. When a processor executes an atomic store to a location, a second processor executes an atomic load from that location, and no other store is performed to that location, the value returned by the load is the contents of the location before the store or the contents of the location after the store.

6. When a load and a store with the same target location can be executed simultaneously, and no other store is performed to that location, the value returned by the load is some combination of the contents of the location before the store and the contents of the location after the store.

1.5 Cache Model

A cache model in which there is one cache for instructions and another cache for data is called a "Harvard-style" cache. This is the model assumed by the Power ISA, e.g., in the descriptions of the Cache Management instructions in Section 3.3. Alternative cache models may be implemented (e.g., a "combined cache" model, in which a single cache is used for both instructions and data, or a model in which there are several levels of caches), but they support the programming model implied by a Harvard-style cache.

The processor is not required to maintain copies of storage locations in the instruction cache consistent with modifications to those storage locations (e.g., modifications caused by Store instructions).

A location in the data cache is considered to be modified in that cache if the location has been modified (e.g., by a Store instruction) and the modified data have not been written to main storage.

Cache Management instructions are provided so that programs can manage the caches when needed. For example, program management of the caches is needed when a program generates or modifies code that will be executed (i.e., when the program modifies data in storage and then attempts to execute the modified data as instructions). The Cache Management instructions are also useful in optimizing the use of memory bandwidth in such applications as graphics and numerically intensive computing. The functions performed by these instructions depend on the storage control attributes associated with the specified storage location (see Section 1.6, "Storage Control Attributes").

The Cache Management instructions allow the program to do the following.

- invalidate the copy of storage in an instruction cache block (icbi)
- provide a hint that an instruction will probably soon be accessed from a specified instruction cache block (icbt)
- provide a hint that the program will probably soon access a specified data cache block (dcbt, dcbtst)
- allocate a data cache block and set the contents of that block to zeros, but perform no operation if no write access is allowed to the data cache block (dcba)
- set the contents of a data cache block to zeros (dcbz)
- copy the contents of a modified data cache block to main storage (dcbst)
- copy the contents of a modified data cache block to main storage and make the copy of the block in the data cache invalid (dcbf or dcbfl)

1.6 Storage Control Attributes

Some operating systems may provide a means to allow programs to specify the storage control attributes described in this section. Because the support provided for these attributes by the operating system may vary between systems, the details of the specific system being used must be known before these attributes can be used.

Storage control attributes are associated with units of storage that are multiples of the page size. Each storage access is performed according to the storage control attributes of the specified storage location, as described below. The storage control attributes are the following.

- Write Through Required
- Caching Inhibited
- Memory Coherence Required
- Guarded
- Endianness

These attributes have meaning only when an effective address is translated by the processor performing the storage access.

Additional storage control attributes may be defined for some implementations. See Section 4.8 of Book III-E for additional information.

Programming Note

The Write Through Required and Caching Inhibited attributes are mutually exclusive because, as described below, the Write Through Required attribute permits the storage location to be in the data cache while the Caching Inhibited attribute does not.

Storage that is Write Through Required or Caching Inhibited is not intended to be used for general-purpose programming. For example, the lwarx, ldarx, stwcx., and stdcx. instructions may cause the system data storage error handler to be invoked if they specify a location in storage having either of these attributes.

In the remainder of this section, "Load instruction" includes the Cache Management and other instructions that are stated in the instruction descriptions to be "treated as a Load", and similarly for "Store instruction".

1.6.1 Write Through Required

A store to a Write Through Required storage location is performed in main storage. A Store instruction that specifies a location in Write Through Required storage may cause additional locations in main storage to be accessed. If a copy of the block containing the specified location is retained in the data cache, the store is also performed in the data cache. The store does not cause the block to be considered to be modified in the data cache.

In general, accesses caused by separate Store instructions that specify locations in Write Through Required storage may be combined into one access. Such combining does not occur if the Store instructions are separated by a sync, eieio, or mbar instruction.

1.6.2 Caching Inhibited

An access to a Caching Inhibited storage location is performed in main storage. A Load instruction that specifies a location in Caching Inhibited storage may cause additional locations in main storage to be accessed unless the specified location is also Guarded. An instruction fetch from Caching Inhibited storage may cause additional words in main storage to be accessed. No copy of the accessed locations is placed into the caches.

In general, non-overlapping accesses caused by separate Load instructions that specify locations in Caching Inhibited storage may be combined into one access, as may non-overlapping accesses caused by separate Store instructions that specify locations in Caching Inhibited storage. Such combining does not occur if the Load or Store instructions are separated by a sync or mbar instruction. Combining may also occur among such accesses from multiple processors that share a common memory interface. No combining occurs if the storage is also Guarded.

Programming Note

None of the memory barrier instructions prevent the combining of accesses from different processors. The Guarded storage attribute must be used in combination with Caching Inhibited to prevent such combining.

1.6.3 Memory Coherence Required [Category: Memory Coherence]

An access to a Memory Coherence Required storage location is performed coherently, as follows.

Memory coherence refers to the ordering of stores to a single location. Atomic stores to a given location are coherent if they are serialized in some order, and no processor or mechanism is able to observe any subset of those stores as occurring in a conflicting order. This serialization order is an abstract sequence of values; the physical storage location need not assume each of the values written to it. For example, a processor may update a location several times before the value is written to physical storage. The result of a store operation is not available to every processor or mechanism at the same instant, and it may be that a processor or mechanism observes only some of the values that are written to a location. However, when a location is accessed atomically and coherently by all processors and mechanisms, the sequence of values loaded from the location by any processor or mechanism during any interval of time forms a subsequence of the sequence of values that the location logically held during that interval. That is, a processor or mechanism can never load a "newer" value first and then, later, load an "older" value.

Memory coherence is managed in blocks called coherence blocks. Their size is implementation-dependent, but is larger than a word and is usually the size of a cache block.

For storage that is not Memory Coherence Required, software must explicitly manage memory coherence to the extent required by program correctness. The operations required to do this may be system-dependent.

Because the Memory Coherence Required attribute for a given storage location is of little use unless all processors that access the location do so coherently, in statements about Memory Coherence Required storage elsewhere in this document it is generally assumed that the storage has the Memory Coherence Required attribute for all processors that access it.

Programming Note

Operating systems that allow programs to request that storage not be Memory Coherence Required should provide services to assist in managing memory coherence for such storage, including all system-dependent aspects thereof.

In most systems the default is that all storage is Memory Coherence Required. For some applications in some systems, software management of coherence may yield better performance. In such cases, a program can request that a given unit of storage not be Memory Coherence Required, and can manage the coherence of that storage by using the sync instruction, the Cache Management instructions, and services provided by the operating system.

1.6.4 Guarded

A data access to a Guarded storage location is performed only if either (a) the access is caused by an instruction that is known to be required by the sequential execution model, or (b) the access is a load and the storage location is already in a cache. If the storage is also Caching Inhibited, only the storage location specified by the instruction is accessed; otherwise any storage location in the cache block containing the specified storage location may be accessed.

For the Server environment, instructions are not fetched from virtual storage that is Guarded. If the instruction addressed by the current instruction address is in such storage, the system instruction storage error handler may be invoked (see Section 6.5.5 of Book III-S).

Programming Note

In some implementations, instructions may be executed before they are known to be required by the sequential execution model. Because the results of instructions executed in this manner are discarded if it is later determined that those instructions would not have been executed in the sequential execution model, this behavior does not affect most programs.

This behavior does affect programs that access storage locations that are not "well-behaved" (e.g., a storage location that represents a control register on an I/O device that, when accessed, causes the device to perform an operation).
To avoid unintended results, programs that access such storage locations should request that the storage be Guarded, and should prevent such storage locations from being in a cache (e.g., by requesting that the storage also be Caching Inhibited).

1.6.5 Endianness [Category: Embedded.Little-Endian]

The Endianness storage control attribute specifies the byte ordering (Big-Endian or Little-Endian) that is used when the storage location is accessed; see Section 1.10 of Book I.

1.6.6 Variable Length Encoded (VLE) Instructions

VLE storage is used to store VLE instructions. Instructions fetched from VLE storage are processed as VLE instructions. VLE storage must also be Big-Endian. Instructions fetched from VLE storage that is Little-Endian cause a Byte-ordering exception, and the system instruction storage error handler will be invoked. The VLE attribute has no effect on data accesses. See Chapter 1 of Book VLE.

1.7 Shared Storage

This architecture supports the sharing of storage between programs, between different instances of the same program, and between processors and other mechanisms. It also supports access to a storage location by one or more programs using different effective addresses. All these cases are considered storage sharing. Storage is shared in blocks that are an integral number of pages.

When the same storage location has different effective addresses, the addresses are said to be aliases. Each application can be granted separate access privileges to aliased pages.

1.7.1 Storage Access Ordering

The storage model for the ordering of storage accesses is weakly consistent. This model provides an opportunity for improved performance over a model that has stronger consistency rules, but places the responsibility on the program to ensure that ordering or synchronization instructions are properly placed when storage is shared by two or more programs.

The order in which the processor performs storage accesses, the order in which those accesses are performed with respect to another processor or mechanism, and the order in which those accesses are performed in main storage may all be different. Several means of enforcing an ordering of storage accesses are provided to allow programs to share storage with other programs, or with mechanisms such as I/O devices. These means are listed below. The phrase "to the extent required by the associated Memory Coherence Required attributes" refers to the Memory Coherence Required attribute, if any, associated with each access.

- If two Store instructions or two Load instructions specify storage locations that are both Caching Inhibited and Guarded, the corresponding storage accesses are performed in program order with respect to any processor or mechanism.

- If a Load instruction depends on the value returned by a preceding Load instruction (because the value is used to compute the effective address specified by the second Load), the corresponding storage accesses are performed in program order with respect to any processor or mechanism to the extent required by the associated Memory Coherence Required attributes. This applies even if the dependency has no effect on program logic (e.g., the value returned by the first Load is ANDed with zero and then added to the effective address specified by the second Load).

- When a processor (P1) executes a Synchronize, eieio, or mbar instruction a memory barrier is created, which orders applicable storage accesses pairwise, as follows. Let A be a set of storage accesses that includes all storage accesses associated with instructions preceding the barrier-creating instruction, and let B be a set of storage accesses that includes all storage accesses associated with instructions following the barrier-creating instruction. For each applicable pair a_i, b_j of storage accesses such that a_i is in A and b_j is in B, the memory barrier ensures that a_i will be performed with respect to any processor or mechanism, to the extent required by the associated Memory Coherence Required attributes, before b_j is performed with respect to that processor or mechanism.

  The ordering done by a memory barrier is said to be "cumulative" if it also orders storage accesses that are performed by processors and mechanisms other than P1, as follows.

  - A includes all applicable storage accesses by any such processor or mechanism that have been performed with respect to P1 before the memory barrier is created.

  - B includes all applicable storage accesses by any such processor or mechanism that are performed after a Load instruction executed by that processor or mechanism has returned the value stored by a store that is in B.

No ordering should be assumed among the storage accesses caused by a single instruction (i.e., by an instruction for which the access is not atomic), and no means are provided for controlling that order.

Programming Note
Because stores cannot be performed "out-of-order" (see Book III), if a Store instruction depends on the value returned by a preceding Load instruction (because the value returned by the Load is used to compute either the effective address specified by the Store or the value to be stored), the corresponding storage accesses are performed in program order. The same applies if whether the Store instruction is executed depends on a conditional Branch instruction that in turn depends on the value returned by a preceding Load instruction.

Because an isync instruction prevents the execution of instructions following the isync until instructions preceding the isync have completed, if an isync follows a conditional Branch instruction that depends on the value returned by a preceding Load instruction, the load on which the Branch depends is performed before any loads caused by instructions following the isync. This applies even if the effects of the "dependency" are independent of the value loaded (e.g., the value is compared to itself and the Branch tests the EQ bit in the selected CR field), and even if the branch target is the sequentially next instruction.

With the exception of the cases described above and earlier in this section, data dependencies and control dependencies do not order storage accesses. Examples include the following.

- If a Load instruction specifies the same storage location as a preceding Store instruction and the location is in storage that is not Caching Inhibited, the load may be satisfied from a "store queue" (a buffer into which the processor places stored values before presenting them to the storage subsystem), and not be visible to other processors and mechanisms. A consequence is that if a subsequent Store depends on the value returned by the Load, the two stores need not be performed in program order with respect to other processors and mechanisms.

- Because a Store Conditional instruction may complete before its store has been performed, a conditional Branch instruction that depends on the CR0 value set by a Store Conditional instruction does not order the Store Conditional's store with respect to storage accesses caused by instructions that follow the Branch.

- Because processors may predict branch target addresses and branch condition resolution, control dependencies (e.g., branches) do not order storage accesses except as described above. For example, when a subroutine returns to its caller the return address may be predicted, with the result that loads caused by instructions at or after the return address may be performed before the load that obtains the return address is performed.

Because processors may implement nonarchitected duplicates of architected resources (e.g., GPRs, CR fields, and the Link Register), resource dependencies (e.g., specification of the same target register for two Load instructions) do not order storage accesses.

Examples of correct uses of dependencies, sync, lwsync, eieio, and mbar to order storage accesses can be found in Appendix B. "Programming Examples for Sharing Storage" on page 459.

Because the storage model is weakly consistent, the sequential execution model as applied to instructions that cause storage accesses guarantees only that those accesses appear to be performed in program order with respect to the processor executing the instructions. For example, an instruction may complete, and subsequent instructions may be executed, before storage accesses caused by the first instruction have been performed. However, for a sequence of atomic accesses to the same storage location, if the location is in storage that is Memory Coherence Required the definition of coherence guarantees that the accesses are performed in program order with respect to any processor or mechanism that accesses the location coherently, and similarly if the location is in storage that is Caching Inhibited.

Because accesses to storage that is Caching Inhibited are performed in main storage, memory barriers and dependencies on Load instructions order such accesses with respect to any processor or mechanism even if the storage is not Memory Coherence Required.

Programming Note
The first example below illustrates cumulative ordering of storage accesses preceding a memory barrier, and the second illustrates cumulative ordering of storage accesses following a memory barrier. Assume that locations X, Y, and Z initially contain the value 0.

Example 1:

    Processor A: stores the value 1 to location X
    Processor B: loads from location X obtaining the value 1, executes a sync instruction, then stores the value 2 to location Y
    Processor C: loads from location Y obtaining the value 2, executes a sync instruction, then loads from location X

Example 2:

    Processor A: stores the value 1 to location X, executes a sync instruction, then stores the value 2 to location Y
    Processor B: loops loading from location Y until the value 2 is obtained, then stores the value 3 to location Z
    Processor C: loads from location Z obtaining the value 3, executes a sync instruction, then loads from location X

In both cases, cumulative ordering dictates that the value loaded from location X by processor C is 1.

1.7.2 Storage Ordering of I/O Accesses

A "coherence domain" consists of all processors and all interfaces to main storage. Memory reads and writes initiated by mechanisms outside the coherence domain are performed within the coherence domain in the order in which they enter the coherence domain and are performed as coherent accesses.

1.7.3 Atomic Update

The Load And Reserve and Store Conditional instructions together permit atomic update of a shared storage location. There are word and doubleword forms of each of these instructions. Described here is the operation of the word forms lwarx and stwcx.; operation of the doubleword forms ldarx and stdcx. is the same except for obvious substitutions.

The lwarx instruction is a load from a word-aligned location that has two side effects. Both of these side effects occur at the same time that the load is performed.

1. A reservation for a subsequent stwcx. instruction is created.
2. The memory coherence mechanism is notified that a reservation exists for the storage location specified by the lwarx.

The stwcx. instruction is a store to a word-aligned location that is conditioned on the existence of the reservation created by the lwarx and on whether the same storage location is specified by both instructions. To emulate an atomic operation with these instructions, it is necessary that both the lwarx and the stwcx. specify the same storage location.

A stwcx. performs a store to the target storage location only if the storage location specified by the lwarx that established the reservation has not been stored into by another processor or mechanism since the reservation was created. If the storage locations specified by the two instructions differ, the store is not necessarily performed.

A stwcx. that performs its store is said to "succeed".

Examples of the use of lwarx and stwcx. are given in Appendix B. "Programming Examples for Sharing Storage" on page 459.

A successful stwcx. to a given location may complete before its store has been performed with respect to other processors and mechanisms. As a result, a subsequent load or lwarx from the given location by another processor may return a "stale" value. However, a subsequent lwarx from the given location by the other processor followed by a successful stwcx. by that processor is guaranteed to have returned the value stored by the first processor's stwcx. (in the absence of other stores to the given location).

Programming Note
The store caused by a successful stwcx. is ordered, by a dependence on the reservation, with respect to the load caused by the lwarx that established the reservation, such that the two storage accesses are performed in program order with respect to any processor or mechanism.

1.7.3.1 Reservations

The ability to emulate an atomic operation using lwarx and stwcx. is based on the conditional behavior of stwcx., the reservation created by lwarx, and the clearing of that reservation if the target location is modified by another processor or mechanism before the stwcx. performs its store.

A reservation is held on an aligned unit of real storage called a reservation granule. The size of the reservation granule is 2^n bytes, where n is implementation-dependent but is always at least 4 (thus the minimum reservation granule size is a quadword). The reservation granule associated with effective address EA contains the real address to which EA maps. ("real_addr(EA)" in the RTL for the Load And Reserve and Store Conditional instructions stands for "real address to which EA maps".)

A processor has at most one reservation at any time. A reservation is established by executing a lwarx or ldarx instruction, and is lost (or may be lost, in the case of the third, fifth, sixth and seventh item) if any of the following occur.

1. The processor holding the reservation executes another lwarx or ldarx: this clears the first reservation and establishes a new one.
2. The processor holding the reservation executes any stwcx. or stdcx., regardless of whether the specified address matches the address specified by the lwarx or ldarx that established the reservation.
3. The processor holding the reservation executes a dcbf or dcbfl to the reservation granule: whether the reservation is lost is undefined.
4. Some other processor executes a Store or dcbz to the same reservation granule.
5. Some other processor executes a dcbtst, dcbst, dcbf (but not dcbfl) to the same reservation granule: whether the reservation is lost is undefined.
6. Some other processor executes a dcba to the same reservation granule: the reservation is lost if the instruction causes the target block to be newly established in a data cache or to be modified; otherwise whether the reservation is lost is undefined.
7. Any processor modifies a Reference or Change bit (see Book III-S) in the same reservation granule: whether the reservation is lost is undefined.
8. Some mechanism other than a processor modifies a storage location in the same reservation granule.

For the Server environment, interrupts (see Book III-S) do not clear reservations (however, system software invoked by interrupts may clear reservations); for the Embedded environment, interrupts do not necessarily clear reservations (see Book III-E).

Programming Note
One use of lwarx and stwcx. is to emulate a "Compare and Swap" primitive like that provided by the IBM System/370 Compare and Swap instruction; see Section B.1, "Atomic Update Primitives" on page 459. A System/370-style Compare and Swap checks only that the old and current values of the word being tested are equal, with the result that programs that use such a Compare and Swap to control a shared resource can err if the word has been modified and the old value subsequently restored. The combination of lwarx and stwcx. improves on such a Compare and Swap, because the reservation reliably binds the lwarx and stwcx. together. The reservation is always lost if the word is modified by another processor or mechanism between the lwarx and stwcx., so the stwcx. never succeeds unless the word has not been stored into (by another processor or mechanism) since the lwarx.

Programming Note
In general, programming conventions must ensure that lwarx and stwcx. specify addresses that match; a stwcx. should be paired with a specific lwarx to the same storage location. Situations in which a stwcx. may erroneously be issued after some lwarx other than that with which it is intended to be paired must be scrupulously avoided. For example, there must not be a context switch in which the processor holds a reservation in behalf of the old context, and the new context resumes after a lwarx and before the paired stwcx.. The stwcx. in the new context might succeed, which is not what was intended by the programmer. Such a situation must be prevented by executing a stwcx. or stdcx. that specifies a dummy writable aligned location as part of the context switch; see Section 6.4.3 of Book III-S and Section 5.5 of Book III-E.
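The granule arithmetic implied by the reservation rules can be sketched in C. This is an illustration under stated assumptions, not part of the architecture: n = 7 (a 128-byte granule) is chosen arbitrarily, since the architecture only guarantees n >= 4; effective addresses stand in for real addresses, which is valid only when the granule is no larger than a page; and the names GRANULE_LOG2, granule_base, same_granule, and alloc_lock_word are hypothetical. The last helper shows one way to give a lock word a granule of its own, so that a Store by another processor to neighbouring data (item 4 in the list of reservation-loss causes) cannot cause the reservation to be lost.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed for illustration only: n = 7, i.e. a 128-byte reservation granule.
 * The architecture requires only n >= 4 (a 16-byte minimum); real code must
 * obtain the actual value from the platform. */
#define GRANULE_LOG2 7u
#define GRANULE_SIZE ((uintptr_t)1 << GRANULE_LOG2)

/* Hypothetical helper: start address of the granule that contains addr.
 * Granules are aligned, so clearing the low n bits yields the base. */
static uintptr_t granule_base(uintptr_t addr)
{
    return addr & ~(GRANULE_SIZE - 1);
}

/* Hypothetical helper: nonzero if `other` lies in the same granule as
 * `reserved`, i.e. a store to `other` by another processor would cause a
 * reservation held on `reserved` to be lost. */
static int same_granule(uintptr_t reserved, uintptr_t other)
{
    return granule_base(reserved) == granule_base(other);
}

/* Hypothetical helper: allocate a lock word that has a whole granule to
 * itself. The remainder of the granule is deliberately left unused.
 * aligned_alloc (C11) requires the size to be a multiple of the alignment,
 * which holds here because both are GRANULE_SIZE. */
static uint32_t *alloc_lock_word(void)
{
    uint32_t *lock = aligned_alloc(GRANULE_SIZE, GRANULE_SIZE);
    if (lock != NULL)
        *lock = 0; /* lock initially free */
    return lock;
}
```

With a = (uintptr_t)alloc_lock_word(), granule_base(a) equals a, same_granule(a, a + 4) is nonzero, and same_granule(a, a + GRANULE_SIZE) is zero. Hard-coding the granule size keeps the sketch short; code meant to run on more than one implementation would determine it at run time.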
Programming Note
Because the reservation is lost if another processor stores anywhere in the reservation granule, lock words (or doublewords) should be allocated such that few such stores occur, other than perhaps to the lock word itself. (Stores by other processors to the lock word result from contention for the lock, and are an expected consequence of using locks to control access to shared storage; stores to other locations in the reservation granule can cause needless reservation loss.) Such allocation can most easily be accomplished by allocating an entire reservation granule for the lock and wasting all but one word. Because reservation granule size is implementation-dependent, portable code must do such allocation dynamically.

Similar considerations apply to other data that are shared directly using lwarx and stwcx. (e.g., pointers in certain linked lists; see Section B.3, "List Insertion" on page 463).

1.7.3.2 Forward Progress

Forward progress in loops that use lwarx and stwcx. is achieved by a cooperative effort among hardware, system software, and application software.

The architecture guarantees that when a processor executes a lwarx to obtain a reservation for location X and then a stwcx. to store a value to location X, either

1. the stwcx. succeeds and the value is written to location X, or
2. the stwcx. fails because some other processor or mechanism modified location X, or
3. the stwcx. fails because the processor's reservation was lost for some other reason.

In Cases 1 and 2, the system as a whole makes progress in the sense that some processor successfully modifies location X. Case 3 covers reservation loss required for correct operation of the rest of the system. This includes cancellation caused by some other processor writing elsewhere in the reservation granule for X, as well as cancellation caused by the operating system in managing certain limited resources such as real storage. It may also include implementation-dependent causes of reservation loss.

An implementation may make a forward progress guarantee, defining the conditions under which the system as a whole makes progress. Such a guarantee must specify the possible causes of reservation loss in Case 3. While the architecture alone cannot provide such a guarantee, the characteristics listed in Cases 1 and 2 are necessary conditions for any forward progress guarantee. An implementation and operating system can build on them to provide such a guarantee.

Programming Note
The architecture does not include a "fairness guarantee". In competing for a reservation, two processors can indefinitely lock out a third.

1.8 Instruction Storage

The instruction execution properties and requirements described in this section, including its subsections, apply only to instruction execution that is required by the sequential execution model.

In this section, including its subsections, it is assumed that all instructions for which execution is attempted are in storage that is not Caching Inhibited and (unless instruction address translation is disabled; see Book III) is not Guarded, and from which instruction fetching does not cause the system error handler to be invoked (e.g., from which instruction fetching is not prohibited by the "address translation mechanism" or the "storage protection mechanism"; see Book III).

Programming Note
The results of attempting to execute instructions from storage that does not satisfy this assumption are described in Section 1.6.2 and Section 1.6.4 of this Book and in Book III.

For each instance of executing an instruction from location X, the instruction may be fetched multiple times.

The instruction cache is not necessarily kept consistent with the data cache or with main storage. It is the responsibility of software to ensure that instruction storage is consistent with data storage when such consistency is required for program correctness.

After one or more bytes of a storage location have been modified and before an instruction located in that storage location is executed, software must execute the appropriate sequence of instructions to make instruction storage consistent with data storage. Otherwise the result of attempting to execute the instruction is boundedly undefined except as described in Section 1.8.1, "Concurrent Modification and Execution of Instructions" on page 419.

Programming Note
Following are examples of how to make instruction storage consistent with data storage. Because the optimal instruction sequence to make instruction storage consistent with data storage may vary between systems, many operating systems will provide a system service to perform this function.

Case 1: The given program does not modify instructions executed by another program nor does another program modify the instructions executed by the given program.

Assume that location X previously contained the instruction A0; the program modified one or more bytes of that location such that, in data storage, the location contains the instruction A1; and location X is wholly contained in a single cache block. The following instruction sequence will make instruction storage consistent with data storage such that if the isync was in location X-4, the instruction A1 in location X would be executed immediately after the isync.

    dcbst X        #copy the block to main storage
    sync           #order copy before invalidation
    icbi  X        #invalidate copy in instr cache
    isync          #discard prefetched instructions

Case 2: One or more programs execute the instructions that are concurrently being modified by another program.

Assume program A has modified the instruction at location X and other programs are waiting for program A to signal that the new instruction is ready to execute. The following instruction sequence will make instruction storage consistent with data storage and then set a flag to indicate to the waiting programs that the new instruction can be executed.

    li    r0,1     #put a 1 value in r0
    dcbst X        #copy the block to main storage
    sync           #order copy before invalidation
    icbi  X        #invalidate copy in instr cache
    sync           #order invalidation before store
                   # to flag
    stw   r0,flag  #set flag indicating instruction
                   # storage is now consistent

The following instruction sequence, executed by the waiting program, will prevent the waiting programs from executing the instruction at location X until location X in instruction storage is consistent with data storage, and then will cause any prefetched instructions to be discarded.

    lwz   r0,flag  #loop until flag = 1 (when 1 is
    cmpwi r0,1     # loaded, location X in inst'n
    bne   $-8      # storage is consistent with
                   # location X in data storage)
    isync          #discard any prefetched inst'ns

In the preceding instruction sequence any context synchronizing instruction (e.g., rfid) can be used instead of isync. (For Case 1 only isync can be used.)

For both cases, if two or more instructions in separate data cache blocks have been modified, the dcbst instruction in the examples must be replaced by a sequence of dcbst instructions such that each block containing the modified instructions is copied back to main storage. Similarly, for icbi the sequence must invalidate each instruction cache block containing a location of an instruction that was modified. The sync instruction that appears above between "dcbst X" and "icbi X" would be placed between the sequence of dcbst instructions and the sequence of icbi instructions.

1.8.1 Concurrent Modification and Execution of Instructions

The phrase "concurrent modification and execution of instructions" (CMODX) refers to the case in which a processor fetches and executes an instruction from instruction storage which is not consistent with data storage or which becomes inconsistent with data storage prior to the completion of its processing. This section describes the only case in which executing this instruction under these conditions produces defined results.

In the remainder of this section the following terminology is used.

- Location X is an arbitrary word-aligned storage location.
- X_0 is the value of the contents of location X for which software has made the location X in instruction storage consistent with data storage.
- X_1, X_2, ..., X_n are the sequence of the first n values occupying location X after X_0.
- X_n is the first value of X subsequent to X_0 for which software has again made instruction storage consistent with data storage.
- The "patch class" of instructions consists of the I-form Branch instruction (b[l][a]) and the preferred no-op instruction (ori 0,0,0).

If the instruction from location X is executed after the copy of location X in instruction storage is made consistent for the value X_0 and before it is made consistent for the value X_n, the results of executing the instruction are defined if and only if the following conditions are satisfied.

1. The stores that place the values X_1, ..., X_n into location X are atomic stores that modify all four bytes of location X.
2. Each X_i, 0 ≤ i ≤ n, is a patch class instruction.
3. Location X is in storage that is Memory Coherence Required.

If these conditions are satisfied, the result of each execution of an instruction from location X will be the execution of some X_i, 0 ≤ i ≤ n. The value of the ordinate i associated with each value executed may be different and the sequence of ordinates i associated with a sequence of values executed is not constrained (e.g., a valid sequence of executions of the instruction at location X could be the sequence X_i, X_(i+2), then X_(i-1)). If these conditions are not satisfied, the results of each such execution of an instruction from location X are boundedly undefined, and may include causing inconsistent information to be presented to the system error handler.

Programming Note
An example of how failure to satisfy the requirements given above can cause inconsistent information to be presented to the system error handler is as follows. If the value X_0 (an illegal instruction) is executed, causing the system illegal instruction handler to be invoked, and before the error handler can load X_0 into a register, X_0 is replaced with X_1, an Add Immediate instruction, it will appear that a legal instruction caused an illegal instruction exception.

Programming Note
It is possible to apply a patch or to instrument a given program without the need to suspend or halt the program. This can be accomplished by modifying the example shown in the Programming Note at the end of Section 1.8 where one program is creating instructions to be executed by one or more other programs.

In place of the Store to a flag to indicate to the other programs that the code is ready to be executed, the program that is applying the patch would replace a patch class instruction in the original program with a Branch instruction that would cause any program executing the Branch to branch to the newly created code. The first instruction in the newly created code must be an isync, which will cause any prefetched instructions to be discarded, ensuring that the execution is consistent with the newly created code. The instruction storage location containing the isync instruction in the patch area must be consistent with data storage with respect to the processor that will execute the patched code before the Store which stores the new Branch instruction is performed.

Programming Note
It is believed that all processors that comply with versions of the architecture that precede Version 2.01 support concurrent modification and execution of instructions as described in this section if the requirements given above are satisfied, and that most such processors yield boundedly undefined results if the requirements given above are not satisfied. However, in general such support has not been verified by processor testing. Also, one such processor is known to yield undefined results in certain cases if the requirements given above are not satisfied.

Chapter 2. Effect of Operand Placement on Performance

2.1 Instruction Restart . . . . . . . . . . . 423

The placement (location and alignment) of operands in storage affects relative performance of storage accesses, and may affect it significantly. The best performance is guaranteed if storage operands are aligned. In order to obtain the best performance across the widest range of implementations, the programmer should assume the performance model described in Figure 1 with respect to the placement of storage operands for the Embedded environment. For the Server environment, Figure 1 applies for Big-Endian byte ordering, and Figure 2 applies for Little-Endian byte ordering. Performance of storage accesses varies depending on the following:

1. Operand Size
2. Operand Alignment
3. Crossing no boundary
4. Crossing a cache block boundary
5. Crossing a virtual page boundary

The Move Assist instructions have no alignment requirements.

Operand                          Boundary Crossing
Size             Byte Align.     None          Cache Block   Virtual Page [2]

Integer
8 Byte           8               optimal       -             -
                 4               good          good          good
                 <4              good          good          good
4 Byte           4               optimal       -             -
                 <4              good          good          good
2 Byte           2               optimal       -             -
                 <2              good          good          good
1 Byte           1               optimal       -             -
lmw, stmw        4               good          good          good
                 <4              poor          poor          poor
string                           good          good          good

Float
lfdp, lfdpx,     16              optimal       -             -
stfdp, stfdpx    <16             poor          poor          poor
8 Byte           8               optimal       -             -
                 4               good          good          poor
                 <4              poor          poor          poor
4 Byte           4               optimal       -             -
                 <4              poor          poor          poor

Vector
any              any             optimal [3]   -             -

[1] If an instruction causes an access that is not atomic and any portion of the operand is in storage that is Write Through Required or Caching Inhibited, performance is likely to be poor.
[2] If the storage operand spans two virtual pages that have different storage control attributes or, in the Server environment, spans two segments, performance is likely to be poor.
[3] The storage operands for Vector instructions are all assumed to be aligned (see Section 6.4 of Book I).

Figure 1. Performance effects of storage operand placement

Operand                          Boundary Crossing
Size             Byte Align.     None          Cache Block   Virtual Page [2]

Integer
8 Byte           8               optimal       -             -
                 4               poor          poor          poor
                 <4              poor          poor          poor
4 Byte           4               optimal       -             -
                 <4              poor          poor          poor
2 Byte           2               optimal       -             -
                 <2              poor          poor          poor
1 Byte           1               optimal       -             -

Float
lfdp, lfdpx,     16              optimal       -             -
stfdp, stfdpx    <16             poor          poor          poor
8 Byte           8               optimal       -             -
                 4               poor          poor          poor
                 <4              poor          poor          poor
4 Byte           4               optimal       -             -
                 <4              poor          poor          poor

Vector
any              any             optimal [3]   -             -

[1] If an instruction causes an access that is not atomic and any portion of the operand is in storage that is Write Through Required or Caching Inhibited, performance is likely to be poor.
[2] If the storage operand spans two virtual pages that have different storage control attributes or, in the Server environment, spans two segments, performance is likely to be poor.
[3] The storage operands for Vector instructions are all assumed to be aligned (see Section 6.4 of Book I).

Figure 2. [Category: Server] Performance effects of storage operand placement, Little-Endian

2.1 Instruction Restart

In this section, "Load instruction" includes the Cache Management and other instructions that are stated in the instruction descriptions to be "treated as a Load", and similarly for "Store instruction".

The following instructions are never restarted after having accessed any portion of the storage operand (unless the instruction causes a "Data Address Breakpoint match", for which the corresponding rules are given in Book III).

1. A Store instruction that causes an atomic access and, for the Embedded environment, accesses storage that is Guarded
2. A Load instruction that causes an atomic access to storage that is Guarded and, for the Server environment, is also Caching Inhibited.

Any other Load or Store instruction may be partially executed and then aborted after having accessed a portion of the storage operand, and then re-executed (i.e., restarted, by the processor or the operating system). If an instruction is partially executed, the contents of registers are preserved to the extent that the correct result will be produced when the instruction is re-executed. Additional restrictions on the partial execution of instructions are described in Section 6.6 of Book III-S and Section 5.7 of Book III-E.

Programming Note
In order to ensure that the contents of registers are preserved to the extent that a partially executed instruction can be re-executed correctly, the registers that are preserved must satisfy the following conditions. For any given instruction, zero or more of the conditions applies.

- For a fixed-point Load instruction that is not a multiple or string form, or for an eciwx instruction, if RT=RA or RT=RB then the contents of register RT are not altered.
- For an update form Load or Store instruction, the contents of register RA are not altered.

Programming Note
There are many events that might cause a Load or Store instruction to be restarted. For example, a hardware error may cause execution of the instruction to be aborted after part of the access has been performed, and the recovery operation could then cause the aborted instruction to be re-executed.

When an instruction is aborted after being partially executed, the contents of the instruction pointer indicate that the instruction has not been executed, however, the contents of some registers may have been altered and some bytes within the storage operand may have been accessed. The following are examples of an instruction being partially executed and altering the program state even though it appears that the instruction has not been executed.

1. Load Multiple, Load String: Some registers in the range of registers to be loaded may have been altered.
2. Any Store instruction, dcbz: Some bytes of the storage operand may have been altered.

Chapter 3. Storage Control Instructions

3.1 Parameters Useful to Application Pro- 3.4.1 Instruction Synchronize Instruction . grams . . . . . . . . . . . . . . . . . . . . . . . . . 425 440 3.2 Data Stream Control Register (DSCR) 3.4.2 Load and Reserve and Store Condi- [Category: Stream] . . . . . . . . . . . . . . . 426 tional Instructions. . . . . .
. . . . . . . . . . . 440 3.3 Cache Management Instructions . 427 3.4.2.1 64-Bit Load and Reserve and Store 3.3.1 Instruction Cache Instructions . . 428 Conditional Instructions [Category: 64-Bit] 3.3.2 Data Cache Instructions . . . . . . 429 444 3.3.2.1 Obsolete Data Cache Instructions 3.4.3 Memory Barrier Instructions . . . . 446 [Category: Vector.Phased-Out] . . . . . . 437 3.4.4 Wait Instruction . . . . . . . . . . . . . 449 3.4 Synchronization Instructions. . . . . 440 3.1 Parameters Useful to Appli- cation Programs It is suggested that the operating system provide a ser- vice that allows an application program to obtain the following information. 1. The virtual page sizes 2. Coherence block size 3. Granule sizes for reservations 4. An indication of the cache model implemented (e.g., Harvard-style cache, combined cache) 5. Instruction cache size 6. Data cache size 7. Instruction cache block size 8. Data cache block size 9. Instruction cache associativity 10. Data cache associativity 11. Number of stream IDs supported for the stream variant of dcbt 12. Factors for converting the Time Base to seconds If the caches are combined, the same value should be given for an instruction cache attribute and the corre- sponding data cache attribute. Chapter 3. Storage Control Instructions 425 Version 2.05 3.2 Data Stream Control Regis- ter (DSCR) [Category: Stream] The layout of the Data Stream Control Register (DSCR) is shown in Figure 3 below. See Section 3.3.2 for information on streams. // SSE DPFD 0 60 61 63 Figure 3. Data Stream Control Register Bits Name Description 60 SSE Store Stream Enable This bit enables hardware detec- tion and initiation of store streams. 61:63 DPFD Default Prefetch Depth This field supplies a prefetch depth for hardware-detected streams and for software-defined streams for which a depth of zero is specified or for which dcbt/ dcbtst with TH=1010 is not used in their description. Values and their meanings are as follows. 
                   0   default (LPCR[DPFD])
                   1   none
                   2   shallowest
                   3   shallow
                   4   medium
                   5   deep
                   6   deeper
                   7   deepest

The contents of the DSCR affect how a processor handles hardware-detected and software-defined data streams. A move to the DSCR causes all active and nascent data streams to cease to exist.

3.3 Cache Management Instructions

The Cache Management instructions obey the sequential execution model except as described in Section 3.3.1.

In the instruction descriptions the statements "this instruction is treated as a Load" and "this instruction is treated as a Store" mean that the instruction is treated as a Load (Store) from (to) the addressed byte with respect to address translation, the definition of program order on page 407, storage protection, reference and change recording, and the storage access ordering described in Section 1.7.1 and is treated as a Read (Write) from (to) the addressed byte with respect to debug events unless otherwise specified. (See Book III-E.)

Some Cache Management instructions contain a CT field that is used to specify a cache level within a cache hierarchy or a portion of a cache structure to which the instruction is to be applied. The correspondence between the CT value specified and the cache level is shown below.

    CT Field Value    Cache Level
    0                 Primary Cache
    2                 Secondary Cache

CT values not shown above may be used to specify implementation-dependent cache levels or implementation-dependent portions of a cache structure.

3.3.1 Instruction Cache Instructions

Instruction Cache Block Invalidate X-form

    icbi RA,RB

    | 31   | ///  | RA   | RB   | 982      | / |
    0      6      11     16     21         31

Let the effective address (EA) be the sum (RA|0)+(RB).

If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the instruction cache of any processor, the block is invalidated in those instruction caches.

If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and the block is in the instruction cache of this processor, the block is invalidated in that instruction cache.

The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited.

This instruction is treated as a Load (see Section 3.3), except that reference and change recording need not be done.

Special Registers Altered:
    None

Programming Note

    Because the instruction is treated as a Load, the effective address is translated using translation resources that are used for data accesses, even though the block being invalidated was copied into the instruction cache based on translation resources used for instruction fetches (see Book III).

Programming Note

    The invalidation of the specified block need not have been performed with respect to the processor executing the icbi instruction until a subsequent isync instruction has been executed by that processor. No other instruction or event has the corresponding effect.

Instruction Cache Block Touch X-form

    icbt CT,RA,RB    [Category: Embedded]

    | 31   | / | CT  | RA   | RB   | 22       | / |
    0      6   7     11     16     21         31

Let the effective address (EA) be the sum (RA|0)+(RB).

If CT=0, this instruction provides a hint that the program will probably soon execute code from the addressed location.

If CT≠0, the operation performed by this instruction is implementation-dependent, except that the instruction is treated as a no-op for values of CT that are not implemented.

The hint is ignored if the block is Caching Inhibited.

This instruction is treated as a Load (see Section 3.3), except that the system instruction storage error handler is not invoked.

Special Registers Altered:
    None
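The X-form layouts given for icbi and icbt can be illustrated with a small C sketch that packs the fields into a 32-bit instruction word. This is only an illustration under stated assumptions: the function names (xform31, encode_icbi, encode_icbt) are hypothetical and not part of the architecture, and the shifts follow from the bit positions shown in the format diagrams, where bit 0 is the most significant bit of the word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an architected interface): pack an X-form word
 * with primary opcode 31. ISA bit numbering is big-endian (bit 0 is the most
 * significant bit), so a field whose last bit is ISA bit k has its least
 * significant bit at position (31 - k) of the C value. */
static uint32_t xform31(uint32_t bits6_10, uint32_t ra, uint32_t rb, uint32_t xo)
{
    assert(bits6_10 < 32 && ra < 32 && rb < 32 && xo < 1024);
    return (31u << 26)        /* primary opcode, bits 0:5         */
         | (bits6_10 << 21)   /* bits 6:10 (reserved, or / || CT) */
         | (ra << 16)         /* RA, bits 11:15                   */
         | (rb << 11)         /* RB, bits 16:20                   */
         | (xo << 1);         /* extended opcode, bits 21:30      */
}

/* icbi RA,RB: bits 6:10 are reserved, extended opcode 982. */
uint32_t encode_icbi(uint32_t ra, uint32_t rb)
{
    return xform31(0, ra, rb, 982);
}

/* icbt CT,RA,RB: bit 6 is reserved, CT occupies bits 7:10, extended opcode 22. */
uint32_t encode_icbt(uint32_t ct, uint32_t ra, uint32_t rb)
{
    assert(ct < 16);
    return xform31(ct, ra, rb, 22);
}
```

For instance, encode_icbi(0, 3) corresponds to "icbi 0,3" (RA=0 selects the value 0 rather than the contents of GPR 0, per the (RA|0) notation).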
3.3.2 Data Cache Instructions

The Data Cache instructions control various aspects of the data cache.

TH field in the dcbt and dcbtst instructions

Described below are the TH field values for the dcbt and dcbtst instructions. For all TH field values which are not listed, the hint provided by the instruction is undefined.

TH=0b00000

If TH=0b00000, the dcbt/dcbtst instruction provides a hint that the program will probably soon access the block containing the byte addressed by EA.

TH=0b00000 - 0b00111 [Category: Cache Specification]

In addition to the hint specified above for the TH field value of 0b00000, an additional hint is provided indicating that placement of the block in the cache specified by the TH field might also improve performance. The correspondence between each value of the TH field and the cache to be specified is the same as the correspondence between each value of the CT field and the cache to be specified as defined in Section 3.3. The hints corresponding to values of the TH field not supported by the implementation are undefined.

TH=0b01000 - 0b01111 [Category: Stream]

The dcbt/dcbtst instructions provide hints regarding a sequence of accesses to data, or indicate the expected use thereof. Such a sequence is called a "data stream", and a dcbt/dcbtst instruction in which TH is set to one of these values is said to be a "data stream variant" of dcbt/dcbtst. In the remainder of this section, "data stream" may be abbreviated to "stream".

A data stream to which a program may perform Load accesses is said to be a "load data stream", and is described using the data stream variants of the dcbt instruction. A data stream to which a program may perform Store accesses is said to be a "store data stream", and is described using the data stream variants of the dcbtst instruction.

When, and how often, effective addresses for a data stream are translated is implementation-dependent.

The address and length of such data streams are specified in terms of aligned 128-byte units of storage; in the remainder of this instruction description, "aligned 128-byte unit of storage" is abbreviated to "unit".

Each such data stream is associated, by software, with a stream ID, which is a resource that the processor uses to distinguish the data stream from other such data streams. The number of stream IDs is an implementation-dependent value in the range 1:16. Stream IDs are numbered sequentially starting from 0.

The encodings of the TH field, and of the corresponding EA values, are as follows. In the EA layout diagrams, fields shown as "/"s are reserved. These fields, and reserved values of defined EA fields, are treated in the same manner as the corresponding cases for instruction fields (see the section entitled "Reserved Fields and Reserved Values" in Book I), except that a reserved value in a defined EA field does not make the instruction form invalid. If a defined EA field contains a reserved value, the hint provided by the instruction is undefined.

TH      Description

01000   The dcbt/dcbtst instruction provides a hint that describes certain attributes of a data stream, and may indicate that the program will probably soon access the stream.

        The EA is interpreted as follows.

            | EATRUNC                     | D  | UG | /  | ID    |
            0                             57   58   59   60    63

        Bit(s)  Description
        0:56    EATRUNC
                High-order 57 bits of the effective address of the first unit of the data stream (i.e., the effective address of the first unit of the stream is EATRUNC || 0b0000000).
        57      Direction (D)
                0  Subsequent units are the sequentially following units.
                1  Subsequent units are the sequentially preceding units.
        58      Unlimited/GO (UG)
                0  No information is provided by the UG field.
                1  The number of units in the data stream is unlimited, the program's need for each unit of the stream is not likely to be transient, and the program will probably soon access the stream.
        59      Reserved
        60:63   Stream ID (ID)
                Stream ID to use for this data stream.

01010   The dcbt/dcbtst instruction provides a hint that describes certain attributes of a data stream, or indicates that the program will probably soon access data streams that have been described using data stream variants of the dcbt/dcbtst instruction, or will probably no longer access such data streams.

        The EA is interpreted as follows. If GO=1 and S≠0b00 the hint provided by the instruction is undefined; the remainder of this instruction description assumes that this combination is not used.

            | ///  | GO | S  | /  | DEP | //  | UNITCNT | T  | U  | /  | ID    |
            0      32   33   35   36    39    47        57   58   59   60    63

        Bit(s)  Description
        0:31    Reserved
        32      GO
                0  No information is provided by the GO field.
                1  For dcbt, the program will probably soon access all nascent load and store data streams that have been completely described, and will probably no longer access all other nascent load and store data streams. All other fields of the EA are ignored. ("Nascent" and "completely described" are defined below.) For dcbtst, this field value holds no meaning and is treated as though it were zero.
        33:34   Stop (S)
                00  No information is provided by the S field.
                01  Reserved
                10  The program will probably no longer access the data stream (if any) associated with the specified stream ID. (All other fields of the EA except the ID field are ignored.)
                11  For dcbt, the program will probably no longer access the load and store data streams associated with all stream IDs. (All other fields of the EA are ignored.) For dcbtst, this field value holds no meaning, and is treated as though it were 0b00.
        35      Reserved
        36:38   Depth (DEP)
                The DEP field provides a relative estimate of how many units ahead of the point of stream-use the latency-reducing actions should go. This value reflects a comparison of the rate of consumption of the units of the data stream and the latency to bring an arbitrary unit of the stream into cache. The values are as follows.
                0  default = DSCR[DPFD]
                1  none
                2  shallowest
                3  shallow
                4  medium
                5  deep
                6  deeper
                7  deepest
        39:46   Reserved
        47:56   UNITCNT
                Number of units in data stream.
        57      Transient (T)
                If T=1, the program's need for each unit of the data stream is likely to be transient (i.e., the time interval during which the program accesses the unit is likely to be short).
        58      Unlimited (U)
                If U=1, the number of units in the data stream is unlimited (and the UNITCNT field is ignored).
        59      Reserved
        60:63   Stream ID (ID)
                Stream ID to use for this data stream (GO=0 and S=0b00), or stream ID associated with the data stream which the program will probably no longer access (S=0b10).

If the specified stream ID value is greater than m-1, where m is the number of stream IDs provided by the implementation, and either (a) TH=0b01000 or (b) TH=0b01010 with GO=0 and S≠0b11, no hint is provided by the instruction.

The following terminology is used to describe the state of a data stream. Except as described in the paragraph after the next paragraph, the state of a data stream at a given time is determined by the most recently provided hint for the stream.

* A data stream for which only descriptive hints have been provided (by dcbt/dcbtst instructions with TH=0b01000 and UG=0 or with TH=0b01010 and GO=0 and S=0b00) is said to be "nascent". A nascent data stream for which both kinds of descriptive hints have been provided (by both of the dcbt/dcbtst usages listed in the preceding sentence) is considered to be "completely described".

* A data stream for which a hint has been provided (by a dcbt/dcbtst instruction with TH=0b01000 and UG=1 or dcbt with TH=0b01010 and GO=1) that the program will probably soon access it is said to be "active".

* A data stream that is either nascent or active is considered to "exist".
* A data stream for which a hint has been provided (e.g., by a dcbt instruction with TH=0b01010 and S≠0b00) that the program will probably no longer access it is considered no longer to exist.

The hint provided by a dcbt/dcbtst instruction with TH=0b01000 and UG=1 implicitly includes a hint that the program will probably no longer access the data stream (if any) previously associated with the specified stream ID. The hint provided by a dcbt/dcbtst instruction with TH=0b01000 and UG=0, or with TH=0b01010 and GO=0 and S=0b00, implicitly includes a hint that the program will probably no longer access the active data stream (if any) previously associated with the specified stream ID.

If a data stream is specified without using a dcbt/dcbtst instruction with TH=0b01010 and GO=0 and S=0b00, then the number of units in the stream is unlimited, and the program's need for each unit of the stream is not likely to be transient.

Interrupts (see Book III) cause all existing data streams to cease to exist. In addition, depending on the implementation, certain conditions and events may cause an existing data stream to cease to exist; for example, in some implementations an existing data stream ceases to exist when it comes to the end of a page.

Programming Note

    To obtain the best performance across the widest range of implementations that support the data stream variants of dcbt/dcbtst, the programmer should assume the following model when using those variants.

    * The processor's response to a hint that the program will probably soon access a given data stream is to take actions that reduce the latency of accesses to the first few units of the stream. (Such actions may include prefetching cache blocks into levels of the storage hierarchy that are "near" the processor.) Thereafter, as the program accesses each successive unit of the stream, the processor takes latency-reducing actions for additional units of the stream, pacing these actions with the program's accesses (i.e., taking the actions for only a limited number of units ahead of the unit that the program is currently accessing).

      The processor's response to a hint that the program will probably no longer access a given data stream, or to the cessation of existence of a data stream, is to stop taking latency-reducing actions for the stream.

    * A data stream having finite length ceases to exist when the latency-reducing actions have been taken for all units of the stream.

    * If the program ceases to need a given data stream before having accessed all units of the stream (always the case for streams having unlimited length), performance may be improved if the program then provides a hint that it will no longer access the stream (e.g., by executing the appropriate dcbt instruction with TH=0b01010 and S≠0b00).

    * At each level of the storage hierarchy that is "near" the processor, units of a data stream that is specified as transient are most likely to be replaced. As a result, it may be desirable to stagger addresses of streams (choose addresses that map to different cache congruence classes) to reduce the likelihood that a unit of a transient stream will be replaced prior to being accessed by the program.

    * Processors that comply with versions of the architecture that do not support the TH field at all treat TH = 0b01000 and 0b01010 as if TH = 0b00000.

    * A single set of stream IDs is shared between the dcbt and dcbtst instructions.

    * On some implementations, data streams that are not specified by software may be detected by the processor. Such data streams are called "hardware-detected data streams". On some such implementations, data stream resources (resources that are used primarily to support data streams) are shared between software-specified data streams and hardware-detected data streams. On these latter implementations, the programming model includes the following.

      - Software-specified data streams take precedence over hardware-detected data streams in use of data stream resources.
      - The processor's response to a hint that the program will probably no longer access a given data stream, or to the cessation of existence of a data stream, includes releasing the associated data stream resources, so that they can be used by hardware-detected data streams.

Programming Note

    This Programming Note describes several aspects of using the data stream variants of the dcbt and dcbtst instructions.

    * A non-transient data stream having unlimited length can be completely specified, including providing the hint that the program will probably soon access it, using one dcbt instruction. The corresponding specification for a data stream having other attributes requires two dcbt/dcbtst instructions to describe the stream and one additional dcbt instruction to start the stream. However, one dcbt instruction with TH=0b01010 and GO=1 can apply to a set of the data streams described in the preceding sentence, so the corresponding specification for n such data streams requires 2×n dcbt/dcbtst instructions plus one dcbt instruction. (There is no need to execute a dcbt/dcbtst instruction with TH=0b01010 and S=0b10 for a given stream ID before using the stream ID for a new data stream; the implicit portion of the hint provided by dcbt/dcbtst instructions that describe data streams suffices.)

    * If it is desired that the hint provided by a given dcbt/dcbtst instruction be provided in program order with respect to the hint provided by another dcbt/dcbtst instruction, the two instructions must be separated by an eieio (or sync) instruction. For example, if a dcbt instruction with TH=0b01010 and GO=1 is intended to indicate that the program will probably soon access nascent data streams described (completely) by preceding dcbt/dcbtst instructions, and is intended not to indicate that the program will probably soon access nascent data streams described (completely) by following dcbt/dcbtst instructions, an eieio (or sync) instruction must separate the dcbt instruction with GO=1 from the preceding dcbt/dcbtst instructions, and another eieio (or sync) instruction must separate that dcbt instruction from the following dcbt/dcbtst instructions.

    * In practice, the second eieio (or sync) described above can sometimes be omitted. For example, if the program consists of an outer loop that contains the dcbt/dcbtst instructions and an inner loop that contains the Load or Store instructions that access the data streams, the characteristics of the inner loop and of the implementation's branch prediction mechanisms may make it highly unlikely that hints corresponding to a given iteration of the outer loop will be provided out of program order with respect to hints corresponding to the previous iteration of the outer loop. (Also, any providing of hints out of program order affects only performance, not program correctness.)

    * To mitigate the effects of interrupts on data streams, it may be desirable to specify a given "logical" data stream as a sequence of shorter, component data streams. Similar considerations apply to conditions and events that, depending on the implementation, may cause an existing data stream to cease to exist; for example, in some implementations an existing data stream ceases to exist when it comes to the end of a virtual page.

    * If it is desired to specify data streams without regard to the number of stream IDs provided by the implementation, stream IDs should be assigned to data streams in order of decreasing stream importance (stream ID 0 to the most important stream, stream ID 1 to the next most important stream, etc.). This order ensures that the hints for the most important data streams will be provided.

Data Cache Block Allocate X-form

    dcba RA,RB    [Category: Embedded]

    | 31   | ///  | RA   | RB   | 758      | / |
    0      6      11     16     21         31

Let the effective address (EA) be the sum (RA|0)+(RB).

This instruction provides a hint that the program will probably soon store into a portion of the block and the contents of the rest of the block are not meaningful to the program. The contents of the block are undefined when the instruction completes. The hint is ignored if the block is Caching Inhibited.

This instruction is treated as a Store (see Section 3.3) except that the instruction is treated as a no-op if execution of the instruction would cause the system data storage error handler to be invoked.

Special Registers Altered:
    None

Data Cache Block Touch X-form

    dcbt RA,RB,TH    [Category: Server]
    dcbt TH,RA,RB    [Category: Embedded]

    | 31   | TH   | RA   | RB   | 278      | / |
    0      6      11     16     21         31

Let the effective address (EA) be the sum (RA|0)+(RB).

The dcbt instruction provides a hint that describes a block or data stream to which the program may perform a Load access. The instruction is also used to indicate imminent access or end of access to described load and store data streams. A hint that the program will probably soon load from a given storage location is ignored if the location is Caching Inhibited or Guarded.

The only operation that is "caused by" the dcbt instruction is the providing of the hint. The actions (if any) taken by the processor in response to the hint are not considered to be "caused by" or "associated with" the dcbt instruction (e.g., dcbt is considered not to cause any data accesses). No means are provided by which software can synchronize these actions with the execution of the instruction stream. For example, these actions are not ordered by the memory barrier created by a sync instruction.

The dcbt instruction may complete before the operation it causes has been performed.

The nature of the hint depends, in part, on the value of the TH field, as specified at the beginning of this section. If TH≠0b01010 this instruction is treated as a Load (see Section 3.3), except that the system data storage error handler is not invoked, and reference and change recording need not be done.

Special Registers Altered:
    None

Programming Notes

    New programs should avoid using the dcbt and dcbtst mnemonics; one of the extended mnemonics should be used exclusively.

    If the dcbt mnemonic is used with only two operands, the TH operand is assumed to be 0b00000.

    Processors that comply with versions of the architecture that precede Version 2.01 do not necessarily ignore the hint provided by dcbt and dcbtst if the specified block is in storage that is Guarded and not Caching Inhibited.

Programming Note

    See the Programming Notes at the beginning of this section.

Extended Mnemonics:

Extended mnemonics are provided for the Data Cache Block Touch instruction so that it can be coded with the TH value as the last operand for all categories.

    Extended:             Equivalent to:
    dcbtct RA,RB,TH       dcbt for TH values of 0b00000 - 0b00111; other TH values are invalid.
    dcbtds RA,RB,TH       dcbt for TH values of 0b00000 or 0b01000 - 0b01111; other TH values are invalid.

Data Cache Block Touch for Store X-form

    dcbtst RA,RB,TH    [Category: Server]
    dcbtst TH,RA,RB    [Category: Embedded]

    | 31   | TH   | RA   | RB   | 246      | / |
    0      6      11     16     21         31

Let the effective address (EA) be the sum (RA|0)+(RB).

The dcbtst instruction provides a hint that describes a block or data stream to which the program may perform a Store access, or indicates the expected use thereof. A hint that the program will soon store to a given storage location is ignored if the location is Caching Inhibited or Guarded.

The only operation that is "caused by" the dcbtst instruction is the providing of the hint. The actions (if any) taken by the processor in response to the hint are not considered to be "caused by" or "associated with" the dcbtst instruction (e.g., dcbtst is considered not to cause any data accesses). No means are provided by which software can synchronize these actions with the execution of the instruction stream. For example, these actions are not ordered by memory barriers.

The dcbtst instruction may complete before the operation it causes has been performed.

The nature of the hint depends, in part, on the value of the TH field, as specified at the beginning of this section. If TH≠0b01010 this instruction is treated as a Load (see Section 3.3), except that the system data storage error handler is not invoked, and reference and change recording need not be done.

Special Registers Altered:
    None

Extended Mnemonics:

Extended mnemonics are provided for the Data Cache Block Touch for Store instruction so that it can be coded with the TH value as the last operand for all categories.

    Extended:             Equivalent to:
    dcbtstct RA,RB,TH     dcbtst for TH values of 0b00000 - 0b00111; other TH values are invalid.
    dcbtstds RA,RB,TH     dcbtst for TH values of 0b00000 or 0b01000 - 0b01010; other TH values are invalid.
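The EA layouts for the data stream variants (TH=0b01000 and TH=0b01010), described at the beginning of this section, can be sketched in C as follows. This is a minimal illustration, not an architected interface: the function names are hypothetical, and the code only computes the 64-bit value that (RA|0)+(RB) would need to sum to; it does not execute dcbt or dcbtst. Shift amounts are derived from the bit positions in the EA field descriptions (bit 0 is the most significant bit, so a field ending at bit k has its low-order bit at position 63-k).

```c
#include <assert.h>
#include <stdint.h>

/* TH=0b01000: EATRUNC (bits 0:56), D (57), UG (58), reserved (59), ID (60:63).
 * first_unit_ea is the effective address of the first 128-byte unit; its low
 * 7 bits are discarded, which matches EATRUNC || 0b0000000. */
uint64_t stream_ea_th01000(uint64_t first_unit_ea, unsigned d, unsigned ug,
                           unsigned id)
{
    assert(d < 2 && ug < 2 && id < 16);
    return (first_unit_ea & ~UINT64_C(0x7F))
         | ((uint64_t)d  << 6)     /* Direction, bit 57     */
         | ((uint64_t)ug << 5)     /* Unlimited/GO, bit 58  */
         | (uint64_t)id;           /* Stream ID, bits 60:63 */
}

/* TH=0b01010, descriptive form (GO=0, S=0b00):
 * DEP (36:38), UNITCNT (47:56), T (57), U (58), ID (60:63). */
uint64_t stream_ea_th01010_describe(unsigned dep, unsigned unitcnt,
                                    unsigned t, unsigned u, unsigned id)
{
    assert(dep < 8 && unitcnt < 1024 && t < 2 && u < 2 && id < 16);
    return ((uint64_t)dep     << 25)
         | ((uint64_t)unitcnt << 7)
         | ((uint64_t)t       << 6)
         | ((uint64_t)u       << 5)
         | (uint64_t)id;
}

/* TH=0b01010 with GO=1 (bit 32); all other EA fields are ignored. */
uint64_t stream_ea_th01010_go(void)
{
    return UINT64_C(1) << 31;
}

/* TH=0b01010 with S=0b10 (bits 33:34): stop the stream with this ID. */
uint64_t stream_ea_th01010_stop(unsigned id)
{
    assert(id < 16);
    return (UINT64_C(2) << 29) | (uint64_t)id;
}

/* TH=0b01010 with S=0b11: stop the streams for all stream IDs (dcbt only). */
uint64_t stream_ea_th01010_stop_all(void)
{
    return UINT64_C(3) << 29;
}
```

GO and S are kept in separate helpers so that the combination GO=1 with S≠0b00, for which the hint is undefined, cannot be produced by accident.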
Programming Note
See the Programming Notes at the beginning of this section.

Data Cache Block set to Zero X-form

dcbz RA,RB

    31    ///   RA    RB    1014   /
    0     6     11    16    21     31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    n ← block size (bytes)
    m ← log2(n)
    ea ← EA_0:63-m || ^{m}0
    MEM(ea, n) ← ^{n}0x00

Let the effective address (EA) be the sum (RA|0)+(RB).

All bytes in the block containing the byte addressed by EA are set to zero.

This instruction is treated as a Store (see Section 3.3).

Special Registers Altered:
    None

Programming Note
dcbz does not cause the block to exist in the data cache if the block is in storage that is Caching Inhibited.

For storage that is neither Write Through Required nor Caching Inhibited, dcbz provides an efficient means of setting blocks of storage to zero. It can be used to initialize large areas of such storage, in a manner that is likely to consume less memory bandwidth than an equivalent sequence of Store instructions.

For storage that is either Write Through Required or Caching Inhibited, dcbz is likely to take significantly longer to execute than an equivalent sequence of Store instructions. For example, on some implementations dcbz for such storage may cause the system alignment error handler to be invoked; on such implementations the system alignment error handler sets the specified block to zero using Store instructions.

See Section 5.9.1 of Book III-S and Section 4.9.1 of Book III-E for additional information about dcbz.

Data Cache Block Store X-form

dcbst RA,RB

    31    ///   RA    RB    54    /
    0     6     11    16    21    31

Let the effective address (EA) be the sum (RA|0)+(RB).

If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the data cache of any processor and any locations in the block are considered to be modified there, those locations are written to main storage, additional locations in the block may be written to main storage, and the block ceases to be considered to be modified in that data cache.

If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and the block is in the data cache of this processor and any locations in the block are considered to be modified there, those locations are written to main storage, additional locations in the block may be written to main storage, and the block ceases to be considered to be modified in that data cache.

The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited.

This instruction is treated as a Load (see Section 3.3), except that reference and change recording need not be done, and it is treated as a Write with respect to debug events.

Special Registers Altered:
    None

Data Cache Block Flush X-form

dcbf RA,RB,L

    31    ///   L    RA    RB    86    /
    0     6     9    11    16    21    31

Let the effective address (EA) be the sum (RA|0)+(RB).

L=0
    If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the data cache of any processor and any locations in the block are considered to be modified there, those locations are written to main storage and additional locations in the block may be written to main storage. The block is invalidated in the data caches of all processors.

    If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and the block is in the data cache of this processor and any locations in the block are considered to be modified there, those locations are written to main storage and additional locations in the block may be written to main storage. The block is invalidated in the data cache of this processor.

L=1 ("dcbf local") [Category: Server]
    The L=1 form of the dcbf instruction permits a program to limit the scope of the "flush" operation to the data cache of this processor. If the block containing the byte addressed by EA is in the data cache of this processor, it is removed from this cache. The coherence of the block is maintained to the extent required by the Memory Coherence Required storage attribute.

L=3 ("dcbf local primary") [Category: Server]
    The L=3 form of the dcbf instruction permits a program to limit the scope of the "flush" operation to the primary data cache of this processor. If the block containing the byte addressed by EA is in the primary data cache of this processor, it is removed from this cache. The coherence of the block is maintained to the extent required by the Memory Coherence Required storage attribute.

For the L operand, the value 2 is reserved. The results of executing a dcbf instruction with L=2 are boundedly undefined.

The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited.

This instruction is treated as a Load (see Section 3.3), except that reference and change recording need not be done, and it is treated as a Write with respect to debug events.

Special Registers Altered:
    None

Extended Mnemonics:

Extended mnemonics are provided for the Data Cache Block Flush instruction so that it can be coded with the L value as part of the mnemonic rather than as a numeric operand. These are shown as examples with the instruction. See Appendix A. "Assembler Extended Mnemonics" on page 457. The extended mnemonics are shown below.

    Extended:        Equivalent to:
    dcbf RA,RB       dcbf RA,RB,0
    dcbfl RA,RB      dcbf RA,RB,1
    dcbflp RA,RB     dcbf RA,RB,3

Except in the dcbf instruction description in this section, references to "dcbf" in Books I-III imply L=0 unless otherwise stated or obvious from context; "dcbfl" is used for L=1 and "dcbflp" is used for L=3.

Programming Note
dcbf serves as both a basic and an extended mnemonic. The Assembler will recognize a dcbf mnemonic with three operands as the basic form, and a dcbf mnemonic with two operands as the extended form. In the extended form the L operand is omitted and assumed to be 0.

Programming Note [Category: Server]
dcbf with L=1 can be used to provide a hint that a block in this processor's data cache will not be reused soon.

dcbf with L=3 can be used to flush a block from the processor's primary data cache but reduce the latency of a subsequent access. For example, the block may be evicted from the primary data cache but a copy retained in a lower level of the cache hierarchy.

Programs which manage coherence in software must use dcbf with L=0.

3.3.2.1 Obsolete Data Cache Instructions [Category: Vector.Phased-Out]

The Data Stream Touch (dst), Data Stream Touch for Store (dstst), and Data Stream Stop (dss) instructions (primary opcode 31, extended opcodes 342, 374, and 822 respectively), which were proposed for addition to the Power ISA and were implemented by some processors, must be treated as no-ops (rather than as illegal instructions).

The treatment of these instructions is independent of whether other Vector instructions are available (i.e., is independent of the contents of MSR_VEC (see Book III-S) or MSR_SPV (see Book III-E)).

Programming Note
These instructions merely provided hints, and thus were permitted to be treated as no-ops even on processors that implemented them.

The treatment of these instructions is independent of whether other Vector instructions are available because, on processors that implemented the instructions, the instructions were available even when other Vector instructions were not.

The extended mnemonics for these instructions were dstt, dststt, and dssall.
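The address arithmetic in the dcbz RTL earlier in this section (round EA down to a block boundary, then zero n bytes) can be sketched in Python. This is only an illustrative model, not part of the architecture: the function name, the bytearray standing in for storage, and the default block size of 128 bytes are all assumptions (the real block size is implementation-dependent).

```python
def dcbz(mem: bytearray, ea: int, block_size: int = 128) -> int:
    """Zero the block containing byte address `ea`; return the block's base address.

    `block_size` is n in the RTL. 128 is an assumed example value only; it
    must be a power of two, and len(mem) is assumed to be a multiple of it.
    """
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a power of two")
    m = block_size.bit_length() - 1                   # m = log2(n)
    base = (ea >> m) << m                             # high-order bits of EA, then m zero bits
    mem[base:base + block_size] = bytes(block_size)   # MEM(ea, n) = n bytes of 0x00
    return base
```

For example, with a 512-byte `mem`, `dcbz(mem, 130)` zeroes bytes 128 through 255 and leaves the neighbouring blocks untouched.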
3.4 Synchronization Instructions

The synchronization instructions are used to ensure that certain instructions have completed before other instructions are initiated, or to control storage access ordering, or to support debug operations.

3.4.1 Instruction Synchronize Instruction

Instruction Synchronize XL-form

isync

    19    ///   ///   ///   150   /
    0     6     11    16    21    31

Executing an isync instruction ensures that all instructions preceding the isync instruction have completed before the isync instruction completes, and that no subsequent instructions are initiated until after the isync instruction completes. It also ensures that all instruction cache block invalidations caused by icbi instructions preceding the isync instruction have been performed with respect to the processor executing the isync instruction, and then causes any prefetched instructions to be discarded.

Except as described in the preceding sentence, the isync instruction may complete before storage accesses associated with instructions preceding the isync instruction have been performed.

This instruction is context synchronizing (see Book III).

Special Registers Altered:
    None

3.4.2 Load and Reserve and Store Conditional Instructions

The Load And Reserve and Store Conditional instructions can be used to construct a sequence of instructions that appears to perform an atomic update operation on an aligned storage location. See Section 1.7.3, "Atomic Update" for additional information about these instructions.

The Load And Reserve and Store Conditional instructions are fixed-point Storage Access instructions; see Section 3.3.1, "Fixed-Point Storage Access Instructions", in Book I.

The storage location specified by the Load And Reserve and Store Conditional instructions must be in storage that is Memory Coherence Required if the location may be modified by other processors or mechanisms. If the specified location is in storage that is Write Through Required or Caching Inhibited, the system data storage error handler or the system alignment error handler is invoked for the Server environment and may be invoked for the Embedded environment.

The Load and Reserve instructions include an Exclusive Access hint (EH), which can be used to indicate that the instruction sequence being executed is implementing one of two types of algorithms:

Atomic Update (EH=0)
    This hint indicates that the program is using a fetch and operate (e.g., fetch and add) or some similar algorithm and that all programs accessing the shared variable are likely to use a similar operation to access the shared variable for some time.

Exclusive Access (EH=1)
    This hint indicates that the program is attempting to acquire a lock and if it succeeds, will perform another store to the lock variable (releasing the lock) before another program attempts to modify the lock variable.

Programming Note
The Memory Coherence Required attribute on other processors and mechanisms ensures that their stores to the reservation granule will cause the reservation created by the Load And Reserve instruction to be lost.

Programming Note
Because the Load And Reserve and Store Conditional instructions have implementation dependencies (e.g., the granularity at which reservations are managed), they must be used with care. The operating system should provide system library programs that use these instructions to implement the high-level synchronization functions (Test and Set, Compare and Swap, locking, etc.; see Appendix B) that are needed by application programs. Application programs should use these library programs, rather than use the Load And Reserve and Store Conditional instructions directly.
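How a Load And Reserve / Store Conditional pair comes to appear atomic can be illustrated with a toy Python model of the reservation. Everything here is an assumption made for illustration: the class and function names, the `interfere` hook, a reservation granule of exactly one word, and the choice that a store conditional to a mismatched address does not store (the architecture leaves that case undefined). It is a sketch of the idea, not of any implementation.

```python
class Storage:
    """Word-addressed shared storage, plus the processors attached to it."""

    def __init__(self):
        self.words = {}   # address -> 32-bit value
        self.cpus = []

    def store(self, ea, value, by=None):
        """Perform a store; other processors lose a reservation held on `ea`."""
        self.words[ea] = value & 0xFFFFFFFF
        for cpu in self.cpus:
            if cpu is not by and cpu.reserve_addr == ea:
                cpu.reserve = False


class Processor:
    def __init__(self, storage):
        self.storage = storage
        self.reserve = False       # RESERVE
        self.reserve_addr = None   # RESERVE_ADDR
        storage.cpus.append(self)

    def lwarx(self, ea):
        if ea % 4:
            raise ValueError("EA must be a multiple of 4")
        self.reserve, self.reserve_addr = True, ea
        return self.storage.words.get(ea, 0)

    def stwcx(self, ea, value):
        """Return the CR0 EQ bit: True if the store was performed."""
        ok = self.reserve and self.reserve_addr == ea
        self.reserve = False       # any existing reservation is cleared
        if ok:
            self.storage.store(ea, value, by=self)
        return ok


def fetch_and_add(cpu, ea, delta, interfere=None):
    """Retry lwarx/stwcx. until the update succeeds; return the value replaced."""
    while True:
        old = cpu.lwarx(ea)
        if interfere is not None:  # hypothetical hook: another processor runs once here
            hook, interfere = interfere, None
            hook()
        if cpu.stwcx(ea, old + delta):
            return old
```

If another processor stores to the word between the `lwarx` and the `stwcx`, the first attempt fails and the loop retries with the newly stored value, so no update is lost.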
Programming Note
EH = 1 should be used when the program is obtaining a lock variable which it will subsequently release before another program attempts to perform a store to it. When contention for a lock is significant, using this hint may reduce the number of times a cache block is transferred between processor caches.

EH = 0 should be used when all accesses to a mutex variable are performed using an instruction sequence with Load and Reserve followed by Store Conditional (e.g., emulating atomic update primitives such as "Fetch and Add"; see Appendix B). The processor may use this hint to optimize the cache to cache transfer of the block containing the mutex variable, thus reducing the latency of performing an operation such as "Fetch and Add".

Programming Note
Warning: On some processors that comply with versions of the architecture that precede Version 2.00, executing a Load And Reserve instruction in which EH = 1 will cause the illegal instruction error handler to be invoked.

Load Word And Reserve Indexed X-form

lwarx RT,RA,RB,EH

    31    RT    RA    RB    20    EH
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RESERVE ← 1
    RESERVE_ADDR ← real_addr(EA)
    RT ← ^{32}0 || MEM(EA, 4)

Let the effective address (EA) be the sum (RA|0)+(RB). The word in storage addressed by EA is loaded into RT_32:63. RT_0:31 are set to 0.

This instruction creates a reservation for use by a Store Word Conditional instruction. An address computed from the EA as described in Section 1.7.3.1 is associated with the reservation, and replaces any address previously associated with the reservation.

The value of EH provides a hint as to whether the program will perform a subsequent store to the word in storage addressed by EA before some other processor attempts to modify it.

    0   Other programs might attempt to modify the word in storage addressed by EA regardless of the result of the corresponding Store Word/Doubleword Conditional instruction.
    1   Other programs will not attempt to modify the word in storage addressed by EA until the program that has acquired the lock performs a subsequent store releasing the lock.

EA must be a multiple of 4. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined.

Special Registers Altered:
    None

Programming Note
lwarx serves as both a basic and an extended mnemonic. The Assembler will recognize a lwarx mnemonic with four operands as the basic form, and a lwarx mnemonic with three operands as the extended form. In the extended form the EH operand is omitted and assumed to be 0.

Store Word Conditional Indexed X-form

stwcx. RS,RA,RB

    31    RS    RA    RB    150   1
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    if RESERVE then
        if RESERVE_ADDR = real_addr(EA) then
            MEM(EA, 4) ← (RS)_32:63
            CR0 ← 0b00 || 0b1 || XER_SO
        else
            u1 ← undefined 1-bit value
            if u1 then
                MEM(EA, 4) ← (RS)_32:63
            u2 ← undefined 1-bit value
            CR0 ← 0b00 || u2 || XER_SO
        RESERVE ← 0
    else
        CR0 ← 0b00 || 0b0 || XER_SO

Let the effective address (EA) be the sum (RA|0)+(RB).

If a reservation exists and the storage location specified by the stwcx. is the same as the location specified by the Load And Reserve instruction that established the reservation, (RS)_32:63 are stored into the word in storage addressed by EA and the reservation is cleared.

If a reservation exists but the storage location specified by the stwcx. is not the same as the location specified by the Load And Reserve instruction that established the reservation, the reservation is cleared, and it is undefined whether (RS)_32:63 are stored into the word in storage addressed by EA.

If a reservation does not exist, the instruction completes without altering storage.

CR Field 0 is set as follows. n is a 1-bit value that indicates whether the store was performed, except that if a reservation exists but the storage location specified by the stwcx. is not the same as the location specified by the Load And Reserve instruction that established the reservation the value of n is undefined.

    CR0_LT GT EQ SO = 0b00 || n || XER_SO

EA must be a multiple of 4. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined.

Special Registers Altered:
    CR0

3.4.2.1 64-Bit Load and Reserve and Store Conditional Instructions [Category: 64-Bit]

Load Doubleword And Reserve Indexed X-form

ldarx RT,RA,RB,EH

    31    RT    RA    RB    84    EH
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RESERVE ← 1
    RESERVE_ADDR ← real_addr(EA)
    RT ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0)+(RB). The doubleword in storage addressed by EA is loaded into RT.

This instruction creates a reservation for use by a Store Doubleword Conditional instruction. An address computed from the EA as described in Section 1.7.3.1 is associated with the reservation, and replaces any address previously associated with the reservation.

The value of EH provides a hint as to whether the program will perform a subsequent store to the doubleword in storage addressed by EA before some other processor attempts to modify it.

    0   Other programs might attempt to modify the doubleword in storage addressed by EA regardless of the result of the corresponding Store Word/Doubleword Conditional instruction.
    1   Other programs will not attempt to modify the doubleword in storage addressed by EA until the program that has acquired the lock performs a subsequent store releasing the lock.

EA must be a multiple of 8. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined.

Special Registers Altered:
    None

Programming Note
ldarx serves as both a basic and an extended mnemonic. The Assembler will recognize a ldarx mnemonic with four operands as the basic form, and a ldarx mnemonic with three operands as the extended form. In the extended form the EH operand is omitted and assumed to be 0.

Store Doubleword Conditional Indexed X-form

stdcx. RS,RA,RB

    31    RS    RA    RB    214   1
    0     6     11    16    21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    if RESERVE then
        if RESERVE_ADDR = real_addr(EA) then
            MEM(EA, 8) ← (RS)
            CR0 ← 0b00 || 0b1 || XER_SO
        else
            u1 ← undefined 1-bit value
            if u1 then
                MEM(EA, 8) ← (RS)
            u2 ← undefined 1-bit value
            CR0 ← 0b00 || u2 || XER_SO
        RESERVE ← 0
    else
        CR0 ← 0b00 || 0b0 || XER_SO

Let the effective address (EA) be the sum (RA|0)+(RB).

If a reservation exists and the storage location specified by the stdcx. is the same as the location specified by the Load And Reserve instruction that established the reservation, (RS) is stored into the doubleword in storage addressed by EA and the reservation is cleared.

If a reservation exists but the storage location specified by the stdcx. is not the same as the location specified by the Load And Reserve instruction that established the reservation, the reservation is cleared, and it is undefined whether (RS) is stored into the doubleword in storage addressed by EA.

If a reservation does not exist, the instruction completes without altering storage.

CR Field 0 is set as follows. n is a 1-bit value that indicates whether the store was performed, except that if a reservation exists but the storage location specified by the stdcx. is not the same as the location specified by the Load And Reserve instruction that established the reservation the value of n is undefined.

    CR0_LT GT EQ SO = 0b00 || n || XER_SO

EA must be a multiple of 8. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined.

Special Registers Altered:
    CR0

3.4.3 Memory Barrier Instructions

The Memory Barrier instructions can be used to control the order in which storage accesses are performed. Additional information about these instructions and about related aspects of storage management can be found in Book III.

Extended mnemonics for Synchronize

Extended mnemonics are provided for the Synchronize instruction so that it can be supported by assemblers that recognize only the msync mnemonic and so that it can be coded with the L value as part of the mnemonic rather than as a numeric operand. These are shown as examples with the instruction. See Appendix A. "Assembler Extended Mnemonics" on page 457.

Synchronize X-form

sync L

    31    ///   L    ///   ///   598   /
    0     6     9    11    16    21    31

The sync instruction creates a memory barrier (see Section 1.7.1). The set of storage accesses that is ordered by the memory barrier depends on the value of the L field.

L=0 ("heavyweight sync")
    The memory barrier provides an ordering function for the storage accesses associated with all instructions that are executed by the processor executing the sync instruction. The applicable pairs are all pairs a_i,b_j in which b_j is a data access, except that if a_i is the storage access caused by an icbi instruction then b_j may be performed with respect to the processor executing the sync instruction before a_i is performed with respect to that processor.

L=1 ("lightweight sync")
    The memory barrier provides an ordering function for the storage accesses caused by Load, Store, and dcbz instructions that are executed by the processor executing the sync instruction and for which the specified storage location is in storage that is Memory Coherence Required and is neither Write Through Required nor Caching Inhibited. The applicable pairs are all pairs a_i,b_j of such accesses except those in which a_i is an access caused by a Store or dcbz instruction and b_j is an access caused by a Load instruction.

L=2
    The set of storage accesses that is ordered by the memory barrier is described in Section 5.9.2 of Book III-S, as are additional properties of the sync instruction with L=2.

The ordering done by the memory barrier is cumulative.

If L=0 (or L=2), the sync instruction has the following additional properties.

• Executing the sync instruction ensures that all instructions preceding the sync instruction have completed before the sync instruction completes, and that no subsequent instructions are initiated until after the sync instruction completes.

• The sync instruction is execution synchronizing (see Book III). However, address translation and reference and change recording (see Book III) associated with subsequent instructions may be performed before the sync instruction completes.

• The memory barrier provides the additional ordering function such that if a given instruction that is the result of a store in set B is executed, all applicable storage accesses in set A have been performed with respect to the processor executing the instruction to the extent required by the associated memory coherence properties. The single exception is that any storage access in set A that is caused by an icbi instruction executed by the processor executing the sync instruction (P1) may not have been performed with respect to P1 (see the description of the icbi instruction on page 428).

  The cumulative properties of the barrier apply to the execution of the given instruction as they would to a load that returned a value that was the result of a store in set B.

• The sync instruction provides an ordering function for the operations caused by the stream variants of the dcbt and dcbtst instructions (i.e. the providing of hints).

The value L=3 is reserved.

The sync instruction may complete before storage accesses associated with instructions preceding the sync instruction have been performed. The sync instruction may complete before operations caused by dcbt instructions with TH_0=1 preceding the sync instruction have been performed.

See Section 4.9.3 of Book III-E for additional information related to the sync instruction for the Embedded environment.

Special Registers Altered:
    None

Extended Mnemonics:

Extended mnemonics for Synchronize:

    Extended:    Equivalent to:
    sync         sync 0
    msync        sync 0
    lwsync       sync 1
    ptesync      sync 2

Except in the sync instruction description in this section, references to "sync" in Books I-III imply L=0 unless otherwise stated or obvious from context; the appropriate extended mnemonics are used when other L values are intended.

Programming Note
Section 1.8 contains a detailed description of how to modify instructions such that a well-defined result is obtained.

Programming Note
sync serves as both a basic and an extended mnemonic. The Assembler will recognize a sync mnemonic with one operand as the basic form, and a sync mnemonic with no operand as the extended form. In the extended form the L operand is omitted and assumed to be 0.

Programming Note
The sync instruction can be used to ensure that all stores into a data structure, caused by Store instructions executed in a "critical section" of a program, will be performed with respect to another processor before the store that releases the lock is performed with respect to that processor; see Section B.2, "Lock Acquisition and Release, and Related Techniques" on page 461.

The memory barrier created by a sync instruction with L=0 or L=1 does not order implicit storage accesses. The memory barrier created by a sync instruction with any L value does not order instruction fetches.

(The memory barrier created by a sync instruction with L=0 - or L=2; see Book III - appears to order instruction fetches for instructions preceding the sync instruction with respect to data accesses caused by instructions following the sync instruction. However, this ordering is a consequence of the first "additional property" of sync with L=0, not a property of the memory barrier.)

In order to obtain the best performance across the widest range of implementations, the programmer should use the sync instruction with L=1, or the eieio or mbar instruction, if any of these is sufficient for his needs; otherwise he should use sync with L=0. sync with L=2 should not be used by application programs.

Programming Note
The functions provided by sync with L=1 are a strict subset of those provided by sync with L=0. (The functions provided by sync with L=2 are a strict superset of those provided by sync with L=0; see Book III.)
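The difference between the applicable pairs for sync with L=0 and with L=1 can be summarised in a few lines of Python. This is a deliberately simplified sketch: the function name and the string labels are invented here, both accesses are assumed to be data accesses to storage that is Memory Coherence Required and neither Write Through Required nor Caching Inhibited, a dcbz access is treated as a store, and the icbi exception and L=2 are not modelled.

```python
def sync_orders(L: int, a: str, b: str) -> bool:
    """Return True if `sync L` orders access `a` (before the sync) with respect
    to access `b` (after the sync), under the simplifying assumptions above."""
    kinds = {"load", "store"}            # treat a dcbz access as "store"
    if a not in kinds or b not in kinds:
        raise ValueError("accesses must be 'load' or 'store'")
    if L == 0:                           # heavyweight sync: every such pair is ordered
        return True
    if L == 1:                           # lightweight sync: every pair except store -> load
        return not (a == "store" and b == "load")
    raise ValueError("only L=0 and L=1 are modelled")
```

Under this model the only pair that requires sync with L=0 rather than L=1 is a store followed by a load.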
Enforce In-order Execution of I/O X-form

eieio
[Category: Server]

    31    ///   ///   ///   854   /
    0     6     11    16    21    31

The eieio instruction creates a memory barrier (see Section 1.7.1, "Storage Access Ordering"), which provides an ordering function for the storage accesses caused by Load, Store, dcbz, eciwx, and ecowx instructions executed by the processor executing the eieio instruction. These storage accesses are divided into the two sets listed below. The storage access caused by an eciwx instruction is ordered as a load, and the storage access caused by a dcbz or ecowx instruction is ordered as a store.

1. Loads and stores to storage that is both Caching Inhibited and Guarded, and stores to main storage caused by stores to storage that is Write Through Required.

   The applicable pairs are all pairs a_i,b_j of such accesses.

2. Stores to storage that is Memory Coherence Required and is neither Write Through Required nor Caching Inhibited.

   The applicable pairs are all pairs a_i,b_j of such accesses.

The operations caused by the stream variants of the dcbt and dcbtst instructions (i.e. the providing of hints) are ordered by eieio as a third set of operations, and the operations caused by tlbie and tlbsync instructions (see Book III-S) are ordered by eieio as a fourth set of operations.

Each of the four sets of storage accesses or operations is ordered independently of the other three sets. The ordering done by eieio's memory barrier for the second set is cumulative; the ordering done by eieio's memory barrier for the other three sets is not cumulative.

The eieio instruction may complete before storage accesses or operations associated with instructions preceding the eieio instruction have been performed.

Special Registers Altered:
    None

Memory Barrier X-form

mbar MO
[Category: Embedded]

    31    MO    ///   ///   854   /
    0     6     11    16    21    31

When MO=0, the mbar instruction creates a cumulative memory barrier (see Section 1.7.1, "Storage Access Ordering"), which provides an ordering function for the storage accesses executed by the processor executing the mbar instruction.

When MO≠0, an implementation may support the mbar instruction ordering a particular subset of storage accesses. An implementation may also support multiple, non-zero values of MO that each specify a different subset of storage accesses that are ordered by the mbar instruction. Which subsets of storage accesses are ordered, and which values of MO specify these subsets, is implementation-dependent.

The mbar instruction may complete before storage accesses associated with instructions preceding the mbar instruction have been performed. The mbar instruction may complete before operations caused by dcbt instructions having TH_0=1 preceding the mbar instruction have been performed.

Special Registers Altered:
    None

Programming Note
The eieio and mbar instructions are intended for use in doing memory-mapped I/O.

Because loads, and separately stores, to storage that is both Caching Inhibited and Guarded are performed in program order (see Section 1.7.1, "Storage Access Ordering" on page 413), eieio or mbar is needed for such storage only when loads must be ordered with respect to stores.

For the eieio instruction, accesses in set 1, a_i and b_j need not be the same kind of access or be to storage having the same storage control attributes. For example, a_i can be a load to Caching Inhibited, Guarded storage, and b_j a store to Write Through Required storage.

If stronger ordering is desired than that provided by eieio or mbar, the sync instruction must be used, with the appropriate value in the L field.

Programming Note
The functions provided by eieio and mbar are a strict subset of those provided by sync with L=0. The functions provided by eieio for its second set are a strict subset of those provided by sync with L=1.

Since eieio and mbar share the same opcode, software designed for both server and embedded environments must assume that only the eieio functionality applies since the functions provided by eieio are a subset of those provided by mbar.

3.4.4 Wait Instruction

Wait X-form

wait
[Category: Wait]

    31    ///   ///   ///   62    /
    0     6     11    16    21    31

The wait instruction provides an ordering function for the effects of all instructions executed by the processor executing the wait instruction. Executing a wait instruction ensures that all instructions have completed before the wait instruction completes, and that no subsequent instructions are initiated until an interrupt occurs. The wait instruction also causes any prefetched instructions to be discarded and instruction fetching is suspended until an interrupt occurs.

Once the wait instruction has completed, the NIA will point to the next sequential instruction.

Special Registers Altered:
    None

Programming Note
The wait instruction can be used in verification test cases to signal the end of a test case. The encoding for the instruction is the same in both Big-Endian and Little-Endian modes.

Programming Note
The wait instruction may be useful as the primary instruction of an "idle process" or the completion of processing for a cooperative thread. Note that wait updates the NIA so that an interrupt that awakens a wait instruction will return to the instruction after the wait.

Chapter 4. Time Base

    4.1 Time Base Overview
    4.2 Time Base
        4.2.1 Time Base Instructions
    4.3 Alternate Time Base [Category: Alternate Time Base]

4.1 Time Base Overview

The time base facilities include a Time Base and an Alternate Time Base which is category: Alternate Time Base. The Alternate Time Base is analogous to the Time Base except that it may count at a different frequency and is not writable.

4.2 Time Base

The Time Base (TB) is a 64-bit register (see Figure 4) containing a 64-bit unsigned integer that is incremented periodically as described below.

    TBU             TBL
    0               32              63

    Field    Description
    TBU      Upper 32 bits of Time Base
    TBL      Lower 32 bits of Time Base

    Figure 4. Time Base

The Time Base bits 0:59 increment until their value becomes 0xFFF_FFFF_FFFF_FFFF (2^60 - 1); at the next increment their value becomes 0x000_0000_0000_0000. There is no interrupt or other indication when this occurs.

Time Base bits 60:63 may increment at a variable rate. When the value of bit 59 changes, bits 60:63 are set to zero; if bits 60:63 increment to 0xF before the value of bit 59 changes, they remain at 0xF until the value of bit 59 changes.

As an order-of-magnitude example, a 64-bit Time Base driven by a 1 GHz clock divided by 32 has the period

$$T_{TB} = \frac{2^{64} \times 32}{1\ \text{GHz}} = 5.90 \times 10^{11}\ \text{seconds}$$

which is approximately 18,700 years.

The Power ISA does not specify a relationship between the frequency at which the Time Base is updated and other frequencies, such as the CPU clock or bus clock. The Time Base update frequency is not required to be constant. What is required, so that system software can keep time of day and operate interval timers, is one of the following.

• The system provides an (implementation-dependent) interrupt to software whenever the update frequency of the Time Base bits 0:59 changes, and a means to determine what the current update frequency is.

• The update frequency of the Time Base bits 0:59 is under the control of the system software.

Programming Note
If the operating system initializes the Time Base on power-on to some reasonable value and the update frequency of the Time Base is constant, the Time Base can be used as a source of values that increase at a constant rate, such as for time stamps in trace entries.

Even if the update frequency is not constant, values read from the Time Base are monotonically increasing (except when the Time Base wraps from 2^64 - 1 to 0). If a trace entry is recorded each time the update frequency changes, the sequence of Time Base values can be post-processed to become actual time values.

Successive readings of the Time Base may return identical values.

4.2.1 Time Base Instructions

Move From Time Base XFX-form

mftb RT,TBR
[Category: Server.Phased-Out]

    31    RT    tbr         371   /
    0     6     11          21    31

This instruction behaves as if it were an mfspr instruction; see the mfspr instruction description in Section 3.3.14 of Book I.

Special Registers Altered:
    None

Extended Mnemonics:

Extended mnemonics for Move From Time Base:

    Extended:    Equivalent to:
    mftb Rx      mftb Rx,268
                 mfspr Rx,268
    mftbu Rx     mftb Rx,269
                 mfspr Rx,269

Programming Note
New programs should use mfspr instead of mftb to access the Time Base.

Programming Note
mftb serves as both a basic and an extended mnemonic. The Assembler will recognize an mftb mnemonic with two operands as the basic form, and an mftb mnemonic with one operand as the extended form. In the extended form the TBR operand is omitted and assumed to be 268 (the value that corresponds to TB).

Programming Note
The mfspr instruction can be used to read the Time Base on all processors that comply with Version 2.01 of the architecture or with any subsequent version.

It is believed that the mfspr instruction can be used to read the Time Base on most processors that comply with versions of the architecture that precede Version 2.01. Processors for which mfspr cannot be used to read the Time Base include the following.

- 601
- POWER3

(601 implements neither the Time Base nor mftb, but depends on software using mftb to read the Time Base, so that the attempt causes the Illegal Instruction error handler to be invoked and thereby permits the operating system to emulate the Time Base.)
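The Time Base period quoted in Section 4.2 (5.90 x 10^11 seconds, roughly 18,700 years) can be checked with a few lines of Python. This is only a sanity check of the arithmetic; the function name and the use of a 365.25-day year are assumptions made here, and the 1 GHz clock divided by 32 is taken directly from the formula rather than from any particular implementation.

```python
def time_base_period_seconds(counter_bits=64, divider=32, clock_hz=1_000_000_000):
    """Seconds until a counter of `counter_bits` bits wraps when it is
    incremented once every `divider` cycles of a `clock_hz` clock."""
    return (2 ** counter_bits * divider) / clock_hz

SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed convention: Julian year

period = time_base_period_seconds()     # about 5.90e11 seconds
years = period / SECONDS_PER_YEAR       # a little over 18,700 years
```

Changing `divider` or `clock_hz` shows how strongly the wrap-around period depends on the update frequency, which the architecture leaves implementation-dependent.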
Programming Note

Since the update frequency of the Time Base is implementation-dependent, the algorithm for converting the current value in the Time Base to time of day is also implementation-dependent.

As an example, assume that the Time Base increments at the constant rate of 512 MHz. (Note, however, that programs should allow for the possibility that some implementations may not increment the least-significant 4 bits of the Time Base at a constant rate.) What is wanted is the pair of 32-bit values comprising a POSIX standard clock:¹ the number of whole seconds that have passed since 00:00:00 January 1, 1970, UTC, and the remaining fraction of a second expressed as a number of nanoseconds.

Assume that:

• The value 0 in the Time Base represents the start time of the POSIX clock (if this is not true, a simple 64-bit subtraction will make it so).

• The integer constant ticks_per_sec contains the value 512,000,000, which is the number of times the Time Base is updated each second.

• The integer constant ns_adj contains the value

$$\frac{1{,}000{,}000{,}000}{512{,}000{,}000} \times 2^{32} / 2 = 4194304000$$

  which is the number of nanoseconds per tick of the Time Base, multiplied by 2^32 for use in mulhwu (see below), and then divided by 2 in order to fit, as an unsigned integer, into 32 bits.

When the processor is in 64-bit mode, the POSIX clock can be computed with an instruction sequence such as this:

    mfspr   Ry,268            # Ry = Time Base
    lwz     Rx,ticks_per_sec
    divdu   Rz,Ry,Rx          # Rz = whole seconds
    stw     Rz,posix_sec
    mulld   Rz,Rz,Rx          # Rz = quotient * divisor
    sub     Rz,Ry,Rz          # Rz = excess ticks
    lwz     Rx,ns_adj
    slwi    Rz,Rz,1           # Rz = 2 * excess ticks
    mulhwu  Rz,Rz,Rx          # mul by (ns/tick)/2 * 2^32
    stw     Rz,posix_ns       # product[0:31] = excess ns

For the Embedded environment when the processor is in 32-bit mode, it is not possible to read the Time Base using a single instruction. Instead, two instructions must be used, one of which reads TBL and the other of which reads TBU. Because of the possibility of a carry from TBL to TBU occurring between the two reads, a sequence such as the following must be used to read the Time Base.

    loop:
        mfspr   Rx,TBU        # load from TBU
        mfspr   Ry,TB         # load from TB
        mfspr   Rz,TBU        # load from TBU
        cmp     cr0,0,Rx,Rz   # check if 'old' = 'new'
        bne     loop          # branch if carry occurred

Non-constant update frequency

In a system in which the update frequency of the Time Base may change over time, it is not possible to convert an isolated Time Base value into time of day. Instead, a Time Base value has meaning only with respect to the current update frequency and the time of day that the update frequency was last changed. Each time the update frequency changes, either the system software is notified of the change via an interrupt (see Book III), or the change was instigated by the system software itself. At each such change, the system software must compute the current time of day using the old update frequency, compute a new value of ticks_per_sec for the new frequency, and save the time of day, Time Base value, and tick rate. Subsequent calls to compute Time of Day use the current Time Base Value and the saved value.

1. Described in POSIX Draft Standard P1003.4/D12, Draft Standard for Information Technology -- Portable Operating System Interface (POSIX) -- Part 1: System Application Program Interface (API) - Amendment 1: Real-time Extension [C Language]. Institute of Electrical and Electronics Engineers, Inc., Feb. 1992.

4.3 Alternate Time Base [Category: Alternate Time Base]

The Alternate Time Base (ATB) is a 64-bit register (see Figure 5) containing a 64-bit unsigned integer that is incremented periodically. The frequency at which the integer is updated is implementation-dependent.
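Because both the Time Base and the Alternate Time Base are 64-bit unsigned tick counters, the arithmetic needed to work with their values is the same. The following C sketch illustrates two pieces of it: an interval computed by unsigned subtraction, and a C rendering of the 64-bit conversion sequence from the Programming Note in Section 4.2.1. It assumes the constant 512 MHz rate used in that note; the function names are hypothetical, and a counter running at another rate would need different constants.

```c
#include <stdint.h>

/* Constants from the Programming Note example (512 MHz update rate). */
#define TICKS_PER_SEC UINT32_C(512000000)
#define NS_ADJ        UINT32_C(4194304000)  /* (ns per tick) * 2^32 / 2 */

/* Ticks between two samples of a 64-bit counter. Unsigned subtraction is
 * modulo 2^64, so the result is still right if the counter wrapped once. */
uint64_t ticks_elapsed(uint64_t earlier, uint64_t later)
{
    return later - earlier;
}

/* Split a tick count into whole seconds and leftover nanoseconds,
 * step for step as in the assembler sequence. */
void ticks_to_posix(uint64_t ticks, uint32_t *posix_sec, uint32_t *posix_ns)
{
    uint64_t whole_sec = ticks / TICKS_PER_SEC;              /* divdu      */
    uint64_t excess    = ticks - whole_sec * TICKS_PER_SEC;  /* mulld, sub */
    uint32_t doubled   = (uint32_t)(excess << 1);            /* slwi       */

    *posix_sec = (uint32_t)whole_sec;                        /* stw (low word) */
    /* mulhwu: keep the high 32 bits of the 32x32-bit unsigned product */
    *posix_ns  = (uint32_t)(((uint64_t)doubled * NS_ADJ) >> 32);
}
```

The multiply-high step avoids a second division: doubling the excess ticks compensates for the halving that was applied to make ns_adj fit in 32 bits.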
    ATBU | ATBL
    0      32     63

Figure 5. Alternate Time Base

The ATBL register is an aliased name for the ATB.

The Alternate Time Base increments until its value becomes 0xFFFF_FFFF_FFFF_FFFF (2^64 - 1). At the next increment, its value becomes 0x0000_0000_0000_0000. There is no explicit indication (such as an interrupt; see Book III) that this has occurred.

The Alternate Time Base is accessible in both user and supervisor mode. The counter can be read by executing a mfspr instruction specifying the ATB (or ATBL) register, but cannot be written. A second SPR register, ATBU, is defined that accesses only the upper 32 bits of the counter. Thus the upper 32 bits of the counter may be read into a register by reading the ATBU register.

The effect of entering a power-savings mode or of processor frequency changes on counting in the Alternate Time Base is implementation-dependent.

Chapter 5. External Control [Category: External Control]

The External Control category of facilities and instructions permits a program to communicate with a special-purpose device. Two instructions are provided, both of which must be implemented if the facility is provided.

• External Control In Word Indexed (eciwx), which does the following:
  - Computes an effective address (EA) like most X-form instructions
  - Validates the EA as would be done for a load from that address
  - Translates the EA to a real address
  - Transmits the real address to the device
  - Accepts a word of data from the device and places it into a General Purpose Register

• External Control Out Word Indexed (ecowx), which does the following:
  - Computes an effective address (EA) like most X-form instructions
  - Validates the EA as would be done for a store to that address
  - Translates the EA to a real address
  - Transmits the real address and a word of data from a General Purpose Register to the device

Permission to execute these instructions and identification of the target device are controlled by two fields, called the E bit and the RID field respectively. If an attempt is made to execute either of these instructions when E=0 the system data storage error handler is invoked. The location of these fields is described in Book III.

The storage access caused by eciwx and ecowx is performed as though the specified storage location is Caching Inhibited and Guarded, and is neither Write Through Required nor Memory Coherence Required.

Interpretation of the real address transmitted by eciwx and ecowx and of the 32-bit value transmitted by ecowx is up to the target device, and is not specified by the Power ISA. See the System Architecture documentation for a given Power ISA system for details on how the External Control facility can be used with devices on that system.

Example

An example of a device designed to be used with the External Control facility might be a graphics adapter. The ecowx instruction might be used to send the device the translated real address of a buffer containing graphics data, and the word transmitted from the General Purpose Register might be control information that tells the adapter what operation to perform on the data in the buffer. The eciwx instruction might be used to load status information from the adapter.

A device designed to be used with the External Control facility may also recognize events that indicate that the address translation being used by the processor has changed. In this case the operating system need not "pin" the area of storage identified by an eciwx or ecowx instruction (i.e., need not protect it from being paged out).

5.1 External Access Instructions

In the instruction descriptions the statements "this instruction is treated as a Load" and "this instruction is treated as a Store" have the same meanings as for the Cache Management instructions; see Section 3.3.

External Control In Word Indexed X-form

eciwx RT,RA,RB

    31 | RT | RA | RB | 310 | /
    0    6    11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    raddr ← address translation of EA
    send load word request for raddr to
        device identified by RID
    RT ← ³²0 || word from device

Let the effective address (EA) be the sum (RA|0)+(RB).

A load word request for the real address corresponding to EA is sent to the device identified by RID, bypassing the cache. The word returned by the device is placed into RT[32:63]. RT[0:31] are set to 0.

The E bit must be 1. If it is not, the data storage error handler is invoked.

EA must be a multiple of 4. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined.

This instruction is treated as a Load.

See Book III-S for additional information about this instruction.

Special Registers Altered:
    None

Programming Note
The eieio or mbar instruction can be used to ensure that the storage accesses caused by eciwx and ecowx are performed in program order with respect to other Caching Inhibited and Guarded storage accesses.

External Control Out Word Indexed X-form

ecowx RS,RA,RB

    31 | RS | RA | RB | 438 | /
    0    6    11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    raddr ← address translation of EA
    send store word request for raddr to
        device identified by RID
    send (RS)[32:63] to device

Let the effective address (EA) be the sum (RA|0)+(RB).

A store word request for the real address corresponding to EA and the contents of RS[32:63] are sent to the device identified by RID, bypassing the cache.

The E bit must be 1. If it is not, the data storage error handler is invoked.

EA must be a multiple of 4. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined.

This instruction is treated as a Store, except that its storage access is not performed in program order with respect to accesses to other Caching Inhibited and Guarded storage locations unless software explicitly imposes that order.

See Book III-S for additional information about this instruction.

Special Registers Altered:
    None

Appendix A. Assembler Extended Mnemonics

In order to make assembler language programs simpler to write and easier to understand, a set of extended mnemonics and symbols is provided for certain instructions. This appendix defines extended mnemonics and symbols related to instructions defined in Book II.

Assemblers should provide the extended mnemonics and symbols listed here, and may provide others.

A.1 Data Cache Block Flush Mnemonics

The L field in the Data Cache Block Flush instruction controls the scope of the flush function performed by the instruction. Extended mnemonics are provided that represent the L value in the mnemonic rather than requiring it to be coded as a numeric operand.

Note: dcbf serves as both a basic and an extended mnemonic. The Assembler will recognize a dcbf mnemonic with three operands as the basic form, and a dcbf mnemonic with two operands as the extended form. In the extended form the L operand is omitted and assumed to be 0.

A.2 Synchronize Mnemonics

The L field in the Synchronize instruction controls the scope of the synchronization function performed by the instruction. Extended mnemonics are provided that represent the L value in the mnemonic rather than requiring it to be coded as a numeric operand. Two extended mnemonics are provided for the L=0 value in order to support assemblers that do not recognize the sync mnemonic.

Note: sync serves as both a basic and an extended mnemonic. The Assembler will recognize a sync mnemonic with one operand as the basic form, and a sync mnemonic with no operand as the extended form. In the extended form the L operand is omitted and assumed to be 0.
Extended mnemonics for Data Cache Block Flush:

    dcbf    RA,RB     (equivalent to: dcbf RA,RB,0)
    dcbfl   RA,RB     (equivalent to: dcbf RA,RB,1)

Extended mnemonics for Synchronize:

    sync              (equivalent to: sync 0)
    msync             (equivalent to: sync 0)
    lwsync            (equivalent to: sync 1)
    ptesync           (equivalent to: sync 2)

Appendix B. Programming Examples for Sharing Storage

This appendix gives examples of how dependencies and the Synchronization instructions can be used to control storage access ordering when storage is shared between programs.

Many of the examples use extended mnemonics (e.g., bne, bne-, cmpw) that are defined in Appendix D of Book I.

Many of the examples use the Load And Reserve and Store Conditional instructions, in a sequence that begins with a Load And Reserve instruction and ends with a Store Conditional instruction (specifying the same storage location as the Load And Reserve) followed by a Branch Conditional instruction that tests whether the Store Conditional instruction succeeded.

In these examples it is assumed that contention for the shared resource is low; the conditional branches are optimized for this case by using "+" and "-" suffixes appropriately.

The examples deal with words; they can be used for doublewords by changing all word-specific mnemonics to the corresponding doubleword-specific mnemonics (e.g., lwarx to ldarx, cmpw to cmpd).

In this appendix it is assumed that all shared storage locations are in storage that is Memory Coherence Required, and that the storage locations specified by Load And Reserve and Store Conditional instructions are in storage that is neither Write Through Required nor Caching Inhibited.

B.1 Atomic Update Primitives

This section gives examples of how the Load And Reserve and Store Conditional instructions can be used to emulate atomic read/modify/write operations.

An atomic read/modify/write operation reads a storage location and writes its next value, which may be a function of its current value, all as a single atomic operation. The examples shown provide the effect of an atomic read/modify/write operation, but use several instructions rather than a single atomic instruction.

Fetch and No-op

The "Fetch and No-op" primitive atomically loads the current value in a word in storage.

In this example it is assumed that the address of the word to be loaded is in GPR 3 and the data loaded are returned in GPR 4.

    loop:
        lwarx   r4,0,r3     #load and reserve
        stwcx.  r4,0,r3     #store old value if
                            # still reserved
        bne-    loop        #loop if lost reservation

Note:

1. The stwcx., if it succeeds, stores to the target location the same value that was loaded by the preceding lwarx. While the store is redundant with respect to the value in the location, its success ensures that the value loaded by the lwarx is still the current value at the time the stwcx. is executed.

Fetch and Store

The "Fetch and Store" primitive atomically loads and replaces a word in storage.

In this example it is assumed that the address of the word to be loaded and replaced is in GPR 3, the new value is in GPR 4, and the old value is returned in GPR 5.

    loop:
        lwarx   r5,0,r3     #load and reserve
        stwcx.  r4,0,r3     #store new value if
                            # still reserved
        bne-    loop        #loop if lost reservation

Fetch and Add

The "Fetch and Add" primitive atomically increments a word in storage.

In this example it is assumed that the address of the word to be incremented is in GPR 3, the increment is in GPR 4, and the old value is returned in GPR 5.

    loop:
        lwarx   r5,0,r3     #load and reserve
        add     r0,r4,r5    #increment word
        stwcx.  r0,0,r3     #store new value if still res'ved
        bne-    loop        #loop if lost reservation

Fetch and AND

The "Fetch and AND" primitive atomically ANDs a value into a word in storage.

In this example it is assumed that the address of the word to be ANDed is in GPR 3, the value to AND into it is in GPR 4, and the old value is returned in GPR 5.

    loop:
        lwarx   r5,0,r3     #load and reserve
        and     r0,r4,r5    #AND word
        stwcx.  r0,0,r3     #store new value if still res'ved
        bne-    loop        #loop if lost reservation

Note:

1. The sequence given above can be changed to perform another Boolean operation atomically on a word in storage, simply by changing the and instruction to the desired Boolean instruction (or, xor, etc.).

Test and Set

This version of the "Test and Set" primitive atomically loads a word from storage, sets the word in storage to a nonzero value if the value loaded is zero, and sets the EQ bit of CR Field 0 to indicate whether the value loaded is zero.

In this example it is assumed that the address of the word to be tested is in GPR 3, the new value (nonzero) is in GPR 4, and the old value is returned in GPR 5.

    loop:
        lwarx   r5,0,r3     #load and reserve
        cmpwi   r5,0        #done if word not equal to 0
        bne-    exit
        stwcx.  r4,0,r3     #try to store non-0
        bne-    loop        #loop if lost reservation
    exit: ...

Compare and Swap

The "Compare and Swap" primitive atomically compares a value in a register with a word in storage, if they are equal stores the value from a second register into the word in storage, if they are unequal loads the word from storage into the first register, and sets the EQ bit of CR Field 0 to indicate the result of the comparison.

In this example it is assumed that the address of the word to be tested is in GPR 3, the comparand is in GPR 4 and the old value is returned there, and the new value is in GPR 5.

    loop:
        lwarx   r6,0,r3     #load and reserve
        cmpw    r4,r6       #1st 2 operands equal?
        bne-    exit        #skip if not
        stwcx.  r5,0,r3     #store new value if still res'ved
        bne-    loop        #loop if lost reservation
    exit:
        mr      r4,r6       #return value from storage

Notes:

1. The semantics given for "Compare and Swap" above are based on those of the IBM System/370 Compare and Swap instruction. Other architectures may define a Compare and Swap instruction differently.

2. "Compare and Swap" is shown primarily for pedagogical reasons. It is useful on machines that lack the better synchronization facilities provided by lwarx and stwcx.. A major weakness of a System/370-style Compare and Swap instruction is that, although the instruction itself is atomic, it checks only that the old and current values of the word being tested are equal, with the result that programs that use such a Compare and Swap to control a shared resource can err if the word has been modified and the old value subsequently restored. The sequence shown above has the same weakness.

3. In some applications the second bne- instruction and/or the mr instruction can be omitted. The bne- is needed only if the application requires that if the EQ bit of CR Field 0 on exit indicates "not equal" then (r4) and (r6) are in fact not equal. The mr is needed only if the application requires that if the comparands are not equal then the word from storage is loaded into the register with which it was compared (rather than into a third register). If either or both of these instructions is omitted, the resulting Compare and Swap does not obey System/370 semantics.

B.2 Lock Acquisition and Release, and Related Techniques

This section gives examples of how dependencies and the Synchronization instructions can be used to implement locks, import and export barriers, and similar constructs.
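For readers who work in a higher-level language, the primitives of Section B.1 and a simple lock built from them can be approximated with C11 atomics. The sketch below is illustrative only and is not part of the architecture: all function names are hypothetical, and the lock uses 0 for "free" and 1 for "held" as an assumed convention. On Power, compilers commonly implement such operations with lwarx/stwcx. loops plus a barrier instruction, but the instructions actually chosen are up to the compiler.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* "Fetch and Add": adds to the word and returns its old value. */
uint32_t fetch_and_add(_Atomic uint32_t *word, uint32_t increment)
{
    return atomic_fetch_add_explicit(word, increment, memory_order_relaxed);
}

/* "Compare and Swap": if *word equals *comparand, store new_value and
 * return true; otherwise copy the value found into *comparand. */
bool compare_and_swap(_Atomic uint32_t *word, uint32_t *comparand,
                      uint32_t new_value)
{
    return atomic_compare_exchange_strong_explicit(
        word, comparand, new_value,
        memory_order_relaxed, memory_order_relaxed);
}

/* "Test and Set": store nonzero_value only if the word is zero;
 * return the value that was found (0 means the store happened). */
uint32_t test_and_set(_Atomic uint32_t *word, uint32_t nonzero_value)
{
    uint32_t found = 0;
    atomic_compare_exchange_strong_explicit(
        word, &found, nonzero_value,
        memory_order_relaxed, memory_order_relaxed);
    return found;
}

/* Try once to take the lock. Acquire ordering on success keeps later
 * accesses to the protected data from moving ahead of the lock. */
bool lock_try_acquire(_Atomic uint32_t *lock)
{
    uint32_t expected = 0;                    /* assumed "free" value */
    return atomic_compare_exchange_strong_explicit(
        lock, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Release the lock. Release ordering makes earlier stores to the
 * protected data visible before the lock is seen as free. */
void lock_release(_Atomic uint32_t *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

The acquire and release orderings play the roles that this section assigns to import and export barriers; the relaxed orderings on the plain primitives reflect that Section B.1 makes no ordering claims beyond atomicity.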
B.2.1 Lock Acquisition and Import Barriers

An "import barrier" is an instruction or sequence of instructions that prevents storage accesses caused by instructions following the barrier from being performed before storage accesses that acquire a lock have been performed. An import barrier can be used to ensure that a shared data structure protected by a lock is not accessed until the lock has been acquired. A sync instruction can be used as an import barrier, but the approaches shown below will generally yield better performance because they order only the relevant storage accesses.

B.2.1.1 Acquire Lock and Import Shared Storage

If lwarx and stwcx. instructions are used to obtain the lock, an import barrier can be constructed by placing an isync instruction immediately following the loop containing the lwarx and stwcx.. The following example uses the "Compare and Swap" primitive to acquire the lock.

In this example it is assumed that the address of the lock is in GPR 3, the value indicating that the lock is free is in GPR 4, the value to which the lock should be set is in GPR 5, the old value of the lock is returned in GPR 6, and the address of the shared data structure is in GPR 9.

    loop:
        lwarx   r6,0,r3,1       #load lock and reserve
        cmpw    r4,r6           #skip ahead if
        bne-    wait            # lock not free
        stwcx.  r5,0,r3         #try to set lock
        bne-    loop            #loop if lost reservation
        isync                   #import barrier
        lwz     r7,data1(r9)    #load shared data
        .
        .
    wait...                     #wait for lock to free

The hint provided with lwarx indicates that after the program acquires the lock variable (i.e. stwcx. is successful), it will release it (i.e. store to it) prior to another program attempting to modify it.

The second bne- does not complete until CR0 has been set by the stwcx.. The stwcx. does not set CR0 until it has completed (successfully or unsuccessfully). The lock is acquired when the stwcx. completes successfully. Together, the second bne- and the subsequent isync create an import barrier that prevents the load from "data1" from being performed until the branch has been resolved not to be taken.

If the shared data structure is in storage that is neither Write Through Required nor Caching Inhibited, an lwsync instruction can be used instead of the isync instruction. If lwsync is used, the load from "data1" may be performed before the stwcx.. But if the stwcx. fails, the second branch is taken and the lwarx is re-executed. If the stwcx. succeeds, the value returned by the load from "data1" is valid even if the load is performed before the stwcx., because the lwsync ensures that the load is performed after the instance of the lwarx that created the reservation used by the successful stwcx..

B.2.1.2 Obtain Pointer and Import Shared Storage

If lwarx and stwcx. instructions are used to obtain a pointer into a shared data structure, an import barrier is not needed if all the accesses to the shared data structure depend on the value obtained for the pointer. The following example uses the "Fetch and Add" primitive to obtain and increment the pointer.

In this example it is assumed that the address of the pointer is in GPR 3, the value to be added to the pointer is in GPR 4, and the old value of the pointer is returned in GPR 5.

    loop:
        lwarx   r5,0,r3         #load pointer and reserve
        add     r0,r4,r5        #increment the pointer
        stwcx.  r0,0,r3         #try to store new value
        bne-    loop            #loop if lost reservation
        lwz     r7,data1(r5)    #load shared data

The load from "data1" cannot be performed until the pointer value has been loaded into GPR 5 by the lwarx. The load from "data1" may be performed before the stwcx.. But if the stwcx. fails, the branch is taken and the value returned by the load from "data1" is discarded. If the stwcx. succeeds, the value returned by the load from "data1" is valid even if the load is performed before the stwcx., because the load uses the pointer value returned by the instance of the lwarx that created the reservation used by the successful stwcx..

An isync instruction could be placed between the bne- and the subsequent lwz, but no isync is needed if all accesses to the shared data structure depend on the value returned by the lwarx.

B.2.2 Lock Release and Export Barriers

An "export barrier" is an instruction or sequence of instructions that prevents the store that releases a lock from being performed before stores caused by instructions preceding the barrier have been performed. An export barrier can be used to ensure that all stores to a shared data structure protected by a lock will be performed with respect to any other processor before the store that releases the lock is performed with respect to that processor.

B.2.2.1 Export Shared Storage and Release Lock

A sync instruction can be used as an export barrier independent of the storage control attributes (e.g., presence or absence of the Caching Inhibited attribute) of the storage containing the shared data structure. Because the lock must be in storage that is neither Write Through Required nor Caching Inhibited, if the shared data structure is in storage that is Write Through Required or Caching Inhibited a sync instruction must be used as the export barrier.

In this example it is assumed that the shared data structure is in storage that is Caching Inhibited, the address of the lock is in GPR 3, the value indicating that the lock is free is in GPR 4, and the address of the shared data structure is in GPR 9.

        stw     r7,data1(r9)    #store shared data (last)
        sync                    #export barrier
        stw     r4,lock(r3)     #release lock

The sync ensures that the store that releases the lock will not be performed with respect to any other processor until all stores caused by instructions preceding the sync have been performed with respect to that processor.

B.2.2.2 Export Shared Storage and Release Lock using lwsync

If the shared data structure is in storage that is neither Write Through Required nor Caching Inhibited, an lwsync instruction can be used as the export barrier. Using lwsync rather than sync will yield better performance in most systems.

In this example it is assumed that the shared data structure is in storage that is neither Write Through Required nor Caching Inhibited, the address of the lock is in GPR 3, the value indicating that the lock is free is in GPR 4, and the address of the shared data structure is in GPR 9.

        stw     r7,data1(r9)    #store shared data (last)
        lwsync                  #export barrier
        stw     r4,lock(r3)     #release lock

The lwsync ensures that the store that releases the lock will not be performed with respect to any other processor until all stores caused by instructions preceding the lwsync have been performed with respect to that processor.

B.2.3 Safe Fetch

If a load must be performed before a subsequent store (e.g., the store that releases a lock protecting a shared data structure), a technique similar to the following can be used.

In this example it is assumed that the address of the storage operand to be loaded is in GPR 3, the contents of the storage operand are returned in GPR 4, and the address of the storage operand to be stored is in GPR 5.

        lwz     r4,0(r3)        #load shared data
        cmpw    r4,r4           #set CR0 to "equal"
        bne-    $-8             #branch never taken
        stw     r7,0(r5)        #store other shared data

An alternative is to use a technique similar to that described in Section B.2.1.2, by causing the stw to depend on the value returned by the lwz and omitting the cmpw and bne-. The dependency could be created by ANDing the value returned by the lwz with zero and then adding the result to the value to be stored by the stw. If both storage operands are in storage that is neither Write Through Required nor Caching Inhibited, another alternative is to replace the cmpw and bne- with an lwsync instruction.

B.3 List Insertion

This section shows how the lwarx and stwcx. instructions can be used to implement simple insertion into a singly linked list. (Complicated list insertion, in which multiple values must be changed atomically, or in which the correct order of insertion depends on the contents of the elements, cannot be implemented in the manner shown below and requires a more complicated strategy such as using locks.)

The "next element pointer" from the list element after which the new element is to be inserted, here called the "parent element", is stored into the new element, so that the new element points to the next element in the list; this store is performed unconditionally. Then the address of the new element is conditionally stored into the parent element, thereby adding the new element to the list.

In this example it is assumed that the address of the parent element is in GPR 3, the address of the new element is in GPR 4, and the next element pointer is at offset 0 from the start of the element. It is also assumed that the next element pointer of each list element is in a reservation granule separate from that of the next element pointer of all other list elements.

    loop:
        lwarx   r2,0,r3         #get next pointer
        stw     r2,0(r4)        #store in new element
        lwsync or sync          #order stw before stwcx
        stwcx.  r4,0,r3         #add new element to list
        bne-    loop            #loop if stwcx. failed

In the preceding example, if two list elements have next element pointers in the same reservation granule then, in a multiprocessor, "livelock" can occur. (Livelock is a state in which processors interact in a way such that no processor makes forward progress.)

If it is not possible to allocate list elements such that each element's next element pointer is in a different reservation granule, then livelock can be avoided by using the following, more complicated, sequence.

        lwz     r2,0(r3)        #get next pointer
    loop1:
        mr      r5,r2           #keep a copy
        stw     r2,0(r4)        #store in new element
        sync                    #order stw before stwcx.
                                # and before lwarx
    loop2:
        lwarx   r2,0,r3         #get it again
        cmpw    r2,r5           #loop if changed (someone
        bne-    loop1           # else progressed)
        stwcx.  r4,0,r3         #add new element to list
        bne-    loop2           #loop if failed

In the preceding example, livelock is avoided by the fact that each processor re-executes the stw only if some other processor has made forward progress.

B.4 Notes

1. To increase the likelihood that forward progress is made, it is important that looping on lwarx/stwcx. pairs be minimized. For example, in the "Test and Set" sequence shown in Section B.1, this is achieved by testing the old value before attempting the store; were the order reversed, more stwcx. instructions might be executed, and reservations might more often be lost between the lwarx and the stwcx.

2. The manner in which lwarx and stwcx. are communicated to other processors and mechanisms, and between levels of the storage hierarchy within a given processor, is implementation-dependent. In some implementations performance may be improved by minimizing looping on a lwarx instruction that fails to return a desired value. For example, in the "Test and Set" sequence shown in Section B.1, if the programmer wishes to stay in the loop until the word loaded is zero, he could change the "bne- exit" to "bne- loop". However, in some implementations better performance may be obtained by using an ordinary Load instruction to do the initial checking of the value, as follows.

    loop:
        lwz     r5,0(r3)        #load the word
        cmpwi   r5,0            #loop back if word
        bne-    loop            # not equal to 0
        lwarx   r5,0,r3         #try again, reserving
        cmpwi   r5,0            # (likely to succeed)
        bne-    loop
        stwcx.  r4,0,r3         #try to store non-0
        bne-    loop            #loop if lost reserv'n

3. In a multiprocessor, livelock is possible if there is a Store instruction (or any other instruction that can clear another processor's reservation; see Section 1.7.3.1) between the lwarx and the stwcx. of a lwarx/stwcx. loop and any byte of the storage location specified by the Store is in the reservation granule. For example, the first code sequence shown in Section B.3 can cause livelock if two list elements have next element pointers in the same reservation granule.

Book III-S: Power ISA Operating Environment Architecture - Server Environment

Chapter 1. Introduction

    1.1 Overview
    1.2 Document Conventions
        1.2.1 Definitions and Notation
        1.2.2 Reserved Fields
    1.3 General Systems Overview
    1.4 Exceptions
    1.5 Synchronization
        1.5.1 Context Synchronization
        1.5.2 Execution Synchronization

1.1 Overview

Chapter 1 of Book I describes computation modes, document conventions, a general systems overview, instruction formats, and storage addressing. This chapter augments that description as necessary for the Power ISA Operating Environment Architecture.

1.2 Document Conventions

The notation and terminology used in Book I apply to this Book also, with the following substitutions.

• For "system alignment error handler" substitute "Alignment interrupt".

• For "system data storage error handler" substitute "Data Storage interrupt", "Hypervisor Data Storage interrupt", "Data Segment interrupt", or "Hypervisor Data Segment interrupt," as appropriate.

• For "system error handler" substitute "interrupt".

• For "system floating-point enabled exception error handler" substitute "Floating-Point Enabled Exception type Program interrupt".

• For "system illegal instruction error handler" substitute "Illegal Instruction type Program interrupt". (If Category: HEA is supported, see the Programming Note in Section 6.5.9.)

• For "system instruction storage error handler" substitute "Instruction Storage interrupt", "Hypervisor Instruction Storage interrupt", "Instruction Segment interrupt", or "Hypervisor Instruction Segment interrupt", as appropriate.

• For "system privileged instruction error handler" substitute "Privileged Instruction type Program interrupt".

• For "system service program" substitute "System Call interrupt".

• For "system trap handler" substitute "Trap type Program interrupt".

1.2.1 Definitions and Notation

The definitions and notation given in Book I are augmented by the following.

• real page
  A unit of real storage that is aligned at a boundary that is a multiple of its size. The real page size is 4KB.

• context of a program
  The processor state (e.g., privilege and relocation) in which the program executes. The context is controlled by the contents of certain System Registers, such as the MSR and SDR1, of certain lookaside buffers, such as the SLB and TLB, and of the Page Table.

• exception
  An error, unusual condition, or external signal, that may set a status bit and may or may not cause an interrupt, depending upon whether the corresponding interrupt is enabled.

• interrupt
  The act of changing the machine state in response to an exception, as described in Chapter 6. "Interrupts" on page 547.

• trap interrupt
  An interrupt that results from execution of a Trap instruction.

• Additional exceptions to the rule that the processor obeys the sequential execution model, beyond those described in the section entitled "Instruction Fetching" in Book I, are the following.

  - A System Reset or Machine Check interrupt may occur. The determination of whether an instruction is required by the sequential execution model is not affected by the potential occurrence of a System Reset or Machine Check interrupt. (The determination is affected by the potential occurrence of any other kind of interrupt.)

  - A context-altering instruction is executed (Chapter 10. "Synchronization Requirements for Context Alterations" on page 585). The context alteration need not take effect until the required subsequent synchronizing operation has occurred.

  - A Reference and Change bit is updated by the processor. The update need not be performed with respect to that processor until the required subsequent synchronizing operation has occurred.

  - A Branch instruction is executed and the branch is taken.

• /, //, ///, ... denotes a field that is reserved in an instruction, in a register, or in an architected storage table.

• ?, ??, ???, ... denotes a field that is implementation-dependent in an instruction, in a register, or in an architected storage table.

1.2.2 Reserved Fields

Book I's description of the handling of reserved bits in System Registers, and of reserved values of defined fields of System Registers, applies also to the SLB. Book I's description of the handling of reserved values of defined fields of System Registers applies also to architected storage tables (e.g., the Page Table).

Some fields of certain architected storage tables may be written to automatically by the processor, e.g., Reference and Change bits in the Page Table. When the processor writes to such a table, the following rules are obeyed.
The update of the Come- 1 Unless otherwise stated, no defined field other From Address Register (see Section 8.1.1 than the one(s) the processor is specifically updat- of Book III-S) need not occur until a subse- ing are modified. quent context synchronizing operation has occurred. 1 Contents of reserved fields are either preserved by the processor or written as zero. 1 "must" If hypervisor software violates a rule that is stated Programming Note using the word "must" (e.g., "this field must be set to 0"), and the rule pertains to the contents of a Software should set reserved fields in the SLB and hypervisor resource, to executing an instruction in architected storage tables to zero, because that can be executed only in hypervisor state, or to these fields may be assigned a meaning in some accessing storage in real addressing mode, the future version of the architecture. results are undefined, and may include altering resources belonging to other partitions, causing the system to "hang", etc. 1.3 General Systems Overview 1 hardware The processor or processor unit contains the sequenc- Any combination of hard-wired implementation, ing and processing controls for instruction fetch, emulation assist, or interrupt for software assis- instruction execution, and interrupt action. Most imple- tance. In the last case, the interrupt may be to an mentations also contain data and instruction caches. architected location or to an implementation- Instructions that the processing unit can execute fall dependent location. Any use of emulation assists into the following classes: or interrupts to implement the architecture is imple- mentation-dependent. 
1 instructions executed in the Branch Processor 1 instructions executed in the Fixed-Point Processor 1 hypervisor privileged 1 instructions executed in the Floating-Point Proces- A term used to describe an instruction or facility sor that is available only when the procesor is in 1 instructions executed in the Vector Processor hypervisor state. Almost all instructions executed in the Branch Proces- 1 privileged state and supervisor mode sor, Fixed-Point Processor, Floating-Point Processor, Used interchangeably to refer to a processor state and Vector Processor are nonprivileged and are in which privileged facilities are available. described in Book I. Book II may describe additional 1 problem state and user mode nonprivileged instructions (e.g., Book II describes some Used interchangeably to refer to a processor state nonprivileged instructions for cache management). in which privileged facilities are not available. Instructions related to the privileged state of the pro- cessor, control of processor resources, control of the storage hierarchy, and all other privileged instructions are described here or are implementation-dependent. 468 Power ISATM III-S Version 2.05 1.4 Exceptions 5. The operation ensures that the instructions that fol- low the operation will be fetched and executed in The following augments the exceptions defined in Book the context established by the operation. (This I that can be caused directly by the execution of an requirement dictates that any prefetched instruc- instruction: tions be discarded and that any effects and side effects of executing them out-of-order also be dis- 1 the execution of a floating-point instruction when carded, except as described in Section 5.5, "Per- MSRFP=0 (Floating-Point Unavailable interrupt) forming Operations Out-of-Order".) 
1 an attempt to modify a hypervisor resource when the processor is in privileged but non-hypervisor Programming Note state (see Chapter 2), or an attempt to execute a A context synchronizing operation is necessarily hypervisor-only instruction (e.g., tlbie) when the execution synchronizing; see Section 1.5.2. processor is in privileged but non-hypervisor state Unlike the Synchronize instruction, a context syn- 1 the execution of a traced instruction (Trace inter- chronizing operation does not affect the order in rupt) which storage accesses are performed. 1 the execution of a Vector instruction when the vec- Item 2 permits a choice only for isync (and sync tor processor is unavailable (Vector Unavailable and ptesync; see Section 1.5.2) because all other interrupt) execution synchronizing operations also alter con- text. 1.5 Synchronization The synchronization described in this section refers to 1.5.2 Execution Synchronization the state of the processor that is performing the syn- An instruction is execution synchronizing if it satisfies chronization. items 2 and 3 of the definition of context synchroniza- tion (see Section 1.5.1). sync and ptesync are treated 1.5.1 Context Synchronization like isync with respect to item 2 (i.e., the conditions described in item 2 apply to the completion of sync and An instruction or event is context synchronizing if it sat- ptesync). Examples of execution synchronizing isfies the requirements listed below. Such instructions instructions include sync, ptesync, and mtmsrd. and events are collectively called context synchronizing An instruction is execution synchronizing if it satisfies operations. The context synchronizing operations are items 2 and 3 of the definition of context synchroniza- the isync instruction, the System Linkage instructions, tion (see Section 1.5.1). sync and ptesync are treated the mtmsr[d] instructions with L=0, and most interrupts like isync with respect to item 2. The execution syn- (see Section 6.4). 
chronizing instructions are sync, ptesync, the 1. The operation causes instruction dispatching (the mtmsr[d] instructions with L=1, and all context issuance of instructions by the instruction fetching synchronizing instructions. mechanism to any instruction execution mecha- nism) to be halted. Programming Note 2. The operation is not initiated or, in the case of All context synchronizing instructions are execution isync and wait [Category: Wait], does not com- synchronizing. plete, until all instructions that precede the opera- Unlike a context synchronizing operation, an exe- tion have completed to a point at which they have cution synchronizing instruction does not ensure reported all exceptions they will cause. that the instructions following that instruction will 3. The operation ensures that the instructions that execute in the context established by that instruc- precede the operation will complete execution in tion. This new context becomes effective some- the context (privilege, relocation, storage protec- time after the execution synchronizing instruction tion, etc.) in which they were initiated, except that completes and before or at a subsequent context the operation has no effect on the context in which synchronizing operation. the associated Reference and Change bit updates are performed. 4. If the operation directly causes an interrupt (e.g., sc directly causes a System Call interrupt) or is an interrupt, the operation is not initiated until no exception exists having higher priority than the exception associated with the interrupt (see Sec- tion 6.8). Chapter 1. Introduction 469 Version 2.05 470 Power ISATM III-S Version 2.05 Chapter 2. Logical Partitioning (LPAR) 2.1 Overview. . . . . . . . . . . . . . . . . . . . 471 2.6 Processor Compatibility Register 2.2 Logical Partitioning Control Register (PCR). . . . . . . . . . . . . . . . . . . . . . . . . . 474 (LPCR) . . . . . . . . . . . . . . . . . . . . . . . . 471 2.7 Other Hypervisor Resources . . . 
. . 475 2.3 Real Mode Offset Register (RMOR) . . 2.8 Sharing Hypervisor Resources . . . 476 473 2.9 Hypervisor Interrupt Little-Endian 2.4 Hypervisor Real Mode Offset Register (HILE) Bit . . . . . . . . . . . . . . . . . . . . . . . 476 (HRMOR) . . . . . . . . . . . . . . . . . . . . . . 474 2.5 Logical Partition Identification Register (LPIDR) . . . . . . 474 2.1 Overview The number of partitions supported is implementation- dependent. The Logical Partitioning (LPAR) facility permits proces- A processor is assigned to one partition at any given sors and portions of real storage to be assigned to logi- time. A processor can be assigned to any given parti- cal collections called partitions, such that a program tion without consideration of the physical configuration executing on a processor in one partition cannot inter- of the system (e.g., shared registers, caches, organiza- fere with any program executing on a processor in a tion of the storage hierarchy), except that processors different partition. This isolation can be provided for that share certain hypervisor resources may need to be both problem state and privileged state programs, by assigned to the same partition; see Section 2.7. The using a layer of trusted software, called a hypervisor registers and facilities used to control Logical Partition- program (or simply a "hypervisor"), and the resources ing are listed below and described in the following sub- provided by this facility to manage system resources. sections. (A hypervisor is a program that runs in hypervisor state; see below.) Except in the following subsections, references to the "operating system" in this document include the hyper- visor unless otherwise stated or obvious from context. 2.2 Logical Partitioning Control Register (LPCR) The layout of the Logical Partitioning Control Register (LPCR) is shown in Figure 1 below. 
VRMASD HDICE RMLS DPFD PECE LPES /// /// /// /// /// /// MER ILE TC VC 0 3 9 12 17 34 38 39 49 52 5354 55 60 62 63 Figure 1. Logical Partitioning Control Register The contents of the LPCR control a number of aspects cal partition. Below are shown the bit definitions for the of the operation of the processor with respect to a logi- LPCR. Chapter 2. Logical Partitioning (LPAR) 471 Version 2.05 Bit Description allowed values of the L and LP fields are the same as for the corresponding fields in the 0:2 Virtualization Control (VC) segment descriptor. (See Section 5.7.7.) If Controls the virtualization of partition memory. VPM0=0 or address translation is enabled, the This field contains two subfields, VPM and setting of the VRMASD has no effect. ISL. 0:1 Virtualized Partition Memory (VPM) Bit Description This field controls whether VPM mode is 0 Virtual Page Size Selector Bit 0 (L) enabled as specified below. (See Section 1:2 Reserved 5.7.3.4 and Section 5.7.2, "Virtualized Par- 3:4 Virtual Page Size Selector Bits 1:2 (LP) tition Memory (VPM) Mode" for additional information on VPM mode.) Programming Note 17:33 Reserved Bit Description 34:37 Real Mode Limit Selector (RMLS) 0 This bit controls whether VPM mode is enabled when address translation is The RMLS field specifies the largest effective disabled address that can be used by partition software 0 - VPM mode disabled when address translation is disabled. The 1 - VPM mode enabled valid RMLS values are implementation- dependent, and each value corresponds to a 1 This bit controls whether VPM mode is maximum effective address of 2m, where m enabled when address translation is has a minimum value of 12 and a maximum enabled value equal to the number of bits in the real 0 - VPM mode disabled address size supported by the implementa- 1 - VPM mode enabled tion. 
38 Interrupt Little-Endian (ILE) 2 Ignore SLB Large Page Specification (ISL) The contents of the ILE bit are copied into MSRLE by interrupts that set MSRHV to 0 (see Controls whether ISL mode is enabled as Section 6.5), to establish the Endian mode for specified below. the interrupt handler. 0 - ISL mode disabled 39:48 Reserved 1 - ISL mode enabled 49:51 Power-saving mode Exit Cause Enable When ISL mode is enabled and address (PECE) translation is enabled and the processor is 49 If PECE0 = 1 when a power-saving mode not in hypervisor state, address translation instruction is executed, External exceptions is performed as if the contents of SLBL||LP are enabled to cause exit from power-saving were 0b000. When address translation is mode; otherwise External exceptions are dis- disabled, the setting of the ISL bit has no abled from causing exit from power-saving effect. ISL mode has no effect on SLB, TLB, mode. and ERAT entry invalidations caused by slbie, slbia, tlbia, tlbie, and slbie. 50 If PECE1 = 1 when a power-saving mode instruction is executed, Decrementer excep- 3:8 Reserved tions are enabled to cause exit from power- 9:11 Default Prefetch Depth (DPFD) saving mode; otherwise Decrementer excep- tions are disabled from causing exit from The DPFD field is used as the default prefetch power-saving mode. (In sleep and rvwinkle depth for data stream prefetching when power-saving levels, Decrementer exceptions DSCRDPFD=0; see page 430. do not occur if the state of the Decrementer is 12:16 Virtual Real Mode Area Segment Descrip- not maintained and updated as if the proces- tor (VRMASD) sor was not in power-saving mode.) 
When address translation is disabled and 51 If PECE2=1 when a power-saving mode VPM0=1, the contents of this field specify the instruction is executed, Machine Check, L and LP fields of the segment descriptor that Hypervisor Maintenance, and certain imple- apply for storage references to the virtualized mentation-specific exceptions are enabled to real mode area (VRMA). See Section 5.7.3.4 cause exit from power-saving mode; other- for additional information. The definitions and wise Machine Check, Hypervisor Mainte- 472 Power ISATM III-S Version 2.05 nance, and the same implementation-specific Three of the four LPES values are sup- exceptions are disabled from causing exit ported. The 0b10 value is reserved. from power-saving mode. 60 LPES0 It is implementation-specific whether the exceptions Controls whether External interrupts set enabled by the PECE field cause exit from sleep and MSRHV to 1 and MSRRI to 0, or leaves them rvwinkle power-saving levels. See section 6.6.1 and unchanged. section 6.6.2 for additional information about exit from power-saving mode. 61 LPES1 52 Mediated External Exception Request Controls how storage is accessed when (MER) address translation is disabled, and whether a subset of interrupts set MSRHV to 1. 0 A Mediated External exception is not requested. Programming Note 1 A Mediated External exception is requested. LPES1=0 provides an environment in which only the hypervisor can run with The exception effects of this bit are said to be address translation disabled and in which consistent with the contents of this bit if one of all interrupts invoke the hypervisor. This the following statements is true. value (along with MSRHV=1) can also be - LPCRMER = 1 and a Mediated External used in a system that is not partitioned, to exception exists. permit the operating system to access all - LPCRMER = 0 and a Mediated External system resources. exception does not exist. 
A context synchronizing instruction or event 62 Reserved that is executed or occurs when LPCRMER = 0 63 Hypervisor Decrementer Interrupt Condi- ensures that the exception effects of tionally Enable (HDICE) LPCRMER are consistent with the contents of LPCRMER. Otherwise, when an instruction 0 Hypervisor Decrementer interrupts are changes the contents of LPCRMER, the disabled. exception effects of LPCRMER become con- 1 Hypervisor Decrementer interrupts are sistent with the new contents of LPCRMER enabled if permitted by MSREE, MSRHV, reasonably soon after the change. and MSRPR; see Section 6.5.12 on page 565. Programming Note See Section 5.7.3 on page 508 (including subsections) LPCRMER provides a means for the and Section 5.7.9 on page 524 for a description of how hypervisor to direct an external exception storage accesses are affected by the setting of LPES1, to a partition independent of the partition's and RMLS. See Section 6.5 on page 555 for a descrip- MSREE setting. (When MSREE=0, it is tion of how the setting of LPES0:1 affects the process- inappropriate for the hypervisor to deliver ing of interrupts. the exception.) Using LPCRMER, the par- tition can be interrupted upon enabling external interrupts. Without using 2.3 Real Mode Offset Register LPCRMER, the hypervisor must check the state of MSREE whenever it gets control, (RMOR) which will result in less timely delivery of The layout of the Real Mode Offset Register (RMOR) is the exception to the partition. shown in Figure 2 below. 53 Reserved // RMO 54 Translation Control (TC) 0 4 63 0 The secondary Page Table search is Bits Name Description enabled. 4:63 RMO Real Mode Offset 1 The secondary Page Table search is dis- abled. Figure 2. Real Mode Offset Register 55:59 Reserved All other fields are reserved. 60:61 Logical Partitioning Environment Selector The supported RMO values are the non-negative multi- (LPES) ples of 2s, where 2s is the smallest implementation- Chapter 2. 
Logical Partitioning (LPAR) 473 Version 2.05 dependent limit value representable by the contents of Programming Note the Real Mode Limit Selector field of the LPCR. On some implementations, software must prevent The contents of the RMOR affect how some storage the execution of a tlbie instruction on any proces- accesses are performed as described in Section 5.7.3 sor for which the contents of the LPIDR is the same on page 508 and Section 5.7.4 on page 512. as on the processor on which the LPIDR is being modified or is the same as the new value being written to the LPIDR. This restriction can be met 2.4 Hypervisor Real Mode Offset with less effort if one partition identity is used only Register (HRMOR) on processors on which no tlbie instruction is ever executed. This partition can be thought of as the The layout of the Hypervisor Real Mode Offset Register transfer partition used exclusively to move a pro- (HRMOR) is shown in Figure 3 below. cessor from one partition to another. // HRMO 0 4 63 2.6 Processor Compatibility Bits Name Description Register (PCR) 4:63 HRMO Real Mode Offset The layout of the Processor Compatibility Register Figure 3. Hypervisor Real Mode Offset Register (PCR) is shown in Figure 5 below. All other fields are reserved. PCR The supported HRMO values are the non-negative 0 1 62 63 multiples of 2r, where r is an implementation-dependent value and 12 r 26. Figure 5. Processor Compatibility Register The contents of the HRMOR affect how some storage Each defined bit in the PCR controls whether certain accesses are performed as described in Section 5.7.3 instructions, SPRs, and other related facilities are avail- on page 508 and Section 5.7.4 on page 512. able in problem state. The PCR has no effect on facili- ties when the processor is not in problem state. Facilities that are made unavailable by the PCR are 2.5 Logical Partition treated as follows when the processor is in problem state. 
Identification Register (LPIDR) - Instructions are treated as illegal instructions, The layout of the Logical Partition Identification Regis- - SPRs are treated as if they were not defined ter (LPIDR) is shown in Figure 4 below. for the implementation LPID - Fields in instructions are treated as reserved. 32 63 A PCR bit may also determine how an instruction field value is interpreted or may define other behavior as Bits Name Description specified in the bit definitions below. 32:63 LPID Logical Partition Identifier The PCR has no effect on the setting of the MSR by Figure 4. Logical Partition Identification Register interrupts or instructions such as rfid, mtmsrd. The contents of the LPIDR identify the partition to When facilities that have enable bits in the MSR are which the processor is assigned, affecting operations made unavailable by the value in the PCR (e.g. Vector necessary to manage the coherency of some transla- and Decimal Floating-Point Facility), they become tion lookaside buffers (see Section 5.10.1 and Chapter unavailable as specified above regardless of whether 10.). The number of LPIDR bits supported is implemen- they are enabled by the corresponding MSR bit. tation-dependent. The bit definitions for the PCR are shown below. Bit Description 0 Vector [Category: Vector] This bit controls the availability, in problem state, of the instructions and facilities in the Vector category. 0 The instructions and facilities in the Vector category are available in problem state. 474 Power ISATM III-S Version 2.05 1 The instructions and facilities in the Vector Programming Note category are unavailable in problem state. Since the PCR has no effect on privileged instruc- 1:62 Reserved tions, privileged instructions that are available on 63 Version 2.04 (v2.04) newer processors but not available on older pro- cessors will behave differently when the processor This bit controls the availability, in problem is in problem state. 
On the older processors, an state, of the instructions and related resources Illegal Instruction type Program interrupt will occur that were new in the version of the architec- since the instruction is undefined; on the newer ture subsequent to Version 2.04. processor, a Privileged Instruction type Program - Decimal Floating-point instructions and interrupt will occur since the instruction is imple- facilities [Category: Decimal Floating- mented. Point] - cmpb In future versions of the architecture, in general the - fcpsgn lowest-order reserved bit of the PCR will be used to - lfdp,lfdpx,stfdp,stfdpx control the availability of the instructions and - lfiwax related resources that are new in that version of the - prtyd, prtyw architecture; the name of the bit will correspond to - W field in the mtfsfi instruction the previous version of the architecture (i.e. the - L and W fields in the mtfsf instruction newest version in which the instructions and related resources were not available). 0 The listed instructions and related resources are available in problem state In these future versions of the architecture, there 1 The listed instructions and related will be a requirement that if any bit of the low-order resources are unavailable in problem defined bits is set to 1 then all higher-order bits of state the defined low-order bits must also be set to 1, and the architecture version with which the proces- sor appears to comply, in problem state, will be the The initial state of the PCR is all 0s. version corresponding to the name of the lowest- order 1 bit in the set of defined low-order PCR bits, Programming Note or the current architecture version if none of these Treating the W field in the mtfsfi instruction and bits are 1. Also, in general the highest-order the L and W fields in the mtfsf instruction as 0s has reserved bits will be used to control the availability the effect of making FPSCR0:31 unavailable. 
of sets of instructions and related resources having the requirement that their availability be indepen- dent of versions of the architecture. 2.7 Other Hypervisor Resources In addition to the resources described above, all hyper- visor privileged instructions as well as the following resources are hypervisor resources, accessible to soft- ware only when the processor is in hypervisor state except as noted below. 1 All implementation-specific resources, including implementation-specific registers (e.g., "HID" reg- isters), that control hardware functions or affect the results of instruction execution. Examples include resources that disable caches, disable hardware error detection, set breakpoints, control power management, or significantly affect performance. 1 ME bit of the MSR 1 SPRs defined as hypervisor-privileged in Section 4.4.5. (Note: Although the Time Base, the PURR, and the SPURR can be altered only by a hypervi- sor program, the Time Base can be read by all pro- grams and the PURR and SPURR can be read when the processor is in privileged state.) Chapter 2. Logical Partitioning (LPAR) 475 Version 2.05 The contents of a hypervisor resource can be modified by the execution of an instruction (e.g., mtspr) only in 2.9 Hypervisor Interrupt Little- hypervisor state (MSRHV PR = 0b10). An attempt to Endian (HILE) Bit modify the contents of a given hypervisor resource, other than MSRME, in privileged but non-hypervisor The Hypervisor Interrupt Little-Endian (HILE) bit is a bit state (MSRHV PR = 0b00) causes a Privileged Instruc- in an implementation-dependent register or similar tion type Program interrupt. An attempt to modify mechanism. The contents of the HILE bit are copied MSRME in privileged but non-hypervisor state is into MSRLE by interrupts that set MSRHV to 1 (see Sec- ignored (i.e., the bit is not changed). tion 6.5), to establish the Endian mode for the interrupt handler. 
The HILE bit is set, by an implementation- Programming Note dependent method, during system initialization, and cannot be modified after system initialization. Because the SPRs listed above are privileged for writing, an attempt to modify the contents of any of The contents of the HILE bit must be the same for all these SPRs in problem state (MSRPR=1) using processors under the control of a given instance of the mtspr causes a Privileged Instruction type Pro- hypervisor; otherwise all results are undefined. gram exception, and similarly for MSRME. 2.8 Sharing Hypervisor Resources Some hypervisor resources may be shared among pro- cessors. Programs that modify these resources must be aware of this sharing, and must allow for the fact that changes to these resources may affect more than one processor. The following resources may be shared among processors. 1 RMOR (see Section 2.3.) 1 HRMOR (see Section 2.4.) 1 LPIDR (see Section 2.5.) 1 PVR (see Section 4.3.1.) 1 SDR1 (see Section 5.7.7.2.) 1 Time Base (see Section 7.2.) 1 Hypervisor Decrementer (see Section 7.4.) 1 HMEER (see Section 6.2.9) 1 certain implementation-specific registers The set of resources that are shared is implementation- dependent. Processors that share any of the resources listed above, with the exception of the PVR and the HRMOR, must be in the same partition. For each field of the LPCR, except the HDICE field and the MER field, software must ensure that the contents of the field are identical among all processors that are in the same partition and are in a state such that the contents of the field could have side effects. (E.g., soft- ware must ensure that the contents of LPCRLPES are identical among all processors that are in the same par- tition and are not in hypervisor state.) 
For the HDICE field, software must ensure that the contents of the field are identical among all processors that share the Hypervisor Decrementer and are in a state such that the contents of the field could have side effects. (There are no identity requirements for the MER field.)

Chapter 3. Branch Processor

    3.1 Branch Processor Overview
    3.2 Branch Processor Registers
    3.2.1 Machine State Register
    3.3 Branch Processor Instructions
    3.3.1 System Linkage Instructions
    3.3.2 Power-Saving Mode Instructions
    3.3.2.1 Entering and Exiting Power-Saving Mode

3.1 Branch Processor Overview

This chapter describes the details concerning the registers and the privileged instructions implemented in the Branch Processor that are not covered in Book I.

3.2 Branch Processor Registers

3.2.1 Machine State Register

The Machine State Register (MSR) is a 64-bit register. This register defines the state of the processor. On interrupt, the MSR bits are altered in accordance with Figure 43 on page 555. The MSR can also be modified by the mtmsr[d], rfid, and hrfid instructions. It can be read by the mfmsr instruction.

    |                      MSR                      |
    0                                              63

Figure 6. Machine State Register

Below are shown the bit definitions for the Machine State Register.

Bit     Description

0       Sixty-Four-Bit Mode (SF)
        0  The processor is in 32-bit mode.
        1  The processor is in 64-bit mode.

1:2     Reserved

3       Hypervisor State (HV)
        0  The processor is not in hypervisor state.
        1  If MSR[PR]=0 the processor is in hypervisor state; otherwise the processor is not in hypervisor state.

        Programming Note
        The privilege state of the processor is determined by MSR[HV] and MSR[PR], as follows.

            HV  PR
            0   0   privileged
            0   1   problem
            1   0   privileged and hypervisor
            1   1   problem

        MSR[HV] can be set to 1 only by the System Call instruction and some interrupts. It can be set to 0 only by rfid and hrfid.

4:37    Reserved

38      Vector Available (VEC) [Category: Vector]
        0  The processor cannot execute any vector instructions, including vector loads, stores, and moves.
        1  The processor can execute vector instructions.

39:46   Reserved

47      Reserved

48      External Interrupt Enable (EE)
        0  External and Decrementer interrupts are disabled.
        1  External and Decrementer interrupts are enabled.
        This bit also affects whether Hypervisor Decrementer and Hypervisor Maintenance interrupts are enabled; see Section 6.5.12 on page 565 and Section 6.2.9 on page 549.

49      Problem State (PR)
        0  The processor is in privileged state.
        1  The processor is in problem state.

        Programming Note
        Any instruction that sets MSR[PR] to 1 also sets MSR[EE], MSR[IR], and MSR[DR] to 1.

50      Floating-Point Available (FP) [Category: Floating-Point]
        0  The processor cannot execute any floating-point instructions, including floating-point loads, stores, and moves.
        1  The processor can execute floating-point instructions.

51      Machine Check Interrupt Enable (ME)
        0  Machine Check interrupts are disabled.
        1  Machine Check interrupts are enabled.
        This bit is a hypervisor resource; see Chapter 2., "Logical Partitioning (LPAR)", on page 471.

        Programming Note
        The only instructions that can alter MSR[ME] are rfid and hrfid.

52      Floating-Point Exception Mode 0 (FE0) [Category: Floating-Point]
        See below.

53      Single-Step Trace Enable (SE) [Category: Trace]
        0  The processor executes instructions normally.
        1  The processor generates a Single-Step type Trace interrupt after successfully completing the execution of the next instruction, unless that instruction is hrfid or rfid, which are never traced. Successful completion means that the instruction caused no other interrupt.

54      Branch Trace Enable (BE) [Category: Trace]
        0  The processor executes branch instructions normally.
        1  The processor generates a Branch type Trace interrupt after completing the execution of a branch instruction, whether or not the branch is taken.
        Branch tracing need not be supported on all implementations that support the Trace category. If the function is not implemented, this bit is treated as reserved.

55      Floating-Point Exception Mode 1 (FE1) [Category: Floating-Point]
        See below.

56:57   Reserved

58      Instruction Relocate (IR)
        0  Instruction address translation is disabled.
        1  Instruction address translation is enabled.

        Programming Note
        See the Programming Note in the definition of MSR[PR].

59      Data Relocate (DR)
        0  Data address translation is disabled. Effective Address Overflow (EAO) (see Book I) does not occur.
        1  Data address translation is enabled. EAO causes a Data Storage interrupt.

        Programming Note
        See the Programming Note in the definition of MSR[PR].

60      Reserved

61      Performance Monitor Mark (PMM) [Category: Server.Performance Monitor]
        See Appendix B of Book III-S.

62      Recoverable Interrupt (RI)
        0  Interrupt is not recoverable.
        1  Interrupt is recoverable.
        Additional information about the use of this bit is given in Sections 6.4.3, "Interrupt Processing" on page 551, 6.5.1, "System Reset Interrupt" on page 556, and 6.5.2, "Machine Check Interrupt" on page 557.

63      Little-Endian Mode (LE)
        0  The processor is in Big-Endian mode.
        1  The processor is in Little-Endian mode.

        Programming Note
        The only instructions that can alter MSR[LE] are rfid and hrfid.

The Floating-Point Exception Mode bits FE0 and FE1 are interpreted as shown below. For further details see Book I.

    FE0  FE1  Mode
    0    0    Ignore Exceptions
    0    1    Imprecise Nonrecoverable
    1    0    Imprecise Recoverable
    1    1    Precise
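The two small tables in this section (privilege state from HV and PR, and exception mode from FE0 and FE1) can be read as lookups on a 64-bit MSR image. The following is a minimal sketch of that reading, not part of the architecture: the function names are hypothetical, and the only assumption is that bit 0 is the most significant bit of the 64-bit value, as in the bit numbering used above.

```python
def msr_bit(msr: int, n: int) -> int:
    """Return bit n of a 64-bit MSR image; bit 0 is the most significant bit."""
    return (msr >> (63 - n)) & 1


def privilege_state(msr: int) -> str:
    """Apply the HV/PR table: PR=1 always means problem state."""
    hv = msr_bit(msr, 3)   # Hypervisor State (HV)
    pr = msr_bit(msr, 49)  # Problem State (PR)
    if pr:
        return "problem"
    return "privileged and hypervisor" if hv else "privileged"


# (FE0, FE1) -> mode, copied from the table above.
FE_MODES = {
    (0, 0): "Ignore Exceptions",
    (0, 1): "Imprecise Nonrecoverable",
    (1, 0): "Imprecise Recoverable",
    (1, 1): "Precise",
}


def fp_exception_mode(msr: int) -> str:
    """Look up the floating-point exception mode from MSR bits 52 (FE0) and 55 (FE1)."""
    return FE_MODES[(msr_bit(msr, 52), msr_bit(msr, 55))]
```

For example, an MSR image with only HV set (`1 << 60`) is reported as "privileged and hypervisor", while setting PR as well (`1 << 14`) yields "problem", matching the last row of the HV/PR table.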
3.3 Branch Processor Instructions

3.3.1 System Linkage Instructions

These instructions provide the means by which a program can call upon the system to perform a service, and by which the system can return from performing a service or from processing an interrupt.

The System Call instruction is described in Book I, but only at the level required by an application programmer. A complete description of this instruction appears below.

System Call SC-form

sc LEV

    | 17 | /// | /// | // | LEV | // | 1 | / |
    0    6     11    16   20    27   30  31

    SRR0 ←iea CIA + 4
    SRR1[33:36 42:47] ← 0
    SRR1[0:32 37:41 48:63] ← MSR[0:32 37:41 48:63]
    MSR ← new_value (see below)
    NIA ← 0x0000_0000_0000_0C00

The effective address of the instruction following the System Call instruction is placed into SRR0. Bits 0:32, 37:41, and 48:63 of the MSR are placed into the corresponding bits of SRR1, and bits 33:36 and 42:47 of SRR1 are set to zero.

Then a System Call interrupt is generated. The interrupt causes the MSR to be set as described in Section 6.5, "Interrupt Definitions" on page 555. The setting of the MSR is affected by the contents of the LEV field. LEV values greater than 1 are reserved. Bits 0:5 of the LEV field (instruction bits 20:25) are treated as a reserved field.

The interrupt causes the next instruction to be fetched from effective address 0x0000_0000_0000_0C00.

This instruction is context synchronizing.

Special Registers Altered:
    SRR0 SRR1 MSR

Programming Note
sc serves as both a basic and an extended mnemonic. The Assembler will recognize an sc mnemonic with one operand as the basic form, and an sc mnemonic with no operand as the extended form. In the extended form the LEV operand is omitted and assumed to be 0.

Programming Note
If LEV=1 the hypervisor is invoked. If LPES[1]=1, executing this instruction with LEV=1 is the only way that executing an instruction can cause hypervisor state to be entered.
Because this instruction is not privileged, it is possible for application software to invoke the hypervisor. However, such invocation should be considered a programming error.

Return From Interrupt Doubleword XL-form

rfid

    | 19 | /// | /// | /// | 18 | / |
    0    6     11    16    21   31

    MSR[51] ← (MSR[3] & SRR1[51]) | ((¬MSR[3]) & MSR[51])
    MSR[3] ← MSR[3] & SRR1[3]
    MSR[48] ← SRR1[48] | SRR1[49]
    MSR[58] ← SRR1[58] | SRR1[49]
    MSR[59] ← SRR1[59] | SRR1[49]
    MSR[0:2 4:32 37:41 49:50 52:57 60:63] ← SRR1[0:2 4:32 37:41 49:50 52:57 60:63]
    NIA ←iea SRR0[0:61] || 0b00

If MSR[3]=1 then bits 3 and 51 of SRR1 are placed into the corresponding bits of the MSR. The result of ORing bits 48 and 49 of SRR1 is placed into MSR[48]. The result of ORing bits 58 and 49 of SRR1 is placed into MSR[58]. The result of ORing bits 59 and 49 of SRR1 is placed into MSR[59]. Bits 0:2, 4:32, 37:41, 49:50, 52:57, and 60:63 of SRR1 are placed into the corresponding bits of the MSR.

If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address SRR0[0:61] || 0b00 (when SF=1 in the new MSR value) or ³²0 || SRR0[32:61] || 0b00 (when SF=0 in the new MSR value). If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into SRR0 or HSRR0 by the interrupt processing mechanism (see Section 6.4.3) is the address of the instruction that would have been executed next had the interrupt not occurred.

This instruction is privileged and context synchronizing.

Special Registers Altered:
    MSR

Programming Note
If this instruction sets MSR[PR] to 1, it also sets MSR[EE], MSR[IR], and MSR[DR] to 1.

Hypervisor Return From Interrupt Doubleword XL-form

hrfid

    | 19 | /// | /// | /// | 274 | / |
    0    6     11    16    21    31

    MSR[48] ← HSRR1[48] | HSRR1[49]
    MSR[58] ← HSRR1[58] | HSRR1[49]
    MSR[59] ← HSRR1[59] | HSRR1[49]
    MSR[0:32 37:41 49:57 60:63] ← HSRR1[0:32 37:41 49:57 60:63]
    NIA ←iea HSRR0[0:61] || 0b00

The result of ORing bits 48 and 49 of HSRR1 is placed into MSR[48]. The result of ORing bits 58 and 49 of HSRR1 is placed into MSR[58]. The result of ORing bits 59 and 49 of HSRR1 is placed into MSR[59]. Bits 0:32, 37:41, 49:57, and 60:63 of HSRR1 are placed into the corresponding bits of the MSR.

If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address HSRR0[0:61] || 0b00 (when SF=1 in the new MSR value) or ³²0 || HSRR0[32:61] || 0b00 (when SF=0 in the new MSR value). If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into SRR0 or HSRR0 by the interrupt processing mechanism (see Section 6.4.3) is the address of the instruction that would have been executed next had the interrupt not occurred.

This instruction is hypervisor privileged and context synchronizing.

Special Registers Altered:
    MSR

Programming Note
If this instruction sets MSR[PR] to 1, it also sets MSR[EE], MSR[IR], and MSR[DR] to 1.

3.3.2 Power-Saving Mode Instructions

The power-saving mode instructions provide a means by which the hypervisor can put the processor into power-saving mode. When the processor is in power-saving mode it does not execute instructions, and it may consume less power than it would consume when it is not in power-saving mode.

There are four levels of power-savings, called doze, nap, sleep, and rvwinkle. For each level in this list, the power consumed is less than or equal to the power consumed in the preceding level, and the time required for the processor to exit from the level and for software then to resume normal operation is greater than or equal to the corresponding time for the preceding level. Doze power-saving level requires a minimum amount of such time, while the other levels may require more time. Resources other than those listed in the instruction descriptions that are maintained in each level other than doze, and the actions required by the hypervisor in order for software to resume normal operation after the processor exits from power-saving mode, are implementation-specific.

Read-only resources are maintained in all power-saving levels. Descriptions of resource state loss in the power-saving mode instruction descriptions do not apply to read-only resources.

Programming Note
The hypervisor determines which power-saving level to enter based on how responsive the system needs to be. If the hypervisor decides that some loss of state is acceptable, it can use the nap instruction rather than the doze instruction, and when the processor exits from power-saving mode the hypervisor can quickly determine whether any resources need to be restored.

Doze XL-form

doze

    | 19 | /// | /// | /// | 402 | / |
    0    6     11    16    21    31

The processor is placed into doze power-saving level.

When the processor is in doze power-saving level, the state of all processor resources is maintained as if the processor was not in power-saving mode.

When the interrupt that causes exit from doze power-saving level occurs, resource state is as described in the preceding paragraph, except that if the exception that caused the exit is a System Reset, Machine Check, or Hypervisor Maintenance exception, resource state that would be lost if the exception occurred when the processor was not in power-saving mode may be lost.

This instruction is hypervisor privileged and context synchronizing.

Special Registers Altered:
    None

Nap XL-form

nap

    | 19 | /// | /// | /// | 434 | / |
    0    6     11    16    21    31

The processor is placed into nap power-saving level.

When the processor is in nap power-saving level, the state of the Decrementer and all hypervisor resources is maintained as if the processor was not in power-saving mode, and sufficient information is maintained to allow the hypervisor to resume execution.

When the interrupt that causes exit from nap power-saving level occurs, resource state is as described in the preceding paragraph, except that if the exception that caused the exit is a System Reset, Machine Check, or Hypervisor Maintenance exception, resource state that would be lost if the exception occurred when the processor was not in power-saving mode may be lost.

This instruction is hypervisor privileged and context synchronizing.

Special Registers Altered:
    None

Programming Note
If the state of the Decrementer were not maintained and updated as if the processor was not in power-saving mode, Decrementer exceptions would not reliably cause exit from nap power-saving level even if Decrementer exceptions were enabled to cause exit.

Sleep XL-form

sleep

    | 19 | /// | /// | /// | 466 | / |
    0    6     11    16    21    31

The processor is placed into sleep power-saving level.

When the processor is in sleep power-saving level, the state of all resources may be lost except for the HRMOR.

When the interrupt that causes exit from sleep power-saving level occurs, resource state is as described in the preceding paragraph, except that if the exception that caused the exit is a System Reset, Machine Check, or Hypervisor Maintenance exception, resource state that would be lost if the exception occurred when the processor was not in power-saving mode may be lost.

This instruction is hypervisor privileged and context synchronizing.

Special Registers Altered:
    None

Programming Note
If the state of the Decrementer is not maintained and updated, in sleep or rvwinkle power-saving level, as if the processor was not in power-saving mode, Decrementer exceptions will not reliably cause exit from power-saving mode even if Decrementer exceptions are enabled to cause exit.

Note
See the Notes that appear in the rvwinkle instruction description.

Rip Van Winkle XL-form

rvwinkle

    | 19 | /// | /// | /// | 498 | / |
    0    6     11    16    21    31

The processor is placed into rvwinkle power-saving level.

When the processor is in rvwinkle power-saving level, the state of all resources may be lost except for the HRMOR.

When the interrupt that causes exit from rvwinkle power-saving level occurs, resource state is as described in the preceding paragraph, except that if the exception that caused the exit is a System Reset, Machine Check, or Hypervisor Maintenance exception, resource state that would be lost if the exception occurred when the processor was not in power-saving mode may be lost.

This instruction is hypervisor privileged and context synchronizing.

Special Registers Altered:
    None

Programming Note
In the short story by Washington Irving, Rip Van Winkle is a man who fell asleep on a green knoll and awoke twenty years later.

Note
See the Notes that appear in the sleep instruction description.

3.3.2.1 Entering and Exiting Power-Saving Mode

In order to enter power-saving mode, the hypervisor must use the instruction sequence shown below.
Before executing this sequence, the hypervisor must ensure that LPCR[MER] contains the value 0, that LPCR[PECE] contains the desired value if doze or nap power-saving level is to be entered, that MSR[SF], MSR[HV], and MSR[ME] contain the value 1, and that all other bits of the MSR contain the value 0 except for MSR[RI], which may contain either 0 or 1. Depending on the implementation and on the power-saving mode being entered, it may also be necessary for the hypervisor to save the state of certain processor resources before entering the sequence. The sequence must be exactly as shown, with no intervening instructions, except that any GPR may be used as Rx and as Ry, and any value may be used for "save_area" provided the resulting effective address is double-word aligned and corresponds to a valid real address.

    std     Rx,save_area(Ry)     /* last store necessary to save state */
    ptesync                      /* order load after last store */
    ld      Rx,save_area(Ry)     /* reload from last store location, for synchronization */
    loop:
    cmp     Rx,Rx                /* create dependency */
    bne     loop
    nap/doze/sleep/rvwinkle      /* enter power-saving mode */
    b       $                    /* branch to self */

After the processor has entered power-saving mode as specified above, various exceptions may cause exit from power-saving mode. The exceptions include System Reset, Machine Check, Decrementer, External, Hypervisor Maintenance, and implementation-specific exceptions. Upon exit from power-saving mode, if the exception was a Machine Check exception, then a Machine Check interrupt occurs; otherwise a System Reset interrupt occurs, and the contents of SRR1 indicate the type of exception that caused exit from power-saving mode. See 6.6.1 for additional information.

Programming Note
The ptesync instruction (see Book III-S, Section 5.9.2) in the preceding sequence, in conjunction with the ld instruction and the loop, ensures that all storage accesses associated with instructions preceding the ptesync instruction, and all Reference and Change bit updates associated with additional address translations that were performed, by the processor executing the ptesync instruction, before the ptesync instruction is executed, have been performed with respect to all processors and mechanisms, to the extent required by the associated Memory Coherence Required attributes, before the processor enters power-saving mode.

The b instruction (branch to self) is not executed since the preceding power-saving mode instruction puts the processor in a power-saving mode in which instructions are not executed. Even though it is not executed, requiring it to be present simplifies implementation and testing because it reduces the synchronization needed between execution of the instruction stream and entry into power-saving mode.

If the performance monitor is in use when the processor enters power-saving mode, the Performance Monitor data obtainable when the processor exits from power-saving mode may be incomplete or otherwise misleading.

Programming Note
Software is not required to set the RI bit to any particular value prior to entering power-saving mode because the setting of SRR1[62] upon exit from power-saving mode is independent of the value of the RI bit upon entry into power-saving mode.

Chapter 4. Fixed-Point Processor

    4.1 Fixed-Point Processor Overview
    4.2 Special Purpose Registers
    4.3 Fixed-Point Processor Registers
    4.3.1 Processor Version Register
    4.3.2 Processor Identification Register
    4.3.3 Control Register
    4.3.4 Program Priority Register
    4.3.5 Software-use SPRs
    4.4 Fixed-Point Processor Instructions
    4.4.1 Fixed-Point Load and Store Caching Inhibited Instructions
    4.4.2 Fixed-Point Load and Store Quadword Instructions [Category: Load/Store Quadword]
    4.4.3 Binary Coded Decimal (BCD) Assistance Instructions [Category: BCD Assistance]
    4.4.4 OR Instruction
    4.4.5 Move To/From System Register Instructions

4.1 Fixed-Point Processor Overview

This chapter describes the details concerning the registers and the privileged instructions implemented in the Fixed-Point Processor that are not covered in Book I.

4.2 Special Purpose Registers

Special Purpose Registers (SPRs) are read and written using the mfspr (page 500) and mtspr (page 499) instructions. Most SPRs are defined in other chapters of this book; see the index to locate those definitions.

4.3 Fixed-Point Processor Registers

4.3.1 Processor Version Register

The Processor Version Register (PVR) is a 32-bit read-only register that contains a value identifying the version and revision level of the processor. The contents of the PVR can be copied to a GPR by the mfspr instruction. Read access to the PVR is privileged; write access is not provided.

    |    Version    |    Revision    |
    32              48              63

Figure 7. Processor Version Register

The PVR distinguishes between processors that differ in attributes that may affect software. It contains two fields.

Version     A 16-bit number that identifies the version of the processor. Different version numbers indicate major differences between processors, such as which categories are supported.

Revision    A 16-bit number that distinguishes between implementations of the version. Different revision numbers indicate minor differences between processors having the same version number, such as clock rate and Engineering Change level.

Version numbers are assigned by the Power ISA process. Revision numbers are assigned by an implementation-defined process.

4.3.2 Processor Identification Register

The Processor Identification Register (PIR) is a 32-bit register that contains a value that can be used to distinguish the processor from other processors in the system. The contents of the PIR can be copied to a GPR by the mfspr instruction. Read access to the PIR is privileged; write access, if provided, is hypervisor privileged. It is implementation-dependent whether write access is provided.

    |            PROCID            |
    32                            63

    Bits    Name      Description
    0:31    PROCID    Processor ID

Figure 8. Processor Identification Register

The means by which the PIR is initialized are implementation-dependent.

The PIR is a hypervisor resource; see Chapter 2.

4.3.3 Control Register

The Control Register (CTRL) is a 32-bit register that controls an external I/O pin. This signal may be used for the following:

- driving the RUN Light on a system operator panel
- Direct External exception routing
- Performance Monitor Counter incrementing (see Appendix B)

    |           ///           | RUN |
    32                             63

    Bit    Name    Description
    63     RUN     Run state bit

    All other fields are implementation-dependent.

Figure 9. Control Register

CTRL[RUN] can be used by the operating system to indicate when the processor is doing useful work.

The contents of the CTRL can be written by the mtspr instruction and read by the mfspr instruction. Write access to the CTRL is privileged. Reads can be performed in privileged or problem state.

4.3.4 Program Priority Register

The Program Priority Register (PPR) is a 64-bit register that controls the program's priority. The layout of the PPR is shown in Figure 10. A subset of the PRI values may be set by problem state programs (see Section 3.2.3 of Book I).

    | /// | PRI | /// | imp-specific |
    0     11    14    44            63

    Bit(s)    Description
    11:13     Program Priority (PRI)
              001  very low
              010  low
              011  medium low
              100  medium (normal)
              101  medium high
              110  high
              111  very high
    44:63     Implementation-specific

Figure 10. Program Priority Register

4.3.5 Software-use SPRs

Software-use SPRs are 64-bit registers provided for use by software.

    |  SPRG0  |
    |  SPRG1  |
    |  SPRG2  |
    |  SPRG3  |
    0        63

Figure 11. Software-use SPRs

SPRG0, SPRG1, and SPRG2 are privileged registers. SPRG3 is a privileged register except that the contents may be copied to a GPR in Problem state when accessed using the mfspr instruction.

Programming Note
Neither the contents of the SPRGs, nor accessing them using mtspr or mfspr, has a side effect on the operation of the processor. One or more of the registers is likely to be needed by non-hypervisor interrupt handler programs (e.g., as scratch registers and/or pointers to per processor save areas).

Operating systems must ensure that no sensitive data are left in SPRG3 when a problem state program is dispatched, and operating systems for secure systems must ensure that SPRG3 cannot be used to implement a "covert channel" between problem state programs. These requirements can be satisfied by clearing SPRG3 before passing control to a program that will run in problem state.

HSPRG0 and HSPRG1 are 64-bit registers provided for use by hypervisor programs.

    |  HSPRG0  |
    |  HSPRG1  |
    0         63

Figure 12. SPRs for use by hypervisor programs

Programming Note
Neither the contents of the HSPRGs, nor accessing them using mtspr or mfspr, has a side effect on the operation of the processor. One or more of the registers is likely to be needed by hypervisor interrupt handler programs (e.g., as scratch registers and/or pointers to per processor save areas).
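The register layouts in Figure 7 (PVR) and Figure 10 (PPR) amount to simple shift-and-mask field extraction. The sketch below illustrates that under stated assumptions: the function names are hypothetical, the PVR is modeled as the 32-bit value read into the low half of a GPR, the PPR as a 64-bit value with bit 0 as the most significant bit, and the sample values in the usage note are made up rather than taken from any real processor.

```python
# PRI encodings copied from Figure 10; 0b000 does not appear in the figure.
PRI_NAMES = {
    0b001: "very low",
    0b010: "low",
    0b011: "medium low",
    0b100: "medium (normal)",
    0b101: "medium high",
    0b110: "high",
    0b111: "very high",
}


def pvr_fields(pvr: int) -> tuple:
    """Split a 32-bit PVR image into (Version, Revision), 16 bits each."""
    pvr &= 0xFFFF_FFFF
    return pvr >> 16, pvr & 0xFFFF


def ppr_priority(ppr: int) -> tuple:
    """Extract PRI (bits 11:13 of the 64-bit PPR) and its name from Figure 10."""
    pri = (ppr >> (63 - 13)) & 0b111
    return pri, PRI_NAMES.get(pri, "not listed in Figure 10")
```

With the made-up image `0x1234_5678`, `pvr_fields` returns version `0x1234` and revision `0x5678`; a PPR image of `0b100 << 50` decodes to priority 4, "medium (normal)".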
4.4 Fixed-Point Processor Instructions

4.4.1 Fixed-Point Load and Store Caching Inhibited Instructions

The storage accesses caused by the instructions described in this section are performed as though the specified storage location is Caching Inhibited and Guarded. The instructions can be executed only in hypervisor state. If any of the following restrictions on execution of these instructions (while in hypervisor state) is violated, the results are undefined.

- They must be executed only when MSR[DR]=0.
- The specified storage location must not be in storage specified by the Real Mode Storage Control facility to be treated as non-Guarded.
- Software must ensure that the specified storage location is not in the caches.

Programming Note
The instructions described in this section can be used to permit a control register on an I/O device to be accessed without permitting the corresponding storage location to be copied into the caches.

See also, in Book I, the introductions to Section 3.3.1, "Fixed-Point Storage Access Instructions", Section 3.3.2, "Fixed-Point Load Instructions", and Section 3.3.3, "Fixed-Point Store Instructions".

Load Byte and Zero Caching Inhibited Indexed X-form

lbzcix RT,RA,RB

    | 31 | RT | RA | RB | 853 | / |
    0    6    11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← ⁵⁶0 || MEM(EA, 1)

Let the effective address (EA) be the sum (RA|0)+(RB). The byte in storage addressed by EA is loaded into RT[56:63]. RT[0:55] are set to 0.

The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded.

This instruction is hypervisor privileged.

Special Registers Altered:
    None

Load Halfword and Zero Caching Inhibited Indexed X-form

lhzcix RT,RA,RB

    | 31 | RT | RA | RB | 821 | / |
    0    6    11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← ⁴⁸0 || MEM(EA, 2)

Let the effective address (EA) be the sum (RA|0)+(RB). The halfword in storage addressed by EA is loaded into RT[48:63]. RT[0:47] are set to 0.

The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded.

This instruction is hypervisor privileged.

Special Registers Altered:
    None

Load Word and Zero Caching Inhibited Indexed X-form

lwzcix RT,RA,RB

    | 31 | RT | RA | RB | 789 | / |
    0    6    11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← ³²0 || MEM(EA, 4)

Let the effective address (EA) be the sum (RA|0)+(RB). The word in storage addressed by EA is loaded into RT[32:63]. RT[0:31] are set to 0.

The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded.

This instruction is hypervisor privileged.

Special Registers Altered:
    None

Load Doubleword Caching Inhibited Indexed X-form

ldcix RT,RA,RB

    | 31 | RT | RA | RB | 885 | / |
    0    6    11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0)+(RB). The doubleword in storage addressed by EA is loaded into RT.

The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded.

This instruction is hypervisor privileged.

Special Registers Altered:
    None

Store Byte Caching Inhibited Indexed X-form

stbcix RS,RA,RB

    | 31 | RS | RA | RB | 981 | / |
    0    6    11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 1) ← (RS)[56:63]

Let the effective address (EA) be the sum (RA|0)+(RB). (RS)[56:63] are stored into the byte in storage addressed by EA.

The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded.

This instruction is hypervisor privileged.

Special Registers Altered:
    None

Store Halfword Caching Inhibited Indexed X-form

sthcix RS,RA,RB

    | 31 | RS | RA | RB | 949 | / |
    0    6    11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 2) ← (RS)[48:63]

Let the effective address (EA) be the sum (RA|0)+(RB). (RS)[48:63] are stored into the halfword in storage addressed by EA.

The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded.

This instruction is hypervisor privileged.

Special Registers Altered:
    None

Store Word Caching Inhibited Indexed X-form

stwcix RS,RA,RB

    | 31 | RS | RA | RB | 917 | / |
    0    6    11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 4) ← (RS)[32:63]

Let the effective address (EA) be the sum (RA|0)+(RB). (RS)[32:63] are stored into the word in storage addressed by EA.

The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded.

This instruction is hypervisor privileged.

Special Registers Altered:
    None

Store Doubleword Caching Inhibited Indexed X-form

stdcix RS,RA,RB

    | 31 | RS | RA | RB | 1013 | / |
    0    6    11   16   21     31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA, 8) ← (RS)

Let the effective address (EA) be the sum (RA|0)+(RB). (RS) is stored into the doubleword in storage addressed by EA.

The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded.

This instruction is hypervisor privileged.

Special Registers Altered:
    None

4.4.2 Fixed-Point Load and Store Quadword Instructions [Category: Load/Store Quadword]

Load Quadword DQ-form

lq RTp,DQ(RA)

    | 56 | RTp | RA | DQ | /// |
    0    6     11   16   28   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(DQ || 0b0000)
    RTp ← MEM(EA, 8)

Let the effective address (EA) be the sum (RA|0)+(DQ||0b0000). The quadword in storage addressed by EA is loaded into register pair RTp.

EA must be a multiple of 16. If it is not, an Alignment interrupt occurs.

If RTp is odd or RTp=RA, the instruction form is invalid. If RTp=RA, an attempt to execute this instruction causes an Illegal Instruction type Program interrupt. (The RTp=RA case includes the case of RTp=RA=0.)

This instruction is not supported in Little-Endian mode. Execution of this instruction in Little-Endian mode either causes an Alignment interrupt or yields boundedly undefined results.

This instruction is privileged.

Special Registers Altered:
    None

Store Quadword DS-form

stq RSp,DS(RA)

    | 62 | RSp | RA | DS | 2 |
    0    6     11   16   30 31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + EXTS(DS || 0b00)
    MEM(EA, 8) ← RSp

Let the effective address (EA) be the sum (RA|0)+(DS||0b00). The contents of register pair RSp are stored into the quadword in storage addressed by EA.

EA must be a multiple of 16. If it is not, an Alignment interrupt occurs.

If RSp is odd, the instruction form is invalid.

This instruction is not supported in Little-Endian mode. Execution of this instruction in Little-Endian mode either causes an Alignment interrupt or yields boundedly undefined results.

This instruction is privileged.

Special Registers Altered:
    None
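The effective-address computations in Sections 4.4.1 and 4.4.2 follow three patterns: (RA|0)+(RB) for the indexed forms, and a sign-extended, zero-padded displacement for lq and stq. A minimal sketch of those RTL lines follows. It is an illustration only: the function names are hypothetical, GPRs are modeled as a Python list of 64-bit integers, storage is modeled as a big-endian `bytes` object, and privilege checks, interrupts, and the caching-inhibited behavior itself are not modeled.

```python
MASK64 = (1 << 64) - 1


def exts(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide field to a Python integer (the RTL EXTS function)."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def base(gprs: list, ra: int) -> int:
    """(RA|0): zero when the RA field is 0, otherwise the contents of GPR RA."""
    return 0 if ra == 0 else gprs[ra]


def ea_indexed(gprs: list, ra: int, rb: int) -> int:
    """EA for the X-form instructions (lbzcix ... stdcix): b + (RB)."""
    return (base(gprs, ra) + gprs[rb]) & MASK64


def ea_lq(gprs: list, ra: int, dq: int) -> int:
    """EA for lq: b + EXTS(DQ || 0b0000), with DQ the 12-bit instruction field."""
    return (base(gprs, ra) + exts(dq << 4, 16)) & MASK64


def ea_stq(gprs: list, ra: int, ds: int) -> int:
    """EA for stq: b + EXTS(DS || 0b00), with DS the 14-bit instruction field."""
    return (base(gprs, ra) + exts(ds << 2, 16)) & MASK64


def load_zero_extended(mem: bytes, ea: int, nbytes: int) -> int:
    """Model of 'RT <- zeros || MEM(EA, n)': the loaded bytes occupy the low-order bits."""
    return int.from_bytes(mem[ea:ea + nbytes], "big")


def quadword_aligned(ea: int) -> bool:
    """lq and stq require EA to be a multiple of 16."""
    return ea % 16 == 0
```

Note that the DS form of stq can express displacements that are multiples of 4 but not of 16; in this model such an address simply fails `quadword_aligned`, corresponding to the Alignment interrupt case in the text.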
4.4.3 Binary Coded Decimal (BCD) Assistance Instructions [Category: BCD Assistance]

The Binary Coded Decimal Assist instructions operate on Binary Coded Decimal operands (cbcdtd and addg6s) and Decimal Floating-Point operands (cdtbcd) (see Chapter 5 of Book I - III).

Convert Declets To Binary Coded Decimal X-form

cdtbcd RA, RS

    | 31 | RS | RA | /// | 282 | / |
    0    6    11   16    21    31

    do i = 0 to 1
       n ← i × 32
       RA[n+0:n+7] ← 0
       RA[n+8:n+19] ← DEC_TO_BCD( (RS)[n+12:n+21] )
       RA[n+20:n+31] ← DEC_TO_BCD( (RS)[n+22:n+31] )

The low-order 20 bits of each word of register RS contain two declets which are converted to six 4-bit BCD fields, and the result is placed into the low-order 24 bits of the corresponding word in RA. The high-order 8 bits in each word of RA are set to 0.

This instruction is hypervisor privileged.

Special Registers Altered:
    None

Convert Binary Coded Decimal To Declets X-form

cbcdtd RA, RS

    | 31 | RS | RA | /// | 314 | / |
    0    6    11   16    21    31

    do i = 0 to 1
       n ← i × 32
       RA[n+0:n+11] ← 0
       RA[n+12:n+21] ← BCD_TO_DEC( (RS)[n+8:n+19] )
       RA[n+22:n+31] ← BCD_TO_DEC( (RS)[n+20:n+31] )

The low-order 24 bits of each word of register RS contain six 4-bit BCD fields which are converted to two declets, and the result is placed into the low-order 20 bits of the corresponding word in RA. The high-order 12 bits in each word of RA are set to 0.

If a 4-bit BCD field has a value greater than 9, the results are undefined.

This instruction is hypervisor privileged.

Special Registers Altered:
    None

Add and Generate Sixes XO-form

addg6s RT,RA,RB

    | 31 | RT | RA | RB | / | 74 | / |
    0    6    11   16   21  22   31

    do i = 0 to 15
       dc[i] ← carry_out(RA[4×i:63] + RB[4×i:63])
    c ← ⁴(dc[0]) || ⁴(dc[1]) || ... || ⁴(dc[15])
    RT ← (¬c) & 0x6666_6666_6666_6666

The contents of register RA are added to the contents of register RB. The carry out of each decimal digit position n (bit position 4×n) is recorded, producing 16 carry bits. A doubleword is composed from the 16 carry bits, and placed into RT. The doubleword consists of a decimal six (0b0110) in every decimal digit position for which the corresponding carry bit is 0, and a zero (0b0000) in every position for which the corresponding carry bit is 1.

This instruction is hypervisor privileged.

Special Registers Altered:
    None

Programming Note
See Appendix E of Book III-S for programming examples using addg6s.

4.4.4 OR Instruction

or Rx,Rx,Rx can be used to set PPR[PRI] (see Section 4.3.4) as shown in Figure 13. PPR[PRI] remains unchanged if the privilege state of the processor executing the instruction is lower than the privilege indicated in the figure. (The encodings available to application programs are also shown in Book I.)

    Rx    PPR[PRI]    Priority           Privileged
    31    001         very low           yes
    1     010         low                no
    6     011         medium low         no
    2     100         medium (normal)    no
    5     101         medium high        yes
    3     110         high               yes
    7     111         very high          hypv

Figure 13. Priority levels for or Rx,Rx,Rx

4.4.5 Move To/From System Register Instructions

The Move To Special Purpose Register and Move From Special Purpose Register instructions are described in Book I, but only at the level available to an application programmer. For example, no mention is made there of registers that can be accessed only in privileged state. The descriptions of these instructions given below extend the descriptions given in Book I, but do not list Special Purpose Registers that are implementation-dependent. In the descriptions of these instructions given below, the "defined" SPR numbers are the SPR numbers shown in the figure for the instruction and the implementation-specific SPR numbers that are implemented, and similarly for "defined" registers.

Extended mnemonics

Extended mnemonics are provided for the mtspr and mfspr instructions so that they can be coded with the SPR name as part of the mnemonic rather than as a numeric operand. See Appendix A, "Assembler Extended Mnemonics" on page 589.
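Returning to the Add and Generate Sixes instruction of Section 4.4.3, its RTL can be modeled directly on 64-bit integers. The sketch below is an illustration under stated assumptions, not the architected definition: the function names are hypothetical, and `bcd_add` shows one commonly described way of combining addg6s with ordinary adds and a subtract to add two packed BCD operands (bias one operand with sixes, add in binary, then remove the sixes from every digit that did not carry). That usage pattern is an assumption here and is not taken from the programming examples the text refers to.

```python
MASK64 = (1 << 64) - 1
SIXES = 0x6666_6666_6666_6666


def addg6s(ra: int, rb: int) -> int:
    """Model of the addg6s RTL: a six in each digit position whose carry-out is 0."""
    c = 0
    for i in range(16):                      # digit 0 is the most significant nibble
        width = 64 - 4 * i                   # number of bits in positions 4*i .. 63
        field_mask = (1 << width) - 1
        dc = (((ra & field_mask) + (rb & field_mask)) >> width) & 1
        c = (c << 4) | (0xF if dc else 0x0)  # replicate the carry bit four times
    return (~c) & SIXES


def bcd_add(a: int, b: int) -> int:
    """Add two 16-digit packed BCD values (assumed usage pattern, see lead-in)."""
    biased = (a + SIXES) & MASK64            # every digit now carries when the sum exceeds 9
    total = (biased + b) & MASK64            # plain binary add
    return (total - addg6s(biased, b)) & MASK64  # remove the bias where no carry occurred
```

For example, `bcd_add(0x19, 0x03)` yields `0x22` and `bcd_add(0x0999, 0x0001)` yields `0x1000`, the packed BCD forms of 19+3 and 999+1.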
Figure 14. SPR encodings

   decimal   SPR(1)               Register    Privileged             Length   Cat(2)
             spr[5:9] spr[0:4]    Name        mtspr       mfspr      (bits)
   1         00000    00001       XER         no          no         64       B
   8         00000    01000       LR          no          no         64       B
   9         00000    01001       CTR         no          no         64       B
   17        00000    10001       DSCR        yes         yes        64       S
   18        00000    10010       DSISR       yes         yes        32       S
   19        00000    10011       DAR         yes         yes        64       S
   22        00000    10110       DEC         yes         yes        32       B
   25        00000    11001       SDR1        hypv(3)     hypv(3)    64       S
   26        00000    11010       SRR0        yes         yes        64       B
   27        00000    11011       SRR1        yes         yes        64       B
   28        00000    11100       CFAR        yes         yes        64       S
   29        00000    11101       AMR         yes         yes        64       S
   136       00100    01000       CTRL        -           no         32       S
   152       00100    11000       CTRL        yes         -          32       S
   256       01000    00000       VRSAVE      no          no         32       V
   259       01000    00011       SPRG3       -           no         64       B
   268       01000    01100       TB          -           no         64       B
   269       01000    01101       TBU         -           no         32       B
   272-275   01000    100xx       SPRG[0-3]   yes         yes        64       B
   282       01000    11010       EAR         hypv(3)     hypv(3)    32       EC
   284       01000    11100       TBL         hypv(3)     -          32       B
   285       01000    11101       TBU         hypv(3)     -          32       B
   286       01000    11110       TBU40       hypv        -          64       S
   287       01000    11111       PVR         -           yes        32       B
   304       01001    10000       HSPRG0      hypv(3)     hypv(3)    64       S
   305       01001    10001       HSPRG1      hypv(3)     hypv(3)    64       S
   306       01001    10010       HDSISR      hypv(3)     hypv(3)    32       B
   307       01001    10011       HDAR        hypv(3)     hypv(3)    64       B
   308       01001    10100       SPURR       hypv(3)     yes        64       S
   309       01001    10101       PURR        hypv(3)     yes        64       S
   310       01001    10110       HDEC        hypv(3)     hypv(3)    32       S
   312       01001    11000       RMOR        hypv(3)     hypv(3)    64       S
   313       01001    11001       HRMOR       hypv(3)     hypv(3)    64       S
   314       01001    11010       HSRR0       hypv(3)     hypv(3)    64       S
   315       01001    11011       HSRR1       hypv(3)     hypv(3)    64       S
   318       01001    11110       LPCR        hypv(3)     hypv(3)    64       S
   319       01001    11111       LPIDR       hypv(3)     hypv(3)    32       S
   336       01010    10000       HMER        hypv(3,4)   hypv(3)    64       S
   337       01010    10001       HMEER       hypv(3)     hypv(3)    64       S
   338       01010    10010       PCR         hypv(3)     hypv(3)    64       S
   339       01010    10011       HEIR        hypv(3)     hypv(3)    32       HEA
   512       10000    00000       SPEFSCR     no          no         32       SP
   526       10000    01110       ATB/ATBL    -           no         64       ATB
   527       10000    01111       ATBU        -           no         32       ATB
   768-783   11000    0xxxx       perf_mon    -           no         64       S.PM
   784-799   11000    1xxxx       perf_mon    varies      yes        64       S.PM
   896       11100    00000       PPR         no          no         64       S
   1013      11111    10101       DABR        hypv(3)     hypv(3)    64       S
   1015      11111    10111       DABRX       hypv(3)     hypv(3)    64       S
   1023      11111    11111       PIR         -           yes        32       S

   -  This register is not defined for this instruction.
   1  Note that the order of the two 5-bit halves of the SPR number is reversed.
   2  See Section 1.3.5 of Book I.
   3  This register is a hypervisor resource, and can be modified by this instruction only in hypervisor state (see Chapter 2).
   4  This register cannot be directly written. Instead, bits in the register corresponding to 0 bits in (RS) can be cleared using mtspr SPR,RS.

   All SPR numbers that are not shown above and are not implementation-specific are reserved.

Move To Special Purpose Register XFX-form

mtspr SPR,RS

   31 | RS | spr | 467 | /
   0    6    11    21    31

   n ← spr[5:9] || spr[0:4]
   if length(SPR(n)) = 64 then
      if n = 336 then
         SPR(336) ← (SPR(336)) & (RS)
      else
         SPR(n) ← (RS)
   else
      SPR(n) ← (RS)[32:63]

The SPR field denotes a Special Purpose Register, encoded as shown in Figure 14. The contents of register RS are placed into the designated Special Purpose Register, except in the case described below. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RS are placed into the SPR.

When the designated SPR is the Hypervisor Maintenance Exception Register (HMER), the contents of register RS are ANDed with the contents of the HMER and the result is placed into the HMER.

For this instruction, SPRs TBL and TBU are treated as separate 32-bit registers; setting one leaves the other unaltered.

spr[0]=1 if and only if writing the register is privileged. Execution of this instruction specifying an SPR number with spr[0]=1 causes a Privileged Instruction type Program interrupt when MSR[PR]=1 and, if the SPR is a hypervisor resource (see Figure 14), when MSR[HV PR]=0b00, except that when MSR[HV PR]=0b00 no operation occurs for SPRs 791 or 792 on processors in which these registers are hypervisor privileged.

Execution of this instruction specifying an SPR number that is not defined for the implementation causes either an Illegal Instruction type Program interrupt or one of the following.
• if spr[0]=0:
   - if MSR[PR]=1: Hypervisor Emulation Assistance interrupt if Category: HEA is supported; otherwise Illegal Instruction type Program interrupt
   - if MSR[PR]=0: Illegal Instruction type Program interrupt for SPR 0 and no operation (i.e. the instruction is treated as a no-op) for all other SPRs
• if spr[0]=1:
   - if MSR[PR]=1: Privileged Instruction type Program interrupt
   - if MSR[PR]=0: no operation (i.e. the instruction is treated as a no-op)

If the SPR number is set to a value that is shown in Figure 14 but corresponds to an optional Special Purpose Register that is not provided by the implementation, the effect of executing this instruction is the same as if the SPR number were reserved.

Special Registers Altered: See Figure 14

Programming Note
   For a discussion of software synchronization requirements when altering certain Special Purpose Registers, see Chapter 10. "Synchronization Requirements for Context Alterations" on page 585.

Move From Special Purpose Register XFX-form

mfspr RT,SPR

   31 | RT | spr | 339 | /
   0    6    11    21    31

   n ← spr[5:9] || spr[0:4]
   if length(SPR(n)) = 64 then
      RT ← SPR(n)
   else
      RT ← ³²0 || SPR(n)

The SPR field denotes a Special Purpose Register, encoded as shown in Figure 14.
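The data movement in the mtspr and mfspr RTL can be modeled in a few lines. The sketch below is an illustration only: it covers the 64-bit versus 32-bit cases and the AND semantics for the HMER (SPR 336), and it deliberately ignores privilege checks, undefined SPR numbers and interrupts. The SPR_LENGTH dictionary lists just three registers taken from Figure 14, and the function and variable names are invented for this example.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1
HMER = 336

# Register lengths in bits for a few SPRs from Figure 14 (LR, DEC, HMER).
SPR_LENGTH = {8: 64, 22: 32, HMER: 64}


def mtspr(sprs: dict, n: int, rs: int) -> None:
    """Model of the mtspr data movement for SPR number n."""
    rs &= MASK64
    if SPR_LENGTH[n] == 64:
        if n == HMER:
            # HMER bits can only be cleared: the old value is ANDed with (RS).
            sprs[n] = sprs.get(n, 0) & rs
        else:
            sprs[n] = rs
    else:
        # 32-bit SPR: only (RS)[32:63], the low-order word, is written.
        sprs[n] = rs & MASK32


def mfspr(sprs: dict, n: int) -> int:
    """Model of the mfspr data movement; returns the new value of RT."""
    value = sprs.get(n, 0)
    if SPR_LENGTH[n] == 64:
        return value & MASK64
    # 32-bit SPR: the high-order 32 bits of RT are set to zero.
    return value & MASK32
```

For example, writing 0x0F to an HMER that holds 0xFF leaves 0x0F, because the zero bits in RS clear the corresponding HMER bits, while writing 0x1_2345_6789 to the 32-bit DEC stores 0x2345_6789.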
The contents of the designated Special Purpose Register are placed into register RT. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RT receive the contents of the Special Purpose Register and the high-order 32 bits of RT are set to zero.

spr[0]=1 if and only if reading the register is privileged. Execution of this instruction specifying an SPR number with spr[0]=1 causes a Privileged Instruction type Program interrupt when MSR[PR]=1 and, if the SPR is a hypervisor resource (see Figure 14), when MSR[HV PR]=0b00.

Execution of this instruction specifying an SPR number that is not defined for the implementation causes either an Illegal Instruction type Program interrupt or one of the following.
• if spr[0]=0:
   - if MSR[PR]=1: Hypervisor Emulation Assistance interrupt if Category: HEA is supported; otherwise Illegal Instruction type Program interrupt
   - if MSR[PR]=0: Illegal Instruction type Program interrupt for SPRs 0, 4, 5, and 6 and no operation (i.e. the instruction is treated as a no-op) for all other SPRs
• if spr[0]=1:
   - if MSR[PR]=1: Privileged Instruction type Program interrupt
   - if MSR[PR]=0: no operation (i.e. the instruction is treated as a no-op)

If the SPR field contains a value that is shown in Figure 14 but corresponds to an optional Special Purpose Register that is not provided by the implementation, the effect of executing this instruction is the same as if the SPR number were reserved.

Special Registers Altered: None

Note
   See the Notes that appear with mtspr.

Move To Machine State Register X-form

mtmsr RS,L

   31 | RS | /// | L  | /// | 146 | /
   0    6    11    15   16    21    31

   if L = 0 then
      MSR[48] ← (RS)[48] | (RS)[49]
      MSR[58] ← (RS)[58] | (RS)[49]
      MSR[59] ← (RS)[59] | (RS)[49]
      MSR[32:47 49:50 52:57 60:62] ← (RS)[32:47 49:50 52:57 60:62]
   else
      MSR[48 62] ← (RS)[48 62]

The MSR is set based on the contents of register RS and of the L field.

L=0:
   The result of ORing bits 48 and 49 of register RS is placed into MSR[48]. The result of ORing bits 58 and 49 of register RS is placed into MSR[58]. The result of ORing bits 59 and 49 of register RS is placed into MSR[59]. Bits 32:47, 49:50, 52:57, and 60:62 of register RS are placed into the corresponding bits of the MSR.

L=1:
   Bits 48 and 62 of register RS are placed into the corresponding bits of the MSR. The remaining bits of the MSR are unchanged.

This instruction is privileged.

If L=0 this instruction is context synchronizing. If L=1 this instruction is execution synchronizing; in addition, the alterations of the EE and RI bits take effect as soon as the instruction completes.

Special Registers Altered: MSR

Except in the mtmsr instruction description in this section, references to "mtmsr" in this document imply either L value unless otherwise stated or obvious from context (e.g., a reference to an mtmsr instruction that modifies an MSR bit other than the EE or RI bit implies L=0).

Programming Note
   If MSR[EE]=0 and an External or Decrementer exception is pending, executing an mtmsr instruction that sets MSR[EE] to 1 will cause the External or Decrementer interrupt to occur before the next instruction is executed, if no higher priority exception exists (see Section 6.8, "Interrupt Priorities" on page 571). Similarly, if a Hypervisor Decrementer interrupt is pending, execution of the instruction by the hypervisor causes a Hypervisor Decrementer interrupt to occur if HDICE=1.

   For a discussion of software synchronization requirements when altering certain MSR bits, see Chapter 10.

Programming Note
   mtmsr serves as both a basic and an extended mnemonic. The Assembler will recognize an mtmsr mnemonic with two operands as the basic form, and an mtmsr mnemonic with one operand as the extended form. In the extended form the L operand is omitted and assumed to be 0.

Programming Note
   There is no need for an analogous version of the mfmsr instruction, because the existing instruction copies the entire contents of the MSR to the selected GPR.
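The mtmsr RTL can be exercised with a small model. This is a sketch under stated assumptions: bits are numbered as in the architecture (bit 0 is the most significant bit of a 64-bit value), the names bit, field_mask and mtmsr are invented for the illustration, and privilege checks and synchronization are not modeled.

```python
MASK64 = (1 << 64) - 1


def bit(i: int) -> int:
    """Mask for architecture bit i of a 64-bit register (bit 0 = MSB)."""
    return 1 << (63 - i)


def field_mask(first: int, last: int) -> int:
    """Mask covering architecture bits first..last inclusive."""
    return ((1 << (last - first + 1)) - 1) << (63 - last)


# Bits copied unchanged from RS when L=0: 32:47, 49:50, 52:57, 60:62.
L0_COPY = (field_mask(32, 47) | field_mask(49, 50)
           | field_mask(52, 57) | field_mask(60, 62))
# Bits copied from RS when L=1: 48 and 62 (the EE and RI bits).
L1_COPY = bit(48) | bit(62)


def mtmsr(msr: int, rs: int, L: int = 0) -> int:
    """Return the new MSR value after mtmsr RS,L (model only)."""
    if L == 0:
        new = (msr & ~L0_COPY) | (rs & L0_COPY)
        # Bits 48, 58 and 59 each receive the OR of the same bit of RS
        # with bit 49 of RS.
        for i in (48, 58, 59):
            if (rs & bit(i)) or (rs & bit(49)):
                new |= bit(i)
            else:
                new &= ~bit(i)
        return new & MASK64
    return ((msr & ~L1_COPY) | (rs & L1_COPY)) & MASK64
```

With this model, a source value that has only bit 49 set also sets bits 48, 58 and 59 in the result, and no bit in positions 0:31, 51 or 63 is ever changed, which matches the RTL's list of affected bits.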
Programming Note
   If this instruction sets MSR[PR] to 1, it also sets MSR[EE], MSR[IR], and MSR[DR] to 1.

   This instruction does not alter MSR[ME] or MSR[LE]. (This instruction does not alter MSR[HV] because it does not alter any of the high-order 32 bits of the MSR.)

   If the only MSR bits to be altered are MSR[EE RI], to obtain the best performance L=1 should be used.

Move To Machine State Register Doubleword X-form

mtmsrd RS,L

   31 | RS | /// | L  | /// | 178 | /
   0    6    11    15   16    21    31

   if L = 0 then
      MSR[48] ← (RS)[48] | (RS)[49]
      MSR[58] ← (RS)[58] | (RS)[49]
      MSR[59] ← (RS)[59] | (RS)[49]
      MSR[0:2 4:47 49:50 52:57 60:62] ← (RS)[0:2 4:47 49:50 52:57 60:62]
   else
      MSR[48 62] ← (RS)[48 62]

The MSR is set based on the contents of register RS and of the L field.

L=0:
   The result of ORing bits 48 and 49 of register RS is placed into MSR[48]. The result of ORing bits 58 and 49 of register RS is placed into MSR[58]. The result of ORing bits 59 and 49 of register RS is placed into MSR[59]. Bits 0:2, 4:47, 49:50, 52:57, and 60:62 of register RS are placed into the corresponding bits of the MSR.

L=1:
   Bits 48 and 62 of register RS are placed into the corresponding bits of the MSR. The remaining bits of the MSR are unchanged.

This instruction is privileged.

If L=0 this instruction is context synchronizing. If L=1 this instruction is execution synchronizing; in addition, the alterations of the EE and RI bits take effect as soon as the instruction completes.

Special Registers Altered: MSR

Except in the mtmsrd instruction description in this section, references to "mtmsrd" in this document imply either L value unless otherwise stated or obvious from context (e.g., a reference to an mtmsrd instruction that modifies an MSR bit other than the EE or RI bit implies L=0).

Programming Note
   If this instruction sets MSR[PR] to 1, it also sets MSR[EE], MSR[IR], and MSR[DR] to 1.

   This instruction does not alter MSR[LE], MSR[ME] or MSR[HV].

   If the only MSR bits to be altered are MSR[EE RI], to obtain the best performance L=1 should be used.

Programming Note
   If MSR[EE]=0 and an External or Decrementer exception is pending, executing an mtmsrd instruction that sets MSR[EE] to 1 will cause the External or Decrementer interrupt to occur before the next instruction is executed, if no higher priority exception exists (see Section 6.8, "Interrupt Priorities" on page 571). Similarly, if a Hypervisor Decrementer interrupt is pending, execution of the instruction by the hypervisor causes a Hypervisor Decrementer interrupt to occur if HDICE=1.

   For a discussion of software synchronization requirements when altering certain MSR bits, see Chapter 10.

Programming Note
   mtmsrd serves as both a basic and an extended mnemonic. The Assembler will recognize an mtmsrd mnemonic with two operands as the basic form, and an mtmsrd mnemonic with one operand as the extended form. In the extended form the L operand is omitted and assumed to be 0.

Move From Machine State Register X-form

mfmsr RT

   31 | RT | /// | /// | 83 | /
   0    6    11    16    21   31

   RT ← MSR

The contents of the MSR are placed into register RT.

This instruction is privileged.

Special Registers Altered: None

Chapter 5. Storage Control

5.1 Overview
5.2 Storage Exceptions
5.3 Instruction Fetch
   5.3.1 Implicit Branch
   5.3.2 Address Wrapping Combined with Changing MSR Bit SF
5.4 Data Access
5.5 Performing Operations Out-of-Order
5.6 Invalid Real Address
5.7 Storage Addressing
   5.7.1 32-Bit Mode
   5.7.2 Virtualized Partition Memory (VPM) Mode
   5.7.3 Real And Virtual Real Addressing Modes
      5.7.3.1 Hypervisor Offset Real Mode Address
      5.7.3.2 Offset Real Mode Address
      5.7.3.3 Storage Control Attributes for Accesses in Real and Hypervisor Real Addressing Modes
         5.7.3.3.1 Hypervisor Real Mode Storage Control
      5.7.3.4 Virtual Real Mode Addressing Mechanism
      5.7.3.5 Storage Control Attributes for Implicit Storage Accesses
   5.7.4 Address Ranges Having Defined Uses
   5.7.5 Address Translation Overview
   5.7.6 Virtual Address Generation
      5.7.6.1 Segment Lookaside Buffer (SLB)
      5.7.6.2 SLB Search
   5.7.7 Virtual to Real Translation
      5.7.7.1 Page Table
      5.7.7.2 Storage Description Register 1
      5.7.7.3 Page Table Search
      5.7.7.4 Relaxed Page Table Alignment [Category: Server.Relaxed Page Table Alignment]
   5.7.8 Reference and Change Recording
   5.7.9 Storage and Virtual Page Class Key Protection
      5.7.9.1 Virtual Page Class Key Protection
      5.7.9.2 Storage Protection, Address Translation Enabled
      5.7.9.3 Storage Protection, Address Translation Disabled
5.8 Storage Control Attributes
   5.8.1 Guarded Storage
      5.8.1.1 Out-of-Order Accesses to Guarded Storage
   5.8.2 Storage Control Bits
      5.8.2.1 Storage Control Bit Restrictions
      5.8.2.2 Altering the Storage Control Bits
5.9 Storage Control Instructions
   5.9.1 Cache Management Instructions
   5.9.2 Synchronize Instruction
   5.9.3 Lookaside Buffer Management
      5.9.3.1 SLB Management Instructions
      5.9.3.2 Bridge to SLB Architecture [Category:Server.Phased-Out]
         5.9.3.2.1 Segment Register Manipulation Instructions
      5.9.3.3 TLB Management Instructions
5.10 Page Table Update Synchronization Requirements
   5.10.1 Page Table Updates
      5.10.1.1 Adding a Page Table Entry
      5.10.1.2 Modifying a Page Table Entry
      5.10.1.3 Deleting a Page Table Entry

5.1 Overview

A program references storage using the effective address computed by the processor when it executes a Load, Store, Branch, or Cache Management instruction, or when it fetches the next sequential instruction. The effective address is translated to a real address according to procedures described in Section 5.7.3, in Section 5.7.5 and in the following sections. The real address is what is presented to the storage subsystem.

For a complete discussion of storage addressing and effective address calculation, see Section 1.10 of Book I.

5.2 Storage Exceptions

A storage exception results when the sequential execution model requires that a storage access be performed but the access is not permitted (e.g., is not permitted by the storage protection mechanism), the access cannot be performed because the effective address cannot be translated to a real address, or the access matches some tracking mechanism criteria (e.g., Data Address Breakpoint).

In certain cases a storage exception may result in the "restart" of (re-execution of at least part of) a Load or Store instruction. See Section 2.1 of Book II, and Section 6.6 in this Book.

5.3 Instruction Fetch

Instructions are fetched under control of MSR[IR].

MSR[IR]=0
   The effective address of the instruction is interpreted as described in Section 5.7.3.

MSR[IR]=1
   The effective address of the instruction is translated by the Address Translation mechanism described beginning in Section 5.7.5.

5.3.1 Implicit Branch

Explicitly altering certain MSR bits (using mtmsr[d]), or explicitly altering SLB entries, Page Table Entries, or certain System Registers (including the HRMOR, and possibly other implementation-dependent registers), may have the side effect of changing the addresses, effective or real, from which the current instruction stream is being fetched. This side effect is called an implicit branch. For example, an mtmsrd instruction that changes the value of MSR[SF] may change the effective addresses from which the current instruction stream is being fetched. The MSR bits and System Registers (excluding implementation-dependent registers) for which alteration can cause an implicit branch are indicated as such in Chapter 10. "Synchronization Requirements for Context Alterations" on page 585. Implicit branches are not supported by the Power ISA. If an implicit branch occurs, the results are boundedly undefined.

5.3.2 Address Wrapping Combined with Changing MSR Bit SF

If the current instruction is at effective address 2^32 - 4 and is an mtmsrd instruction that changes the contents of MSR[SF], the effective address of the next sequential instruction is undefined.

Programming Note
   In the case described in the preceding paragraph, if an interrupt occurs before the next sequential instruction is executed, the contents of SRR0, or HSRR0, as appropriate to the interrupt, are undefined.

5.4 Data Access

Data accesses are controlled by MSR[DR].

MSR[DR]=0
   The effective address of the data is interpreted as described in Section 5.7.3.

MSR[DR]=1
   The effective address of the data is translated by the Address Translation mechanism described in Section 5.7.5.

5.5 Performing Operations Out-of-Order

An operation is said to be performed "in-order" if, at the time that it is performed, it is known to be required by the sequential execution model. An operation is said to be performed "out-of-order" if, at the time that it is performed, it is not known to be required by the sequential execution model.

Operations are performed out-of-order by the processor on the expectation that the results will be needed by an instruction that will be required by the sequential execution model. Whether the results are really needed is contingent on everything that might divert the control flow away from the instruction, such as Branch, Trap, System Call, and Return From Interrupt instructions, and interrupts, and on everything that might change the context in which the instruction is executed.

Typically, the processor performs operations out-of-order when it has resources that would otherwise be idle, so the operation incurs little or no cost. If subsequent events such as branches or interrupts indicate that the operation would not have been performed in the sequential execution model, the processor abandons any results of the operation (except as described
below).

In the remainder of this section, including its subsections, "Load instruction" includes the Cache Management and other instructions that are stated in the instruction descriptions to be "treated as a Load", and similarly for "Store instruction".

A data access that is performed out-of-order may correspond to an arbitrary Load or Store instruction (e.g., a Load or Store instruction that is not in the instruction stream being executed). Similarly, an instruction fetch that is performed out-of-order may be for an arbitrary instruction (e.g., the aligned word at an arbitrary location in instruction storage).

Most operations can be performed out-of-order, as long as the machine appears to follow the sequential execution model. Certain out-of-order operations are restricted, as follows.
• Stores
   Stores are not performed out-of-order (even if the Store instructions that caused them were executed out-of-order).
• Accessing Guarded Storage
   The restrictions for this case are given in Section 5.8.1.1.

The only permitted side effects of performing an operation out-of-order are the following.
• A Machine Check or Checkstop that could be caused by in-order execution may occur out-of-order, except as described in Section 5.7.3.3.1 for the Real Mode Storage Control facility.
• On implementations which support Reference and Change bits, these bits may be set as described in Section 5.7.8.
• Non-Guarded storage locations that could be fetched into a cache by in-order fetching or execution of an arbitrary instruction may be fetched out-of-order into that cache.

5.6 Invalid Real Address

A storage access (including an access that is performed out-of-order; see Section 5.5) may cause a Machine Check if the accessed storage location contains an uncorrectable error or does not exist.

In the case that the accessed storage location does not exist, the Checkstop state may be entered. See Section 6.5.2 on page 557.

Programming Note
   In configurations supporting multiple partitions, hypervisor software must ensure that a storage access by a program in one partition will not cause a Checkstop or other system-wide event that could affect the integrity of other partitions (see Chapter 2). For example, such an event could occur if a real address placed in a Page Table Entry or made accessible to a partition using the Offset Real Mode Address mechanism (see Section 5.7.3.2) does not exist.

5.7 Storage Addressing

Storage Control Overview
• Real address space size is 2^m bytes, m ≤ 60; see Note 1.
• Real page size is 2^12 bytes (4 KB).
• Effective address space size is 2^64 bytes.
• An effective address is translated to a virtual address via the Segment Lookaside Buffer (SLB).
   - Virtual address space size is 2^n bytes, 65 ≤ n ≤ 78; see Note 2.
   - Segment size is 2^s bytes, s=28 or 40.
   - 2^(n-40) ≤ number of virtual segments ≤ 2^(n-28); see Note 2.
   - Virtual page size is 2^p bytes, where 12 ≤ p, and 2^p is no larger than either the size of the biggest segment or the real address space; a size of 4 KB, 64 KB, and an implementation-dependent number of other sizes are supported; see Note 3.
   - Segments contain pages of a single size or a mixture of 4 KB and 64 KB pages
• A virtual address is translated to a real address via the Page Table.

Notes:
1. The value of m is implementation-dependent (subject to the maximum given above). When used to address storage, the high-order 60-m bits of the "60-bit" real address must be zeros.
2. The value of n is implementation-dependent (subject to the range given above). In references to 78-bit virtual addresses elsewhere in this Book, the high-order 78-n bits of the "78-bit" virtual address are assumed to be zeros.
3. The supported values of p for the larger virtual page sizes are implementation-dependent (subject to the limitations given above).

5.7.1 32-Bit Mode

The computation of the 64-bit effective address is independent of whether the processor is in 32-bit mode or 64-bit mode. In 32-bit mode (MSR[SF]=0), the high-order 32 bits of the 64-bit effective address are treated as zeros for the purpose of addressing storage. This applies to both data accesses and instruction fetches. It applies independent of whether address translation is enabled or disabled. This truncation of the effective address is the only respect in which storage accesses in 32-bit mode differ from those in 64-bit mode.

Programming Note
   Treating the high-order 32 bits of the effective address as zeros effectively truncates the 64-bit effective address to a 32-bit effective address such as would have been generated on a 32-bit implementation of the Power ISA. Thus, for example, the ESID in 32-bit mode is the high-order four bits of this truncated effective address; the ESID thus lies in the range 0-15. When address translation is enabled, these four bits would select a Segment Register on a 32-bit implementation of the Power ISA. The SLB entries that translate these 16 ESIDs can be used to emulate these Segment Registers.

5.7.2 Virtualized Partition Memory (VPM) Mode

VPM mode enables the hypervisor to reassign all or part of a partition's memory transparently so that the reassignment is not visible to the partition. When this is done, the partition's memory is said to be "virtualized." The VPM field in the LPCR enables VPM mode separately when address translation is enabled and when translation is disabled.

If the processor is not in hypervisor state, and either address translation is enabled and VPM[1]=1, or address translation is disabled and VPM[0]=1, conditions that would have caused a Data Storage or an Instruction Storage interrupt if the affected memory were not virtualized instead cause a Hypervisor Data Storage or a Hypervisor Instruction Storage interrupt respectively. Because the Hypervisor Data Storage and Hypervisor Instruction Storage interrupts always put the processor in hypervisor state, they permit the hypervisor to handle the condition if appropriate (e.g., to restore the contents of a page that was reassigned), and to reflect it to the operating system's Data Storage or Instruction Storage interrupt handler otherwise.

When address translation is enabled, VPM mode has no effect on address translation. When address translation is disabled, addressing is controlled as specified in Section 5.7.3.

5.7.3 Real And Virtual Real Addressing Modes

When a storage access is an instruction fetch performed when instruction address translation is disabled, or if the access is a data access and data address translation is disabled, it is said to be performed in "real addressing mode" if VPM[0]=0 and the processor is not in hypervisor state. If the processor is in hypervisor state, the access is said to be performed in "hypervisor real addressing mode" regardless of the value of VPM[0]. If the processor is not in hypervisor state and VPM[0]=1, the access is said to be performed in "virtual real addressing mode." Storage accesses in real, hypervisor real, and virtual real addressing modes are performed in a manner that depends on the contents of MSR[HV], LPES, VPM, VRMASD, HRMOR, RMLS, and RMOR (see Chapter 2), and bit 0 of the effective address (EA[0]) as described below. Bit 1 of the effective address is ignored.

MSR[HV]=1
• If EA[0]=0, the Hypervisor Offset Real Mode Address mechanism, described in Section 5.7.3.1, controls the access.
• If EA[0]=1, bits 4:63 of the effective address are used as the real address for the access.

MSR[HV]=0
• If LPES[1]=0, the access causes a storage exception as described in Section 5.7.9.3.
• If LPES[1]=1 and VPM[0]=0, the Offset Real Mode Address mechanism, described in Section 5.7.3.2, controls the access.
• If LPES[1]=1 and VPM[0]=1, the Virtual Real Mode Addressing mechanism, described in Section 5.7.3.4, controls the access.

5.7.3.1 Hypervisor Offset Real Mode Address

If MSR[HV] = 1 and EA[0] = 0, the access is controlled by the contents of the Hypervisor Real Mode Offset Register, as follows.

Hypervisor Real Mode Offset Register (HRMOR)
   Bits 4:63 of the effective address for the access are ORed with the 60-bit offset represented by the contents of the HRMOR, and the 60-bit result is used as the real address for the access. The supported offset values are all values of the form i×2^r, where 0 ≤ i < 2^j, and j and r are implementation-dependent values having the properties that 12 ≤ r ≤ 26 (i.e., the minimum offset granularity is 4 KB and the maximum offset granularity is 64 MB) and j+r = m, where the real address size supported by the implementation is m bits.

Programming Note
   EA[4:63-r] should consist of 60-r zero bits. If this condition is satisfied, ORing the effective address with the offset produces a result that is equivalent to adding the effective address and the offset.

   If m < 60, EA[4:63-m] and HRMOR[0:59-m] must be zeros.

   Software must ensure that altering the HRMOR does not cause an implicit branch.

5.7.3.2 Offset Real Mode Address

If VPM[0]=0, MSR[HV]=0, and LPES[1]=1, the access is controlled by the contents of the Real Mode Limit Selector and Real Mode Offset Register, as specified below, and the set of storage locations accessible by code is referred to as the Real Mode Area (RMA).

Real Mode Limit Selector (RMLS)
   If bits 4:63 of the effective address for the access are greater than or equal to the value (limit) represented by the contents of the RMLR, the access causes a storage exception (see Section 5.7.9.3). In this comparison, if m < 60, bits 4:63-m of the effective address may be ignored (i.e., treated as if they were zeros), where the real address size supported by the implementation is m bits. The supported limit values are of the form 2^j, where 12 ≤ j ≤ 60. Subject to the preceding sentence, the number and values of the limits supported are implementation-dependent.

Real Mode Offset Register (RMOR)
   If the access is permitted by the RMLR, bits 4:63 of the effective address for the access are ORed with the 60-bit offset represented by the contents of the RMOR, and the low-order m bits of the 60-bit result are used as the real address for the access. The supported offset values are all values of the form i×2^s, where 0 ≤ i < 2^k, and k and s are implementation-dependent values having the properties that 2^s is the minimum limit value supported by the implementation (i.e., the minimum value representable by the contents of the RMLR) and k+s = m.

Programming Note
   The offset specified by the RMOR should be a nonzero multiple of the limit specified by the RMLS. If these registers are set thus, ORing the effective address with the offset produces a result that is equivalent to adding the effective address and the offset. (The offset must not be zero, because real page 0 contains the fixed interrupt vectors and real pages 1 and 2 may be used for implementation-specific purposes; see Section 5.7.4, "Address Ranges Having Defined Uses" on page 512.)

5.7.3.3 Storage Control Attributes for Accesses in Real and Hypervisor Real Addressing Modes

Storage accesses in hypervisor real addressing mode are performed as though all of storage had the following storage control attributes, except as modified by the Real Mode Storage Control facility (see Section 5.7.3.3.1). (The storage control attributes are defined in Book II.)
• not Write Through Required
• not Caching Inhibited, for instruction fetches
• not Caching Inhibited, for data accesses except those caused by the Load/Store Caching Inhibited instructions; Caching Inhibited, for data accesses caused by the Load/Store Caching Inhibited instructions
• Memory Coherence Required, for data accesses
• Guarded

Storage accesses in real addressing mode are performed as though all of storage had the following storage control attributes. (Such accesses use the Offset Real Mode Address mechanism.)
• not Write Through Required
• not Caching Inhibited
• Memory Coherence Required, for data accesses
• not Guarded

Additionally, storage accesses in real or hypervisor real addressing modes are performed as though all storage was not No-execute.

Programming Note
   Because storage accesses in real addressing mode and hypervisor real addressing mode do not use the SLB or the Page Table, accesses in these modes bypass all checking and recording of information contained therein (e.g., storage protection checks that use information contained therein are not performed, and reference and change information is not recorded).

5.7.3.3.1 Hypervisor Real Mode Storage Control

The Hypervisor Real Mode Storage Control facility provides a means of specifying portions of real storage that are treated as non-Guarded in hypervisor real addressing mode (MSR[HV PR]=0b10, and MSR[IR]=0 or MSR[DR]=0, as appropriate for the type of access). The remaining portions are treated as Guarded in hypervisor real addressing mode. The means is a hypervisor resource (see Chapter 2), and may also be system-specific.

When executing a Load/Store Caching Inhibited instruction, the specified storage location must not be in storage specified by the Real Mode Storage Control facility to be treated as non-Guarded. If this restriction is violated, the results are undefined.

The facility does not apply to implicit accesses to the Page Table by the processor in performing address translation or in recording reference and change information. These accesses are performed as described in Section 5.7.3.5.

Programming Note
   The preceding capability can be used to improve the performance of hypervisor software that runs in hypervisor real addressing mode, by causing accesses to instructions and data that occupy well-behaved storage to be treated as non-Guarded. See also the second paragraph of the Programming Note in Section 5.7.3.3.

   The statement in Section 5.5, that non-Guarded storage locations may be fetched out-of-order into a cache only if they could be fetched into that cache by in-order execution, does not preclude the out-of-order fetching into the data cache of storage locations that are treated as non-Guarded for Load/Store Caching-Inhibited instructions, because the effective Caching Inhibited value that could be used for an in-order data access to such a storage location is undefined and hence could be 0.

5.7.3.4 Virtual Real Mode Addressing Mechanism

If VPM[0]=1, MSR[HV]=0, LPES[1]=1, and MSR[DR]=0 or MSR[IR]=0 as appropriate for the type of access, the access is said to be made in virtual real addressing mode and is controlled by the mechanism specified below. The set of storage locations accessible by code is referred to as the Virtualized Real Mode Area (VRMA).

In virtual real addressing mode, address translation, storage protection, and reference and change recording are handled as follows.
• Address translation and storage protection are handled as if address translation were enabled, except that translation of effective addresses to virtual addresses uses the SLBE values in Figure 15 instead of the entry in the SLB corresponding to the ESID, bits 0:3 of the effective address are ignored (i.e. treated as if they were 0s), bits 4:63-m of the effective address may be ignored (where the real address size supported by the implementation is m bits), and the Virtual Page Class Key protection mechanism does not apply.

   Programming Note
      The Virtual Page Class Key protection mechanism does not apply because the authority mask that an OS has set for application programs executing with address translation enabled may not be the same as the authority mask required by the OS when address translation is disabled, such as when first entering an interrupt handler.

• Reference and change recording are handled as if address translation were enabled.

   Field   Value
   ESID    ³⁶0 (36 zero bits)
   V       1
   B       0b01 - 1 TB
   VSID    0x0_01FF_FFFF
   Ks      0
   Kp      undefined
   N       0
   L       VRMASD[L]
   C       0
   LP      VRMASD[LP]

Figure 15. SLBE for VRMA

Programming Note
   All accesses to the RMA are considered not Guarded. The G bit of the associated Page Table Entry determines whether an access to the VRMA is Guarded. Therefore, if an instruction is fetched from the VRMA, a Hypervisor Instruction Storage interrupt will result if G=1 in the associated Page Table Entry.

5.7.3.5 Storage Control Attributes for Implicit Storage Accesses

Implicit accesses to the Page Table by the processor in performing address translation and in recording reference and change information are performed as though the storage occupied by the Page Table had the following storage control attributes.
• not Write Through Required
• not Caching Inhibited
• Memory Coherence Required
• not Guarded

The definition of "performed" given in Book II applies also to these implicit accesses; accesses for performing address translation are considered to be loads in this respect, and accesses for recording reference and change information are considered to be stores. These implicit accesses are ordered by the ptesync instruction as described in Section 5.9.2.

If the effective address is not less than 1 TB, a Hypervi-
sor Data Segment or Hypervisor Instruction Segment interrupt may occur. Programming Note The C bit in Figure 15 is set to 0 because the imple- mentation-dependent lookaside information associ- ated with the VRMA is expected to be long-lived. See Section 5.9.3.1. Programming Note The 1 TB VSID 0x0_01FF_FFFF should not be used by the operating system for purposes other than mapping the VRMA when address translation is enabled. Programming Note Software should specify PTEB = 0b01 for all Page Table Entries that map the VRMA in order to be consistent with the values in Figure 15. Chapter 5. Storage Control 511 Version 2.05 5.7.4 Address Ranges Having Defined Uses The address ranges described below have uses that are defined by the architecture. 1 Fixed interrupt vectors Except for the first 256 bytes, which are reserved for software use, the real page beginning at real address 0x0000_0000_0000_0000 is either used for interrupt vectors or reserved for future interrupt vectors. 1 Implementation-specific use The two contiguous real pages beginning at real address 0x0000_0000_0000_1000 are reserved for implementation-specific purposes. 1 Offset Real Mode interrupt vectors The real pages beginning at the real address spec- ified by the HRMOR and RMOR are used similarly to the page for the fixed interrupt vectors. 1 Page Table A contiguous sequence of real pages beginning at the real address specified by SDR1 contains the Page Table. 512 Power ISATM III-S Version 2.05 Chapter 5. Storage Control 513 Version 2.05 5.7.5 Address Translation Overview The effective address (EA) is the address generated by the processor for an instruction fetch or for a data 64-bit Effective Address access. If address translation is enabled, this address is passed to the Address Translation mechanism, 64-s s-p p which attempts to convert the address to a real address ESID Page Byte which is then used to access storage. 
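For contrast with the translated path, the untranslated (real-mode) path of Section 5.7.3 can be sketched in a few lines. This is a minimal illustration, not an architected algorithm: the function and parameter names are hypothetical, and the limit comparison is an assumption about how the RMLR check behaves, since that rule is stated elsewhere in the section.

```python
MASK60 = (1 << 60) - 1  # selects EA bits 4:63 of a 64-bit effective address


def hypervisor_real_address(ea: int, hrmor: int) -> int:
    """EA bits 4:63 ORed with the 60-bit HRMOR offset; the 60-bit result is the RA."""
    return (ea & MASK60) | (hrmor & MASK60)


def offset_real_address(ea: int, rmor: int, limit: int, m: int) -> int:
    """EA bits 4:63 ORed with the RMOR offset; the low-order m bits are the RA.

    `limit` stands in for the value represented by the RMLR. Treating
    "permitted" as "EA bits 4:63 are below the limit" is an assumption
    made for this sketch.
    """
    ea60 = ea & MASK60
    if ea60 >= limit:
        raise ValueError("access not permitted by the RMLR")
    return (ea60 | (rmor & MASK60)) & ((1 << m) - 1)
```

With an offset that is a nonzero multiple of a power-of-2 limit, as the Programming Note recommends, the OR gives the same result as an addition: an offset of 0x4000_0000 with a limit of 0x1000_0000 maps effective address 0x1234 to 0x4000_1234.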
The first step in address translation is to convert the effective address to a virtual address (VA), as described in Section 5.7.6. The second step, conversion of the virtual address to a real address (RA), is described in Section 5.7.7.

If the effective address cannot be translated, a storage exception (see Section 5.2) occurs.

Figure 16 gives an overview of the address translation process.

[Figure 16: Effective Address -> Lookup in SLB -> Virtual Address -> Lookup in Page Table -> Real Address]

Figure 16. Address translation overview

5.7.6 Virtual Address Generation

Conversion of a 64-bit effective address to a virtual address is done by searching the Segment Lookaside Buffer (SLB) as shown in Figure 17.

[Figure 17: The 64-bit Effective Address consists of the ESID (64-s bits, bits 0:63-s), the Page (s-p bits, bits 64-s:63-p), and the Byte (p bits, bits 64-p:63). The ESID selects one of the SLB entries SLBE0 through SLBEn, each with the fields ESID, V, B, VSID, Ks, Kp, N, L, C, and LP. The 78-bit Virtual Address consists of VSID[0:77-s] (78-s bits), the Page (s-p bits), and the Byte (p bits); the VSID and Page together form the Virtual Page Number (VPN).]

Figure 17. Translation of 64-bit effective address to 78 bit virtual address

5.7.6.1 Segment Lookaside Buffer (SLB)

The Segment Lookaside Buffer (SLB) specifies the mapping between Effective Segment IDs (ESIDs) and Virtual Segment IDs (VSIDs). The number of SLB entries is implementation-dependent, except that all implementations provide at least 32 entries.

The contents of the SLB are managed by software, using the instructions described in Section 5.9.3.1. See Chapter 10. "Synchronization Requirements for Context Alterations" on page 585 for the rules that software must follow when updating the SLB.

SLB Entry

Each SLB entry (SLBE, sometimes referred to as a "segment descriptor") maps one ESID to one VSID. Figure 18 shows the layout of an SLB entry.

ESID | V | B | VSID | Ks Kp N L C | / | LP
(field boundaries at bits 0, 36, 37, 39, 89, 94, 95, 96)

Bit(s)   Name   Description
0:35     ESID   Effective Segment ID
36       V      Entry valid (V=1) or invalid (V=0)
37:38    B      Segment Size Selector
                  0b00 - 256 MB (s=28)
                  0b01 - 1 TB (s=40)
                  0b10 - reserved
                  0b11 - reserved
39:88    VSID   Virtual Segment ID
89       Ks     Supervisor (privileged) state storage key (see Section 5.7.9.2)
90       Kp     Problem state storage key (see Section 5.7.9.2)
91       N      No-execute segment if N=1
92       L      Virtual page size selector bit 0
93       C      Class
95:96    LP     Virtual page size selector bits 1:2

All other fields are reserved. B[0] (SLBE[37]) is treated as a reserved field.

Figure 18. SLB Entry

Instructions cannot be executed from a No-execute (N=1) segment.

The L and LP bits specify the page size or sizes that the segment may contain as shown in Figure 19. A Mixed Page Size (MPS) segment is a segment that may contain 4 KB pages, 64 KB pages, or a mixture of both. A Uniform Page Size (UPS) segment is a segment that must contain pages of only a single size.

SLBE[L||LP]             Segment Type   Virtual Page Size(s)
0b000                   MPS            4 KB, 64 KB if PTE[L LP] specifies 64 KB page in MPS segment, or both sizes
0b101                   UPS            64 KB if PTE[L LP] specifies 64 KB page in UPS segment
additional values (1)   UPS            2^p bytes, where p > 12 and may differ among SLB[L||LP] values

(1) The "additional values" of SLB[L||LP] are implementation-dependent, as are the corresponding virtual page sizes.

Figure 19. SLB[L||LP] Encoding

For each SLB entry, software must ensure the following requirements are satisfied.

- L||LP contains a value supported by the implementation.
- The page size selected by the L and LP fields does not exceed the segment size selected by the B field.
- If s=40, the following bits of the SLB entry contain 0s.
  - ESID[24:35]
  - VSID[38:49]
  The bits in the above two items are ignored by the processor.

The Class field of the SLB is used in conjunction with the slbie and slbia instructions (see Section 5.9.3.1). "Class" refers to a grouping of SLB entries and implementation-specific lookaside information so that only entries in a certain group need be invalidated and others might be preserved. The Class value assigned to an implementation-specific lookaside entry derived from an SLB entry must match the Class value of that SLB entry. The Class value assigned to an implementation-specific lookaside entry that is not derived from an SLB entry (such as real mode address "translations") is 0.

Software must ensure that the SLB contains at most one entry that translates a given effective address, and that if the SLB contains an entry that translates a given effective address, then any previously existing translation of that effective address has been invalidated. An attempt to create an SLB entry that violates this requirement may cause a Machine Check.

Programming Note
    It is permissible for software to replace the contents of a valid SLB entry without invalidating the translation specified by that entry provided the specified restrictions are followed. See Chapter 10 Note 11.

5.7.6.2 SLB Search

When the hardware searches the SLB, all entries are tested for a match with the EA. For a match to exist, the following conditions must be satisfied for indicated fields in the SLBE.

- V=1
- ESID[0:63-s]=EA[0:63-s], where the value of s is specified by the B field in the SLBE being tested

If no match is found, the search fails. If one match is found, the search succeeds. If more than one match is found, one of the matching entries is used as if it were the only matching entry, or a Machine Check occurs.

If the SLB search succeeds, the virtual address (VA) is formed from the EA and the matching SLB entry fields as follows.

VA=VSID[0:77-s] || EA[64-s:63]

The Virtual Page Number (VPN) is bits 0:77-p of the virtual address. If the value of the virtual page size selector field in the matching SLBE is 0b000, then the value of p is the value specified in the PTE used to translate the virtual address (see Section 5.7.7.1); otherwise the value of p is the value specified in the virtual page size selector field in the matching SLBE. If SLBE[N] = 1, the N (No-execute) value used for the storage access is 1.
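The SLB search and the formation of the virtual address described in Section 5.7.6.2 can be sketched as follows. This is a minimal software model under stated assumptions, not a description of any implementation: the Slbe class and both function names are hypothetical, only the fields needed for the match are modeled, and the multiple-match case is modeled as an error even though the text also permits using one of the matching entries.

```python
from dataclasses import dataclass
from typing import List

# Segment size selector B -> s (log2 of the segment size), per Figure 18.
SEGMENT_SHIFT = {0b00: 28, 0b01: 40}


@dataclass
class Slbe:
    esid: int   # 36-bit ESID field (corresponds to EA bits 0:35)
    v: int      # valid bit
    b: int      # segment size selector
    vsid: int   # 50-bit VSID field
    n: int = 0  # No-execute bit


def slb_search(slb: List[Slbe], ea: int) -> Slbe:
    """Return the single SLBE whose ESID matches the 64-bit EA."""
    matches = []
    for e in slb:
        if e.v != 1 or e.b not in SEGMENT_SHIFT:
            continue
        s = SEGMENT_SHIFT[e.b]
        # ESID[0:63-s] = EA[0:63-s]: compare the high-order 64-s bits.
        if (e.esid << 28) >> s == ea >> s:
            matches.append(e)
    if not matches:
        raise LookupError("segment fault: no SLBE matches the EA")
    if len(matches) > 1:
        # Assumption of this sketch: model the Machine Check outcome.
        raise RuntimeError("multiple SLBEs match the EA")
    return matches[0]


def virtual_address(e: Slbe, ea: int) -> int:
    """VA = VSID[0:77-s] || EA[64-s:63], returned as a 78-bit integer."""
    s = SEGMENT_SHIFT[e.b]
    vsid_high = e.vsid >> (s - 28)  # VSID[0:77-s]
    return (vsid_high << s) | (ea & ((1 << s) - 1))
```

For a 1 TB segment the low-order 12 bits of the ESID and VSID fields drop out of both the comparison and the concatenation, which matches the requirement that ESID[24:35] and VSID[38:49] contain 0s and are ignored.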
If the SLB search fails, a segment fault occurs. This is an Instruction Segment exception or a Data Segment exception, depending on whether the effective address is for an instruction fetch or for a data access.

5.7.7 Virtual to Real Translation

Conversion of a 78-bit virtual address to a real address is done by searching the Page Table as shown in Figure 20.

[Figure 20: SDR1 supplies HTABORG (bits 4:45) and HTABSIZE (bits 59:63). The 78-bit Virtual Address consists of the Virtual Page Number (VPN, 78-p bits) and the Byte (p bits). The hash function (see Section 5.7.7.3) produces a 39-bit value. HTABSIZE is decoded to a 28-bit mask, which is ANDed with hash bits 0:27; the result is ORed (*) with the low-order 28 bits of HTABORG. The 60-bit Real Address of the Page Table Entry Group (PTEG) is the concatenation of 14 bits (the high-order bits of HTABORG), 28 bits (the OR result), 11 bits (hash bits 28:38), and seven 0-bits. The Page Table holds PTEG 0 through PTEG n, each 128 bytes long and containing PTE0 through PTE7 of 16 bytes each. The 60-bit Real Address of the access is (ARPN||LP)[0:59-p] (60-p bits) concatenated with the Byte (p bits).]

* If the Server.Relaxed Page Table Alignment category is supported, low order HTABORG bits are not necessarily zero; the OR block to the left is replaced with a full adder, and the carry out is added to bits HTABORG[4:17] to form RA[0:13] of the PTEG.

Figure 20. Translation of 78-bit virtual address to 60-bit real address

5.7.7.1 Page Table

The Hashed Page Table (HTAB) is a variable-sized data structure that specifies the mapping between virtual page numbers and real page numbers, where the real page number of a real page is bits 0:49 of the address of the first byte in the real page. The HTAB's size can be any size 2^n bytes where 18 ≤ n ≤ 46. The HTAB must be located in storage having the storage control attributes that are used for implicit accesses to it (see Section 5.7.3.3). The starting address must be a multiple of its size unless the implementation supports the Server.Relaxed Page Table Alignment category, in which case its starting address is a multiple of 2^18 bytes (see Section 5.7.7.4).

The HTAB contains Page Table Entry Groups (PTEGs). A PTEG contains 8 Page Table Entries (PTEs) of 16 bytes each; each PTEG is thus 128 bytes long. PTEGs are entry points for searches of the Page Table.

See Section 5.10 for the rules that software must follow when updating the Page Table.

Programming Note
    The Page Table must be treated as a hypervisor resource (see Chapter 2), and therefore must be placed in real storage to which only the hypervisor has write access. Moreover, the contents of the Page Table must be such that non-hypervisor software cannot modify storage that contains hypervisor programs or data.

Page Table Entry

Each Page Table Entry (PTE) maps one VPN to one RPN. Figure 21 shows the layout of a PTE. This layout is independent of the Endian mode of the processor.

Doubleword 0:  B | AVPN | SW | L | H | V
               (field boundaries at bits 0, 2, 57, 61, 62, 63)
Doubleword 1:  pp | / | key | ARPN | LP | key | R | C | WIMG | N | pp
               (field boundaries at bits 0, 1, 2, 4, 44, 52, 55, 56, 57, 61, 62, 63)

Dword   Bit(s)   Name   Description
0       0:1      B      Segment Size
                          0b00 - 256 MB
                          0b01 - 1 TB
                          0b10 - reserved
                          0b11 - reserved
0       2:56     AVPN   Abbreviated Virtual Page Number
0       57:60    SW     Available for software use
0       61       L      Virtual page size
                          0b0 - 4 KB
                          0b1 - greater than 4 KB (large page)
0       62       H      Hash function identifier
0       63       V      Entry valid (V=1) or invalid (V=0)
1       0        pp     Page Protection bit 0
1       2:3      key    KEY bits 0:1
1       4:43     ARPN   Abbreviated Real Page Number
1       44:51    LP     Large page size selector
1       52:54    key    KEY bits 2:4
1       55       R      Reference bit
1       56       C      Change bit
1       57:60    WIMG   Storage control bits
1       61       N      No-execute page if N=1
1       62:63    pp     Page protection bits 1:2

All other fields are reserved.

Figure 21. Page Table Entry

Programming Note
    The H bit in the Page Table entry should not be set to one unless the secondary Page Table search has been enabled.

If p ≤ 23, the Abbreviated Virtual Page Number (AVPN) field contains bits 0:54 of the VPN. Otherwise bits 0:77-p of the AVPN field contain bits 0:77-p of the VPN, and bits 78-p:54 of the AVPN field must be zeros and are ignored by the processor.
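The field positions listed for Figure 21 can be exercised with a small decoder. The following is an illustrative sketch only: the helper names bits and decode_pte are hypothetical, the two doublewords are passed as plain 64-bit integers, and bit 0 is taken to be the most significant bit of each doubleword, as in the figure.

```python
def bits(dword: int, start: int, end: int) -> int:
    """Extract bits start:end of a 64-bit doubleword (bit 0 is the most significant)."""
    width = end - start + 1
    return (dword >> (63 - end)) & ((1 << width) - 1)


def decode_pte(dw0: int, dw1: int) -> dict:
    """Split the two doublewords of a PTE into the fields of Figure 21."""
    return {
        "B": bits(dw0, 0, 1),
        "AVPN": bits(dw0, 2, 56),
        "SW": bits(dw0, 57, 60),
        "L": bits(dw0, 61, 61),
        "H": bits(dw0, 62, 62),
        "V": bits(dw0, 63, 63),
        # pp bit 0 is dword 1 bit 0; pp bits 1:2 are dword 1 bits 62:63.
        "pp": (bits(dw1, 0, 0) << 2) | bits(dw1, 62, 63),
        # KEY bits 0:1 are dword 1 bits 2:3; KEY bits 2:4 are bits 52:54.
        "key": (bits(dw1, 2, 3) << 3) | bits(dw1, 52, 54),
        "ARPN": bits(dw1, 4, 43),
        "LP": bits(dw1, 44, 51),
        "R": bits(dw1, 55, 55),
        "C": bits(dw1, 56, 56),
        "WIMG": bits(dw1, 57, 60),
        "N": bits(dw1, 61, 61),
    }
```

The split pp and key fields are reassembled here into single integers purely for convenience; the architecture only defines the individual bit positions.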
Programming Note
    If p ≤ 23, the AVPN field omits the low-order 23-p bits of the VPN. These bits are not needed in the PTE, because the low-order 11 bits of the VPN are always used in selecting the PTEGs to be searched (see Section 5.7.7.3).

On implementations that support a virtual address size of only n bits, n<78, bits 0:77-n of the AVPN field must be zeros.

A virtual page is mapped to a sequence of 2^(p-12) contiguous real pages such that the low-order p-12 bits of the real page number of the first real page in the sequence are 0s.

If PTE[L]=0, the virtual page size is 4KB, and ARPN concatenated with LP (ARPN||LP) contains the page number of the real page that maps the virtual page described by the entry.

If PTE[L]=1, the virtual page size is specified by PTE[LP]. In this case, the contents of PTE[LP] have the format shown in Figure 22. Bits labelled "r" are bits of the real page number. The page size specified by the non-r bits of PTE[LP] is implementation-dependent.

rrrr_rrr0
rrrr_rr01
rrrr_r011
rrrr_0111
rrr0_1111
rr01_1111
r011_1111
0111_1111

Figure 22. Format of PTE[LP]

There are at least 2 formats of PTE[LP] that specify a 64 KB page. One format specifies a 64 KB page contained in an MPS segment, and another specifies a 64 KB page contained in a Uniform segment.

If L=1, the page size selected by the LP field must not exceed the segment size selected by the B field. Forms of PTE[LP] not supported by a given processor are treated as reserved values for that processor.

The concatenation of the ARPN field and bits labeled "r" in the LP field contain the high-order bits of the real page number of the real page that maps the first 4KB of the virtual page described by the entry.

The low-order p-12 bits of the real page number contained in the ARPN and LP fields must be 0s and are ignored by the processor.

Programming Note
    The page size specified by a given PTE[LP] format is at least 2^(12+(8-c)), where c is the number of r bits in the format.

Programming Note
    The processor often has implementation-dependent lookaside buffers (e.g. TLBs and ERATs) used to cache translations of recently used storage addresses. Mapping virtual storage to large pages may increase the effectiveness of such lookaside buffers, improving performance, because it is possible for such buffers to translate a larger range of addresses, reducing the frequency that the Page Table must be searched to translate an address.

Instructions cannot be executed from a No-execute (N=1) page.

Page Table Size

The number of entries in the Page Table directly affects performance because it influences the hit ratio in the Page Table and thus the rate of page faults. If the table is too small, it is possible that not all the virtual pages that actually have real pages assigned can be mapped via the Page Table. This can happen if too many hash collisions occur and there are more than 16 entries for the same primary/secondary pair of PTEGs (when the secondary Page Table search is enabled) or more than 8 entries for the same primary PTEG (when the secondary Page Table search is disabled).

While this situation cannot be guaranteed not to occur for any size Page Table, making the Page Table larger than the minimum size (see Section 5.7.7.2) will reduce the frequency of occurrence of such collisions.

Programming Note
    If large pages are not used, it is recommended that the number of PTEGs in the Page Table be at least half the number of real pages to be accessed. For example, if the amount of real storage to be accessed is 2^31 bytes (2 GB), then we have 2^(31-12) = 2^19 real pages. The minimum recommended Page Table size would be 2^18 PTEGs, or 2^25 bytes (32 MB).

5.7.7.2 Storage Description Register 1

The Storage Description Register 1 (SDR1) register is shown in Figure 23.

// | HTABORG | /// | HTABSIZE
(field boundaries at bits 0, 4, 46, 59, 63)

Bits    Name       Description
4:45    HTABORG    Real address of Page Table
59:63   HTABSIZE   Encoded size of Page Table

All other fields are reserved.

Figure 23. SDR1

SDR1 is a hypervisor resource; see Chapter 2.

The HTABORG field in SDR1 contains the high-order 42 bits of the 60-bit real address of the Page Table. The Page Table is thus constrained to lie on a 2^18 byte (256 KB) boundary at a minimum. At least 11 bits from the hash function (see Figure 20) are used to index into the Page Table. The minimum size Page Table is 256 KB (2^11 PTEGs of 128 bytes each).

The Page Table can be any size 2^n bytes where 18 ≤ n ≤ 46. As the table size is increased, more bits are used from the hash to index into the table and the value in HTABORG must have more of its low-order bits equal to 0 unless the implementation supports the Server.Relaxed Page Table Alignment category.

The HTABSIZE field in SDR1 contains an integer giving the number of bits (in addition to the minimum of 11 bits) from the hash that are used in the Page Table index. This number must not exceed 28. HTABSIZE is used to generate a mask of the form 0b00...011...1, which is a string of 28 - HTABSIZE 0-bits followed by a string of HTABSIZE 1-bits. The 1-bits determine which additional bits (beyond the minimum of 11) from the hash are used in the index (see Figure 20). The number of low-order 0 bits in HTABORG must be greater than or equal to the value in HTABSIZE.

On implementations that support a real address size of only m bits, m<60, bits 0:59-m of the HTABORG field are treated as reserved bits, and software must set them to zeros.

Programming Note
    Let n equal the virtual address size (in bits) supported by the implementation. If n<67, software should set the HTABSIZE field to a value that does not exceed n-39. Because the high-order 78-n bits of the VSID are assumed to be zeros, the hash value used in the Page Table search will have the high-order 67-n bits either all 0s (primary hash; see Section 5.7.7.3) or all 1s (secondary hash). If HTABSIZE > n-39, some of these hash value bits will be used to index into the Page Table, with the result that certain PTEGs will not be searched.

    Example:
    Suppose that the Page Table is 16,384 (2^14) 128-byte PTEGs, for a total size of 2^21 bytes (2 MB). A 14-bit index is required. Eleven bits are provided from the hash to start with, so 3 additional bits from the hash must be selected. Thus the value in HTABSIZE must be 3 and the value in HTABORG must have its low-order 3 bits (bits 43:45 of SDR1) equal to 0. This means that the Page Table must begin on a 2^(3+11+7) = 2^21 = 2 MB boundary.

5.7.7.3 Page Table Search

When the hardware searches the Page Table, the accesses are performed as described in Section 5.7.3.3.

An outline of the HTAB search process is shown in Figure 20. If the implementation supports the Server.Relaxed Page Table Alignment category see Section 5.7.7.4. Up to two hash functions are used to locate a PTE that may translate the given virtual address.

A 39-bit hash value is computed from the VPN. The value of s is the value specified in the SLBE that was used to generate the virtual address; the value of p used when computing the hash function is 12 if SLBE[L||LP] = 0b000, otherwise the value of p is the value specified in the SLBE.

1. Primary Hash:

   If s=28, the hash value is computed by Exclusive ORing VPN[11:49] with ((11+p) 0-bits || VPN[50:77-p])

   If s=40, the hash value is computed by Exclusive ORing the following three quantities: (VPN[24:37] || 25 0-bits), (0 || VPN[0:37]), and ((p-1) 0-bits || VPN[38:77-p])

   The 60-bit real address of a PTEG is formed by concatenating the following values:
   - Bits 4:17 of SDR1 (the high-order 14 bits of HTABORG).
   - Bits 0:27 of the 39-bit hash value ANDed with the mask generated from bits 59:63 of SDR1 (HTABSIZE) and then ORed with bits 18:45 of SDR1 (the low-order 28 bits of HTABORG).
   - Bits 28:38 of the 39-bit hash value.
   - Seven 0-bits.

   This operation identifies a particular PTEG, called the "primary PTEG", whose eight PTEs will be tested.

2. Secondary Hash:

   If the secondary Page Table search is enabled (LPCR[TC]=0), perform the secondary hash function as follows; otherwise do not perform step 2 and proceed to step 3 below.

   If s=28, the hash value is computed by taking the ones complement of the Exclusive OR of VPN[11:49] with ((11+p) 0-bits || VPN[50:77-p])

   If s=40, the hash value is computed by taking the ones complement of the Exclusive OR of the following three quantities: (VPN[24:37] || 25 0-bits), (0 || VPN[0:37]), and ((p-1) 0-bits || VPN[38:77-p])

   The 60-bit real address of a PTEG is formed by concatenating the following values:
   - Bits 4:17 of SDR1 (the high-order 14 bits of HTABORG).
   - Bits 0:27 of the 39-bit hash value ANDed with the mask generated from bits 59:63 of SDR1 (HTABSIZE) and then ORed with bits 18:45 of SDR1 (the low-order 28 bits of HTABORG).
   - Bits 28:38 of the 39-bit hash value.
   - Seven 0-bits.

   This operation identifies the "secondary PTEG".

3. As many as 8 PTEs in the primary PTEG and, if the secondary Page Table search is enabled, 8 PTEs in the secondary PTEG are tested to determine if any translate the given virtual address. Let q = minimum(54, 77-p). For a match to exist, the following conditions must be satisfied, where SLBE is the SLBE used to form the virtual address.
   - PTE[H]=0 for the primary PTEG, 1 for the secondary PTEG
   - PTE[V]=1
   - PTE[B]=SLBE[B]
   - PTE[AVPN][0:q]=VA[0:q]
   - if PTE[L]=0 then SLBE[L||LP]=0b000 else PTE[LP] specifies a page size specified by SLBE[L||LP]

   If no match is found, the search fails. If one match is found, the search succeeds. If more than one match is found, one of the matching entries is used as if it were the only matching entry, or a Machine Check occurs.

If the Page Table search succeeds, the real address (RA) is formed by concatenating bits 0:59-p of (ARPN||LP) from the matching PTE with bits 64-p:63 of the effective address (the byte offset), where the p value is the value specified by PTE[L LP].

RA=(ARPN || LP)[0:59-p] || EA[64-p:63]

The N (No-execute) value used for the storage access is the result of ORing the N bit from the matching PTE with the N bit from the SLB entry that was used to translate the effective address.

Programming Note
    For segments that may contain a mixture of 4 KB and 64 KB pages (i.e. SLBE[L||LP] = 0b000), the value of p used when searching the Page Table to identify the PTEGs is specified to be 12. Since the segment may contain pages of size 4KB and 64 KB, the processor searches for PTEs specifying pages of either size, and the real address is formed using a value of p specified by the matching PTE.

If the Page Table search fails, a page fault occurs. This is an Instruction Storage exception or a Data Storage exception, depending on whether the effective address is for an instruction fetch or for a data access. The N value used for the storage access is the N bit from the SLB entry that was used to translate the effective address.

Programming Note
    To obtain the best performance, Page Table Entries should be allocated beginning with the first empty entry in the primary PTEG, or with the first empty entry in the secondary PTEG if the primary PTEG is full and the secondary Page Table search is enabled (LPCR[TC]=0).

Translation Lookaside Buffer

Conceptually, the Page Table is searched by the address relocation hardware to translate every reference. For performance reasons, the hardware usually keeps a Translation Lookaside Buffer (TLB) that holds PTEs that have recently been used. The TLB is searched prior to searching the Page Table. As a consequence, when software makes changes to the Page Table it must perform the appropriate TLB invalidate operations to maintain the consistency of the TLB with the Page Table (see Section 5.10).

Programming Notes
    1. Page Table Entries may or may not be cached in a TLB.
    2. It is possible that the hardware implements more than one TLB, such as one for data and one for instructions. In this case the size and shape of the TLBs may differ, as may the values contained therein.
    3. Use the tlbie or tlbia instruction to ensure that the TLB no longer contains a mapping for a particular virtual page.

5.7.7.4 Relaxed Page Table Alignment [Category: Server.Relaxed Page Table Alignment]

The Page Table can be aligned on any 2^18 byte (256 KB) boundary regardless of the HTAB size.

Section 5.7.7.2 describes the Storage Description Register, which includes the HTABORG field. That description generally applies except for the following difference. As the Page Table size is increased beyond 256 KB, the value in HTABORG need not have more of its low-order bits equal to 0. Instead, (HTABORG || 18 0-bits) is the real address of the start of the Page Table regardless of the Page Table size.

A Page Table search is performed as described in Section 5.7.7.3 except the 60-bit real address of a PTEG for both the primary and, if the secondary Page Table search is enabled, the secondary hash is formed by concatenating the following values:
- Bits 0:27 of the 39-bit appropriate primary or secondary hash value ANDed with the mask generated from bits 59:63 of SDR1 (HTABSIZE) and then added to the value of bits 4:45 of SDR1 (HTABORG). This part of the real address differs from Section 5.7.7.2.
- Bits 28:38 of the 39-bit hash value.
- Seven 0-bits.

An outline of the PTEG real address computation is shown in Figure 20.

5.7.8 Reference and Change Recording

If address translation is enabled, Reference (R) and Change (C) bits are updated in the Page Table Entry that is used to translate the virtual address. If the storage operand of a Load or Store instruction crosses a virtual page boundary, the accesses to the components of the operand in each page are treated as separate and independent accesses to each of the pages for the purpose of setting the Reference and Change bits.

Reference and Change bits are set by the processor as described below. Setting the bits need not be atomic with respect to performing the access that caused the bits to be updated. An attempt to access storage may cause one or more of the bits to be set (as described below) even if the access is not performed. The bits are updated in the Page Table Entry if the new value would otherwise be different from the old value for the virtual page, as determined by examining either the Page Table Entry or any lookaside information for the virtual page (e.g., TLB) maintained by the processor.

Reference Bit
The Reference bit is set to 1 if the corresponding access (load, store, or instruction fetch) is required by the sequential execution model and is performed. Otherwise the Reference bit may be set to 1 if the corresponding access is attempted, either in-order or out-of-order, even if the attempt causes an exception.

Change Bit
The Change bit is set to 1 if a Store instruction is executed and the store is performed. Otherwise the Change bit may be set to 1 if a Store instruction is executed and the store is permitted by the storage protection mechanism and, if the Store instruction is executed out-of-order, the instruction would be required by the sequential execution model in the absence of the following kinds of interrupts:
- system-caused interrupts (see Section 6.4 on page 550)
- Floating-Point Enabled Exception type Program interrupts when the processor is in an Imprecise mode.

Programming Note
    A 64 KB virtual page in an MPS segment may be mapped by multiple PTEs. For each access of a virtual page, hardware may search the Page Table to update the R and C bits. If lookaside buffer information for the virtual page already indicates that all such bits to be set have already been set in a PTE that maps the virtual page, hardware need not make an update. Consider the following sequence of events:
    1. A virtual page is mapped by 2 PTEs A and B and the R and C bits in both PTEs are 0.
    2. A Load instruction accesses the virtual page and the R bit is updated in PTE A.
    3. A Load instruction accesses the virtual page and the R bit is updated in PTE B.
    4. A Store instruction accesses the virtual page and the C bit is updated in PTE B.
    5. The virtual page is paged out. Software must examine both PTE A and B to get the state of the R and C bits for the virtual page.
    Furthermore, if in event 2, PTE A was not found, a Data Storage interrupt or Hypervisor Data Storage interrupt may occur. Subsequently, if in event 3 or 4, PTE B was not found, a Data Storage interrupt or Hypervisor Data Storage interrupt may occur.

Programming Note
    Even though the execution of a Store instruction causes the Change bit to be set to 1, the store might not be performed or might be only partially performed in cases such as the following.
    - A Store Conditional instruction (stwcx. or stdcx.) is executed, but no store is performed.
    - A Store String Word Indexed instruction (stswx) is executed, but the length is zero.
    - The Store instruction causes a Data Storage exception (for which setting the Change bit is not prohibited).
    - The Store instruction causes an Alignment exception.
    - The Page Table Entry that translates the virtual address of the storage operand is altered such that the new contents of the Page Table Entry preclude performing the store (e.g., the PTE is made invalid, or the PP bits are changed).
      For example, when executing a Store instruction, the processor may search the Page Table for the purpose of setting the Change bit and then re-execute the instruction. When reexecuting the instruction, the processor may search the Page Table a second time. If the Page Table Entry has meanwhile been altered, by a program executing on another processor, the second search may obtain the new contents, which may preclude the store.
    - A system-caused interrupt occurs before the store has been performed.

Figure 24 on page 523 summarizes the rules for setting the Reference and Change bits. The table applies to each atomic storage reference. It should be read from the top down; the first line matching a given situation applies. For example, if stwcx. fails due to both a storage protection violation and the lack of a reservation, the Change bit is not altered.

In the figure, the "Load-type" instructions are the Load instructions described in Books I, II, and III-S, eciwx, and the Cache Management instructions that are treated as Loads. The "Store-type" instructions are the Store instructions described in Books I, II, and III-S, ecowx, and the Cache Management instructions that are treated as Stores. The "ordinary" Load and Store instructions are those described in Books I, II, and III-S. "set" means "set to 1".

When the processor updates the Reference and Change bits in the Page Table Entry, the accesses are performed as described in Section 5.7.3.3, "Storage Control Attributes for Accesses in Real and Hypervisor Real Addressing Modes" on page 510. The accesses are not necessarily performed as an atomic read/modify/write of the affected bytes.

These Reference and Change bit updates are not necessarily immediately visible to software. Executing a sync instruction ensures that all Reference and Change bit updates associated with address translations that were performed, by the processor executing the sync instruction, before the sync instruction is executed will be performed with respect to that processor before the sync instruction's memory barrier is created. There are additional requirements for synchronizing Reference and Change bit updates in multiprocessor systems; see Section 5.10, "Page Table Update Synchronization Requirements" on page 543.

Programming Note
    Because the sync instruction is execution synchronizing, the set of Reference and Change bit updates that are performed with respect to the processor executing the sync instruction before the memory barrier is created includes all Reference and Change bit updates associated with instructions preceding the sync instruction.

If software refers to a Page Table Entry when MSR[DR]=1, the Reference and Change bits in the associated Page Table Entry are set as for ordinary loads and stores. See Section 5.10 for the rules software must follow when updating Reference and Change bits.

Status of Access                                              R         C
Storage protection violation                                  Acc (1)   No
Out-of-order I-fetch or Load-type insn                        Acc       No
Out-of-order Store-type insn
    Would be required by the sequential execution model
    in the absence of system-caused or imprecise
    interrupts (3)                                            Acc       Acc (1)(2)
    All other cases                                           Acc       No
In-order Load-type or Store-type insn, access not performed
    Load-type insn                                            Acc       No
    Store-type insn                                           Acc       Acc (2)
Other in-order access
    I-fetch                                                   Yes       No
    Ordinary Load, eciwx                                      Yes       No
    Other ordinary Store, ecowx, dcbz                         Yes       Yes
    icbi, dcbt, dcbtst, dcbst, dcbf[l]                        Acc       No

"Acc" means that it is acceptable to set the bit.
(1) It is preferable not to set the bit.
(2) If C is set, R is also set unless it is already set.
(3) For Floating-Point Enabled Exception type Program interrupts, "imprecise" refers to the exception mode controlled by MSR[FE0 FE1].
may be performed using operations equivalent to a Figure 24. Setting the Reference and Change bits store to a byte, halfword, word, or doubleword, and are Chapter 5. Storage Control 523 Version 2.05 5.7.9 Storage and Virtual Page Class Key protection mechanism has no effect on instruction fetches. Class Key Protection The storage and virtual page class key protection Key0 Key1 Key2 ... Key29 Key30 Key31 mechanism provides a means for selectively granting 0 2 4 6 58 60 62 instruction fetch access, granting read access, granting read/write access, and prohibiting access to areas of Figure 25. Authority Mask Register (AMR) storage based on a number of control criteria. The contents of the AMR are as follows. The operation of the protection mechanism depends on one or more of the following conditions. Bit Description - the state of MSR bits HV, IR,DR, PR 0:1 Access mask for class number 0 - the value of the key bits in the associated SLB 2:3 Access mask for class number 1 entry ... - the values of the page protection and key bits 2n:2n+1 Access mask for class number n in the associated PTE ... - the contents of the Authority Mask Register 62:63 Access mask for class number 31 When translation is enabled for an access, the access is permitted if and only if the access is permitted by the The access mask for each class defines the access virtual page class key protection (see Section 5.7.9.1) permissions used in conjunction with load and store and the storage protection mechanism (see operations corresponding to page table entries contain- Section 5.7.9.2). If an instruction fetch is not permitted, ing a KEY field value equal to the class number. The an Instruction Storage exception is generated. If a data access permissions associated with each class are access is not permitted, a Data Storage exception is defined as follows, where AMR2n and AMR2n+1 refer to generated. 
(See Section 5.2) the first and second bits of the of the access mask cor- responding to class number n. Unless otherwise indicated, references to "storage pro- tection mechanism" or "protection mechanism" - An access caused by a Store instruction is throughout the Books refer to both the Storage Protec- permitted if AMR2n=0b0; otherwise the access tion mechansm and the Virtual Page Class Key Protec- is not permitted. tion mechanism. - An access caused by a Load instruction is When address translation is enabled, a protection permitted if AMR2n+1=0b0; otherwise the domain is a range of unmapped effective addresses, a access is not permitted. virtual page, or a segment. When address translation is disabled and LPES1=1 there are two protection Programming Note domains: the set of effective addresses that are less If translation is disabled for a given access, the than the value specified by the RMLS, and all other access is not affected by the Virtual Page Class effective addresses. When address translation is dis- Key protection mechanism even if the access is abled and LPES1=0 the entire effective address space made in virtual real addressing mode. comprises a single protection domain. A protection boundary is a boundary between protection domains. 5.7.9.1 Virtual Page Class Key Protec- tion The Virtual Page Class Key protection mechanism pro- vides the means to assign virtual pages to one of 32 classes, and to modify access permissions for each class quickly by modifying the Authority Mask Register (AMR) shown in Figure 25. The access permissions associated with the Virtual Page Class Key protection mechanism apply only to load and store operations when address translation is enabled. 
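The AMR rules above reduce to a pair of bit tests per class. The following is a minimal sketch of that check, not an architected interface: the function names are hypothetical, the AMR is assumed to be supplied as a 64-bit integer, and bit 0 is taken to be the most significant bit, as in the register figures. It models only the Virtual Page Class Key check for loads and stores with translation enabled; the storage protection mechanism of Section 5.7.9.2 must also permit the access.

```python
def amr_access_mask(amr: int, key: int) -> int:
    """Return the 2-bit access mask AMR[2n:2n+1] for class number n = key.

    Hypothetical helper. Bits are numbered from the most significant
    bit (bit 0), so the pair 2n:2n+1 sits (62 - 2n) bits above the
    least significant bit of the 64-bit value.
    """
    if not 0 <= key <= 31:
        raise ValueError("class number must be in 0..31")
    return (amr >> (62 - 2 * key)) & 0b11


def store_permitted(amr: int, key: int) -> bool:
    # A store is permitted when the first bit of the pair, AMR[2n], is 0.
    return (amr_access_mask(amr, key) & 0b10) == 0


def load_permitted(amr: int, key: int) -> bool:
    # A load is permitted when the second bit of the pair, AMR[2n+1], is 0.
    return (amr_access_mask(amr, key) & 0b01) == 0
```

For instance, an AMR whose only 1 bit is AMR[0] denies stores to class 0 pages while still permitting loads, and leaves every other class unrestricted.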
Programming Note
    The Virtual Page Class Key protection mechanism replaces the Data Address Compare mechanism that was defined in versions of the architecture that precede Version 2.04 (e.g., the two facilities use some of the same processor resources, as described below). However, the Virtual Page Class Key protection mechanism can be used to emulate the Data Address Compare mechanism. Moreover, programs that use the Data Address Compare mechanism can be modified in a manner such that they will work correctly both on processors that comply with versions of the architecture that precede Version 2.04 (and hence implement the Data Address Compare mechanism) and on processors that comply with Version 2.04 of the architecture or with any subsequent version (and hence instead implement the Virtual Page Class Key protection mechanism). The technique takes advantage of the facts that the AMR has the same SPR number as the Data Address Compare mechanism's ACCR (Address Compare Control Register), that KEY[4] occupies the same bit in the PTE as the Data Address Compare mechanism's AC (Address Compare) bit, and that the definition of ACCR[62:63] is very similar to the definition of each even-odd pair of AMR bits. The technique is as follows, where PTE1 refers to doubleword 1 of the PTE.

    - Set bits 2:3 and 62:63 of SPR 29 (which is either the ACCR or the AMR) to x, where x is the desired 2-bit value for controlling Data Address Compare matches, and set bits 0:1 to 0s.
    - Set PTE1[54] (which is either the AC bit or KEY[4]) to the same value that the AC bit would be set to, and set PTE1[2:3] (which are either RPN bits, that correspond to a real address size larger than the size implemented by any processor that implements the Data Address Compare mechanism, or KEY[0:1]) and PTE1[52:53] (which are either reserved bits or KEY[2:3]) to 0s.
    - Use PTE[KEY] values 0 and 1 only for purposes of emulating the Data Address Compare mechanism, except that PTE[KEY] value 0 may also be used for any virtual pages for which it is desired that the Virtual Page Class Key mechanism permit all accesses. Do not use PTE[KEY]=31.
    - When a Data Storage interrupt occurs, if DSISR[42]=1 then ignore the interrupt for Cache Management instructions other than dcbz. (These instructions can cause a virtual page class key protection violation but cannot cause a Data Address Compare match.) Otherwise treat the interrupt as if a Data Address Compare match had occurred. (Note: Cases for which it is undefined whether a Data Address Compare match occurs do not necessarily cause a virtual page class key protection violation.)

5.7.9.2 Storage Protection, Address Translation Enabled

When address translation is enabled, the protection mechanism is controlled both by virtual page class key protection (see Section 5.7.9.1) and the following.

• MSR[PR], which distinguishes between supervisor (privileged) state and problem state
• Ks and Kp, the supervisor (privileged) state and problem state storage key bits in the SLB entry used to translate the effective address
• PP, page protection bits 0:2 in the Page Table Entry used to translate the effective address
• For instruction fetches only:
  - the N (No-execute) value used for the access (see Sections 5.7.6.1 and 5.7.7.3)
  - PTE[G], the G (Guarded) bit in the Page Table Entry used to translate the effective address

Using the above values, the following rules are applied.

1. For an instruction fetch, the access is not permitted if the N value is 1 or if PTE[G]=1.
2. For any access except an instruction fetch that is not permitted by rule 1, a "Key" value is computed using the following formula:

       Key <- (Kp & MSR[PR]) | (Ks & ¬MSR[PR])

   Using the computed Key, Figure 26 is applied. An instruction fetch is permitted for any entry in the figure except "no access". A load is permitted for
   any entry except "no access". A store is permitted only for entries with "read/write".

Key   PP    Access Authority
0     000   read/write
0     001   read/write
0     010   read/write
0     011   read only
0     110   read only
1     000   no access
1     001   read only
1     010   read/write
1     011   read only
1     110   no access

All PP encodings not shown above are reserved. The results of using reserved PP encodings are boundedly undefined.

Figure 26. PP bit protection states, address translation enabled

5.7.9.3 Storage Protection, Address Translation Disabled

When address translation is disabled, the protection mechanism is controlled by the following (see Chapter 2 and Section 5.7.3, "Real And Virtual Real Addressing Modes").

• LPES[1], which distinguishes between the two modes of accessing storage using the LPAR facility
• MSR[HV], which distinguishes between hypervisor state and other privilege states
• RMLS, which specifies the real mode limit value

Using the above values, Figure 27 is applied. The access is permitted for any entry in the figure except "no access".

LPES[1]   HV   Access Authority
0         0    no access
0         1    read/write
1         0    read/write or no access(1)
1         1    read/write

(1) If VPM[0]=1, the access authority is read/write. If VPM[0]=0 and the effective address for the access is less than the value specified by the RMLS, the access authority is read/write; otherwise the access is not permitted.

Figure 27. Protection states, address translation disabled

Programming Note
    The comparison described in note 1 in Figure 27 ignores bits 0:3 of the effective address and may ignore bits 4:63-m; see Section 5.7.3.

5.8 Storage Control Attributes

This section describes aspects of the storage control attributes that are relevant only to privileged software programmers. The rest of the description of storage control attributes may be found in Section 1.6 of Book II and subsections.

5.8.1 Guarded Storage

Storage is said to be "well-behaved" if the corresponding real storage exists and is not defective, and if the effects of a single access to it are indistinguishable from the effects of multiple identical accesses to it. Data and instructions can be fetched out-of-order from well-behaved storage without causing undesired side effects.

Storage is said to be Guarded if any of the following conditions is satisfied.

• MSR bit IR or DR is 1 for instruction fetches or data accesses respectively, and the G bit is 1 in the relevant Page Table Entry.
• MSR bit IR or DR is 0 for instruction fetches or data accesses respectively, MSR[HV]=1, and the storage is outside the range(s) specified by the Real Mode Storage Control facility (see Section 5.7.3.3.1).

In general, storage that is not well-behaved should be Guarded. Because such storage may represent a control register on an I/O device or may include locations that do not exist, an out-of-order access to such storage may cause an I/O device to perform unintended operations or may result in a Machine Check.

The following rules apply to in-order execution of Load and Store instructions for which the first byte of the storage operand is in storage that is both Caching Inhibited and Guarded.

• Load or Store instruction that causes an atomic access
  If any portion of the storage operand has been accessed and an External, Decrementer, Hypervisor Decrementer, or Imprecise mode Floating-Point Enabled exception is pending, the instruction completes before the interrupt occurs.
• Load or Store instruction that causes an Alignment exception, or that causes a Data Storage exception for reasons other than Data Address Breakpoint match.
  The portion of the storage operand that is in Caching Inhibited and Guarded storage is not accessed.
  (The corresponding rules for instructions that cause a Data Address Breakpoint match are given in Section 8.1.2.)

5.8.1.1 Out-of-Order Accesses to Guarded Storage

In general, Guarded storage is not accessed out-of-order. The only exceptions to this rule are the following.

Load Instruction

If a copy of any byte of the storage operand is in a cache then that byte may be accessed in the cache or in main storage.

Instruction Fetch

If MSR[HV IR]=0b10 then an instruction may be fetched if any of the following conditions are met.

1. The instruction is in a cache. In this case it may be fetched from the cache or from main storage.
2. The instruction is in a real page from which an instruction has previously been fetched, except that if that previous fetch was based on condition 1 then the previously fetched instruction must have been in the instruction cache.
3. The instruction is in the same real page as an instruction that is required by the sequential execution model, or is in the real page immediately following such a page.

Programming Note
    Software should ensure that only well-behaved storage is copied into a cache, either by accessing as Caching Inhibited (and Guarded) all storage that may not be well-behaved, or by accessing such storage as not Caching Inhibited (but Guarded) and referring only to cache blocks that are well-behaved.

    If a real page contains instructions that will be executed when MSR[IR]=0 and MSR[HV]=1, software should ensure that this real page and the next real page contain only well-behaved storage (or that the Real Mode Storage Control facility specifies that this real page is not Guarded).

5.8.2 Storage Control Bits

When address translation is enabled, each storage access is performed under the control of the Page Table Entry used to translate the effective address. Each Page Table Entry contains storage control bits that specify the presence or absence of the corresponding storage control for all accesses translated by the entry as shown in Figure 28.

Bit     Storage Control Attribute
W(1)    0 - not Write Through Required
        1 - Write Through Required
I       0 - not Caching Inhibited
        1 - Caching Inhibited
M(2)    0 - not Memory Coherence Required
        1 - Memory Coherence Required
G       0 - not Guarded
        1 - Guarded

(1) Support for the 1 value of the W bit is optional. Implementations that do not support the 1 value treat the bit as reserved and assume its value to be 0.
(2) [Category: Memory Coherence] Support for the 0 value of the M bit is optional; implementations that do not support the 0 value assume the value of the bit to be 1, and may either preserve the value of the bit or write it as 1.

Figure 28. Storage control bits

When address translation is enabled, instructions are not fetched from storage for which the G bit in the Page Table Entry is set to 1; see Section 5.7.9.

When address translation is disabled, the storage control attributes are implicit; see Section 5.7.3.3.

In Section 5.8.2.1 and 5.8.2.2, "access" includes accesses that are performed out-of-order, and references to W, I, M, and G bits include the values of those bits that are implied when address translation is disabled.

Programming Note
    In a uniprocessor system in which only the processor has caches, correct coherent execution does not require the processor to access storage as Memory Coherence Required, and accessing storage as not Memory Coherence Required may give better performance.

5.8.2.1 Storage Control Bit Restrictions

All combinations of W, I, M, and G values are permitted except those for which both W and I are 1.

Programming Note
    If an application program requests both the Write Through Required and the Caching Inhibited attributes for a given storage location, the operating system should set the I bit to 1 and the W bit to 0.

At any given time, the value of the I bit must be the same for all accesses to a given real page.

At any given time, the value of the W bit must be the same for all accesses to a given real page.

5.8.2.2 Altering the Storage Control Bits

When changing the value of the I bit for a given real page from 0 to 1, software must set the I bit to 1 and then flush all copies of locations in the page from the caches using dcbf[l] and icbi before permitting any other accesses to the page.

When changing the value of the W bit for a given real page from 0 to 1, software must ensure that no processor modifies any location in the page until after all copies of locations in the page that are considered to be modified in the data caches have been copied to main storage using dcbst or dcbf[l].

Programming Note
    It is recommended that dcbf be used, rather than dcbfl, when changing the value of the I or W bit from 0 to 1. (dcbfl would have to be executed on all processors for which the contents of the data cache may be inconsistent with the new value of the bit, whereas, if the M bit for the page is 1, dcbf need be executed on only one processor in the system.)

When changing the value of the M bit for a given real page, software must ensure that all data caches are consistent with main storage. The actions required to do this are system-dependent.

Programming Note
    For example, when changing the M bit in some directory-based systems, software may be required to execute dcbf[l] on each processor to flush all storage locations accessed with the old M value before permitting the locations to be accessed with the new M value.

Additional requirements for changing the storage control bits in the Page Table are given in Section 5.10.

5.9 Storage Control Instructions

5.9.1 Cache Management Instructions

This section describes aspects of cache management that are relevant only to privileged software programmers.

Each implementation provides an efficient means by which software can ensure that all blocks that are considered to be modified in the data cache have been copied to main storage before the processor enters any power conserving mode in which data cache contents are not maintained.

For a dcbz instruction that causes the target block to be newly established in the data cache without being fetched from main storage, the processor need not verify that the associated real address is valid. The existence of a data cache block that is associated with an invalid real address (see Section 5.6) can cause a delayed Machine Check interrupt or a delayed Checkstop.

5.9.2 Synchronize Instruction

The Synchronize instruction is described in Section 3.4.3 of Book II, but only at the level required by an application programmer (sync with L=0 or L=1). This section describes properties of the instruction that are relevant only to operating system and hypervisor software programmers. This variant of the Synchronize instruction is designated the Page Table Entry sync and is specified by the extended mnemonic ptesync (equivalent to sync with L=2).

The ptesync instruction has all of the properties of sync with L=0 and also the following additional properties.

• The memory barrier created by the ptesync instruction provides an ordering function for the storage accesses associated with all instructions that are executed by the processor executing the ptesync instruction and, as elements of set A, for all Reference and Change bit updates associated with additional address translations that were performed, by the processor executing the ptesync instruction, before the ptesync instruction is executed. The applicable pairs are all pairs a[i],b[j] in which b[j] is a data access and a[i] is not an instruction fetch.
• The ptesync instruction causes all Reference and Change bit updates associated with address translations that were performed, by the processor executing the ptesync instruction, before the ptesync instruction is executed, to be performed with respect to that processor before the ptesync instruction's memory barrier is created.
• The ptesync instruction provides an ordering function for all stores to the Page Table caused by Store instructions preceding the ptesync instruction with respect to searches of the Page Table that are performed, by the processor executing the ptesync instruction, after the ptesync instruction completes. Executing a ptesync instruction ensures that all such stores will be performed, with respect to the processor executing the ptesync instruction, before any implicit accesses to the affected Page Table Entries, by such Page Table searches, are performed with respect to that processor.
• In conjunction with the tlbie and tlbsync instructions, the ptesync instruction provides an ordering function for TLB invalidations and related storage accesses on other processors as described in the tlbsync instruction description on page 542.

Programming Note
    For instructions following a ptesync instruction, the memory barrier need not order implicit storage accesses for purposes of address translation and reference and change recording.

    The functions performed by the ptesync instruction may take a significant amount of time to complete, so this form of the instruction should be used only if the functions listed above are needed. Otherwise sync with L=0 should be used (or sync with L=1, or eieio, if appropriate).

    Section 5.10, "Page Table Update Synchronization Requirements" on page 543 gives examples of uses of ptesync.

5.9.3 Lookaside Buffer Management

All implementations have a Segment Lookaside Buffer (SLB). For performance reasons, most implementations also have implementation-specific lookaside information that is used in address translation. This lookaside information may be: a Translation Lookaside Buffer (TLB) which is a cache of recently used Page Table Entries (PTEs); a cache of recently used translations of effective addresses to real addresses; etc.; or any combination of these. Lookaside information, including the SLB, is managed using the instructions described in the subsections of this section.

Lookaside information derived from PTEs is not necessarily kept consistent with the Page Table. When software alters the contents of a PTE, in general it must also invalidate all corresponding implementation-specific lookaside information; exceptions to this rule are described in Section 5.10.1.2.

The effects of the slbie, slbia, and TLB Management instructions on address translations, as specified in Sections 5.9.3.1 and 5.9.3.3 for the SLB and TLB respectively, apply to all implementation-specific lookaside information that is used in address translation. Unless otherwise stated or obvious from context, references to SLB entry invalidation and TLB entry invalidation elsewhere in the Books apply also to all implementation-specific lookaside information that is derived from SLB entries and PTEs respectively.

The tlbia instruction is optional. However, all implementations provide a means by which software can invalidate all implementation-specific lookaside information that is derived from PTEs.

Implementation-specific lookaside information that contains translations of effective addresses to real addresses may include "translations" that apply in real addressing mode. Because such "translations" are affected by the contents of the LPCR, RMOR, and HRMOR, when software alters the contents of these registers it must also invalidate the corresponding implementation-specific lookaside information. Software can invalidate all such lookaside information by using the slbia instruction with IH=0b000. However, better performance will likely be observed if other appropriate IH values are used to limit the amount of lookaside information invalidated.

All implementations that have such lookaside information provide a means by which software can invalidate all such lookaside information.

For simplicity, elsewhere in the Books it is assumed that the TLB exists.

Programming Note
    Because the instructions used to manage implementation-specific lookaside information that is derived from PTEs may be changed in a future version of the architecture, it is recommended that software "encapsulate" uses of the TLB Management instructions into subroutines.

Programming Note
    The function of all the instructions described in Sections 5.9.3.1 - 5.9.3.3 is independent of whether address translation is enabled or disabled.

    For a discussion of software synchronization requirements when invalidating SLB and TLB entries, see Chapter 10.

5.9.3.1 SLB Management Instructions

Programming Note
    Accesses to a given SLB entry caused by the instructions described in this section obey the sequential execution model with respect to the contents of the entry and with respect to data dependencies on those contents. That is, if an instruction sequence contains two or more of these instructions, when the sequence has completed, the final state of the SLB entry and of General Purpose Registers is as if the instructions had been executed in program order.

    However, software synchronization is required in order to ensure that any alterations of the entry take effect correctly with respect to address translation; see Chapter 10.
SLB Invalidate Entry X-form

slbie RB

|  31  | ///  | ///  |  RB  |   434   | /  |
 0      6      11     16     21        31

    ea[0:35] <- (RB)[0:35]
    if, for SLB entry that translates
          or most recently translated ea,
       entry_class = (RB)[36] and
       entry_seg_size = size specified in (RB)[37:38]
    then for SLB entry (if any) that translates ea
       SLBE[V] <- 0
       all other fields of SLBE <- undefined
    else
       s <- log_base_2(entry_seg_size)
       esid <- (RB)[0:63-s]
       u <- undefined 1-bit value
       if u then
          if an SLB entry translates esid
             SLBE[V] <- 0
             all other fields of SLBE <- undefined

Let the Effective Address (EA) be any EA for which EA[0:35] = (RB)[0:35]. Let the class be (RB)[36]. Let the segment size be equal to the segment size specified in (RB)[37:38]; the allowed values of (RB)[37:38], and the correspondence between the values and the segment size, are the same as for the B field in the SLBE (see Figure 18 on page 515).

The class value and segment size must be the same as the class value and segment size in the SLB entry that translates the EA, or the values that were in the SLB entry that most recently translated the EA if the translation is no longer in the SLB; if these values are not the same, it is implementation-dependent whether the SLB entry (or implementation-dependent translation information) that translates the EA is invalidated, and the next paragraph need not apply.

If the SLB contains only a single entry that translates the EA, then that is the only SLB entry that is invalidated, except that it is implementation-dependent whether an implementation-specific lookaside entry for a real mode address "translation" is invalidated. If the SLB contains more than one such entry, then zero or more such entries are invalidated, and similarly for any implementation-specific lookaside information used in address translation; additionally, a machine check may occur.

SLB entries are invalidated by setting the V bit in the entry to 0, and the remaining fields of the entry are set to undefined values.

The processor ignores the contents of RB listed below and software must set them to 0s.

- (RB)[37]
- (RB)[39:63]
- If s = 40, (RB)[24:35]

If this instruction is executed in 32-bit mode, (RB)[0:31] must be zeros.

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
    slbie does not affect SLBs on other processors.

Programming Note
    The reason the class value specified by slbie must be the same as the Class value that is or was in the relevant SLB entry is that the processor may use these values to optimize invalidation of implementation-specific lookaside information used in address translation. If the value specified by slbie differs from the value that is or was in the relevant SLB entry, these optimizations may produce incorrect results. (An example of implementation-specific address translation lookaside information is the set of recently used translations of effective addresses to real addresses that some processors maintain in an Effective to Real Address Translation (ERAT) lookaside buffer.)

    When switching tasks in certain cases, it may be advantageous to preserve some implementation-specific lookaside entries while invalidating others. The IH=0b001 invalidation hint of the slbia instruction can be used for this purpose if SLB class values are appropriately assigned, i.e. a class value of 0 gives the hint that the entry should be preserved and a class value of 1 indicates the entry must be invalidated. Also, it is advantageous to assign a class value of 1 to entries that need to be invalidated via an slbie instruction while preserving implementation-specific lookaside entries that are not derived from an SLB entry since such entries are assigned a class value of 0.

    The Move To Segment Register instructions (see Section 5.9.3.2.1) create SLB entries in which the Class value is 0.

Programming Note
    The B value in register RB may be needed for invalidating ERAT entries corresponding to the translation being invalidated.

SLB Invalidate All X-form

slbia IH

|  31  |  //  |  IH  | ///  | ///  |   498   | /  |
 0      6      8      11     16     21        31

    for each SLB entry except SLB entry 0
       SLBE[V] <- 0
       all other fields of SLBE <- undefined

For all SLB entries except SLB entry 0, the V bit in the entry is set to 0, making the entry invalid, and the remaining fields of the entry are set to undefined values. SLB entry 0 is not altered.

On implementations that have implementation-specific lookaside information for effective to real address translations, the IH field provides a hint which can be used to selectively invalidate entries in such lookaside information. The defined values for IH are as follows.

0b000   All implementation-specific lookaside information is invalidated. (This value is not a hint.)
0b001   Preserve implementation-specific lookaside information with a Class value of 0.
0b010   Preserve implementation-specific lookaside information created when MSR[IR/DR]=0.
0b110   Preserve implementation-specific lookaside information created when MSR[HV]=1, MSR[PR]=0, and MSR[IR/DR]=0.

All other values are reserved.

Implementation specific lookaside information for which preservation is not requested must be invalidated. Implementation specific lookaside information for which preservation is requested may be invalidated.

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
    slbia does not affect SLBs on other processors.

Programming Note
    If slbia is executed when instruction address translation is enabled, software can ensure that attempting to fetch the instruction following the slbia does not cause an Instruction Segment interrupt by placing the slbia and the subsequent instruction in the effective segment mapped by SLB entry 0. (The preceding assumes that no other interrupts occur between executing the slbia and executing the subsequent instruction.)

Programming Note
    The defined values for IH are as follows.

    0b000   All ERAT entries are invalidated. (This value is not a hint.) This value should be used by the hypervisor when relocating itself (i.e. when modifying the HRMOR) or when reconfiguring real storage.
    0b001   Preserve ERAT entries with a Class value of 0. This value should be used by an operating system when switching tasks in certain cases; for example, if SLBE[C]=0 is used for SLB translations shared between the tasks.
    0b010   Preserve ERAT entries created when MSR[IR/DR]=0. This value should generally be used by an operating system when switching tasks.
    0b110   Preserve ERAT entries created when MSR[HV]=1 and MSR[IR/DR]=0. This value should be used by the hypervisor when switching partitions.

    All other values are reserved. If the IH field contains a reserved value, the hint provided by the instruction is undefined.

Programming Note
    slbia serves as both a basic and an extended mnemonic. The Assembler will recognize an slbia mnemonic with one operand as the basic form, and an slbia mnemonic with no operand as the extended form. In the extended form the IH operand is omitted and assumed to be 0.

SLB Move To Entry X-form

slbmte RS,RB

|  31  |  RS  | ///  |  RB  |   402   | /  |
 0      6      11     16     21        31

Programming Note
    The reason slbmte cannot be used to invalidate an SLB entry is that it does not necessarily affect implementation-specific address translation lookaside information. slbie (or slbia) must be used for this purpose.

The SLB entry specified by bits 52:63 of register RB is loaded from register RS and from the remainder of register RB. The contents of these registers are interpreted as shown in Figure 29.
    RS:  B    VSID   KsKpNLC   0    LP   0s
         0    2      52        57   58   60    63

    RB:  ESID   V    0s    index
         0      36   37    52      63

    RS_0:1     B
    RS_2:51    VSID
    RS_52      Ks
    RS_53      Kp
    RS_54      N
    RS_55      L
    RS_56      C
    RS_57      must be 0b0
    RS_58:59   LP
    RS_60:63   must be 0b0000
    RB_0:35    ESID
    RB_36      V
    RB_37:51   must be 0b000 || 0x000
    RB_52:63   index, which selects the SLB entry

Figure 29. GPR contents for slbmte

On implementations that support a virtual address size of only n bits, n<78, (RS)_0:77-n must be zeros.

(RS)_57 and (RS)_60:63 must be ignored by the processor.

High-order bits of (RB)_52:63 that correspond to SLB entries beyond the size of the SLB provided by the implementation must be zeros.

If this instruction is executed in 32-bit mode, (RB)_0:31 must be zeros (i.e., the ESID must be in the range 0-15).

This instruction cannot be used to invalidate an SLB entry.

This instruction is privileged.

Special Registers Altered:
    None

SLB Move From Entry VSID X-form

slbmfev RT,RB

    31    RT    ///   RB    851   /
    0     6     11    16    21    31

If the SLB entry specified by bits 52:63 of register RB is valid (V=1), the contents of the B, VSID, Ks, Kp, N, L, C, and LP fields of the entry are placed into register RT. The contents of these registers are interpreted as shown in Figure 30.

    RT:  B    VSID   KsKpNLC   0    LP   0s
         0    2      52        57   58   60    63

    RB:  0s    index
         0     52      63

    RT_0:1     B
    RT_2:51    VSID
    RT_52      Ks
    RT_53      Kp
    RT_54      N
    RT_55      L
    RT_56      C
    RT_57      set to 0b0
    RT_58:59   LP
    RT_60:63   set to 0b0000
    RB_0:51    must be 0x0_0000_0000_0000
    RB_52:63   index, which selects the SLB entry

Figure 30. GPR contents for slbmfev

On implementations that support a virtual address size of only n bits, n<78, RT_0:77-n are set to zeros.

If the SLB entry specified by bits 52:63 of register RB is invalid (V=0), the contents of register RT are set to 0.

High-order bits of (RB)_52:63 that correspond to SLB entries beyond the size of the SLB provided by the implementation must be zeros.

This instruction is privileged.

Special Registers Altered:
    None

SLB Move From Entry ESID X-form

slbmfee RT,RB

    31    RT    ///   RB    915   /
    0     6     11    16    21    31

If the SLB entry specified by bits 52:63 of register RB is valid (V=1), the contents of the ESID and V fields of the entry are placed into register RT. The contents of these registers are interpreted as shown in Figure 31.

    RT:  ESID   V    0s
         0      36   37    63

    RB:  0s    index
         0     52      63

    RT_0:35    ESID
    RT_36      V
    RT_37:63   set to 0b000 || 0x00_0000
    RB_0:51    must be 0x0_0000_0000_0000
    RB_52:63   index, which selects the SLB entry

Figure 31. GPR contents for slbmfee

If the SLB entry specified by bits 52:63 of register RB is invalid (V=0), the contents of register RT are set to 0.

High-order bits of (RB)_52:63 that correspond to SLB entries beyond the size of the SLB provided by the implementation must be zeros.

This instruction is privileged.

Special Registers Altered:
    None

SLB Find Entry ESID X-form

slbfee. RT,RB

    31    RT    ///   RB    979   1
    0     6     11    16    21    31

The SLB is searched for an entry that matches the effective address specified by register RB. The search is performed as if it were being performed for purposes of address translation. E.g., in order for a given entry to satisfy the search, the entry must be valid (V=1), and (RB)_0:63-s must equal SLBE[ESID_0:63-s] (where 2^s is the segment size selected by the B field in the entry). If exactly one matching entry is found, the contents of the B, VSID, Ks, Kp, N, L, C, and LP fields of the entry are placed into register RT. If no matching entry is found, register RT is set to 0. If more than one matching entry is found, either one of the matching entries is used, as if it were the only matching entry, or a Machine Check occurs. If a Machine Check occurs, register RT and CR Field 0 are set to undefined values, and the description below of how this register and this field are set does not apply.

The contents of registers RT and RB are interpreted as shown in Figure 32.

    RT:  B    VSID   KsKpNLC   0    LP   0s
         0    2      52        57   58   60    63

    RB:  ESID   0s
         0      40    63

    RT_0:1     B
    RT_2:51    VSID
    RT_52      Ks
    RT_53      Kp
    RT_54      N
    RT_55      L
    RT_56      C
    RT_57      set to 0b0
    RT_58:59   LP
    RT_60:63   set to 0b0000
    RB_0:35    ESID
    RB_36:63   must be 0x0000000

Figure 32. GPR contents for slbfee.

If s > 28, RT_80-s:51 are set to zeros. On implementations that support a virtual address size of only n bits, n < 78, RT_2:79-n are set to zeros.

CR Field 0 is set as follows. j is a 1-bit value that is equal to 0b1 if a matching entry was found. Otherwise, j is 0b0.

    CR0_LT GT EQ SO = 0b00 || j || XER_SO

If this instruction is executed in 32-bit mode, (RB)_0:31 must be zeros (i.e., the ESID must be in the range 0-15).

This instruction is privileged.

Special Registers Altered:
    CR0

5.9.3.2 Bridge to SLB Architecture [Category:Server.Phased-Out]

The facility described in this section can be used to ease the transition to the current Power ISA software-managed Segment Lookaside Buffer (SLB) architecture, from the Segment Register architecture provided by 32-bit PowerPC implementations. A complete description of the Segment Register architecture may be found in "Segmented Address Translation, 32-Bit Implementations," Section 4.5, Book III of Version 1.10 of the PowerPC architecture, referenced in the introduction to this architecture.

The facility permits the operating system to continue to use the 32-bit PowerPC implementation's Segment Register Manipulation instructions.

5.9.3.2.1 Segment Register Manipulation Instructions

The instructions described in this section -- mtsr, mtsrin, mfsr, and mfsrin -- allow software to associate effective segments 0 through 15 with any of virtual segments 0 through 2^27-1. SLB entries 0:15 serve as virtual Segment Registers, with SLB entry i used to emulate Segment Register i. The mtsr and mtsrin instructions move 32 bits from a selected GPR to a selected SLB entry. The mfsr and mfsrin instructions move 32 bits from a selected SLB entry to a selected GPR.

The contents of the GPRs used by the instructions described in this section are shown in Figure 33. Fields shown as zeros must be zero for the Move To Segment Register instructions. Fields shown as hyphens are ignored. Fields shown as periods are ignored by the Move To Segment Register instructions and set to zero by the Move From Segment Register instructions. Fields shown as colons are ignored by the Move To Segment Register instructions and set to undefined values by the Move From Segment Register instructions.

    RS/RT:  :::   .    KsKpN   0    VSID_23:49
            0     32   33      36   37          63

    RB:     ---   ESID   ---
            0     32     36    63

Figure 33. GPR contents for mtsr, mtsrin, mfsr, and mfsrin

Programming Note
    The "Segment Register" format used by the instructions described in this section corresponds to the low-order 32 bits of RS and RT shown in the figure. This format is essentially the same as that for the Segment Registers of 32-bit PowerPC implementations. The only differences are the following.

    - Bit 36 corresponds to a reserved bit in Segment Registers. Software must supply 0 for the bit because it corresponds to the L bit in SLB entries, and large pages are not supported for SLB entries created by the Move To Segment Register instructions.
    - VSID bits 23:25 correspond to reserved bits in Segment Registers. Software can use these extra VSID bits to create VSIDs that are larger than those supported by the Segment Register Manipulation instructions of 32-bit PowerPC implementations.

    Bit 32 of RS and RT corresponds to the T (direct-store) bit of early 32-bit PowerPC implementations. No corresponding bit exists in SLB entries.

Programming Note
    The Programming Note in the introduction to Section 5.9.3.1 applies also to the Segment Register Manipulation instructions described in this section, and to any combination of the instructions described in the two sections, except as specified below for mfsr and mfsrin.

    The requirement that the SLB contain at most one entry that translates a given effective address (see Section 5.7.6.1) applies to SLB entries created by mtsr and mtsrin. This requirement is satisfied naturally if only mtsr and mtsrin are used to create SLB entries for a given ESID, because for these instructions the association between SLB entries and ESID values is fixed (SLB entry i is used for ESID i). However, care must be taken if slbmte is also used to create SLB entries for the ESID, because for slbmte the association between SLB entries and ESID values is specified by software.

Move To Segment Register X-form

mtsr SR,RS

    31    RS    /     SR    ///   210   /
    0     6     11    12    16    21    31

The SLB entry specified by SR is loaded from register RS, as follows.

    SLBE Bit(s)   Set to              SLB Field(s)
    0:31          0x0000_0000         ESID_0:31
    32:35         SR                  ESID_32:35
    36            0b1                 V
    37:38         0b00                B
    39:61         0b000||0x0_0000     VSID_0:22
    62:88         (RS)_37:63          VSID_23:49
    89:91         (RS)_33:35          KsKpN
    92            (RS)_36             L ((RS)_36 must be 0b0)
    93            0b0                 C
    94            0b0                 reserved
    95:96         0b00                LP

MSR_SF must be 0 when this instruction is executed; otherwise the results are boundedly undefined.

This instruction is privileged.

Special Registers Altered:
    None

Move To Segment Register Indirect X-form

mtsrin RS,RB

    31    RS    ///   RB    242   /
    0     6     11    16    21    31

The SLB entry specified by (RB)_32:35 is loaded from register RS, as follows.

    SLBE Bit(s)   Set to              SLB Field(s)
    0:31          0x0000_0000         ESID_0:31
    32:35         (RB)_32:35          ESID_32:35
    36            0b1                 V
    37:38         0b00                B
    39:61         0b000||0x0_0000     VSID_0:22
    62:88         (RS)_37:63          VSID_23:49
    89:91         (RS)_33:35          KsKpN
    92            (RS)_36             L ((RS)_36 must be 0b0)
    93            0b0                 C
    94            0b0                 reserved
    95:96         0b00                LP

MSR_SF must be 0 when this instruction is executed; otherwise the results are boundedly undefined.

This instruction is privileged.

Special Registers Altered:
    None
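The field mapping in the mtsr table can be mirrored in software, for example in a simulator or a test harness. The following is a minimal Python sketch under stated assumptions: `bits` and `slbe_fields_from_mtsr` are hypothetical helper names, not architected interfaces, and the result is returned as a dictionary of SLB fields rather than as a 97-bit SLBE image. Bit numbering follows the convention used above, with bit 0 as the most significant bit.

```python
def bits(value, width, first, last):
    """Extract bits first:last (inclusive, bit 0 = MSB) from a width-bit value."""
    shift = width - 1 - last
    mask = (1 << (last - first + 1)) - 1
    return (value >> shift) & mask


def slbe_fields_from_mtsr(sr, rs):
    """Return the SLB fields that 'mtsr SR,RS' would load (illustrative model)."""
    if not 0 <= sr <= 15:
        raise ValueError("SR must select Segment Register 0..15")
    if bits(rs, 64, 36, 36) != 0:
        raise ValueError("(RS)_36 (the L bit) must be 0b0")
    return {
        "ESID": sr,                       # ESID_0:31 = 0, ESID_32:35 = SR
        "V": 1,                           # entry becomes valid
        "B": 0,                           # B is set to 0b00
        "VSID": bits(rs, 64, 37, 63),     # VSID_0:22 = 0, VSID_23:49 = (RS)_37:63
        "KsKpN": bits(rs, 64, 33, 35),    # (RS)_33:35
        "L": 0,
        "C": 0,
        "LP": 0,
    }


# Example: Ks=1, Kp=0, N=1 and a 27-bit VSID of 0x123456.
example_rs = (0b101 << 28) | 0x123456
example = slbe_fields_from_mtsr(3, example_rs)
```

For mtsrin the same mapping applies with SR replaced by (RB)_32:35, which in this sketch would be `bits(rb, 64, 32, 35)`.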
Move From Segment Register X-form

mfsr RT,SR

    31    RT    /     SR    ///   595   /
    0     6     11    12    16    21    31

The contents of the low-order 27 bits of the VSID field and the contents of the Ks, Kp, N, and L fields of the SLB entry specified by SR are placed into register RT as follows.

    SLBE Bit(s)   Copied to    SLB Field(s)
    62:88         RT_37:63     VSID_23:49
    89:91         RT_33:35     KsKpN
    92            RT_36        L (SLBE_L must be 0b0)

RT_32 is set to 0. The contents of RT_0:31 are undefined.

MSR_SF must be 0 when this instruction is executed; otherwise the results are boundedly undefined.

This instruction must be used only to read an SLB entry that was, or could have been, created by mtsr or mtsrin and has not subsequently been invalidated (i.e., an SLB entry in which ESID<16, V=1, VSID<2^27, L=0, and C=0). If the SLB entry is invalid (V=0), RT_33:63 are set to 0. Otherwise the contents of register RT are undefined.

This instruction is privileged.

Special Registers Altered:
    None

Move From Segment Register Indirect X-form

mfsrin RT,RB

    31    RT    ///   RB    659   /
    0     6     11    16    21    31

The contents of the low-order 27 bits of the VSID field and the contents of the Ks, Kp, N, and L fields of the SLB entry specified by (RB)_32:35 are placed into register RT as follows.

    SLBE Bit(s)   Copied to    SLB Field(s)
    62:88         RT_37:63     VSID_23:49
    89:91         RT_33:35     KsKpN
    92            RT_36        L (SLBE_L must be 0b0)

RT_32 is set to 0. The contents of RT_0:31 are undefined.

MSR_SF must be 0 when this instruction is executed; otherwise the results are boundedly undefined.

This instruction must be used only to read an SLB entry that was, or could have been, created by mtsr or mtsrin and has not subsequently been invalidated (i.e., an SLB entry in which ESID<16, V=1, VSID<2^27, L=0, and C=0). If the SLB entry is invalid (V=0), RT_33:63 are set to 0. Otherwise the contents of register RT are undefined.

This instruction is privileged.

Special Registers Altered:
    None

5.9.3.3 TLB Management Instructions

TLB Invalidate Entry X-form

tlbie RB,L

    31    ///   L     ///   RB    306   /
    0     6     10    11    16    21    31

    if L = 0 then
        p = 12
        if (RB)_56 = 0 then pg_size ← 4 KB
        else pg_size ← 64 KB
    else
        pg_size ← page size specified in (RB)_44:51
        p ← log_base_2(pg_size)
    sg_size ← segment size specified in (RB)_54:55
    for each processor in the partition
        for each TLB entry
            if (entry_VA_14:77-p = (RB)_0:63-p) &
               (entry_sg_size = sg_size) &
               (entry_pg_size = pg_size)
            then TLB entry ← invalid

The operation performed by this instruction is based upon the contents of RB and the L field. The contents of RB are shown below, where L is the L field in the instruction.

    L=0:  VPN   0s    B     AP    0s
          0     52    54    56    57    63

    L=1:  VPN   LP    0s    B     0s
          0     44    52    54    56    63

If the L field of the instruction contains 0, RB_56 (AP - Admixed Page size field) must be set to 0 if the page size specified by the PTE that was used to create the TLB entry to be invalidated is 4 KB and must be set to 1 if the page size specified by the PTE that was used to create the TLB entry to be invalidated is 64 KB. The VPN field in register RB must contain bits 14:65 of the virtual address translated by the TLB entry to be invalidated.

If the L field in the instruction contains 1, the following rules apply, where c is the number of "r" bits in the LP field of the PTE that was used to create the TLB entry to be invalidated.

- The page size is specified in the LP field in register RB, where the relationship between (RB)_LP and the page size is the same as the relationship between PTE_LP and the page size (see Figure 22). Specifically, (RB)_44+c:51 must be equal to the contents of bits c:7 of the LP field of the PTE that was used to create the TLB entry to be invalidated.
- (RB)_0:43+c must contain bits 14:77-p of the virtual address translated by the TLB to be invalidated, followed by p+c-20 zeros which must be ignored by the processor.

Let the segment size be equal to the segment size specified in RB_54:55 (B field). The contents of RB_54:55 must be the same as the contents of PTE_B used to create the TLB entry to be invalidated.

RB_52:53, RB_56 (when the L field of the instruction is 1), and RB_57:63 must be set to zeros and must be ignored by the processor.

All TLB entries that have all of the following properties are made invalid on all processors that are in the same partition as the processor executing the tlbie instruction.

- The entry translates a virtual address for which VA_14:77-p is equal to (RB)_0:63-p.
- The segment size of the entry is the same as the segment size specified in (RB)_54:55.
- Either of the following is true:
    - The L field in the instruction is 0, and either the page size of the entry is 4KB and (RB)_56=0, or the page size of the entry is 64KB and (RB)_56=1.
    - The L field of the instruction is 1, and the page size of the entry matches the page size specified in (RB)_44:51.

Additional TLB entries may also be made invalid on any processor that is in the same partition as the processor executing the tlbie instruction.

MSR_SF must be 1 when this instruction is executed; otherwise the results are undefined.

The operation performed by this instruction is ordered by the eieio (or sync or ptesync) instruction with respect to a subsequent tlbsync instruction executed by the processor executing the tlbie instruction. The operations caused by tlbie and tlbsync are ordered by eieio as a fourth set of operations, which is independent of the other three sets that eieio orders.

This instruction is hypervisor privileged.

See Section 5.10, "Page Table Update Synchronization Requirements" for a description of other requirements associated with the use of this instruction.

Special Registers Altered:
    None
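The RB layout for the L=0 form can be illustrated with a short Python sketch. It assumes a 78-bit virtual address (bits 0:77, bit 0 most significant) as implied by the bit numbering above; `tlbie_rb_l0` is a hypothetical helper name chosen for this example, and the L=1 form is deliberately not covered.

```python
VA_BITS = 78  # assumption: virtual addresses are numbered 0:77


def tlbie_rb_l0(va, b, ap):
    """Build the RB operand for tlbie (or tlbiel) when the L field is 0.

    va: virtual address translated by the TLB entry to be invalidated
    b:  2-bit B (segment size) value, same as PTE_B
    ap: 0 if the PTE specified a 4 KB page, 1 if it specified a 64 KB page
    """
    if not 0 <= va < (1 << VA_BITS):
        raise ValueError("va must fit in 78 bits")
    if not 0 <= b <= 3:
        raise ValueError("b is a 2-bit field")
    if ap not in (0, 1):
        raise ValueError("ap is a 1-bit field")
    vpn = (va >> 12) & ((1 << 52) - 1)   # VA_14:65, since p = 12 when L = 0
    rb = vpn << 12                       # VPN occupies RB_0:51
    rb |= b << 8                         # B occupies RB_54:55
    rb |= ap << 7                        # AP occupies RB_56
    return rb                            # RB_52:53 and RB_57:63 stay zero


rb_4k = tlbie_rb_l0(0x123456789ABCDEF0, b=0, ap=0)
rb_64k = tlbie_rb_l0(0x123456789ABCDEF0, b=1, ap=1)
```

The low-order 12 bits of the virtual address never reach RB, which matches the requirement that only VA_14:65 be supplied in the VPN field when L=0.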
Programming Note
    For tlbie[l] instructions in which L=0, the AP value in RB is provided to make it easier for the processor to locate address translations, in lookaside buffers, corresponding to the address translation being invalidated.

TLB Invalidate Entry Local X-form

tlbiel RB,L

    31    ///   L     ///   RB    274   /
    0     6     10    11    16    21    31

    if L = 0 then
        p = 12
        if (RB)_56 = 0 then pg_size ← 4 KB
        else pg_size ← 64 KB
    else
        pg_size ← page size specified in (RB)_44:51
        p ← log_base_2(pg_size)
    sg_size ← segment size specified in (RB)_54:55
    for each TLB entry
        if (entry_VA_14:77-p = (RB)_0:63-p) &
           (entry_sg_size = sg_size) &
           (entry_pg_size = pg_size)
        then TLB entry ← invalid

The operation performed by this instruction is based upon the contents of RB and the L field. The contents of RB are shown below, where L is the L field in the instruction.

    L=0:  VPN   0s    B     AP    0s
          0     52    54    56    57    63

    L=1:  VPN   LP    0s    B     0s
          0     44    52    54    56    63

If the L field of the instruction contains 0, RB_56 (AP - Admixed Page size field) must be set to 0 if the page size specified by the PTE that was used to create the TLB entry to be invalidated is 4 KB and must be set to 1 if the page size specified by the PTE that was used to create the TLB entry to be invalidated is 64 KB. The VPN field in register RB must contain bits 14:65 of the virtual address translated by the TLB entry to be invalidated.

If the L field in the instruction contains 1, the following rules apply, where c is the number of "r" bits in the LP field of the PTE that was used to create the TLB entry to be invalidated.

- The page size is specified in the LP field in register RB, where the relationship between (RB)_LP and the page size is the same as the relationship between PTE_LP and the page size (see Figure 22). Specifically, (RB)_44+c:51 must be equal to the contents of bits c:7 of the LP field of the PTE that was used to create the TLB entry to be invalidated.
- (RB)_0:43+c must contain bits 14:77-p of the virtual address translated by the TLB to be invalidated, followed by p+c-20 zeros which must be ignored by the processor.

Let the segment size be equal to the segment size specified in RB_54:55 (B field). The contents of RB_54:55 must be the same as the contents of PTE_B used to create the TLB entry to be invalidated.

RB_52:53, RB_56 (when the L field of the instruction is 1), and RB_57:63 must be set to 0s and must be ignored by the processor.

All TLB entries that have all of the following properties are made invalid on the processor executing the tlbiel instruction.

- The entry translates a virtual address for which VA_14:77-p is equal to (RB)_0:63-p.
- The segment size of the entry is the same as the segment size specified in (RB)_54:55.
- Either of the following is true:
    - The L field in the instruction is 0, and either the page size of the entry is 4KB and (RB)_56=0, or the page size of the entry is 64KB and (RB)_56=1.
    - The L field of the instruction is 1, and the page size of the entry matches the page size specified in (RB)_44:51.

Only TLB entries on the processor executing the tlbiel instruction are affected.

MSR_SF must be 1 when this instruction is executed; otherwise the results are undefined.

This instruction is hypervisor privileged.

See Section 5.10, "Page Table Update Synchronization Requirements" on page 543 for a description of other requirements associated with the use of this instruction.

Special Registers Altered:
    None

Programming Note
    The primary use of this instruction by hypervisor state code is to invalidate TLB entries prior to reassigning a processor to a new logical partition.

    tlbiel may be executed on a given processor even if the sequence tlbie - eieio - tlbsync - ptesync is concurrently being executed on another processor.

    See also the Programming Note with the description of the tlbie instruction.
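Because the processor ignores the must-be-zero fields of RB, a malformed operand is not reported by hardware. The following Python sketch shows one way software might sanity-check an RB value for the L=0 form before issuing tlbiel or tlbie. `decode_rb_l0` is a hypothetical helper name; the field positions are taken from the L=0 layout above, with bit 0 as the most significant bit of the 64-bit register.

```python
def decode_rb_l0(rb):
    """Split an L=0 RB operand into its fields and check the reserved bits."""
    if not 0 <= rb < (1 << 64):
        raise ValueError("rb must be a 64-bit value")
    reserved_52_53 = (rb >> 10) & 0b11   # RB_52:53
    reserved_57_63 = rb & 0x7F           # RB_57:63
    if reserved_52_53 or reserved_57_63:
        raise ValueError("RB_52:53 and RB_57:63 must be set to zeros")
    ap = (rb >> 7) & 1                   # RB_56
    return {
        "vpn": rb >> 12,                 # RB_0:51, i.e. VA_14:65
        "b": (rb >> 8) & 0b11,           # RB_54:55
        "ap": ap,
        "page_size": 4096 if ap == 0 else 65536,
    }


decoded = decode_rb_l0(0x123456789ABCD180)
```

A check of this kind is purely a software convenience; it does not model which TLB entries an implementation actually invalidates.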
TLB Invalidate All X-form

tlbia

    31    ///   ///   ///   370   /
    0     6     11    16    21    31

    all TLB entries ← invalid

All TLB entries are made invalid on the processor executing the tlbia instruction.

This instruction is hypervisor privileged.

This instruction is optional, and need not be implemented.

Special Registers Altered:
    None

Programming Note
    tlbia does not affect TLBs on other processors.

TLB Synchronize X-form

tlbsync

    31    ///   ///   ///   566   /
    0     6     11    16    21    31

The tlbsync instruction provides an ordering function for the effects of all tlbie instructions executed by the processor executing the tlbsync instruction, with respect to the memory barrier created by a subsequent ptesync instruction executed by the same processor. Executing a tlbsync instruction ensures that all of the following will occur.

- All TLB invalidations caused by tlbie instructions preceding the tlbsync instruction will have completed on any other processor before any data accesses caused by instructions following the ptesync instruction are performed with respect to that processor.
- All storage accesses by other processors for which the address was translated using the translations being invalidated, and all Reference and Change bit updates associated with address translations that were performed by other processors using the translations being invalidated, will have been performed with respect to the processor executing the ptesync instruction, to the extent required by the associated Memory Coherence Required attributes, before the ptesync instruction's memory barrier is created.

The operation performed by this instruction is ordered by the eieio (or sync or ptesync) instruction with respect to preceding tlbie instructions executed by the processor executing the tlbsync instruction. The operations caused by tlbie and tlbsync are ordered by eieio as a fourth set of operations, which is independent of the other three sets that eieio orders.
The tlbsync instruction may complete before opera- tions caused by tlbie instructions preceding the tlb- sync instruction have been performed. This instruction is hypervisor privileged. See Section 5.10 for a description of other require- ments associated with the use of this instruction. Special Registers Altered: None Programming Note tlbsync should not be used to synchronize the completion of tlbiel. 542 Power ISATM III-S Version 2.05 5.10 Page Table Update Synchronization Requirements This section describes rules that software must follow Unsynchronized lookups in the HTAB continue when updating the Page Table, and includes sug- even while it is being modified. Any processor, gested sequences of operations for some representa- including a processor on which software is modifying tive cases. the HTAB, may look in the HTAB at any time in an attempt to translate a virtual address. When modifying In the sequences of operations shown in the following a PTE, software must ensure that the PTE's Valid bit is subsections, any alteration of a Page Table Entry 0 if the PTE is inconsistent (e.g., if the RPN field is not (PTE) that corresponds to a single line in the sequence correct for the current AVPN field). is assumed to be done using a Store instruction for which the access is atomic. Appropriate modifications Updates of Reference and Change bits by the pro- must be made to these sequences if this assumption is cessor are not synchronized with the accesses that not satisfied (e.g., if a store doubleword operation is cause the updates. When modifying doubleword 1 of done using two Store Word instructions). a PTE, software must take care to avoid overwriting a processor update of these bits and to avoid having the Stores are not performed out-of-order, as described in value written by a Store instruction overwritten by a Section 5.5, "Performing Operations Out-of-Order" on processor update. page 506. 
Moreover, address translations associated with instructions preceding the corresponding Store Before permitting one or more tlbie instructions to be instructions are not performed again after the stores executed on a given processor in a given partition soft- have been performed. (These address translations ware must ensure that no other processor will execute must have been performed before the store was deter- a "conflicting instruction" until after the following mined to be required by the sequential execution sequence of instructions has been executed on the model, because they might have caused an exception.) given processor. As a result, an update to a PTE need not be preceded by a context synchronizing operation. the tlbie instruction(s) eieio All of the sequences require a context synchronizing tlbsync operation after the sequence if the new contents of the ptesync PTE are to be used for address translations associated with subsequent instructions. The "conflicting instructions" in this case are the follow- ing. As noted in the description of the Synchronize instruc- tion in Section 3.4.3 of Book II, address translation 1 a tlbie or tlbsync instruction, if executed on associated with instructions which occur in program another processor in the given partition order subsequent to the Synchronize (and this includes 1 an mtspr instruction that modifies the LPIDR, if the the ptesync variant) may actually be performed prior to modification has either of the following properties. the completion of the Synchronize. To ensure that these instructions and data which may have been - The old LPID value (i.e., the contents of the speculatively fetched are discarded, a context synchro- LPIDR just before the mtspr instruction is nizing operation is required. 
executed) is the value that identifies the given partition
  - The new LPID value (i.e., the value specified by the mtspr instruction) is the value that identifies the given partition

Other instructions (excluding mtspr instructions that modify the LPIDR as described above, and excluding tlbie instructions except as shown) may be interleaved with the instruction sequence shown above, but the instructions in the sequence must appear in the order shown. On uniprocessor systems, the eieio and tlbsync instructions can be omitted. Other instructions may be interleaved with this sequence of instructions, but these instructions must appear in the order shown.

Programming Note
    In many cases this context synchronization will occur naturally; for example, if the sequence is executed within an interrupt handler the rfid or hrfid instruction that returns from the interrupt handler may provide the required context synchronization.

Page Table Entries must not be changed in a manner that causes an implicit branch.

5.10.1 Page Table Updates

TLBs are non-coherent caches of the HTAB. TLB entries must be invalidated explicitly with one of the TLB Invalidate instructions.

Programming Note
    The eieio instruction prevents the reordering of tlbie instructions previously executed by the processor with respect to the subsequent tlbsync instruction. The tlbsync instruction and the subsequent ptesync instruction together ensure that all storage accesses for which the address was translated using the translations being invalidated, and all Reference and Change bit updates associated with address translations that were performed using the translations being invalidated, will be performed with respect to any processor or mechanism, to the extent required by the associated Memory Coherence Required attributes, before any data accesses caused by instructions following the ptesync instruction are performed with respect to that processor or mechanism.

The requirements specified above for tlbie instructions apply also to tlbsync instructions, except that the "sequence of instructions" consists solely of the tlbsync instruction(s) followed by a ptesync instruction.

Before permitting an mtspr instruction that modifies the LPIDR to be executed on a given processor, software must ensure that no other processor will execute a "conflicting instruction" until after the mtspr instruction followed by a context synchronizing instruction have been executed on the given processor (a context synchronizing event can be used instead of the context synchronizing instruction; see Chapter 10).

The "conflicting instructions" in this case are the following.
• a tlbie or tlbsync instruction, if executed on a processor in either of the following partitions
  - the partition identified by the old LPID value
  - the partition identified by the new LPID value

Programming Note
    The restrictions specified above regarding modifying the LPIDR apply even on uniprocessor systems, and even if the new LPID value is equal to the old LPID value.

Similarly, when a tlbsync instruction has been executed by a processor in a given partition, a ptesync instruction must be executed by that processor before a tlbie or tlbsync instruction is executed by another processor in that partition.

The sequences of operations shown in the following subsections assume a multiprocessor environment. In a uniprocessor environment the tlbsync must be omitted, and the eieio that separates the tlbie from the tlbsync can be omitted. In a multiprocessor environment, when tlbiel is used instead of tlbie in a Page Table update, the synchronization requirements are the same as when tlbie is used in a uniprocessor environment.

Programming Note
    For all of the sequences shown in the following subsections, if it is necessary to communicate completion of the sequence to software running on another processor, the ptesync instruction at the end of the sequence should be followed by a Store instruction that stores a chosen value to some chosen storage location X. The memory barrier created by the ptesync instruction ensures that if a Load instruction executed by another processor returns the chosen value from location X, the sequence's stores to the Page Table have been performed with respect to that other processor. The Load instruction that returns the chosen value should be followed by a context synchronizing instruction in order to ensure that all instructions following the context synchronizing instruction will be fetched and executed using the values stored by the sequence (or values stored subsequently). (These instructions may have been fetched or executed out-of-order using the old contents of the PTE.)

    This Note assumes that the Page Table and location X are in storage that is Memory Coherence Required.

5.10.1.1 Adding a Page Table Entry

This is the simplest Page Table case. The Valid bit of the old entry is assumed to be 0. The following sequence can be used to create a PTE, maintain a consistent state, and ensure that a subsequent reference to the virtual address translated by the new entry will use the correct real address and associated attributes.

    PTE[ARPN,LP,AC,R,C,WIMG,N,PP] ← new values
    eieio      /* order 1st update before 2nd */
    PTE[B,AVPN,SW,L,H,V] ← new values (V=1)
    ptesync    /* order updates before next Page Table search and before next data access */

5.10.1.2 Modifying a Page Table Entry

General Case

If a valid entry is to be modified and the translation instantiated by the entry being modified is to be invalidated, the following sequence can be used to modify the PTE, maintain a consistent state, ensure that the translation instantiated by the old entry is no longer available, and ensure that a subsequent reference to the virtual address translated by the new entry will use the correct real address and associated attributes. (The sequence is equivalent to deleting the PTE and then adding a new one; see Sections 5.10.1.1 and 5.10.1.3.)

    PTE[V] ← 0    /* (other fields don't matter) */
    ptesync       /* order update before tlbie and before next Page Table search */
    tlbie(old_B,old_VA[14:77-p],old_L,old_LP,old_AP)    /* invalidate old translation */
    eieio         /* order tlbie before tlbsync */
    tlbsync       /* order tlbie before ptesync */
    ptesync       /* order tlbie, tlbsync and 1st update before 2nd update */
    PTE[ARPN,LP,AC,R,C,WIMG,N,PP] ← new values
    eieio         /* order 2nd update before 3rd */
    PTE[B,AVPN,SW,L,H,V] ← new values (V=1)
    ptesync       /* order 2nd and 3rd updates before next Page Table search and before next data access */

Resetting the Reference Bit

If the only change being made to a valid entry is to set the Reference bit to 0, a simpler sequence suffices because the Reference bit need not be maintained exactly.

    oldR ← PTE[R]      /* get old R */
    if oldR = 1 then
        PTE[R] ← 0     /* store byte (R=0, other bits unchanged) */
        tlbie(B,VA[14:77-p],L,LP,AP)    /* invalidate entry */
        eieio          /* order tlbie before tlbsync */
        tlbsync        /* order tlbie before ptesync */
        ptesync        /* order tlbie, tlbsync, and update before next Page Table search and before next data access */

Modifying the SW field

If the only change being made to a valid entry is to modify the SW field, the following sequence suffices, because the SW field is not used by the processor and doubleword 0 of the PTE is not modified by the processor.

    loop: ldarx  r1 ← PTE_dwd_0       /* load dwd 0 of PTE */
          r1[57:60] ← new SW value    /* replace SW, in r1 */
          stdcx. PTE_dwd_0 ← r1       /* store dwd 0 of PTE if still reserved (new SW value, other fields unchanged) */
          bne-   loop                 /* loop if lost reservation */

A lwarx/stwcx. pair (specifying the low-order word of doubleword 0 of the PTE) can be used instead of the ldarx/stdcx. pair shown above.
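The SW-field sequence changes only bits 57:60 of doubleword 0 and retries if the reservation is lost. As a rough illustration of the bit arithmetic only, the Python sketch below treats doubleword 0 as a 64-bit integer in the architecture's bit numbering, where bit 0 is the most significant bit. The names field_mask, replace_sw, FakeCell, and update_sw are hypothetical, and the compare-and-swap retry is merely a stand-in for a failed stdcx.; it does not model the reservation mechanism or real storage.

```python
WIDTH = 64  # doubleword width; bit 0 is the most significant bit


def field_mask(first, last, width=WIDTH):
    """Return a mask covering bits first:last in MSB-0 bit numbering."""
    nbits = last - first + 1
    shift = width - 1 - last
    return ((1 << nbits) - 1) << shift


# SW field position, taken from "r1[57:60]" in the sequence above.
SW_FIRST, SW_LAST = 57, 60


def replace_sw(dwd0, new_sw):
    """Return dwd0 with bits 57:60 replaced by new_sw, other bits unchanged."""
    if not 0 <= new_sw <= 0xF:
        raise ValueError("SW is a 4-bit field")
    mask = field_mask(SW_FIRST, SW_LAST)
    shift = WIDTH - 1 - SW_LAST
    all_ones = (1 << WIDTH) - 1
    return (dwd0 & (all_ones ^ mask)) | (new_sw << shift)


class FakeCell:
    """Hypothetical stand-in for PTE doubleword 0 (assumption, not real storage)."""

    def __init__(self, value):
        self.value = value

    def compare_and_swap(self, expected, new):
        # Fails when the cell changed since it was read, loosely analogous
        # to stdcx. failing after the reservation has been lost.
        if self.value != expected:
            return False
        self.value = new
        return True


def update_sw(cell, new_sw):
    """Retry loop shaped like the ldarx / stdcx. / bne- sequence."""
    while True:
        old = cell.value                     # ldarx  r1 <- PTE_dwd_0
        new = replace_sw(old, new_sw)        # r1[57:60] <- new SW value
        if cell.compare_and_swap(old, new):  # stdcx. PTE_dwd_0 <- r1
            return new                       # on failure: bne- loop
```

A caller would invoke update_sw(cell, sw) once per update. A compare-and-swap only approximates a reservation, since it cannot detect a value that changed and then changed back.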
Modifying the Virtual Address

If the virtual address translated by a valid PTE is to be modified and the new virtual address hashes to the same PTEG (or the same two PTEGs if the secondary Page Table search is enabled) as does the old virtual address, the following sequence can be used to modify the PTE, maintain a consistent state, ensure that the translation instantiated by the old entry is no longer available, and ensure that a subsequent reference to the virtual address translated by the new entry will use the correct real address and associated attributes.

    PTE[AVPN,SW,L,H,V] ← new values (V=1)
    ptesync    /* order update before tlbie and before next Page Table search */
    tlbie(old_B,old_VA[14:77-p],old_L,old_LP,old_AP)    /* invalidate old translation */
    eieio      /* order tlbie before tlbsync */
    tlbsync    /* order tlbie before ptesync */
    ptesync    /* order tlbie, tlbsync, and update before next data access */

5.10.1.3 Deleting a Page Table Entry

The following sequence can be used to ensure that the translation instantiated by an existing entry is no longer available.

    PTE[V] ← 0    /* (other fields don't matter) */
    ptesync       /* order update before tlbie and before next Page Table search */
    tlbie(old_B,old_VA[14:77-p],old_L,old_LP,old_AP)    /* invalidate old translation */
    eieio         /* order tlbie before tlbsync */
    tlbsync       /* order tlbie before ptesync */
    ptesync       /* order tlbie, tlbsync, and update before next data access */

Chapter 6. Interrupts

6.1 Overview . . . 547
6.2 Interrupt Registers . . . 548
6.2.1 Machine Status Save/Restore Registers . . . 548
6.2.2 Hypervisor Machine Status Save/Restore Registers . . . 548
6.2.3 Data Address Register . . . 548
6.2.4 Hypervisor Data Address Register . . . 548
6.2.5 Data Storage Interrupt Status Register . . . 548
6.2.6 Hypervisor Data Storage Interrupt Status Register . . . 549
6.2.7 Hypervisor Emulation Instruction Register [Category: Hypervisor Emulation Assistance] . . . 549
6.2.8 Hypervisor Maintenance Exception Register . . . 549
6.2.9 Hypervisor Maintenance Exception Enable Register . . . 549
6.3 Interrupt Synchronization . . . 550
6.4 Interrupt Classes . . . 550
6.4.1 Precise Interrupt . . . 550
6.4.2 Imprecise Interrupt . . . 550
6.4.3 Interrupt Processing . . . 551
6.4.4 Implicit alteration of HSRR0 and HSRR1 . . . 554
6.5 Interrupt Definitions . . . 555
6.5.1 System Reset Interrupt . . . 556
6.5.2 Machine Check Interrupt . . . 557
6.5.3 Data Storage Interrupt . . . 559
6.5.4 Data Segment Interrupt . . . 560
6.5.5 Instruction Storage Interrupt . . . 560
6.5.6 Instruction Segment Interrupt . . . 561
6.5.7 External Interrupt . . . 561
6.5.8 Alignment Interrupt . . . 562
6.5.9 Program Interrupt . . . 563
6.5.10 Floating-Point Unavailable Interrupt . . . 564
6.5.11 Decrementer Interrupt . . . 565
6.5.12 Hypervisor Decrementer Interrupt . . . 565
6.5.13 System Call Interrupt . . . 565
6.5.14 Trace Interrupt [Category: Trace] . . . 565
6.5.15 Hypervisor Data Storage Interrupt . . . 566
6.5.16 Hypervisor Instruction Storage Interrupt . . . 567
6.5.17 Hypervisor Data Segment Interrupt . . . 567
6.5.18 Hypervisor Instruction Segment Interrupt . . . 568
6.5.19 Hypervisor Emulation Assistance Interrupt [Category: Hypervisor Emulation Assistance] . . . 568
6.5.20 Hypervisor Maintenance Interrupt . . . 568
6.5.21 Performance Monitor Interrupt [Category: Server.Performance Monitor] . . . 569
6.5.22 Vector Unavailable Interrupt [Category: Vector] . . . 569
6.6 Partially Executed Instructions . . . 570
6.7 Exception Ordering . . . 571
6.7.1 Unordered Exceptions . . . 571
6.7.2 Ordered Exceptions . . . 571
6.8 Interrupt Priorities . . . 571

6.1 Overview

The Power ISA provides an interrupt mechanism to allow the processor to change state as a result of external signals, errors, or unusual conditions arising in the execution of instructions.

System Reset and Machine Check interrupts are not ordered. All other interrupts are ordered such that only one interrupt is reported, and when it is processed (taken) no program state is lost. Since Save/Restore Registers SRR0 and SRR1 are serially reusable resources used by most interrupts, program state may be lost when an unordered interrupt is taken.

6.2 Interrupt Registers

6.2.1 Machine Status Save/Restore Registers

When various interrupts occur, the state of the machine is saved in the Machine Status Save/Restore registers (SRR0 and SRR1). Section 6.5 describes which registers are altered by each interrupt.

    | SRR0                                  | // |
     0                                       62 63

    | SRR1                                       |
     0                                          63

    Figure 34. Save/Restore Registers

SRR1 bits may be treated as reserved in a given implementation if they correspond to MSR bits that are reserved or are treated as reserved in that implementation or, for SRR1 bits in the range 33:36 and 42:47, they are specified as being set either to 0 or to an undefined value for all interrupts that set SRR1 (including implementation-dependent setting, e.g. by the Machine Check interrupt or by implementation-specific interrupts).

6.2.2 Hypervisor Machine Status Save/Restore Registers

When various interrupts occur, the state of the machine is saved in the Hypervisor Machine Status Save/Restore registers (HSRR0 and HSRR1). Section 6.5 describes which registers are altered by each interrupt.

    | HSRR0                                 | // |
     0                                       62 63

    | HSRR1                                      |
     0                                          63

    Figure 35. Hypervisor Save/Restore Registers

HSRR1 bits may be treated as reserved in a given implementation if they correspond to MSR bits that are reserved or are treated as reserved in that implementation or, for HSRR1 bits in the range 33:36 and 42:47, they are specified as being set either to 0 or to an undefined value for all interrupts that set HSRR1 (including implementation-dependent setting, e.g. by implementation-specific interrupts).

The HSRR0 and HSRR1 are hypervisor resources; see Chapter 2.

Programming Note
    Execution of some instructions, and fetching instructions when MSR[IR]=1, may have the side effect of modifying HSRR0 and HSRR1; see Section 6.4.4.

6.2.3 Data Address Register

The Data Address Register (DAR) is a 64-bit register that is set by the Machine Check, Data Storage, Data Segment, and Alignment interrupts; see Sections 6.5.2, 6.5.3, 6.5.4, and 6.5.8. In general, when one of these interrupts occurs the DAR is set to an effective address associated with the storage access that caused the interrupt, with the high-order 32 bits of the DAR set to 0 if the interrupt occurs in 32-bit mode.

    | DAR                                        |
     0                                          63

    Figure 36. Data Address Register

6.2.4 Hypervisor Data Address Register

The Hypervisor Data Address Register (HDAR) is a 64-bit register that is set by the Hypervisor Data Storage and Hypervisor Data Segment interrupts; see Section 6.5.15 and Section 6.5.17. In general, when one of these interrupts occurs the HDAR is set to an effective address associated with the storage access that caused the interrupt, with the high-order 32 bits of the HDAR set to 0 if the interrupt occurs in 32-bit mode.

    | HDAR                                       |
     0                                          63

    Figure 37. Hypervisor Data Address Register

6.2.5 Data Storage Interrupt Status Register

The Data Storage Interrupt Status Register (DSISR) is a 32-bit register that is set by the Machine Check, Data Storage, Data Segment, and Alignment interrupts; see Sections 6.5.2, 6.5.3, 6.5.4, and 6.5.8. In general, when one of these interrupts occurs the DSISR is set to indicate the cause of the interrupt.

    | DSISR                                      |
     32                                         63

    Figure 38. Data Storage Interrupt Status Register

DSISR bits may be treated as reserved in a given implementation if they are specified as being set either to 0 or to an undefined value for all interrupts that set the DSISR (including implementation-dependent setting, e.g. by the Machine Check interrupt or by implementation-specific interrupts).

6.2.6 Hypervisor Data Storage Interrupt Status Register

The Hypervisor Data Storage Interrupt Status Register (HDSISR) is a 32-bit register that is set by the Hypervisor Data Storage interrupt. In general, when one of these interrupts occurs the HDSISR is set to indicate the cause of the interrupt.

    | HDSISR                                     |
     32                                         63

    Figure 39. Hypervisor Data Storage Interrupt Status Register

6.2.7 Hypervisor Emulation Instruction Register [Category: Hypervisor Emulation Assistance]

The Hypervisor Emulation Instruction Register (HEIR) is a 32-bit register that is set by the Hypervisor Emulation Assistance interrupt; see Section 6.5.19. The image of the instruction that caused the interrupt is loaded into the register.

    | HEIR                                       |
     0                                          31

    Figure 40. Hypervisor Emulation Instruction Register

6.2.8 Hypervisor Maintenance Exception Register

Each bit in the Hypervisor Maintenance Exception Register (HMER) is associated with one or more causes of the Hypervisor Maintenance exception, and is set when the associated exception(s) occur. If the corresponding bit in the Hypervisor Maintenance Exception Enable Register (HMEER) is set, a Hypervisor Maintenance Interrupt (HMI) may occur. If the processor is in a power-saving mode when the interrupt would have occurred, the processor will exit the power-saving mode; see Section 6.5.20 and Section 3.3.2.

    | HMER                                       |
     0                                          63

    Figure 41. Hypervisor Maintenance Exception Register

The contents of the HMER are as follows:

    0       Set to 1 for a Malfunction Alert.
    1       Set to 1 when performance is degraded for thermal reasons.
    2       Set to 1 when processor recovery is invoked.
    Others  Implementation-specific.

When the mtspr instruction is executed with the HMER as the encoded Special Purpose Register, the contents of register RS are ANDed with the contents of the HMER and the result is placed into the HMER.

The exception bits in the HMER are sticky; that is, once set to 1 they remain set to 1 until they are set to 0 by an mthmer instruction.

Programming Note
    An access to the HMER is likely to be very slow. Software should access it sparingly.

6.2.9 Hypervisor Maintenance Exception Enable Register

The Hypervisor Maintenance Exception Enable Register (HMEER) is a 64-bit register in which each bit enables the corresponding exception in the HMER to cause the Hypervisor Maintenance interrupt, potentially causing exit from power-saving mode; see Section 6.5.20 and Section 3.3.2.

    | HMEER                                      |
     0                                          63

    Figure 42. Hypervisor Maintenance Exception Enable Register

6.3 Interrupt Synchronization

When an interrupt occurs, SRR0 or HSRR0 is set to point to an instruction such that all preceding instructions have completed execution, no subsequent instruction has begun execution, and the instruction addressed by SRR0 or HSRR0 may or may not have completed execution, depending on the interrupt type.

With the exception of System Reset and Machine Check interrupts, all interrupts are context synchronizing as defined in Section 1.5.1. System Reset and Machine Check interrupts are context synchronizing if they are recoverable (i.e., if bit 62 of SRR1 is set to 1 by the interrupt). If a System Reset or Machine Check interrupt is not recoverable (i.e., if bit 62 of SRR1 is set to 0 by the interrupt), it acts like a context synchronizing operation with respect to subsequent instructions. That is, a non-recoverable System Reset or Machine Check interrupt need not satisfy items 1 through 3 of Section 1.5.1, but does satisfy items 4 and 5.

6.4 Interrupt Classes

Interrupts are classified by whether they are directly caused by the execution of an instruction or are caused by some other system exception. Those that are "system-caused" are:
• System Reset
• Machine Check
• External
• Decrementer
• Hypervisor Decrementer
• Hypervisor Maintenance

External, Decrementer, Hypervisor Decrementer, and Hypervisor Maintenance interrupts are maskable interrupts. Therefore, software may delay the generation of these interrupts. System Reset and Machine Check interrupts are not maskable.

"Instruction-caused" interrupts are further divided into two classes, precise and imprecise.

6.4.1 Precise Interrupt

Except for the Imprecise Mode Floating-Point Enabled Exception type Program interrupt, all instruction-caused interrupts are precise. When the fetching or execution of an instruction causes a precise interrupt, the following conditions exist at the interrupt point.

1. SRR0 addresses either the instruction causing the exception or the immediately following instruction. Which instruction is addressed can be determined from the interrupt type and status bits.
2. An interrupt is generated such that all instructions preceding the instruction causing the exception appear to have completed with respect to the executing processor.
3. The instruction causing the exception may appear not to have begun execution (except for causing the exception), may have been partially executed, or may have completed, depending on the interrupt type.
4. Architecturally, no subsequent instruction has begun execution.

6.4.2 Imprecise Interrupt

This architecture defines one imprecise interrupt, the Imprecise Mode Floating-Point Enabled Exception type Program interrupt.

When an Imprecise Mode Floating-Point Enabled Exception type Program interrupt occurs, the following conditions exist at the interrupt point.

1. SRR0 addresses either the instruction causing the exception or some instruction following that instruction; see Section 6.5.9, "Program Interrupt" on page 563.
2. An interrupt is generated such that all instructions preceding the instruction addressed by SRR0 appear to have completed with respect to the executing processor.
3. The instruction addressed by SRR0 may appear not to have begun execution (except, in some cases, for causing the interrupt to occur), may have been partially executed, or may have completed; see Section 6.5.9.
4. No instruction following the instruction addressed by SRR0 appears to have begun execution.

All Floating-Point Enabled Exception type Program interrupts are maskable using the MSR bits FE0 and FE1. Although these interrupts are maskable, they differ significantly from the other maskable interrupts in that the masking of these interrupts is usually controlled by the application program, whereas the masking of all other maskable interrupts is controlled by either the operating system or the hypervisor.

6.4.3 Interrupt Processing

Associated with each kind of interrupt is an interrupt vector, which contains the initial sequence of instructions that is executed when the corresponding interrupt occurs.

Interrupt processing consists of saving a small part of the processor's state in certain registers, identifying the cause of the interrupt in other registers, and continuing execution at the corresponding interrupt vector location.

When an exception exists that will cause an interrupt to be generated and it has been determined that the interrupt will occur, the following actions are performed. The handling of Machine Check interrupts (see Section 6.5.2) differs from the description given below in several respects.

1. SRR0 or HSRR0 is loaded with an instruction address that depends on the type of interrupt; see the specific interrupt description for details.
2. Bits 33:36 and 42:47 of SRR1 or HSRR1 are loaded with information specific to the interrupt type.
3. Bits 0:32, 37:41, and 48:63 of SRR1 or HSRR1 are loaded with a copy of the corresponding bits of the MSR.
4. The MSR is set as shown in Figure 43 on page 555. In particular, MSR bits IR and DR are set to 0, disabling relocation, and MSR bit SF is set to 1, selecting 64-bit mode. The new values take effect beginning with the first instruction executed following the interrupt.
5. Instruction fetch and execution resumes, using the new MSR value, at the effective address specific to the interrupt type. These effective addresses are shown in Figure 44 on page 556.

Interrupts do not clear reservations obtained with lwarx or ldarx.

Programming Note
    In general, when an interrupt occurs, the following instructions should be executed by the operating system before dispatching a "new" program.
    • stwcx. or stdcx., to clear the reservation if one is outstanding, to ensure that a lwarx or ldarx in the interrupted program is not paired with a stwcx. or stdcx. in the "new" program.
    • sync, to ensure that all storage accesses caused by the interrupted program will be performed with respect to another processor before the program is resumed on that other processor.
    • isync or rfid, to ensure that the instructions in the "new" program execute in the "new" context.

Programming Note
    For instruction-caused interrupts, in some cases it may be desirable for the operating system to emulate the instruction that caused the interrupt, while in other cases it may be desirable for the operating system not to emulate the instruction. The following list, while not complete, illustrates criteria by which decisions regarding emulation should be made. The list applies to general execution environments; it does not necessarily apply to special environments such as program debugging, processor bring-up, etc.

    In general, the instruction should be emulated if:
    - The interrupt is caused by a condition for which the instruction description (including related material such as the introduction to the section describing the instruction) implies that the instruction works correctly. Example: Alignment interrupt caused by lmw for which the storage operand is not aligned, or by dcbz for which the storage operand is in storage that is Write Through Required or Caching Inhibited.
    - The instruction is an illegal instruction that should appear, to the program executing it, as if it were supported by the implementation. Example: An Illegal Instruction type Program interrupt (or a Hypervisor Emulation Assistance interrupt if Category: HEA is supported) is caused by an instruction that has been phased out of the architecture but is still used by some programs that the operating system supports, or by an instruction that is in a category that the implementation does not support but is used by some programs that the operating system supports.

    In general, the instruction should not be emulated if:
    - The purpose of the instruction is to cause an interrupt. Example: System Call interrupt caused by sc.
    - The interrupt is caused by a condition that is stated, in the instruction description, potentially to cause the interrupt. Example: Alignment interrupt caused by lwarx for which the storage operand is not aligned.
    - The program is attempting to perform a function that it should not be permitted to perform. Example: Data Storage interrupt caused by lwz for which the storage operand is in storage that the program should not be permitted to access. (If the function is one that the program should be permitted to perform, the conditions that caused the interrupt should be corrected and the program re-dispatched such that the instruction will be re-executed. Example: Data Storage interrupt caused by lwz for which the storage operand is in storage that the program should be permitted to access but for which there currently is no PTE that satisfies the Page Table search.)

Programming Note
    If a program modifies an instruction that it or another program will subsequently execute and the execution of the instruction causes an interrupt, the state of storage and the content of some processor registers may appear to be inconsistent to the interrupt handler program. For example, this could be the result of one program executing an instruction that causes a Hypervisor Emulation Assistance interrupt if Category: HEA is supported or the Illegal Instruction type Program interrupt if Category: HEA is not supported just before another instance of the same program stores an Add Immediate instruction in that storage location.
    To the interrupt handler code, it would appear that a processor generated the interrupt as the result of executing a valid instruction.

Programming Note
    In order to handle Machine Check and System Reset interrupts correctly, the operating system should manage MSR[RI] as follows.
    • In the Machine Check and System Reset interrupt handlers, interpret SRR1 bit 62 (where MSR[RI] is placed) as:
      - 0: interrupt is not recoverable
      - 1: interrupt is recoverable
    • In each interrupt handler, when enough state has been saved that a Machine Check or System Reset interrupt can be recovered from, set MSR[RI] to 1.
    • In each interrupt handler, do the following (in order) just before returning.
      1. Set MSR[RI] to 0.
      2. Set SRR0 and SRR1 to the values to be used by rfid. The new value of SRR1 should have bit 62 set to 1 (which will happen naturally if SRR1 is restored to the value saved there by the interrupt, because the interrupt handler will not be executing this sequence unless the interrupt is recoverable).
      3. Execute rfid.

    For interrupts that set the SRRs other than Machine Check or System Reset, MSR[RI] can be managed similarly when these interrupts occur within interrupt handlers for other interrupts that set the SRRs.

    This Note does not apply to interrupts that set the HSRRs because these interrupts put the processor into hypervisor state, and either do not occur or can be prevented from occurring within interrupt handlers for other interrupts that set the HSRRs.

6.4.4 Implicit alteration of HSRR0 and HSRR1

Executing some of the more complex instructions may have the side effect of altering the contents of HSRR0 and HSRR1. The instructions listed below are guaranteed not to have this side effect. Any omission of instruction suffixes is significant; e.g., add is listed but add. is excluded.

1. Branch instructions
   b[l][a], bc[l][a], bclr[l], bcctr[l]
2. Fixed-Point Load and Store Instructions
   lbz, lbzx, lhz, lhzx, lwz, lwzx, ld<64>, ldx<64>, stb, stbx, sth, sthx, stw, stwx, std<64>, stdx<64>
   Execution of these instructions is guaranteed not to have the side effect of altering HSRR0 and HSRR1 only if the storage operand is aligned and MSR[DR]=0.
3. Arithmetic instructions
   addi, addis, add, subf, neg
4. Compare instructions
   cmpi, cmp, cmpli, cmpl
5. Logical and Extend Sign instructions
   ori, oris, xori, xoris, and, or, xor, nand, nor, eqv, andc, orc, extsb, extsh, extsw
6. Rotate and Shift instructions
   rldicl<64>, rldicr<64>, rldic<64>, rlwinm, rldcl<64>, rldcr<64>, rlwnm, rldimi<64>, rlwimi, sld<64>, slw, srd<64>, srw
7. Other instructions
   isync
   rfid, hrfid
   mtspr, mfspr, mtmsrd, mfmsr

Programming Note
    Instructions excluded from the list include the following.
    • instructions that set or use XER[CA]
    • instructions that set XER[OV] or XER[SO]
    • andi., andis., and fixed-point instructions with Rc=1 (Fixed-point instructions with Rc=1 can be replaced by the corresponding instruction with Rc=0 followed by a Compare instruction.)
    • all floating-point instructions
    • mftb

    These instructions, and the other excluded instructions, may be implemented with the assistance of implementation-specific interrupts that modify HSRR0 and HSRR1. The included instructions are guaranteed not to be implemented thus. (The included instructions are sufficiently simple as to be unlikely to need such assistance. Moreover, they are likely to be needed in interrupt handlers before HSRR0 and HSRR1 have been saved or after HSRR0 and HSRR1 have been restored.)

Similarly, fetching instructions may have the side effect of altering the contents of HSRR0 and HSRR1 unless MSR[IR]=0.

6.5 Interrupt Definitions

Figure 43 shows all the types of interrupts and the values assigned to the MSR for each. Figure 44 shows the effective address of the interrupt vector for each interrupt type. (Section 5.7.4 on page 512 summarizes all architecturally defined uses of effective addresses, including those implied by Figure 44.)

    Interrupt Type               MSR Bit
                                 IR  DR  FE0 FE1 EE  RI  ME  HV
    System Reset                 0   0   0   0   0   0   p   1
    Machine Check                0   0   0   0   0   0   0   1
    Data Storage                 0   0   0   0   0   0   -   m
    Data Segment                 0   0   0   0   0   0   -   m
    Instruction Storage          0   0   0   0   0   0   -   m
    Instruction Segment          0   0   0   0   0   0   -   m
    External                     0   0   0   0   0   h   -   e
    Alignment                    0   0   0   0   0   0   -   m
    Program                      0   0   0   0   0   0   -   m
    FP Unavailable               0   0   0   0   0   0   -   m
    Decrementer                  0   0   0   0   0   0   -   m
    Hypervisor Decrementer       0   0   0   0   0   -   -   1
    System Call                  0   0   0   0   0   0   -   s
    Trace                        0   0   0   0   0   0   -   m
    Hypervisor Data Storage      0   0   0   0   0   -   -   1
    Hypervisor Instr. Storage    0   0   0   0   0   -   -   1
    Hypervisor Instr. Segment    0   0   0   0   0   -   -   1
    Hypervisor Data Segment      0   0   0   0   0   -   -   1
    Hypv Em'n Assistance         0   0   0   0   0   -   -   1
    Hypervisor Maintenance       0   0   0   0   0   -   -   1
    Performance Monitor          0   0   0   0   0   0   -   m
    Vector Unavailable(1)        0   0   0   0   0   0   -   m

    0  bit is set to 0
    1  bit is set to 1
    p  bit is set to 1 if interrupt occurred while the processor was in power-saving mode; otherwise not altered
    -  bit is not altered
    m  if LPES[1]=0, set to 1; otherwise not altered
    e  if LPES[0]=0, set to 1; otherwise not altered
    h  if LPES[0]=0, set to 0; otherwise not altered
    s  if LEV=1 or LPES/LPES[1]=0, set to 1; otherwise not altered

    Settings for Other Bits
    Bits BE, FP, PMM, PR, SE, and VEC(1) are set to 0.
    If the interrupt results in HV being equal to 1, the LE bit is copied from the HILE bit; otherwise the LE bit is copied from the LPCR[ILE] bit.
    The SF bit is set to 1.
    Reserved bits are set as if written as 0.

    (1) Category: Vector

    Figure 43. MSR setting due to interrupt
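Figure 43 gives the MSR value installed by each interrupt; the MSR value being replaced is preserved in SRR1 or HSRR1 as described in Section 6.4.3, where bits 0:32, 37:41, and 48:63 are copied from the MSR and bits 33:36 and 42:47 carry interrupt-specific information. The Python sketch below illustrates that split under the architecture's bit numbering (bit 0 is the most significant bit). The function names are hypothetical, the interrupt-specific bits are passed in as an opaque value, and the sketch covers only the general rule: Machine Check and System Reset set some of these bits differently.

```python
WIDTH = 64  # register width; bit 0 is the most significant bit


def bits(first, last, width=WIDTH):
    """Return a mask covering bits first:last in MSB-0 bit numbering."""
    nbits = last - first + 1
    shift = width - 1 - last
    return ((1 << nbits) - 1) << shift


# Bit ranges as listed in Section 6.4.3, steps 2 and 3.
MSR_COPY_MASK = bits(0, 32) | bits(37, 41) | bits(48, 63)
INFO_MASK = bits(33, 36) | bits(42, 47)

# SRR1 bit 62 is where MSR[RI] is placed.
RI_BIT = bits(62, 62)


def build_srr1(msr, info):
    """Combine the copied MSR bits with interrupt-specific information.

    'info' is an illustrative placeholder for whatever the specific
    interrupt reports in bits 33:36 and 42:47.
    """
    return (msr & MSR_COPY_MASK) | (info & INFO_MASK)


def is_recoverable(srr1):
    """Interpret SRR1 bit 62 as Machine Check and System Reset handlers do."""
    return bool(srr1 & RI_BIT)
```

The two masks are disjoint and together cover all 64 bits, so every SRR1 bit comes from exactly one of the two sources.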
Effective Address¹    Interrupt Type
00..0000_0100         System Reset
00..0000_0200         Machine Check
00..0000_0300         Data Storage
00..0000_0380         Data Segment
00..0000_0400         Instruction Storage
00..0000_0480         Instruction Segment
00..0000_0500         External
00..0000_0600         Alignment
00..0000_0700         Program
00..0000_0800         Floating-Point Unavailable
00..0000_0900         Decrementer
00..0000_0980         Hypervisor Decrementer
00..0000_0A00         Reserved
00..0000_0B00         Reserved
00..0000_0C00         System Call
00..0000_0D00         Trace
00..0000_0E00         Hypervisor Data Storage
00..0000_0E10         Hypervisor Instruction Storage
00..0000_0E20         Hypervisor Data Segment
00..0000_0E30         Hypervisor Instruction Segment
00..0000_0E40         Hypervisor Emulation Assistance
00..0000_0E50         Hypervisor Maintenance
00..0000_0E60         Reserved
. . .                 ...
00..0000_0EFF         Reserved
00..0000_0F00         Performance Monitor
00..0000_0F10         Reserved
00..0000_0F20         Vector Unavailable³
00..0000_0F30         Reserved
. . .                 ...
00..0000_0FFF         Reserved

¹ The values in the Effective Address column are interpreted as follows.
   • 00...0000_nnnn means 0x0000_0000_0000_nnnn
² Effective addresses 0x0000_0000_0000_0000 through 0x0000_0000_0000_00FF are used by software and will not be assigned as interrupt vectors.
³ Category: Vector.

Figure 44. Effective address of interrupt vector by interrupt type

Programming Note
   When address translation is disabled, use of any of the effective addresses that are shown as reserved in Figure 44 risks incompatibility with future implementations.

6.5.1 System Reset Interrupt

If a System Reset exception causes an interrupt that is not context synchronizing or causes the loss of a Machine Check exception or a Direct External exception, or if the state of the processor has been corrupted, the interrupt is not recoverable.

When the processor is in any power-saving level, a System Reset interrupt occurs when a System Reset exception exists. When the processor is in doze or nap power-saving levels, a System Reset interrupt occurs when any of the following exceptions exists provided that the exception is enabled to cause exit from power-saving mode (see Section 2.2, "Logical Partitioning Control Register (LPCR)"). When the processor is in sleep or rvwinkle power-saving level, it is implementation-specific whether the following exceptions, when enabled, cause exit, or whether only a System Reset causes exit.

   • External
   • Decrementer
   • Hypervisor Maintenance
   • Implementation-specific

SRR1 indicates the exception that caused exit from power-saving mode as specified below.

The following registers are set:

SRR0     If the interrupt did not occur when the processor was in power-saving mode, set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present; otherwise, set to an undefined value.
SRR1
   33:36    Set to 0.
   42:44    If the interrupt did not occur when the processor was in power-saving mode, set to an implementation-specific value. If the interrupt occurred when the processor was in power-saving mode, set to indicate the exception that caused exit from power-saving mode as shown below:

               SRR1[42:44]   Exception
               000           Reserved
               001           Implementation specific
               010           System Reset
               011           Decrementer
               100           External
               101           Hypervisor Maintenance
               110           Implementation specific
               111           Implementation specific

            If multiple exceptions that cause exit from power-saving mode exist, the exception reported is the exception corresponding to the interrupt that would have occurred if the same exceptions existed and the processor was not in power-saving mode.
   45       Set to 0.
   46:47    Set to indicate whether the interrupt occurred when the processor was in power-saving mode and, if so, the extent to which resource state was maintained while the processor was in power-saving mode, as follows:

               00   The interrupt did not occur when the processor was in power-saving mode.
               01   The interrupt occurred when the processor was in power-saving mode. The state of all resources was maintained as if the processor was not in power-saving mode.
               10   The interrupt occurred when the processor was in power-saving mode. The state of some resources was not maintained, but the state of all hypervisor resources was maintained as if the processor was not in power-saving mode and the state of all other resources is such that the hypervisor can resume execution.
               11   The interrupt occurred when the processor was in power-saving mode. The state of some resources was not maintained, and the state of some hypervisor resources was not maintained or the state of some resources is such that the hypervisor cannot resume execution.

            Programming Note
               Although the resources that are maintained in power-saving mode (except in doze power-saving level) are implementation-dependent, the hypervisor can avoid implementation-dependence in the portion of the System Reset and Machine Check interrupt handlers that recover from having been in power-saving mode by using the contents of SRR1[46:47] to determine what state to restore. To avoid implementation-dependence in the portion of the hypervisor that enters power-saving mode, the hypervisor must use the specification of the four instructions to determine what state to save.

   62       If the interrupt did not occur while the processor was in power-saving mode, loaded from bit 62 of the MSR if the processor is in a recoverable state; otherwise set to 0. If the interrupt occurred while the processor was in power-saving mode, set to 1 if the processor is in a recoverable state; otherwise set to 0.
   Others   Loaded from the MSR.
MSR      See Figure 43 on page 555.

In addition, if the interrupt occurs when the processor is in power-saving mode and is caused by an exception other than a System Reset exception, all other registers, except HSRR0 and HSRR1, that would be set by the corresponding interrupt if the exception occurred when the processor was not in power-saving mode are set by the System Reset interrupt, and are set to the values to which they would be set if the exception occurred when the processor was not in power-saving mode.

Execution resumes at effective address 0x0000_0000_0000_0100.

The means for software to distinguish between power-on Reset and other types of System Reset are implementation-dependent.

6.5.2 Machine Check Interrupt

The causes of Machine Check interrupts are implementation-dependent. For example, a Machine Check interrupt may be caused by a reference to a storage location that contains an uncorrectable error or does not exist (see Section 5.6), or by an error in the storage subsystem.

When the processor is not in power-saving mode, Machine Check interrupts are enabled when MSR[ME]=1; if MSR[ME]=0 and a Machine Check exception occurs, the processor enters the Checkstop state. When the processor is in doze or nap power-saving levels, Machine Check interrupts are treated as enabled when LPCR[PECE[2]]=1 and cannot occur when LPCR[PECE[2]]=0. When the processor is in sleep or rvwinkle power-saving level, it is implementation-specific whether Machine Check interrupts are treated as enabled under the same conditions as in doze and nap power-saving level or if they cannot occur. If a Machine Check exception occurs while the processor is in power-saving mode and the Machine Check exception is not enabled to cause exit from power-saving mode, the result is implementation-specific.

The Checkstop state may also be entered if an access is attempted to a storage location that does not exist (see Section 5.6), or if an implementation-dependent hardware error occurs that prevents processor operation.

Disabled Machine Check (Checkstop State)
When a processor is in Checkstop state, instruction processing is suspended and generally cannot be restarted without resetting the processor. Some implementations may preserve some or all of the internal state of the processor when entering Checkstop state, so that the state can be analyzed as an aid in problem determination.

Enabled Machine Check

If a Machine Check exception causes an interrupt that is not context synchronizing or causes the loss of a Direct External exception, or if the state of the processor has been corrupted, the interrupt is not recoverable.

In some systems, the operating system may attempt to identify and log the cause of the Machine Check.

The following registers are set:

SRR0     If the interrupt did not occur while the processor was in power-saving mode, set on a "best effort" basis to the effective address of some instruction that was executing or was about to be executed when the Machine Check exception occurred; otherwise set to an undefined value.

         Programming Note
            Since the hypervisor can save the address of the instruction following the power-saving mode instruction if needed, there is no need for the processor to preserve it and store it into SRR0. Therefore, for ease of implementation, the contents of SRR0 upon exit from power-saving mode are specified to be undefined.

SRR1
   46:47    Set to indicate whether the interrupt occurred when the processor was in power-saving mode and, if so, the extent to which resource state was maintained while the processor was in power-saving mode, as follows.

               00   The interrupt did not occur when the processor was in power-saving mode.
               01   The interrupt occurred when the processor was in power-saving mode. The state of all resources was maintained as if the processor was not in power-saving mode.
               10   The interrupt occurred when the processor was in power-saving mode. The state of some resources was not maintained, but the state of all hypervisor resources was maintained as if the processor was not in power-saving mode and the state of all other resources is such that the hypervisor can resume execution.
               11   The interrupt occurred when the processor was in power-saving mode. The state of some resources was not maintained, and the state of some hypervisor resources was not maintained or the state of some resources is such that the hypervisor cannot resume execution.

            Programming Note
               Although the resources that are maintained in power-saving mode (except in the doze power-saving level) are implementation-dependent, the hypervisor can avoid implementation-dependence in the portion of the System Reset and Machine Check interrupt handlers that recover from having been in power-saving mode by using the contents of SRR1[46:47] to determine what state to restore. (To avoid implementation-dependence in the portion of the hypervisor that enters power-saving mode, the hypervisor must use the specification of the four instructions to determine what state to save.)

   62       If the interrupt did not occur while the processor was in power-saving mode, loaded from bit 62 of the MSR if the processor is in a recoverable state; otherwise set to 0. If the interrupt occurred while the processor was in power-saving mode, set to 1 if the processor is in a recoverable state; otherwise set to 0.
   Others   Set to an implementation-dependent value.
MSR      See Figure 43.
DSISR    Set to an implementation-dependent value.
DAR      Set to an implementation-dependent value.

Execution resumes at effective address 0x0000_0000_0000_0200.
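The SRR1 fields that the System Reset and Machine Check descriptions define for exit from power-saving mode can be decoded as sketched below. This is a minimal Python illustration, not part of the architecture: the helper names (field, decode_srr1) and the result dictionary are assumptions made for this example. Bits are numbered as in this Book, with bit 0 the most significant bit of the 64-bit register, and SRR1[42:44] is interpreted only for the System Reset case.

```python
def field(reg, first, last=None):
    """Extract bits first:last of a 64-bit register (bit 0 is the MSB)."""
    if last is None:
        last = first
    width = last - first + 1
    return (reg >> (63 - last)) & ((1 << width) - 1)


# SRR1[46:47]: extent to which state was maintained in power-saving mode.
POWER_SAVING_STATE = {
    0b00: "not in power-saving mode",
    0b01: "power-saving mode, all resource state maintained",
    0b10: "power-saving mode, hypervisor state maintained",
    0b11: "power-saving mode, hypervisor state lost or cannot resume",
}

# SRR1[42:44] for a System Reset interrupt taken from power-saving mode.
EXIT_CAUSE = {
    0b010: "System Reset",
    0b011: "Decrementer",
    0b100: "External",
    0b101: "Hypervisor Maintenance",
}


def decode_srr1(srr1, system_reset=False):
    """Summarize the power-saving and recoverability fields of SRR1."""
    state = field(srr1, 46, 47)
    info = {
        "power_saving_state": POWER_SAVING_STATE[state],
        "recoverable": bool(field(srr1, 62)),
    }
    if system_reset and state != 0b00:
        cause = field(srr1, 42, 44)
        info["exit_cause"] = EXIT_CAUSE.get(
            cause, "reserved or implementation specific")
    return info
```

A hypervisor handler written along these lines would branch on the SRR1[46:47] value to decide how much state to restore, as the Programming Note above suggests.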
A Machine Check interrupt caused by the existence of multiple SLB entries or TLB entries (or similar entries in implementation-specific translation caches) which translate a given effective or virtual address (see Sections 5.7.6.2 and 5.7.7.3) must occur while still in the context of the partition that caused it. The interrupt must be presented in a way that permits continuing execution, with damage limited to the causing partition. Treating the exception as instruction-caused will achieve these requirements.

Programming Note
   If a Machine Check interrupt is caused by an error in the storage subsystem, the storage subsystem may return incorrect data, which may be placed into registers. This corruption of register contents may occur even if the interrupt is recoverable.

6.5.3 Data Storage Interrupt

A Data Storage interrupt occurs when no higher priority exception exists, the value of the expression

   (MSR[HV PR] = 0b10) | (¬VPM0 & ¬MSR[DR]) | (¬VPM1 & MSR[DR])

is 1, and a data access cannot be performed for any of the following reasons.

   • Data address translation is enabled (MSR[DR]=1) and the virtual address of any byte of the storage location specified by a Load, Store, icbi, dcbz, dcbst, dcbf[l], eciwx, or ecowx instruction cannot be translated to a real address.
   • The effective address specified by a lq, stq, lwarx, ldarx, stwcx., or stdcx. instruction refers to storage that is Write Through Required or Caching Inhibited.
   • The access violates storage protection.
   • A Data Address Breakpoint match occurs.
   • Execution of an eciwx or ecowx instruction is disallowed because EAR[E]=0.

If a stwcx. or stdcx. would not perform its store in the absence of a Data Storage interrupt, and either (a) the specified effective address refers to storage that is Write Through Required or Caching Inhibited, or (b) a non-conditional Store to the specified effective address would cause a Data Storage interrupt, it is implementation-dependent whether a Data Storage interrupt occurs.

If the contents of the XER specifies a length of zero bytes for a Move Assist instruction, a Data Storage interrupt does not occur for reasons of address translation or storage protection. If such an instruction causes a Data Storage interrupt for other reasons, the setting of the DSISR and DAR reflects only these other reasons listed in the preceding sentence. (E.g., if such an instruction causes a storage protection violation and a Data Address Breakpoint match, the DSISR and DAR are set as if the storage protection violation did not occur.)

The following registers are set:

SRR0     Set to the effective address of the instruction that caused the interrupt.
SRR1
   33:36    Set to 0.
   42:47    Set to 0.
   Others   Loaded from the MSR.
MSR      See Figure 43.
DSISR
   32       Set to 0.
   33       Set to 1 if MSR[DR]=1 and the translation for an attempted access is not found in the Page Table; otherwise set to 0.
   34:35    Set to 0.
   36       Set to 1 if the access is not permitted by Figure 26 or 27, as appropriate; otherwise set to 0.
   37       Set to 1 if the access is due to a lq, stq, lwarx, ldarx, stwcx., or stdcx. instruction that addresses storage that is Write Through Required or Caching Inhibited; otherwise set to 0.
   38       Set to 1 for a Store, dcbz, or ecowx instruction; otherwise set to 0.
   39:40    Set to 0.
   41       Set to 1 if a Data Address Breakpoint match occurs; otherwise set to 0.
   42       Set to 1 if the access is not permitted by virtual page class key protection; otherwise set to 0.
   43       Set to 1 if execution of an eciwx or ecowx instruction is attempted when EAR[E]=0; otherwise set to 0.
   44:63    Set to 0.
DAR      Set to the effective address of a storage element as described in the following list. The list should be read from the top down; the DAR is set as described by the first item that corresponds to an exception that is reported in the DSISR. For example, if a Load instruction causes a storage protection violation and a Data Address Breakpoint match (and both are reported in the DSISR), the DAR is set to the effective address of a byte in the first aligned doubleword for which access was attempted in the page that caused the exception.

         • a Data Storage exception occurs for reasons other than a Data Address Breakpoint match or, for eciwx and ecowx, EAR[E]=0
            - a byte in the block that caused the exception, for a Cache Management instruction
            - a byte in the first aligned doubleword for which access was attempted in the page that caused the exception, for a Load, Store, eciwx, or ecowx instruction ("first" refers to address order; see Section 6.7)
         • undefined, for a Data Address Breakpoint match, or if eciwx or ecowx is executed when EAR[E]=0

         For the cases in which the DAR is specified above to be set to a defined value, if the interrupt occurs in 32-bit mode the high-order 32 bits of the DAR are set to 0.

If multiple Data Storage exceptions occur for a given effective address, any one or more of the bits corresponding to these exceptions may be set to 1 in the DSISR.

Execution resumes at effective address 0x0000_0000_0000_0300.

6.5.4 Data Segment Interrupt

A Data Segment interrupt occurs when no higher priority exception exists and a data access cannot be performed because data address translation is enabled and the effective address of any byte of the storage location specified by a Load, Store, icbi, dcbz, dcbst, dcbf[l], eciwx, or ecowx instruction cannot be translated to a virtual address.

If a stwcx. or stdcx. would not perform its store in the absence of a Data Segment interrupt, and a non-conditional Store to the specified effective address would cause a Data Segment interrupt, it is implementation-dependent whether a Data Segment interrupt occurs.

If a Move Assist instruction has a length of zero (in the XER), a Data Segment interrupt does not occur, regardless of the effective address.

The following registers are set:

SRR0     Set to the effective address of the instruction that caused the interrupt.
SRR1
   33:36    Set to 0.
   42:47    Set to 0.
   Others   Loaded from the MSR.
MSR      See Figure 43.
DSISR    Set to an undefined value.
DAR      Set to the effective address of a storage element as described in the following list.

         • a byte in the block that caused the Data Segment interrupt, for a Cache Management instruction
         • a byte in the first aligned doubleword for which access was attempted in the segment that caused the Data Segment interrupt, for a Load, Store, eciwx, or ecowx instruction ("first" refers to address order; see Section 6.7)

         If the interrupt occurs in 32-bit mode, the high-order 32 bits of the DAR are set to 0.

Execution resumes at effective address 0x0000_0000_0000_0380.

Programming Note
   A Data Segment interrupt occurs if MSR[DR]=1 and the translation of the effective address of any byte of the specified storage location is not found in the SLB (or in any implementation-specific address translation lookaside information).

6.5.5 Instruction Storage Interrupt

An Instruction Storage interrupt occurs when no higher priority exception exists, the value of the expression

   (MSR[HV PR] = 0b10) | (¬VPM0 & ¬MSR[IR]) | (¬VPM1 & MSR[IR])

is 1, and the next instruction to be executed cannot be fetched for any of the following reasons.

   • Instruction address translation is enabled and the virtual address cannot be translated to a real address.
   • The fetch access violates storage protection.

The following registers are set:

SRR0     Set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present (if the interrupt occurs on attempting to fetch a branch target, SRR0 is set to the branch target address).
SRR1
   33       Set to 1 if MSR[IR]=1 and the translation for an attempted access is not found in the Page Table; otherwise set to 0.
   34       Set to 0.
   35       Set to 1 if the access is to No-execute or Guarded storage; otherwise set to 0.
   36       Set to 1 if the access is not permitted by Figure 26 or 27, as appropriate; otherwise set to 0.

            Programming Note
               Storage protection violations for the Data Storage Interrupt are reported in DSISR[36] and DSISR[42], whereas storage protection violations for the Instruction Storage Interrupt are reported in SRR1[35] and SRR1[36].

   42:47    Set to 0.
   Others   Loaded from the MSR.
MSR      See Figure 43.

If multiple Instruction Storage exceptions occur due to attempting to fetch a single instruction, any one or more of the bits corresponding to these exceptions may be set to 1 in SRR1.

Execution resumes at effective address 0x0000_0000_0000_0400.

6.5.6 Instruction Segment Interrupt

An Instruction Segment interrupt occurs when no higher priority exception exists and the next instruction to be executed cannot be fetched because instruction address translation is enabled and the effective address cannot be translated to a virtual address.

The following registers are set:

SRR0     Set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present (if the interrupt occurs on attempting to fetch a branch target, SRR0 is set to the branch target address).
SRR1
   33:36    Set to 0.
   42:47    Set to 0.
   Others   Loaded from the MSR.
MSR      See Figure 43 on page 555.

Execution resumes at effective address 0x0000_0000_0000_0480.

Programming Note
   An Instruction Segment interrupt occurs if MSR[IR]=1 and the translation of the effective address of the next instruction to be executed is not found in the SLB (or in any implementation-specific address translation lookaside information).

6.5.7 External Interrupt

An External interrupt is classified as being either a Direct External interrupt or a Mediated External interrupt. Throughout this Book, usage of the phrase "External interrupt", without further classification, refers to both a Direct External interrupt and a Mediated External interrupt.

6.5.7.1 Direct External Interrupt

A Direct External interrupt occurs when no higher priority exception exists, a Direct External exception exists, and the value of the expression

   MSR[EE] | (¬(LPES0) & (¬(MSR[HV]) | MSR[PR]))

is one. The occurrence of the interrupt does not cause the exception to cease to exist.

When LPES0=0, the following registers are set:

HSRR0    Set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present.
HSRR1
   33:36    Set to 0.
   42:47    Set to 0.
   Others   Loaded from the MSR.
MSR      See Figure 43 on page 555.

When LPES0=1, the following registers are set:

SRR0     Set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present.
SRR1
   33:36    Set to 0.
   42:47    Set to 0.
   Others   Loaded from the MSR.
MSR      See Figure 43 on page 555.

Execution resumes at effective address 0x0000_0000_0000_0500.

Programming Note
   Because the value of MSR[EE] is always 1 when the processor is in problem state, the simpler expression

      MSR[EE] | ¬(LPES0 | MSR[HV])

   is equivalent to the expression given above.

Programming Note
   The Direct External exception has the same meaning as the External exception in versions of the architecture prior to Version 2.05.

6.5.7.2 Mediated External Interrupt

A Mediated External interrupt occurs when no higher priority exception exists, a Mediated External exception exists (see the definition of LPCR[MER] in Section 2.2), and the value of the expression

   MSR[EE] & (¬(MSR[HV]) | MSR[PR])

is one.
The occurrence of the interrupt does not cause the exception to cease to exist.

When LPES0=0, the following registers are set:

HSRR0    Set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present.
HSRR1
   33:36    Set to 0.
   42       Set to 1.
   43:47    Set to 0.
   Others   Loaded from the MSR.
MSR      See Figure 43 on page 555.

When LPES0=1, the following registers are set:

SRR0     Set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present.
SRR1
   33:36    Set to 0.
   42:47    Set to 0.
   Others   Loaded from the MSR.
MSR      See Figure 43 on page 555.

Execution resumes at effective address 0x0000_0000_0000_0500.

6.5.8 Alignment Interrupt

An Alignment interrupt occurs when no higher priority exception exists and a data access cannot be performed for any of the following reasons.

   • The operand of a floating-point Load or Store is not word-aligned, or crosses a virtual page boundary.
   • The operand of lq, stq, lmw, stmw, lwarx, ldarx, stwcx., stdcx., eciwx, ecowx, lfdp, lfdpx, stfdp, or stfdpx is not aligned.
   • The operand of a single-register Load or Store is not aligned and the processor is in Little-Endian mode.
   • The instruction is lq, stq, lmw, stmw, lswi, lswx, stswi, or stswx, and the operand is in storage that is Write Through Required or Caching Inhibited, or the processor is in Little-Endian mode.
   • The operand of a Load or Store crosses a segment boundary, or crosses a boundary between virtual pages that have different storage control attributes.
   • The operand of a Load or Store is not aligned and is in storage that is Write Through Required or Caching Inhibited.
   • The operand of lfdp, lfdpx, stfdp, stfdpx, dcbz, lwarx, ldarx, stwcx., or stdcx. is in storage that is Write Through Required or Caching Inhibited.

If a stwcx. or stdcx. would not perform its store in the absence of an Alignment interrupt and the specified effective address refers to storage that is Write Through Required or Caching Inhibited, it is implementation-dependent whether an Alignment interrupt occurs.

Setting the DSISR and DAR as described below is optional for implementations on which Alignment interrupts occur rarely, if ever, for cases that the Alignment interrupt handler emulates. For such implementations, if the DSISR and DAR are not set as described below they are set to undefined values.

The following registers are set:

SRR0     Set to the effective address of the instruction that caused the interrupt.
SRR1
   33:36    Set to 0.
   42:47    Set to 0.
   Others   Loaded from the MSR.
MSR      See Figure 43.
DSISR
   32:43    Set to 0.
   44:45    Set to bits 30:31 of the instruction if DS-form. Set to 0b00 if D- or X-form.
   46       Set to 0.
   47:48    Set to bits 29:30 of the instruction if X-form. Set to 0b00 if D- or DS-form.
   49       Set to bit 25 of the instruction if X-form. Set to bit 5 of the instruction if D- or DS-form.
   50:53    Set to bits 21:24 of the instruction if X-form. Set to bits 1:4 of the instruction if D- or DS-form.
   54:58    Set to bits 6:10 of the instruction (RT/RS/FRT/FRS), except undefined for dcbz.
   59:63    Set to bits 11:15 of the instruction (RA) for update form instructions; set to either bits 11:15 of the instruction or to any register number not in the range of registers to be loaded for a valid form lmw, a valid form lswi, or a valid form lswx for which neither RA nor RB is in the range of registers to be loaded; otherwise undefined.
DAR      Set to the effective address computed by the instruction, except that if the interrupt occurs in 32-bit mode the high-order 32 bits of the DAR are set to 0.

For an X-form Load or Store, it is acceptable for the processor to set the DSISR to the same value that would have resulted if the corresponding D- or DS-form instruction had caused the interrupt. Similarly, for a D- or DS-form Load or Store, it is acceptable for the processor to set the DSISR to the value that would have resulted for the corresponding X-form instruction. For example, an unaligned lwax (that crosses a protection boundary) would normally, following the description above, cause the DSISR to be set to binary:

   000000000000 00 0 01 0 0101 ttttt ?????

where "ttttt" denotes the RT field, and "?????" denotes an undefined 5-bit value. However, it is acceptable if it causes the DSISR to be set as for lwa, which is

   000000000000 10 0 00 0 1101 ttttt ?????

If there is no corresponding alternative form instruction (e.g., for lwaux), the value described above is set in the DSISR.

The instruction pairs that may use the same DSISR value are:

   lhz/lhzx     lhzu/lhzux     lha/lhax     lhau/lhaux
   lwz/lwzx     lwzu/lwzux     lwa/lwax
   ld/ldx       ldu/ldux
   sth/sthx     sthu/sthux     stw/stwx     stwu/stwux
   std/stdx     stdu/stdux
   lfs/lfsx     lfsu/lfsux     lfd/lfdx     lfdu/lfdux
   stfs/stfsx   stfsu/stfsux   stfd/stfdx   stfdu/stfdux

Execution resumes at effective address 0x0000_0000_0000_0600.

Programming Note
   The architecture does not support the use of an unaligned effective address by lwarx, ldarx, stwcx., stdcx., eciwx, and ecowx. If an Alignment interrupt occurs because one of these instructions specifies an unaligned effective address, the Alignment interrupt handler must not attempt to simulate the instruction, but instead should treat the instruction as a programming error.

6.5.9 Program Interrupt

A Program interrupt occurs when no higher priority exception exists and one of the following exceptions arises during execution of an instruction:

Floating-Point Enabled Exception

   A Floating-Point Enabled Exception type Program interrupt is generated when the value of the expression

      (MSR[FE0] | MSR[FE1]) & FPSCR[FEX]

   is 1. FPSCR[FEX] is set to 1 by the execution of a floating-point instruction that causes an enabled exception, including the case of a Move To FPSCR instruction that causes an exception bit and the corresponding enable bit both to be 1.

Illegal Instruction

   An Illegal Instruction type Program interrupt is generated when execution is attempted of an illegal instruction, or of a reserved instruction or an instruction that is not provided by the implementation. It is also generated under the following conditions.

   • an mtspr or mfspr instruction is executed when MSR[PR]=1 if the instruction specifies an SPR with spr[0]=0 that is not provided by the implementation
   • an mtspr or mfspr instruction is executed when MSR[PR]=1 if the instruction specifies SPR 0
   • an mfspr instruction is executed when MSR[PR]=0 if the instruction specifies SPR 4, 5, or 6.

   An Illegal Instruction type Program interrupt may be generated when execution is attempted of any of the following kinds of instruction.

   • an instruction that is in invalid form
   • an lswx instruction for which RA or RB is in the range of registers to be loaded

   Note: If the Hypervisor Emulation Assistance category is supported, see the following Programming Note.

   Programming Note
      When the Hypervisor Emulation Assistance category is supported, the hardware will not generate this interrupt, but instead will generate a Hypervisor Emulation Assistance interrupt. See Section 6.5.19. The hypervisor must then present this interrupt to operating system software as if it were an Illegal Instruction type Program interrupt when the instruction that caused the interrupt is truly illegal, rather than one to be emulated.

Privileged Instruction

   The following applies if the instruction is executed when MSR[PR] = 1.

      A Privileged Instruction type Program interrupt is generated when execution is attempted of a privileged instruction, or of an mtspr or mfspr instruction with an SPR field that contains one of the defined values having spr[0]=1. It may be generated when execution is attempted of an mtspr or mfspr instruction with an SPR field that contains a value having spr[0]=1.

   The following applies if the instruction is executed when MSR[HV PR] = 0b00.

      A Privileged Instruction type Program interrupt is generated when execution is attempted of an mtspr or mfspr instruction with an SPR field that designates an SPR that is accessible by the instruction only when the processor is in hypervisor state, or when execution of a hypervisor-privileged instruction is attempted.

      Programming Note
         These are the only cases in which a Privileged Instruction type Program interrupt can be generated when MSR[PR]=0. They can be distinguished from other causes of Privileged Instruction type Program interrupts by examining SRR1[49] (the bit in which MSR[PR] was saved by the interrupt).

Trap

   A Trap type Program interrupt is generated when any of the conditions specified in a Trap instruction is met.

The following registers are set:

SRR0     For all Program interrupts except a Floating-Point Enabled Exception type Program interrupt, set to the effective address of the instruction that caused the corresponding exception.

         For a Floating-Point Enabled Exception type Program interrupt, set as described in the following list.

         - If MSR[FE0 FE1] = 0b00, FPSCR[FEX] = 1, and an instruction is executed that changes MSR[FE0 FE1] to a nonzero value, set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present.

           Programming Note
              Recall that all instructions that can alter MSR[FE0 FE1] are context synchronizing, and therefore are not initiated until all preceding instructions have reported all exceptions they will cause.

         - If MSR[FE0 FE1] = 0b11, set to the effective address of the instruction that caused the Floating-Point Enabled Exception.
         - If MSR[FE0 FE1] = 0b01 or 0b10, set to the effective address of the first instruction that caused a Floating-Point Enabled Exception since the most recent time FPSCR[FEX] was changed from 1 to 0 or of some subsequent instruction.

           Programming Note
              If SRR0 is set to the effective address of a subsequent instruction, that instruction will not be beyond the first such instruction at which synchronization of floating-point instructions occurs. (Recall that such synchronization is caused by Floating-Point Status and Control Register instructions, as well as by execution synchronizing instructions and events.)

SRR1
   33:36    Set to 0.
   42       Set to 0.
   43       Set to 1 for a Floating-Point Enabled Exception type Program interrupt; otherwise set to 0.
   44       Set to 1 for an Illegal Instruction type Program interrupt; otherwise set to 0.
   45       Set to 1 for a Privileged Instruction type Program interrupt; otherwise set to 0.
   46       Set to 1 for a Trap type Program interrupt; otherwise set to 0.
   47       Set to 0 if SRR0 contains the address of the instruction causing the exception and there is only one such instruction; otherwise set to 1.

            Programming Note
               SRR1[47] can be set to 1 only if the exception is a Floating-Point Enabled Exception and either MSR[FE0 FE1] = 0b01 or 0b10 or MSR[FE0 FE1] has just been changed from 0b00 to a nonzero value. (SRR1[47] is always set to 1 in the last case.)

   Others   Loaded from the MSR.

   Only one of bits 43:46 can be set to 1.
MSR      See Figure 43 on page 555.

Execution resumes at effective address 0x0000_0000_0000_0700.

6.5.10 Floating-Point Unavailable Interrupt

A Floating-Point Unavailable interrupt occurs when no higher priority exception exists, an attempt is made to execute a floating-point instruction (including floating-point loads, stores, and moves), and MSR[FP]=0.
The following registers are set:

SRR0    Set to the effective address of the instruction that caused the interrupt.
SRR1
    33:36   Set to 0.
    42:47   Set to 0.
    Others  Loaded from the MSR.
MSR     See Figure 43 on page 555.

Execution resumes at effective address 0x0000_0000_0000_0800.

6.5.11 Decrementer Interrupt

A Decrementer interrupt occurs when no higher priority exception exists, a Decrementer exception exists, and MSR[EE]=1.

The following registers are set:

SRR0    Set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present.
SRR1
    33:36   Set to 0.
    42:47   Set to 0.
    Others  Loaded from the MSR.
MSR     See Figure 43 on page 555.

Execution resumes at effective address 0x0000_0000_0000_0900.

6.5.12 Hypervisor Decrementer Interrupt

A Hypervisor Decrementer interrupt occurs when no higher priority exception exists, a Hypervisor Decrementer exception exists, and the value of the following expression is 1.

    (MSR[EE] | ¬(MSR[HV]) | MSR[PR]) & HDICE

The following registers are set:

HSRR0   Set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present.
HSRR1
    33:36   Set to 0.
    42:47   Set to 0.
    Others  Loaded from the MSR.
MSR     See Figure 43 on page 555.

Execution resumes at effective address 0x0000_0000_0000_0980.

Programming Note
    Because the value of MSR[EE] is always 1 when the processor is in problem state, the simpler expression

        (MSR[EE] | ¬(MSR[HV])) & HDICE

    is equivalent to the expression given above.

6.5.13 System Call Interrupt

A System Call interrupt occurs when a System Call instruction is executed.

The following registers are set:

SRR0    Set to the effective address of the instruction following the System Call instruction.
SRR1
    33:36   Set to 0.
    42:47   Set to 0.
    Others  Loaded from the MSR.
MSR     See Figure 43 on page 555.

Execution resumes at effective address 0x0000_0000_0000_0C00.

Programming Note
    An attempt to execute an sc instruction with LEV=1 in problem state should be treated as a programming error.

6.5.14 Trace Interrupt [Category: Trace]

A Trace interrupt occurs when no higher priority exception exists and either MSR[SE]=1 and any instruction except rfid or hrfid is successfully completed, or MSR[BE]=1 and a Branch instruction is completed. Successful completion means that the instruction caused no other interrupt. Thus a Trace interrupt never occurs for a System Call instruction, or for a Trap instruction that traps. The instruction that causes a Trace interrupt is called the "traced instruction".

When a Trace interrupt occurs, the following registers are set:

SRR0    Set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present.
SRR1
    33:36 and 42:47
            Set to an implementation-dependent value.
    Others  Loaded from the MSR.
MSR     See Figure 43 on page 555.

Execution resumes at effective address 0x0000_0000_0000_0D00.

Extensions to the Trace facility are described in Appendix C.

Programming Note
    The following instructions are not traced.

    - rfid
    - hrfid
    - sc, and Trap instructions that trap
    - other instructions that cause interrupts (other than Trace interrupts)
    - the first instructions of any interrupt handler
    - instructions that are emulated by software

    In general, interrupt handlers can achieve the effect of tracing these instructions.

6.5.15 Hypervisor Data Storage Interrupt

A Hypervisor Data Storage interrupt occurs when the processor is not in hypervisor state, no higher priority exception exists, the value of the expression

    (VPM0 & ¬MSR[DR]) | (VPM1 & MSR[DR])

is 1, and a data access cannot be performed for any of the following reasons.

- Data address translation is enabled (MSR[DR]=1) and the virtual address of any byte of the storage location specified by a Load, Store, icbi, dcbz, dcbst, dcbf[l], eciwx, or ecowx instruction cannot be translated to a real address.
- Data address translation is disabled (MSR[DR]=0), LPES1=1, and the virtual address of any byte of the storage location specified by a Load, Store, icbi, dcbz, dcbst, dcbf[l], eciwx, or ecowx instruction cannot be translated to a real address by means of the virtual real addressing mechanism.
- The effective address specified by a lwarx, ldarx, stwcx., or stdcx. instruction refers to storage that is Write Through Required or Caching Inhibited.
- The access violates storage protection.
- A Data Address Compare match or a Data Address Breakpoint match occurs.
- Execution of an eciwx or ecowx instruction is disallowed because EAR[E]=0.

If a stwcx. or stdcx. would not perform its store in the absence of a Hypervisor Data Storage interrupt, and either (a) the specified effective address refers to storage that is Write Through Required or Caching Inhibited, or (b) a non-conditional Store to the specified effective address would cause a Hypervisor Data Storage interrupt, it is implementation-dependent whether a Hypervisor Data Storage interrupt occurs.

If the contents of the XER specify a length of zero bytes for a Move Assist instruction, a Hypervisor Data Storage interrupt does not occur for reasons of address translation, or storage protection. If such an instruction causes a Hypervisor Data Storage interrupt for other reasons, the setting of the HDSISR and HDAR reflects only these other reasons listed in the preceding sentence. (E.g., if such an instruction causes a storage protection violation and a Data Address Breakpoint match, the HDSISR and HDAR are set as if the storage protection violation did not occur.)

The following registers are set:

HSRR0   Set to the effective address of the instruction that caused the interrupt.
HSRR1
    33:36   Set to 0.
    42:47   Set to 0.
    Others  Loaded from the MSR.
MSR     See Figure 43.
HDSISR
    32      Set to 0.
    33      Set to 1 if the value of the expression
                (MSR[DR]) | ((¬MSR[DR] & VPM0) & LPES1)
            is 1 and the translation for an attempted access is not found in the Page Table; otherwise set to 0.
    34:35   Set to 0.
    36      Set to 1 if the access is not permitted by the storage protection mechanism; otherwise set to 0.
    37      Set to 1 if the access is due to a lq, stq, lwarx, ldarx, stwcx., or stdcx. instruction that addresses storage that is Write Through Required or Caching Inhibited; otherwise set to 0.
    38      Set to 1 for a Store, dcbz, or ecowx instruction; otherwise set to 0.
    39:40   Set to 0.
    41      Set to 1 if a Data Address Compare match or a Data Address Breakpoint match occurs; otherwise set to 0.
    42      Set to 0.
    43      Set to 1 if execution of an eciwx or ecowx instruction is attempted when EAR[E]=0; otherwise set to 0.
    44:63   Set to 0.
HDAR    Set to the effective address of a storage element as described in the following list. The list should be read from the top down; the HDAR is set as described by the first item that corresponds to an exception that is reported in the HDSISR. For example, if a Load instruction causes a storage protection violation and a Data Address Breakpoint match (and both are reported in the HDSISR), the HDAR is set to the effective address of a byte in the first aligned doubleword for which access was attempted in the page that caused the exception.

        - a Data Storage exception occurs for reasons other than a Data Address Breakpoint match or, for eciwx and ecowx, EAR[E]=0
            - a byte in the block that caused the exception, for a Cache Management instruction
            - a byte in the first aligned doubleword for which access was attempted in the page that caused the exception, for a Load, Store, eciwx, or ecowx instruction ("first" refers to address order; see Section 6.7)
        - undefined, for a Data Address Breakpoint match, or if eciwx or ecowx is executed when EAR[E]=0

        For the cases in which the HDAR is specified above to be set to a defined value, if the interrupt occurs in 32-bit mode the high-order 32 bits of the DAR are set to 0.

If multiple Hypervisor Data Storage exceptions occur for a given effective address, any one or more of the bits corresponding to these exceptions may be set to 1 in the HDSISR.

Execution resumes at effective address 0x0000_0000_0000_0E00.

6.5.16 Hypervisor Instruction Storage Interrupt

A Hypervisor Instruction Storage interrupt occurs when the processor is not in hypervisor state, no higher priority exception exists, the value of the expression

    (VPM0 & ¬MSR[IR]) | (VPM1 & MSR[IR])

is 1, and the next instruction to be executed cannot be fetched for any of the following reasons.

- Instruction address translation is enabled (MSR[IR]=1) and the virtual address cannot be translated to a real address.
- Instruction address translation is disabled (MSR[IR]=0), LPES1=1, and the virtual address cannot be translated to a real address by means of the virtual real addressing mechanism.
- The fetch access violates storage protection.

The following registers are set:

HSRR0   Set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present (if the interrupt occurs on attempting to fetch a branch target, HSRR0 is set to the branch target address).
HSRR1
    33      Set to 1 if the value of the expression
                (MSR[IR]) | ((¬MSR[IR] & VPM0) & LPES1)
            is 1 and the translation for an attempted access is not found in the Page Table; otherwise set to 0.
    34      Set to 0.
    35      Set to 1 if the access is to No-execute or Guarded storage; otherwise set to 0.
    36      Set to 1 if the access is not permitted by Figure 26; otherwise set to 0.

            Programming Note
                Storage protection violations for the Hypervisor Data Storage Interrupt are reported in HDSISR[36], whereas storage protection violations for the Hypervisor Instruction Storage Interrupt are reported in HSRR1[35] and HSRR1[36].

    42:46   Set to 0.
    47      Set to 0.
    Others  Loaded from the MSR.
MSR     See Figure 43.

If multiple Instruction Storage exceptions occur due to attempting to fetch a single instruction, any one or more of the bits corresponding to these exceptions may be set to 1 in HSRR1.

Execution resumes at effective address 0x0000_0000_0000_0E10.

6.5.17 Hypervisor Data Segment Interrupt

A Hypervisor Data Segment interrupt may occur when the processor is not in hypervisor state, data address translation is disabled (MSR[DR]=0), VPM0=1, LPES1=1, no higher priority exception exists, and the effective address of any byte of the storage location specified by a Load, Store, icbi, dcbz, dcbst, dcbf[l], eciwx, or ecowx instruction is beyond the 1 TB VRMA.

If a stwcx. or stdcx. would not perform its store in the absence of a Hypervisor Data Segment interrupt, and a non-conditional Store to the specified effective address would cause a Hypervisor Data Segment interrupt, it is implementation-dependent whether a Hypervisor Data Segment interrupt occurs.

If a Move Assist instruction has a length of zero (in the XER), a Hypervisor Data Segment interrupt does not occur, regardless of the effective address.

The following registers are set:

HSRR0   Set to the effective address of the instruction that caused the interrupt.
HSRR1
    33:36   Set to 0.
    42:47   Set to 0.
    Others  Loaded from the MSR.
MSR     See Figure 43.
HDSISR  Set to an undefined value.
HDAR    Set to the effective address of a storage element as described in the following list.

        - a byte in the block that caused the Hypervisor Data Segment interrupt, for a Cache Management instruction
        - a byte in the first aligned doubleword for which access was attempted in the segment that caused the Hypervisor Data Segment interrupt, for a Load, Store, eciwx, or ecowx instruction ("first" refers to address order; see Section 6.7)

Execution resumes at effective address 0x0000_0000_0000_0E20.

6.5.18 Hypervisor Instruction Segment Interrupt

A Hypervisor Instruction Segment interrupt may occur when the processor is not in hypervisor state, instruction address translation is disabled (MSR[IR]=0), VPM0=1, LPES1=1, no higher priority exception exists, and the effective address of any byte of the instruction is beyond the 1 TB VRMA.

The following registers are set:

HSRR0   Set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present (if the interrupt occurs on attempting to fetch a branch target, HSRR0 is set to the branch target address).
HSRR1
    33:36   Set to 0.
    42:47   Set to 0.
    Others  Loaded from the MSR.
MSR     See Figure 43 on page 555.

Execution resumes at effective address 0x0000_0000_0000_0E30.

6.5.19 Hypervisor Emulation Assistance Interrupt [Category: Hypervisor Emulation Assistance]

A Hypervisor Emulation Assistance interrupt is generated when execution is attempted of an illegal instruction, or of a reserved instruction or an instruction that is not provided by the implementation. It is also generated under the following conditions.

- an mtspr or mfspr instruction is executed when MSR[PR]=1 if the instruction specifies an SPR with SPR[0]=0 that is not provided by the implementation
- an mtspr or mfspr instruction is executed when MSR[PR]=1 if the instruction specifies SPR 0
- an mfspr instruction is executed when MSR[PR]=0 if the instruction specifies SPR 4, 5, or 6.

A Hypervisor Emulation Assistance interrupt may be generated when execution is attempted of any of the following kinds of instruction.

- an instruction that is in invalid form
- an lswx instruction for which RA or RB is in the range of registers to be loaded

The following registers are set:

HSRR0   Set to the effective address of the instruction that caused the interrupt.
HSRR1
    33:36   Set to 0.
    42:47   Set to 0.
    Others  Loaded from the MSR.
MSR     See Figure 43 on page 555.
HEIR    Set to a copy of the instruction that caused the interrupt.

Execution resumes at effective address 0x0000_0000_0000_0E40.

6.5.20 Hypervisor Maintenance Interrupt

A Hypervisor Maintenance interrupt occurs when no higher priority exception exists, a Hypervisor Maintenance exception exists (a bit in the HMER is set to one), the exception is enabled in the HMEER, and the value of the following expression is 1.

    (MSR[EE] | ¬(MSR[HV]) | MSR[PR])

The following registers are set:

HSRR0   Set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present.
HSRR1
    33:36   Set to 0.
    42:47   Set to 0.
    Others  Loaded from the MSR.
MSR     See Figure 43 on page 555.
HMER    See Section 6.2.8 on page 549.

The exception bits in the HMER are sticky; that is, once set to 1 they remain set to 1 until they are set to 0 by an mthmer instruction.

Execution resumes at effective address 0x0000_0000_0000_0E50.

Programming Note
    Because the value of MSR[EE] is always 1 when the processor is in problem state, the simpler expression

        (MSR[EE] | ¬(MSR[HV]))

    is equivalent to the expression given above.

Programming Note
    If an implementation uses the HMER to record that a readable resource, such as the Time Base, has been corrupted, then, because the HMI is disabled in the hypervisor state, it is necessary for the hypervisor to check HMER after reading that resource to be sure an error has not occurred.

6.5.21 Performance Monitor Interrupt [Category: Server.Performance Monitor]

The Performance Monitor interrupt is part of the Performance Monitor facility; see Appendix C. If the Performance Monitor facility is not implemented or does not use this interrupt, the corresponding interrupt vector (see Figure 44 on page 556) is treated as reserved.

6.5.22 Vector Unavailable Interrupt [Category: Vector]

A Vector Unavailable interrupt occurs when no higher priority exception exists, an attempt is made to execute a Vector instruction (including Vector loads, stores, and moves), and MSR[VEC]=0.

The following registers are set:

SRR0    Set to the effective address of the instruction that caused the interrupt.
SRR1
    33:36   Set to 0.
    42:47   Set to 0.
    Others  Loaded from the MSR.
MSR     See Figure 43 on page 555.

Execution resumes at effective address 0x0000_0000_0000_0F20.

6.6 Partially Executed Instructions

If a Data Storage, Data Segment, Alignment, system-caused, or imprecise exception occurs while a Load or Store instruction is executing, the instruction may be aborted. In such cases the instruction is not completed, but may have been partially executed in the following respects.

- Some of the bytes of the storage operand may have been accessed, except that if access to a given byte of the storage operand would violate storage protection, that byte is neither copied to a register by a Load instruction nor modified by a Store instruction. Also, the rules for storage accesses given in Section 5.8.1, "Guarded Storage" and in Section 2.1 of Book II are obeyed.
- Some registers may have been altered as described in the Book II section cited above.
- Reference and Change bits may have been updated as described in Section 5.7.8.
- For a stwcx. or stdcx. instruction that is executed in-order, CR0 may have been set to an undefined value and the reservation may have been cleared.
- For an lq instruction that is executed in-order, the TGCC may have been set to an undefined value.

The architecture does not support continuation of an aborted instruction but intends that the aborted instruction be re-executed if appropriate.

Programming Note
    An exception may result in the partial execution of a Load or Store instruction. For example, if the Page Table Entry that translates the address of the storage operand is altered, by a program running on another processor, such that the new contents of the Page Table Entry preclude performing the access, the alteration could cause the Load or Store instruction to be aborted after having been partially executed.

    As stated in the Book II section cited above, if an instruction is partially executed the contents of registers are preserved to the extent that the instruction can be re-executed correctly. The consequent preservation is described in the following list. For any given instruction, zero or one item in the list applies.

    - For a fixed-point Load instruction that is not a multiple or string form, or for an eciwx instruction, if RT=RA or RT=RB then the contents of register RT are not altered.
    - For an lq instruction, if RT+1 = RA then the contents of register RT+1 are not altered.
    - For an update form Load or Store instruction, the contents of register RA are not altered.

6.7 Exception Ordering

Since multiple exceptions can exist at the same time and the architecture does not provide for reporting more than one interrupt at a time, the generation of more than one interrupt is prohibited. Some exceptions, such as the Mediated External exception, persist and can be deferred. However, other exceptions would be lost if they were not recognized and handled when they occur. For example, if an External interrupt was generated when a Data Storage exception existed, the Data Storage exception would be lost. If the Data Storage exception was caused by a Store Multiple instruction for which the storage operand crosses a virtual page boundary and the exception was a result of attempting to access the second virtual page, the store could have modified locations in the first virtual page even though it appeared that the Store Multiple instruction was never executed.

For the above reasons, all exceptions are prioritized with respect to other exceptions that may exist at the same instant to prevent the loss of any exception that is not persistent. Some exceptions cannot exist at the same instant as some others.

Data Storage, Hypervisor Data Storage, Data Segment, Hypervisor Data Segment, and Alignment exceptions occur as if the storage operand were accessed one byte at a time in order of increasing effective address (with the obvious caveat if the operand includes both the maximum effective address and effective address 0).

6.7.1 Unordered Exceptions

The exceptions listed here are unordered, meaning that they may occur at any time regardless of the state of the interrupt processing mechanism. These exceptions are recognized and processed when presented.

1. System Reset
2. Machine Check

6.7.2 Ordered Exceptions

The exceptions listed here are ordered with respect to the state of the interrupt processing mechanism. In the following list, the hypervisor forms of the Data Storage, Instruction Storage, Data Segment, and Instruction Segment exceptions can be substituted for the non-hypervisor forms since the hypervisor forms cannot be caused by the same instruction and have the same ordering.

System-Caused or Imprecise

1. Program
    - Imprecise Mode Floating-Point Enabled Exception
2. Hypervisor Maintenance
3. External and [Hypervisor] Decrementer

Instruction-Caused and Precise

1. [Hypervisor] Instruction Segment
2. [Hypervisor] Instruction Storage
3.a Hypervisor Emulation Assistance [Category: HEA]
3.b Program
    - Illegal Instruction, if Category: HEA is not supported
    - Privileged Instruction
4. Function-Dependent
4.a Fixed-Point and Branch
    1a Program
        - Trap
    1b System Call
    1c [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment
    2  Trace
4.b Floating-Point
    1  FP Unavailable
    2a Program
        - Precise Mode Floating-Point Enabled Exception
    2b [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment
    3  Trace
4.c Vector
    1  Vector Unavailable
    2a [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment
    3  Trace

For implementations that execute multiple instructions in parallel using pipeline or superscalar techniques, or combinations of these, it can be difficult to understand the ordering of exceptions. To understand this ordering it is useful to consider a model in which each instruction is fetched, then decoded, then executed, all before the next instruction is fetched. In this model, the exceptions a single instruction would generate are in the order shown in the list of instruction-caused exceptions. Exceptions with different numbers have different ordering. Exceptions with the same numbering but different lettering are mutually exclusive and cannot be caused by the same instruction. The External, Decrementer, and Hypervisor Decrementer interrupts have equal ordering. Similarly, where Data Storage, Data Segment, and Alignment exceptions are listed in the same item they have equal ordering.

Even on processors that are capable of executing several instructions simultaneously, or out of order, instruction-caused interrupts (precise and imprecise) occur in program order.

6.8 Interrupt Priorities

This section describes the relationship of non-maskable, maskable, precise, and imprecise interrupts. In the following descriptions, the interrupt mechanism waiting for all possible exceptions to be reported includes only exceptions caused by previously initiated instructions (e.g., it does not include waiting for the Decrementer to step through zero). The exceptions are listed in order of highest to lowest priority. The phrase "corresponding interrupt" means the interrupt having the same name as the exception unless the processor is in power-saving mode, in which case the phrase means the System Reset interrupt.

Unless otherwise stated or obvious from context, it is assumed below that one of the following conditions is satisfied.

- The processor is not in power-saving mode and the interrupt, unless it is the Machine Check interrupt, is not disabled. (For the Machine Check interrupt no assumption is made regarding enablement.)
- The processor is in power-saving mode and the exception is enabled to cause exit from the mode.

In the following list, the hypervisor forms of the Data Storage, Instruction Storage, Data Segment, and Instruction Segment exceptions can be substituted for the non-hypervisor forms since the hypervisor forms cannot occur simultaneously and have the same priority.

1. System Reset

   System Reset exception has the highest priority of all exceptions. If this exception exists, the interrupt mechanism ignores all other exceptions and generates a System Reset interrupt.

   Once the System Reset interrupt is generated, no nonmaskable interrupts are generated due to exceptions caused by instructions issued prior to the generation of this interrupt.

2. Machine Check

   Machine Check exception is the second highest priority exception. If this exception exists and a System Reset exception does not exist, the interrupt mechanism ignores all other exceptions and generates a Machine Check interrupt.

   Once the Machine Check interrupt is generated, no nonmaskable interrupts are generated due to exceptions caused by instructions issued prior to the generation of this interrupt.

3. Instruction-Dependent

   This exception is the third highest priority exception. When this exception is created, the interrupt mechanism waits for all possible Imprecise exceptions to be reported. It then generates the appropriate ordered interrupt if no higher priority exception exists when the interrupt is to be generated. Within this category a particular instruction may present more than a single exception. When this occurs, those exceptions are ordered in priority as indicated in the following lists. Where [Hypervisor] Data Storage, [Hypervisor] Data Segment, and Alignment exceptions are listed in the same item they have equal priority (i.e., the processor may generate any one of the three interrupts for which an exception exists).

   A. Fixed-Point Loads and Stores
      a. These exceptions are mutually exclusive and have the same priority:
         - Hypervisor Emulation Assistance [Category: HEA]
         - Program - Illegal Instruction if Category: HEA is not supported
         - Program - Privileged Instruction
      b. [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment
      c. Trace

   B. Floating-Point Loads and Stores
      a. Hypervisor Emulation Assistance [Category: HEA], or Program - Illegal Instruction if Category: HEA is not supported
      b. Floating-Point Unavailable
      c. [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment
      d. Trace

   C. Vector Loads and Stores
      a. Hypervisor Emulation Assistance [Category: HEA], or Program - Illegal Instruction if Category: HEA is not supported
      b. Vector Unavailable
      c. [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment
      d. Trace

   D. Other Floating-Point Instructions
      a. Floating-Point Unavailable
      b. Program - Precise Mode Floating-Point Enabled Exception
      c. Trace

   E. Other Vector Instructions
      a. Vector Unavailable
      b. Trace

   F. rfid, hrfid and mtmsr[d]
      a. Program - Privileged Instruction
      b. Program - Floating-Point Enabled Exception
      c. Trace, for mtmsr[d] only

   G. Other Instructions
      a. These exceptions are mutually exclusive and have the same priority:
         - Program - Trap
         - System Call
         - Program - Privileged Instruction
         - Hypervisor Emulation Assistance [Category: HEA], or Program - Illegal Instruction if Category: HEA is not supported
      b. Trace

   H. [Hypervisor] Instruction Storage and [Hypervisor] Instruction Segment

      These exceptions have the lowest priority in this category. They are recognized only when all instructions prior to the instruction causing one of these exceptions appear to have completed and that instruction is the next instruction to be executed. The two exceptions are mutually exclusive.

      The priority of these exceptions is specified for completeness and to ensure that they are not given more favorable treatment. It is acceptable for an implementation to treat these exceptions as though they had a lower priority.

4. Program - Imprecise Mode Floating-Point Enabled Exception

   This exception is the fourth highest priority exception. When this exception is created, the interrupt mechanism waits for all other possible exceptions to be reported. It then generates this interrupt if no higher priority exception exists when the interrupt is to be generated.

5. Hypervisor Maintenance

   This exception is the fifth highest priority exception. When this exception is created, the interrupt mechanism waits for all other possible exceptions to be reported. It then generates this interrupt if no higher priority exception exists when the interrupt is to be generated.

   If a Hypervisor Maintenance exception exists and each attempt to execute an instruction when the Hypervisor Maintenance interrupt is enabled causes an exception (see the Programming Note below), the Hypervisor Maintenance interrupt is not delayed indefinitely.

6. Direct External, Mediated External, and [Hypervisor] Decrementer

   These exceptions are the lowest priority exceptions. All have equal priority (i.e., the processor may generate any one of the corresponding interrupts for which an exception exists). When one of these exceptions is created, the interrupt processing mechanism waits for all other possible exceptions to be reported. It then generates the corresponding interrupt if no higher priority exception exists when the interrupt is to be generated.

   If a Hypervisor Decrementer exception exists and each attempt to execute an instruction when the Hypervisor Decrementer interrupt is enabled causes an exception (see the Programming Note below), the Hypervisor Decrementer interrupt is not delayed indefinitely.

   If LPES0=1 and a Direct External exception exists and each attempt to execute an instruction when this interrupt is enabled causes an exception (see the Programming Note below), the Direct External interrupt is not delayed indefinitely.

Programming Note
    An incorrect or malicious operating system could corrupt the first instruction in the interrupt vector location for an instruction-caused interrupt such that the attempt to execute the instruction causes the same exception that caused the interrupt (a looping interrupt; e.g., illegal instruction and Program interrupt). Similarly, the first instruction of the interrupt vector for one instruction-caused interrupt could cause a different instruction-caused interrupt, and the first instruction of the interrupt vector for the second instruction-caused interrupt could cause the first instruction-caused interrupt (e.g., Program interrupt and Floating-Point Unavailable interrupt). Similarly, if the Real Mode Area is virtualized and there is no PTE for the page containing the interrupt vectors, every attempt to execute the first instruction of the OS's Instruction Storage interrupt handler would cause a Hypervisor Instruction Storage interrupt; if the Hypervisor Instruction Storage interrupt handler returns to the OS's Instruction Storage interrupt handler without the relevant PTE having been created, another Hypervisor Instruction Storage interrupt would occur immediately. The looping caused by these and similar cases is terminated by the occurrence of a System Reset or Hypervisor Decrementer interrupt.

Chapter 7. Timer Facilities

7.1 Overview
7.2 Time Base (TB)
7.2.1 Writing the Time Base
7.3 Decrementer
7.3.1 Writing and Reading the Decrementer
7.4 Hypervisor Decrementer
7.5 Processor Utilization of Resources Register (PURR)
7.6 Scaled Processor Utilization of Resources Register (PURR)

7.1 Overview

See Chapter 4 of Book II for information about the update frequency of the Time Base.
The Time Base, Decrementer, Hypervisor Decrementer, Processor Utilization of Resources, and Scaled Processor Utilization of Resources registers provide timing functions for the system. The remainder of this section describes these registers and related facilities.

7.2 Time Base (TB)

The Time Base (TB) is a 64-bit register (see Figure 45) containing a 64-bit unsigned integer that is incremented periodically.

    +---------------------------------------+-----------+
    |                 TBU40                 |    ///    |
    +---------------------------------------+-----------+
    0                                      39

    +-------------------------+-------------------------+
    |           TBU           |           TBL           |
    +-------------------------+-------------------------+
    0                         32                       63

    Field   Description
    TBU40   Upper 40 bits of Time Base
    TBU     Upper 32 bits of Time Base
    TBL     Lower 32 bits of Time Base

    Figure 45. Time Base

The Time Base is a hypervisor resource; see Chapter 2.

The SPRs TBU40, TBU, and TBL provide access to the fields of the Time Base shown in Figure 45. When a mtspr instruction is executed specifying one of these SPRs, the associated field of the Time Base is altered and the remaining bits of the Time Base are not affected.

The Time Base is implemented such that:

1. Loading a GPR from the Time Base has no effect on the accuracy of the Time Base.
2. Copying the contents of a GPR to the Time Base replaces the contents of the Time Base with the contents of the GPR.

The Power ISA does not specify a relationship between the frequency at which the Time Base is updated and other frequencies, such as the CPU clock or bus clock in a Power ISA system. The Time Base update frequency is not required to be constant. What is required, so that system software can keep time of day and operate interval timers, is one of the following.

- The system provides an (implementation-dependent) interrupt to software whenever the update frequency of the Time Base changes, and a means to determine what the current update frequency is.
- The update frequency of the Time Base is under the control of the system software.

Implementations must provide a means for either preventing the Time Base from incrementing or preventing it from being read in problem state (MSR[PR]=1). If the means is under software control, it must be privileged and, in implementations of the Server environment, must be accessible only in hypervisor state (MSR[HV PR] = 0b10). There must be a method for getting all processors' Time Bases to start incrementing with values that are identical or almost identical in all processors.

Programming Note
    If software initializes the Time Base on power-on to some reasonable value and the update frequency of the Time Base is constant, the Time Base can be used as a source of values that increase at a constant rate, such as for time stamps in trace entries.

    Even if the update frequency is not constant, values read from the Time Base are monotonically increasing (except when the Time Base wraps from 2^64-1 to 0). If a trace entry is recorded each time the update frequency changes, the sequence of Time Base values can be post-processed to become actual time values.

    Successive readings of the Time Base may return identical values.

    If Time Base bits 60:63 are used as part of a random number generator, software must account for the fact that these bits are set to 0x0 only when bit 59 changes state regardless of whether or not they incremented to 0xF since they were previously set to 0x0.

    See the description of the Time Base in Chapter of Book II for ways to compute time of day in POSIX format from the Time Base.

7.2.1 Writing the Time Base

Writing the Time Base is privileged, and can be done only in hypervisor state. Reading the Time Base is not privileged; it is discussed in Chapter 4 of Book II.

It is not possible to write the entire 64-bit Time Base using a single instruction. The mttbl and mttbu extended mnemonics write the lower and upper halves of the Time Base (TBL and TBU), respectively, preserving the other half. These are extended mnemonics for the mtspr instruction; see Appendix A, "Assembler Extended Mnemonics" on page 589.

The Time Base can be written by a sequence such as:

    lwz     Rx,upper        # load 64-bit value for
    lwz     Ry,lower        #  TB into Rx and Ry
    li      Rz,0
    mttbl   Rz              # set TBL to 0
    mttbu   Rx              # set TBU

Rx contain the desired value upper 40 bits of the Time Base.

    mftb    Ry              # Read 64-bit Time Base value
    clrldi  Ry,Ry,40        # lower 24 bits of old TB
    mttbu40 Rx              # write upper 40 bits of TB
    mftb    Rz              # read TB value again
    clrldi  Rz,Rz,40        # lower 24 bits of new TB
    cmpld   Rz,Ry           # compare new and old lwr 24
    bge     done            # no carry out of low 24 bits
    addis   Rx,Rx,0x0100    # increment upper 40 bits
    mttbu40 Rx              # update to adjust for carry

Programming Note
    The instructions for writing the Time Base are mode-independent. Thus code written to set the Time Base will work correctly in either 64-bit or 32-bit mode.

7.3 Decrementer

The Decrementer (DEC) is a 32-bit decrementing counter that provides a mechanism for causing a Decrementer interrupt after a programmable delay. The contents of the Decrementer are treated as a signed integer.

    +-------------------------+
    |           DEC           |
    +-------------------------+
    32                       63

    Figure 46. Decrementer

The Decrementer counts down.

Decrementer bits 32:59 count down until their value becomes 0x000_0000; at the next decrement their value becomes 0xFFF_FFFF. Decrementer bits 60:63 may decrement at a variable rate. When the value of bit 59 changes, bits 60:63 are set to 0xF; if bits 60:63 decrement to 0x0 before the value of bit 59 changes, they remain at 0x0 until the value of bit 59 changes.

The Decrementer is driven by the same frequency as the Time Base. The period of the Decrementer will depend on the driving frequency, but if the same values are used as given earlier for the Time Base (see Section 4.2 of Book II), and if the Time Base update frequency is constant, the period would be

              2^32 × 32
    T_DEC = ------------- = 137 seconds.
- mttbl Ry # set TBL 1 GHz Provided that no interrupts occur while the last three instructions are being executed, loading 0 into TBL pre- When the contents of DEC32 change from 0 to 1, a vents the possibility of a carry from TBL to TBU while Decrementer exception will come into existence within the Time Base is being initialized. a reasonable period or time. When the contents of DEC32 change from 1 to 0, an existing Decrementer The preferred method of changing the Time Base uti- exception will cease to exist within a reasonable period lizes the TBU40 facility. The following code sequence of time, but not later than the completion of the next demonstrates the process. Assume the upper 40 bits of context synchronizing instruction or event. 576 Power ISATM III-S Version 2.05 The preceding paragraph applies regardless of ble delay. The contents of the Decrementer are treated whether the change in the contents of DEC32 is the as a signed integer. result of decrementation of the Decrementer by the processor or of modification of the Decrementer HDEC caused by execution of an mtspr instruction. 32 63 The operation of the Decrementer satisfies the follow- Figure 47. Hypervisor Decrementer ing constraints. The Hypervisor Decrementer is a hypervisor resource; 1. The operation of the Time Base and the Decre- see Chapter 2. menter is coherent, i.e., the counters are driven by the same fundamental time base. Hypervisor Decrementer bits 32:59 count down until their value becomes 0x000_0000, at the next incre- 2. Loading a GPR from the Decrementer has no ment their value becomes 0xFFF_FFFF. Bits 60:63 effect on the accuracy of the Time Base. may decrement at a variable rate. When the value of 3. Copying the contents of a GPR to the Decre- bit 59 changes, bits 60:63 are set to 0xF; if bits 60:63 menter replaces the contents of the Decrementer decrement to 0x0 before the value of bit 59 changes, with the contents of the GPR. they remain at 0x0 until the value of bit 59 changes. 
The Hypervisor Decrementer is driven by the same fre- Programming Note quency as the Time Base. The period of the Hypervisor In systems that change the Time Base update fre- Decrementer will depend on the driving frequency, but quency for purposes such as power management, if the same values are used as given above for the the Decrementer input frequency will also change. Time Base (see Section 7.2), and if the Time Base Software must be aware of this in order to set inter- update frequency is constant, the period would be val timers. 32 If Decrementer bits 60:63 are used as part of a ran- 2 × 32 TDEC = -------------------- = 137 seconds. - dom number generator, software must account for 1 GHz the fact that these bits are set to 0xF only when bit 59 changes state regardless of whether or not they When the contents of HDEC32 change from 0 to 1, a decremented to 0x0 since they were previously set Hypervisor Decrementer exception will come into exist- to 0xF. ence within a reasonable period or time. When the con- tents of HDEC32 change from 1 to 0, an existing Hypervisor Decrementer exception will cease to exist within a reasonable period of time, but not later than 7.3.1 Writing and Reading the the completion of the next context synchronizing Decrementer instruction or event. The contents of the Decrementer can be read or written The preceding paragraph applies regardless of using the mfspr and mtspr instructions, both of which whether the change in the contents of HDEC32 is the are privileged when they refer to the Decrementer. result of decrementation of the Hypervisor Decre- Using an extended mnemonic (see Appendix A, menter by the processor or of modification of the "Assembler Extended Mnemonics" on page 589), the Hypervisor Decrementer caused by execution of an Decrementer can be written from GPR Rx using: mtspr instruction. The operation of the Hypervisor Decrementer satisfies mtdec Rx the following constraints. The Decrementer can be read into GPR Rx using: 1. 
The operation of the Time Base and the Hypervi- mfdec Rx sor Decrementer is coherent, i.e., the counters are driven by the same fundamental time base. Copying the Decrementer to a GPR has no effect on the Decrementer contents or on the interrupt mecha- 2. Loading a GPR from the Hypervisor Decrementer nism. has no effect on the accuracy of the Hypervisor Decrementer. 3. Copying the contents of a GPR to the Hypervisor 7.4 Hypervisor Decrementer Decrementer replaces the contents of the Hypervi- sor Decrementer with the contents of the GPR. The Hypervisor Decrementer (HDEC) is a 32-bit decre- menting counter that provides a mechanism for causing a Hypervisor Decrementer interrupt after a programma- Chapter 7. Timer Facilities 577 Version 2.05 the algorithm for incrementing the PURR are imple- Programming Note mentation-specific. In systems that change the Time Base update fre- quency for purposes such as power management, The PURR is implemented such that: the Hypervisor Decrementer update frequency will 1. Loading a GPR from the PURR has no effect on also change. Software must be aware of this in the accuracy of the PURR. order to set interval timers. 2. Copying the contents of a GPR to the PURR If Hypervisor Decrementer bits 60:63 are used as replaces the contents of the PURR with the con- part of a random number generator, software must tents of the GPR. account for the fact that these bits are set to 0xF only when bit 59 changes state regardless of Programming Note whether or not they decremented to 0x0 since they Estimates computed as described above may be were previously set to 0xF. useful for purposes related to resource utilization, including utilization-based system management and planning. 
7.5 Processor Utilization of Because the rate at which the PURR accumulates Resources Register (PURR) resource usage estimates is dependent on the fre- quency at which the Time Base is incremented, The Processor Utilization of Resources Register and the frequency of the oscillator that drives (PURR) is a 64-bit counter, the contents of which pro- instruction execution may vary independently from vide an estimate of the resources used by the proces- that of the Time Base, the interpretation of the con- sor. The contents of the PURR are treated as a 64-bit tents of the PURR may be inaccurate as a mea- unsigned integer. surement of capacity consumption for accounting purposes. The SPURR should be used for PURR accounting purposes. 0 63 Figure 48. Processor Utilization of Resources Register 7.6 Scaled Processor Utilization The PURR is a hypervisor resource; see Chapter 2. of Resources Register (PURR) The contents of the PURR increase monotonically, The Scaled Processor Utilization of Resources Regis- unless altered by software, until the sum of the con- ter (SPURR) is a 64-bit counter, the contents of which tents plus the amount by which it is to be increased provide an estimate of the resources used by the pro- exceed 0xFFFF_FFFF_FFFF_FFFF (264 - 1) at which cessor. The contents of the SPURR are treated as a point the contents are replaced by that sum modulo 64-bit unsigned integer. 264. There is no interrupt or other indication when this occurs. SPURR The rate at which the value represented by the con- 0 63 tents of the PURR increases is an estimate of the por- Figure 49. Scaled Processor Utilization of tion of resources used by the processor per unit time Resources Register with respect to other processors that share those resources monitored by the PURR. When the proces- The SPURR is a hypervisor resource; see Section 2.7. sor is idle, the rate at which the PURR value increases The contents of the SPURR increase monotonically, is implementation dependent. 
unless altered by software, until the sum of the con- Let the difference between the value represented by tents plus the amount by which it is to be increased the contents of the Time Base at times Ta and Tb be exceed 0xFFFF_FFFF_FFFF_FFFF (264 - 1) at which Tab. Let the difference between the value represented point the contents are replaced by that sum modulo by the contents of the PURR at time Ta and Tb be the 264. There is no interrupt or other indication when this value Pab. The ratio of Pab/Tab is an estimate of the occurs. percentage of shared resources used by the processor The rate at which the value represented by the con- during the interval Tab. For the set {S} of processors tents of the SPURR increases is an estimate of the por- that share the resources monitored by the PURR, the tion of resources used by the processor with respect to sum of the usage estimates for all the processors in the other processors that share those resources monitored set is 1.0. by the SPURR, and relative to the computational The definition of the set of processors S, the shared capacity provided by those resources. The computa- resources corresponding to the set S, and specifics of tional capacity provided by the shared resources may 578 Power ISATM III-S Version 2.05 vary as a function of the frequency of the oscillator which drives the resources or as a result of deliberate delays in processing that are created to reduce power consumption. When the processor is idle, the rate at which the SPURR value increases is implementation dependent. Let the difference between the value represented by the contents of the Time Base at times Ta and Tb be Tab. Let the ratio of the effective and nominal frequen- cies of the oscillator driving instruction execution fe/fn be fr. Let the ratio of delay cycles created by power reduction circuitry and total cycles cd/ct be cr. Let the difference between the value represented by the con- tents of the SPURR at time Ta and Tb be the value Sab. 
The ratio S_ab/(T_ab × f_r × (1 - c_r)) is an estimate of the percentage of shared resource capacity used by the processor during the interval T_ab. For the set {S} of processors that share the resources monitored by the SPURR, the sum of the usage estimates for all the processors in the set is 1.0.

The definition of the set of processors S, the shared resources corresponding to the set S, and specifics of the algorithm for incrementing the SPURR are implementation-specific.

The SPURR is implemented such that:

1. Loading a GPR from the SPURR has no effect on the accuracy of the SPURR.
2. Copying the contents of a GPR to the SPURR replaces the contents of the SPURR with the contents of the GPR.

Programming Note
  Estimates computed as described above may be useful for purposes of resource use accounting, program dispatching, etc.

Chapter 8. Debug Facilities

  8.1 Overview
  8.1.1 Come-From Address Register
  8.1.2 Data Address Breakpoint

8.1 Overview

Processors provide debug facilities to enable hardware and software debug functions, such as instruction and data breakpoints and program single stepping. The debug facilities consist of a data address breakpoint register (DABR), a data address breakpoint register extension (DABRX) (see Section 8.1.2), and an associated interrupt (see Section 6.5.3).

The mfspr and mtspr instructions (see Section 4.4.5) provide access to the registers of the debug facilities.

In addition to the facilities described here, implementations will typically include debug facilities, modes, and access mechanisms which are implementation-specific. For example, implementations will typically provide access to the debug facilities via a dedicated interface such as the IEEE 1149.1 Test Access Port (JTAG).

8.1.1 Come-From Address Register

The Come-From Address Register (CFAR) is a 64-bit register. When an rfid instruction is executed, the register is set to the effective address of the instruction. When a Branch instruction is executed and the branch is taken, the register is set to the effective address of an instruction in the instruction cache block containing the Branch instruction. For Branch instructions, this setting need not occur until a subsequent context synchronizing operation has occurred.

    CFAR: bits 0:61   |  //: bits 62:63

Figure 50. Come-From Address Register

The contents of the CFAR can be read and written using the mfspr and mtspr instructions. Access to the CFAR is privileged.

Programming Note
  This register can be used for purposes of debugging software. For example, often a software bug results in the program executing a portion of the code that it should not have reached or causing an unexpected interrupt. In the former case, a breakpoint can be placed in the portion of the code that was erroneously reached and the program reexecuted. In either case, the interrupt handler can save the contents of the CFAR (before executing the first instruction that would modify the register), and then make the saved contents available for a debugger to use in determining the control flow path by which the exception was reached.

  In order to preserve the CFAR's contents for each partition and to prevent it from being used to implement a "covert channel" between partitions, the hypervisor should initialize/save/restore the CFAR when switching partitions on a given processor.

8.1.2 Data Address Breakpoint

The Data Address Breakpoint mechanism provides a means of detecting load and store accesses to a designated doubleword. The address comparison is done on an effective address (EA).

The Data Address Breakpoint mechanism is controlled by the Data Address Breakpoint Register (DABR),
shown in Figure 51, and the Data Address Breakpoint Register Extension (DABRX), shown in Figure 52.

    DAB: bits 0:60  |  BT: bit 61  |  DW: bit 62  |  DR: bit 63

    Bit(s)  Name  Description
    0:60    DAB   Data Address Breakpoint
    61      BT    Breakpoint Translation
    62      DW    Data Write
    63      DR    Data Read

Figure 51. Data Address Breakpoint Register

    ///: bits 0:59  |  BTI: bit 60  |  PRIVM: bits 61:63

    Bit(s)  Name   Description
    60      BTI    Breakpoint Translation Ignore
    61:63   PRIVM  Privilege Mask
      61    HYP    Hypervisor state
      62    PNH    Privileged but Non-Hypervisor state
      63    PRO    Problem state

    All other fields are reserved.

Figure 52. Data Address Breakpoint Register Extension

The DABR and DABRX are hypervisor resources; see Section 2.7 on page 475.

The supported PRIVM values are 0b000, 0b001, 0b010, 0b011, 0b100, and 0b111. If the PRIVM field does not contain one of the supported values, then whether a match occurs for a given storage access is undefined. Elsewhere in this section it is assumed that the PRIVM field contains one of the supported values.

Programming Note
  PRIVM value 0b000 causes matches not to occur regardless of the contents of other DABR and DABRX fields. PRIVM values 0b101 and 0b110 are not supported because a storage location that is shared between the hypervisor and non-hypervisor software is unlikely to be accessed using the same EA by both the hypervisor and the non-hypervisor software. (PRIVM value 0b111 is supported primarily for reasons of software compatibility, as described in a subsequent Programming Note.)

A Data Address Breakpoint match occurs for a Load or Store instruction if, for any byte accessed, all of the following conditions are satisfied.

• EA[0:60] = DABR[DAB]
• (MSR[DR] = DABR[BT]) | DABRX[BTI]
• the processor is in
  - hypervisor state and DABRX[HYP] = 1 or
  - privileged but non-hypervisor state and DABRX[PNH] = 1 or
  - problem state and DABRX[PRO] = 1
• the instruction is a Store and DABR[DW] = 1, or the instruction is a Load and DABR[DR] = 1.

In 32-bit mode the high-order 32 bits of the EA are treated as zeros for the purpose of detecting a match.

If the above conditions are satisfied, a match also occurs for eciwx and ecowx. For the purpose of determining whether a match occurs, eciwx is treated as a Load, and ecowx is treated as a Store.

If the above conditions are satisfied, it is undefined whether a match occurs in the following cases.

• The instruction is Store Conditional but the store is not performed.
• The instruction is a Load/Store String of zero length.
• The instruction is dcbz. (For the purpose of determining whether a match occurs, dcbz is treated as a Store.)

The Cache Management instructions other than dcbz never cause a match.

A Data Address Breakpoint match causes a Data Storage exception (see Section 6.5.3, "Data Storage Interrupt" on page 559). If a match occurs, some or all of the bytes of the storage operand may have been accessed; however, if a Store or ecowx instruction causes the match, the storage operand is not modified if the instruction is one of the following:

• any Store instruction that causes an atomic access
• ecowx

Programming Note
  The Data Address Breakpoint mechanism does not apply to instruction fetches.

Programming Note
  Before setting a breakpoint requested by the operating system, the hypervisor must verify that the requested contents of the DABR and DABRX cannot cause the hypervisor to receive a Data Storage interrupt that it is not prepared to handle, or that it intrinsically cannot handle (e.g., the EA is in the range of EAs at which the hypervisor's Data Storage interrupt handler saves registers, DABR[BT] || DABRX[BTI] ≠ 0b10, DABR[DW] = 1, and DABRX[HYP] = 1).

Programming Note
  Processors that comply with versions of the architecture that precede Version 2.02 do not provide the DABRX. Forward compatibility for software that was written for such processors (and uses the Data Address Breakpoint facility) can be obtained by setting DABRX[60:63] to 0b0111.

Chapter 9.
External Control [Category: External Control]

The External Control facility permits a program to communicate with a special-purpose device. The facility consists of a Special Purpose Register, called EAR, and two instructions, called External Control In Word Indexed (eciwx) and External Control Out Word Indexed (ecowx).

This facility must provide a means of synchronizing the devices with the processor to prevent the use of an address by the device when the translation that produced that address is being invalidated.

9.1 External Access Register

This 32-bit Special Purpose Register controls access to the External Control facility and, for external control operations that are permitted, identifies the target device.

    E: bit 32  |  ///: bits 33:57  |  RID: bits 58:63

    Bit(s)  Name  Description
    32      E     Enable bit
    58:63   RID   Resource ID

    All other fields are reserved.

Figure 53. External Access Register

The External Access Register (EAR) is a hypervisor resource; see Chapter 2.

The high-order bits of the RID field that correspond to bits of the Resource ID beyond the width of the Resource ID supported by the implementation are treated as reserved bits.

Programming Note
  The hypervisor can use the EAR to control which programs are allowed to execute External Access instructions, when they are allowed to do so, and which devices they are allowed to communicate with using these instructions.

9.2 External Access Instructions

The External Access instructions, External Control In Word Indexed (eciwx) and External Control Out Word Indexed (ecowx), are described in Book II. Additional information about them is given below.

If an attempt is made to execute either of these instructions when EAR[E]=0, a Data Storage interrupt occurs with bit 43 of the DSISR set to 1.

The instructions are supported whenever MSR[DR]=1. If either instruction is executed when MSR[DR]=0 (real addressing mode), the results are boundedly undefined.
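The EAR layout and the two execution rules for eciwx and ecowx described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `Ear`, `decode_ear`, and `external_access_outcome` are hypothetical and not part of the architecture, the EAR is passed as a plain 32-bit integer, and checking MSR[DR] before EAR[E] is an assumed ordering for the case where both conditions hold, which the text does not specify. Reserved RID bits on implementations with a narrower Resource ID are not modeled.

```python
from dataclasses import dataclass

# The ISA numbers bits of a 32-bit SPR as 32..63, with bit 32 the most
# significant bit of the 32-bit value.
EAR_E_MASK = 0x8000_0000    # bit 32: Enable
EAR_RID_MASK = 0x0000_003F  # bits 58:63: Resource ID (six low-order bits)


@dataclass(frozen=True)
class Ear:
    """Decoded fields of the External Access Register (hypothetical helper)."""
    enable: bool
    rid: int


def decode_ear(value: int) -> Ear:
    """Split a 32-bit EAR value into its E and RID fields."""
    value &= 0xFFFF_FFFF
    return Ear(enable=bool(value & EAR_E_MASK), rid=value & EAR_RID_MASK)


def external_access_outcome(ear_value: int, msr_dr: int) -> str:
    """Classify an attempt to execute eciwx or ecowx.

    Assumption: MSR[DR]=0 is checked first, since the text calls the
    results boundedly undefined in that case without qualification.
    """
    if msr_dr == 0:
        return "boundedly undefined"
    if not decode_ear(ear_value).enable:
        return "data storage interrupt"  # DSISR bit 43 is set to 1
    return "permitted"
```

For example, `decode_ear(0x8000_0005)` yields an enabled EAR targeting Resource ID 5, and `external_access_outcome(0x0000_0005, 1)` reports the Data Storage interrupt case because EAR[E]=0.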
Chapter 10. Synchronization Requirements for Context Alterations

Changing the contents of certain System Registers, the contents of SLB entries, or the contents of other system resources that control the context in which a program executes can have the side effect of altering the context in which data addresses and instruction addresses are interpreted, and in which instructions are executed and data accesses are performed. For example, changing MSR[IR] from 0 to 1 has the side effect of enabling translation of instruction addresses. These side effects need not occur in program order, and therefore may require explicit synchronization by software. (Program order is defined in Book II.)

An instruction that alters the context in which data addresses or instruction addresses are interpreted, or in which instructions are executed or data accesses are performed, is called a context-altering instruction. This chapter covers all the context-altering instructions. The software synchronization required for them is shown in Table 1 (for data access) and Table 2 (for instruction fetch and execution).

The notation "CSI" in the tables means any context synchronizing instruction (e.g., sc, isync, or rfid). A context synchronizing interrupt (i.e., any interrupt except non-recoverable System Reset or non-recoverable Machine Check) can be used instead of a context synchronizing instruction. If it is, phrases like "the synchronizing instruction", below, should be interpreted as meaning the instruction at which the interrupt occurs. If no software synchronization is required before (after) a context-altering instruction, "the synchronizing instruction before (after) the context-altering instruction" should be interpreted as meaning the context-altering instruction itself.

The synchronizing instruction before the context-altering instruction ensures that all instructions up to and including that synchronizing instruction are fetched and executed in the context that existed before the alteration. The synchronizing instruction after the context-altering instruction ensures that all instructions after that synchronizing instruction are fetched and executed in the context established by the alteration. Instructions after the first synchronizing instruction, up to and including the second synchronizing instruction, may be fetched or executed in either context.

If a sequence of instructions contains context-altering instructions and contains no instructions that are affected by any of the context alterations, no software synchronization is required within the sequence.

Programming Note
  Sometimes advantage can be taken of the fact that certain events, such as interrupts, and certain instructions that occur naturally in the program, such as the rfid that returns from an interrupt handler, provide the required synchronization.

No software synchronization is required before or after a context-altering instruction that is also context synchronizing or when altering the MSR in most cases (see the tables). No software synchronization is required before most of the other alterations shown in Table 2, because all instructions preceding the context-altering instruction are fetched and decoded before the context-altering instruction is executed (the processor must determine whether any of these preceding instructions are context synchronizing).

Unless otherwise stated, the material in this chapter assumes a uniprocessor environment.
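The rule in this chapter's introduction about which context an instruction may be fetched or executed in, relative to the synchronizing instructions around a context-altering instruction, can be written as a short classification function. This is a minimal sketch, not an architectural definition: the function name `context_for` and the use of integer program-order positions are assumptions made for illustration. When no synchronization is required before (or after), the corresponding position defaults to the context-altering instruction itself, as the text specifies.

```python
from typing import Optional


def context_for(index: int,
                altering: int,
                sync_before: Optional[int] = None,
                sync_after: Optional[int] = None) -> str:
    """Return the context ("old", "new", or "either") for one instruction.

    index        program-order position of the instruction being classified
    altering     position of the context-altering instruction
    sync_before  position of the synchronizing instruction before it, or
                 None if no software synchronization is required before
    sync_after   position of the synchronizing instruction after it, or
                 None if no software synchronization is required after
    """
    first = altering if sync_before is None else sync_before
    second = altering if sync_after is None else sync_after
    if not first <= altering <= second:
        raise ValueError("synchronizing instructions must bracket the alteration")
    if index <= first:
        return "old"      # up to and including the first synchronizing instruction
    if index > second:
        return "new"      # strictly after the second synchronizing instruction
    return "either"       # after the first, up to and including the second
```

With a synchronizing instruction at position 0, the alteration at position 1, and a second synchronizing instruction at position 3, positions 1 through 3 are classified "either" and position 4 is "new". For an alteration that needs no synchronization on either side, `context_for(2, 1)` gives "new" and `context_for(1, 1)` gives "old".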
    Instruction or Event   Required Before   Required After    Notes
    interrupt              none              none
    rfid                   none              none
    hrfid                  none              none
    sc                     none              none
    Trap                   none              none
    mtmsrd (SF)            none              none
    mtmsr[d] (PR)          none              none
    mtmsr[d] (DR)          none              none
    mtsr[in]               CSI               CSI
    mtspr (SDR1)           ptesync           CSI               3,4
    mtspr (AMR)            CSI               CSI
    mtspr (EAR)            CSI               CSI
    mtspr (RMOR)           CSI               CSI               13
    mtspr (HRMOR)          CSI               CSI               13
    mtspr (LPCR)           CSI               CSI               13
    mtspr (DABR)           --                --                2
    mtspr (DABRX)          --                --                2
    slbie                  CSI               CSI
    slbia                  CSI               CSI
    slbmte                 CSI               CSI               11
    tlbie                  CSI               CSI               5,7
    tlbiel                 CSI               ptesync           5
    tlbia                  CSI               CSI               5
    Store(PTE)             none              {ptesync, CSI}    6,7

Table 1: Synchronization requirements for data access

    Instruction or Event   Required Before   Required After    Notes
    interrupt              none              none
    rfid                   none              none
    hrfid                  none              none
    sc                     none              none
    Trap                   none              none
    mtmsrd (SF)            none              none              8
    mtmsr[d] (EE)          none              none              1
    mtmsr[d] (PR)          none              none              9
    mtmsr[d] (FP)          none              none
    mtmsr[d] (FE0, FE1)    none              none
    mtmsr[d] (SE, BE)      none              none
    mtmsr[d] (IR)          none              none              9
    mtmsr[d] (RI)          none              none
    mtsr[in]               none              CSI               9
    mtspr (DEC)            none              none              10
    mtspr (SDR1)           ptesync           CSI               3,4
    mtspr (CTRL)           none              none
    mtspr (HDEC)           none              none              10
    mtspr (RMOR)           none              CSI               13
    mtspr (HRMOR)          none              CSI               9,13
    mtspr (LPCR)           none              CSI               13,14
    mtspr (LPIDR)          CSI               CSI               7,12
    mtspr (PCR)            none              CSI
    slbie                  none              CSI
    slbia                  none              CSI
    slbmte                 none              CSI               9,11
    tlbie                  none              CSI               5,7
    tlbiel                 none              CSI               5
    tlbia                  none              CSI               5
    Store(PTE)             none              {ptesync, CSI}    6,7

Table 2: Synchronization requirements for instruction fetch and/or execution

Notes:

1. The effect of changing the EE bit is immediate, even if the mtmsr[d] instruction is not context synchronizing (i.e., even if L=1).

   • If an mtmsr[d] instruction sets the EE bit to 0, neither an External interrupt nor a Decrementer interrupt occurs after the mtmsr[d] is executed.
   • If an mtmsr[d] instruction changes the EE bit from 0 to 1 when an External, Decrementer, or higher priority exception exists, the corresponding interrupt occurs immediately after the mtmsr[d] is executed, and before the next instruction is executed in the program that set EE to 1.
   • If a hypervisor executes the mtmsr[d] instruction that sets the EE bit to 0, a Hypervisor Decrementer interrupt does not occur after mtmsr[d] is executed as long as the processor remains in hypervisor state.
   • If the hypervisor executes an mtmsr[d] instruction that changes the EE bit from 0 to 1 when a Hypervisor Decrementer or higher priority exception exists, the corresponding interrupt occurs immediately after the mtmsr[d] instruction is executed, and before the next instruction is executed, provided HDICE is 1.

2. Synchronization requirements for this instruction are implementation-dependent.

3. SDR1 must not be altered when MSR[DR]=1 or MSR[IR]=1; if it is, the results are undefined.

4. A ptesync instruction is required before the mtspr instruction because (a) SDR1 identifies the Page Table and thereby the location of Reference and Change bits, and (b) on some implementations, use of SDR1 to update Reference and Change bits may be independent of translating the virtual address. (For example, an implementation might identify the PTE in which to update the Reference and Change bits in terms of its offset in the Page Table, instead of its real address, and then add the Page Table address from SDR1 to the offset to determine the real address at which to update the bits.) To ensure that Reference and Change bits are updated in the correct Page Table, SDR1 must not be altered until all Reference and Change bit updates associated with address translations that were performed, by the processor executing the mtspr instruction, before the mtspr instruction is executed have been performed with respect to that processor. A ptesync instruction guarantees this synchronization of Reference and Change bit updates, while neither a context synchronizing operation nor the instruction fetching mechanism does so.

5. For data accesses, the context synchronizing instruction before the tlbie, tlbiel, or tlbia instruction ensures that all preceding instructions that access data storage have completed to a point at which they have reported all exceptions they will cause.

   The context synchronizing instruction after the tlbie, tlbiel, or tlbia instruction ensures that storage accesses associated with instructions following the context synchronizing instruction will not use the TLB entry(s) being invalidated.

   (If it is necessary to order storage accesses associated with preceding instructions, or Reference and Change bit updates associated with preceding address translations, with respect to subsequent data accesses, a ptesync instruction must also be used, either before or after the tlbie, tlbiel, or tlbia instruction. These effects of the ptesync instruction are described in the last paragraph of Note 6.)

6. The notation "{ptesync,CSI}" denotes an instruction sequence. Other instructions may be interleaved with this sequence, but these instructions must appear in the order shown.

   No software synchronization is required before the Store instruction because (a) stores are not performed out-of-order and (b) address translations associated with instructions preceding the Store instruction are not performed again after the store has been performed (see Section 5.5). These properties ensure that all address translations associated with instructions preceding the Store instruction will be performed using the old contents of the PTE.

   The ptesync instruction after the Store instruction ensures that all searches of the Page Table that are performed after the ptesync instruction completes will use the value stored (or a value stored subsequently). The context synchronizing instruction after the ptesync instruction ensures that any address translations associated with instructions following the context synchronizing instruction that were performed using the old contents of the PTE will be discarded, with the result that these address translations will be performed again and, if there is no corresponding entry in any implementation-specific address translation lookaside information, will use the value stored (or a value stored subsequently).

   The ptesync instruction also ensures that all storage accesses associated with instructions preceding the ptesync instruction, and all Reference and Change bit updates associated with additional address translations that were performed, by the processor executing the ptesync instruction, before the ptesync instruction is executed, will be performed with respect to any processor or mechanism, to the extent required by the associated Memory Coherence Required attributes, before any data accesses caused by instructions following the ptesync instruction are performed with respect to that processor or mechanism.

7. There are additional software synchronization requirements for this instruction in multiprocessor environments (e.g., it may be necessary to invalidate one or more TLB entries on all processors in the multiprocessor system and to be able to determine that the invalidations have completed and that all side effects of the invalidations have taken effect).

   Section 5.10 gives examples of using tlbie, Store, and related instructions to maintain the Page Table, in both multiprocessor and uniprocessor environments.

   Programming Note
     In a multiprocessor system, if software locking is used to help ensure that the requirements described in Section 5.10 are satisfied, the lwsync instruction near the end of the lock acquisition sequence (see Section B.2.1.1 of Book II) may naturally provide the context synchronization that is required before the alteration.

8. The alteration must not cause an implicit branch in effective address space. Thus, when changing MSR[SF] from 1 to 0, the mtmsrd instruction must have an effective address that is less than 2^32 - 4. Furthermore, when changing MSR[SF] from 0 to 1, the mtmsrd instruction must not be at effective address 2^32 - 4 (see Section 5.3.2 on page 506).

9. The alteration must not cause an implicit branch in real address space. Thus the real address of the context-altering instruction and of each subsequent instruction, up to and including the next context synchronizing instruction, must be independent of whether the alteration has taken effect.

10. The elapsed time between the contents of the Decrementer or Hypervisor Decrementer becoming negative and the signaling of the corresponding exception is not defined.

11. If an slbmte instruction alters the mapping, or associated attributes, of a currently mapped ESID, the slbmte must be preceded by an slbie (or slbia) instruction that invalidates the existing translation.

    No slbie (or slbia) is needed if the slbmte instruction replaces a valid SLB entry with a mapping of a different ESID (e.g., to satisfy an SLB miss). However, the slbie is needed later if and when the translation that was contained in the replaced SLB entry is to be invalidated.

12. The context synchronizing instruction before the mtspr instruction ensures that the LPIDR is not altered out-of-order. (Out-of-order alteration of the LPIDR could permit the requirements described in Section 5.10.1 to be violated. For the same reason, such a context synchronizing instruction may be needed even if the new LPID value is equal to the old LPID value.)

    See also Chapter 2. "Logical Partitioning (LPAR)" on page 471 regarding moving a processor from one partition to another.

13. When the RMOR or HRMOR is modified, or the VC, VRMASD, RMLS, or LPES1 fields of the LPCR are modified, software must invalidate all implementation-specific lookaside information used in address translation that depends on values stored in these registers. The slbia instruction can be used to invalidate all such implementation-specific lookaside information.

14. A context synchronizing instruction or event that is executed or occurs when LPCR[MER] = 1 does not necessarily ensure that the exception effects of LPCR[MER] are consistent with the contents of LPCR[MER]. See Section 2.2.
This applies even if the corresponding entry is no longer in the SLB (the translation may still be in implementation-specific address transla- tion lookaside information). No software synchro- nization is needed between the slbie and the slbmte, regardless of whether the index of the SLB entry (if any) containing the current translation is the same as the SLB index specified by the slb- mte. 588 Power ISATM III-S Version 2.05 Appendix A. Assembler Extended Mnemonics In order to make assembler language programs simpler tions. This appendix defines extended mnemonics and to write and easier to understand, a set of extended symbols related to instructions defined in Book III. mnemonics and symbols is provided for certain instruc- Assemblers should provide the extended mnemonics and symbols listed here, and may provide others. A.1 Move To/From Special Purpose Register Mnemonics This section defines extended mnemonics for the mftb mnemonic with one operand as the extended mtspr and mfspr instructions, including the Special form. In the extended form the TBR operand is omitted Purpose Registers (SPRs) defined in Book I and cer- and assumed to be 268 (the value that corresponds to tain privileged SPRs, and for the Move From Time TB). Base instruction defined in Book II. Programming Note The mtspr and mfspr instructions specify an SPR as a numeric operand; extended mnemonics are provided The extended mnemonics in Table 3 for SPRs that represent the SPR in the mnemonic rather than associated with the Performance Monitor facility requiring it to be coded as an operand. Similar are based on the definitions in Appendix B. extended mnemonics are provided for the Move From Other versions of Performance Monitor facilities Time Base instruction, which specifies the portion of used different sets of SPR numbers (all 32-bit Pow- the Time Base as a numeric operand. 
erPC processors used a different set, and some Note: mftb serves as both a basic and an extended early Power ISA processors used yet a different mnemonic. The Assembler will recognize an mftb mne- set). monic with two operands as the basic form, and an Appendix A. Assembler Extended Mnemonics 589 Version 2.05 Table 3: Extended mnemonics for moving to/from an SPR Move To SPR Move From SPR1 Special Purpose Register Extended Equivalent to Extended Equivalent to Fixed-Point Exception Register mtxer Rx mtspr 1,Rx mfxer Rx mfspr Rx,1 Link Register mtlr Rx mtspr 8,Rx mflr Rx mfspr Rx,8 Count Register mtctr Rx mtspr 9,Rx mfctr Rx mfspr Rx,9 Data Stream Control Register mtdscr Rx mtspr 17,Rx mfdscr Rx mfspr Rx,17 Data Storage Interrupt Status mtdsisr Rx mtspr 18,Rx mfdsisr Rx mfspr Rx,18 Register Data Address Register mtdar Rx mtspr 19,Rx mfdar Rx mfspr Rx,19 Decrementer mtdec Rx mtspr 22,Rx mfdec Rx mfspr Rx,22 Storage Description Register 1 mtsdr1 Rx mtspr 25,Rx mfsdr1 Rx mfspr Rx,25 Save/Restore Register 0 mtsrr0 Rx mtspr 26,Rx mfsrr0 Rx mfspr Rx,26 Save/Restore Register 1 mtsrr1 Rx mtspr 27,Rx mfsrr1 Rx mfspr Rx,27 Come-From Address Register mtcfar Rx mtspr 28,Rx mfcfar Rx mfspr Rx,28 AMR mtamr Rx mtspr 29,Rx mfamr Rx mfspr Rx,29 CTRL mtctrl Rx mtspr 152,Rx mfctrl Rx mfspr Rx,136 Special Purpose Registers mtsprg n,Rx mtspr 272+n,Rx mfsprg Rx,n mfspr Rx,272+n G0 through G3 Time Base [Lower] mttbl Rx mtspr 284,Rx mftb Rx mftb Rx,2681 mfspr Rx,268 Time Base Upper mttbu Rx mtspr 285,Rx mftbu Rx mftb Rx,2691 mfspr Rx,269 Time Base Upper 40 mttbu40 Rx mtspr 286,Rx - - Processor Version Register - - mfpvr Rx mfspr Rx,287 HMER mthmer Rx mtspr 336,Rx mfhmer Rx mfspr Rx,336 HMEER mthmeer Rx mtspr 337,Rx mfhmeer Rx mfspr Rx,337 MMCRA mtmmcra Rx mtspr 786,Rx mfmmcra Rx mfspr Rx,770 PMC1 mtpmc1 Rx mtspr 787,Rx mfpmc1 Rx mfspr Rx,771 PMC2 mtpmc2 Rx mtspr 788,Rx mfpmc2 Rx mfspr Rx,772 PMC3 mtpmc3 Rx mtspr 789,Rx mfpmc3 Rx mfspr Rx,773 PMC4 mtpmc4 Rx mtspr 790,Rx mfpmc4 Rx mfspr Rx,774 
PMC5                                     mtpmc5 Rx     mtspr 791,Rx      mfpmc5 Rx     mfspr Rx,775
PMC6                                     mtpmc6 Rx     mtspr 792,Rx      mfpmc6 Rx     mfspr Rx,776
MMCR0                                    mtmmcr0 Rx    mtspr 795,Rx      mfmmcr0 Rx    mfspr Rx,779
MMCR1                                    mtmmcr1 Rx    mtspr 798,Rx      mfmmcr1 Rx    mfspr Rx,782
PPR                                      mtppr Rx      mtspr 896,Rx      mfppr Rx      mfspr Rx,896
Processor Identification Register        -             -                 mfpir Rx      mfspr Rx,1023

(1) The mftb instruction is Category: Server.Phased-Out. Assemblers targeting version 2.03 or later of the architecture should generate an mfspr instruction for the mftb and mftbu extended mnemonics; see the corresponding Assembler Note in the mftb instruction description (see Section 4.2.1 of Book II).

Appendix B. Example Performance Monitor

Note
    This Appendix describes an example implementation of a Performance Monitor. A subset of these requirements is being considered for inclusion in the Architecture as part of Category: Server.Performance Monitor.

A Performance Monitor facility provides a means of collecting information about program and system performance.

The resources (e.g., SPR numbers) that a Performance Monitor facility may use are identified elsewhere in this Book. All other aspects of any Performance Monitor facility are implementation-dependent.

This appendix provides an example of a Performance Monitor facility. It is only an example; implementations may provide all, some, or none of the features described here, or may provide features that are similar to those described here but differ in detail.

Programming Note
    Because the features provided by a Performance Monitor facility are implementation-dependent, operating systems should provide services that support the useful performance monitoring functions in a generic fashion. Application programs should use these services, and should not depend on the features provided by a particular implementation.

The example Performance Monitor facility consists of the following features (described in detail in subsequent sections).

• one MSR bit
    - PMM (Performance Monitor Mark), which can be used to select one or more programs for monitoring
• SPRs
    - PMC1 - PMC6 (Performance Monitor Counter registers 1 - 6), which count events
    - MMCR0, MMCR1, and MMCRA (Monitor Mode Control Registers 0, 1, and A), which control the Performance Monitor facility
    - SIAR and SDAR (Sampled Instruction Address Register and Sampled Data Address Register), which contain the address of the "sampled instruction" and of the "sampled data"
• the Performance Monitor interrupt, which can be caused by monitored conditions and events

The minimal subset of the features that makes the resulting Performance Monitor useful to software consists of MSR[PMM], PMC1, PMC2, PMC3, PMC4, MMCR0, MMCR1, and MMCRA and certain bits and fields of these three Monitor Mode Control Registers, and the Performance Monitor Interrupt. These features are identified as the "basic" features below. The remaining features (the remaining SPRs, and the remaining bits and fields in the three Monitor Mode Control Registers) are considered "extensions".

The events that can be counted in the PMCs as well as the code that identifies each event are implementation-dependent. The events and codes may vary between PMCs, as well as between implementations. For the programmable PMCs, the event to be counted is selected by specifying the appropriate code in the MMCR "Selector" field for the PMC. Some events may include operations that are performed out-of-order.

Many aspects of the operation of the Performance Monitor are summarized by the following hierarchy, which is described starting at the lowest level.

• A "counter negative condition" exists when the value in a PMC is negative (i.e., when bit 0 of the PMC is 1). A "Time Base transition event" occurs when a selected bit of the Time Base changes from 0 to 1 (the bit is selected by an MMCR field). The term "condition or event" is used as an abbreviation for "counter negative condition or Time Base transition event". A condition or event can be caused implicitly by the processor (e.g., incrementing a PMC) or explicitly by software (mtspr).

• A condition or event is enabled if the corresponding "Enable" bit in an MMCR is 1. The occurrence of an enabled condition or event can have side effects within the Performance Monitor, such as causing the PMCs to cease counting.

• An enabled condition or event causes a Performance Monitor alert if Performance Monitor alerts are enabled by the corresponding "Enable" bit in an MMCR. A single Performance Monitor alert may reflect multiple enabled conditions and events.

• A Performance Monitor alert causes a Performance Monitor exception.

    The exception effects of the Performance Monitor are said to be consistent with the contents of MMCR0[PMAO] if one of the following statements is true. (MMCR0[PMAO] reflects the occurrence of Performance Monitor alerts; see the definition of that bit in Section B.2.2.)

    - MMCR0[PMAO]=0 and a Performance Monitor exception does not exist.
    - MMCR0[PMAO]=1 and a Performance Monitor exception exists.

    A context synchronizing instruction or event that occurs when MMCR0[PMAO]=0 ensures that the exception effects of the Performance Monitor are consistent with the contents of MMCR0[PMAO].

    Even without software synchronization, when the contents of MMCR0[PMAO] change, the exception effects of the Performance Monitor become consistent with the new contents of MMCR0[PMAO] sufficiently soon that the Performance Monitor facility is useful to software for its intended purposes.

• A Performance Monitor exception causes a Performance Monitor interrupt when MSR[EE]=1.

Programming Note
    The Performance Monitor can be effectively disabled (i.e., put into a state in which Performance Monitor SPRs are not altered and Performance Monitor interrupts do not occur) by setting MMCR0 to 0x0000_0000_8000_0000.

B.1 PMM Bit of the Machine State Register

The Performance Monitor uses MSR bit PMM, which is defined as follows.

Bit     Description

61      Performance Monitor Mark (PMM)
        This bit is a basic feature.
        This bit contains the Performance Monitor "mark" (0 or 1).

Programming Note
    Software can use this bit as a process-specific marker which, in conjunction with MMCR0[FCM0 FCM1] (see Section B.2.2), permits events to be counted on a process-specific basis. (The bit is saved by interrupts and restored by rfid.)

    Common uses of the PMM bit include the following.

    • Count events for a few selected processes. This use requires the following bit settings.
        - MSR[PMM]=1 for the selected processes, MSR[PMM]=0 for all other processes
        - MMCR0[FCM0]=1
        - MMCR0[FCM1]=0

    • Count events for all but a few selected processes. This use requires the following bit settings.
        - MSR[PMM]=1 for the selected processes, MSR[PMM]=0 for all other processes
        - MMCR0[FCM0]=0
        - MMCR0[FCM1]=1

    Notice that for both of these uses a mark value of 1 identifies the "few" processes and a mark value of 0 identifies the remaining "many" processes. Because the PMM bit is set to 0 when an interrupt occurs (see Figure 43 on page 555), interrupt handlers are treated as one of the "many". If it is desired to treat interrupt handlers as one of the "few", the mark value convention just described would be reversed.

B.2 Special Purpose Registers

The Performance Monitor SPRs count events, control the operation of the Performance Monitor, and provide associated information.

The Performance Monitor SPRs can be read and written using the mfspr and mtspr instructions (see Section 4.4.5, "Move To/From System Register Instructions" on page 496). The Performance Monitor SPR numbers are shown in Figure 54. Writing any of the Performance Monitor SPRs is privileged. Reading any of the Performance Monitor SPRs is not privileged (however, the privileged SPR numbers used to write the SPRs can also be used to read them; see the figure).

The elapsed time between the execution of an instruction and the time at which events due to that instruction have been reflected in Performance Monitor SPRs is not defined. No means are provided by which software can ensure that all events due to preceding instructions have been reflected in Performance Monitor SPRs. Similarly, if the events being monitored may be caused by operations that are performed out-of-order, no means are provided by which software can prevent such events due to subsequent instructions from being reflected in Performance Monitor SPRs. Thus the contents obtained by reading a Performance Monitor SPR may not be precise: it may fail to reflect some events due to instructions that precede the mfspr and may reflect some events due to instructions that follow the mfspr. This lack of precision applies regardless of whether the state of the processor is such that the SPR is subject to change by the processor at the time the mfspr is executed. Similarly, if an mtspr instruction is executed that changes the contents of the Time Base, the change is not guaranteed to have taken effect with respect to causing Time Base transition events until after a subsequent context synchronizing instruction has been executed.

If an mtspr instruction is executed that changes the value of a Performance Monitor SPR other than SIAR or SDAR, the change is not guaranteed to have taken effect until after a subsequent context synchronizing instruction has been executed (see Chapter 10. "Synchronization Requirements for Context Alterations" on page 585).

Programming Note
    Depending on the events being monitored, the contents of Performance Monitor SPRs may be affected by aspects of the runtime environment (e.g., cache contents) that are not directly attributable to the programs being monitored.

    SPR (1,2)                      Register   Privileged
    decimal    spr5:9   spr0:4     Name
    770,786    11000    n0010      MMCRA      no,yes
    771,787    11000    n0011      PMC1       no,yes
    772,788    11000    n0100      PMC2       no,yes
    773,789    11000    n0101      PMC3       no,yes
    774,790    11000    n0110      PMC4       no,yes
    775,791    11000    n0111      PMC5       no,yes
    776,792    11000    n1000      PMC6       no,yes
    779,795    11000    n1011      MMCR0      no,yes
    780,796    11000    n1100      SIAR       no,yes
    781,797    11000    n1101      SDAR       no,yes
    782,798    11000    n1110      MMCR1      no,yes

    (1) Note that the order of the two 5-bit halves of the SPR number is reversed.
    (2) Reading the SPR is privileged if and only if n=1.

Figure 54. Performance Monitor SPR encodings for mfspr

    SPR (1)                        Register   Privileged
    decimal    spr5:9   spr0:4     Name
    786        11000    10010      MMCRA      yes
    787        11000    10011      PMC1       yes
    788        11000    10100      PMC2       yes
    789        11000    10101      PMC3       yes
    790        11000    10110      PMC4       yes
    791        11000    10111      PMC5       yes
    792        11000    11000      PMC6       yes
    795        11000    11011      MMCR0      yes
    796        11000    11100      SIAR       yes
    797        11000    11101      SDAR       yes
    798        11000    11110      MMCR1      yes

    (1) Note that the order of the two 5-bit halves of the SPR number is reversed.

Figure 55. Performance Monitor SPR encodings for mtspr

B.2.1 Performance Monitor Counter Registers

The six Performance Monitor Counter registers, PMC1 through PMC6, are 32-bit registers that count events.

    PMC1, PMC2, PMC3, PMC4, PMC5, PMC6 (each occupying bits 32:63)

Figure 56. Performance Monitor Counter registers

PMC1, PMC2, PMC3, and PMC4 are basic features.

PMC5 and PMC6 are not programmable. PMC5 counts instructions completed and PMC6 counts cycles.

Normally each PMC is incremented each processor cycle by the number of times the corresponding event occurred in that cycle. Other modes of incrementing may also be provided (e.g., see the description of MMCR1 bits PMC1HIST and PMCjHIST).

"PMCj" is used as an abbreviation for "PMCi, i > 1".

Programming Note
    PMC5 and PMC6 are defined to facilitate calculating basic performance metrics such as cycles per instruction (CPI).

Programming Note
    Software can use a PMC to "pace" the collection of Performance Monitor data. For example, if it is desired to collect event counts every n cycles, software can specify that a particular PMC count cycles and set that PMC to 0x8000_0000 - n. The events of interest would be counted in other PMCs. The counter negative condition that will occur after n cycles can, with the appropriate setting of MMCR bits, cause counter values to become frozen, cause a Performance Monitor interrupt to occur, etc.

B.2.2 Monitor Mode Control Register 0

Monitor Mode Control Register 0 (MMCR0) is a 64-bit register. This register, along with MMCR1 and MMCRA, controls the operation of the Performance Monitor.

    MMCR0 (bits 0:63)

Figure 57. Monitor Mode Control Register 0

MMCR0 is a basic feature. Within MMCR0, some of the bits and fields are basic features and some are extensions. The basic bits and fields are identified as such, below.

Some bits of MMCR0 are altered by the processor when various events occur, as described below.

The bit definitions of MMCR0 are as follows. MMCR0 bits that are not implemented are treated as reserved.

Bit(s)  Description

0:31    Reserved

32      Freeze Counters (FC)
        This bit is a basic feature.
        0   The PMCs are incremented (if permitted by other MMCR bits).
        1   The PMCs are not incremented.
        The processor sets this bit to 1 when an enabled condition or event occurs and MMCR0[FCECE]=1.

33      Freeze Counters in Privileged State (FCS)
        This bit is a basic feature.
        0   The PMCs are incremented (if permitted by other MMCR bits).
        1   The PMCs are not incremented if MSR[HV PR]=0b00.

34      Freeze Counters in Problem State (FCP)
        This bit is a basic feature.
        0   The PMCs are incremented (if permitted by other MMCR bits).
        1   The PMCs are not incremented if MSR[PR]=1.

35      Freeze Counters while Mark = 1 (FCM1)
        This bit is a basic feature.
        0   The PMCs are incremented (if permitted by other MMCR bits).
        1   The PMCs are not incremented if MSR[PMM]=1.

36      Freeze Counters while Mark = 0 (FCM0)
        This bit is a basic feature.
        0   The PMCs are incremented (if permitted by other MMCR bits).
        1   The PMCs are not incremented if MSR[PMM]=0.

37      Performance Monitor Alert Enable (PMAE)
        This bit is a basic feature.
        0   Performance Monitor alerts are disabled.
        1   Performance Monitor alerts are enabled until a Performance Monitor alert occurs, at which time:
            • MMCR0[PMAE] is set to 0
            • MMCR0[PMAO] is set to 1

        Programming Note
            Software can set this bit and MMCR0[PMAO] to 0 to prevent Performance Monitor interrupts.

            Software can set this bit to 1 and then poll the bit to determine whether an enabled condition or event has occurred. This is especially useful for software that runs with MSR[EE]=0.

            In earlier versions of the architecture that lacked the concept of Performance Monitor alerts, this bit was called Performance Monitor Exception Enable (PMXE).

38      Freeze Counters on Enabled Condition or Event (FCECE)
        0   The PMCs are incremented (if permitted by other MMCR bits).
        1   The PMCs are incremented (if permitted by other MMCR bits) until an enabled condition or event occurs when MMCR0[TRIGGER]=0, at which time:
            • MMCR0[FC] is set to 1
        If the enabled condition or event occurs when MMCR0[TRIGGER]=1, the FCECE bit is treated as if it were 0.

39:40   Time Base Selector (TBSEL)
        This field selects the Time Base bit that can cause a Time Base transition event (the event occurs when the selected bit changes from 0 to 1).
        00  Time Base bit 63 is selected.
        01  Time Base bit 55 is selected.
        10  Time Base bit 51 is selected.
        11  Time Base bit 47 is selected.

        Programming Note
            Time Base transition events can be used to collect information about processor activity, as revealed by event counts in PMCs and by addresses in SIAR and SDAR, at periodic intervals.

            In multiprocessor systems in which the Time Base registers are synchronized among the processors, Time Base transition events can be used to correlate the Performance Monitor data obtained by the several processors. For this use, software must specify the same TBSEL value for all the processors in the system.

            Because the frequency of the Time Base is implementation-dependent, software should invoke a system service program to obtain the frequency before choosing a value for TBSEL.

41      Time Base Event Enable (TBEE)
        0   Time Base transition events are disabled.
        1   Time Base transition events are enabled.

42:47   Reserved

48      PMC1 Condition Enable (PMC1CE)
        This bit controls whether counter negative conditions due to a negative value in PMC1 are enabled.
        0   Counter negative conditions for PMC1 are disabled.
        1   Counter negative conditions for PMC1 are enabled.

49      PMCj Condition Enable (PMCjCE)
        This bit controls whether counter negative conditions due to a negative value in any PMCj (i.e., in any PMC except PMC1) are enabled.
        0   Counter negative conditions for all PMCjs are disabled.
        1   Counter negative conditions for all PMCjs are enabled.

50      Trigger (TRIGGER)
        0   The PMCs are incremented (if permitted by other MMCR bits).
        1   PMC1 is incremented (if permitted by other MMCR bits). The PMCjs are not incremented until PMC1 is negative or an enabled condition or event occurs, at which time:
            • the PMCjs resume incrementing (if permitted by other MMCR bits)
            • MMCR0[TRIGGER] is set to 0
        See the description of the FCECE bit, above, regarding the interaction between TRIGGER and FCECE.

        Programming Note
            Uses of TRIGGER include the following.

            • Resume counting in the PMCjs when PMC1 becomes negative, without causing a Performance Monitor interrupt. Then freeze all PMCs (and optionally cause a Performance Monitor interrupt) when a PMCj becomes negative. The PMCjs then reflect the events that occurred between the time PMC1 became negative and the time a PMCj becomes negative. This use requires the following MMCR0 bit settings.
                - TRIGGER=1
                - PMC1CE=0
                - PMCjCE=1
                - TBEE=0
                - FCECE=1
                - PMAE=1 (if a Performance Monitor interrupt is desired)

            • Resume counting in the PMCjs when PMC1 becomes negative, and cause a Performance Monitor interrupt without freezing any PMCs. The PMCjs then reflect the events that occurred between the time PMC1 became negative and the time the interrupt handler reads them. This use requires the following MMCR0 bit settings.
                - TRIGGER=1
                - PMC1CE=1
                - TBEE=0
                - FCECE=0
                - PMAE=1

51:52   Setting is implementation-dependent.

53:55   Reserved

56      Performance Monitor Alert Occurred (PMAO)
        This bit is a basic feature.
        0   A Performance Monitor alert has not occurred since the last time software set this bit to 0.
        1   A Performance Monitor alert has occurred since the last time software set this bit to 0.
        This bit is set to 1 by the processor when a Performance Monitor alert occurs. This bit can be set to 0 only by the mtspr instruction.

        Programming Note
            Software can set this bit to 1 to simulate the occurrence of a Performance Monitor alert.

            Software should set this bit to 0 after handling the Performance Monitor alert.

57      Setting is implementation-dependent.

58      Freeze Counters 1-4 (FC1-4)
        0   PMC1 - PMC4 are incremented (if permitted by other MMCR bits).
        1   PMC1 - PMC4 are not incremented.

59      Freeze Counters 5-6 (FC5-6)
        0   PMC5 - PMC6 are incremented (if permitted by other MMCR bits).
        1   PMC5 - PMC6 are not incremented.

60:61   Reserved

62      Freeze Counters in Wait State (FCWAIT)
        This bit is a basic feature.
        0   The PMCs are incremented (if permitted by other MMCR bits).
        1   The PMCs are not incremented if CTRL[31]=0. Software is expected to set CTRL[31]=0 when it is in a "wait state", i.e., when there is no process ready to run.
        Only Branch Unit type of events do not increment if CTRL[31]=0. Other units continue to count.

63      Freeze Counters in Hypervisor State (FCH)
        This bit is a basic feature.
        0   The PMCs are incremented (if permitted by other MMCR bits).
        1   The PMCs are not incremented if MSR[HV PR]=0b10.

B.2.3 Monitor Mode Control Register 1

Monitor Mode Control Register 1 (MMCR1) is a 64-bit register. This register, along with MMCR0 and MMCRA, controls the operation of the Performance Monitor.

    MMCR1 (bits 0:63)

Figure 58. Monitor Mode Control Register 1

MMCR1 is a basic feature. Within MMCR1, some of the bits and fields are basic features and some are extensions. The basic bits and fields are identified as such, below.

Some bits of MMCR1 are altered by the processor when various events occur, as described below.

The bit definitions of MMCR1 are as follows. MMCR1 bits that are not implemented are treated as reserved.

Bit(s)  Description

0:31    Implementation-Dependent Use
        These bits have implementation-dependent uses (e.g., extended event selection).

32:39   PMC1 Selector (PMC1SEL)
40:47   PMC2 Selector (PMC2SEL)
48:55   PMC3 Selector (PMC3SEL)
56:63   PMC4 Selector (PMC4SEL)
        Each of these fields contains a code that identifies the event to be counted by PMCs 1 through 4 respectively.
        PMC Selectors are basic features.

        Compatibility Note
            In versions of the architecture that precede Version 2.02 the PMC Selector Fields were six bits long, and were split between MMCR0 and MMCR1. PMC1-8 were all programmable.

            If more programmable PMCs are implemented in the future, additional MMCRs may be defined to cover the additional selectors.

B.2.4 Monitor Mode Control Register A

Monitor Mode Control Register A (MMCRA) is a 64-bit register. This register, along with MMCR0 and MMCR1, controls the operation of the Performance Monitor.

    MMCRA (bits 0:63)

Figure 59. Monitor Mode Control Register A

MMCRA is a basic feature.
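
The bit positions quoted in Sections B.2.1 through B.2.3 can be turned into masks and field values mechanically. The following Python sketch assumes that bit 0 is the most significant bit of a 64-bit register, which agrees with the MMCR0 value 0x0000_0000_8000_0000 given earlier for disabling the Performance Monitor (only FC, bit 32, set). It also packs the four 8-bit MMCR1 selector fields and computes the pacing preload 0x8000_0000 - n. All helper and constant names (`bit_mask`, `field_insert`, `mmcr1_selectors`, `pace_preload`, `pmc_is_negative`, `MMCR0_...`) are hypothetical, and the selector codes used in the demonstration are arbitrary placeholders, since real event codes are implementation-dependent.

```python
"""Sketch of the bit numbering used by the Monitor Mode Control Registers.

Assumption: bit 0 is the most significant bit of a 64-bit register.
All names below are hypothetical and exist only for this illustration.
"""

REG_WIDTH = 64


def bit_mask(bit, width=REG_WIDTH):
    """Mask for one bit, where bit 0 is the most significant bit."""
    if not 0 <= bit < width:
        raise ValueError(f"bit {bit} is outside a {width}-bit register")
    return 1 << (width - 1 - bit)


def field_insert(value, first, last, field, width=REG_WIDTH):
    """Return value with bits first:last (inclusive) replaced by field."""
    if not 0 <= first <= last < width:
        raise ValueError("invalid bit range")
    nbits = last - first + 1
    if not 0 <= field < (1 << nbits):
        raise ValueError("field does not fit in the bit range")
    shift = width - 1 - last
    mask = ((1 << nbits) - 1) << shift
    return (value & ~mask) | (field << shift)


# MMCR0 bit positions quoted in Section B.2.2.
MMCR0_FC = bit_mask(32)
MMCR0_FCM1 = bit_mask(35)
MMCR0_FCM0 = bit_mask(36)
MMCR0_PMAE = bit_mask(37)
MMCR0_PMAO = bit_mask(56)

# Value the text gives for effectively disabling the Performance Monitor.
MMCR0_DISABLE = MMCR0_FC

# "Count events for a few selected processes": FCM0=1, FCM1=0.
MMCR0_COUNT_MARKED_ONLY = MMCR0_FCM0


def mmcr1_selectors(sel1, sel2, sel3, sel4):
    """Pack four 8-bit event codes into MMCR1 bits 32:39 through 56:63."""
    value = 0
    for first, code in zip((32, 40, 48, 56), (sel1, sel2, sel3, sel4)):
        value = field_insert(value, first, first + 7, code)
    return value


def pace_preload(n):
    """Initial 32-bit PMC value that turns negative after n increments."""
    if not 1 <= n <= 0x8000_0000:
        raise ValueError("n must be between 1 and 0x8000_0000")
    return 0x8000_0000 - n


def pmc_is_negative(pmc):
    """A 32-bit PMC is negative when its most significant bit is 1."""
    return bool(pmc & 0x8000_0000)


if __name__ == "__main__":
    print(f"MMCR0 disable value: {MMCR0_DISABLE:#018x}")
    print(f"MMCR1 selectors:     {mmcr1_selectors(0x12, 0x34, 0x56, 0x78):#018x}")
    print(f"PMC preload, n=1000: {pace_preload(1000):#010x}")
```

Keeping the big-endian bit numbering inside `bit_mask` and `field_insert` lets the constants be written with the same bit numbers that appear in the register descriptions, rather than with hand-computed shift counts.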
Within MMCRA, some of the bits and fields are basic features and some are extensions. The basic bits and fields are identified as such, below.

Some bits of MMCRA are altered by the processor when various events occur, as described below.

The bit definitions of MMCRA are as follows. MMCRA bits that are not implemented are treated as reserved.

Bit(s)  Description

0:31    Reserved

32      Contents of SIAR and SDAR Are Related (CSSR)
        Set to 1 by the processor if the contents of SIAR and SDAR are associated with the same instruction; otherwise set to 0.

33:34   Setting is implementation-dependent.

35      Sampled MSR[HV] (SAMPHV)
        Value of MSR[HV] when the Performance Monitor Alert occurred.

36      Sampled MSR[PR] (SAMPPR)
        Value of MSR[PR] when the Performance Monitor Alert occurred.

37:47   Setting is implementation-dependent.

48:53   Threshold (THRESHOLD)
        This field contains a "threshold value", which is a value such that only events that exceed the value are counted. The events to which a threshold value can apply are implementation-dependent, as are the dimension of the threshold (e.g., duration in cycles) and the granularity with which the threshold value is interpreted.

        Programming Note
            By varying the threshold value, software can obtain a profile of the characteristics of the events subject to the threshold. For example, if PMC1 counts the number of cache misses for which the duration exceeds the threshold value, then software can obtain the distribution of cache miss durations for a given program by monitoring the program repeatedly using a different threshold value each time.

54:59   Reserved for implementation-specific use.

60:62   Reserved

63      Setting is implementation-dependent.

B.2.5 Sampled Instruction Address Register

The Sampled Instruction Address Register (SIAR) is a 64-bit register. It contains the address of the "sampled instruction" when a Performance Monitor alert occurs.

    SIAR (bits 0:63)

Figure 60. Sampled Instruction Address Register

When a Performance Monitor alert occurs, SIAR is set to the effective address of an instruction that was being executed, possibly out-of-order, at or around the time that the Performance Monitor alert occurred. This instruction is called the "sampled instruction".

The contents of SIAR may be altered by the processor if and only if MMCR0[PMAE]=1. Thus after the Performance Monitor alert occurs, the contents of SIAR are not altered by the processor until software sets MMCR0[PMAE] to 1. After software sets MMCR0[PMAE] to 1, the contents of SIAR are undefined until the next Performance Monitor alert occurs.

See Section B.4 regarding the effects of the Trace facility on SIAR.

Programming Note
    If the Performance Monitor alert causes a Performance Monitor interrupt, the value of MSR[HV PR] that was in effect when the sampled instruction was being executed is reported in MMCRA.

B.2.6 Sampled Data Address Register

The Sampled Data Address Register (SDAR) is a 64-bit register. It contains the address of the "sampled data" when a Performance Monitor alert occurs.

    SDAR (bits 0:63)

Figure 61. Sampled Data Address Register

When a Performance Monitor alert occurs, SDAR is set to the effective address of the storage operand of an instruction that was being executed, possibly out-of-order, at or around the time that the Performance Monitor alert occurred. This storage operand is called the "sampled data". The sampled data may be, but need not be, the storage operand (if any) of the sampled instruction (see Section B.2.5).

The contents of SDAR may be altered by the processor if and only if MMCR0[PMAE]=1. Thus after the Performance Monitor alert occurs, the contents of SDAR are not altered by the processor until software sets MMCR0[PMAE] to 1. After software sets MMCR0[PMAE] to 1, the contents of SDAR are undefined until the next Performance Monitor alert occurs.

See Section B.4 regarding the effects of the Trace facility on SDAR.

Programming Note
    If the Performance Monitor alert causes a Performance Monitor interrupt, MMCRA indicates whether the sampled data is the storage operand of the sampled instruction.

B.3 Performance Monitor Interrupt

The Performance Monitor interrupt is a system caused interrupt (Section 6.4). It is masked by MSR[EE] in the same manner that External and Decrementer interrupts are.

The Performance Monitor interrupt is a basic feature.

A Performance Monitor interrupt occurs when no higher priority exception exists, a Performance Monitor exception exists, and MSR[EE]=1.

If multiple Performance Monitor exceptions occur before the first causes a Performance Monitor interrupt, the interrupt reflects the most recent Performance Monitor exception and the preceding Performance Monitor exceptions are lost.

The following registers are set:

SRR0    Set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present.

SRR1
        33:36 and 42:47   Implementation-specific.
        Others            Loaded from the MSR.

MSR     See Figure 43 on page 555.

SIAR    Set to the effective address of the "sampled instruction" (see Section B.2.5).

SDAR    Set to the effective address of the "sampled data" (see Section B.2.6).

Execution resumes at effective address 0x0000_0000_0000_0F00.

In general, statements about External and Decrementer interrupts elsewhere in this Book apply also to the Performance Monitor interrupt; for example, if a Performance Monitor exception exists when an mtmsr[d] instruction is executed that changes MSR[EE] from 0 to 1, the Performance Monitor interrupt will occur before the next instruction is executed (if no higher priority exception exists).

The priority of the Performance Monitor exception is equal to that of the External, Decrementer, and Hypervisor Decrementer exceptions (i.e., the processor may generate any one of the four interrupts for which an exception exists) (see Section 6.7.2, "Ordered Exceptions" on page 571 and Section 6.8, "Interrupt Priorities" on page 571).

B.4 Interaction with the Trace Facility

If the Trace facility includes setting SIAR and SDAR (see Appendix C, "Example Trace Extensions" on page 599), and tracing is active (MSR[SE]=1 or MSR[BE]=1), the contents of SIAR and SDAR as used by the Performance Monitor facility are undefined and may change even when MMCR0[PMAE]=0.

Programming Note
    A potential combined use of the Trace and Performance Monitor facilities is to trace the control flow of a program and simultaneously count events for that program.

Appendix C. Example Trace Extensions

Note
    This Appendix describes an example implementation of Trace Extensions. A subset of these requirements is being considered for inclusion in the Architecture as part of Category: Trace.

This appendix provides an example of extensions that may be added to the Trace facility described in Section 6.5.14, "Trace Interrupt [Category: Trace]" on page 565. It is only an example; implementations may provide all, some, or none of the features described here, or may provide features that are similar to those described here but differ in detail.

34      Set to 1 if the traced instruction is dcbt, dcbtst, dcbz, dcbst, dcbf[l]; otherwise set to 0.

35      Set to 1 if the traced instruction is a Load instruction or eciwx; may be set to 1 if the traced instruction is icbi, dcbt, dcbtst, dcbst, dcbf[l]; otherwise set to 0.

36      Set to 1 if the traced instruction is a Store instruction, dcbz, or ecowx; otherwise set to 0.

42      Set to 1 if the traced instruction is lswx or stswx; otherwise set to 0.

43      Implementation-dependent.
44 Set to 1 if the traced instruction is a Branch instruction and the branch is taken; other- The extensions consist of the following features wise set to 0. (described in detail below). 45 Set to 1 if the traced instruction is eciwx or 1 use of MSRSE BE=0b11 to specify new causes of ecowx; otherwise set to 0. Trace interrupts 46 Set to 1 if the traced instruction is lwarx, 1 specification of how certain SRR1 bits are set ldarx, stwcx., or stdcx.; otherwise set to 0. when a Trace interrupt occurs 47 Implementation-dependent. 1 setting of SIAR and SDAR (see Appendix B, "Example Performance Monitor" on page 591) SIAR and SDAR when a Trace interrupt occurs If the Performance Monitor facility is implemented and includes SIAR and SDAR (see Appendix B), the follow- MSRSE BE = 0b11 ing additional registers are set when a Trace interrupt occurs: If MSRSE BE=0b11, the processor generates a Trace exception under the conditions described in Section SIAR Set to the effective address of the traced 6.5.14 for MSRSE BE=0b01, and also after successfully instruction. completing the execution of any instruction that would SDAR Set to the effective address of the storage cause at least one of SRR1 bits 33:36, 42, and 44:46 to operand (if any) of the traced instruction; be set to 1 (see below) if the instruction were executed otherwise undefined. when MSRSE BE=0b10. If the state of the Performance Monitor is such that the This overrides the implicit statement in Section 6.5.14 Performance Monitor may be altering these registers that the effects of MSRSE BE=0b11 are the same as (i.e., if MMCR0PMAE=1), the contents of SIAR and those of MSRSE BE=0b10. SDAR as used by the Trace facility are undefined and may change even when no Trace interrupt occurs. SRR1 When a Trace interrupt occurs, the SRR1 bits that are not loaded from the MSR are set as follows instead of as described in Section 6.5.14. 33 Set to 1 if the traced instruction is icbi; oth- erwise set to 0. Appendix C. 
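The SRR1 settings that the example Trace extensions define can be read back with a small decoder. The Python sketch below is illustrative only and is not part of the architecture: the names TRACE_SRR1_BITS, ibm_bit, decode_trace_srr1, and causes_trace_with_se_be_11 are hypothetical, SRR1 is assumed to be available as a 64-bit integer, and bits are numbered as in this Book (bit 0 is the most significant of 64). Bits 43 and 47 are implementation-dependent and are ignored.

```python
# Hypothetical decoder for the example Trace-extension SRR1 bits.
# Bit numbers follow the Book's convention: bit 0 is the most significant
# bit of a 64-bit register, so bit n has weight 2**(63 - n).

TRACE_SRR1_BITS = {
    33: "icbi",
    34: "dcbt/dcbtst/dcbz/dcbst/dcbf[l]",
    35: "load or eciwx (optionally icbi or a cache op)",
    36: "store, dcbz, or ecowx",
    42: "lswx or stswx",
    44: "taken branch",
    45: "eciwx or ecowx",
    46: "lwarx/ldarx/stwcx./stdcx.",
}


def ibm_bit(value, n, width=64):
    """Return bit n of value using big-endian (IBM) bit numbering."""
    return (value >> (width - 1 - n)) & 1


def decode_trace_srr1(srr1):
    """List the descriptions of every decoded trace bit that is set in srr1."""
    return [text for bit, text in sorted(TRACE_SRR1_BITS.items())
            if ibm_bit(srr1, bit)]


def causes_trace_with_se_be_11(srr1_if_traced):
    """True if any of SRR1 bits 33:36, 42, 44:46 would be set to 1,
    which is the extra Trace condition described for MSR[SE BE]=0b11."""
    return bool(decode_trace_srr1(srr1_if_traced))
```

For instance, a value with only bit 44 set (weight 2**19) decodes to a single "taken branch" entry, while a value with only the implementation-dependent bits 43 and 47 set decodes to an empty list.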
Appendix D. Interpretation of the DSISR as Set by an Alignment Interrupt

For most causes of Alignment interrupt, the interrupt handler will emulate the interrupting instruction. To do this, it needs the following characteristics of the interrupting instruction:

    Load or store
    Length (halfword, word, doubleword)
    String, multiple, or elementary
    Fixed-point or floating-point
    Update or non-update
    Byte reverse or not
    Is it dcbz?

The Power ISA optionally provides this information by setting bits in the DSISR that identify the interrupting instruction type. It is not necessary for the interrupt handler to load the interrupting instruction from storage. The mapping is unique except for a few exceptions that are discussed below. The near-uniqueness depends on the fact that many instructions, such as the fixed- and floating-point arithmetic instructions and the one-byte loads and stores, cannot cause an Alignment interrupt.

See Section 6.5.8 for a description of how the opcode and extended opcode are mapped to a DSISR value for an X-, D-, or DS-form instruction that causes an Alignment interrupt.

The table below shows the inverse mapping: how the DSISR bits identify the interrupting instruction. The following notes are cited in the table.

1. The instructions lwz and lwarx give the same DSISR bits (all zero). But if lwarx causes an Alignment interrupt, it should not be emulated. It is adequate for the Alignment interrupt handler simply to treat the instruction as if it were lwz. The emulator must use the address in the DAR, rather than compute it from RA/RB/D, because lwz and lwarx have different instruction formats.

   If opcode 0 ("Illegal or Reserved") can cause an Alignment interrupt, it will be indistinguishable to the interrupt handler from lwarx and lwz.

2. These are distinguished by DSISR bits 44:45, which are not shown in the table.

The interrupt handler has no need to distinguish between an X-form instruction and the corresponding D- or DS-form instruction if one exists, and vice versa. Therefore two such instructions may yield the same DSISR value (all 32 bits). For example, stw and stwx may both yield either the DSISR value shown in the following table for stw, or that shown for stwx.

If DSISR     then it is either   or D/DS-form   so the instruction is:
47:53 is:    X-form opcode:      opcode:
00 0 0000    00000xxx00          x00000         lwarx, lwz, reserved (1)
00 0 0001    00010xxx00          x00010         ldarx
00 0 0010    00100xxx00          x00100         stw
00 0 0011    00110xxx00          x00110         -
00 0 0100    01000xxx00          x01000         lhz
00 0 0101    01010xxx00          x01010         lha
00 0 0110    01100xxx00          x01100         sth
00 0 0111    01110xxx00          x01110         lmw
00 0 1000    10000xxx00          x10000         lfs
00 0 1001    10010xxx00          x10010         lfd
00 0 1010    10100xxx00          x10100         stfs
00 0 1011    10110xxx00          x10110         stfd
00 0 1100    11000xxx00          x11000         -
00 0 1101    11010xxx00          x11010         ld, ldu, lwa (2)
00 0 1110    11100xxx00          x11100         -
00 0 1111    11110xxx00          x11110         std, stdu (2)
00 1 0000    00001xxx00          x00001         lwzu
00 1 0001    00011xxx00          x00011         -
00 1 0010    00101xxx00          x00101         stwu
00 1 0011    00111xxx00          x00111         -
00 1 0100    01001xxx00          x01001         lhzu
00 1 0101    01011xxx00          x01011         lhau
00 1 0110    01101xxx00          x01101         sthu
00 1 0111    01111xxx00          x01111         stmw
00 1 1000    10001xxx00          x10001         lfsu
00 1 1001    10011xxx00          x10011         lfdu
00 1 1010    10101xxx00          x10101         stfsu
00 1 1011    10111xxx00          x10111         stfdu
00 1 1100    11001xxx00          x11001         lfdp
00 1 1101    11011xxx00          x11011         -
00 1 1110    11101xxx00          x11101         stfdp
00 1 1111    11111xxx00          x11111         -
01 0 0000    00000xxx01                         ldx
01 0 0001    00010xxx01                         -
01 0 0010    00100xxx01                         stdx
01 0 0011    00110xxx01                         -
01 0 0100    01000xxx01                         -
01 0 0101    01010xxx01                         lwax
01 0 0110    01100xxx01                         -
01 0 0111    01110xxx01                         -
01 0 1000    10000xxx01                         lswx
01 0 1001    10010xxx01                         lswi
01 0 1010    10100xxx01                         stswx
01 0 1011    10110xxx01                         stswi
01 0 1100    11000xxx01                         -
01 0 1101    11010xxx01                         -
01 0 1110    11100xxx01                         -
01 0 1111    11110xxx01                         -
01 1 0000    00001xxx01                         ldux
01 1 0001    00011xxx01                         -
01 1 0010    00101xxx01                         stdux
01 1 0011    00111xxx01                         -
01 1 0100    01001xxx01                         -
01 1 0101    01011xxx01                         lwaux
01 1 0110    01101xxx01                         -
01 1 0111    01111xxx01                         -
01 1 1000    10001xxx01                         -
01 1 1001    10011xxx01                         -
01 1 1010    10101xxx01                         -
01 1 1011    10111xxx01                         -
01 1 1100    11001xxx01                         -
01 1 1101    11011xxx01                         -
01 1 1110    11101xxx01                         -
01 1 1111    11111xxx01                         -
10 0 0000    00000xxx10                         -
10 0 0001    00010xxx10                         -
10 0 0010    00100xxx10                         stwcx.
10 0 0011    00110xxx10                         stdcx.
10 0 0100    01000xxx10                         -
10 0 0101    01010xxx10                         -
10 0 0110    01100xxx10                         -
10 0 0111    01110xxx10                         -
10 0 1000    10000xxx10                         lwbrx
10 0 1001    10010xxx10                         -
10 0 1010    10100xxx10                         stwbrx
10 0 1011    10110xxx10                         -
10 0 1100    11000xxx10                         lhbrx
10 0 1101    11010xxx10                         -
10 0 1110    11100xxx10                         sthbrx
10 0 1111    11110xxx10                         -
10 1 0000    00001xxx10                         -
10 1 0001    00011xxx10                         -
10 1 0010    00101xxx10                         -
10 1 0011    00111xxx10                         -
10 1 0100    01001xxx10                         eciwx
10 1 0101    01011xxx10                         -
10 1 0110    01101xxx10                         ecowx
10 1 0111    01111xxx10                         -
10 1 1000    10001xxx10                         -
10 1 1001    10011xxx10                         -
10 1 1010    10101xxx10                         -
10 1 1011    10111xxx10                         -
10 1 1100    11001xxx10                         -
10 1 1101    11011xxx10                         -
10 1 1110    11101xxx10                         -
10 1 1111    11111xxx10                         dcbz
11 0 0000    00000xxx11                         lwzx
11 0 0001    00010xxx11                         -
11 0 0010    00100xxx11                         stwx
11 0 0011    00110xxx11                         -
11 0 0100    01000xxx11                         lhzx
11 0 0101    01010xxx11                         lhax
11 0 0110    01100xxx11                         sthx
11 0 0111    01110xxx11                         -
11 0 1000    10000xxx11                         lfsx
11 0 1001    10010xxx11                         lfdx
11 0 1010    10100xxx11                         stfsx
11 0 1011    10110xxx11                         stfdx
11 0 1100    11000xxx11                         lfdpx
11 0 1101    11010xxx11                         lfiwax
11 0 1110    11100xxx11                         stfdpx
11 0 1111    11110xxx11                         stfiwx
11 1 0000    00001xxx11                         lwzux
11 1 0001    00011xxx11                         -
11 1 0010    00101xxx11                         stwux
11 1 0011    00111xxx11                         -
11 1 0100    01001xxx11                         lhzux
11 1 0101    01011xxx11                         lhaux
11 1 0110    01101xxx11                         sthux
11 1 0111    01111xxx11                         -
11 1 1000    10001xxx11                         lfsux
11 1 1001    10011xxx11                         lfdux
11 1 1010    10101xxx11                         stfsux
11 1 1011    10111xxx11                         stfdux
11 1 1100    11001xxx11                         -
11 1 1101    11011xxx11                         -
11 1 1110    11101xxx11                         -
11 1 1111    11111xxx11                         -

Appendix E. Programming Examples

E.1 Unsigned Single-Precision BCD Arithmetic

addg6s can be used to add or subtract two BCD operands. In these examples it is assumed that r0 contains 0x666...666. (BCD data formats are described in Section 5.3 of Book I - III.)

Addition of the unsigned BCD operand in register RA to the unsigned BCD operand in register RB can be accomplished as follows.

    add    r1,RA,r0
    add    r2,r1,RB
    addg6s RT,r1,RB
    subf   RT,RT,r2      # RT = RA +BCD RB

Subtraction of the unsigned BCD operand in register RA from the unsigned BCD operand in register RB can be accomplished as follows. (In this example it is assumed that RB is not register 0.)

    addi   r1,RB,1
    nor    r2,RA,RA      # one's complement of RA
    add    r3,r1,r2
    addg6s RT,r1,r2
    subf   RT,RT,r3      # RT = RB -BCD RA

Additional instructions are needed to handle signed BCD operands, and BCD operands that occupy more than one register (e.g., unsigned BCD operands that have more than 16 decimal digits).

E.2 Signed Single-Precision BCD Arithmetic

Addition of the signed 15-digit BCD operand in register RA to the signed BCD operand in register RB can be accomplished as follows. If the signs of the operands are different, then the operand of smaller magnitude is subtracted from the operand of larger magnitude and the sign of the larger operand is preserved; otherwise the operands are added and the sign is preserved. The sign code is in the low order 4 bits of the operands and uses one of the standard encodings. (See Section 5.3 of Book I - III for a description of BCD and sign encodings.) This example assumes preferred sign option 1 (0b1100 is plus and 0b1101 is minus). For preferred sign option 2 (0b1111 is plus and 0b1101 is minus), replace the xori after the "SignedSub" label with "xori RA,RA,2".

Preserving the appropriate sign code is accomplished by zeroing the sign code of the other operand before performing a 16 digit BCD addition/subtraction. Other addends (ones complement or 6's) must leave the sign code position as zero.

(In this example r11 contains 0x6666 6666 6666 6660.)

    SignedSub:
        xori   RA,RA,1
    SignedAdd:
        xor    r5,RA,RB
        andi.  r5,r5,15      # compare sign codes
        cmpld  cr1,RA,RB     # compare magnitudes
        beq    cr0,samesign
        ble    cr1,BminusA
        # set up for RT = RA -BCD RB
        nor    r9,RB,RB      # one's complement of RB
        addi   r10,RA,16     # generate the carry in
        b      submag
    BminusA:
        # set up for RT = RB -BCD RA
        nor    r9,RA,RA      # one's complement of RA
        addi   r10,RB,16     # generate the carry in
    submag:
        rldicr r9,r9,0,59    # remove the sign code
        add    r8,r10,r9
        addg6s RT,r10,r9
        rldicr RT,RT,0,59    # remove generated 6 from sign position
        subf   RT,RT,r8
        b      done
    samesign:
        rldicr r8,RB,0,59    # remove the sign code
        add    r10,RA,r11    # add 6's
        add    r9,r10,r8
        addg6s RT,r10,RB
        subf   RT,RT,r9      # RT = RA +BCD RB
    done:

E.3 Unsigned Extended-Precision BCD Arithmetic

Multiple precision BCD arithmetic requires additional code to add/subtract higher order digits and handle the carry between 16 digit groups. For example, the following sequence implements a 32-digit BCD add. In this example the contents of register R3 concatenated with the contents of R4 represent the first 32-digit operand and the contents of register R5 concatenated with the contents of R6 represent the second operand. The contents of register R3 concatenated with the contents of register R4 represent the result.

(In this example r0 contains 0x6666 6666 6666 6666.)
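The 32-digit add sequence for this example is easier to check against a software model. The Python sketch below is such a model and nothing more: addg6s here is a hand-written approximation of the instruction (a 6 in every 4-bit position whose addition produced no carry-out), bcd_add32 is a hypothetical helper that mirrors the register sequence one instruction per line, and registers are modeled as unsigned 64-bit integers.

```python
MASK64 = (1 << 64) - 1
SIXES = 0x6666_6666_6666_6666   # assumed contents of r0


def addg6s(a, b):
    """Model of addg6s: 0x6 in each nibble whose add produced no carry-out."""
    sixes, carry = 0, 0
    for i in range(16):                      # nibble 0 is least significant
        total = ((a >> 4 * i) & 0xF) + ((b >> 4 * i) & 0xF) + carry
        carry = total >> 4
        if not carry:
            sixes |= 0x6 << (4 * i)
    return sixes


def bcd_add32(r3, r4, r5, r6):
    """Return (R3, R4) = (R3 || R4) +BCD (R5 || R6)."""
    r10 = (r4 + SIXES) & MASK64              # add    r10,R4,r0
    wide = r10 + r6                          # addc   r9,r10,R6
    r9, ca = wide & MASK64, wide >> 64       #   CA = carry out of bit 0
    r4 = (r9 - addg6s(r10, r6)) & MASK64     # addg6s R4,r10,R6 ; subf R4,R4,r9
    r5 = (r5 + ca) & MASK64                  # addze  R5,R5
    r10 = (r3 + SIXES) & MASK64              # add    r10,R3,r0
    r9 = (r10 + r5) & MASK64                 # add    r9,r10,R5
    r3 = (r9 - addg6s(r10, r5)) & MASK64     # addg6s R3,r10,R5 ; subf R3,R3,r9
    return r3, r4
```

With the low-order halves 9999 9999 9999 9999 and 1, the model carries into the high-order half, so high-order halves 12 and 34 give 47. The carry is folded into R5 before both the add and the addg6s, which is why the single addze suffices even when it leaves a non-decimal digit in R5.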
    add    r10,R4,r0
    addc   r9,r10,R6     # generate the carry
    addg6s R4,r10,R6
    subf   R4,R4,r9      # RT1 = RA1 +BCD RB1
    addze  R5,R5         # propagate the carry
    add    r10,R3,r0
    add    r9,r10,R5
    addg6s R3,r10,R5
    subf   R3,R3,r9      # RT0 = RA0 +BCD RB0

Note that an extra instruction (addze) is required to propagate the carry so that the same value is used in the subsequent add and addg6s.

The following sequence implements a 32-digit BCD subtraction. In this example the first operand in R3 and R4 is subtracted from the second operand in R5 and R6. The result is in R3 and R4.

    addi   r10,R6,1
    nor    r9,R4,R4      # one's complement of RA1
    addc   r8,r10,r9     # generate the carry
    addg6s R4,r10,r9
    subf   R4,R4,r8      # RT1 = RB1 -BCD RA1
    addze  r10,R5        # propagate the carry
    nor    r9,R3,R3      # one's complement of RA0
    add    r8,r10,r9
    addg6s R3,r10,r9
    subf   R3,R3,r8      # RT0 = RB0 -BCD RA0

Book III-E: Power ISA Operating Environment Architecture - Embedded Environment

Chapter 1. Introduction

    1.1 Overview
    1.2 32-Bit Implementations
    1.3 Document Conventions
        1.3.1 Definitions and Notation
        1.3.2 Reserved Fields
    1.4 General Systems Overview
    1.5 Exceptions
    1.6 Synchronization
        1.6.1 Context Synchronization
        1.6.2 Execution Synchronization

1.1 Overview

Chapter 1 of Book I describes computation modes, document conventions, a general systems overview, instruction formats, and storage addressing. This chapter augments that description as necessary for the Power ISA Operating Environment Architecture.

1.2 32-Bit Implementations

Though the specifications in this document assume a 64-bit implementation, 32-bit implementations are permitted as described in Appendix C, "Guidelines for 64-bit Implementations in 32-bit Mode and 32-bit Implementations" on page 735.

1.3 Document Conventions

The notation and terminology used in Book I apply to this Book also, with the following substitutions.

- For "system alignment error handler" substitute "Alignment interrupt".
- For "system auxiliary processor enabled exception error handler" substitute "Auxiliary Processor Enabled Exception type Program interrupt".
- For "system data storage error handler" substitute "Data Storage interrupt" or "Data TLB Error interrupt", as appropriate.
- For "system error handler" substitute "interrupt".
- For "system floating-point enabled exception error handler" substitute "Floating-Point Enabled Exception type Program interrupt".
- For "system illegal instruction error handler" substitute "Illegal Instruction exception type Program interrupt" or "Unimplemented Operation exception type Program interrupt", as appropriate.
- For "system instruction storage error handler" substitute "Instruction Storage interrupt" or "Instruction TLB Error interrupt", as appropriate.
- For "system privileged instruction error handler" substitute "Privileged Instruction exception type Program interrupt".
- For "system service program" substitute "System Call interrupt".
- For "system trap handler" substitute "Trap type Program interrupt".

1.3.1 Definitions and Notation

The definitions and notation given in Book I are augmented by the following.

- real page
  A unit of real storage that is aligned at a boundary that is a multiple of its size. The real page size may range from 1KB to 1TB.
- context of a program
  The processor state (e.g., privilege and relocation) in which the program executes. The context is controlled by the contents of certain System Registers, such as the MSR, of certain lookaside buffers, such as the TLB, and of other resources.
- exception
  An error, unusual condition, or external signal, that may set a status bit and may or may not cause an interrupt, depending upon whether the corresponding interrupt is enabled.
- interrupt
  The act of changing the machine state in response to an exception, as described in Chapter 5. "Interrupts and Exceptions" on page 661.
- trap interrupt
  An interrupt that results from execution of a Trap instruction.
- Additional exceptions to the rule that the processor obeys the sequential execution model, beyond those described in the section entitled "Instruction Fetching" in Book I, are the following.
  - A System Reset or Machine Check interrupt may occur. The determination of whether an instruction is required by the sequential execution model is not affected by the potential occurrence of a System Reset or Machine Check interrupt. (The determination is affected by the potential occurrence of any other kind of interrupt.)
  - A context-altering instruction is executed (Chapter 10. "Synchronization Requirements for Context Alterations" on page 723). The context alteration need not take effect until the required subsequent synchronizing operation has occurred.
- hardware
  Any combination of hard-wired implementation, emulation assist, or interrupt for software assistance. In the last case, the interrupt may be to an architected location or to an implementation-dependent location. Any use of emulation assists or interrupts to implement the architecture is implementation-dependent.
- /, //, ///, ... denotes a field that is reserved in an instruction, in a register, or in an architected storage table.
- ?, ??, ???, ... denotes a field that is implementation-dependent in an instruction, in a register, or in an architected storage table.

1.3.2 Reserved Fields

Some fields of certain architected registers may be written to automatically by the processor, e.g., Reserved bits in System Registers. When the processor writes to such a register, the following rules are obeyed.

- Unless otherwise stated, no defined field other than the one(s) the processor is specifically updating is modified.
- Contents of reserved fields are either preserved by the processor or written as zero.

The reader should be aware that reading and writing of some of these registers (e.g., the MSR) can occur as a side effect of processing an interrupt and of returning from an interrupt, as well as when requested explicitly by the appropriate instruction (e.g., mtmsr instruction).

1.4 General Systems Overview

The processor or processor unit contains the sequencing and processing controls for instruction fetch, instruction execution, and interrupt action. Most implementations also contain data and instruction caches. Instructions that the processing unit can execute fall into the following classes:

- instructions executed in the Branch Processor
- instructions executed in the Fixed-Point Processor
- instructions executed in the Floating-Point Processor
- instructions executed in the Vector Processor
- instructions executed in an Auxiliary Processor
- other instructions executed by the processor

Almost all instructions executed in the Branch Processor, Fixed-Point Processor, Floating-Point Processor, and Vector Processor are nonprivileged and are described in Book I. Book I may describe additional nonprivileged instructions (e.g., Book II describes some nonprivileged instructions for cache management). Instructions executed in an Auxiliary Processor are implementation-dependent. Instructions related to the supervisor mode, control of processor resources, control of the storage hierarchy, and all other privileged instructions are described here or are implementation-dependent.

1.5 Exceptions

The following augments the exceptions defined in Book I that can be caused directly by the execution of an instruction:

- the execution of a floating-point instruction when MSR[FP]=0 (Floating-Point Unavailable interrupt)
- the execution of an instruction that causes a debug event (Debug interrupt)
- the execution of an auxiliary processor instruction when the auxiliary processor instruction is unavailable (Auxiliary Processor Unavailable interrupt)
- the execution of a Vector, SPE, or Embedded Floating-Point instruction when MSR[SPV]=0 (SPE/Embedded Floating-Point/Vector Unavailable interrupt)

1.6 Synchronization

The synchronization described in this section refers to the state of the processor that is performing the synchronization.

1.6.1 Context Synchronization

An instruction or event is context synchronizing if it satisfies the requirements listed below. Such instructions and events are collectively called context synchronizing operations. The context synchronizing operations include the isync instruction, the System Linkage instructions, the mtmsr instruction, and most interrupts (see Section 5.1).

1. The operation causes instruction dispatching (the issuance of instructions by the instruction fetching mechanism to any instruction execution mechanism) to be halted.
2. The operation is not initiated or, in the case of dnh [Category: Embedded.Enhanced Debug], isync and wait [Category: Wait], does not complete, until all instructions that precede the operation have completed to a point at which they have reported all exceptions they will cause.
3. The operation ensures that the instructions that precede the operation will complete execution in the context (privilege, relocation, storage protection, etc.) in which they were initiated.
4. If the operation directly causes an interrupt (e.g., sc directly causes a System Call interrupt) or is an interrupt, the operation is not initiated until no exception exists having higher priority than the exception associated with the interrupt (see Section 5.9, "Exception Priorities" on page 689).
5. The operation ensures that the instructions that follow the operation will be fetched and executed in the context established by the operation. (This requirement dictates that any prefetched instructions be discarded and that any effects and side effects of executing them out-of-order also be discarded, except as described in Section 4.5, "Performing Operations Out-of-Order".)

Programming Note
A context synchronizing operation is necessarily execution synchronizing; see Section 1.6.2.

Unlike the Synchronize instruction, a context synchronizing operation does not affect the order in which storage accesses are performed.

Item 2 permits a choice only for isync (and sync; see Section 1.6.2) because all other execution synchronizing operations also alter context.

1.6.2 Execution Synchronization

An instruction is execution synchronizing if it satisfies items 2 and 3 of the definition of context synchronization (see Section 1.6.1). sync is treated like isync with respect to item 2. The execution synchronizing instructions are sync, mtmsr and all context synchronizing instructions.

Programming Note
All context synchronizing instructions are execution synchronizing.

Unlike a context synchronizing operation, an execution synchronizing instruction does not ensure that the instructions following that instruction will execute in the context established by that instruction. This new context becomes effective sometime after the execution synchronizing instruction completes and before or at a subsequent context synchronizing operation.

Chapter 2. Branch Processor

    2.1 Branch Processor Overview
    2.2 Branch Processor Registers
        2.2.1 Machine State Register
    2.3 Branch Processor Instructions
    2.4 System Linkage Instructions
2.1 Branch Processor Overview

This chapter describes the details concerning the registers and the privileged instructions implemented in the Branch Processor that are not covered in Book I.

2.2 Branch Processor Registers

2.2.1 Machine State Register

The Machine State Register (MSR) is a 32-bit register. MSR bits are numbered 32 (most-significant bit) to 63 (least-significant bit). This register defines the state of the processor. The MSR can be modified by the mtmsr, rfi, rfci, rfdi [Category: Embedded.Enhanced Debug], rfmci, wrtee and wrteei instructions and by interrupts. It can be read by the mfmsr instruction.

Figure 1. Machine State Register (MSR, bits 32:63)

Below are shown the bit definitions for the Machine State Register.

Bit     Description
32      Computation Mode (CM)
        0  The processor runs in 32-bit mode.
        1  The processor runs in 64-bit mode.
33      Interrupt Computation Mode (ICM)
        On interrupt this bit is copied to MSR[CM], selecting 32-bit or 64-bit mode for interrupt handling.
        0  MSR[CM] is set to 0 (32-bit mode) when an interrupt occurs.
        1  MSR[CM] is set to 1 (64-bit mode) when an interrupt occurs.
34:36   Implementation-dependent
37      User Cache Locking Enable (UCLE) [Category: Embedded Cache Locking.User Mode]
        0  Cache Locking instructions are privileged.
        1  Cache Locking instructions can be executed in user mode (MSR[PR]=1).
        If category Embedded Cache Locking.User Mode is not supported, this bit is treated as reserved.
38      SP/Embedded Floating-Point/Vector Available (SPV)
        [Category: Signal Processing]:
        0  The processor cannot execute any SP instructions except for the brinc instruction.
        1  The processor can execute all SP instructions.
        [Category: Vector]:
        0  The processor cannot execute any Vector instruction.
        1  The processor can execute Vector instructions.
39:44   Reserved
45      Wait State Enable (WE)
        0  The processor is not in wait state and continues processing.
        1  The processor enters the wait state by ceasing to execute instructions and entering low power mode. The details of how the wait state is entered and exited, and how the processor behaves while in the wait state, are implementation-dependent.
46      Critical Enable (CE)
        0  Critical Input, Watchdog Timer, and Processor Doorbell Critical interrupts are disabled.
        1  Critical Input, Watchdog Timer, and Processor Doorbell Critical interrupts are enabled.
47      Reserved
48      External Enable (EE)
        0  External Input, Decrementer, Fixed-Interval Timer, Processor Doorbell, and Embedded Performance Monitor [Category:E.PM] interrupts are disabled.
        1  External Input, Decrementer, Fixed-Interval Timer, Processor Doorbell, and Embedded Performance Monitor [Category:E.PM] interrupts are enabled.
49      Problem State (PR)
        0  The processor is in supervisor mode, can execute any instruction, and can access any resource (e.g. GPRs, SPRs, MSR, etc.).
        1  The processor is in user mode, cannot execute any privileged instruction, and cannot access any privileged resource.
        MSR[PR] also affects storage access control, as described in Section 6.2.4.
50      Floating-Point Available (FP) [Category: Floating-Point]
        0  The processor cannot execute any floating-point instructions, including floating-point loads, stores and moves.
        1  The processor can execute floating-point instructions.
51      Machine Check Enable (ME)
        0  Machine Check interrupts are disabled.
        1  Machine Check interrupts are enabled.
52      Floating-Point Exception Mode 0 (FE0) [Category: Floating-Point]
        (See below)
53      Implementation-dependent
54      Debug Interrupt Enable (DE)
        0  Debug interrupts are disabled.
        1  Debug interrupts are enabled if DBCR0[IDM]=1.
55      Floating-Point Exception Mode 1 (FE1) [Category: Floating-Point]
        (See below)
56      Reserved
57      Reserved
58      Instruction Address Space (IS)
        0  The processor directs all instruction fetches to address space 0 (TS=0 in the relevant TLB entry).
        1  The processor directs all instruction fetches to address space 1 (TS=1 in the relevant TLB entry).
59      Data Address Space (DS)
        0  The processor directs all data storage accesses to address space 0 (TS=0 in the relevant TLB entry).
        1  The processor directs all data storage accesses to address space 1 (TS=1 in the relevant TLB entry).
60      Implementation-dependent
61      Performance Monitor Mark (PMM) [Category: Embedded.Performance Monitor]
        0  Disable statistics gathering on marked processes.
        1  Enable statistics gathering on marked processes.
        See Appendix E for additional information.
62      Reserved
63      Reserved

The Floating-Point Exception Mode bits FE0 and FE1 are interpreted as shown below. For further details see Book I.

FE0   FE1   Mode
0     0     Ignore Exceptions
0     1     Imprecise Nonrecoverable
1     0     Imprecise Recoverable
1     1     Precise

See Section 6.3, "Processor State After Reset" on page 693 for the initial state of the MSR.

Programming Note
A Machine State Register bit that is reserved may be altered by rfi/rfci/rfmci/rfdi [Category:Embedded.Enhanced Debug].

2.3 Branch Processor Instructions

2.4 System Linkage Instructions

These instructions provide the means by which a program can call upon the system to perform a service, and by which the system can return from performing a service or from processing an interrupt.

The System Call instruction is described in Book I, but only at the level required by an application programmer. A complete description of this instruction appears below.

System Call SC-form

sc

    Field:  17   ///  ///  ///  ///  //   1    /
    Bit:    0    6    11   16   20   27   30   31

    SRR0 ←iea CIA + 4
    SRR1 ← MSR
    NIA ← IVPR[0:47] || IVOR8[48:59] || 0b0000
    MSR ← new_value (see below)

The effective address of the instruction following the System Call instruction is placed into SRR0. The contents of the MSR are copied into SRR1.

Then a System Call interrupt is generated. The interrupt causes the MSR to be set as described in Section 5.6 on page 672.

The interrupt causes the next instruction to be fetched from effective address IVPR[0:47] || IVOR8[48:59] || 0b0000.

This instruction is context synchronizing.

Special Registers Altered:
    SRR0 SRR1 MSR

Return From Interrupt XL-form

rfi

    Field:  19   ///  ///  ///  50   /
    Bit:    0    6    11   16   21   31

    MSR ← SRR1
    NIA ←iea SRR0[0:61] || 0b00

The rfi instruction is used to return from a base class interrupt, or as a means of simultaneously establishing a new context and synchronizing on that new context.

The contents of SRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address SRR0[0:61] || 0b00. (Note: VLE behavior may be different; see Book VLE.) If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into the applicable save/restore register 0 by the interrupt processing mechanism (see Section 5.6 on page 672) is the address of the instruction that would have been executed next had the interrupt not occurred (i.e. the address in SRR0 at the time of the execution of the rfi).

This instruction is privileged and context synchronizing.

Special Registers Altered:
    MSR

Return From Critical Interrupt XL-form

rfci

    Field:  19   ///  ///  ///  51   /
    Bit:    0    6    11   16   21   31

    MSR ← CSRR1
    NIA ←iea CSRR0[0:61] || 0b00

The rfci instruction is used to return from a critical class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously.

The contents of CSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address CSRR0[0:61] || 0b00. (Note: VLE behavior may be different; see Book VLE.) If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into SRR0 or CSRR0 by the interrupt processing mechanism (see Section 5.6 on page 672) is the address of the instruction that would have been executed next had the interrupt not occurred (i.e. the address in CSRR0 at the time of the execution of the rfci).

This instruction is privileged and context synchronizing.

Special Registers Altered:
    MSR

Return From Debug Interrupt X-form

rfdi [Category: Embedded.Enhanced Debug]

    Field:  19   ///  ///  ///  39   /
    Bit:    0    6    11   16   21   31

    MSR ← DSRR1
    NIA ←iea DSRR0[0:61] || 0b00

The rfdi instruction is used to return from a Debug interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously.

The contents of DSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address DSRR0[0:61] || 0b00. (Note: VLE behavior may be different; see Book VLE.) If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into SRR0, CSRR0, or DSRR0 by the interrupt processing mechanism is the address of the instruction that would have been executed next had the interrupt not occurred (i.e. the address in DSRR0 at the time of the execution of the rfdi).

This instruction is privileged and context synchronizing.

Special Registers Altered:
    MSR

Return From Machine Check Interrupt XL-form

rfmci

    Field:  19   ///  ///  ///  38   /
    Bit:    0    6    11   16   21   31

    MSR ← MCSRR1
    NIA ←iea MCSRR0[0:61] || 0b00

The rfmci instruction is used to return from a Machine Check class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously.

The contents of MCSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address MCSRR0[0:61] || 0b00.
(Note: VLE behavior may be different; see Book VLE.) If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into SRR0, CSRR0, MCSRR0, or DSRR0 [Category: Embedded.Enhanced Debug] by the interrupt processing mechanism (see Section 5.6 on page 672) is the address of the instruction that would have been executed next had the interrupt not occurred (i.e. the address in MCSRR0 at the time of the execution of the rfmci).

This instruction is privileged and context synchronizing.

Special Registers Altered:
    MSR

Chapter 3. Fixed-Point Processor

3.1 Fixed-Point Processor Overview  617
3.2 Special Purpose Registers  617
3.3 Fixed-Point Processor Registers  617
3.3.1 Processor Version Register  617
3.3.2 Processor Identification Register  617
3.3.3 Software-use SPRs  618
3.3.4 External Process ID Registers [Category: Embedded.External PID]  619
3.3.4.1 External Process ID Load Context (EPLC) Register  619
3.3.4.2 External Process ID Store Context (EPSC) Register  620
3.4 Fixed-Point Processor Instructions  621
3.4.1 Move To/From System Register Instructions  621
3.4.2 External Process ID Instructions [Category: Embedded.External PID]  627

3.1 Fixed-Point Processor Overview

This chapter describes the details concerning the registers and the privileged instructions implemented in the Fixed-Point Processor that are not covered in Book I.

3.2 Special Purpose Registers

Special Purpose Registers (SPRs) are read and written using the mfspr (page 624) and mtspr (page 622) instructions. Most SPRs are defined in other chapters of this book; see the index to locate those definitions.

3.3 Fixed-Point Processor Registers

3.3.1 Processor Version Register

The Processor Version Register (PVR) is a 32-bit read-only register that contains a value identifying the version and revision level of the processor. The contents of the PVR can be copied to a GPR by the mfspr instruction. Read access to the PVR is privileged; write access is not provided.

    | Version | Revision |
    32        48        63

Figure 2. Processor Version Register

The PVR distinguishes between processors that differ in attributes that may affect software. It contains two fields.

Version     A 16-bit number that identifies the version of the processor. Different version numbers indicate major differences between processors, such as which optional facilities and instructions are supported.

Revision    A 16-bit number that distinguishes between implementations of the version. Different revision numbers indicate minor differences between processors having the same version number, such as clock rate and Engineering Change level.

Version numbers are assigned by the Power ISA Architecture process. Revision numbers are assigned by an implementation-defined process.

3.3.2 Processor Identification Register

The Processor Identification Register (PIR) is a 32-bit register that contains a value that can be used to distinguish the processor from other processors in the system. The contents of the PIR can be copied to a GPR by the mfspr instruction. Read access to the PIR is privileged; write access, if provided, is implementation-dependent.

    | PROCID |
    32      63

    Bits    Name     Description
    32:63   PROCID   Processor ID

Figure 3. Processor Identification Register

The means by which the PIR is initialized are implementation-dependent.

3.3.3 Software-use SPRs

Software-use SPRs are 64-bit registers provided for use by software. The contents of SPRGi can be read using mfspr and written into SPRGi using mtspr.
    | SPRG0 |
    | SPRG1 |
    | SPRG2 |
    | SPRG3 |
    | SPRG4 |
    | SPRG5 |
    | SPRG6 |
    | SPRG7 |
    | SPRG8 |
    | SPRG9 [Category: Embedded.Enhanced Debug] |
    0       63

Figure 4. Special Purpose Registers

Programming Note
    USPRG0 was made a 32-bit register and renamed to VRSAVE; see Book I, Section 6.3.3.

SPRG0 through SPRG2
    These 64-bit registers can be accessed only in supervisor mode.

SPRG3
    This 64-bit register can be read in supervisor mode and can be written only in supervisor mode. It is implementation-dependent whether or not this register can be read in user mode.

SPRG4 through SPRG7
    These 64-bit registers can be written only in supervisor mode. These registers can be read in supervisor and user modes.

SPRG8 through SPRG9
    These 64-bit registers can be accessed only in supervisor mode.

3.3.4 External Process ID Registers [Category: Embedded.External PID]

The External Process ID Registers provide capabilities for loading and storing General Purpose Registers and performing cache management operations using a supplied context other than the context normally used by the programming model.

Two SPRs describe the context for loading and storing using external contexts. The External Process ID Load Context (EPLC) Register provides the context for External Process ID Load instructions, and the External Process ID Store Context (EPSC) Register provides the context for External Process ID Store instructions. Each of these registers contains a PR (privilege) bit, an AS (address space) bit, and a Process ID. Changes to the EPLC or the EPSC Register require that a context synchronizing operation be performed prior to using any External Process ID instructions that use these registers.

External Process ID instructions that use the context provided by the EPLC register include lbepx, lhepx, lwepx, ldepx, dcbtep, dcbtstep, dcbfep, dcbstep, icbiep, lfdepx, evlddepx, lvepx, and lvepxl, and those that use the context provided by the EPSC register include stbepx, sthepx, stwepx, stdepx, dcbzep, stfdepx, evstddepx, stvepx, and stvepxl. Instruction definitions appear in Section 3.4.2.

System software configures the EPLC register to reflect the Process ID, AS, and PR state from the context that it wishes to perform loads from and configures the EPSC register to reflect the Process ID, AS, and PR state from the context it wishes to perform stores to. Software then issues External Process ID instructions to manipulate data as required.

When the processor executes an External Process ID Load instruction, it uses the context information in the EPLC Register instead of the normal context with respect to address translation and storage access control. EPLC[EPR] is used in place of MSR[PR], EPLC[EAS] is used in place of MSR[DS], and EPLC[EPID] is used in place of any Process ID registers implemented by the processor. Similarly, when the processor executes an External Process ID Store instruction, it uses the context information in the EPSC Register instead of the normal context with respect to address translation and storage access control. EPSC[EPR] is used in place of MSR[PR], EPSC[EAS] is used in place of MSR[DS], and EPSC[EPID] is used in place of all Process ID registers implemented by the processor. Translation occurs using the new substituted values.

If the TLB lookup is successful, the storage access control mechanism grants or denies the access using context information from EPLC[EPR] or EPSC[EPR] for loads and stores respectively. If access is not granted, a Data Storage interrupt occurs, and the ESR[EPID] bit is set to 1. If the operation was a Store, the ESR[ST] bit is also set to 1.

3.3.4.1 External Process ID Load Context (EPLC) Register

The EPLC register contains fields to provide the context for External Process ID Load instructions.

    | EPLC |
    32    63

Figure 5. External Process ID Load Context Register

These bits are interpreted as follows:

    Bit     Definition
    0       External Load Context PR Bit (EPR)
            Used in place of MSR[PR] by the storage access control mechanism when an External Process ID Load instruction is executed.
            0  Supervisor mode
            1  User mode
    1       External Load Context AS Bit (EAS)
            Used in place of MSR[DS] for translation when an External Process ID Load instruction is executed.
            0  Address space 0
            1  Address space 1
    2:17    Reserved
    18:31   External Load Context Process ID Value (EPID)
            Used in place of all Process ID register values for translation when an External Process ID Load instruction is executed.

3.3.4.2 External Process ID Store Context (EPSC) Register

The EPSC register contains fields to provide the context for External Process ID Store instructions. The field encoding is the same as the EPLC Register.

    | EPSC |
    32    63

Figure 6. External Process ID Store Context Register

These bits are interpreted as follows:

    Bits    Definition
    0       External Store Context PR Bit (EPR)
            Used in place of MSR[PR] by the storage access control mechanism when an External Process ID Store instruction is executed.
            0  Supervisor mode
            1  User mode
    1       External Store Context AS Bit (EAS)
            Used in place of MSR[DS] for translation when an External Process ID Store instruction is executed.
            0  Address space 0
            1  Address space 1
    2:17    Reserved
    18:31   External Store Context Process ID Value (EPID)
            Used in place of all Process ID register values for translation when an External Process ID Store instruction is executed.

3.4 Fixed-Point Processor Instructions

3.4.1 Move To/From System Register Instructions

The Move To Special Purpose Register and Move From Special Purpose Register instructions are described in Book I, but only at the level available to an application programmer. For example, no mention is made there of registers that can be accessed only in supervisor mode. The descriptions of these instructions given below extend the descriptions given in Book I, but do not list Special Purpose Registers that are implementation-dependent. In the descriptions of these instructions given below, the "defined" SPR numbers are the SPR numbers shown in Figure 7 and the implementation-specific SPR numbers that are implemented, and similarly for "defined" registers.

Extended mnemonics

Extended mnemonics are provided for the mtspr and mfspr instructions so that they can be coded with the SPR name as part of the mnemonic rather than as a numeric operand; see Appendix B.

Figure 7. SPR Numbers

               SPR¹              Register       Privileged       Length
    decimal    spr5:9  spr0:4    Name           mtspr   mfspr    (bits)   Cat²
    1          00000   00001     XER            no      no       64       B
    8          00000   01000     LR             no      no       64       B
    9          00000   01001     CTR            no      no       64       B
    22         00000   10110     DEC            yes     yes      32       B
    26         00000   11010     SRR0           yes     yes      64       B
    27         00000   11011     SRR1           yes     yes      64       B
    48         00001   10000     PID            yes     yes      32       E
    54         00001   10110     DECAR          yes     yes      32       E
    58         00001   11010     CSRR0          yes     yes      64       E
    59         00001   11011     CSRR1          yes     yes      32       E
    61         00001   11101     DEAR           yes     yes      64       E
    62         00001   11110     ESR            yes     yes      32       E
    63         00001   11111     IVPR           yes     yes      64       E
    256        01000   00000     VRSAVE         no      no       32       E,V
    259        01000   00011     SPRG3          -       no       64       B
    260-263    01000   001xx     SPRG[4-7]      -       no       64       E
    268        01000   01100     TB             -       no       64       B
    269        01000   01101     TBU            -       no       32       B
    272-275    01000   100xx     SPRG[0-3]      yes     yes      64       B
    276-279    01000   101xx     SPRG[4-7]      yes     yes      64       E
    282        01000   11010     EAR            yes     yes      32       EC
    284        01000   11100     TBL            yes     -        32       B
    285        01000   11101     TBU            yes     -        32       B
    286        01000   11110     PIR            -       yes      32       E
    287        01000   11111     PVR            -       yes      32       B
    304        01001   10000     DBSR           yes³    yes      32       E
    308        01001   10100     DBCR0          yes     yes      32       E
    309        01001   10101     DBCR1          yes     yes      32       E
    310        01001   10110     DBCR2          yes     yes      32       E
    312        01001   11000     IAC1           yes     yes      64       E
    313        01001   11001     IAC2           yes     yes      64       E
    314        01001   11010     IAC3           yes     yes      64       E
    315        01001   11011     IAC4           yes     yes      64       E
    316        01001   11100     DAC1           yes     yes      64       E
    317        01001   11101     DAC2           yes     yes      64       E
    318        01001   11110     DVC1           yes     yes      64       E
    319        01001   11111     DVC2           yes     yes      64       E
    336        01010   10000     TSR            yes³    yes      32       E
    340        01010   10100     TCR            yes     yes      32       E
    400-415    01100   1xxxx     IVOR[0-15]     yes     yes      32       E
    512        10000   00000     SPEFSCR        no      no       32       SP
    526        10000   01110     ATB/ATBL       -       no       64       ATB
    527        10000   01111     ATBU           -       no       32       ATB
    528        10000   10000     IVOR32         yes     yes      32       SP
    529        10000   10001     IVOR33         yes     yes      32       SP
    530        10000   10010     IVOR34         yes     yes      32       SP
    531        10000   10011     IVOR35         yes     yes      32       E.PM
    532        10000   10100     IVOR36         yes     yes      32       E.PC
    533        10000   10101     IVOR37         yes     yes      32       E.PC
    570        10001   11010     MCSRR0         yes     yes      64       E
    571        10001   11011     MCSRR1         yes     yes      32       E
    572        10001   11100     MCSR           yes     yes      64       E
    574        10001   11110     DSRR0          yes     yes      64       E.ED
    575        10001   11111     DSRR1          yes     yes      32       E.ED
    604        10010   11100     SPRG8          yes     yes      64       E
    605        10010   11101     SPRG9          yes     yes      64       E.ED
    624        10011   10000     MAS0           yes     yes      32       E.MF
    625        10011   10001     MAS1           yes     yes      32       E.MF
    626        10011   10010     MAS2           yes     yes      64       E.MF
    627        10011   10011     MAS3           yes     yes      32       E.MF
    628        10011   10100     MAS4           yes     yes      32       E.MF
    630        10011   10110     MAS6           yes     yes      32       E.MF
    633        10011   11001     PID1           yes     yes      32       E.MF
    634        10011   11010     PID2           yes     yes      32       E.MF
    688-691    10101   100xx     TLB[0-3]CFG    yes     yes      32       E.MF
    702        10101   11110     EPR            -       yes      32       EXP
    924        11100   11100     DCDBTRL        -⁴      yes      32       E.CD
    925        11100   11101     DCDBTRH        -⁴      yes      32       E.CD
    926        11100   11110     ICDBTRL        -⁵      yes      32       E.CD
    927        11100   11111     ICDBTRH        -⁵      yes      32       E.CD
    944        11101   10000     MAS7           yes     yes      32       E.MF
    947        11101   10011     EPLC           yes     yes      32       E.PD
    948        11101   10100     EPSC           yes     yes      32       E.PD
    979        11110   10011     ICDBDR         -⁵      yes      32       E.CD
    1012       11111   10100     MMUCSR0        yes     yes      32       E.MF
    1015       11111   10111     MMUCFG         yes     yes      32       E.MF

    -  This register is not defined for this instruction.
    1  Note that the order of the two 5-bit halves of the SPR number is reversed.
    2  See Section 1.3.5 of Book I.
    3  This register cannot be directly written to. Instead, bits in the register corresponding to 1 bits in (RS) can be cleared using mtspr SPR,RS.
    4  The register can be written by the dcread instruction.
    5  The register can be written by the icread instruction.
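Footnote 3 of Figure 7 describes write-one-to-clear behavior for DBSR and TSR. The following is a minimal Python sketch of that rule only, modelling the register as a plain integer. The function name `mtspr_write_one_to_clear` and the constant `MASK32` are hypothetical names chosen for illustration, and the sketch assumes a 32-bit register as listed in Figure 7; it does not model privilege checks or any other side effects.

```python
MASK32 = 0xFFFFFFFF  # DBSR and TSR are listed as 32-bit registers in Figure 7


def mtspr_write_one_to_clear(spr_value: int, rs: int) -> int:
    """Model `mtspr SPR,RS` for a write-one-to-clear SPR (hypothetical helper).

    Each 1 bit in the low-order 32 bits of RS clears the corresponding
    register bit; each 0 bit leaves the register bit unchanged.
    """
    clear_mask = rs & MASK32           # only (RS)[32:63] reaches a 32-bit SPR
    return spr_value & ~clear_mask & MASK32
```

Under this model, writing back the value just read from the register clears exactly the status bits that were observed, and writing zero changes nothing.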
All SPR numbers that are not shown above and are not implementation-specific are reserved.

Move To Special Purpose Register XFX-form

    mtspr SPR,RS

    31 | RS | spr | 467 | /
    0    6    11    21    31

    n ← spr[5:9] || spr[0:4]
    if length(SPR(n)) = 64 then
        SPR(n) ← (RS)
    else
        SPR(n) ← (RS)[32:63]

The SPR field denotes a Special Purpose Register, encoded as shown in Figure 7. The contents of register RS are placed into the designated Special Purpose Register. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RS are placed into the SPR.

For this instruction, SPRs TBL and TBU are treated as separate 32-bit registers; setting one leaves the other unaltered.

spr[0]=1 if and only if writing the register is privileged. Execution of this instruction specifying a defined and privileged register when MSR[PR]=1 causes a Privileged Instruction type Program interrupt.

Execution of this instruction specifying an SPR number that is not defined for the implementation causes either an Illegal Instruction type Program interrupt or one of the following.

    • if spr[0]=0: boundedly undefined results
    • if spr[0]=1:
        - if MSR[PR]=1: Privileged Instruction type Program interrupt
        - if MSR[PR]=0: boundedly undefined results

If the SPR number is set to a value that is shown in Figure 7 but corresponds to an optional Special Purpose Register that is not provided by the implementation, the effect of executing this instruction is the same as if the SPR number were reserved.

Special Registers Altered:
    See Figure 7

Compiler and Assembler Note
    For the mtspr and mfspr instructions, the SPR number coded in assembler language does not appear directly as a 10-bit binary number in the instruction. The number coded is split into two 5-bit halves that are reversed in the instruction, with the high-order 5 bits appearing in bits 16:20 of the instruction and the low-order 5 bits in bits 11:15.
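The half-swapping described in the Compiler and Assembler Note can be sketched in Python as follows. This is an illustrative model, not part of the architecture: the helper names `spr_field`, `spr0_bit`, `encode_mtspr`, and `encode_mfspr` are hypothetical. The sketch assumes the XFX-form layout given above (primary opcode 31 in bits 0:5, RS or RT in bits 6:10, spr in bits 11:20, extended opcode in bits 21:30, bit 31 zero), with bit 0 as the most significant bit of the 32-bit instruction word, and uses extended opcode 339 for mfspr.

```python
def spr_field(n: int) -> int:
    """Return the 10-bit spr field (instruction bits 11:20) for SPR number n.

    The low-order 5 bits of n land in bits 11:15 (spr[0:4]) and the
    high-order 5 bits in bits 16:20 (spr[5:9]), i.e. the halves are swapped.
    """
    if not 0 <= n < 1024:
        raise ValueError("SPR number must fit in 10 bits")
    return ((n & 0x1F) << 5) | (n >> 5)


def spr0_bit(n: int) -> int:
    """Return spr[0], the leftmost bit of the spr field.

    Per the mtspr/mfspr descriptions, spr[0]=1 if and only if the
    access is privileged.
    """
    return (spr_field(n) >> 9) & 1


def _encode_xfx(xo: int, reg: int, n: int) -> int:
    # opcode 31 | RS/RT | spr | extended opcode | reserved bit 31 = 0
    if not 0 <= reg < 32:
        raise ValueError("GPR number must be in 0..31")
    return (31 << 26) | (reg << 21) | (spr_field(n) << 11) | (xo << 1)


def encode_mtspr(n: int, rs: int) -> int:
    """Instruction word for `mtspr n,rs` (extended opcode 467)."""
    return _encode_xfx(467, rs, n)


def encode_mfspr(rt: int, n: int) -> int:
    """Instruction word for `mfspr rt,n` (extended opcode 339)."""
    return _encode_xfx(339, rt, n)
```

For example, `hex(encode_mtspr(26, 3))` models `mtspr SRR0,r3`, and `spr0_bit(26)` evaluates to 1, matching the "yes" entries for SRR0 in the Privileged columns of Figure 7, while `spr0_bit(1)` evaluates to 0 for XER.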
Programming Note
    For a discussion of software synchronization requirements when altering certain Special Purpose Registers, see Chapter 10. "Synchronization Requirements for Context Alterations" on page 723.

Move From Special Purpose Register XFX-form

    mfspr RT,SPR

    31 | RT | spr | 339 | /
    0    6    11    21    31

    n ← spr[5:9] || spr[0:4]
    if length(SPR(n)) = 64 then
        RT ← SPR(n)
    else
        RT ← ³²0 || SPR(n)

The SPR field denotes a Special Purpose Register, encoded as shown in Figure 7. The contents of the designated Special Purpose Register are placed into register RT. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RT receive the contents of the Special Purpose Register and the high-order 32 bits of RT are set to zero.

spr[0]=1 if and only if reading the register is privileged. Execution of this instruction specifying a defined and privileged register when MSR[PR]=1 causes a Privileged Instruction type Program interrupt.

Execution of this instruction specifying an SPR number that is not defined for the implementation causes either an Illegal Instruction type Program interrupt or one of the following.

    • if spr[0]=0: boundedly undefined results
    • if spr[0]=1:
        - if MSR[PR]=1: Privileged Instruction type Program interrupt
        - if MSR[PR]=0: boundedly undefined results

If the SPR field contains a value that is shown in Figure 7 but corresponds to an optional Special Purpose Register that is not provided by the implementation, the effect of executing this instruction is the same as if the SPR number were reserved.

Special Registers Altered:
    None

Note
    See the Notes that appear with mtspr.

Move To Device Control Register XFX-form

    mtdcr DCRN,RS

    31 | RS | dcr | 451 | /
    0    6    11    21    31

    DCRN ← dcr[0:4] || dcr[5:9]
    DCR(DCRN) ← (RS)

Let DCRN denote a Device Control Register. (The supported Device Control Registers are implementation-dependent.)

The contents of register RS are placed into the designated Device Control Register. For 32-bit Device Control Registers, the contents of bits 32:63 of (RS) are placed into the Device Control Register.

This instruction is privileged.

Special Registers Altered:
    Implementation-dependent.

Move To Device Control Register Indexed X-form

    mtdcrx RA,RS

    31 | RS | RA | /// | 387 | /
    0    6    11   16    21    31

    DCRN ← (RA)
    DCR(DCRN) ← (RS)

Let the contents of register RA denote a Device Control Register. (The supported Device Control Registers are implementation-dependent.)

The contents of register RS are placed into the designated Device Control Register. For 32-bit Device Control Registers, the contents of RS[32:63] are placed into the Device Control Register.

The specification of Device Control Registers using mtdcrx, mtdcrux (see Book I), and mtdcr is implementation-dependent. For example, mtdcr 105,r2 and mtdcrux r1,r2 (where register r1 contains the value 105) may not produce identical results on an implementation.

This instruction is privileged.

Special Registers Altered:
    Implementation-dependent.

Move From Device Control Register XFX-form

    mfdcr RT,DCRN

    31 | RT | dcr | 323 | /
    0    6    11    21    31

    DCRN ← dcr[0:4] || dcr[5:9]
    RT ← DCR(DCRN)

Let DCRN denote a Device Control Register. (The supported Device Control Registers are implementation-dependent.)

The contents of the designated Device Control Register are placed into register RT. For 32-bit Device Control Registers, the contents of the Device Control Register are placed into bits 32:63 of RT. Bits 0:31 of RT are set to 0.

This instruction is privileged.

Special Registers Altered:
    Implementation-dependent.

Move From Device Control Register Indexed X-form

    mfdcrx RT,RA

    31 | RT | RA | /// | 259 | /
    0    6    11   16    21    31

    DCRN ← (RA)
    RT ← DCR(DCRN)

Let the contents of register RA denote a Device Control Register. (The supported Device Control Registers are implementation-dependent.)

The contents of the designated Device Control Register are placed into register RT. For 32-bit Device Control Registers, the contents of bits 32:63 of the designated Device Control Register are placed into RT. Bits 0:31 of RT are set to 0.

The specification of Device Control Registers using mfdcrx and mfdcrux (see Book I) compared to the specification of Device Control Registers using mfdcr is implementation-dependent. For example, mfdcr r2,105 and mfdcrx r2,r1 (where register r1 contains the value 105) may not produce identical results on an implementation or between implementations. Also, accessing privileged Device Control Registers with mfdcrux when the processor is in supervisor mode is implementation-dependent.

This instruction is privileged.

Special Registers Altered:
    Implementation-dependent.

Move To Machine State Register X-form

    mtmsr RS

    31 | RS | /// | /// | 146 | /
    0    6    11    16    21    31

    newmsr ← (RS)[32:63]
    if MSR[CM] = 0 & newmsr[CM] = 1 then NIA[0:31] ← 0
    MSR ← newmsr

The contents of register RS[32:63] are placed into the MSR. If the processor is changing from 32-bit mode to 64-bit mode, the next instruction is fetched from ³²0 || NIA[32:63].

This instruction is privileged and execution synchronizing.

In addition, alterations to the EE or CE bits are effective as soon as the instruction completes. Thus if MSR[EE]=0 and an External interrupt is pending, executing an mtmsr that sets MSR[EE] to 1 will cause the External interrupt to be taken before the next instruction is executed, if no higher priority exception exists. Likewise, if MSR[CE]=0 and a Critical Input interrupt is pending, executing an mtmsr that sets MSR[CE] to 1 will cause the Critical Input interrupt to be taken before the next instruction is executed if no higher priority exception exists. (See Section 5.6 on page 672).

Special Registers Altered:
    MSR

Programming Note
    For a discussion of software synchronization requirements when altering certain MSR bits please refer to Chapter 10.

Move From Machine State Register X-form

    mfmsr RT

    31 | RT | /// | /// | 83 | /
    0    6    11    16    21   31

    RT ← ³²0 || MSR

The contents of the MSR are placed into bits 32:63 of register RT and bits 0:31 of RT are set to 0.

This instruction is privileged.

Special Registers Altered:
    None

Write MSR External Enable X-form

    wrtee RS

    31 | RS | /// | /// | 131 | /
    0    6    11    16    21    31

    MSR[EE] ← (RS)[48]

The content of (RS)[48] is placed into MSR[EE].

Alteration of the MSR[EE] bit is effective as soon as the instruction completes. Thus if MSR[EE]=0 and an External interrupt is pending, executing a wrtee instruction that sets MSR[EE] to 1 will cause the External interrupt to occur before the next instruction is executed, if no higher priority exception exists (Section 5.9, "Exception Priorities" on page 689).

This instruction is privileged.

Special Registers Altered:
    MSR

Write MSR External Enable Immediate X-form

    wrteei E

    31 | /// | /// | E  | /// | 163 | /
    0    6     11    16   17    21    31

    MSR[EE] ← E

The value specified in the E field is placed into MSR[EE].

Alteration of the MSR[EE] bit is effective as soon as the instruction completes. Thus if MSR[EE]=0 and an External interrupt is pending, executing a wrteei instruction that sets MSR[EE] to 1 will cause the External interrupt to occur before the next instruction is executed, if no higher priority exception exists (Section 5.9, "Exception Priorities" on page 689).

This instruction is privileged.

Special Registers Altered:
    MSR

Programming Note
    wrtee and wrteei are used to provide atomic update of MSR[EE].
    Typical usage is:

        mfmsr  Rn     #save EE in (Rn)[48]
        wrteei 0      #turn off EE
          :           #code with EE disabled
          :
        wrtee  Rn     #restore EE without altering
                      #other MSR bits that might
                      #have changed

3.4.2 External Process ID Instructions [Category: Embedded.External PID]

External Process ID instructions provide capabilities for loading and storing General Purpose Registers and performing cache management operations using a supplied context other than the context normally used by translation.

The EPLC and EPSC registers provide external contexts for performing loads and stores. The EPLC and the EPSC registers are described in Section 3.3.4.

If an Alignment interrupt, Data Storage interrupt, or Data TLB Error interrupt occurs while attempting to execute an External Process ID instruction, ESR[EPID] is set to 1, indicating that the instruction causing the interrupt was an External Process ID instruction; any other applicable ESR bits are also set.

Load Byte by External Process ID Indexed X-form

    lbepx RT,RA,RB

    31 | RT | RA | RB | 95 | /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← ⁵⁶0 || MEM(EA,1)

Let the effective address (EA) be the sum (RA|0)+(RB). The byte in storage addressed by EA is loaded into RT[56:63]. RT[0:55] are set to 0.

For lbepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPLC[EPR] is used in place of MSR[PR]
    EPLC[EAS] is used in place of MSR[DS]
    EPLC[EPID] is used in place of all Process ID registers.

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
    This instruction behaves identically to a lbzx instruction except for using the EPLC register to provide the translation context.

Load Halfword by External Process ID Indexed X-form

    lhepx RT,RA,RB

    31 | RT | RA | RB | 287 | /
    0    6    11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← ⁴⁸0 || MEM(EA,2)

Let the effective address (EA) be the sum (RA|0)+(RB). The halfword in storage addressed by EA is loaded into RT[48:63]. RT[0:47] are set to 0.

For lhepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPLC[EPR] is used in place of MSR[PR]
    EPLC[EAS] is used in place of MSR[DS]
    EPLC[EPID] is used in place of all Process ID registers.

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
    This instruction behaves identically to a lhzx instruction except for using the EPLC register to provide the translation context.

Load Word by External Process ID Indexed X-form

    lwepx RT,RA,RB

    31 | RT | RA | RB | 31 | /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← ³²0 || MEM(EA,4)

Let the effective address (EA) be the sum (RA|0)+(RB). The word in storage addressed by EA is loaded into RT[32:63]. RT[0:31] are set to 0.

For lwepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPLC[EPR] is used in place of MSR[PR]
    EPLC[EAS] is used in place of MSR[DS]
    EPLC[EPID] is used in place of all Process ID registers.

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
    This instruction behaves identically to a lwzx instruction except for using the EPLC register to provide the translation context.

Load Doubleword by External Process ID Indexed X-form

    ldepx RT,RA,RB

    31 | RT | RA | RB | 29 | /
    0    6    11   16   21   31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← MEM(EA,8)

Let the effective address (EA) be the sum (RA|0)+(RB). The doubleword in storage addressed by EA is loaded into RT.

For ldepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPLC[EPR] is used in place of MSR[PR]
    EPLC[EAS] is used in place of MSR[DS]
    EPLC[EPID] is used in place of all Process ID registers.

This instruction is privileged.

Corequisite Categories:
    64-Bit

Special Registers Altered:
    None

Programming Note
    This instruction behaves identically to a ldx instruction except for using the EPLC register to provide the translation context.

Store Byte by External Process ID Indexed X-form

    stbepx RS,RA,RB

    31 | RS | RA | RB | 223 | /
    0    6    11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA,1) ← (RS)[56:63]

Let the effective address (EA) be the sum (RA|0)+(RB). (RS)[56:63] are stored into the byte in storage addressed by EA.

For stbepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPSC[EPR] is used in place of MSR[PR]
    EPSC[EAS] is used in place of MSR[DS]
    EPSC[EPID] is used in place of all Process ID registers.

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
    This instruction behaves identically to a stbx instruction except for using the EPSC register to provide the translation context.

Store Halfword by External Process ID Indexed X-form

    sthepx RS,RA,RB

    31 | RS | RA | RB | 415 | /
    0    6    11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA,2) ← (RS)[48:63]

Let the effective address (EA) be the sum (RA|0)+(RB). (RS)[48:63] are stored into the halfword in storage addressed by EA.

For sthepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPSC[EPR] is used in place of MSR[PR]
    EPSC[EAS] is used in place of MSR[DS]
    EPSC[EPID] is used in place of all Process ID registers.

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
    This instruction behaves identically to a sthx instruction except for using the EPSC register to provide the translation context.

Store Word by External Process ID Indexed X-form

    stwepx RS,RA,RB

    31 | RS | RA | RB | 159 | /
    0    6    11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA,4) ← (RS)[32:63]

Let the effective address (EA) be the sum (RA|0)+(RB). (RS)[32:63] are stored into the word in storage addressed by EA.

For stwepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPSC[EPR] is used in place of MSR[PR]
    EPSC[EAS] is used in place of MSR[DS]
    EPSC[EPID] is used in place of all Process ID registers.

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
    This instruction behaves identically to a stwx instruction except for using the EPSC register to provide the translation context.

Store Doubleword by External Process ID Indexed X-form

    stdepx RS,RA,RB

    31 | RS | RA | RB | 157 | /
    0    6    11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA,8) ← (RS)

Let the effective address (EA) be the sum (RA|0)+(RB). (RS) is stored into the doubleword in storage addressed by EA.

For stdepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPSC[EPR] is used in place of MSR[PR]
    EPSC[EAS] is used in place of MSR[DS]
    EPSC[EPID] is used in place of all Process ID registers.

This instruction is privileged.

Corequisite Categories:
    64-Bit

Special Registers Altered:
    None

Programming Note
    This instruction behaves identically to a stdx instruction except for using the EPSC register to provide the translation context.

Data Cache Block Store by External PID X-form

    dcbstep RA,RB

    31 | /// | RA | RB | 63 | /
    0    6     11   16   21   31

Let the effective address (EA) be the sum (RA|0)+(RB).

If the block containing the byte addressed by EA is in storage that is Memory Coherence Required, a block containing the byte addressed by EA is in the data cache of any processor, and any locations in the block are considered to be modified there, then those locations are written to main storage. Additional locations in the block may be written to main storage. The block ceases to be considered modified in that data cache.

Data Cache Block Touch by External PID X-form

    dcbtep TH,RA,RB

    31 | TH | RA | RB | 319 | /
    0    6    11   16   21    31

Let the effective address (EA) be the sum (RA|0)+(RB).

The dcbtep instruction provides a hint that describes a block or data stream, or indicates the expected use thereof. A hint that the program will probably soon load from a given storage location is ignored if the location is Caching Inhibited or Guarded.

The only operation that is "caused" by the dcbtep instruction is the providing of the hint.
The actions (if any) taken by the processor in response to the hint are If the block containing the byte addressed by EA is in not considered to be "caused by" or "associated with" storage that is not Memory Coherence Required and the dcbtep instruction (e.g., dcbtep is considered not the block is in the data cache of this processor, and any to cause any data accesses). No means are provided locations in the block are considered to be modified by which software can synchronize these actions with there, those locations are written to main storage. Addi- the execution of the instruction stream. For example, tional locations in the block may be written to main stor- these actions are not ordered by the memory barrier age, and the block ceases to be considered modified in created by a sync instruction. that data cache. The dcbtep instruction may complete before the opera- The function of this instruction is independent of tion it causes has been performed. whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching The nature of the hint depends, in part, on the value of Inhibited. the TH field, as specified in the dcbt instruction in Section 3.3.2 of Book II. The instruction is treated as a Load with respect to translation, memory protection, and is treated as a The instruction is treated as a Load, except that no Write with respect to debug events. interrupt occurs if a protection violation occurs. This instruction is privileged. The instruction is privileged. For dcbstep, the normal translation mechanism is not The normal address translation mechanism is not used. used. The contents of the EPLC register are used to The contents of the EPLC register are used to provide provide the context in which translation occurs. The fol- the context in which translation occurs. 
The following lowing substitutions are made for just the translation substitutions are made for just the translation and and access control process: access control process: EPLCEPR is used in place of MSRPR EPLCEPR is used in place of MSRPR EPLCEAS is used in place of MSRDS EPLCEAS is used in place of MSRDS EPLCEPID is used in place of all Process ID regis- EPLCEPID is used in place of all Process ID regis- ters ters. Special Registers Altered: Special Registers Altered: None None Extended Mnemonics: Programming Note Extended mnemonics are provided for the Data Cache This instruction behaves identically to a dcbst Block Touch by External PID instruction so that it can instruction except for using the EPLC register to be coded with the TH value as the last operand for all provide the translation context. categories. . Extended: Equivalent to: dcbtctep RA,RB,TH dcbtep for TH values of 0b0000 - 0b0111; other TH values are invalid. Chapter 3. Fixed-Point Processor 631 Version 2.05 Extended: Equivalent to: Data Cache Block Flush by External PID dcbtdsep RA,RB,TH dcbtep for TH values of 0b0000 X-form or 0b1000 - 0b1010; other TH values are invalid. dcbfep RA,RB Programming Note 31 /// RA RB 127 / 0 6 11 16 21 31 This instruction behaves identically to a dcbt instruction except for using the EPLC register to provide the translation context. Let the effective address (EA) be the sum (RA|0)+(RB). If the block containing the byte addressed by EA is in storage that is Memory Coherence Required, a block containing the byte addressed by EA is in the data cache of any processor, and any locations in the block are considered to be modified there, then those loca- tions are written to main storage. Additional locations in the block may also be written to main storage. The block is invalidated in the data cache of all processors. 
If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required, a block containing the byte addressed by EA is in the data cache of this processor, and any locations in the block are considered to be modified there, then those locations are written to main storage. Additional locations in the block may also be written to main storage. The block is invalidated in the data cache of this processor.

The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited.

The instruction is treated as a Load with respect to translation, memory protection, and is treated as a Write with respect to debug events.

This instruction is privileged.

The normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPLC_EPR is used in place of MSR_PR
    EPLC_EAS is used in place of MSR_DS
    EPLC_EPID is used in place of all Process ID registers

Special Registers Altered:
    None

Programming Note
    This instruction behaves identically to a dcbf instruction except for using the EPLC register to provide the translation context.


Data Cache Block Touch for Store by External PID X-form

dcbtstep TH,RA,RB

    | 31 | TH | RA | RB | 255 | / |
    0    6    11   16   21    31

Let the effective address (EA) be the sum (RA|0)+(RB).

The dcbtstep instruction provides a hint that the program will probably soon store to the block containing the byte addressed by EA.

If the Cache Specification category is supported, the nature of the hint depends on the value of the TH field, as specified in Section 3.3.2 of Book II. If the Cache Specification category is not supported, the TH field is treated as a reserved field.

If the block is in a storage location that is Caching Inhibited or Guarded, then the hint is ignored.

The only operation that is "caused" by the dcbtstep instruction is the providing of the hint. The actions (if any) taken by the processor in response to the hint are not considered to be "caused by" or "associated with" the dcbtstep instruction (e.g., dcbtstep is considered not to cause any data accesses). No means are provided by which software can synchronize these actions with the execution of the instruction stream. For example, these actions are not ordered by the memory barrier created by a sync instruction.

The dcbtstep instruction may complete before the operation it causes has been performed.

The instruction is treated as a Load, except that no interrupt occurs if a protection violation occurs.

The instruction is privileged.

The normal address translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPLC_EPR is used in place of MSR_PR
    EPLC_EAS is used in place of MSR_DS
    EPLC_EPID is used in place of all Process ID registers.

Special Registers Altered:
    None

Extended Mnemonics:

Extended mnemonics are provided for the Data Cache Block Touch for Store by External PID instruction so that it can be coded with the TH value as the last operand for all categories.

    Extended:               Equivalent to:
    dcbtstctep RA,RB,TH     dcbtstep for TH values of 0b0000 - 0b0111; other TH values are invalid.

Programming Note
    This instruction behaves identically to a dcbtst instruction except for using the EPLC register to provide the translation context.


Instruction Cache Block Invalidate by External PID X-form

icbiep RA,RB

    | 31 | /// | RA | RB | 991 | / |
    0    6     11   16   21    31

Let the effective address (EA) be the sum (RA|0)+(RB).

If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the instruction cache of any processor, the block is invalidated in those instruction caches.

If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and a block containing the byte addressed by EA is in the instruction cache of this processor, the block is invalidated in that instruction cache.

The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited.

The instruction is treated as a Load.

This instruction is privileged.

For icbiep, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPLC_EPR is used in place of MSR_PR
    EPLC_EAS is used in place of MSR_DS
    EPLC_EPID is used in place of all Process ID registers


Data Cache Block set to Zero by External PID X-form

dcbzep RA,RB

    | 31 | /// | RA | RB | 1023 | / |
    0    6     11   16   21     31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    n ← block size (bytes)
    m ← log2(n)
    ea ← EA_0:63-m || ᵐ0
    MEM(ea, n) ← ⁿ0x00

Let the effective address (EA) be the sum (RA|0)+(RB). All bytes in the block containing the byte addressed by EA are set to zero.

This instruction is treated as a Store.

This instruction is privileged.

The normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPSC_EPR is used in place of MSR_PR
    EPSC_EAS is used in place of MSR_DS
    EPSC_EPID is used in place of all Process ID registers

Special Registers Altered:
    None

Programming Note
    See the Programming Notes for the dcbz instruction.
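The address arithmetic in the dcbzep RTL can be traced with a short Python sketch. It is an illustration only, not part of the architecture: the function names, the list-of-integers model of the GPRs, and the 64-byte block size are assumptions made for the example (the block size is implementation-dependent), and 64-bit mode is assumed.

```python
MASK64 = 2**64 - 1  # effective addresses are modelled as 64-bit values


def effective_address(gpr, ra, rb):
    """EA = (RA|0) + (RB): an RA field of 0 contributes the value 0, not GPR 0."""
    b = 0 if ra == 0 else gpr[ra]
    return (b + gpr[rb]) & MASK64


def block_containing(ea, block_size):
    """Return (ea_aligned, n) for the block that contains byte address ea.

    n is the block size in bytes, m = log2(n), and ea_aligned is EA with its
    low-order m bits replaced by zeros.
    """
    if block_size < 1 or block_size & (block_size - 1):
        raise ValueError("block size must be a power of two")
    m = block_size.bit_length() - 1
    return (ea >> m) * block_size, block_size


# Hypothetical register contents, for illustration only.
gpr = [0] * 32
gpr[0] = 0xDEAD   # ignored when the RA field is 0
gpr[4] = 0x1000
gpr[5] = 0x7B

ea = effective_address(gpr, 4, 5)      # 0x1000 + 0x7B
start, n = block_containing(ea, 64)    # assumed 64-byte block
print(hex(ea), hex(start), n)
```

With these values the bytes that would be zeroed are the 64 bytes starting at the aligned address, regardless of where inside the block EA falls.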
Programming Note
    dcbzep behaves identically to a dcbz instruction except for using the EPSC register to provide the translation context.

For icbiep:

Special Registers Altered:
    None

Programming Note
    icbiep behaves identically to an icbi instruction except for using the EPLC register to provide the translation context.


Load Floating-Point Double by External Process ID Indexed X-form

lfdepx FRT,RA,RB

    | 31 | FRT | RA | RB | 607 | / |
    0    6     11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    FRT ← MEM(EA,8)

Let the effective address (EA) be the sum (RA|0)+(RB). The doubleword in storage addressed by EA is loaded into FRT.

For lfdepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPLC_EPR is used in place of MSR_PR
    EPLC_EAS is used in place of MSR_DS
    EPLC_EPID is used in place of all Process ID registers

This instruction is privileged.

An attempt to execute lfdepx while MSR_FP=0 will cause a Floating-Point Unavailable interrupt.


Store Floating-Point Double by External Process ID Indexed X-form

stfdepx FRS,RA,RB

    | 31 | FRS | RA | RB | 735 | / |
    0    6     11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA,8) ← (FRS)

Let the effective address (EA) be the sum (RA|0)+(RB). (FRS) is stored into the doubleword in storage addressed by EA.

For stfdepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPSC_EPR is used in place of MSR_PR
    EPSC_EAS is used in place of MSR_DS
    EPSC_EPID is used in place of all Process ID registers

This instruction is privileged.

An attempt to execute stfdepx while MSR_FP=0 will cause a Floating-Point Unavailable interrupt.
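The substitution rule shared by the External PID instructions can be summarized in a short Python sketch. This is only an illustration under stated assumptions: the Context tuple, the function name, and the way the EPLC or EPSC fields are passed in are hypothetical, and the register table is limited to the instructions described in this section.

```python
from collections import namedtuple

# The three values replaced during translation and access control.
# The field names are illustrative, not architected names.
Context = namedtuple("Context", ["pr", "address_space", "pid"])

# Register that supplies the context, as stated in each instruction description.
EXTERNAL_PID_REGISTER = {
    **dict.fromkeys(
        ["lwepx", "ldepx", "lfdepx", "evlddepx", "lvepx", "lvepxl",
         "dcbstep", "dcbtep", "dcbfep", "dcbtstep", "icbiep"], "EPLC"),
    **dict.fromkeys(
        ["stbepx", "sthepx", "stwepx", "stdepx", "stfdepx", "evstddepx",
         "stvepx", "stvepxl", "dcbzep"], "EPSC"),
}


def translation_context(msr_pr, msr_ds, pid, external=None):
    """Return the context used to translate a data access.

    external is None for an ordinary access. For an External PID access it is
    the (EPR, EAS, EPID) triple taken from EPLC or EPSC, which then stands in
    for MSR_PR, MSR_DS, and all Process ID registers.
    """
    if external is None:
        return Context(msr_pr, msr_ds, pid)
    epr, eas, epid = external
    return Context(epr, eas, epid)


ordinary = translation_context(msr_pr=0, msr_ds=0, pid=7)
external = translation_context(msr_pr=0, msr_ds=0, pid=7, external=(1, 1, 42))
print(ordinary)
print(external, EXTERNAL_PID_REGISTER["dcbzep"])
```

The table makes one detail of the descriptions easy to see: the cache touch, store, flush, and invalidate forms take their context from EPLC, while dcbzep, which is treated as a Store, takes it from EPSC.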
For both lfdepx and stfdepx:

Corequisite Categories:
    Floating-Point

Special Registers Altered:
    None

Programming Note
    lfdepx behaves identically to a lfdx instruction except for using the EPLC register to provide the translation context.

Programming Note
    stfdepx behaves identically to a stfdx instruction except for using the EPSC register to provide the translation context.


Vector Load Doubleword into Doubleword by External Process ID Indexed EVX-form

evlddepx RT,RA,RB

    | 31 | RT | RA | RB | 285 |
    0    6    11   16   21  31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    RT ← MEM(EA,8)

Let the effective address (EA) be the sum (RA|0)+(RB). The doubleword in storage addressed by EA is loaded into RT.

For evlddepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPLC_EPR is used in place of MSR_PR
    EPLC_EAS is used in place of MSR_DS
    EPLC_EPID is used in place of all Process ID registers

This instruction is privileged.

An attempt to execute evlddepx while MSR_SPV=0 will cause an SPE Unavailable interrupt.

Corequisite Categories:
    Signal Processing Engine

Special Registers Altered:
    None

Programming Note
    This instruction behaves identically to a evlddx instruction except for using the EPLC register to provide the translation context.


Vector Store Doubleword into Doubleword by External Process ID Indexed EVX-form

evstddepx RS,RA,RB

    | 31 | RS | RA | RB | 413 |
    0    6    11   16   21  31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA,8) ← (RS)

Let the effective address (EA) be the sum (RA|0)+(RB). (RS) is stored into the doubleword in storage addressed by EA.

For evstddepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPSC_EPR is used in place of MSR_PR
    EPSC_EAS is used in place of MSR_DS
    EPSC_EPID is used in place of all Process ID registers

This instruction is privileged.

An attempt to execute evstddepx while MSR_SPV=0 will cause an SPE Unavailable interrupt.

Corequisite Categories:
    Signal Processing Engine

Special Registers Altered:
    None

Programming Note
    This instruction behaves identically to a evstddx instruction except for using the EPSC register to provide the translation context.


Load Vector by External Process ID Indexed X-form

lvepx VRT,RA,RB

    | 31 | VRT | RA | RB | 295 | / |
    0    6     11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    VRT ← MEM(EA & 0xFFFF_FFFF_FFFF_FFF0, 16)

Let the effective address (EA) be the sum (RA|0)+(RB). The quadword in storage addressed by the result of EA ANDed with 0xFFFF_FFFF_FFFF_FFF0 is loaded into VRT.

For lvepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPLC_EPR is used in place of MSR_PR
    EPLC_EAS is used in place of MSR_DS
    EPLC_EPID is used in place of all Process ID registers

This instruction is privileged.

An attempt to execute lvepx while MSR_SPV=0 will cause a Vector Unavailable interrupt.

Corequisite Categories:
    Vector

Special Registers Altered:
    None

Programming Note
    This instruction behaves identically to a lvx instruction except for using the EPLC register to provide the translation context.


Load Vector by External Process ID Indexed LRU X-form

lvepxl VRT,RA,RB

    | 31 | VRT | RA | RB | 263 | / |
    0    6     11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    VRT ← MEM(EA & 0xFFFF_FFFF_FFFF_FFF0, 16)
    mark_as_not_likely_to_be_needed_again_anytime_soon(EA)

Let the effective address (EA) be the sum (RA|0)+(RB). The quadword in storage addressed by the result of EA ANDed with 0xFFFF_FFFF_FFFF_FFF0 is loaded into VRT.

lvepxl provides a hint that the quadword in storage addressed by EA will probably not be needed again by the program in the near future.

For lvepxl, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPLC_EPR is used in place of MSR_PR
    EPLC_EAS is used in place of MSR_DS
    EPLC_EPID is used in place of all Process ID registers

This instruction is privileged.

An attempt to execute lvepxl while MSR_SPV=0 will cause a Vector Unavailable interrupt.

Corequisite Categories:
    Vector

Special Registers Altered:
    None

Programming Note
    See the Programming Notes for the lvxl instruction in Section 6.7.2 of Book I.

Programming Note
    This instruction behaves identically to a lvxl instruction except for using the EPLC register to provide the translation context.


Store Vector by External Process ID Indexed X-form

stvepx VRS,RA,RB

    | 31 | VRS | RA | RB | 807 | / |
    0    6     11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA & 0xFFFF_FFFF_FFFF_FFF0, 16) ← (VRS)

Let the effective address (EA) be the sum (RA|0)+(RB). The contents of VRS are stored into the quadword in storage addressed by the result of EA ANDed with 0xFFFF_FFFF_FFFF_FFF0.

For stvepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPSC_EPR is used in place of MSR_PR
    EPSC_EAS is used in place of MSR_DS
    EPSC_EPID is used in place of all Process ID registers

This instruction is privileged.

An attempt to execute stvepx while MSR_SPV=0 will cause a Vector Unavailable interrupt.

Corequisite Categories:
    Vector

Special Registers Altered:
    None

Programming Note
    This instruction behaves identically to a stvx instruction except for using the EPSC register to provide the translation context.


Store Vector by External Process ID Indexed LRU X-form

stvepxl VRS,RA,RB

    | 31 | VRS | RA | RB | 775 | / |
    0    6     11   16   21    31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    MEM(EA & 0xFFFF_FFFF_FFFF_FFF0, 16) ← (VRS)
    mark_as_not_likely_to_be_needed_again_anytime_soon(EA)

Let the effective address (EA) be the sum (RA|0)+(RB). The contents of VRS are stored into the quadword in storage addressed by the result of EA ANDed with 0xFFFF_FFFF_FFFF_FFF0.

The stvepxl instruction provides a hint that the quadword addressed by EA will probably not be needed again by the program in the near future.

For stvepxl, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process:

    EPSC_EPR is used in place of MSR_PR
    EPSC_EAS is used in place of MSR_DS
    EPSC_EPID is used in place of all Process ID registers

This instruction is privileged.

An attempt to execute stvepxl while MSR_SPV=0 will cause a Vector Unavailable interrupt.

Corequisite Categories:
    Vector

Special Registers Altered:
    None

Programming Note
    See the Programming Notes for the lvxl instruction in Section 6.7.2 of Book I.

Programming Note
    This instruction behaves identically to a stvxl instruction except for using the EPSC register to provide the translation context.


Chapter 4. Storage Control
Chapter contents:

    4.1 Storage Addressing
    4.2 Storage Exceptions
    4.3 Instruction Fetch
        4.3.1 Implicit Branch
        4.3.2 Address Wrapping Combined with Changing MSR Bit CM
    4.4 Data Access
    4.5 Performing Operations Out-of-Order
    4.6 Invalid Real Address
    4.7 Storage Control
        4.7.1 Storage Control Registers
            4.7.1.1 Process ID Register
            4.7.1.2 Translation Lookaside Buffer
        4.7.2 Page Identification
        4.7.3 Address Translation
        4.7.4 Storage Access Control
            4.7.4.1 Execute Access
            4.7.4.2 Write Access
            4.7.4.3 Read Access
            4.7.4.4 Storage Access Control Applied to Cache Management Instructions
            4.7.4.5 Storage Access Control Applied to String Instructions
        4.7.5 TLB Management
    4.8 Storage Control Attributes
        4.8.1 Guarded Storage
            4.8.1.1 Out-of-Order Accesses to Guarded Storage
        4.8.2 User-Definable
        4.8.3 Storage Control Bits
            4.8.3.1 Storage Control Bit Restrictions
            4.8.3.2 Altering the Storage Control Bits
    4.9 Storage Control Instructions
        4.9.1 Cache Management Instructions
        4.9.2 Cache Locking [Category: Embedded Cache Locking]
            4.9.2.1 Lock Setting and Clearing
            4.9.2.2 Error Conditions
                4.9.2.2.1 Overlocking
                4.9.2.2.2 Unable-to-lock and Unable-to-unlock Conditions
            4.9.2.3 Cache Locking Instructions
        4.9.3 Synchronize Instruction
        4.9.4 Lookaside Buffer Management
            4.9.4.1 TLB Management Instructions


4.1 Storage Addressing

A program references storage using the effective address computed by the processor when it executes a Load, Store, Branch, or Cache Management instruction, or when it fetches the next sequential instruction. The effective address is translated to a real address according to procedures described in Section 4.7.2 and in Section 4.7.3. The real address that results from the respective translations is used to access main storage.

For a complete discussion of storage addressing and effective address calculation, see Section 1.10 of Book I.


4.2 Storage Exceptions

A storage exception results when the sequential execution model requires that a storage access be performed but the access is not permitted (e.g., is not permitted by the storage protection mechanism), the access cannot be performed because the effective address cannot be translated to a real address, or the access matches some tracking mechanism criteria (e.g., Data Address Breakpoint).

In certain cases a storage exception may result in the "restart" of (re-execution of at least part of) a Load or Store instruction. See Section 2.1 of Book II and Section 5.7 on page 686 in this Book.


4.3 Instruction Fetch

The effective address for an instruction fetch is processed under control of MSR_IS. The Address Translation mechanism is described beginning in Section 4.7.2.

4.3.1 Implicit Branch

Explicitly altering certain MSR bits (using mtmsr), or explicitly altering TLB entries, certain System Registers and possibly other implementation-dependent registers, may have the side effect of changing the addresses, effective or real, from which the current instruction stream is being fetched. This side effect is called an implicit branch. For example, an mtmsr instruction that changes the value of MSR_CM may change the real address from which the current instruction stream is being fetched. The MSR bits and System Registers (excluding implementation-dependent registers) for which alteration can cause an implicit branch are indicated as such in Chapter 10. "Synchronization Requirements for Context Alterations" on page 723. Implicit branches are not supported by the Power ISA. If an implicit branch occurs, the results are boundedly undefined.

4.3.2 Address Wrapping Combined with Changing MSR Bit CM

If the current instruction is at effective address 2^32 - 4 and is an mtmsr instruction that changes the contents of MSR_CM, the effective address of the next sequential instruction is undefined.

Programming Note
    In the case described in the preceding paragraph, if an interrupt occurs before the next sequential instruction is executed, the contents of SRR0, CSRR0, or MCSRR0, as appropriate to the interrupt, are undefined.


4.4 Data Access

The effective address for a data access is processed under control of MSR_DS. The Address Translation mechanism is described beginning in Section 4.7.2.

Storage control attributes may also affect instruction fetch.


4.5 Performing Operations Out-of-Order

An operation is said to be performed "in-order" if, at the time that it is performed, it is known to be required by the sequential execution model. An operation is said to be performed "out-of-order" if, at the time that it is performed, it is not known to be required by the sequential execution model.

Operations are performed out-of-order by the processor on the expectation that the results will be needed by an instruction that will be required by the sequential execution model. Whether the results are really needed is contingent on everything that might divert the control flow away from the instruction, such as Branch, Trap, System Call, and Return From Interrupt instructions, and interrupts, and on everything that might change the context in which the instruction is executed.

Typically, the processor performs operations out-of-order when it has resources that would otherwise be idle, so the operation incurs little or no cost. If subsequent events such as branches or interrupts indicate that the operation would not have been performed in the sequential execution model, the processor abandons any results of the operation (except as described below).

In the remainder of this section, including its subsections, "Load instruction" includes the Cache Management and other instructions that are stated in the instruction descriptions to be "treated as a Load", and similarly for "Store instruction".

A data access that is performed out-of-order may correspond to an arbitrary Load or Store instruction (e.g., a Load or Store instruction that is not in the instruction stream being executed). Similarly, an instruction fetch that is performed out-of-order may be for an arbitrary instruction (e.g., the aligned word at an arbitrary location in instruction storage).

Most operations can be performed out-of-order, as long as the machine appears to follow the sequential execution model. Certain out-of-order operations are restricted, as follows.

- Stores
  Stores are not performed out-of-order (even if the Store instructions that caused them were executed out-of-order).

- Accessing Guarded Storage
  The restrictions for this case are given in Section 4.8.1.1.

The only permitted side effects of performing an operation out-of-order are the following.

- A Machine Check that could be caused by in-order execution may occur out-of-order.

- Non-Guarded storage locations that could be fetched into a cache by in-order fetching or execution of an arbitrary instruction may be fetched out-of-order into that cache.
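The boundary case in Section 4.3.2 can be made concrete with a small Python sketch. It relies on assumptions that are not restated in this section: that MSR_CM = 0 selects 32-bit mode and MSR_CM = 1 selects 64-bit mode, and that in 32-bit mode the high-order 32 bits of an instruction fetch address are zero. The function name is hypothetical.

```python
MASK64 = 2**64 - 1
MASK32 = 2**32 - 1


def next_sequential_ea(cia, cm):
    """Effective address of the instruction after the one at cia.

    cm models MSR_CM: 0 is taken to mean 32-bit mode, 1 to mean 64-bit mode.
    """
    nia = (cia + 4) & MASK64
    if cm == 0:
        nia &= MASK32   # address arithmetic wraps at 2**32 in 32-bit mode
    return nia


cia = 2**32 - 4
print(hex(next_sequential_ea(cia, cm=0)))   # successor if evaluated in 32-bit mode
print(hex(next_sequential_ea(cia, cm=1)))   # successor if evaluated in 64-bit mode
```

Under these assumptions the same instruction address has two different successors, one per mode. An mtmsr at that address that changes MSR_CM sits exactly on this boundary, which is the case the architecture leaves undefined.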
4.6 Invalid Real Address

A storage access (including an access that is performed out-of-order; see Section 4.5) may cause a Machine Check if the accessed storage location contains an uncorrectable error or does not exist. See Section 5.6.2 on page 674.


4.7 Storage Control

This section describes the address translation facility, access control, and storage control attributes.

Demand-paged virtual memory is supported, as well as a variety of other management schemes that depend on precise control of effective-to-real address translation and flexible memory protection. Translation misses and protection faults cause precise exceptions. Sufficient information is available to correct the fault and restart the faulting instruction.

The effective address space is divided into pages. The page represents the granularity of effective address translation, access control, and storage control attributes. Up to sixteen page sizes (1KB, 4KB, 16KB, 64KB, 256KB, 1MB, 4MB, 16MB, 64MB, 256MB, 1GB, 4GB, 16GB, 64GB, 256GB, 1TB) may be simultaneously supported. In order for an effective to real translation to exist, a valid entry for the page containing the effective address must be in the Translation Lookaside Buffer (TLB). Addresses for which no TLB entry exists cause TLB Miss exceptions.

4.7.1 Storage Control Registers

In addition to the registers described below, the Machine State Register provides the IS and DS bits, that specify which of the two address spaces the respective instruction or data storage accesses are directed towards. MSR_PR bit is also used by the storage access control mechanism.

4.7.1.1 Process ID Register

The Process ID Register (PID) is a 32-bit register. Process ID Register bits are numbered 32 (most-significant bit) to 63 (least-significant bit). The Process ID Register provides a value that is used to construct a virtual address for accessing storage.

The Process ID Register can be read using mfspr and can be written using mtspr. An implementation may opt to implement only the least-significant n bits of the Process ID Register, where 0 ≤ n ≤ 32, and n must be the same as the number of implemented bits in the TID field of the TLB entry. The most-significant 32-n bits of the Process ID Register are treated as reserved.

Some implementations may support more than one Process ID Register. See User's Manual for the implementation.

4.7.1.2 Translation Lookaside Buffer

The Translation Lookaside Buffer (TLB) is the hardware resource that controls translation, protection, and storage control attributes. The organization of the TLB (e.g. unified versus separate instruction and data, hierarchies, associativity, number of entries, etc.) is implementation-dependent. Thus, the software for updating the TLB is also implementation-dependent. For the purposes of this discussion, a unified TLB organization is assumed. The differences for an implementation with separate instruction and data TLBs are for the most part obvious (e.g. separate instructions or separate index ranges for reading, writing, searching, and invalidating each TLB). For details on how to synchronize TLB updates with instruction execution see Chapter 10.

Maintenance of TLB entries is under software control. System software determines TLB entry replacement strategy and the format and use of any page state information. The TLB entry contains all the information required to identify the page, to specify the translation, to specify access controls, and to specify the storage control attributes. The format of the TLB entry is implementation-dependent.

While the TLB is managed by software, an implementation may include partial or full hardware assist for TLB management (e.g. support of the Server environment's virtual memory architecture). However, such implementations should be able to disable such support with implementation-dependent software or hardware configuration mechanisms.

A TLB entry is written by copying information from a GPR or other implementation-dependent source, using a series of tlbwe instructions (see page 660). A TLB entry is read by copying information to a GPR or other implementation-dependent target, using a series of tlbre instructions (see page 658). Software can also search for specific TLB entries using the tlbsx instruction (see page 659). Writing, reading and searching the TLB is implementation-dependent.

Each TLB entry describes a page that is eligible for translation and access controls. Fields in the TLB entry fall into four categories:

- Page identification fields (information required to identify the page to the hardware translation mechanism).
- Address translation fields
- Access control fields
- Storage attribute fields

While the fields in the TLB entry are required, no particular TLB entry format is formally specified. The tlbre and tlbwe instructions provide the ability to read or write portions of individual entries. Below are shown the field definitions for the TLB entry.

Page Identification Fields

    Name   Description
    EPN    Effective Page Number (up to 54 bits)
           Bits 0:n-1 of the EPN field are compared to bits 0:n-1 of the effective address (EA) of the storage access (where n = 64 - log2(page size in bytes) and page size is

Translation Field

    Name   Description
    RPN    Real Page Number (up to 54 bits)
           Bits 0:n-1 of the RPN field are used to replace bits 0:n-1 of the effective address to produce the real address for the storage access (where n = 64 - log2(page size in bytes) and page size is specified by the SIZE field of the TLB entry). Software must set unused low-order RPN bits (i.e. bits n:53) to 0.
See Sec- specified by the SIZE field of the TLB entry). tion 4.7.3. See Table 1. Note: Bits X:Y of the RPN field may be imple- Note: Bits X:Y of the EPN field may be imple- mented, where X 0 and 53 Y. The num- mented, where X=0 or X=32, and Y 353. ber of bits implemented for EPN are not The number of bits implemented for EPN required to be the same number of bits as are not required to be the same number of are implemented for RPN. bits as are implemented for RPN. TS Translation Address Space This bit indicates the address space this TLB entry is associated with. For instruction stor- Storage Control Bits (see Section 4.8.3 on page 650) age accesses, MSRIS must match the value of TS in the TLB entry for that TLB entry to Name Description provide the translation. Likewise, for data W Write-Through Required See Section 1.6.1 storage accesses, MSRDS must match the of Book II. value of TS in the TLB entry. For tlbsx and I Caching Inhibited See Section 1.6.2 of tlbivax instructions, an implementation- Book II. dependent source provides the address M Memory Coherence Required See space specification that must match the Section 1.6.3 of Book II. value of TS. G Guarded See Section 1.6.4 of Book II and SIZE Page Size Section 4.8.1. The SIZE field specifies the size of the page E Endian Mode See Section 1.10.1 of Book I associated with the TLB entry as 4SIZEKB, and Section 1.6.5 of Book II. where 0 SIZE 15. Implementations may U0:U3 User-Definable Storage Control implement any one or more of these page Attributes See Section 4.8.2. sizes. See Table 1. Specifies implementation-dependent and sys- TID Translation ID (implementation-dependent tem-dependent storage control attributes for size) the page associated with the TLB entry. Field used to identify a shared page (TID=0) or VLE Variable Length Encoding [Category: VLE] the owner's process ID of a private page See Section 4.8.3 and Chapter 1 of Book (TID0). See Section 4.7.2. VLE. 
V Valid This bit indicates that this TLB entry is valid and may be used for translation. The Valid bit for a given entry can be set or cleared Access Control Fields with a tlbwe instruction; alternatively, the Name Description Valid bit for an entry may be cleared by a tlbivax instruction. UX User State Execute Enable See Section 4.7.4.1. 0 Instruction fetch and execution is not permit- ted from this page while MSRPR=1 and will cause an Execute Access Control exception type Instruction Storage interrupt. 1 Instruction fetch and execution is permitted from this page while MSRPR=1. 642 Power ISATM III-E Version 2.05 SX Supervisor State Execute Enable See Sec- 4.7.2 Page Identification tion 4.7.4.1. 0 Instruction fetch and execution is not permit- Instruction effective addresses are generated for ted from this page while MSRPR=0 and will sequential instruction fetches and for addresses that cause an Execute Access Control exception correspond to a change in program flow (branches, type Instruction Storage interrupt. interrupts). Data effective addresses are generated by 1 Instruction fetch and execution is permitted Load, Store, and Cache Management instructions. TLB from this page while MSRPR=1. Management instructions generate effective addresses UW User State Write Enable See Section to determine the presence of or to invalidate a specific 4.7.4.2. TLB entry associated with that address. 0 Store operations, including dcba dcbz, and The Valid (V) bit, Effective Page Number (EPN) field, dcbzep are not permitted to this page when Translation Space Identifier (TS) bit, Page Size (SIZE) MSRPR=1 and will cause a Write Access field, and Translation ID (TID) field of a particular TLB Control exception. Except as noted in entry identify the page associated with that TLB entry. 
Table 3 on page 648, a Write Access Control Except as noted, all comparisons must succeed to vali- exception will cause a Data Storage inter- date this entry for subsequent translation and access rupt. control processing. Failure to locate a matching TLB 1 Store operations, including dcba, dcbz, and entry based on this criteria for instruction fetches will dcbzep are permitted to this page when result in an Instruction TLB Miss exception type Instruc- MSRPR=1. tion TLB Error interrupt. Failure to locate a matching SW Supervisor State Write Enable See Section TLB entry based on this criteria for data storage 4.7.4.2. accesses will result in a Data TLB Miss exception which 0 Store operations, including dcba, dcbi, may result in a Data TLB Error interrupt. Figure 8 on dcbz, and dcbzep are not permitted to this page 644 illustrates the criteria for a virtual address to page when MSRPR=0. Store operations, match a specific TLB entry. including dcbi, dcbz, and dcbzep, will cause a Write Access Control exception. There are two address spaces, one typically associated Except as noted in Table 3 on page 648, a with interrupt-related storage accesses and one typi- Write Access Control exception will cause a cally associated with non-interrupt-related storage Data Storage interrupt. accesses. There are two bits in the Machine State Reg- 1 Store operations, including dcba, dcbi, ister, the Instruction Address Space bit (IS) and the dcbz, and dcbzep, are permitted to this Data Address Space bit (DS), that control which page when MSRPR=0. address space instruction and data storage accesses, UR User State Read Enable See Section respectively, are performed in, and a bit in the TLB 4.7.4.3. entry (TS) that specifies which address space that TLB 0 Load operations (including load-class Cache entry is associated with. 
Management instructions) are not permitted Load, Store, Cache Management, Branch, tlbsx, and from this page when MSRPR=1 and will tlbivax instructions and next-sequential-instruction cause a Read Access Control exception. fetches produce a 64-bit effective address. The virtual Except as noted in Table 3 on page 648, a address space is extended from this 64-bit effective Read Access Control exception will cause a address space by prepending a one-bit address space Data Storage interrupt. identifier and a process identifier. For instruction 1 Load operations (including load-class Cache fetches, the address space identifier is provided by Management instructions) are permitted MSRIS and the process identifier is provided by the from this page when MSRPR=1. contents of the Process ID Register. For data storage SR Supervisor State Read Enable See Section accesses, the address space identifier is provided by 4.7.4.3. the MSRDS and the process identifier is provided by the 0 Load operations (including load-class Cache contents of the Process ID Register. For tlbsx, and Management instructions) are not permitted tlbivax instructions, the address space identifier and from this page when MSRPR=0 and will the process identifier are provided by implementation- cause a Read Access Control exception. dependent sources. Except as noted in Table 3 on page 648, a Read Access Control exception will cause a This virtual address is used to locate the associated Data Storage interrupt. entry in the TLB. The address space identifier, the pro- 1 Load operations (including load-class Cache cess identifier, and the effective address of the storage Management instructions) are permitted access are compared to the Translation Address from this page when MSRPR=0. Space bit (TS), the Translation ID field (TID), and the value in the Effective Page Number field (EPN), respectively, of each TLB entry. Chapter 4. 
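As an illustration only, the page identification rules of this section can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not an architected algorithm: the TlbEntry class, its field names, and the choice to hold the EPN as a page-aligned 64-bit effective address (epn_base) are hypothetical conveniences for the example. The sketch checks the V bit, compares TS with the address space identifier, accepts either a TID equal to the process identifier or TID=0 (shared page), and compares the high-order n = 64 - log2(page size in bytes) bits of the effective address.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # effective addresses are 64 bits wide


@dataclass
class TlbEntry:
    """Hypothetical software model of the page identification fields."""
    v: int         # Valid bit
    ts: int        # Translation Address Space bit
    tid: int       # Translation ID; 0 identifies a shared page
    size: int      # SIZE field, 0..15
    epn_base: int  # assumption: EPN kept as a page-aligned 64-bit EA


def page_size_bytes(size: int) -> int:
    """Page size is 4**SIZE KB for 0 <= SIZE <= 15."""
    if not 0 <= size <= 15:
        raise ValueError("SIZE must be in the range 0..15")
    return (4 ** size) * 1024


def compared_bits(size: int) -> int:
    """n = 64 - log2(page size in bytes): number of high-order bits compared."""
    log2_page = page_size_bytes(size).bit_length() - 1
    return 64 - log2_page


def matches(entry: TlbEntry, as_bit: int, pid: int, ea: int) -> bool:
    """Return True if the virtual address (as_bit, pid, ea) matches the entry."""
    shift = 64 - compared_bits(entry.size)  # low-order offset bits to ignore
    return (
        entry.v == 1
        and entry.ts == as_bit
        and (entry.tid == 0 or entry.tid == pid)
        and ((entry.epn_base & MASK64) >> shift) == ((ea & MASK64) >> shift)
    )
```

For a 4KB page (SIZE=1) the sketch compares 52 bits, which corresponds to bits 0:51 of the effective address; for a 1TB page (SIZE=15) it compares 24 bits.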
The virtual address of a storage access matches a TLB entry if, for every TLB entry i in the congruence class specified by EA:

- the value of the address specifier for the storage access (MSR[IS] for instruction fetches, MSR[DS] for data storage accesses, and implementation-dependent source for tlbsx and tlbivax) is equal to the value of the TS bit of the TLB entry, and
- either the value of the process identifier (Process ID Register for instruction and data storage accesses, and implementation-dependent source for tlbsx and tlbivax) is equal to the value in the TID field of the TLB entry, or the value of the TID field of the TLB entry is equal to 0, and
- the contents of bits 0:n-1 of the effective address of the storage or TLB access are equal to the value of bits 0:n-1 of the EPN field of the TLB entry (where n=64-log2(page size in bytes) and page size is specified by the value of the SIZE field of the TLB entry). See Table 1.

A TLB Miss exception occurs if there is no valid entry in the TLB for the page specified by the virtual address (Instruction or Data TLB Error interrupt). Although the possibility to place multiple entries into the TLB that match a specific virtual address exists, assuming a set-associative or fully-associative organization, doing so is a programming error and the results are undefined.

Table 1: Page Size and Effective Address to EPN Comparison

SIZE      Page Size      EA to EPN Comparison
          (4^SIZE KB)    (bits 0:53-2×SIZE)
=0b0000   1KB            EPN[0:53] =? EA[0:53]
=0b0001   4KB            EPN[0:51] =? EA[0:51]
=0b0010   16KB           EPN[0:49] =? EA[0:49]
=0b0011   64KB           EPN[0:47] =? EA[0:47]
=0b0100   256KB          EPN[0:45] =? EA[0:45]
=0b0101   1MB            EPN[0:43] =? EA[0:43]
=0b0110   4MB            EPN[0:41] =? EA[0:41]
=0b0111   16MB           EPN[0:39] =? EA[0:39]
=0b1000   64MB           EPN[0:37] =? EA[0:37]
=0b1001   256MB          EPN[0:35] =? EA[0:35]
=0b1010   1GB            EPN[0:33] =? EA[0:33]
=0b1011   4GB            EPN[0:31] =? EA[0:31]
=0b1100   16GB           EPN[0:29] =? EA[0:29]
=0b1101   64GB           EPN[0:27] =? EA[0:27]
=0b1110   256GB          EPN[0:25] =? EA[0:25]
=0b1111   1TB            EPN[0:23] =? EA[0:23]

Figure 8. Virtual Address to TLB Entry Match Process
[Figure: TLB entry i matches the effective address when TLBentry[i][V] is set, TLBentry[i][TS] equals AS, either TLBentry[i][TID][n:63] equals Process ID[n:63] (private page) or TLBentry[i][TID][n:63] equals 0 (shared page), and TLBentry[i][EPN][0:N-1] equals EA[0:N-1].
Legend:
  AS          MSR[IS] for instruction fetches, or MSR[DS] for data storage accesses, or implementation-dependent for tlbsx and tlbivax
  EA          effective address of storage access
  Process ID  contents of Process ID Register for instruction fetches and data storage accesses, or implementation-dependent for tlbsx and tlbivax
  N-1         63 - log2(page size)
  n           64 - number of implemented PID/TID bits]

Figure 9. Effective-to-Real Address Translation Flow
[Figure: AS (MSR[DS] for data storage accesses, MSR[IS] for instruction fetch), PID, and the 64-bit Effective Address (Effective Page Address in bits 0:n-1, Offset in bits n:63) form the Virtual Address. The multiple-entry TLB supplies RPN[0:53]. The 64-bit Real Address consists of the Real Page Number in bits 0:n-1 and the Offset in bits n:63. NOTE: n = 64-log2(page size)]

4.7.3 Address Translation

A program references memory by using the effective address computed by the processor when it executes a Load, Store, Cache Management, or Branch instruction, and when it fetches the next instruction. The effective address is translated to a real address according to the procedures described in this section. The storage subsystem uses the real address for the access. All storage access effective addresses are translated to real addresses using the TLB mechanism. See Figure 9.

If the virtual address of the storage access matches a TLB entry in accordance with the selection criteria specified in Section 4.7.2, the value of the Real Page Number field (RPN) of the selected TLB entry provides the real page number portion of the real address. Let n=64-log2(page size in bytes) where page size is specified by the SIZE field of the TLB entry. Bits n:63 of the effective address are appended to bits 0:n-1 of the 54-bit RPN field of the selected TLB entry to produce the 64-bit real address (i.e. RA = RPN[0:n-1] || EA[n:63]). The page size is determined by the value of the SIZE field of the selected TLB entry. See Table 2.

The rest of the selected TLB entry provides the access control bits (UX, SX, UW, SW, UR, SR), and storage control attributes (U0, U1, U2, U3, W, I, M, G, E) for the storage access. The access control bits and storage attribute bits specify whether or not the access is allowed and how the access is to be performed. See Sections 4.7.4 and 4.7.5.

The Real Page Number field (RPN) of the matching TLB entry provides the translation for the effective address of the storage access. Based on the setting of the SIZE field of the matching TLB entry, the RPN field replaces the corresponding most-significant N bits of the effective address (where N = 64 - log2(page size)), as shown in Table 2, to produce the 64-bit real address that is to be presented to main storage to perform the storage access.

Table 2: Effective Address to Real Address

SIZE      Page Size      RPN Bits Required    Real Address
          (4^SIZE KB)    to be Equal to 0
=0b0000   1KB            none                 RPN[0:53] || EA[54:63]
=0b0001   4KB            RPN[52:53]=0         RPN[0:51] || EA[52:63]
=0b0010   16KB           RPN[50:53]=0         RPN[0:49] || EA[50:63]
=0b0011   64KB           RPN[48:53]=0         RPN[0:47] || EA[48:63]
=0b0100   256KB          RPN[46:53]=0         RPN[0:45] || EA[46:63]
=0b0101   1MB            RPN[44:53]=0         RPN[0:43] || EA[44:63]
=0b0110   4MB            RPN[42:53]=0         RPN[0:41] || EA[42:63]
=0b0111   16MB           RPN[40:53]=0         RPN[0:39] || EA[40:63]
=0b1000   64MB           RPN[38:53]=0         RPN[0:37] || EA[38:63]
=0b1001   256MB          RPN[36:53]=0         RPN[0:35] || EA[36:63]
=0b1010   1GB            RPN[34:53]=0         RPN[0:33] || EA[34:63]
=0b1011   4GB            RPN[32:53]=0         RPN[0:31] || EA[32:63]
=0b1100   16GB           RPN[30:53]=0         RPN[0:29] || EA[30:63]
=0b1101   64GB           RPN[28:53]=0         RPN[0:27] || EA[28:63]
=0b1110   256GB          RPN[26:53]=0         RPN[0:25] || EA[26:63]
=0b1111   1TB            RPN[24:53]=0         RPN[0:23] || EA[24:63]

Figure 10. Access Control Process
[Figure: given a TLB match (see Figure 8), access is granted according to the access control bit selected by MSR[PR] and the access type: TLBentry[UX] or TLBentry[SX] for an instruction fetch, TLBentry[UR] or TLBentry[SR] for a load-class data storage access, and TLBentry[UW] or TLBentry[SW] for a store-class data storage access.]

4.7.4 Storage Access Control

After a matching TLB entry has been identified, an access control mechanism selectively grants shared access, grants execute access, grants read access, grants write access, and prohibits access to areas of storage based on a number of criteria. Figure 10 illustrates the access control process and is described in detail in Sections 4.7.4.1 through 4.7.4.5.

An Execute, Read, or Write Access Control exception occurs if the appropriate TLB entry is found but the access is not allowed by the access control mechanism (Instruction or Data Storage interrupt). See Section 5.6 for additional information about these and other interrupt types. In certain cases, Execute, Read, and Write Access Control exceptions may result in the restart of (re-execution of at least part of) a Load or Store instruction.

Some implementations may provide additional access control capabilities beyond that described here.

4.7.4.1 Execute Access

The UX and SX bits of the TLB entry control execute access to the page (see Table 3).

Instructions may be fetched and executed from a page in storage while in user state (MSR[PR]=1) if the UX access control bit for that page is equal to 1. If the UX access control bit is equal to 0, then instructions from that page will not be fetched, and will not be placed into any cache as the result of a fetch request to that page while in user state.

Instructions may be fetched and executed from a page in storage while in supervisor state (MSR[PR]=0) if the SX access control bit for that page is equal to 1. If the SX access control bit is equal to 0, then instructions from that page will not be fetched, and will not be placed into any cache as the result of a fetch request to that page while in supervisor state.

Instructions from no-execute storage may be in the instruction cache if they were fetched into that cache when their effective addresses were mapped to execute permitted storage. Software need not flush a page from the instruction cache before marking it no-execute.

Furthermore, if the sequential execution model calls for the execution of an instruction from a page that is not enabled for execution (i.e. UX=0 when MSR[PR]=1 or SX=0 when MSR[PR]=0), an Execute Access Control exception type Instruction Storage interrupt is taken.

4.7.4.2 Write Access

The UW and SW bits of the TLB entry control write access to the page (see Table 3).

Store operations (including Store-class Cache Management instructions) are permitted to a page in storage while in user state (MSR[PR]=1) if the UW access control bit for that page is equal to 1. If the UW access control bit is equal to 0, then execution of the Store instruction is suppressed and a Write Access Control exception type Data Storage interrupt is taken.

Store operations (including Store-class Cache Management instructions) are permitted to a page in storage while in supervisor state (MSR[PR]=0) if the SW access control bit for that page is equal to 1. If the SW access control bit is equal to 0, then execution of the Store instruction is suppressed and a Write Access Control exception type Data Storage interrupt is taken.

4.7.4.3 Read Access

The UR and SR bits of the TLB entry control read access to the page (see Table 3).

Load operations (including Load-class Cache Management instructions) are permitted from a page in storage while in user state (MSR[PR]=1) if the UR access control bit for that page is equal to 1. If the UR access control bit is equal to 0, then execution of the Load instruction is suppressed and a Read Access Control exception type Data Storage interrupt is taken.

Load operations (including Load-class Cache Management instructions) are permitted from a page in storage while in supervisor state (MSR[PR]=0) if the SR access control bit for that page is equal to 1. If the SR access control bit is equal to 0, then execution of the Load instruction is suppressed and a Read Access Control exception type Data Storage interrupt is taken.

4.7.4.4 Storage Access Control Applied to Cache Management Instructions

dcbi, dcbz, and dcbzep instructions are treated as Stores since they can change data (or cause loss of data by invalidating a dirty line). As such, they can cause Write Access Control exception type Data Storage interrupts. If an implementation first flushes a line before invalidating it during a dcbi, the dcbi is treated as a Load since the data is not modified.

dcba instructions are treated as Stores since they can change data. As such, they can cause Write Access Control exceptions. However, such exceptions will not result in a Data Storage interrupt.

icbi and icbiep instructions are treated as Loads with respect to protection. As such, they can cause Read Access Control exception type Data Storage interrupts.

dcbt, dcbtep, dcbtst, dcbtstep, and icbt instructions are treated as Loads with respect to protection. As such, they can cause Read Access Control exceptions. However, such exceptions will not result in a Data Storage interrupt.

dcbf, dcbfep, dcbst, and dcbstep instructions are treated as Loads with respect to protection. Flushing or storing a line from the cache is not considered a Store since the store has already been done to update the cache and the dcbf, dcbfep, dcbst, or dcbstep instruction is only updating the copy in main storage. As a Load, they can cause Read Access Control exception type Data Storage interrupts.

Table 3: Storage Access Control Applied to Cache Instructions

Instruction   Read Protection   Write Protection
              Violation         Violation
dcba          No                Yes (2)
dcbf          Yes               No
dcbfep        Yes               No
dcbi          Yes (3)           Yes (3)
dcblc         Yes               No
dcbst         Yes               No
dcbstep       Yes               No
dcbt          Yes (1)           No
dcbtep        Yes (1)           No
dcbtls        Yes               No
dcbtst        Yes (1)           No
dcbtstep      Yes (1)           No
dcbtstls      Yes (4)           Yes (4)
dcbz          No                Yes
dcbzep        No                Yes
dci           No                No
icbi          Yes               No
icbiep        Yes               No
icblc         Yes (5)           No
icbt          Yes (1)           No
icbtls        Yes (5)           No
ici           No                No

Notes:
1. dcbt, dcbtep, dcbtst, dcbtstep, and icbt may cause a Read Access Control exception, but the exception does not result in a Data Storage interrupt.
2. dcba may cause a Write Access Control exception, but the exception does not result in a Data Storage interrupt.
3. dcbi may cause a Read or Write Access Control exception based on whether the data is flushed prior to invalidation.
4. It is implementation-dependent whether dcbtstls is treated as a Load or a Store.
5. icbtls and icblc require execute or read access.

4.7.4.5 Storage Access Control Applied to String Instructions

When the string length is zero, neither lswx nor stswx can cause Data Storage interrupts.

4.7.5 TLB Management

No format for the Page Tables or the Page Table Entries is implied. Software has significant flexibility in implementing a custom replacement strategy. For example, software may choose to lock TLB entries that correspond to frequently used storage, so that those entries are never cast out of the TLB and TLB Miss exceptions to those pages never occur. At a minimum, software must maintain an entry or entries for the Instruction and Data TLB Error interrupt handlers.

TLB management is performed in software with some hardware assist. This hardware assist consists of a minimum of:

- Automatic recording of the effective address causing a TLB Miss exception. For Instruction TLB Miss exceptions, the address is saved in the Save/Restore Register 0. For Data TLB Miss exceptions, the address is saved in the Data Exception Address Register.
- Instructions for reading, writing, searching, invalidating, and synchronizing the TLB (see Section 4.9.4.1).
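The real address formation of Section 4.7.3 and the permission checks of Sections 4.7.4.1 through 4.7.4.3 can be sketched in Python as follows. This is an illustrative model only: the function names, the dictionary of access control bits, and the representation of the RPN as a 64-bit value whose low-order offset bits are zero (rpn_base) are assumptions made for the example, and the special cases of Table 3 for Cache Management instructions are not modeled.

```python
MASK64 = (1 << 64) - 1

# Access control bit selected by (access type, MSR[PR]); PR=1 is user state.
ACCESS_BIT = {
    ("execute", 1): "UX", ("execute", 0): "SX",
    ("read", 1): "UR", ("read", 0): "SR",
    ("write", 1): "UW", ("write", 0): "SW",
}


def real_address(rpn_base: int, size: int, ea: int) -> int:
    """Form RA = RPN[0:n-1] || EA[n:63] for a page of 4**SIZE KB.

    rpn_base is a hypothetical representation: the RPN placed in the
    high-order bits of a 64-bit value, with the offset bits equal to 0.
    """
    if not 0 <= size <= 15:
        raise ValueError("SIZE must be in the range 0..15")
    offset_bits = 10 + 2 * size            # log2(page size in bytes)
    offset_mask = (1 << offset_bits) - 1
    if rpn_base & offset_mask:
        # Software is required to set the unused low-order RPN bits to 0.
        raise ValueError("unused low-order RPN bits must be 0")
    return (rpn_base & MASK64 & ~offset_mask) | (ea & offset_mask)


def access_permitted(perms: dict, access: str, msr_pr: int) -> bool:
    """Return True if the selected access control bit is 1.

    perms maps bit names (UX, SX, UW, SW, UR, SR) to 0 or 1; a missing
    name is treated as 0 in this sketch.
    """
    return perms.get(ACCESS_BIT[(access, msr_pr)], 0) == 1
```

With a 4KB page (SIZE=1), real_address(0x40000, 1, 0x1234) keeps the 12 offset bits of the effective address and yields 0x40234. A page described by {"UR": 1, "SR": 1, "SW": 1} is readable in both states but writable only in supervisor state.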
  Programming Note
  This Note suggests one example for managing reference and change recording.

  When performing physical page management, it is useful to know whether a given physical page has been referenced or altered. Note that this may be more involved than whether a given TLB entry has been used to reference or alter memory, since multiple TLB entries may translate to the same physical page. If it is necessary to replace the contents of some physical page with other contents, a page which has been referenced (accessed for any purpose) is more likely to be maintained than a page which has never been referenced. If the contents of a given physical page are to be replaced, then the contents of that page must be written to the backing store before replacement, if anything in that page has been changed. Software must maintain records to control this process.

  Similarly, when performing TLB management, it is useful to know whether a given TLB entry has been referenced. When making a decision about which entry to cast out of the TLB, an entry which has been referenced is more likely to be maintained in the TLB than an entry which has never been referenced.

  Execute, Read and Write Access Control exceptions may be used to allow software to maintain reference information for a TLB entry and for its associated physical page. The entry is built with its UX, SX, UR, SR, UW, and SW bits off, and the index and effective page number of the entry are retained by software. The first attempt of application code to use the page will cause an Access Control exception (because the entry is marked "No Execute", "No Read", and "No Write"). The Instruction or Data Storage interrupt handler records the reference to the TLB entry and to the associated physical page in a software table, and then turns on the appropriate access control bit. An initial read from the page could be handled by only turning on the appropriate UR or SR access control bits, leaving the page "read-only". Subsequent execute, read, or write accesses to the page via this TLB entry will proceed normally.

  In a demand-paged environment, when the contents of a physical page are to be replaced, if any storage in that physical page has been altered, then the backing storage must be updated. The information that a physical page is dirty is typically recorded in a "Change" bit for that page.

  Write Access Control exceptions may be used to allow software to maintain change information for a physical page. For the example just given for reference recording, the first write access to the page via the TLB entry will create a Write Access Control exception type Data Storage interrupt. The Data Storage interrupt handler records the change status to the physical page in a software table, and then turns on the appropriate UW and SW bits. All subsequent accesses to the page via this TLB entry will proceed normally.

4.8 Storage Control Attributes

This section describes aspects of the storage control attributes that are relevant only to privileged software programmers. The rest of the description of storage control attributes may be found in Section 1.6 of Book II and subsections.

4.8.1 Guarded Storage

Storage is said to be "well-behaved" if the corresponding real storage exists and is not defective, and if the effects of a single access to it are indistinguishable from the effects of multiple identical accesses to it. Data and instructions can be fetched out-of-order from well-behaved storage without causing undesired side effects.

Storage is said to be Guarded if the G bit is 1 in the TLB entry that translates the effective address.

In general, storage that is not well-behaved should be Guarded. Because such storage may represent a control register on an I/O device or may include locations that do not exist, an out-of-order access to such storage may cause an I/O device to perform unintended operations or may result in a Machine Check.

Instruction fetching is not affected by the G bit. Software must set guarded pages to no execute (i.e. UX=0 and SX=0) to prevent instruction fetching from guarded storage.

The following rules apply to in-order execution of Load and Store instructions for which the first byte of the storage operand is in storage that is both Caching Inhibited and Guarded.

- Load or Store instruction that causes an atomic access
  If any portion of the storage operand has been accessed, the instruction completes before the interrupt occurs if any of the following exceptions is pending.
  - External, Decrementer, Critical Input, Machine Check, Fixed-Interval Timer, Watchdog Timer, Debug, or Imprecise mode Floating-Point or Auxiliary Processor Enabled
- Load or Store instruction that causes an Alignment exception, a Data TLB Error exception, or that causes a Data Storage exception
  The portion of the storage operand that is in Caching Inhibited and Guarded storage is not accessed.

4.8.1.1 Out-of-Order Accesses to Guarded Storage

In general, Guarded storage is not accessed out-of-order. The only exceptions to this rule are the following.

Load Instruction
If a copy of any byte of the storage operand is in a cache then that byte may be accessed in the cache or in main storage.

4.8.2 User-Definable

User-definable storage control attributes control user-definable and implementation-dependent behavior of the storage system. These bits are both implementation-dependent and system-dependent in their effect. They may be used in any combination and also in combination with the other storage attribute bits.

4.8.3 Storage Control Bits

Storage control attributes are specified on a per-page basis. These attributes are specified in storage control bits in the TLB entries. The interpretation of their values is given in Figure 11.

Bit          Storage Control Attribute
W (1)        0 - not Write Through Required
             1 - Write Through Required
I            0 - not Caching Inhibited
             1 - Caching Inhibited
M (2)        0 - not Memory Coherence Required
             1 - Memory Coherence Required
G            0 - not Guarded
             1 - Guarded
E (3)        0 - Big-Endian
             1 - Little-Endian
U0-U3 (4)    User-Definable
VLE (5)      0 - non Variable Length Encoding (VLE)
             1 - VLE

1. Support for the 1 value of the W bit is optional. Implementations that do not support the 1 value treat the bit as reserved and assume its value to be 0.
2. Support of the 1 value is optional for implementations that do not support multiprocessing; implementations that do not support this storage attribute assume the value of the bit to be 0, and setting M=1 in a TLB entry will have no effect.
3. [Category: Embedded.Little-Endian]
4. Support for these attributes is optional.
5. [Category: VLE]

Figure 11. Storage control bits

In Section 4.8.3.1 and 4.8.3.2, "access" includes accesses that are performed out-of-order.

  Programming Note
  In a uniprocessor system in which only the processor has caches, correct coherent execution does not require the processor to access storage as Memory Coherence Required, and accessing storage as not Memory Coherence Required may give better performance.

4.8.3.1 Storage Control Bit Restrictions

All combinations of W, I, M, G, and E values are permitted except those for which both W and I are 1.

  Programming Note
  If an application program requests both the Write Through Required and the Caching Inhibited attributes for a given storage location, the operating system should set the I bit to 1 and the W bit to 0.

At any given time, the value of the I bit must be the same for all accesses to a given real page.
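The restriction on W and I, together with the Programming Note on resolving a request for both attributes, can be expressed as a small check. The following Python sketch is illustrative only; the function names and the tuple return convention are assumptions for the example and do not correspond to any architected interface.

```python
def wimge_permitted(w: int, i: int, m: int, g: int, e: int) -> bool:
    """Return True unless both W and I are 1 (the only excluded combination)."""
    for bit in (w, i, m, g, e):
        if bit not in (0, 1):
            raise ValueError("each storage control bit must be 0 or 1")
    return not (w == 1 and i == 1)


def resolve_request(write_through: bool, caching_inhibited: bool) -> tuple:
    """Return the (W, I) pair an operating system might place in a TLB entry.

    If both attributes are requested, the Programming Note suggests
    setting I to 1 and W to 0.
    """
    if caching_inhibited:
        return (0, 1)
    return (1 if write_through else 0, 0)
```

For example, resolve_request(True, True) returns (0, 1), a combination that wimge_permitted accepts for any values of M, G, and E.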
Accesses to the same storage location using two effective addresses for which the W bit differs meet the memory coherence requirements described in Section 1.6.3 of Book II if the accesses are performed by a single processor. If the accesses are performed by two or more processors, coherence is enforced by the hardware only if the W bit is the same for all the accesses.

At any given time, data accesses to a given real page may use both Endian modes. When changing the Endian mode of a given real page for instruction fetching, care must be taken to prevent accesses while the change is made and to flush the instruction cache(s) after the change has been completed.

4.8.3.2 Altering the Storage Control Bits

When changing the value of the I bit for a given real page from 0 to 1, software must set the I bit to 1 and then flush all copies of locations in the page from the caches using dcbf, dcbfep, or dcbi, and icbi or icbiep before permitting any other accesses to the page.

When changing the value of the W bit for a given real page from 0 to 1, software must ensure that no processor modifies any location in the page until after all copies of locations in the page that are considered to be modified in the data caches have been copied to main storage using dcbst, dcbstep, dcbf, dcbfep, or dcbi.

When changing the value of the M bit for a given real page, software must ensure that all data caches are consistent with main storage. The actions required to do this are system-dependent.

Programming Note: For example, when changing the M bit in some directory-based systems, software may be required to execute dcbf or dcbfep on each processor to flush all storage locations accessed with the old M value before permitting the locations to be accessed with the new M value.
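The flush required when the I bit of a page changes from 0 to 1 amounts to visiting every cache block in the page. The sketch below only plans that walk; it does not execute any instruction. The `flush_plan` helper is a hypothetical name, and the 4096-byte page and 64-byte block sizes are assumed example values, since both are implementation-dependent. It picks dcbf and icbi from the alternatives the text allows.

```python
def flush_plan(page_base, page_size=4096, block_size=64):
    """Return the ordered (mnemonic, address) pairs that cover one page.

    Assumed example sizes: 4096-byte page, 64-byte cache block.
    The I bit is assumed to have been set to 1 before this plan is run.
    """
    if page_size % block_size != 0:
        raise ValueError("block size must divide page size")
    if page_base % page_size != 0:
        raise ValueError("page base must be page-aligned")
    plan = []
    for addr in range(page_base, page_base + page_size, block_size):
        plan.append(("dcbf", addr))   # flush the block from the data caches
        plan.append(("icbi", addr))   # invalidate it in the instruction caches
    return plan


for mnemonic, addr in flush_plan(0x1000)[:4]:
    print(f"{mnemonic} 0x{addr:08X}")
```

Only after the whole plan has been carried out may other accesses to the page be permitted.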
4.9 Storage Control Instructions

4.9.1 Cache Management Instructions

This section describes aspects of cache management that are relevant only to privileged software programmers.

For a dcbz or dcba instruction that causes the target block to be newly established in the data cache without being fetched from main storage, the processor need not verify that the associated real address is valid. The existence of a data cache block that is associated with an invalid real address (see Section 4.6) can cause a delayed Machine Check interrupt or a delayed Checkstop.

Each implementation provides an efficient means by which software can ensure that all blocks that are considered to be modified in the data cache have been copied to main storage before the processor enters any power conserving mode in which data cache contents are not maintained.

Data Cache Block Invalidate X-form

dcbi RA,RB

    31     ///    RA     RB     470    /
    0      6      11     16     21     31

    if RA=0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    InvalidateDataCacheBlock( EA )

Let the effective address (EA) be the sum (RA|0)+(RB).

If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the data cache of any processors, then the block is invalidated in those data caches. On some implementations, before the block is invalidated, if any locations in the block are considered to be modified in any such data cache, those locations are written to main storage and additional locations in the block may be written to main storage.

If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and a block containing the byte addressed by EA is in the data cache of this processor, then the block is invalidated in that data cache. On some implementations, before the block is invalidated, if any locations in the block are considered to be modified in that data cache, those locations are written to main storage and additional locations in the block may be written to main storage.

The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited.

This instruction is treated as a Store (see Section 4.7.4.4) on implementations that invalidate a block without first writing to main storage all locations in the block that are considered to be modified in the data cache, except that the invalidation is not ordered by mbar. On other implementations this instruction is treated as a Load (see the section cited above).

If a processor holds a reservation and some other processor executes a dcbi to the same reservation granule, whether the reservation is lost is undefined.

dcbi may cause a cache locking exception, the details of which are implementation-dependent.

This instruction is privileged.

Special Registers Altered: None

4.9.2 Cache Locking [Category: Embedded Cache Locking]

The Embedded Cache Locking category defines instructions and methods for locking cache blocks for frequently used instructions and data. Cache locking allows software to instruct the cache to keep latency sensitive data readily available for fast access. This is accomplished by marking individual cache blocks as locked.

A locked block differs from a normal block in the cache in the following way:

- blocks that are locked in the cache do not participate in the normal replacement policy when a block must be replaced.

4.9.2.1 Lock Setting and Clearing

Blocks are locked into the cache by software using Cache Locking instructions. The following instructions are provided to lock data items into the data and instruction cache:

- dcbtls - Data cache block touch and lock set.
- dcbtstls - Data cache block touch for store and lock set.
- icbtls - Instruction cache block touch and lock set.

The RA and RB operands in these instructions are used to identify the block to be locked. The CT field indicates which cache in the cache hierarchy should be targeted. (See Section 3.3 of Book II.)

These instructions are similar in nature to the dcbt, dcbtst, and icbt instructions, but are not hints and thus locking instructions do not execute speculatively and may cause additional exceptions. For unified caches, both the instruction lock set and the data lock set target the same cache.

Similarly, blocks are unlocked from the cache by software using Lock Clear instructions. The following instructions are provided to unlock instructions and data in their respective caches:

- dcblc - Data cache block lock clear.
- icblc - Instruction cache block lock clear.

The RA and RB operands in these instructions are used to identify the block to be unlocked. The CT field indicates which cache in the cache hierarchy should be targeted.

Additionally, an implementation-dependent method can be provided for software to clear all the locks in the cache.

An implementation is not required to unlock blocks that contain data that has been invalidated unless it is explicitly unlocked with a dcblc or icblc instruction; if the implementation does not unlock the block upon invalidation, the block remains locked even though it contains invalid data. If the implementation does not clear locks when the associated block is invalidated, the method of locking is said to be persistent; otherwise it is not persistent. An implementation may choose to implement locks as persistent or not persistent; however, the preferred method is persistent.

It is implementation-dependent if cache blocks are implicitly unlocked in the following ways:

- A locked block is invalidated as the result of a dcbi, dcbf, dcbfep, icbi, or icbiep instruction.
- A locked block is evicted because of an overlocking condition.
- A snoop hit on a locked block that requires the block to be invalidated. This can occur because the data the block contains has been modified external to the processor, or another processor has explicitly invalidated the block.
- The entire cache containing the locked block is invalidated.

4.9.2.2 Error Conditions

Setting locks in the cache can fail for a variety of reasons. A Lock Set instruction addressing a byte in storage that is not allowed to be accessed by the storage access control mechanism (see Section 4.7.4) will cause a Data Storage interrupt (DSI). Addresses referenced by Cache Locking instructions are always translated as data references; therefore, icbtls instructions that fail to translate or are not allowed by the storage access control mechanism cause Data TLB Error interrupts and Data Storage interrupts, respectively. Additionally, cache locking and clearing operations can fail due to non-privileged access. The methods for determining other failure conditions, such as unable-to-lock or overlocking (see below), are implementation-dependent.

When a Cache Locking instruction is executed in user mode and MSR[UCLE] is 0, a Data Storage interrupt occurs and one of the following ESR bits is set to 1.

    Bit  Description
    42   DLK0
         0  Default setting.
         1  A dcbtls, dcbtstls, or dcblc instruction was executed in user mode.
    43   DLK1
         0  Default setting.
         1  An icbtls or icblc instruction was executed in user mode.

4.9.2.2.1 Overlocking

If no exceptions occur for the execution of a dcbtls, dcbtstls, or icbtls instruction, an attempt is made to lock the specified block into the cache. If all of the available cache blocks into which the specified block may be loaded are already locked, an overlocking condition occurs. The overlocking condition may be reported in an implementation-dependent manner.

If an overlocking condition occurs, it is implementation-dependent whether the specified block is not locked into the cache or if another locked block is evicted and the specified block is locked.

The selection of which block is replaced in an overlocking situation is implementation-dependent. The overlocking condition is still said to exist, and is reflected in any implementation-dependent overlocking status.

An attempt to lock a block that is already present and valid in the cache will not cause an overlocking condition.

If a cache block is to be loaded because of an instruction other than a Cache Management or Cache Locking instruction and all available blocks into which the block can be loaded are locked, the instruction executes and completes, but no cache blocks are unlocked and the block is not loaded into the cache.

Programming Note: Since caches may be shared among processors, an overlocking condition may occur when loading a block even though a given processor has not locked all the available cache blocks. Similarly, blocks may be unlocked as a result of invalidations by other processors.

4.9.2.2.2 Unable-to-lock and Unable-to-unlock Conditions

If no exceptions occur and no overlocking condition exists, an attempt to set or unlock a lock may fail if any of the following are true:

- The target address is marked Caching Inhibited, or the storage attributes of the address use a coherency protocol that does not support locking.
- The target cache is disabled or not present.
- The CT field of the instructions contains a value not supported by the implementation.
- Any other implementation-specific error conditions are detected.
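The overlocking and unable-to-lock outcomes described above can be illustrated with a toy model of the cache blocks into which one address may be loaded. This is a sketch under stated assumptions, not a description of any implementation: the `CacheSet` class and its method names are hypothetical, the model picks the "block is not locked" choice on overlocking (the architecture leaves this implementation-dependent), and it treats only the Caching Inhibited case of the unable-to-lock conditions.

```python
class CacheSet:
    """Toy model of the blocks into which a given address may be loaded."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = {}  # block address -> True if locked

    def lock_set(self, addr, caching_inhibited=False):
        """Model a lock set request and return the resulting condition."""
        if caching_inhibited:
            return "unable-to-lock"        # treated as a no-op
        if addr in self.blocks:
            self.blocks[addr] = True       # already present: never overlocks
            return "locked"
        if len(self.blocks) < self.ways:
            self.blocks[addr] = True       # a free block is available
            return "locked"
        for victim, locked in self.blocks.items():
            if not locked:
                del self.blocks[victim]    # replace an unlocked block
                self.blocks[addr] = True
                return "locked"
        # Assumption: on overlocking this model leaves the block unlocked.
        return "overlocking"

    def lock_clear(self, addr):
        """Model a lock clear request; no operation if the block is not locked."""
        if self.blocks.get(addr):
            self.blocks[addr] = False


s = CacheSet(ways=2)
print(s.lock_set(0x100), s.lock_set(0x140), s.lock_set(0x180))
```

With two ways, the third distinct lock request finds every candidate block locked and reports the overlocking condition, while re-locking an address that is already present does not.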
If an unable-to-lock or unable-to-unlock condition occurs, the lock set or unlock instruction is treated as a no-op and the condition may be reported in an implementation-dependent manner.

4.9.2.3 Cache Locking Instructions

Data Cache Block Touch and Lock Set X-form

dcbtls CT,RA,RB

    31     /      CT     RA     RB     166    /
    0      6      7      11     16     21     31

Let the effective address (EA) be the sum (RA|0)+(RB).

The dcbtls instruction provides a hint that the program will probably soon load from the block containing the byte addressed by EA, and that the block containing the byte addressed by EA is to be loaded and locked into the cache specified by the CT field. (See Section 3.3 of Book II.) If the CT field is set to a value not supported by the implementation, no operation is performed.

If the block already exists in the cache, the block is locked without accessing storage. If the block is in a storage location that is Caching Inhibited, then no cache operation is performed. An unable-to-lock condition may occur (see Section 4.9.2.2.2), or an overlocking condition may occur (see Section 4.9.2.2.1).

The dcbtls instruction may complete before the operation it causes has been performed.

The instruction is treated as a Load.

This instruction is privileged unless the Embedded Cache Locking.User Mode category is supported. If the Embedded Cache Locking.User Mode category is supported, this instruction is privileged only if MSR[UCLE]=0.

Special Registers Altered: None

Data Cache Block Touch for Store and Lock Set X-form

dcbtstls CT,RA,RB

    31     /      CT     RA     RB     134    /
    0      6      7      11     16     21     31

Let the effective address (EA) be the sum (RA|0)+(RB).

The dcbtstls instruction provides a hint that the program will probably soon store to the block containing the byte addressed by EA, and that the block containing the byte addressed by EA is to be loaded and locked into the cache specified by the CT field. (See Section 3.3 of Book II.) If the CT field is set to a value not supported by the implementation, no operation is performed.

If the block already exists in the cache, the block is locked without accessing storage. If the block is in a storage location that is Caching Inhibited, then no cache operation is performed. An unable-to-lock condition may occur (see Section 4.9.2.2.2), or an overlocking condition may occur (see Section 4.9.2.2.1).

The dcbtstls instruction may complete before the operation it causes has been performed.

It is implementation-dependent whether the instruction is treated as a Load or a Store.

This instruction is privileged unless the Embedded Cache Locking.User Mode category is supported. If the Embedded Cache Locking.User Mode category is supported, this instruction is privileged only if MSR[UCLE]=0.

Special Registers Altered: None

Instruction Cache Block Touch and Lock Set X-form

icbtls CT,RA,RB

    31     /      CT     RA     RB     486    /
    0      6      7      11     16     21     31

Let the effective address (EA) be the sum (RA|0)+(RB).

The icbtls instruction causes the block containing the byte addressed by EA to be loaded and locked into the instruction cache specified by CT, and provides a hint that the program will probably soon execute code from the block. See Section 3.3 of Book II for a definition of the CT field.

If the block already exists in the cache, the block is locked without refetching from memory. If the block is in storage that is Caching Inhibited, no cache operation is performed.

This instruction is treated as a Load (see Section 3.3), except that the system instruction storage error handler is not invoked.

An unable-to-lock condition may occur (see Section 4.9.2.2.2), or an overlocking condition may occur (see Section 4.9.2.2.1).

This instruction is privileged unless the Embedded Cache Locking.User Mode category is supported. If the Embedded Cache Locking.User Mode category is supported, this instruction is privileged only if MSR[UCLE]=0.

Special Registers Altered: None

Instruction Cache Block Lock Clear X-form

icblc CT,RA,RB

    31     /      CT     RA     RB     230    /
    0      6      7      11     16     21     31

Let the effective address (EA) be the sum (RA|0)+(RB).

The block containing the byte addressed by EA in the instruction cache specified by the CT field is unlocked.

The instruction is treated as a Load.

An unable-to-unlock condition may occur (see Section 4.9.2.2.2). If the block containing the byte addressed by EA is not locked in the specified cache, no cache operation is performed.

This instruction is privileged unless the Embedded Cache Locking.User Mode category is supported. If the Embedded Cache Locking.User Mode category is supported, this instruction is privileged only if MSR[UCLE]=0.

Special Registers Altered: None

Data Cache Block Lock Clear X-form

dcblc CT,RA,RB

    31     /      CT     RA     RB     390    /
    0      6      7      11     16     21     31

Let the effective address (EA) be the sum (RA|0)+(RB).

The block containing the byte addressed by EA in the data cache specified by the CT field is unlocked.

The instruction is treated as a Load.

An unable-to-unlock condition may occur (see Section 4.9.2.2.2). If the block containing the byte addressed by EA is not locked in the specified cache, no cache operation is performed.

This instruction is privileged unless the Embedded Cache Locking.User Mode category is supported. If the Embedded Cache Locking.User Mode category is supported, this instruction is privileged only if MSR[UCLE]=0.

Special Registers Altered: None

Programming Note: The dcblc and icblc instructions are used to remove locks previously set by the corresponding lock set instructions.

4.9.3 Synchronize Instruction

The Synchronize instruction is described in Section 3.4.3 of Book II, but only at the level required by an application programmer. This section describes properties of the instruction that are relevant only to operating system programmers.

In conjunction with the tlbie and tlbsync instructions, the sync instruction provides an ordering function for TLB invalidations and related storage accesses on other processors as described in the tlbsync instruction description on page 659.

4.9.4 Lookaside Buffer Management

All implementations include a TLB as the architected repository of translation, protection, and attribute information for storage.

Each implementation that has a TLB or similar lookaside buffer provides a means by which software can invalidate the lookaside entry that translates a given effective address.

Programming Note: The invalidate all entries function is not required because each TLB entry can be addressed directly without regard to the contents of the entry.

In addition, implementations provide a means by which software can do the following.

- Read a specified TLB entry
- Identify the TLB entry (if any) associated with a specified effective address
- Write a specified TLB entry

Programming Note: Because the presence, absence, and exact semantics of the TLB Management instructions are implementation-dependent, it is recommended that system software "encapsulate" uses of these instructions into subroutines to minimize the impact of moving from one implementation to another.

4.9.4.1 TLB Management Instructions

The tlbivax instruction is used to invalidate TLB entries. Additional instructions are used to read and write, and search TLB entries, and to provide an ordering function for the effects of tlbivax.

TLB Invalidate Virtual Address Indexed X-form

tlbivax (implementation-dependent)

    31     ???    ???    ???    786    /
    0      6      11     16     21     31

Bits 6:20 of the instruction encoding are implementation-dependent, and may be used to specify the TLB entry or entries to be invalidated. (E.g. they may specify virtual or effective addresses.)

If a single tlbivax instruction can invalidate more entries than those corresponding to a single VA, a means must be provided to prevent specific TLB entries from being invalidated.

If the Translation Lookaside Buffer (TLB) contains an entry specified, the entry or entries are made invalid (i.e. removed from the TLB). This instruction causes the target TLB entry to be invalidated in all processors.

If the instruction specifies a TLB entry that does not exist, the results are undefined.

Execution of this instruction may cause other implementation-dependent effects.

The operation performed by this instruction is ordered by the mbar (or sync) instruction with respect to a subsequent tlbsync instruction executed by the processor executing the tlbivax instruction. The operations caused by tlbivax and tlbsync are ordered by mbar as a set of operations which is independent of the other sets that mbar orders.

This instruction is privileged.

Special Registers Altered: None

Programming Note: The effects of the invalidation may not be visible until the completion of a context synchronizing operation (see Section 1.6.1).

Programming Note: Care must be taken not to invalidate any TLB entry that contains the mapping for any interrupt vector.

TLB Read Entry X-form

tlbre (implementation-dependent)

    31     ???    ???    ???    946    /
    0      6      11     16     21     31

Bits 6:20 of the instruction encoding are implementation-dependent, and may be used to specify the source TLB entry, the source portion of the source TLB entry, and the target resource that the result is placed into.

The implementation-dependent-specified TLB entry is read, and the implementation-dependent-specified portion of the TLB entry is extracted and placed into an implementation-dependent target resource.

If the instruction specifies a TLB entry that does not exist, the results are undefined.

Execution of this instruction may cause other implementation-dependent effects.

This instruction is privileged.

Special Registers Altered: Implementation-dependent

TLB Search Indexed X-form

tlbsx RA,RB, (implementation-dependent)

    31     ???    RA     RB     914    ?
    0      6      11     16     21     31

    if RA=0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    AS ← implementation-dependent value
    ProcessID ← implementation-dependent value
    VA ← AS || ProcessID || EA
    If there is a TLB entry for which TLBentry[VA]=VA
      then result ← implementation-dependent value
      else result ← undefined
    target resource(???) ← result

Let the effective address (EA) be the sum (RA|0)+(RB).

Let address space (AS) be defined as implementation-dependent (e.g. could be MSR[DS] or a bit from an implementation-dependent SPR).

Let the ProcessID be defined as implementation-dependent (e.g. could be from the PID register or from an implementation-dependent SPR).

Let the virtual address (VA) be the value AS || ProcessID || EA. See Figure 9 on page 645.

Bits 6:10 of the instruction encoding are implementation-dependent, and may be used to specify the target resource that the result of the instruction is placed into.

If the Translation Lookaside Buffer (TLB) contains an entry corresponding to VA, an implementation-dependent value is placed into an implementation-dependent-specified target. Otherwise the contents of the implementation-dependent-specified target are left undefined.

Bit 31 of the instruction encoding is implementation-dependent. For example, bit 31 may be interpreted as an "Rc" bit, used to enable recording the success or failure of the search operation.

This instruction is privileged.

Special Registers Altered: None

TLB Synchronize X-form

tlbsync

    31     ///    ///    ///    566    /
    0      6      11     16     21     31

The tlbsync instruction provides an ordering function for the effects of all tlbivax instructions executed by the processor executing the tlbsync instruction, with respect to the memory barrier created by a subsequent sync instruction executed by the same processor. Executing a tlbsync instruction ensures that all of the following will occur.

- All storage accesses by other processors for which the address was translated using the translations being invalidated will have been performed with respect to the processor executing the sync instruction, to the extent required by the associated Memory Coherence Required attributes, before the sync instruction's memory barrier is created.

The operation performed by this instruction is ordered by the mbar or msync instruction with respect to preceding tlbivax instructions executed by the processor executing the tlbsync instruction. The operations caused by tlbivax and tlbsync are ordered by mbar as a set of operations, which is independent of the other sets that mbar orders.

The tlbsync instruction may complete before operations caused by tlbivax instructions preceding the tlbsync instruction have been performed.

This instruction is privileged.

Special Registers Altered: None

TLB Write Entry X-form

tlbwe (implementation-dependent)

    31     ???    ???    ???    978    /
    0      6      11     16     21     31

Bits 6:20 of the instruction encoding are implementation-dependent, and may be used to specify the target TLB entry, the target portion of the target TLB entry, and the source of the value that is to be written into the TLB.

The contents of the implementation-dependent-specified source are written into the implementation-dependent-specified portion of the implementation-dependent-specified TLB entry.

If the instruction specifies a TLB entry that does not exist, the results are undefined.
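The X-form layouts given for the storage control instructions all place primary opcode 31 in bits 0:5 and a 10-bit extended opcode in bits 21:30, with bit 0 as the most-significant bit of the 32-bit word. A minimal sketch of assembling such a word follows; `encode_xform` is a hypothetical helper introduced here for illustration, and for instructions whose bits 6:20 are implementation-dependent the caller has to supply values defined by the implementation.

```python
def encode_xform(xo, f6_10=0, ra=0, rb=0, bit31=0, primary=31):
    """Pack X-form fields into a 32-bit instruction word.

    Bit 0 is the most-significant bit, so a field ending at bit b
    is shifted left by (31 - b).
    """
    fields = (
        ("primary", primary, 6),   # bits 0:5
        ("f6_10", f6_10, 5),       # bits 6:10 (reserved, or CT in bits 7:10)
        ("ra", ra, 5),             # bits 11:15
        ("rb", rb, 5),             # bits 16:20
        ("xo", xo, 10),            # bits 21:30
        ("bit31", bit31, 1),       # bit 31
    )
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    return (primary << 26) | (f6_10 << 21) | (ra << 16) | (rb << 11) | (xo << 1) | bit31


print(f"tlbsync      0x{encode_xform(566):08X}")
print(f"dcbi 0,r3    0x{encode_xform(470, ra=0, rb=3):08X}")
print(f"dcbtls 1,0,3 0x{encode_xform(166, f6_10=1, ra=0, rb=3):08X}")
```

For the cache locking instructions, bit 6 is reserved and CT occupies bits 7:10, so passing the CT value as `f6_10` leaves bit 6 zero as long as CT is below 16.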
Execution of this instruction may cause other implementation-dependent effects.

This instruction is privileged.

Special Registers Altered: Implementation-dependent

Programming Note: The effects of the update may not be visible until the completion of a context synchronizing operation (see Section 1.6.1).

Programming Note: Care must be taken not to invalidate any TLB entry that contains the mapping for any interrupt vector.

Chapter 5. Interrupts and Exceptions

5.1 Overview
5.2 Interrupt Registers
5.2.1 Save/Restore Register 0
5.2.2 Save/Restore Register 1
5.2.3 Critical Save/Restore Register 0
5.2.4 Critical Save/Restore Register 1
5.2.5 Debug Save/Restore Register 0 [Category: Embedded.Enhanced Debug]
5.2.6 Debug Save/Restore Register 1 [Category: Embedded.Enhanced Debug]
5.2.7 Data Exception Address Register
5.2.8 Interrupt Vector Prefix Register
5.2.9 Exception Syndrome Register
5.2.10 Interrupt Vector Offset Registers
5.2.11 Machine Check Registers
5.2.11.1 Machine Check Save/Restore Register 0
5.2.11.2 Machine Check Save/Restore Register 1
5.2.11.3 Machine Check Syndrome Register
5.2.12 External Proxy Register [Category: External Proxy]
5.3 Exceptions
5.4 Interrupt Classification
5.4.1 Asynchronous Interrupts
5.4.2 Synchronous Interrupts
5.4.2.1 Synchronous, Precise Interrupts
5.4.2.2 Synchronous, Imprecise Interrupts
5.4.3 Interrupt Classes
5.4.4 Machine Check Interrupts
5.5 Interrupt Processing
5.6 Interrupt Definitions
5.6.1 Critical Input Interrupt
5.6.2 Machine Check Interrupt
5.6.3 Data Storage Interrupt
5.6.4 Instruction Storage Interrupt
5.6.5 External Input Interrupt
5.6.6 Alignment Interrupt
5.6.7 Program Interrupt
5.6.8 Floating-Point Unavailable Interrupt
5.6.9 System Call Interrupt
5.6.10 Auxiliary Processor Unavailable Interrupt
5.6.11 Decrementer Interrupt
5.6.12 Fixed-Interval Timer Interrupt
5.6.13 Watchdog Timer Interrupt
5.6.14 Data TLB Error Interrupt
5.6.15 Instruction TLB Error Interrupt
5.6.16 Debug Interrupt
5.6.17 SPE/Embedded Floating-Point/Vector Unavailable Interrupt [Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Vector, Vector]
5.6.18 Embedded Floating-Point Data Interrupt [Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, SPE.Embedded Float Vector]
5.6.19 Embedded Floating-Point Round Interrupt [Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, SPE.Embedded Float Vector]
5.6.20 Performance Monitor Interrupt [Category: Embedded.Performance Monitor]
5.6.21 Processor Doorbell Interrupt [Category: Embedded.Processor Control]
5.6.22 Processor Doorbell Critical Interrupt [Category: Embedded.Processor Control]
5.7 Partially Executed Instructions
5.8 Interrupt Ordering and Masking
5.8.1 Guidelines for System Software
5.8.2 Interrupt Order
5.9 Exception Priorities
5.9.1 Exception Priorities for Defined Instructions
5.9.1.1 Exception Priorities for Defined Floating-Point Load and Store Instructions
5.9.1.2 Exception Priorities for Other Defined Load and Store Instructions and Defined Cache Management Instructions
5.9.1.3 Exception Priorities for Other Defined Floating-Point Instructions
5.9.1.4 Exception Priorities for Defined Privileged Instructions
5.9.1.5 Exception Priorities for Defined Trap Instructions
5.9.1.6 Exception Priorities for Defined System Call Instruction
5.9.1.7 Exception Priorities for Defined Branch Instructions
5.9.1.8 Exception Priorities for Defined Return From Interrupt Instructions
5.9.1.9 Exception Priorities for Other Defined Instructions
5.9.2 Exception Priorities for Reserved Instructions

5.1 Overview

An interrupt is the action in which the processor saves its old context (MSR and next instruction address) and begins execution at a pre-determined interrupt-handler address, with a modified MSR. Exceptions are the events that will, if enabled, cause the processor to take an interrupt.

Exceptions are generated by signals from internal and external peripherals, instructions, the internal timer facility, debug events, or error conditions.

Interrupts are divided into 4 classes, as described in Section 5.4.3, such that only one interrupt of each class is reported, and when it is processed no program state is lost. Since Save/Restore register pairs SRR0/SRR1, CSRR0/CSRR1, DSRR0/DSRR1 [Category: E.ED], and MCSRR0/MCSRR1 are serially reusable resources used by base, critical, debug [Category: E.ED], and Machine Check interrupts, respectively, program state may be lost when an unordered interrupt is taken. (See Section 5.8.)

All interrupts, except Machine Check, are context synchronizing as defined in Section 1.6.1 on page 609. A Machine Check interrupt acts like a context synchronizing operation with respect to subsequent instructions; that is, a Machine Check interrupt need not satisfy items 2-3 of Section 1.6.1 but does satisfy items 1, 4, and 5.

5.2 Interrupt Registers

5.2.1 Save/Restore Register 0

Save/Restore Register 0 (SRR0) is a 64-bit register. SRR0 bits are numbered 0 (most-significant bit) to 63 (least-significant bit). The register is used to save machine state on non-critical interrupts, and to restore machine state when an rfi is executed. On a non-critical interrupt, SRR0 is set to the current or next instruction address. When rfi is executed, instruction execution continues at the address in SRR0.

In general, SRR0 contains the address of the instruction that caused the non-critical interrupt, or the address of the instruction to return to after a non-critical interrupt is serviced.

The contents of SRR0 when an interrupt is taken are mode dependent, reflecting the computation mode currently in use (specified by MSR[CM]) and the computation mode entered for execution of the interrupt (specified by MSR[ICM]). The contents of SRR0 upon interrupt can be described as follows (assuming Addr is the address to be put into SRR0):

    if (MSR[CM] = 0) & (MSR[ICM] = 0) then SRR0 ← ³²undefined || Addr[32:63]
    if (MSR[CM] = 0) & (MSR[ICM] = 1) then SRR0 ← ³²0 || Addr[32:63]
    if (MSR[CM] = 1) & (MSR[ICM] = 1) then SRR0 ← Addr[0:63]
    if (MSR[CM] = 1) & (MSR[ICM] = 0) then SRR0 ← undefined

The contents of SRR0 can be read into register RT using mfspr RT,SRR0. The contents of register RS can be written into the SRR0 using mtspr SRR0,RS.

5.2.2 Save/Restore Register 1

Save/Restore Register 1 (SRR1) is a 32-bit register. SRR1 bits are numbered 32 (most-significant bit) to 63 (least-significant bit). The register is used to save machine state on non-critical interrupts, and to restore machine state when an rfi is executed. When a non-critical interrupt is taken, the contents of the MSR are placed into SRR1. When rfi is executed, the contents of SRR1 are placed into the MSR.

Bits of SRR1 that correspond to reserved bits in the MSR are also reserved.

Programming Note: A MSR bit that is reserved may be inadvertently modified by rfi/rfci/rfmci.

The contents of SRR1 can be read into register RT using mfspr RT,SRR1. The contents of register RS can be written into the SRR1 using mtspr SRR1,RS.

5.2.3 Critical Save/Restore Register 0

Critical Save/Restore Register 0 (CSRR0) is a 64-bit register. CSRR0 bits are numbered 0 (most-significant bit) to 63 (least-significant bit). The register is used to save machine state on critical interrupts, and to restore machine state when an rfci is executed. When a critical interrupt is taken, the CSRR0 is set to the current or next instruction address. When rfci is executed, instruction execution continues at the address in CSRR0.

In general, CSRR0 contains the address of the instruction that caused the critical interrupt, or the address of the instruction to return to after a critical interrupt is serviced.

The contents of CSRR0 when a critical interrupt is taken are mode dependent, reflecting the computation mode currently in use (specified by MSR[CM]) and the computation mode entered for execution of the critical interrupt (specified by MSR[ICM]). The contents of CSRR0 upon critical interrupt can be described as follows (assuming Addr is the address to be put into CSRR0):

    if (MSR[CM] = 0) & (MSR[ICM] = 0) then CSRR0 ← ³²undefined || Addr[32:63]
    if (MSR[CM] = 0) & (MSR[ICM] = 1) then CSRR0 ← ³²0 || Addr[32:63]
    if (MSR[CM] = 1) & (MSR[ICM] = 1) then CSRR0 ← Addr[0:63]
    if (MSR[CM] = 1) & (MSR[ICM] = 0) then CSRR0 ← undefined

The contents of CSRR0 can be read into register RT using mfspr RT,CSRR0. The contents of register RS can be written into CSRR0 using mtspr CSRR0,RS.

5.2.4 Critical Save/Restore Register 1

Critical Save/Restore Register 1 (CSRR1) is a 32-bit register. CSRR1 bits are numbered 32 (most-significant bit) to 63 (least-significant bit).

can be written into the CSRR1 using mtspr CSRR1,RS.

5.2.5 Debug Save/Restore Register 0 [Category: Embedded.Enhanced Debug]

Debug Save/Restore Register 0 (DSRR0) is a 64-bit register used to save machine state on Debug interrupts, and to restore machine state when an rfdi is executed. When a Debug interrupt is taken, the DSRR0 is set to the current or next instruction address. When rfdi is executed, instruction execution continues at the address in DSRR0.

In general, DSRR0 contains the address of an instruction that was executing or just finished execution when the Debug exception occurred.

The contents of DSRR0 when a Debug interrupt is taken are mode dependent, reflecting the computation mode currently in use (specified by MSR[CM]) and the computation mode entered for execution of the Debug interrupt (specified by MSR[ICM]). The contents of DSRR0 upon Debug interrupt can be described as follows (assuming Addr is the address to be put into DSRR0):

    if (MSR[CM] = 0) & (MSR[ICM] = 0) then DSRR0 ← ³²undefined || Addr[32:63]
    if (MSR[CM] = 0) & (MSR[ICM] = 1) then DSRR0 ← ³²0 || Addr[32:63]
    if (MSR[CM] = 1) & (MSR[ICM] = 1) then DSRR0 ← Addr[0:63]
    if (MSR[CM] = 1) & (MSR[ICM] = 0) then DSRR0 ← undefined

The contents of DSRR0 can be read into register RT using mfspr RT,DSRR0. The contents of register RS can be written into DSRR0 using mtspr DSRR0,RS.

5.2.6 Debug Save/Restore Register 1 [Category: Embedded.Enhanced Debug]

Debug Save/Restore Register 1 (DSRR1) is a 32-bit
The register is used register used to save machine state on Debug inter- to save machine state on critical interrupts, and to rupts, and to restore machine state when an rfdi is exe- restore machine state when an rfci is executed. When cuted. When a Debug interrupt is taken, the contents of a critical interrupt is taken, the contents of the MSR are the Machine State Register are placed into DSRR1. placed into CSRR1. When rfci is executed, the con- When rfdi is executed, the contents of DSRR1 are tents of CSRR1 are placed into the MSR. placed into the Machine State Register. Bits of CSRR1 that correspond to reserved bits in the Bits of DSRR1 that correspond to reserved bits in the MSR are also reserved. Machine State Register are also reserved. Programming Note The contents of DSRR1 can be read into bits 32:63 of A MSR bit that is reserved may be inadvertently register RT using mfspr RT,DSRR1, setting bits 0:31 modified by rfi/rfci/rfmci. of RT to zero. The contents of bits 32:63 of register RS can be written into the DSSR1 using mtspr DSRR1,RS. The contents of CSRR1 can be read into bits 32:63 of register RT using mfspr RT,CSRR1, setting bits 0:31 of RT to zero. The contents of bits 32:63 of register RS Chapter 5. Interrupts and Exceptions 663 Version 2.05 5.2.7 Data Exception Address Register The Data Exception Address Register (DEAR) is a 64- bit register. DEAR bits are numbered 0 (most-signifi- cant bit) to 63 (least-significant bit). The DEAR contains the address that was referenced by a Load, Store or Cache Management instruction that caused an Align- ment, Data TLB Miss, or Data Storage interrupt. The contents of the DEAR when an interrupt is taken are mode dependent, reflecting the computation mode currently in use (specified by MSRCM) and the compu- tation mode entered for execution of the critical inter- rupt (specified by MSRICM). 
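The mode-dependent rule stated for SRR0, CSRR0, and DSRR0 has the same shape in every case, and the DEAR rule that follows repeats it. As a minimal sketch only, the four cases can be modeled in Python as below. The function name `saved_address` and the `undefined_fill` parameter are hypothetical, introduced here for illustration; real hardware may place any value in bits the architecture leaves undefined.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def saved_address(addr, msr_cm, msr_icm, undefined_fill=0xDEADBEEF):
    """Model the value written to a 64-bit save register such as SRR0.

    addr           -- 64-bit address to be saved (Addr in the text)
    msr_cm         -- computation mode in use when the interrupt occurs (0 or 1)
    msr_icm        -- computation mode entered for the interrupt (0 or 1)
    undefined_fill -- hypothetical stand-in for architecturally undefined bits

    Returns the 64-bit register value, or None when the whole register
    is undefined (MSR[CM] = 1 and MSR[ICM] = 0).
    """
    if msr_cm not in (0, 1) or msr_icm not in (0, 1):
        raise ValueError("MSR[CM] and MSR[ICM] are single bits")
    addr &= MASK64
    low = addr & MASK32  # Addr[32:63]: bit 63 is the least-significant bit
    if msr_cm == 0 and msr_icm == 0:
        # Upper 32 bits undefined, lower 32 bits hold the address.
        return ((undefined_fill & MASK32) << 32) | low
    if msr_cm == 0 and msr_icm == 1:
        # Upper 32 bits are zero.
        return low
    if msr_cm == 1 and msr_icm == 1:
        # Full 64-bit address is saved.
        return addr
    return None  # MSR[CM] = 1, MSR[ICM] = 0: contents undefined
```

A handler that must work in every mode combination would, under this model, trust only the low-order 32 bits when the interrupted context ran with MSR[CM] = 0.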
The contents of the DEAR upon interrupt can be described as follows (assuming Addr is the address to be put into DEAR):

    if (MSR[CM] = 0) & (MSR[ICM] = 0) then DEAR ← ³²undefined || Addr[32:63]
    if (MSR[CM] = 0) & (MSR[ICM] = 1) then DEAR ← ³²0 || Addr[32:63]
    if (MSR[CM] = 1) & (MSR[ICM] = 1) then DEAR ← Addr[0:63]
    if (MSR[CM] = 1) & (MSR[ICM] = 0) then DEAR ← undefined

The contents of DEAR can be read into register RT using mfspr RT,DEAR. The contents of register RS can be written into the DEAR using mtspr DEAR,RS.

5.2.8 Interrupt Vector Prefix Register

The Interrupt Vector Prefix Register (IVPR) is a 64-bit register. Interrupt Vector Prefix Register bits are numbered 0 (most-significant bit) to 63 (least-significant bit). Bits 48:63 are reserved. Bits 0:47 of the Interrupt Vector Prefix Register provide the high-order 48 bits of the address of the exception processing routines. The 16-bit exception vector offsets (provided in Section 5.2.10) are concatenated to the right of bits 0:47 of the Interrupt Vector Prefix Register to form the 64-bit address of the exception processing routine.

The contents of the Interrupt Vector Prefix Register can be read into register RT using mfspr RT,IVPR. The contents of register RS can be written into the Interrupt Vector Prefix Register using mtspr IVPR,RS.

5.2.9 Exception Syndrome Register

The Exception Syndrome Register (ESR) is a 32-bit register. ESR bits are numbered 32 (most-significant bit) to 63 (least-significant bit). The ESR provides a syndrome to differentiate between the different kinds of exceptions that can generate the same interrupt type. Upon the generation of one of these types of interrupts, the bit or bits corresponding to the specific exception that generated the interrupt are set, and all other ESR bits are cleared. Other interrupt types do not affect the contents of the ESR. The ESR does not need to be cleared by software. Figure 12 shows the bit definitions for the ESR.
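Because ESR bits are numbered 32 to 63 with bit 63 least significant, software that tests them has to convert a bit number into a mask. The following Python sketch illustrates that conversion and the set-and-clear update rule described above. The helper names (`esr_mask`, `esr_after_interrupt`, `esr_names`) are hypothetical, the bit positions are copied from Figure 12, and implementation-dependent bits (which may follow special rules) are deliberately left out.

```python
def esr_mask(bit):
    """Return the mask for an ESR bit numbered 32 (MSB) to 63 (LSB)."""
    if not 32 <= bit <= 63:
        raise ValueError("ESR bits are numbered 32 to 63")
    return 1 << (63 - bit)


# Architected bit positions taken from Figure 12 (subset, for illustration).
ESR_BITS = {
    "PIL": 36, "PPR": 37, "PTR": 38, "FP": 39, "ST": 40,
    "AP": 44, "PUO": 45, "BO": 46, "PIE": 47,
}


def esr_after_interrupt(old_esr, exception_bits, affects_esr=True):
    """Model the ESR update when an interrupt is taken.

    If the interrupt type affects the ESR, the named bits are set and
    every other bit is cleared; otherwise the ESR keeps its old value.
    """
    if not affects_esr:
        return old_esr
    new_esr = 0
    for name in exception_bits:
        new_esr |= esr_mask(ESR_BITS[name])
    return new_esr


def esr_names(esr):
    """List the names of the modeled ESR bits that are set in esr."""
    return [name for name, bit in ESR_BITS.items() if esr & esr_mask(bit)]
```

For example, under this model a floating-point store that takes a Data Storage interrupt would be represented by `esr_after_interrupt(old, ["FP", "ST"])`, and whatever was previously in the ESR is discarded.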
Bit(s)  Name   Meaning                                     Associated Interrupt Type
32:35          Implementation-dependent                    (Implementation-dependent)
36      PIL    Illegal Instruction exception               Program
37      PPR    Privileged Instruction exception            Program
38      PTR    Trap exception                              Program
39      FP     Floating-point operation                    Alignment, Data Storage, Data TLB, Program
40      ST     Store operation                             Alignment, Data Storage, Data TLB Error
41             Reserved
42      DLK0   (Implementation-dependent)                  (Implementation-dependent)
43      DLK1   (Implementation-dependent)                  (Implementation-dependent)
44      AP     Auxiliary Processor operation               Alignment, Data Storage, Data TLB, Program
45      PUO    Unimplemented Operation exception           Program
46      BO     Byte Ordering exception                     Data Storage, Instruction Storage
47      PIE    Imprecise exception                         Program
48:55          Reserved
56      SPV    Signal Processing operation [Category:      Alignment, Data Storage, Data TLB,
               Signal Processing Engine]                   Embedded Floating-point Data,
               Vector operation [Category: Vector]         Embedded Floating-point Round,
                                                           SPE/Embedded Floating-point/Vector Unavailable
57      EPID   External Process ID operation [Category:    Alignment, Data Storage, Data TLB
               Embedded.External Process ID]
58      VLEMI  VLE operation [Category: VLE]               Alignment, Data Storage, Data TLB,
                                                           SPE/Embedded Floating-point/Vector Unavailable,
                                                           Embedded Floating-point Data,
                                                           Embedded Floating-point Round,
                                                           Instruction Storage, Program, System Call
59:61          Implementation-dependent                    (Implementation-dependent)
62      MIF    Misaligned Instruction [Category: VLE]      Instruction TLB, Instruction Storage

Figure 12. Exception Syndrome Register Definitions

Programming Note
The information provided by the ESR is not complete. System software may also need to identify the type of instruction that caused the interrupt, examine the TLB entry accessed by a data or instruction storage access, as well as examine the ESR to fully determine what exception or exceptions caused the interrupt. For example, a Data Storage interrupt may be caused by both a Protection Violation exception as well as a Byte Ordering exception. System software would have to look beyond ESR[BO], such as the state of MSR[PR] in SRR1 and the page protection bits in the TLB entry accessed by the storage access, to determine whether or not a Protection Violation also occurred.

The contents of the ESR can be read into bits 32:63 of register RT using mfspr RT,ESR, setting bits 0:31 of RT to zero. The contents of bits 32:63 of register RS can be written into the ESR using mtspr ESR,RS.

5.2.10 Interrupt Vector Offset Registers

The Interrupt Vector Offset Registers (IVORs) are 32-bit registers. Interrupt Vector Offset Register bits are numbered 32 (most-significant bit) to 63 (least-significant bit). Bits 32:47 and bits 60:63 are reserved. An Interrupt Vector Offset Register provides the quadword index from the base address provided by the IVPR (see Section 5.2.8) for its respective interrupt. Interrupt Vector Offset Registers 0 through 15 and 32 through 37 are provided for the defined interrupts. SPR numbers corresponding to Interrupt Vector Offset Registers 16 through 31 are reserved. SPR numbers corresponding to Interrupt Vector Offset Registers 38 through 63 are allocated for implementation-dependent use. Figure 13 provides the assignments of specific Interrupt Vector Offset Registers to specific interrupts.

IVORi            Interrupt
IVOR0            Critical Input
IVOR1            Machine Check
IVOR2            Data Storage
IVOR3            Instruction Storage
IVOR4            External Input
IVOR5            Alignment
IVOR6            Program
IVOR7            Floating-Point Unavailable
IVOR8            System Call
IVOR9            Auxiliary Processor Unavailable
IVOR10           Decrementer
IVOR11           Fixed-Interval Timer Interrupt
IVOR12           Watchdog Timer Interrupt
IVOR13           Data TLB Error
IVOR14           Instruction TLB Error
IVOR15           Debug
IVOR16:IVOR31    Reserved
[Category: Signal Processing Engine] [Category: Vector]
IVOR32           SPE/Embedded Floating-Point/Vector Unavailable Interrupt
[Category: SP.Embedded Float_*] (IVORs 33 & 34 are required if any SP.Float_ dependent category is supported.)
IVOR33           Embedded Floating-Point Data Interrupt
IVOR34           Embedded Floating-Point Round Interrupt
[Category: Embedded Performance Monitor]
IVOR35           Embedded Performance Monitor Interrupt
[Category: Embedded.Processor Control]
IVOR36           Processor Doorbell Interrupt
IVOR37           Processor Doorbell Critical Interrupt
IVOR38:IVOR63    Implementation-dependent

Figure 13. Interrupt Vector Offset Register Assignments

Bits 48:59 of the contents of IVORi can be read into bits 48:59 of register RT using mfspr RT,IVORi, setting bits 0:47 and bits 60:63 of GPR(RT) to zero. Bits 48:59 of the contents of register RS can be written into bits 48:59 of IVORi using mtspr IVORi,RS.

5.2.11 Machine Check Registers

A set of Special Purpose Registers is provided to support Machine Check interrupts.

5.2.11.1 Machine Check Save/Restore Register 0

Machine Check Save/Restore Register 0 (MCSRR0) is a 64-bit register used to save machine state on Machine Check interrupts, and to restore machine state when an rfmci is executed. When a Machine Check interrupt is taken, MCSRR0 is set to the current or next instruction address. When rfmci is executed, instruction execution continues at the address in MCSRR0.

In general, MCSRR0 contains the address of an instruction that was executing or about to be executed when the Machine Check exception occurred.

The contents of MCSRR0 when a Machine Check interrupt is taken are mode dependent, reflecting the computation mode currently in use (specified by MSR[CM]) and the computation mode entered for execution of the Machine Check interrupt (specified by MSR[ICM]). The contents of MCSRR0 upon Machine Check interrupt can be described as follows (assuming Addr is the address to be put into MCSRR0):

    if (MSR[CM] = 0) & (MSR[ICM] = 0) then MCSRR0 ← ³²undefined || Addr[32:63]
    if (MSR[CM] = 0) & (MSR[ICM] = 1) then MCSRR0 ← ³²0 || Addr[32:63]
    if (MSR[CM] = 1) & (MSR[ICM] = 1) then MCSRR0 ← Addr[0:63]
    if (MSR[CM] = 1) & (MSR[ICM] = 0) then MCSRR0 ← undefined

The contents of MCSRR0 can be read into register RT using mfspr RT,MCSRR0. The contents of register RS can be written into MCSRR0 using mtspr MCSRR0,RS.

5.2.11.2 Machine Check Save/Restore Register 1

Machine Check Save/Restore Register 1 (MCSRR1) is a 32-bit register used to save machine state on Machine Check interrupts, and to restore machine state when an rfmci is executed. When a Machine Check interrupt is taken, the contents of the MSR are placed into MCSRR1. When rfmci is executed, the contents of MCSRR1 are placed into the MSR.

Bits of MCSRR1 that correspond to reserved bits in the MSR are also reserved.

Programming Note
An MSR bit that is reserved may be inadvertently modified by rfi/rfci/rfmci.

The contents of MCSRR1 can be read into register RT using mfspr RT,MCSRR1. The contents of register RS can be written into MCSRR1 using mtspr MCSRR1,RS.

5.2.11.3 Machine Check Syndrome Register

The Machine Check Syndrome Register (MCSR) is a 64-bit register that is used to record the cause of the Machine Check interrupt. The specific definition of the contents of this register is implementation-dependent (see the User Manual of the implementation).

The contents of MCSR can be read into register RT using mfspr RT,MCSR. The contents of register RS can be written into the MCSR using mtspr MCSR,RS.

5.2.12 External Proxy Register [Category: External Proxy]

The External Proxy Register (EPR) contains implementation-dependent information related to an External Input interrupt when an External Input interrupt occurs. The EPR is only considered valid from the time that the External Input interrupt occurs until MSR[EE] is set to 1 as the result of a mtmsr or a return from interrupt instruction.

The format of the EPR is shown below.

    EPR (bits 32:63)

Figure 14. External Proxy Register

When the External Input interrupt is taken, the contents of the EPR provide information related to the External Input interrupt.

Programming Note
The EPR is provided for faster interrupt processing as well as situations where an interrupt must be taken, but software must delay the resultant processing for later.

The EPR contains the vector from the interrupt controller. The process of receiving the interrupt into the EPR acknowledges the interrupt to the interrupt controller. The method for enabling or disabling the acknowledgment of the interrupt by placing the interrupt-related information in the EPR is implementation-dependent. If this acknowledgment is disabled, then the EPR is set to 0 when the External Input interrupt occurs.

5.3 Exceptions

There are two kinds of exceptions, those caused directly by the execution of an instruction and those caused by an asynchronous event. In either case, the exception may cause one of several types of interrupts to be invoked.

Examples of exceptions that can be caused directly by the execution of an instruction include but are not limited to the following:

• an attempt to execute a reserved-illegal instruction (Illegal Instruction exception type Program interrupt)
• an attempt by an application program to execute a 'privileged' instruction (Privileged Instruction exception type Program interrupt)
• an attempt by an application program to access a 'privileged' Special Purpose Register (Privileged Instruction exception type Program interrupt)
• an attempt by an application program to access a Special Purpose Register that does not exist (Unimplemented Operation Instruction exception type Program interrupt)
• an attempt by a system program to access a Special Purpose Register that does not exist (boundedly undefined results)
• the execution of a defined instruction using an invalid form (Illegal Instruction exception type Program interrupt, Unimplemented Operation exception type Program interrupt, or Privileged Instruction exception type Program interrupt)
• an attempt to access a storage location that is either unavailable (Instruction TLB Error interrupt or Data TLB Error interrupt) or not permitted (Instruction Storage interrupt or Data Storage interrupt)
• an attempt to access storage with an effective address alignment not supported by the implementation (Alignment interrupt)
• the execution of a System Call instruction (System Call interrupt)
• the execution of a Trap instruction whose trap condition is met (Trap type Program interrupt)
• the execution of a floating-point instruction when floating-point instructions are unavailable (Floating-point Unavailable interrupt)
• the execution of a floating-point instruction that causes a floating-point enabled exception to exist (Enabled exception type Program interrupt)
• the execution of a defined instruction that is not implemented by the implementation (Illegal Instruction exception or Unimplemented Operation exception type of Program interrupt)
• the execution of an instruction that is not implemented by the implementation (Illegal Instruction exception or Unimplemented Operation exception type of Program interrupt)
• the execution of an auxiliary processor instruction when the auxiliary processor instruction is unavailable (Auxiliary Processor Unavailable interrupt)
• the execution of an instruction that causes an auxiliary processor enabled exception (Enabled exception type Program interrupt)

The invocation of an interrupt is precise, except that if one of the imprecise modes for invoking the Floating-point Enabled Exception type Program interrupt is in effect then the invocation of the Floating-point Enabled Exception type Program interrupt may be imprecise. When the interrupt is invoked imprecisely, the excepting instruction does not appear to complete before the next instruction starts (because one of the effects of the excepting instruction, namely the invocation of the interrupt, has not yet occurred).

5.4 Interrupt Classification

All interrupts, except for Machine Check, can be classified as either Asynchronous or Synchronous. Independent from this classification, all interrupts, including Machine Check, can be classified into one of the following classes:

• Base
• Critical
• Machine Check
• Debug [Category: Embedded.Enhanced Debug]

5.4.1 Asynchronous Interrupts

Asynchronous interrupts are caused by events that are independent of instruction execution. For asynchronous interrupts, the address reported to the exception handling routine is the address of the instruction that would have executed next, had the asynchronous interrupt not occurred.

5.4.2 Synchronous Interrupts

Synchronous interrupts are those that are caused directly by the execution (or attempted execution) of instructions, and are further divided into two classes, precise and imprecise.

Synchronous, precise interrupts are those that precisely indicate the address of the instruction causing the exception that generated the interrupt; or, for certain synchronous, precise interrupt types, the address of the immediately following instruction.

Synchronous, imprecise interrupts are those that may indicate the address of the instruction causing the exception that generated the interrupt, or some instruction after the instruction causing the exception.

5.4.2.1 Synchronous, Precise Interrupts

When the execution or attempted execution of an instruction causes a synchronous, precise interrupt, the following conditions exist at the interrupt point.

• SRR0, CSRR0, or DSRR0 [Category: Embedded.Enhanced Debug] addresses either the instruction causing the exception or the instruction immediately following the instruction causing the exception. Which instruction is addressed can be determined from the interrupt type and status bits.
• An interrupt is generated such that all instructions preceding the instruction causing the exception appear to have completed with respect to the executing processor. However, some storage accesses associated with these preceding instructions may not have been performed with respect to other processors and mechanisms.
• The instruction causing the exception may appear not to have begun execution (except for causing the exception), may have been partially executed, or may have completed, depending on the interrupt type. See Section 5.7 on page 686.
• Architecturally, no subsequent instruction has executed beyond the instruction causing the exception.

5.4.2.2 Synchronous, Imprecise Interrupts

When the execution or attempted execution of an instruction causes an imprecise interrupt, the following conditions exist at the interrupt point.

• SRR0 or CSRR0 addresses either the instruction causing the exception or some instruction following the instruction causing the exception that generated the interrupt.
• An interrupt is generated such that all instructions preceding the instruction addressed by SRR0 or CSRR0 appear to have completed with respect to the executing processor.
• If the imprecise interrupt is forced by the context synchronizing mechanism, due to an instruction that causes another exception that generates an interrupt (e.g., Alignment, Data Storage), then SRR0 addresses the interrupt-forcing instruction, and the interrupt-forcing instruction may have been partially executed (see Section 5.7 on page 686).
• If the imprecise interrupt is forced by the execution synchronizing mechanism, due to executing an execution synchronizing instruction other than msync or isync, then SRR0 or CSRR0 addresses the interrupt-forcing instruction, and the interrupt-forcing instruction appears not to have begun execution (except for its forcing the imprecise interrupt). If the imprecise interrupt is forced by an msync or isync instruction, then SRR0 or CSRR0 may address either the msync or isync instruction, or the following instruction.
• If the imprecise interrupt is not forced by either the context synchronizing mechanism or the execution synchronizing mechanism, then the instruction addressed by SRR0 or CSRR0 may have been partially executed (see Section 5.7 on page 686).
• No instruction following the instruction addressed by SRR0 or CSRR0 has executed.

5.4.3 Interrupt Classes

Interrupts can also be classified as base, critical, Machine Check, and Debug [Category: Embedded.Enhanced Debug].

Interrupt classes other than the base class may demand immediate attention even if another class of interrupt is currently being processed and software has not yet had the opportunity to save the state of the machine (i.e. return address and captured state of the MSR). For this reason, the interrupts are organized into a hierarchy (see Section 5.8). To enable taking a critical, Machine Check, or Debug [Category: Embedded.Enhanced Debug] interrupt immediately after a base class interrupt occurs (i.e. before software has saved the state of the machine), these interrupts use the Save/Restore Register pair CSRR0/CSRR1, MCSRR0/MCSRR1, or DSRR0/DSRR1 [Category: Embedded.Enhanced Debug], and base class interrupts use Save/Restore Register pair SRR0/SRR1.

5.4.4 Machine Check Interrupts

Machine Check interrupts are a special case. They are typically caused by some kind of hardware or storage subsystem failure, or by an attempt to access an invalid address. A Machine Check may be caused indirectly by the execution of an instruction, but not be recognized and/or reported until long after the processor has executed past the instruction that caused the Machine Check. As such, Machine Check interrupts cannot properly be thought of as synchronous or asynchronous, nor as precise or imprecise. The following general rules apply to Machine Check interrupts:

1. No instruction after the one whose address is reported to the Machine Check interrupt handler in MCSRR0 has begun execution.

2. The instruction whose address is reported to the Machine Check interrupt handler in MCSRR0, and all prior instructions, may or may not have completed successfully. All those instructions that are ever going to complete appear to have done so already, and have done so within the context existing prior to the Machine Check interrupt. No further interrupt (other than possible additional Machine Check interrupts) will occur as a result of those instructions.

5.5 Interrupt Processing

Associated with each kind of interrupt is an interrupt vector, that is, the address of the initial instruction that is executed when the corresponding interrupt occurs.

Interrupt processing consists of saving a small part of the processor's state in certain registers, identifying the cause of the interrupt in another register, and continuing execution at the corresponding interrupt vector location. When an exception exists that will cause an interrupt to be generated and it has been determined that the interrupt can be taken, the following actions are performed, in order:

1. SRR0, DSRR0 [Category: Embedded.Enhanced Debug], MCSRR0, or CSRR0 is loaded with an instruction address that depends on the interrupt; see the specific interrupt description for details.

2. The ESR is loaded with information specific to the exception. Note that many interrupts can only be caused by a single kind of exception event, and thus neither need nor use an ESR setting to indicate what the cause of the interrupt was.

3. SRR1, DSRR1 [Category: Embedded.Enhanced Debug], MCSRR1, or CSRR1 is loaded with a copy of the contents of the MSR.

4. The MSR is updated as described below. The new values take effect beginning with the first instruction following the interrupt. MSR bits of particular interest are the following.

   • MSR[WE,EE,PR,FP,FE0,FE1,IS,DS] are set to 0 by all interrupts.
   • MSR[ME] is set to 0 by Machine Check interrupts and left unchanged by all other interrupts.
   • MSR[CE] is set to 0 by critical class interrupts, Debug interrupts, and Machine Check interrupts, and is left unchanged by all other interrupts.
   • MSR[DE] is set to 0 by critical class interrupts unless Category E.ED is supported, by Debug interrupts, and by Machine Check interrupts, and is left unchanged by all other interrupts.
   • MSR[CM] is set to MSR[ICM].
   • Other supported MSR bits are left unchanged by all interrupts.

   See Section 2.2.1 for more detail on the definition of the MSR.

5. Instruction fetching and execution resumes, using the new MSR value, at a location specific to the interrupt. The location is

       IVPR[0:47] || IVORi[48:59] || 0b0000

   where IVPR is the Interrupt Vector Prefix Register and IVORi is the Interrupt Vector Offset Register for that interrupt (see Figure 13 on page 666). The contents of the Interrupt Vector Prefix Register and Interrupt Vector Offset Registers are indeterminate upon power-on reset, and must be initialized by system software using the mtspr instruction.

Interrupts may not clear reservations obtained with Load and Reserve instructions. The operating system should do so at appropriate points, such as at process switch.

At the end of an interrupt handling routine, execution of an rfi, rfdi [Category: Embedded.Enhanced Debug], rfmci, or rfci causes the MSR to be restored from the contents of SRR1, DSRR1 [Category: Embedded.Enhanced Debug], MCSRR1, or CSRR1, and instruction execution to resume at the address contained in SRR0, DSRR0 [Category: Embedded.Enhanced Debug], MCSRR0, or CSRR0, respectively.

Programming Note
In general, at process switch, due to possible process interlocks and possible data availability requirements, the operating system needs to consider executing the following.

• stwcx. or stdcx., to clear the reservation if one is outstanding, to ensure that a lwarx or ldarx in the "old" process is not paired with a stwcx. or stdcx. in the "new" process.
• msync, to ensure that all storage operations of an interrupted process are complete with respect to other processors before that process begins executing on another processor.
• isync, rfi, rfdi [Category: Embedded.Enhanced Debug], rfmci, or rfci to ensure that the instructions in the "new" process execute in the "new" context.

Programming Note
For instruction-caused interrupts, in some cases it may be desirable for the operating system to emulate the instruction that caused the interrupt, while in other cases it may be desirable for the operating system not to emulate the instruction. The following list, while not complete, illustrates criteria by which decisions regarding emulation should be made. The list applies to general execution environments; it does not necessarily apply to special environments such as program debugging, processor bring-up, etc.

In general, the instruction should be emulated if:

- The interrupt is caused by a condition for which the instruction description (including related material such as the introduction to the section describing the instruction) implies that the instruction works correctly. Example: Alignment interrupt caused by lmw for which the storage operand is not aligned, or by dcbz or dcbzep for which the storage operand is in storage that is Write Through Required or Caching Inhibited.

- The instruction is an illegal instruction that should appear, to the program executing it, as if it were supported by the implementation. Example: Illegal Instruction type Program interrupt caused by an instruction that has been phased out of the architecture but is still used by some programs that the operating system supports, or by an instruction that is in a category that the implementation does not support but is used by some programs that the operating system supports.

In general, the instruction should not be emulated if:

- The purpose of the instruction is to cause an interrupt. Example: System Call interrupt caused by sc.

- The interrupt is caused by a condition that is stated, in the instruction description, potentially to cause the interrupt. Example: Alignment interrupt caused by lwarx for which the storage operand is not aligned.

- The program is attempting to perform a function that it should not be permitted to perform. Example: Data Storage interrupt caused by lwz for which the storage operand is in storage that the program should not be permitted to access. (If the function is one that the program should be permitted to perform, the conditions that caused the interrupt should be corrected and the program re-dispatched such that the instruction will be re-executed. Example: Data Storage interrupt caused by lwz for which the storage operand is in storage that the program should be permitted to access but for which there currently is no TLB entry.)

5.6 Interrupt Definitions

Figure 15 provides a summary of each interrupt type, the various exception types that may cause that interrupt type, the classification of the interrupt, which ESR bits can be set, if any, which MSR bits can mask the interrupt type and which Interrupt Vector Offset Register is used to specify that interrupt type's vector address.

Column key: A = Asynchronous; SP = Synchronous, Precise; SI = Synchronous, Imprecise; C = Critical; ESR = ESR bits (See Note 5); MSR = MSR Mask Bit(s); DBCR0/TCR = DBCR0/TCR Mask Bit; Cat = Category (Section 1.3.5 of Book I); Notes = see page 673. A dash marks an empty cell.

IVOR   Interrupt               Exception                      A  SP SI C  ESR                                 MSR      DBCR0/TCR Cat    Notes Page
IVOR0  Critical Input          Critical Input                 x  -  -  x  -                                   CE       -         E      1     674
IVOR1  Machine Check           Machine Check                  -  -  -  -  -                                   ME       -         E      2,4   674
IVOR2  Data Storage            Access                         -  x  -  -  [ST],[FP,AP,SPV],[VLEMI],[EPID]     -        -         E      9     675
                               Load and Reserve or Store      -  x  -  -  [ST],[VLEMI]                        -        -         E      9     -
                                 Conditional to 'write-thru
                                 required' storage (W=1)
                               Cache Locking                  -  x  -  -  {DLK0,DLK1},[ST],[VLEMI]            -        -         E      8     -
                               Byte Ordering                  -  x  -  -  BO,[ST],[FP,AP,SPV],[VLEMI],[EPID]  -        -         E      -     -
IVOR3  Inst Storage            Access                         -  x  -  -  -                                   -        -         E      -     676
                               Byte Ordering                  -  x  -  -  BO,[VLEMI]                          -        -         E      -     -
                               Mismatched Instruction Storage -  x  -  -  BO,VLEMI                            EE       -         E,VLE  1     -
                                 (See Book VLE.)
                               Misaligned Instruction Storage -  x  -  -  MIF                                 EE       -         E,VLE  1     -
                                 (See Book VLE.)
IVOR4  External Input          External Input                 x  -  -  -  -                                   EE       -         E      1     676
IVOR5  Alignment               Alignment                      -  x  -  -  [ST],[FP,AP,SPV],[EPID],[VLEMI]     -        -         E      -     677
IVOR6  Program                 Illegal                        -  x  -  -  PIL,[VLEMI]                         -        -         E      -     678
                               Privileged                     -  x  -  -  PPR,[AP],[VLEMI]                    -        -         E      -     -
                               Trap                           -  x  -  -  PTR,[VLEMI]                         -        -         E      -     -
                               FP Enabled                     -  x  x  -  FP,[PIE]                            FE0,FE1  -         E      6,7   -
                               AP Enabled                     -  x  x  -  AP                                  -        -         E      -     -
                               Unimplemented Op               -  x  -  -  PUO,[VLEMI],[FP,AP,SPV]             -        -         E      7     -
IVOR7  FP Unavailable          FP Unavailable                 -  x  -  -  -                                   -        -         E      -     679
IVOR8  System Call             System Call                    -  x  -  -  [VLEMI]                             -        -         E      -     679
IVOR9  AP Unavailable          AP Unavailable                 -  x  -  -  -                                   -        -         E      -     679
IVOR10 Decrementer             -                              x  -  -  -  -                                   EE       DIE       E      -     680
IVOR11 FIT                     -                              x  -  -  -  -                                   EE       FIE       E      -     680
IVOR12 Watchdog                -                              x  -  -  x  -                                   CE       WIE       E      10    680
IVOR13 Data TLB Error          Data TLB Miss                  -  x  -  -  [ST],[FP,AP,SPV],[VLEMI],[EPID]     -        -         E      -     681
IVOR14 Inst TLB Error          Inst TLB Miss                  -  x  -  -  [MIF]                               -        -         E      -     681
IVOR15 Debug                   Trap                           -  x  -  x  -                                   DE       IDM       E      10    682
                               Inst Addr Compare              -  x  -  x  -                                   DE       IDM       E      10    -
                               Data Addr Compare              -  x  -  x  -                                   DE       IDM       E      10    -
                               Instruction Complete           -  x  -  x  -                                   DE       IDM       E      3,10  -
                               Branch Taken                   -  x  -  x  -                                   DE       IDM       E      3,10  -
                               Return From Interrupt          -  x  -  x  -                                   DE       IDM       E      10    -
                               Interrupt Taken                x  -  -  x  -                                   DE       IDM       E      10    -
                               Uncond Debug Event             x  -  -  x  -                                   DE       IDM       E.ED   10    -
                               Critical Interrupt Taken       x  -  -  -  -                                   DE       IDM       E.ED   -     -
                               Critical Interrupt Return      -  x  -  -  -                                   DE       IDM       E.ED   -     -
IVOR32 SPE/Embedded Floating-  SPE Unavailable                -  x  -  -  SPV,[VLEMI]                         -        -         SPE    -     683
         Point/Vector          Vector Unavailable             -  -  -  -  SPV                                 -        -         V      -     -
         Unavailable
IVOR33 Embedded Floating-      Embedded Floating-Point Data   -  x  -  -  SPV,[VLEMI]                         -        -         SP.F*  -     684
         Point Data
IVOR34 Embedded Floating-      Embedded Floating-Point Round  -  x  -  -  SPV,[VLEMI]                         -        -         SP.F*  -     684
         Point Round
IVOR35 Embedded Performance    Embedded Performance Monitor   x  -  -  -  -                                   -        -         E.PM   -     -
         Monitor
IVOR36 Processor Doorbell      Processor Doorbell             x  -  -  -  -                                   EE       -         E.PC   -     -
IVOR37 Processor Critical      Processor Critical Doorbell    x  -  -  x  -                                   CE       -         E.PC   -     -
         Doorbell

Figure 15. Interrupt and Exception Types

Legend:
[xxx]      means ESR[xxx] could be set
[xxx,yyy]  means either ESR[xxx] or ESR[yyy] may be set, but never both
(xxx,yyy)  means either ESR[xxx] or ESR[yyy] will be set, but never both
{xxx,yyy}  means either ESR[xxx] or ESR[yyy] will be set, or possibly both
xxx        means ESR[xxx] is set

Figure 15 Notes

1. Although it is not specified, it is common for system implementations to provide, as part of the interrupt controller, independent mask and status bits for the various sources of Critical Input and External Input interrupts.

2. Machine Check interrupts are a special case and are not classified as asynchronous nor synchronous. See Section 5.4.4 on page 669.

3. The Instruction Complete and Branch Taken debug events are only defined for MSR[DE]=1 when in Internal Debug Mode (DBCR0[IDM]=1). In other words, when in Internal Debug Mode with MSR[DE]=0, then Instruction Complete and Branch Taken debug events cannot occur, and no DBSR status bits are set and no subsequent imprecise Debug interrupt will occur (see Section 8.4 on page 704).

4. Machine Check status information is commonly provided as part of the system implementation, but is implementation-dependent.

5. In general, when an interrupt causes a particular ESR bit or bits to be set (or cleared) as indicated in the table, it also causes all other ESR bits to be cleared. There may be special rules regarding the handling of implementation-specific ESR bits.

6. The precision of the Floating-point Enabled Exception type Program interrupt is controlled by the MSR[FE0,FE1] bits. When MSR[FE0,FE1]=0b01 or 0b10, the interrupt may be imprecise. When such a Program interrupt is taken, if the address saved in SRR0 is not the address of the instruction that caused the exception (i.e. the instruction that caused FPSCR[FEX] to be set to 1), ESR[PIE] is set to 1.

All other defined MSR bits set to 0.

Instruction execution resumes at address IVPR[0:47] || IVOR0[48:59] || 0b0000.

Programming Note
Software is responsible for taking any action(s) that are required by the implementation in order to clear
When MSRFE0,FE1=0b11, the interrupt is pre- any Critical Input exception status prior to re- cise. When MSRFE0,FE1=0b00, the interrupt is enabling MSRCE in order to avoid another, redun- masked, and the interrupt will subsequently occur dant Critical Input interrupt. imprecisely if and when Floating-point Enabled Exception type Program interrupts are enabled by setting either or both of MSRFE0,FE1, and will also 5.6.2 Machine Check Interrupt cause ESRPIE to be set to 1. See Section 5.6.7. Also, exception status on the exact cause is avail- A Machine Check interrupt occurs when no higher pri- able in the Floating-Point Status and Control Reg- ority exception exists (see Section 5.9 on page 689), a ister (see Section 4.2.2 and Section 4.4 of Book I). Machine Check exception is presented to the interrupt mechanism, and MSRME=1. The specific cause or The precision of the Auxiliary Processor Enabled causes of Machine Check exceptions are implementa- Exception type Program interrupt is implementa- tion-dependent, as are the details of the actions taken tion-dependent. on a Machine Check interrupt. 7. Auxiliary Processor exception status is commonly If the Machine Check Extension is implemented, provided as part of the implementation. MCSRR0, MCSRR1, and MCSR are set, otherwise 8. Cache locking and cache locking exceptions are CSRR0, CSRR1, and ESR are set. The registers are implementation-dependent. updated as follows: 9. Software must examine the instruction and the CSRR0/MCSRR0 subject TLB entry to determine the exact cause of Set to an instruction address. As closely as the interrupt. possible, set to the effective address of an instruction that was executing or about to 10. If the Embedded.Enhanced Debug category is be executed when the Machine Check enabled, this interrupt is not a critical interrupt. exception occurred. DSRR0 and DSRR1 are used instead of CSRR0 and CSRR1. CSRR1/MCSRR1 Set to the contents of the MSR at the time of the interrupt. 
5.6.1 Critical Input Interrupt MSR A Critical Input interrupt occurs when no higher priority CM MSRCM is set to MSRICM. exception exists (see Section 5.9 on page 689), a Criti- DE Unchanged if category E.ED is supported; cal Input exception is presented to the interrupt mecha- otherwise set to 0. nism, and MSRCE=1. While the specific definition of a All other defined MSR bits set to 0. Critical Input exception is implementation-dependent, it would typically be caused by the activation of an asyn- ESR/MCSR chronous signal that is part of the system. Also, imple- Implementation-dependent. mentations may provide an alternative means (in Instruction execution resumes at address IVPR0:47 || addition to MSRCE) for masking the Critical Input inter- IVOR148:59||0b0000. rupt. CSRR0, CSRR1, and MSR are updated as follows: Programming Note CSRR0 Set to the effective address of the next If a Machine Check interrupt is caused by an error instruction to be executed. in the storage subsystem, the storage subsystem may return incorrect data, that may be placed into CSRR1 Set to the contents of the MSR at the time registers and/or on-chip caches. of the interrupt. MSR CM MSRCM is set to MSRICM. ME, ICM Unchanged. DE Unchanged if category E.ED is supported; otherwise set to 0 674 Power ISATM III-E Version 2.05 Cache Locking exception Programming Note On implementations on which a Machine Check A Cache Locking exception may occur when the locked interrupt can be caused by referring to an invalid state of one or more cache lines has the potential to be real address, executing a dcbz, dcbzep, or dcba altered. This exception is implementation-dependent. instruction can cause a delayed Machine Check Storage Synchronization exception interrupt by establishing in the data cache a block that is associated with an invalid real address. See A Storage Synchronization exception will occur when Section 3.3 of Book II. 
A Machine Check interrupt an attempt is made to execute a Load and Reserve or can eventually occur if and when a subsequent Store Conditional instruction from or to a location that is attempt is made to write that block to main storage, Write Through Required or Caching Inhibited (if the for example as the result of executing an instruc- interrupt does not occur then the instruction executes tion that causes a cache miss for which the block is correctly: see Section 3.4.2 of Book II). the target for replacement or as the result of exe- cuting a dcbst, dcbstep, dcbf, or dcbfep instruc- If a stwcx. or stdcx. would not perform its store in the tion. absence of a Data Storage interrupt, and either (a) the specified effective address refers to storage that is Write Through Required or Caching Inhibited, or (b) a non-conditional Store to the specified effective address 5.6.3 Data Storage Interrupt would cause a Data Storage interrupt, it is implementa- A Data Storage interrupt may occur when no higher pri- tion-dependent whether a Data Storage interrupt ority exception exists (see Section 5.9 on page 689) occurs. and a Data Storage exception is presented to the inter- Instructions lswx or stswx with a length of zero, icbt, rupt mechanism. A Data Storage exception is caused dcbt, dcbtep, dcbtst, dcbtstep, or dcba cannot cause when any of the following exceptions arises during exe- a Data Storage interrupt, regardless of the effective cution of an instruction: address. Read Access Control exception Programming Note A Read Access Control exception is caused when one The icbi, icbiep, and icbt instructions are treated of the following conditions exist. as Loads from the addressed byte with respect to 1 While in user mode (MSRPR=1), a Load or `load- address translation and protection. 
These Instruc- class' Cache Management instruction attempts to tion Cache Management instructions use MSRDS, access a location in storage that is not user mode not MSRIS, to determine translation for their oper- read enabled (i.e. page access control bit UR=0). ands. Instruction Storage exceptions and Instruc- 1 While in supervisor mode (MSRPR=0), a Load or tion TLB Miss exceptions are associated with the `load-class' Cache Management instruction `fetching' of instructions not with the `execution' of attempts to access a location in storage that is not instructions. Data Storage exceptions and Data supervisor mode read enabled (i.e. page access TLB Miss exceptions are associated with the `exe- control bit SR=0). cution' of Instruction Cache Management instruc- tions. Write Access Control exception When a Data Storage interrupt occurs, the processor A Write Access Control exception is caused when one suppresses the execution of the instruction causing the of the following conditions exist. Data Storage exception. 1 While in user mode (MSRPR=1), a Store or `store- SRR0, SRR1, MSR, DEAR, and ESR are updated as class' Cache Management instruction attempts to follows: access a location in storage that is not user mode write enabled (i.e. page access control bit UW=0). SRR0 Set to the effective address of the instruc- 1 While in supervisor mode (MSRPR=0), a Store or tion causing the Data Storage interrupt. `store-class' Cache Management instruction SRR1 Set to the contents of the MSR at the time attempts to access a location in storage that is not of the interrupt. supervisor mode write enabled (i.e. page access control bit SW=0). MSR Byte Ordering exception CM MSRCM is set to MSRICM. CE, ME, A Byte Ordering exception may occur when the imple- DE, ICM Unchanged. mentation cannot perform the data storage access in All other defined MSR bits set to 0. the byte order specified by the Endian storage attribute of the page being accessed. Chapter 5. 
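As an illustration of the Read Access Control and Write Access Control conditions listed earlier in Section 5.6.3, the following Python sketch models only the permission check. It is not part of the architecture: the function name, the enumeration, and the parameter names are hypothetical, and the Byte Ordering, Cache Locking, and Storage Synchronization exceptions are deliberately left out.

```python
from enum import Enum


class DataAccessException(Enum):
    """Outcome of the permission check (names are illustrative only)."""
    NONE = "none"
    READ_ACCESS_CONTROL = "Read Access Control"
    WRITE_ACCESS_CONTROL = "Write Access Control"


def check_data_access(msr_pr, is_store, ur, uw, sr, sw):
    """Model the Read/Write Access Control conditions of Section 5.6.3.

    msr_pr   -- MSR[PR]: 1 = user mode, 0 = supervisor mode
    is_store -- True for a Store or 'store-class' Cache Management
                instruction, False for a Load or 'load-class' one
    ur, uw   -- user-mode read/write page access control bits
    sr, sw   -- supervisor-mode read/write page access control bits
    """
    if is_store:
        # Stores are checked against UW in user mode, SW in supervisor mode.
        permitted = uw if msr_pr == 1 else sw
        if not permitted:
            return DataAccessException.WRITE_ACCESS_CONTROL
    else:
        # Loads are checked against UR in user mode, SR in supervisor mode.
        permitted = ur if msr_pr == 1 else sr
        if not permitted:
            return DataAccessException.READ_ACCESS_CONTROL
    return DataAccessException.NONE
```

For example, a user-mode load from a page with UR=0 yields READ_ACCESS_CONTROL even when SR=1, because only the bit for the current privilege state is consulted.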
DEAR  Set to the effective address of a byte that is both within the range of the bytes being accessed by the Storage Access or Cache Management instruction, and within the page whose access caused the Data Storage exception.
ESR
    FP      Set to 1 if the instruction causing the interrupt is a floating-point load or store; otherwise set to 0.
    ST      Set to 1 if the instruction causing the interrupt is a Store or `store-class' Cache Management instruction; otherwise set to 0.
    DLK[0:1]  Set to an implementation-dependent value due to a Cache Locking exception causing the interrupt.
    AP      Set to 1 if the instruction causing the interrupt is an Auxiliary Processor load or store; otherwise set to 0.
    BO      Set to 1 if the instruction caused a Byte Ordering exception; otherwise set to 0.
    SPV     Set to 1 if the instruction causing the interrupt is an SPE operation or a Vector operation; otherwise set to 0.
    VLEMI   Set to 1 if the instruction causing the interrupt resides in VLE storage.
    EPID    Set to 1 if the instruction causing the interrupt is an External Process ID instruction; otherwise set to 0.
    All other defined ESR bits are set to 0.

Programming Note

Read and Write Access Control and Byte Ordering exceptions are not mutually exclusive. Even if ESR[BO] is set, system software must also examine the TLB entry accessed by the data storage access to determine whether or not a Read Access Control or Write Access Control exception may have also occurred.

Instruction execution resumes at address IVPR[0:47] || IVOR2[48:59] || 0b0000.

5.6.4 Instruction Storage Interrupt

An Instruction Storage interrupt occurs when no higher priority exception exists (see Section 5.9 on page 689) and an Instruction Storage exception is presented to the interrupt mechanism. An Instruction Storage exception is caused when any of the following exceptions arises during execution of an instruction:

Execute Access Control exception

An Execute Access Control exception is caused when one of the following conditions exists.

- While in user mode (MSR[PR]=1), an instruction fetch attempts to access a location in storage that is not user mode execute enabled (i.e. page access control bit UX=0).
- While in supervisor mode (MSR[PR]=0), an instruction fetch attempts to access a location in storage that is not supervisor mode execute enabled (i.e. page access control bit SX=0).

Byte Ordering exception

A Byte Ordering exception may occur when the implementation cannot perform the instruction fetch in the byte order specified by the Endian storage attribute of the page being accessed.

When an Instruction Storage interrupt occurs, the processor suppresses the execution of the instruction causing the Instruction Storage exception.

SRR0, SRR1, MSR, and ESR are updated as follows:

SRR0  Set to the effective address of the instruction causing the Instruction Storage interrupt.
SRR1  Set to the contents of the MSR at the time of the interrupt.
MSR
    CM               MSR[CM] is set to MSR[ICM].
    CE, ME, DE, ICM  Unchanged.
    All other defined MSR bits set to 0.
ESR
    BO     Set to 1 if the instruction fetch caused a Byte Ordering exception; otherwise set to 0.
    VLEMI  Set to 1 if the instruction causing the interrupt resides in VLE storage.
    All other defined ESR bits are set to 0.

Programming Note

Execute Access Control and Byte Ordering exceptions are not mutually exclusive. Even if ESR[BO] is set, system software must also examine the TLB entry accessed by the instruction fetch to determine whether or not an Execute Access Control exception may have also occurred.

Instruction execution resumes at address IVPR[0:47] || IVOR3[48:59] || 0b0000.

5.6.5 External Input Interrupt

An External Input interrupt occurs when no higher priority exception exists (see Section 5.9 on page 689), an External Input exception is presented to the interrupt mechanism, and MSR[EE]=1. While the specific definition of an External Input exception is implementation-dependent, it would typically be caused by the activation of an asynchronous signal that is part of the processing system. Also, implementations may provide an alternative means (in addition to MSR[EE]) for masking the External Input interrupt.

SRR0, SRR1, and MSR are updated as follows:

SRR0  Set to the effective address of the next instruction to be executed.
SRR1  Set to the contents of the MSR at the time of the interrupt.
MSR
    CM               MSR[CM] is set to MSR[ICM].
    CE, ME, DE, ICM  Unchanged.
    All other defined MSR bits set to 0.

Instruction execution resumes at address IVPR[0:47] || IVOR4[48:59] || 0b0000.

Programming Note

Software is responsible for taking whatever action(s) are required by the implementation in order to clear any External Input exception status prior to re-enabling MSR[EE] in order to avoid another, redundant External Input interrupt.

5.6.6 Alignment Interrupt

An Alignment interrupt occurs when no higher priority exception exists (see Section 5.9 on page 689) and an Alignment exception is presented to the interrupt mechanism. An Alignment exception may be caused when the implementation cannot perform a data storage access for one of the following reasons:

- The operand of a Load or Store is not aligned.
- The instruction is a Move Assist, Load Multiple or Store Multiple.
- The operand of dcbz or dcbzep is in storage that is Write Through Required or Caching Inhibited, or one of these instructions is executed in an implementation that has either no data cache or a Write Through data cache, or the line addressed by the instruction cannot be established in the cache because the cache is disabled or locked.
- The operand of a Store, except Store Conditional, is in storage that is Write Through Required.

For lmw and stmw with an operand that is not word-aligned, and for Load and Reserve and Store Conditional instructions with an operand that is not aligned, an implementation may yield boundedly undefined results instead of causing an Alignment interrupt. A Store Conditional to Write Through Required storage may either cause a Data Storage interrupt, cause an Alignment interrupt, or correctly execute the instruction. For all other cases listed above, an implementation may execute the instruction correctly instead of causing an Alignment interrupt. (For dcbz or dcbzep, `correct' execution means setting each byte of the block in main storage to 0x00.)

Programming Note

The architecture does not support the use of an unaligned effective address by Load and Reserve and Store Conditional instructions. If an Alignment interrupt occurs because one of these instructions specifies an unaligned effective address, the Alignment interrupt handler must not attempt to emulate the instruction, but instead should treat the instruction as a programming error.

When an Alignment interrupt occurs, the processor suppresses the execution of the instruction causing the Alignment exception.

SRR0, SRR1, MSR, DEAR, and ESR are updated as follows:

SRR0  Set to the effective address of the instruction causing the Alignment interrupt.
SRR1  Set to the contents of the MSR at the time of the interrupt.
MSR
    CM               MSR[CM] is set to MSR[ICM].
    CE, ME, DE, ICM  Unchanged.
    All other defined MSR bits set to 0.
DEAR  Set to the effective address of a byte that is both within the range of the bytes being accessed by the Storage Access or Cache Management instruction, and within the page whose access caused the Alignment exception.
ESR
    FP     Set to 1 if the instruction causing the interrupt is a floating-point load or store; otherwise set to 0.
    ST     Set to 1 if the instruction causing the interrupt is a Store; otherwise set to 0.
    AP     Set to 1 if the instruction causing the interrupt is an Auxiliary Processor load or store; otherwise set to 0.
    SPV    Set to 1 if the instruction causing the interrupt is an SPE operation or a Vector operation; otherwise set to 0.
    VLEMI  Set to 1 if the instruction causing the interrupt resides in VLE storage.
    EPID   Set to 1 if the instruction causing the interrupt is an External Process ID instruction; otherwise set to 0.
    All other defined ESR bits are set to 0.

Instruction execution resumes at address IVPR[0:47] || IVOR5[48:59] || 0b0000.

5.6.7 Program Interrupt

A Program interrupt occurs when no higher priority exception exists (see Section 5.9 on page 689), a Program exception is presented to the interrupt mechanism, and, for Floating-point Enabled exception, MSR[FE0,FE1] are non-zero. A Program exception is caused when any of the following exceptions arises during execution of an instruction:

Floating-point Enabled exception

A Floating-point Enabled exception is caused when FPSCR[FEX] is set to 1 by the execution of a floating-point instruction that causes an enabled exception, including the case of a Move To FPSCR instruction that causes an exception bit and the corresponding enable bit both to be 1. Note that in this context, the term `enabled exception' refers to the enabling provided by control bits in the Floating-Point Status and Control Register. See Section 4.2.2 of Book I.

Auxiliary Processor Enabled exception

The cause of an Auxiliary Processor Enabled exception is implementation-dependent.

Illegal Instruction exception

An Illegal Instruction exception occurs when execution is attempted of any of the following kinds of instructions.

- a reserved-illegal instruction
- when MSR[PR]=1 (user mode), an mtspr or mfspr that specifies an spr value with spr[5]=0 (user-mode accessible) that represents an unimplemented Special Purpose Register

An Illegal Instruction exception may occur when execution is attempted of any of the following kinds of instructions. If the exception does not occur, the alternative is shown in parentheses.

- an instruction that is in invalid form (boundedly undefined results)
- an lswx instruction for which register RA or register RB is in the range of registers to be loaded (boundedly undefined results)
- a defined instruction that is not implemented by the implementation (Unimplemented Operation exception)

Privileged Instruction exception

A Privileged Instruction exception occurs when MSR[PR]=1 and execution is attempted of any of the following kinds of instructions.

- a privileged instruction
- an mtspr or mfspr instruction that specifies an spr value with spr[5]=1

Trap exception

A Trap exception occurs when any of the conditions specified in a Trap instruction are met and the exception is not also enabled as a Debug interrupt. If enabled as a Debug interrupt (i.e. DBCR0[TRAP]=1, DBCR0[IDM]=1, and MSR[DE]=1), then a Debug interrupt will be taken instead of the Program interrupt.

Unimplemented Operation exception

An Unimplemented Operation exception may occur when execution is attempted of a defined instruction that is not implemented by the implementation. Otherwise an Illegal Instruction exception occurs.

An Unimplemented Operation exception may also occur when the processor is in 32-bit mode and execution is attempted of an instruction that is part of the 64-Bit category. Otherwise the instruction executes normally.

SRR0, SRR1, MSR, and ESR are updated as follows:

SRR0  For all Program interrupts except an Enabled exception when in one of the imprecise modes (see Section 2.2.1 on page 611) or when a disabled exception is subsequently enabled, set to the effective address of the instruction that caused the Program interrupt.

      For an imprecise Enabled exception, set to the effective address of the excepting instruction or to the effective address of some subsequent instruction. If it points to a subsequent instruction, that instruction has not been executed, and ESR[PIE] is set to 1. If a subsequent instruction is an msync or isync, SRR0 will point at the msync or isync instruction, or at the following instruction.

      If FPSCR[FEX]=1 but both MSR[FE0]=0 and MSR[FE1]=0, an Enabled exception type Program interrupt will occur imprecisely prior to or at the next synchronizing event if these MSR bits are altered by any instruction that can set the MSR so that the expression

          (MSR[FE0] | MSR[FE1]) & FPSCR[FEX]

      is 1. When this occurs, SRR0 is loaded with the address of the instruction that would have executed next, not with the address of the instruction that modified the MSR causing the interrupt, and ESR[PIE] is set to 1.
SRR1  Set to the contents of the MSR at the time of the interrupt.
MSR
    CM               MSR[CM] is set to MSR[ICM].
    CE, ME, DE, ICM  Unchanged.
    All other defined MSR bits set to 0.
ESR
    PIL    Set to 1 if an Illegal Instruction exception type Program interrupt; otherwise set to 0.
    PPR    Set to 1 if a Privileged Instruction exception type Program interrupt; otherwise set to 0.
    PTR    Set to 1 if a Trap exception type Program interrupt; otherwise set to 0.
    PUO    Set to 1 if an Unimplemented Operation exception type Program interrupt; otherwise set to 0.
    FP     Set to 1 if the instruction causing the interrupt is a floating-point instruction; otherwise set to 0.
    PIE    Set to 1 if a Floating-point Enabled exception type Program interrupt, and the address saved in SRR0 is not the address of the instruction causing the exception (i.e. the instruction that caused FPSCR[FEX] to be set); otherwise set to 0.
    AP     Set to 1 if the instruction causing the interrupt is an Auxiliary Processor instruction; otherwise set to 0.
    SPV    Set to 1 if the instruction causing the interrupt is an SPE operation or a Vector operation; otherwise set to 0.
    VLEMI  Set to 1 if the instruction causing the interrupt resides in VLE storage.
    All other defined ESR bits are set to 0.

Instruction execution resumes at address IVPR[0:47] || IVOR6[48:59] || 0b0000.

5.6.8 Floating-Point Unavailable Interrupt

A Floating-Point Unavailable interrupt occurs when no higher priority exception exists (see Section 5.9 on page 689), an attempt is made to execute a floating-point instruction (i.e. any instruction listed in Section 4.6 of Book I), and MSR[FP]=0.

When a Floating-Point Unavailable interrupt occurs, the processor suppresses the execution of the instruction causing the Floating-Point Unavailable interrupt.

SRR0, SRR1, and MSR are updated as follows:

SRR0  Set to the effective address of the instruction that caused the interrupt.
SRR1  Set to the contents of the MSR at the time of the interrupt.
MSR
    CM               MSR[CM] is set to MSR[ICM].
    CE, ME, DE, ICM  Unchanged.
    All other defined MSR bits set to 0.

Instruction execution resumes at address IVPR[0:47] || IVOR7[48:59] || 0b0000.

5.6.9 System Call Interrupt

A System Call interrupt occurs when no higher priority exception exists (see Section 5.9 on page 689) and a System Call (sc) instruction is executed.

SRR0, SRR1, MSR, and ESR are updated as follows:

SRR0  Set to the effective address of the instruction after the sc instruction.
SRR1  Set to the contents of the MSR at the time of the interrupt.
MSR
    CM               MSR[CM] is set to MSR[ICM].
    CE, ME, DE, ICM  Unchanged.
    All other defined MSR bits set to 0.
ESR
    VLEMI  Set to 1 if the instruction causing the interrupt resides in VLE storage.

Instruction execution resumes at address IVPR[0:47] || IVOR8[48:59] || 0b0000.

5.6.10 Auxiliary Processor Unavailable Interrupt

An Auxiliary Processor Unavailable interrupt occurs when no higher priority exception exists (see Section 5.9 on page 689), an attempt is made to execute an Auxiliary Processor instruction (including Auxiliary Processor loads, stores, and moves), the target Auxiliary Processor is present on the implementation, and the Auxiliary Processor is configured as unavailable. Details of the Auxiliary Processor, its instruction set, and its configuration are implementation-dependent. See the User's Manual for the implementation.

When an Auxiliary Processor Unavailable interrupt occurs, the processor suppresses the execution of the instruction causing the Auxiliary Processor Unavailable interrupt.

Registers SRR0, SRR1, and MSR are updated as follows:

SRR0  Set to the effective address of the instruction that caused the interrupt.
SRR1  Set to the contents of the MSR at the time of the interrupt.
MSR
    CM               MSR[CM] is set to MSR[ICM].
    CE, ME, DE, ICM  Unchanged.
    All other defined MSR bits set to 0.

Instruction execution resumes at address IVPR[0:47] || IVOR9[48:59] || 0b0000.

5.6.11 Decrementer Interrupt

A Decrementer interrupt occurs when no higher priority exception exists (see Section 5.9 on page 689), a Decrementer exception exists (TSR[DIS]=1), and the interrupt is enabled (TCR[DIE]=1 and MSR[EE]=1). See Section 7.3 on page 697.

Programming Note

MSR[EE] also enables the External Input and Fixed-Interval Timer interrupts.

SRR0, SRR1, MSR, and TSR are updated as follows:

SRR0  Set to the effective address of the next instruction to be executed.
SRR1  Set to the contents of the MSR at the time of the interrupt.
MSR
    CM               MSR[CM] is set to MSR[ICM].
    CE, ME, DE, ICM  Unchanged.
    All other defined MSR bits set to 0.
TSR (See Section 7.5.1 on page 700.)
    DIS  Set to 1.

Instruction execution resumes at address IVPR[0:47] || IVOR10[48:59] || 0b0000.

Programming Note

Software is responsible for clearing the Decrementer exception status prior to re-enabling the MSR[EE] bit in order to avoid another redundant Decrementer interrupt. To clear the Decrementer exception, the interrupt handling routine must clear TSR[DIS]. Clearing is done by writing a word to TSR using mtspr with a 1 in any bit position that is to be cleared and 0 in all other bit positions. The write-data to the TSR is not direct data, but a mask. A 1 causes the bit to be cleared, and a 0 has no effect.

5.6.12 Fixed-Interval Timer Interrupt

A Fixed-Interval Timer interrupt occurs when no higher priority exception exists (see Section 5.9 on page 689), a Fixed-Interval Timer exception exists (TSR[FIS]=1), and the interrupt is enabled (TCR[FIE]=1 and MSR[EE]=1). See Section 7.6 on page 700.

Programming Note

MSR[EE] also enables the External Input and Decrementer interrupts.

SRR0, SRR1, MSR, and TSR are updated as follows:

SRR0  Set to the effective address of the next instruction to be executed.
SRR1  Set to the contents of the MSR at the time of the interrupt.
MSR
    CM               MSR[CM] is set to MSR[ICM].
    CE, ME, DE, ICM  Unchanged.
    All other defined MSR bits set to 0.
TSR (See Section 7.5.1 on page 700.)
    FIS  Set to 1.

Instruction execution resumes at address IVPR[0:47] || IVOR11[48:59] || 0b0000.

Programming Note

Software is responsible for clearing the Fixed-Interval Timer exception status prior to re-enabling the MSR[EE] bit in order to avoid another redundant Fixed-Interval Timer interrupt. To clear the Fixed-Interval Timer exception, the interrupt handling routine must clear TSR[FIS]. Clearing is done by writing a word to TSR using mtspr with a 1 in any bit position that is to be cleared and 0 in all other bit positions. The write-data to the TSR is not direct data, but a mask. A 1 causes the bit to be cleared, and a 0 has no effect.

5.6.13 Watchdog Timer Interrupt

A Watchdog Timer interrupt occurs when no higher priority exception exists (see Section 5.9 on page 689), a Watchdog Timer exception exists (TSR[WIS]=1), and the interrupt is enabled (i.e. TCR[WIE]=1 and MSR[CE]=1). See Section 7.7 on page 701.

Programming Note

MSR[CE] also enables the Critical Input interrupt.

CSRR0, CSRR1, MSR, and TSR are updated as follows:

CSRR0  Set to the effective address of the next instruction to be executed.
CSRR1  Set to the contents of the MSR at the time of the interrupt.
MSR
    CM           MSR[CM] is set to MSR[ICM].
    ME, ICM, DE  Unchanged.
    All other defined MSR bits set to 0.
TSR (See Section 7.5.1 on page 700.)
    WIS  Set to 1.

Instruction execution resumes at address IVPR[0:47] || IVOR12[48:59] || 0b0000.

Programming Note

Software is responsible for clearing the Watchdog Timer exception status prior to re-enabling the MSR[CE] bit in order to avoid another redundant Watchdog Timer interrupt. To clear the Watchdog Timer exception, the interrupt handling routine must clear TSR[WIS]. Clearing is done by writing a word to TSR using mtspr with a 1 in any bit position that is to be cleared and 0 in all other bit positions. The write-data to the TSR is not direct data, but a mask. A 1 causes the bit to be cleared, and a 0 has no effect.

5.6.14 Data TLB Error Interrupt

A Data TLB Error interrupt occurs when no higher priority exception exists (see Section 5.9 on page 689) and any of the following Data TLB Error exceptions is presented to the interrupt mechanism.

TLB Miss exception

Caused when the virtual address associated with a data storage access does not match any valid entry in the TLB as specified in Section 4.7.2 on page 643.

If a stwcx. or stdcx. would not perform its store in the absence of a Data Storage interrupt, and a non-conditional Store to the specified effective address would cause a Data Storage interrupt, it is implementation-dependent whether a Data Storage interrupt occurs.

When a Data TLB Error interrupt occurs, the processor suppresses the execution of the instruction causing the Data TLB Error interrupt.

SRR0, SRR1, MSR, DEAR, and ESR are updated as follows:

SRR0  Set to the effective address of the instruction causing the Data TLB Error interrupt.
SRR1  Set to the contents of the MSR at the time of the interrupt.
MSR
    CM               MSR[CM] is set to MSR[ICM].
    CE, ME, DE, ICM  Unchanged.
    All other defined MSR bits set to 0.
DEAR  Set to the effective address of a byte that is both within the range of the bytes being accessed by the Storage Access or Cache Management instruction, and within the page whose access caused the Data TLB Error exception.
ESR
    ST     Set to 1 if the instruction causing the interrupt is a Store, dcbi, dcbz, or dcbzep instruction; otherwise set to 0.
    FP     Set to 1 if the instruction causing the interrupt is a floating-point load or store; otherwise set to 0.
    AP     Set to 1 if the instruction causing the interrupt is an Auxiliary Processor load or store; otherwise set to 0.
    SPV    Set to 1 if the instruction causing the interrupt is an SPE operation or a Vector operation; otherwise set to 0.
    VLEMI  Set to 1 if the instruction causing the interrupt resides in VLE storage.
    EPID   Set to 1 if the instruction causing the interrupt is an External Process ID instruction; otherwise set to 0.
    All other defined ESR bits are set to 0.

Instruction execution resumes at address IVPR[0:47] || IVOR13[48:59] || 0b0000.

5.6.15 Instruction TLB Error Interrupt

An Instruction TLB Error interrupt occurs when no higher priority exception exists (see Section 5.9 on page 689) and any of the following Instruction TLB Error exceptions is presented to the interrupt mechanism.

TLB Miss exception

Caused when the virtual address associated with an instruction fetch does not match any valid entry in the TLB as specified in Section 4.7.2 on page 643.

When an Instruction TLB Error interrupt occurs, the processor suppresses the execution of the instruction causing the Instruction TLB Miss exception.

SRR0, SRR1, and MSR are updated as follows:

SRR0  Set to the effective address of the instruction causing the Instruction TLB Error interrupt.
SRR1  Set to the contents of the MSR at the time of the interrupt.
CM MSRCM is set to MSRICM. CE, ME, DE, ICM Unchanged. MSR CM MSRCM is set to MSRICM. All other defined MSR bits set to 0. CE, ME, DEAR Set to the effective address of a byte that is DE, ICM Unchanged. both within the range of the bytes being All other defined MSR bits set to 0. accessed by the Storage Access or Cache Chapter 5. Interrupts and Exceptions 681 Version 2.05 Instruction execution resumes at address IVPR0:47 || 5.6.16 Debug Interrupt IVOR1448:59||0b0000. A Debug interrupt occurs when no higher priority exception exists (see Section 5.9 on page 689), a Debug exception exists in the DBSR, and Debug inter- rupts are enabled (DBCR0IDM=1 and MSRDE=1). A Debug exception occurs when a Debug Event causes a corresponding bit in the DBSR to be set. See Section 8.5. If the Embedded.Enhanced Debug category is not sup- ported or is supported and is not enabled, CSRR0, CSRR1, MSR, and DBSR are updated as follows. If the Embedded.Enhanced Debug category is supported and is enabled, DSRR0 and DSRR1 are updated as specified below and CSRR0 and CSRR1 are not changed. The means by which the Embed- ded.Enhanced Debug category is enabled is imple- mentation-dependent. CSRR0 or DSRR0 [Category: Embedded.Enhanced Debug] For Debug exceptions that occur while Debug interrupts are enabled (DBCR0IDM=1 and MSRDE=1), CSRR0 is set as follows: 1 For Instruction Address Compare (IAC1, IAC2, IAC3, IAC4), Data Address Compare (DAC1R, DAC1W, DAC2R, DAC2W), Data Value Com- pare (DVC1, DVC2), Trap (TRAP), or Branch Taken (BRT) debug excep- tions, set to the address of the instruc- tion causing the Debug interrupt. 1 For Instruction Complete (ICMP) debug exceptions, set to the address of the instruction that would have exe- cuted after the one that caused the Debug interrupt. 1 For Unconditional Debug Event (UDE) debug exceptions, set to the address of the instruction that would have exe- cuted next if the Debug interrupt had not occurred. 
   - For Interrupt Taken (IRPT) debug exceptions, set to the interrupt vector value of the interrupt that caused the Interrupt Taken debug event.
   - For Return From Interrupt (RET) debug exceptions, set to the address of the rfi instruction that caused the Debug interrupt.
   - For Critical Interrupt Taken (CRPT) debug exceptions, DSRR0 is set to the address of the first instruction of the critical interrupt handler. CSRR0 is unaffected.
   - For Critical Interrupt Return (CRET) debug exceptions, DSRR0 is set to the address of the rfci instruction that caused the Debug interrupt. See Section 8.4.10, "Critical Interrupt Return Debug Event [Category: Embedded.Enhanced Debug]".

   For Debug exceptions that occur while Debug interrupts are disabled (DBCR0[IDM]=0 or MSR[DE]=0), a Debug interrupt will occur at the next synchronizing event if DBCR0[IDM] and MSR[DE] are modified such that they are both 1 and if the Debug exception Status is still set in the DBSR. When this occurs, CSRR0 or DSRR0 [Category: Embedded.Enhanced Debug] is set to the address of the instruction that would have executed next, not with the address of the instruction that modified the Debug Control Register 0 or MSR and thus caused the interrupt.

CSRR1 or DSRR1 [Category: Embedded.Enhanced Debug]
   Set to the contents of the MSR at the time of the interrupt.

MSR
   CM                 MSR[CM] is set to MSR[ICM].
   ME, ICM            Unchanged.
   All other supported MSR bits set to 0.

DBSR    Set to indicate type of Debug Event (see Section 8.5.2).

Instruction execution resumes at address IVPR[0:47] || IVOR15[48:59] || 0b0000.

5.6.17 SPE/Embedded Floating-Point/Vector Unavailable Interrupt
[Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Vector, Vector]

The SPE/Embedded Floating-Point/Vector Unavailable interrupt occurs when no higher priority exception exists, and an attempt is made to execute an SPE, SPE.Embedded Float Scalar Double, SPE.Embedded Float Vector, or Vector instruction and MSR[SPV] = 0.

When an Embedded Floating-Point Unavailable interrupt occurs, the processor suppresses the execution of the instruction causing the exception.

SRR0, SRR1, MSR, and ESR are updated as follows:

SRR0    Set to the effective address of the instruction causing the Embedded Floating-Point Unavailable interrupt.

SRR1    Set to the contents of the MSR at the time of the interrupt.

MSR
   CM                 MSR[CM] is set to MSR[ICM].
   VLEMI              Set to 1 if the instruction causing the interrupt resides in VLE storage.
   CE, ME, DE, ICM    Unchanged.
   All other defined MSR bits set to 0.

ESR
   SPV                Set to 1.
   VLEMI              Set to 1 if the instruction causing the interrupt resides in VLE storage.
   All other defined ESR bits are set to 0.

Instruction execution resumes at address IVPR[0:47] || IVOR32[48:59] || 0b0000.

Programming Note
   This interrupt is also used by the Signal Processing Engine in the same manner. It should be used by software to determine if the application is using the upper 32 bits of the GPRs in a 32-bit implementation and thus be required to save and restore them on context switch.

5.6.18 Embedded Floating-Point Data Interrupt
[Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, SPE.Embedded Float Vector]

The Embedded Floating-Point Data interrupt occurs when no higher priority exception exists (see Section 5.9) and an Embedded Floating-Point Data exception is presented to the interrupt mechanism. The Embedded Floating-Point Data exception causing the interrupt is indicated in the SPEFSCR; these exceptions include Embedded Floating-Point Invalid Operation/Input Error (FINV, FINVH), Embedded Floating-Point Divide By Zero (FDBZ, FDBZH), Embedded Floating-Point Overflow (FOV, FOVH), and Embedded Floating-Point Underflow (FUNF, FUNFH).

When an Embedded Floating-Point Data interrupt occurs, the processor suppresses the execution of the instruction causing the exception.

SRR0, SRR1, MSR, and ESR are updated as follows:

SRR0    Set to the effective address of the instruction causing the Embedded Floating-Point Data interrupt.

SRR1    Set to the contents of the MSR at the time of the interrupt.

MSR
   CM                 MSR[CM] is set to MSR[ICM].
   VLEMI              Set to 1 if the instruction causing the interrupt resides in VLE storage.
   CE, ME, DE, ICM    Unchanged.
   All other defined MSR bits set to 0.

ESR
   SPV                Set to 1.
   All other defined ESR bits are set to 0.

Instruction execution resumes at address IVPR[0:47] || IVOR33[48:59] || 0b0000.

5.6.19 Embedded Floating-Point Round Interrupt
[Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, SPE.Embedded Float Vector]

The Embedded Floating-Point Round interrupt occurs when no higher priority exception exists (see Section 5.9 on page 689), SPEFSCR[FINXE] is set to 1, and any of the following occurs:
   - the unrounded result of an Embedded Floating-Point operation is not exact
   - an overflow occurs and overflow exceptions are disabled (FOVF or FOVFH is set to 1 and FOVFE is set to 0)
   - an underflow occurs and underflow exceptions are disabled (FUNF is set to 1 and FUNFE is set to 0).

The value of SPEFSCR[FINXS] is 1, indicating that one of the above exceptions has occurred, and additional information about the exception is found in SPEFSCR[FGH, FG, FXH, FX].

When an Embedded Floating-Point Round interrupt occurs, the processor completes the execution of the instruction causing the exception and writes the result to the destination register prior to taking the interrupt.

SRR0, SRR1, MSR, and ESR are updated as follows:

SRR0    Set to the effective address of the instruction following the instruction causing the Embedded Floating-Point Round interrupt.

SRR1    Set to the contents of the MSR at the time of the interrupt.

MSR
   CM                 MSR[CM] is set to MSR[ICM].
   CE, ME, DE, ICM    Unchanged.
   All other defined MSR bits set to 0.

ESR
   SPV                Set to 1.
   VLEMI              Set to 1 if the instruction causing the interrupt resides in VLE storage.
   All other defined ESR bits are set to 0.

Instruction execution resumes at address IVPR[0:47] || IVOR34[48:59] || 0b0000.

Programming Note
   If an implementation does not support ±Infinity rounding modes and the rounding mode is set to be +Infinity or -Infinity, an Embedded Floating-Point Round interrupt occurs after every Embedded Floating-Point instruction for which rounding might occur regardless of the value of FINXE, provided no higher priority exception exists.

   When an Embedded Floating-Point Round interrupt occurs, the unrounded (truncated) result of an inexact high or low element is placed in the target register. If only a single element is inexact, the other exact element is updated with the correctly rounded result, and the FG and FX bits corresponding to the other exact element will both be 0.

   The bits FG (FGH) and FX (FXH) are provided so that an interrupt handler can round the result as it desires. FG (FGH) is the value of the bit immediately to the right of the least significant bit of the destination format mantissa from the infinitely precise intermediate calculation before rounding. FX (FXH) is the value of the 'or' of all the bits to the right of the FG (FGH) of the destination format mantissa from the infinitely precise intermediate calculation before rounding.

5.6.20 Performance Monitor Interrupt [Category: Embedded.Performance Monitor]

The Performance Monitor interrupt is part of the optional Performance Monitor facility; see Appendix E.

5.6.21 Processor Doorbell Interrupt [Category: Embedded.Processor Control]

A Processor Doorbell Interrupt occurs when no higher priority exception exists, a Processor Doorbell exception is present, and MSR[EE]=1. Processor Doorbell exceptions are generated when DBELL messages (see Chapter 9) are received and accepted by the processor.

When a Processor Doorbell Interrupt occurs, SRR0 is set to the address of the next instruction to be executed and SRR1 is set to the contents of the MSR at the time of the interrupt.

Instruction execution resumes at address IVPR[0:47] || IVOR36[48:59] || 0b0000.

5.6.22 Processor Doorbell Critical Interrupt [Category: Embedded.Processor Control]

A Processor Doorbell Critical Interrupt occurs when no higher priority exception exists, a Processor Doorbell Critical exception is present, and MSR[CE]=1. Processor Doorbell Critical exceptions are generated when DBELL_CRIT messages (see Chapter 9) are received and accepted by the processor.

When a Processor Doorbell Critical Interrupt occurs, CSRR0 is set to the address of the next instruction to be executed and CSRR1 is set to the contents of the MSR at the time of the interrupt.

Instruction execution resumes at address IVPR[0:47] || IVOR37[48:59] || 0b0000.

5.7 Partially Executed Instructions

In general, the architecture permits load and store instructions to be partially executed, interrupted, and then to be restarted from the beginning upon return from the interrupt. Unaligned Load and Store instructions, or Load Multiple, Store Multiple, Load String, and Store String instructions may be broken up into multiple, smaller accesses, and these accesses may be performed in any order. In order to guarantee that a particular load or store instruction will complete without being interrupted and restarted, software must mark the storage being referred to as Guarded, and must use an elementary (non-string or non-multiple) load or store that is aligned on an operand-sized boundary.

In order to guarantee that Load and Store instructions can, in general, be restarted and completed correctly without software intervention, the following rules apply when an instruction is partially executed and then interrupted:
   - For an elementary Load, no part of the target register, RT or FRT, will have been altered.
   - For 'with update' forms of Load or Store, the update register, register RA, will not have been altered.

On the other hand, the following effects are permissible when certain instructions are partially executed and then restarted:
   - For any Store, some of the bytes at the target storage location may have been altered (if write access to that page in which bytes were altered is permitted by the access control mechanism). In addition, for Store Conditional instructions, CR0 has been set to an undefined value, and it is undefined whether the reservation has been cleared.
   - For any Load, some of the bytes at the addressed storage location may have been accessed (if read access to that page in which bytes were accessed is permitted by the access control mechanism).
   - For Load Multiple or Load String, some of the registers in the range to be loaded may have been altered. Including the addressing registers (RA, and possibly RB) in the range to be loaded is a programming error, and thus the rules for partial execution do not protect against overwriting of these registers.

In no case will access control be violated.

As previously stated, the only load or store instructions that are guaranteed to not be interrupted after being partially executed are elementary, aligned, guarded loads and stores. All others may be interrupted after being partially executed. The following list identifies the specific instruction types for which interruption after partial execution may occur, as well as the specific interrupt types that could cause the interruption:

1. Any Load or Store (except elementary, aligned, guarded):
      Any asynchronous interrupt
      Machine Check
      Program (Imprecise Mode Floating-Point Enabled)
      Program (Imprecise Mode Auxiliary Processor Enabled)
2. Unaligned elementary Load or Store, or any multiple or string:
      All of the above listed under item 1, plus the following:
      Data Storage (if the access crosses a protection boundary)
      Debug (Data Address Compare, Data Value Compare)
3. mtcrf may also be partially executed due to the occurrence of any of the interrupts listed under item 1 at the time the mtcrf was executing.
      - All instructions prior to the mtcrf have completed execution. (Some storage accesses generated by these preceding instructions may not have completed.)
      - No subsequent instruction has begun execution.
      - The mtcrf instruction (the address of which was saved in SRR0/CSRR0/MCSRR0/DSRR0 [Category: Embedded.Enhanced Debug] at the occurrence of the interrupt), may appear not to have begun or may have partially executed.

5.8 Interrupt Ordering and Masking

It is possible for multiple exceptions to exist simultaneously, each of which could cause the generation of an interrupt. Furthermore, for interrupt classes other than the Machine Check interrupt and critical interrupts, the architecture does not provide for reporting more than one interrupt of the same class (unless the Embedded.Enhanced Debug category is supported). Therefore, the architecture defines that interrupts are ordered with respect to each other, and provides a masking mechanism for certain persistent interrupt types.

When an interrupt is masked (disabled), and an event causes an exception that would normally generate the interrupt, the exception persists as a status bit in a register (which register depends upon the exception type). However, no interrupt is generated. Later, if the interrupt is enabled (unmasked), and the exception status has not been cleared by software, the interrupt due to the original exception event will then finally be generated.

All asynchronous interrupts can be masked. In addition, certain synchronous interrupts can be masked. An example of such an interrupt is the Floating-Point Enabled exception type Program interrupt. The execution of a floating-point instruction that causes the FPSCR[FEX] bit to be set to 1 is considered an exception event, regardless of the setting of MSR[FE0,FE1]. If MSR[FE0,FE1] are both 0, then the Floating-Point Enabled exception type of Program interrupt is masked, but the exception persists in the FPSCR[FEX] bit. Later, if the MSR[FE0,FE1] bits are enabled, the interrupt will finally be generated.

The architecture enables implementations to avoid situations in which an interrupt would cause the state information (saved in Save/Restore Registers) from a previous interrupt to be overwritten and lost. In order to do this, the architecture defines interrupt classes in a hierarchical manner. At each interrupt class, hardware automatically disables any further interrupts associated with the interrupt class by masking the interrupt enable in the MSR when the interrupt is taken. In addition, each interrupt class masks the interrupt enable in the MSR for each lower class in the hierarchy. The hierarchy of interrupt classes is as follows from highest to lowest:

   Interrupt Class    MSR Enables Cleared    Save/Restore Registers
   Machine Check      ME, DE, CE, EE         MCSRR0/1
   Debug (1)          DE, CE, EE             DSRR0/1
   Critical           CE, EE                 CSRR0/1
   Base               EE                     SRR0/1

   (1) The Debug interrupt class is Category: E.ED.
   Note: MSR[DE] may be cleared when a critical interrupt occurs if Category: E.ED is not supported.

Figure 16. Interrupt Hierarchy

If the Embedded.Enhanced Debug category is not supported (or is supported and is not enabled), then the Debug interrupt becomes a Critical class interrupt and all critical class interrupts will clear DE, CE, and EE in the MSR.

Base Class interrupts that occur as a result of precise exceptions are not masked by the EE bit in the MSR and any such exception that occurs prior to software saving the state of SRR0/1 in a base class exception handler will result in a situation that could result in the loss of state information.

This first step of the hardware clearing the MSR enable bits lower in the hierarchy shown in Figure 16 prevents any subsequent asynchronous interrupts from overwriting the Save/Restore Registers (SRR0/SRR1, CSRR0/CSRR1, MCSRR0/MCSRR1, or DSRR0/DSRR1 [Category: Embedded.Enhanced Debug]), prior to software being able to save their contents. Hardware also automatically clears, on any interrupt, MSR[WE,PR,FP,FE0,FE1,IS,DS]. The clearing of these bits assists in the avoidance of subsequent interrupts of certain other types. However, guaranteeing that interrupt classes lower in the hierarchy do not occur and thus do not overwrite the Save/Restore Registers (SRR0/SRR1, CSRR0/CSRR1, DSRR0/DSRR1 [Category: Embedded.Enhanced Debug], or MCSRR0/MCSRR1) also requires the cooperation of system software. Specifically, system software must avoid the execution of instructions that could cause (or enable) a subsequent interrupt, if the contents of the Save/Restore Registers (SRR0/SRR1, CSRR0/CSRR1, DSRR0/DSRR1 [Category: Embedded.Enhanced Debug], or MCSRR0/MCSRR1) have not yet been saved.

5.8.1 Guidelines for System Software

The following list identifies the actions that system software must avoid, prior to having saved the Save/Restore Registers' contents:

   - Re-enabling an interrupt class that is at the same or a lower level in the interrupt hierarchy. This includes the following actions:
      - Re-enabling of MSR[EE]
      - Re-enabling of MSR[CE,EE] in critical class interrupt handlers, and if the Embedded.Enhanced Debug category is not supported, re-enabling of MSR[DE].
      - Category: Embedded.Enhanced Debug: Re-enabling of MSR[CE,EE,DE] in Debug class interrupt handlers
      - Re-enabling of MSR[EE,CE,DE,ME] in Machine Check interrupt handlers.
   - Branching (or sequential execution) to addresses not mapped by the TLB, or mapped without UX=1 or SX=1 permission.
     This prevents Instruction Storage and Instruction TLB Error interrupts.
   - Load, Store or Cache Management instructions to addresses not mapped by the TLB or not having the required access permissions.
     This prevents Data Storage and Data TLB Error interrupts.
   - Execution of System Call (sc) or Trap (tw, twi, td, tdi) instructions
     This prevents System Call and Trap exception type Program interrupts.
   - Execution of any floating-point instruction
     This prevents Floating-Point Unavailable interrupts. Note that this interrupt would occur upon the execution of any floating-point instruction, due to the automatic clearing of MSR[FP]. However, even if software were to re-enable MSR[FP], floating-point instructions must still be avoided in order to prevent Program interrupts due to various possible Program interrupt exceptions (Floating-Point Enabled, Unimplemented Operation).
   - Re-enabling of MSR[PR]
     This prevents Privileged Instruction exception type Program interrupts. Alternatively, software could re-enable MSR[PR], but avoid the execution of any privileged instructions.
   - Execution of any Auxiliary Processor instruction
     This prevents Auxiliary Processor Unavailable interrupts, and Auxiliary Processor Enabled type and Unimplemented Operation type Program interrupts.
   - Execution of any Illegal instructions
     This prevents Illegal Instruction exception type Program interrupts.
   - Execution of any instruction that could cause an Alignment interrupt
     This prevents Alignment interrupts. Included in this category are any string or multiple instructions, and any unaligned elementary load or store instructions. See Section 5.6.6 on page 677 for a complete list of instructions that may cause Alignment interrupts.

It is not necessary for hardware or software to avoid interrupts higher in the interrupt hierarchy (see Figure 16) from within interrupt handlers (and hence, for example, hardware does not automatically clear MSR[CE,ME,DE] upon a base class interrupt), since interrupts at each level of the hierarchy use different pairs of Save/Restore Registers to save the instruction address and MSR (i.e. SRR0/SRR1 for base class interrupts, and MCSRR0/MCSRR1, DSRR0/DSRR1 [Category: Embedded.Enhanced Debug], or CSRR0/CSRR1 for non-base class interrupts). The converse, however, is not true. That is, hardware and software must cooperate in the avoidance of interrupts lower in the hierarchy from occurring within interrupt handlers, even though these interrupts use different Save/Restore Register pairs. This is because the interrupt higher in the hierarchy may have occurred from within an interrupt handler for an interrupt lower in the hierarchy prior to the interrupt handler having saved the Save/Restore Registers. Therefore, within an interrupt handler, Save/Restore Registers for all interrupts lower in the hierarchy may contain data that is necessary to the system software.

5.8.2 Interrupt Order

The following is a prioritized listing of the various enabled interrupts for which exceptions might exist simultaneously:

1. Synchronous (Non-Debug) Interrupts:
      Data Storage
      Instruction Storage
      Alignment
      Program
      Floating-Point Unit Unavailable
      Auxiliary Processor Unavailable
      Embedded Floating-Point Unavailable [Category: SP.Embedded Float_*]
      SPE/Embedded Floating-Point/Vector Unavailable
      Embedded Floating-Point Data [Category: SP.Embedded Float_*]
      Embedded Floating-Point Round [Category: SP.Embedded Float_*]
      System Call
      Data TLB Error
      Instruction TLB Error
   Only one of the above types of synchronous interrupts may have an existing exception generating it at any given time. This is guaranteed by the exception priority mechanism (see Section 5.9 on page 689) and the requirements of the Sequential Execution Model.
2. Machine Check
3. Debug
4. Critical Input
5. Watchdog Timer
6. Processor Doorbell Critical
7. External Input
8. Fixed-Interval Timer
9. Decrementer
10. Processor Doorbell
11. Embedded Performance Monitor

Even though, as indicated above, the base, synchronous exception types listed under item 1 are generated with higher priority than the non-base interrupt classes listed in items 2-5, the fact is that these base class interrupts will immediately be followed by the highest priority existing interrupt in items 2-5, without executing any instructions at the base class interrupt handler. This is because the base interrupt classes do not automatically disable the MSR mask bits for the interrupts listed in 2-5. In all other cases, a particular interrupt class from the above list will automatically disable any subsequent interrupts of the same class, as well as all other interrupt classes that are listed below it in the priority order.

5.9 Exception Priorities

All synchronous (precise and imprecise) interrupts are reported in program order, as required by the Sequential Execution Model. The one exception to this rule is the case of multiple synchronous imprecise interrupts. Upon a synchronizing event, all previously executed instructions are required to report any synchronous imprecise interrupt-generating exceptions, and the interrupt will then be generated with all of those exception types reported cumulatively, in both the ESR, and any status registers associated with the particular exception type (e.g. the Floating-Point Status and Control Register).

For any single instruction attempting to cause multiple exceptions for which the corresponding synchronous interrupt types are enabled, this section defines the priority order by which the instruction will be permitted to cause a single enabled exception, thus generating a particular synchronous interrupt. Note that it is this exception priority mechanism, along with the requirement that synchronous interrupts be generated in program order, that guarantees that at any given time, there exists for consideration only one of the synchronous interrupt types listed in item 1 of Section 5.8.2 on page 689. The exception priority mechanism also prevents certain debug exceptions from existing in combination with certain other synchronous interrupt-generating exceptions.

Because unaligned Load and Store instructions, or Load Multiple, Store Multiple, Load String, and Store String instructions may be broken up into multiple, smaller accesses, and these accesses may be performed in any order, the exception priority mechanism applies to each of the multiple storage accesses in the order they are performed by the implementation.

This section does not define the permitted setting of multiple exceptions for which the corresponding interrupt types are disabled. The generation of exceptions for which the corresponding interrupt types are disabled will have no effect on the generation of other exceptions for which the corresponding interrupt types are enabled. Conversely, if a particular exception for which the corresponding interrupt type is enabled is shown in the following sections to be of a higher priority than another exception, it will prevent the setting of that other exception, independent of whether that other exception's corresponding interrupt type is enabled or disabled.

Except as specifically noted, only one of the exception types listed for a given instruction type will be permitted to be generated at any given time. The priorities of the exception types are listed in the following sections ranging from highest to lowest, within each instruction type.

Programming Note
   Some exception types may even be mutually exclusive of each other and could otherwise be considered the same priority. In these cases, the exceptions are listed in the order suggested by the sequential execution model.

5.9.1 Exception Priorities for Defined Instructions

5.9.1.1 Exception Priorities for Defined Floating-Point Load and Store Instructions

The following prioritized list of exceptions may occur as a result of the attempted execution of any defined Floating-Point Load and Store instruction.

1. Debug (Instruction Address Compare)
2. Instruction TLB Error
3. Instruction Storage Interrupt (all types)
4. Program (Illegal Instruction)
5. Floating-Point Unavailable
6. Program (Unimplemented Operation)
7. Data TLB Error
8. Data Storage (all types)
9. Alignment
10. Debug (Data Address Compare, Data Value Compare)
11. Debug (Instruction Complete)

If the instruction is causing both a Debug (Instruction Address Compare) and a Debug (Data Address Compare) or Debug (Data Value Compare), and is not causing any of the exceptions listed in items 2-9, it is permissible for both exceptions to be generated and recorded in the DBSR. A single Debug interrupt will result.

5.9.1.2 Exception Priorities for Other Defined Load and Store Instructions and Defined Cache Management Instructions

The following prioritized list of exceptions may occur as a result of the attempted execution of any other defined Load or Store instruction, or defined Cache Management instruction.

1. Debug (Instruction Address Compare)
2. Instruction TLB Error
3. Instruction Storage Interrupt (all types)
4. Program (Illegal Instruction)
5. Program (Privileged Instruction) (dcbi only)
6. Program (Unimplemented Operation)
7. Data TLB Error
8. Data Storage (all types)
9. Alignment
10. Debug (Data Address Compare, Data Value Compare)
11. Debug (Instruction Complete)

If the instruction is causing both a Debug (Instruction Address Compare) and a Debug (Data Address Compare) or Debug (Data Value Compare), and is not causing any of the exceptions listed in items 2-9, it is permissible for both exceptions to be generated and recorded in the DBSR. A single Debug interrupt will result.

5.9.1.3 Exception Priorities for Other Defined Floating-Point Instructions

The following prioritized list of exceptions may occur as a result of the attempted execution of any defined floating-point instruction other than a load or store.

1. Debug (Instruction Address Compare)
2. Instruction TLB Error
3. Instruction Storage Interrupt (all types)
4. Program (Illegal Instruction)
5. Floating-Point Unavailable
6. Program (Unimplemented Operation)
7. Program (Floating-point Enabled)
8. Debug (Instruction Complete)

5.9.1.4 Exception Priorities for Defined Privileged Instructions

The following prioritized list of exceptions may occur as a result of the attempted execution of any defined privileged instruction, except dcbi, rfi, and rfci instructions.

1. Debug (Instruction Address Compare)
2. Instruction TLB Error
3. Instruction Storage Interrupt (all types)
4. Program (Illegal Instruction)
5. Program (Privileged Instruction)
6. Program (Unimplemented Operation)
7. Debug (Instruction Complete)

For mtmsr, mtspr (DBCR0, DBCR1, DBCR2), mtspr (TCR), and mtspr (TSR), if they are not causing Debug (Instruction Address Compare) nor Program (Privileged Instruction) exceptions, it is possible that they are simultaneously enabling (via mask bits) multiple existing exceptions (and at the same time possibly causing a Debug (Instruction Complete) exception). When this occurs, the interrupts will be handled in the order defined by Section 5.8.2 on page 689.

5.9.1.5 Exception Priorities for Defined Trap Instructions

The following prioritized list of exceptions may occur as a result of the attempted execution of a defined Trap instruction.

1. Debug (Instruction Address Compare)
2. Instruction TLB Error
3. Instruction Storage Interrupt (all types)
4. Program (Illegal Instruction)
5. Program (Unimplemented Operation)
6. Debug (Trap)
7. Program (Trap)
8. Debug (Instruction Complete)

If the instruction is causing both a Debug (Instruction Address Compare) and a Debug (Trap), and is not causing any of the exceptions listed in items 2-5, it is permissible for both exceptions to be generated and recorded in the DBSR. A single Debug interrupt will result.

5.9.1.6 Exception Priorities for Defined System Call Instruction

The following prioritized list of exceptions may occur as a result of the attempted execution of a defined System Call instruction.

1. Debug (Instruction Address Compare)
2. Instruction TLB Error
3. Instruction Storage Interrupt (all types)
4. Program (Illegal Instruction)
5. Program (Unimplemented Operation)
6. System Call
7. Debug (Instruction Complete)

5.9.1.7 Exception Priorities for Defined Branch Instructions

The following prioritized list of exceptions may occur as a result of the attempted execution of any defined branch instruction.

1. Debug (Instruction Address Compare)
2. Instruction TLB Error
3. Instruction Storage Interrupt (all types)
4. Program (Illegal Instruction)
5. Program (Unimplemented Operation)
6. Debug (Branch Taken)
7. Debug (Instruction Complete)

If the instruction is causing both a Debug (Instruction Address Compare) and a Debug (Branch Taken), and is not causing any of the exceptions listed in items 2-5, it is permissible for both exceptions to be generated and recorded in the DBSR. A single Debug interrupt will result.

5.9.1.8 Exception Priorities for Defined Return From Interrupt Instructions

The following prioritized list of exceptions may occur as a result of the attempted execution of an rfi, rfci, rfmci, or rfdi [Category: Embedded.Enhanced Debug] instruction.

1. Debug (Instruction Address Compare)
2. Instruction TLB Error
3. Instruction Storage Interrupt (all types)
4. Program (Illegal Instruction)
5. Program (Privileged Instruction)
6. Program (Unimplemented Operation)
7. Debug (Return From Interrupt)
8. Debug (Instruction Complete)

If the rfi, rfci, rfmci, or rfdi [Category: Embedded.Enhanced Debug] instruction is causing both a Debug (Instruction Address Compare) and a Debug (Return From Interrupt), and is not causing any of the exceptions listed in items 2-5, it is permissible for both exceptions to be generated and recorded in the DBSR. A single Debug interrupt will result.

5.9.1.9 Exception Priorities for Other Defined Instructions

The following prioritized list of exceptions may occur as a result of the attempted execution of all other instructions not listed above.

1. Debug (Instruction Address Compare)
2. Instruction TLB Error
3. Instruction Storage Interrupt (all types)
4. Program (Illegal Instruction)
5. Program (Unimplemented Operation)
6. Debug (Instruction Complete)

5.9.2 Exception Priorities for Reserved Instructions

The following prioritized list of exceptions may occur as a result of the attempted execution of any reserved instruction.

1. Debug (Instruction Address Compare)
2. Instruction TLB Error
3. Instruction Storage Interrupt (all types)
4. Program (Illegal Instruction)

Chapter 6. Reset and Initialization

   6.1 Background
   6.2 Reset Mechanisms
   6.3 Processor State After Reset
   6.4 Software Initialization Requirements

6.1 Background

This chapter describes the requirements for processor reset.

The Machine State Register and Processor Version Register and a TLB entry are updated as follows:
This includes both the means of causing reset, Machine State Register and the specific initialization that is required to be per- formed automatically by the processor hardware. This Bit Setting Comments chapter also provides an overview of the operations CM 0 Computation Mode (set to 32-bit that should be performed by initialization software, in mode) order to fully initialize the processor. ICM 0 Interrupt Computation Mode (set In general, the specific actions taken by a processor to 32-bit) upon reset are implementation-dependent. Also, it is UCLE 0 User Cache Locking Enable the responsibility of system initialization software to ini- SPV 0 SPE/Embedded Floating-Point/ tialize the majority of processor and system resources Vector Unavailable after reset. Implementations are required to provide a minimum processor initialization such that this system WE 0 Wait State disabled software may be fetched and executed, thereby CE 0 Critical Input interrupts disabled accomplishing the rest of system initialization. DE 0 Debug interrupts disabled EE 0 External Input interrupts disabled PR 0 Supervisor mode 6.2 Reset Mechanisms FP 0 FP unavailable This specification defines two processor mechanisms ME 0 Machine Check interrupts disabled for internally invoking a reset operation using either the FE0 0 FP exception type Program inter- Watchdog Timer (see Section 7.7 on page 701) or the rupts disabled Debug facilities using DBCR0RST (see Section 8.5.1.1 FE1 0 FP exception type Program inter- on page 711). In addition, implementations will typically rupts disabled provide additional means for invoking a reset operation, via an external mechanism such as a signal pin which IS 0 Instruction Address Space 0 when activated will cause the processor to reset. DS 0 Data Address Space 0 PMM 0 Performance Monitor Mark 6.3 Processor State After Reset Figure 17. Machine State Register Initial Values The initial processor state is controlled by the register contents after reset. 
In general, the contents of most Processor Version Register registers are undefined after reset. Implementation-Dependent. (This register is read-only, The processor hardware is only guaranteed to initialize and contains a value which identifies the specific imple- those registers (or specific bits in registers) which must mentation) be initialized in order for software to be able to reliably perform the rest of system initialization. Chapter 6. Reset and Initialization 693 Version 2.05 TLB entry address is different from the PowerPC Architecture System Reset interrupt vector. A TLB entry (which entry is implementation-dependent) is initialized in an implementation-dependent manner An implementation may provide additional methods for that maps the last 4KB page in the implemented effec- initializing the TLB entry used for initial boot by provid- tive storage address space, with the following field set- ing an implementation-dependent RPN, or initializing tings: other TLB entries. 6.4 Software Initialization Field Setting Comments EPN see Represents the last 4K page in Requirements below effective address space When reset occurs, the processor is initialized to a min- RPN see Represents the last 4K page in imum configuration to start executing initialization code. below physical address space Initialization code is necessary to complete the proces- TS 0 translation address space 0 sor and system configuration. The initialization code SIZE 0b0001 4KB page size described in this section is the minimum recommended for configuring the processor to run application code. W ? implementation-dependent value I ? implementation-dependent value Initialization code should configure the following pro- M ? implementation-dependent value cessor resources: G ? implementation-dependent value - Invalidate the instruction cache and data E ? implementation-dependent value cache (implementation-dependent). U0 ? 
implementation-dependent value - Initialize system memory as required by the U1 ? implementation-dependent value operating system or application code. U2 ? implementation-dependent value - Initialize the Interrupt Vector Prefix Register U3 ? implementation-dependent value and Interrupt Vector Offset Register. TID ? implementation-dependent value, - Initialize other processor registers as needed but page must be accessible by the system. UX ? implementation-dependent value - Initialize off-chip system facilities. UR ? implementation-dependent value - Dispatch the operating system or application UW ? implementation-dependent value code. SX 1 page is execute accessible in supervisor mode SR 1 page is read accessible in supervisor mode SW 1 page is write accessible in supervisor mode VLE ? implementation-dependent value ACM ? implementation-dependent value Figure 18. TLB Initial Values The initial settings of EPN and RPN are dependent upon the number of bits implemented in the EPN and RPN fields and the minimum page size supported by the implementation. For example, an implementation that allows 1KB pages and 32 bits of effective address would implement a 22 bit EPN and set the initial value of the boot entry to 222-4 (0x3FFC) while an implemen- tation that supports only 4K pages as the smallest size and 32 bits of effective address would implement a 20 bit EPN and set the initial value of the boot entry to 220- 1 (0xFFFF). Instruction execution begins at the last word address of the page mapped by the boot TLB entry. Note that this 694 Power ISATM III-E Version 2.05 Chapter 7. Timer Facilities 7.1 Overview. . . . . . . . . . . . . . . . . . . . 695 7.4 Decrementer Auto-Reload Register . . 7.2 Time Base (TB) . . . . . . . . . . . . . . 695 698 7.2.1 Writing the Time Base . . . . . . . . 696 7.5 Timer Control Register . . . . . . . . . 698 7.3 Decrementer . . . . . . . . . . . . . . . . . 697 7.5.1 Timer Status Register . . . . . . . . . 
700 7.3.1 Writing and Reading the Decre- 7.6 Fixed-Interval Timer . . . . . . . . . . . 700 menter . . . . . . . . . . . . . . . . . . . . . . . . . 697 7.7 Watchdog Timer . . . . . . . . . . . . . . 701 7.3.2 Decrementer Events . . . . . . . . . 697 7.8 Freezing the Timer Facilities . . . . . 702 7.1 Overview The period of the Time Base depends on the driving frequency. As an order of magnitude example, sup- The Time Base, Decrementer, Fixed-interval Timer, pose that the CPU clock is 1 GHz and that the Time and Watchdog Timer provide timing functions for the Base is driven by this frequency divided by 32. Then system. The remainder of this section describes these the period of the Time Base would be registers and related facilities. 64 2 × 32 TTB = -------------------- = 5.90 × 1011 seconds - 1 GHz 7.2 Time Base (TB) which is approximately 18,700 years. The Time Base is implemented such that: The Time Base (TB) is a 64-bit register (see Figure 19) containing a 64-bit unsigned integer that is incremented 1. Loading a GPR from the Time Base has no effect periodically. Each increment adds 1 to the low-order bit on the accuracy of the Time Base. (bit 63). The frequency at which the integer is updated 2. Copying the contents of a GPR to the Time Base is implementation-dependent. replaces the contents of the Time Base with the contents of the GPR. TBU TBL 0 32 63 The Power ISA does not specify a relationship between the frequency at which the Time Base is updated and Field Description other frequencies, such as the CPU clock or bus clock in a Power ISA system. The Time Base update fre- TBU Upper 32 bits of Time Base quency is not required to be constant. What is required, TBL Lower 32 bits of Time Base so that system software can keep time of day and oper- ate interval timers, is one of the following. Figure 19. 
Time Base 1 The system provides an (implementation-depen- The Time Base bits 0:59 increment until their value dent) interrupt to software whenever the update becomes 0xFFF_FFFF_FFFF_FFFF (259 - 1), at the frequency of the Time Base bits 0:59 changes, and next increment their value becomes a means to determine what the current update fre- 0x000_0000_0000_0000. There is no interrupt or other quency is. indication when this occurs. 1 The update frequency of the Time Base bits 0:59 is Time base bits 60:63 may increment at a variable rate. under the control of the system software. When the value of bit 59 changes, bits 60:63 are set to zero; if bits 60:63 increment to 0xF before the value of Implementations must provide a means for either pre- bit 59 changes, they remain at 0xF until the value of bit venting the Time Base from incrementing or preventing 59 changes. it from being read in user mode (MSRPR=1). If the means is under software control, it must be privileged. Chapter 7. Timer Facilities 695 Version 2.05 There must be a method for getting all processors' Time Bases to start incrementing with values that are identical or almost identical in all processors. Programming Note If software initializes the Time Base on power-on to some reasonable value and the update frequency of the Time Base is constant, the Time Base can be used as a source of values that increase at a con- stant rate, such as for time stamps in trace entries. Even if the update frequency is not constant, val- ues read from the Time Base are monotonically increasing (except when the Time Base wraps from 264-1 to 0). If a trace entry is recorded each time the update frequency changes, the sequence of Time Base values can be post-processed to become actual time values. Successive readings of the Time Base may return identical values. See the description of the Time Base in Book II, for ways to compute time of day in POSIX format from the Time Base. 
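The period arithmetic and the post-processing idea from the Programming Note can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names, the layout of the frequency trace (a sorted list of (Time Base value, update frequency in Hz) records) and the decision to ignore wraparound are all hypothetical and are not defined by the architecture.

```python
from fractions import Fraction


def time_base_period_seconds(tb_bits, clock_hz, divider):
    """Seconds until a tb_bits-wide counter wraps when driven at clock_hz / divider."""
    return (2 ** tb_bits) * divider / clock_hz


def elapsed_seconds(tb_start, tb_end, freq_trace):
    """Convert a Time Base interval into seconds.

    freq_trace is a hypothetical trace format: a list of
    (tb_value, update_hz) records sorted by tb_value, each giving the
    integer update frequency in effect from tb_value onward. The first
    record must be at or before tb_start. Wraparound is not handled.
    """
    total = Fraction(0)
    for i, (seg_start, hz) in enumerate(freq_trace):
        # A segment ends where the next frequency change was recorded.
        seg_end = freq_trace[i + 1][0] if i + 1 < len(freq_trace) else tb_end
        lo = max(seg_start, tb_start)
        hi = min(seg_end, tb_end)
        if hi > lo:
            total += Fraction(hi - lo, hz)  # ticks divided by ticks per second
    return float(total)


# Order-of-magnitude example from the text: 1 GHz clock divided by 32.
period = time_base_period_seconds(64, 1e9, 32)
years = period / (365.25 * 24 * 3600)
```

With a constant update frequency the trace holds a single record and the conversion reduces to a division; the segmented loop only matters when the frequency changes between the two readings.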
7.2.1 Writing the Time Base

Writing the Time Base is privileged. Reading the Time Base is not privileged; it is discussed in Book II.

It is not possible to write the entire 64-bit Time Base using a single instruction. The mttbl and mttbu extended mnemonics write the lower and upper halves of the Time Base (TBL and TBU), respectively, preserving the other half. These are extended mnemonics for the mtspr instruction; see Appendix B, "Assembler Extended Mnemonics" on page 733.

The Time Base can be written by a sequence such as:

    lwz    Rx,upper   # load 64-bit value for
    lwz    Ry,lower   # TB into Rx and Ry
    li     Rz,0
    mttbl  Rz         # set TBL to 0
    mttbu  Rx         # set TBU
    mttbl  Ry         # set TBL

Provided that no interrupts occur while the last three instructions are being executed, loading 0 into TBL prevents the possibility of a carry from TBL to TBU while the Time Base is being initialized.

Programming Note
The instructions for writing the Time Base are mode-independent. Thus code written to set the Time Base will work correctly in either 64-bit or 32-bit mode.

7.3 Decrementer

The Decrementer (DEC) is a 32-bit decrementing counter that provides a mechanism for causing a Decrementer interrupt after a programmable delay. The contents of the Decrementer are treated as a signed integer.

    | DEC                                     |
    32                                       63

Figure 20. Decrementer

Decrementer bits 32:59 count down until their value becomes 0x000_0000; at the next decrement their value becomes 0xFFF_FFFF.

Decrementer bits 60:63 may decrement at a variable rate. When the value of bit 59 changes, bits 60:63 are set to 0xF; if bits 60:63 decrement to 0x0 before the value of bit 59 changes, they remain at 0x0 until the value of bit 59 changes.

The Decrementer is driven by the same frequency as the Time Base. The period of the Decrementer will depend on the driving frequency, but if the same values are used as given above for the Time Base (see Section 7.2), and if the Time Base update frequency is constant, the period would be

    T_DEC = (2^32 × 32) / 1 GHz = 137 seconds.

The Decrementer counts down.

The operation of the Decrementer satisfies the following constraints.

1. The operation of the Time Base and the Decrementer is coherent, i.e., the counters are driven by the same fundamental time base.
2. Loading a GPR from the Decrementer has no effect on the accuracy of the Time Base.
3. Copying the contents of a GPR to the Decrementer replaces the contents of the Decrementer with the contents of the GPR.

Programming Note
In systems that change the Time Base update frequency for purposes such as power management, the Decrementer input frequency will also change. Software must be aware of this in order to set interval timers.

If Decrementer bits 60:63 are used as part of a random number generator, software must account for the fact that these bits are set to 0xF only when bit 59 changes state, regardless of whether or not they decremented to 0x0 since they were previously set to 0xF.

7.3.1 Writing and Reading the Decrementer

The contents of the Decrementer can be read or written using the mfspr and mtspr instructions, both of which are privileged when they refer to the Decrementer. Using an extended mnemonic (see Appendix B, "Assembler Extended Mnemonics" on page 733), the Decrementer can be written from GPR Rx using:

    mtdec  Rx

The Decrementer can be read into GPR Rx using:

    mfdec  Rx

Copying the Decrementer to a GPR has no effect on the Decrementer contents or on the interrupt mechanism.

7.3.2 Decrementer Events

A Decrementer event occurs when a decrement occurs on a Decrementer value of 0x0000_0001.

Upon the occurrence of a Decrementer event, the Decrementer may be reloaded from a 32-bit Decrementer Auto-Reload Register (DECAR). See Section 7.4.

Upon the occurrence of a Decrementer event, the Decrementer has the following basic modes of operation.

Decrement to one and stop on zero
    If TCR[ARE]=0, TSR[DIS] is set to 1, the value 0x0000_0000 is then placed into the DEC, and the Decrementer stops decrementing.

    If enabled by TCR[DIE]=1 and MSR[EE]=1, a Decrementer interrupt is taken. See Section 5.6.11, "Decrementer Interrupt" on page 680 for details of register behavior caused by the Decrementer interrupt.

Decrement to one and auto-reload
    If TCR[ARE]=1, TSR[DIS] is set to 1, the contents of the Decrementer Auto-Reload Register are then placed into the DEC, and the Decrementer continues decrementing from the reloaded value.

    If enabled by TCR[DIE]=1 and MSR[EE]=1, a Decrementer interrupt is taken. See Section 5.6.11, "Decrementer Interrupt" on page 680 for details of register behavior caused by the Decrementer interrupt.

Forcing the Decrementer to 0 using the mtspr instruction will not cause a Decrementer exception; however, decrementing which was in progress at the instant of the mtspr may cause the exception. To eliminate the Decrementer as a source of exceptions, set TCR[DIE] to 0 (clear the Decrementer Interrupt Enable bit).

If it is desired to eliminate all Decrementer activity, the procedure is as follows:

1. Write 0 to TCR[DIE]. This will prevent Decrementer activity from causing exceptions.
2. Write 0 to TCR[ARE] to disable the Decrementer auto-reload.
3. Write 0 to Decrementer. This will halt Decrementer decrementing. While this action will not cause a Decrementer exception to be set in TSR[DIS], a near simultaneous decrement may have done so.
4. Write 1 to TSR[DIS]. This action will clear TSR[DIS] to 0 (see Section 7.5.1 on page 700). This will clear any Decrementer exception which may be pending. Because the Decrementer is frozen at zero, no further Decrementer events are possible.
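The two Decrementer modes and the four-step shutdown procedure above can be exercised with a small software model. The Python sketch below is a toy under stated assumptions, not a description of any implementation: the class and method names are hypothetical, register fields are modelled as plain attributes, bits 60:63 are treated as decrementing at the normal rate, and a Decrementer value of zero is assumed to stay at zero until software writes a new value.

```python
class DecrementerModel:
    """Toy model of DEC, DECAR, TCR[DIE], TCR[ARE] and TSR[DIS] (hypothetical names)."""

    def __init__(self):
        self.dec = 0
        self.decar = 0
        self.tcr_die = 0
        self.tcr_are = 0  # auto-reload is disabled on reset
        self.tsr_dis = 0

    def write_dec(self, value):
        # Models mtspr to DEC: replaces the contents, raises no exception itself.
        self.dec = value & 0xFFFF_FFFF

    def write_tsr_dis(self, mask_bit):
        # TSR writes are a mask: a 1 clears the bit, a 0 has no effect.
        if mask_bit:
            self.tsr_dis = 0

    def tick(self):
        """Apply one decrement."""
        if self.dec == 0:
            return  # assumption: frozen at zero until rewritten
        if self.dec == 1:
            # Decrementer event: decrement from 0x0000_0001.
            self.tsr_dis = 1
            self.dec = self.decar if self.tcr_are else 0
        else:
            self.dec -= 1

    def interrupt_pending(self, msr_ee):
        return bool(msr_ee and self.tcr_die and self.tsr_dis)


def stop_decrementer(model):
    """The four-step procedure for eliminating all Decrementer activity."""
    model.tcr_die = 0       # 1. no more Decrementer interrupts
    model.tcr_are = 0       # 2. disable auto-reload
    model.write_dec(0)      # 3. halt decrementing
    model.write_tsr_dis(1)  # 4. clear any pending exception
```

A typical use of the sketch is to load DECAR and DEC, call tick() until interrupt_pending() reports an event, and then confirm that after stop_decrementer() further ticks leave TSR[DIS] clear.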
If the auto-reload feature is disabled (TCR[ARE]=0), then once the Decrementer decrements to zero, it will stay there until software reloads it using the mtspr instruction.

On reset, TCR[ARE] is set to 0. This disables the auto-reload feature.

7.4 Decrementer Auto-Reload Register

The Decrementer Auto-Reload Register is a 32-bit register as shown below.

    | DECAR                                   |
    32                                       63

Figure 21. Decrementer Auto-Reload Register

Bits of the Decrementer Auto-Reload Register are numbered 32 (most-significant bit) to 63 (least-significant bit). The Decrementer Auto-Reload Register is provided to support the auto-reload feature of the Decrementer. See Section 7.3.2.

The contents of the Decrementer Auto-Reload Register cannot be read. The contents of bits 32:63 of register RS can be written to the Decrementer Auto-Reload Register using the mtspr instruction.

7.5 Timer Control Register

The Timer Control Register (TCR) is a 32-bit register. Timer Control Register bits are numbered 32 (most-significant bit) to 63 (least-significant bit). The Timer Control Register controls Decrementer (see Section 7.3), Fixed-Interval Timer (see Section 7.6), and Watchdog Timer (see Section 7.7) options.

The relationship of the Timer facilities to the TCR and TB is shown in the figure below.

[Figure 22 shows the Time Base (incrementer), made up of TBU and TBL, driven by the Timer Clock. Watchdog Timer events are based on one of 4 Time Base bits selected by TCR[WP]; Fixed-Interval Timer events are based on one of 4 Time Base bits selected by TCR[FP]. In both cases the 4 selectable Time Base bits are implementation-dependent. The Decrementer (DEC) has a 0/1 detect that signals the Decrementer event, and an auto-reload path from DECAR.]

Figure 22. Relationships of the Timer Facilities

The contents of the Timer Control Register can be read using the mfspr instruction. The contents of bits 32:63 of register RS can be written to the Timer Control Register using the mtspr instruction.

The contents of the TCR are defined below:

Bit(s)  Description

32:33   Watchdog Timer Period (WP) (see Section 7.7 on page 701)
        Specifies one of 4 bit locations of the Time Base used to signal a Watchdog Timer exception on a transition from 0 to 1. The 4 Time Base bits that can be specified to serve as the Watchdog Timer period are implementation-dependent.

34:35   Watchdog Timer Reset Control (WRC) (see Section 7.7 on page 701)
        00     No Watchdog Timer reset will occur.
               TCR[WRC] resets to 0b00. This field may be set by software, but cannot be cleared by software (except by a software-induced reset).
        01-11  Force processor to be reset on second time-out of Watchdog Timer. The exact function of any of these settings is implementation-dependent.
        The Watchdog Timer Reset Control field is cleared to zero by processor reset. These bits are set only by software. Once a 1 has been written to one of these bits, that bit remains a 1 until a reset occurs. This is to prevent errant code from disabling the Watchdog reset function.

36      Watchdog Timer Interrupt Enable (WIE) (see Section 7.7 on page 701)
        0  Disable Watchdog Timer interrupt
        1  Enable Watchdog Timer interrupt

37      Decrementer Interrupt Enable (DIE) (see Section 7.3 on page 697)
        0  Disable Decrementer interrupt
        1  Enable Decrementer interrupt

38:39   Fixed-Interval Timer Period (FP) (see Section 7.6 on page 700)
        Specifies one of 4 bit locations of the Time Base used to signal a Fixed-Interval Timer exception on a transition from 0 to 1. The 4 Time Base bits that can be specified to serve as the Fixed-Interval Timer period are implementation-dependent.

40      Fixed-Interval Timer Interrupt Enable (FIE) (see Section 7.6 on page 700)
        0  Disable Fixed-Interval Timer interrupt
        1  Enable Fixed-Interval Timer interrupt

41      Auto-Reload Enable (ARE)
        0  Disable auto-reload of the Decrementer
           Decrementer exception is presented (i.e. TSR[DIS] is set to 1) when the Decrementer is decremented from a value of 0x0000_0001. The next value placed in the Decrementer is the value 0x0000_0000. The Decrementer then stops decrementing. If MSR[EE]=1, TCR[DIE]=1, and TSR[DIS]=1, a Decrementer interrupt is taken. Software must reset TSR[DIS].
        1  Enable auto-reload of the Decrementer
           Decrementer exception is presented (i.e. TSR[DIS] is set to 1) when the Decrementer is decremented from a value of 0x0000_0001. The contents of the Decrementer Auto-Reload Register are placed in the Decrementer. The Decrementer resumes decrementing. If MSR[EE]=1, TCR[DIE]=1, and TSR[DIS]=1, a Decrementer interrupt is taken. Software must reset TSR[DIS].

42      Implementation-dependent

43:63   Reserved

7.5.1 Timer Status Register

The Timer Status Register (TSR) is a 32-bit register. Timer Status Register bits are numbered 32 (most-significant bit) to 63 (least-significant bit). The Timer Status Register contains status on timer events and the most recent Watchdog Timer-initiated processor reset.

The Timer Status Register is set via hardware, and read and cleared via software. The contents of the Timer Status Register can be read using the mfspr instruction. Bits in the Timer Status Register can be cleared using the mtspr instruction. Clearing is done by writing bits 32:63 of a General Purpose Register to the Timer Status Register with a 1 in any bit position that is to be cleared and 0 in all other bit positions. The write-data to the Timer Status Register is not direct data, but a mask. A 1 causes the bit to be cleared, and a 0 has no effect.

The contents of the TSR are defined below:

Bit(s)  Description

32      Enable Next Watchdog Timer (ENW) (see Section 7.7 on page 701)
        0  Action on next Watchdog Timer time-out is to set TSR[ENW]
        1  Action on next Watchdog Timer time-out is governed by TSR[WIS]

33      Watchdog Timer Interrupt Status (WIS) (see Section 7.7 on page 701)
        0  A Watchdog Timer event has not occurred.
        1  A Watchdog Timer event has occurred. When MSR[CE]=1 and TCR[WIE]=1, a Watchdog Timer interrupt is taken.

34:35   Watchdog Timer Reset Status (WRS) (see Section 7.7 on page 701)
        These two bits are set to one of three values when a reset is caused by the Watchdog Timer. These bits are undefined at power-up.
        00  No Watchdog Timer reset has occurred.
        01  Implementation-dependent reset information.
        10  Implementation-dependent reset information.
        11  Implementation-dependent reset information.

36      Decrementer Interrupt Status (DIS) (see Section 7.3.2 on page 697)
        0  A Decrementer event has not occurred.
        1  A Decrementer event has occurred. When MSR[EE]=1 and TCR[DIE]=1, a Decrementer interrupt is taken.

37      Fixed-Interval Timer Interrupt Status (FIS) (see Section 7.6 on page 700)
        0  A Fixed-Interval Timer event has not occurred.
        1  A Fixed-Interval Timer event has occurred. When MSR[EE]=1 and TCR[FIE]=1, a Fixed-Interval Timer interrupt is taken.

38:63   Reserved

7.6 Fixed-Interval Timer

The Fixed-Interval Timer (FIT) is a mechanism for providing timer interrupts with a repeatable period, to facilitate system maintenance. It is similar in function to an auto-reload Decrementer, except that there are fewer selections of interrupt period available. The Fixed-Interval Timer exception occurs on 0 to 1 transitions of a selected bit from the Time Base (see Section 7.5).

The Fixed-Interval Timer exception is logged by TSR[FIS]. A Fixed-Interval Timer interrupt will occur if TCR[FIE] and MSR[EE] are enabled. See Section 5.6.12 on page 680 for details of register behavior caused by the Fixed-Interval Timer interrupt.

Note that a Fixed-Interval Timer exception will also occur if the selected Time Base bit transitions from 0 to 1 due to an mtspr instruction that writes a 1 to the bit when its previous value was 0.

7.7 Watchdog Timer

The Watchdog Timer is a facility intended to aid system recovery from faulty software or hardware. Watchdog time-outs occur on 0 to 1 transitions of selected bits from the Time Base (Section 7.5).

When a Watchdog Timer time-out occurs while Watchdog Timer Interrupt Status is clear (TSR[WIS] = 0) and the next Watchdog Time-out is enabled (TSR[ENW] = 1), a Watchdog Timer exception is generated and logged by setting TSR[WIS] to 1. This is referred to as a Watchdog Timer First Time Out. A Watchdog Timer interrupt will occur if enabled by TCR[WIE] and MSR[CE]. See Section 5.6.13 on page 680 for details of register behavior caused by the Watchdog Timer interrupt. The purpose of the Watchdog Timer First time-out is to give an indication that there may be a problem and give the system a chance to perform corrective action or capture a failure before a reset occurs from the Watchdog Timer Second time-out, as explained further below.

Note that a Watchdog Timer exception will also occur if the selected Time Base bit transitions from 0 to 1 due to an mtspr instruction that writes a 1 to the bit when its previous value was 0.

When a Watchdog Timer time-out occurs while TSR[WIS] = 1 and TSR[ENW] = 1, a processor reset occurs if it is enabled by a non-zero value of the Watchdog Reset Control field in the Timer Control Register (TCR[WRC]). This is referred to as a Watchdog Timer Second Time Out. The assumption is that TSR[WIS] was not cleared because the processor was unable to execute the Watchdog Timer interrupt handler, leaving reset as the only available means to restart the system. Note that once TCR[WRC] has been set to a non-zero value, it cannot be reset by software; this feature prevents errant software from disabling the Watchdog Timer reset capability.

A more complete view of Watchdog Timer behavior is afforded by Figure 23 and Figure 24, which describe the Watchdog Timer state machine and Watchdog Timer controls. The numbers in parentheses in the figure refer to the discussion of modes of operation which follow the table.

[Figure 23 is a state diagram with four states, TSR[ENW,WIS] = 0b00, 0b10, 0b01 and 0b11. Time-out transitions: from 0b00, no exception is recorded in TSR[WIS] and TSR[ENW] is set so the next time-out will cause an exception; from 0b10, a WDT exception is recorded in TSR[WIS] and a WDT interrupt will occur if enabled by TCR[WIE] and MSR[CE]; from 0b01, TSR[ENW] is set so the next time-out will cause reset; from 0b11, if TCR[WRC] is not 0b00 then RESET occurs, including TSR[WRS] set from TCR[WRC] and TCR[WRC] set to 0b00. Software transitions are labeled (1) Watchdog Interrupt Handler, (2) SW Loop, (2) Watchdog Interrupt Handler, and (3) SW Loop.]

Figure 23. Watchdog State Machine

Enable Next WDT  WDT Status  Action when timer interval expires
(TSR[ENW])       (TSR[WIS])
0                0           Set Enable Next Watchdog Timer (TSR[ENW]=1).
0                1           Set Enable Next Watchdog Timer (TSR[ENW]=1).
1                0           Set Watchdog Timer interrupt status bit (TSR[WIS]=1). If Watchdog Timer interrupt is enabled (TCR[WIE]=1 and MSR[CE]=1), then interrupt.
1                1           Cause Watchdog Timer reset action specified by TCR[WRC]. Reset will copy pre-reset TCR[WRC] into TSR[WRS], then clear TCR[WRC].

Figure 24. Watchdog Timer Controls

The controls described in the above table imply three different modes of operation that a programmer might select for the Watchdog Timer. Each of these modes assumes that TCR[WRC] has been set to allow processor reset by the Watchdog facility:

1. Always take the Watchdog Timer interrupt when pending, and never attempt to prevent its occurrence. In this mode, the Watchdog Timer interrupt caused by a first time-out is used to clear TSR[WIS] so a second time-out never occurs. TSR[ENW] is not cleared, thereby allowing the next time-out to cause another interrupt.

2. Always take the Watchdog Timer interrupt when pending, but avoid when possible. In this mode a recurring code loop of reliable duration (or perhaps a periodic interrupt handler such as the Fixed-Interval Timer interrupt handler) is used to repeatedly clear TSR[ENW] such that a first time-out exception is avoided, and thus no Watchdog Timer interrupt occurs. Once TSR[ENW] has been cleared, software has between one and two full Watchdog periods before a Watchdog exception will be posted in TSR[WIS]. If this occurs before the software is able to clear TSR[ENW] again, a Watchdog Timer interrupt will occur. In this case, the Watchdog Timer interrupt handler will then clear both TSR[ENW] and TSR[WIS], in order to (hopefully) avoid the next Watchdog Timer interrupt.

3. Never take the Watchdog Timer interrupt. In this mode, Watchdog Timer interrupts are disabled (via TCR[WIE]=0), and the system depends upon a recurring code loop of reliable duration (or perhaps a periodic interrupt handler such as the Fixed-Interval Timer interrupt handler) to repeatedly clear TSR[WIS] such that a second time-out is avoided, and thus no reset occurs. TSR[ENW] is not cleared, thereby allowing the next time-out to set TSR[WIS] again. The recurring code loop must have a period which is less than one Watchdog Timer period in order to guarantee that a Watchdog Timer reset will not occur.

7.8 Freezing the Timer Facilities

The debug mechanism provides a means of temporarily freezing the timers upon a debug event. Specifically, the Time Base and Decrementer can be frozen and prevented from incrementing/decrementing, respectively, whenever a debug event is set in the Debug Status Register. Note that this also freezes the FIT and Watchdog timer. This allows a debugger to simulate the appearance of 'real time', even though the application has been temporarily 'halted' to service the debug event. See the description of bit 63 of the Debug Control Register 0 (Freeze Timers on Debug Event or DBCR0[FT]) in Section 8.5.1.1 on page 711.
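The Watchdog Timer controls of Figure 24 in Section 7.7 can be read as a small state machine over TSR[ENW] and TSR[WIS]. The Python sketch below is one way to model it, under stated assumptions: the class, method and return-value names are hypothetical, TSR clearing is modelled as the write-1-to-clear mask described in Section 7.5.1, and the effect of a reset on anything other than TCR[WRC] and TSR[WRS] is deliberately left out.

```python
class WatchdogModel:
    """Toy model of the Watchdog Timer controls (hypothetical names)."""

    def __init__(self, wrc=0b00, wie=0):
        self.enw = 0      # TSR[ENW]
        self.wis = 0      # TSR[WIS]
        self.wrs = 0b00   # TSR[WRS]
        self.wrc = wrc    # TCR[WRC]
        self.wie = wie    # TCR[WIE]

    def timeout(self):
        """Apply one Watchdog Timer time-out and report what happened."""
        if self.enw == 0:
            self.enw = 1              # rows (0,0) and (0,1) of Figure 24
            return "set_enw"
        if self.wis == 0:
            self.wis = 1              # first time-out: exception logged
            return "exception"
        if self.wrc != 0b00:
            self.wrs = self.wrc       # second time-out: reset action
            self.wrc = 0b00
            return "reset"
        return "none"                 # WRC = 0b00: no reset will occur

    def write_tsr(self, enw=0, wis=0):
        # Mask write: a 1 clears the corresponding bit, a 0 has no effect.
        if enw:
            self.enw = 0
        if wis:
            self.wis = 0

    def interrupt_pending(self, msr_ce):
        return bool(self.wis and self.wie and msr_ce)
```

In terms of the three modes of operation, mode 1 corresponds to an interrupt handler calling write_tsr(wis=1), mode 2 to a recurring loop calling write_tsr(enw=1), and mode 3 to a recurring loop calling write_tsr(wis=1) with wie left at 0.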
703 Event [Category: Embedded.Enhanced 8.3 External Debug Mode [Category: Debug] . . . . . . . . . . . . . . . . . . . . . . . . . 711 Embedded.Enhanced Debug] . . . . . . . 704 8.5 Debug Registers . . . . . . . . . . . . . . 711 8.4 Debug Events . . . . . . . . . . . . . . . . 704 8.5.1 Debug Control Registers . . . . . . 711 8.4.1 Instruction Address Compare Debug 8.5.1.1 Debug Control Register 0 (DBCR0) Event . . . . . . . . . . . . . . . . . . . . . . . . . . 705 711 8.4.2 Data Address Compare Debug 8.5.1.2 Debug Control Register 1 (DBCR1) Event . . . . . . . . . . . . . . . . . . . . . . . . . . 707 712 8.4.3 Trap Debug Event . . . . . . . . . . . 708 8.5.1.3 Debug Control Register 2 (DBCR2) 8.4.4 Branch Taken Debug Event . . . 708 714 8.4.5 Instruction Complete Debug Event . 8.5.2 Debug Status Register . . . . . . . . 715 709 8.5.3 Instruction Address Compare Regis- 8.4.6 Interrupt Taken Debug Event . . 709 ters. . . . . . . . . . . . . . . . . . . . . . . . . . . . 716 8.4.6.1 Causes of Interrupt Taken Debug 8.5.4 Data Address Compare Registers . . Events . . . . . . . . . . . . . . . . . . . . . . . . . 709 716 8.4.6.2 Interrupt Taken Debug Event 8.5.5 Data Value Compare Registers . 717 Description . . . . . . . . . . . . . . . . . . . . . 709 8.6 Debugger Notify Halt Instruction 8.4.7 Return Debug Event . . . . . . . . . 710 [Category: Embedded.Enhanced Debug] . 8.4.8 Unconditional Debug Event . . . . 710 718 8.4.9 Critical Interrupt Taken Debug Event [Category: Embedded.Enhanced Debug] . 710 8.1 Overview In addition to the facilities described here, implementa- tions will typically include debug facilities, modes, and Processors provide debug facilities to enable hardware access mechanisms which are implementation-spe- and software debug functions, such as instruction and cific. For example, implementations will typically pro- data breakpoints and program single stepping. 
The vide access to the debug facilities via a dedicated debug facilities consist of a set of Debug Control Regis- interface such as the IEEE 1149.1 Test Access Port ters (DBCR0, DBCR1, and DBCR2) (see Section 8.5.1 (JTAG). on page 711), a set of Address and Data Value Com- pare Registers (IAC1, IAC2, IAC3, IAC4, DAC1, DAC2, DVC1, and DVC2), (see Section 8.4.3, Section 8.4.4, 8.2 Internal Debug Mode and Section 8.4.5), a Debug Status Register (DBSR) Debug events include such things as instruction and (see Section 8.5.2) for enabling and recording various data breakpoints. These debug events cause status kinds of debug events, and a special Debug interrupt bits to be set in the Debug Status Register. The exist- type built into the interrupt mechanism (see ence of a set bit in the Debug Status Register is consid- Section 5.6.16). The debug facilities also provide a ered a Debug exception. Debug exceptions, if enabled, mechanism for software-controlled processor reset, will cause Debug interrupts. and for controlling the operation of the timers in a debug environment. There are two different mechanisms that control whether Debug interrupts are enabled. The first is the The mfspr and mtspr instructions (see Section 3.4.1) MSRDE bit, and this bit must be set to 1 to enable provide access to the registers of the debug facilities. Chapter 8. Debug Facilities 703 Version 2.05 Debug interrupts. The second mechanism is an enable bit in the Debug Control Register 0 (DBCR0). This bit is 8.4 Debug Events the Internal Debug Mode bit (DBCR0IDM), and it must Debug events are used to cause Debug exceptions to also be set to 1 to enable Debug interrupts. be recorded in the Debug Status Register (see When DBCR0IDM=1, the processor is in Internal Debug Section 8.5.2). In order for a debug event to be enabled Mode. In this mode, debug events will (if also enabled to set a Debug Status Register bit and thereby cause a by MSRDE) cause Debug interrupts. 
Software at the Debug exception, the specific event type must be Debug interrupt vector location will thus be given con- enabled by a corresponding bit or bits in the Debug trol upon the occurrence of a debug event, and can Control Register DBCR0 (see Section 8.5.1.1), DBCR1 access (via the normal instructions) all architected pro- (see Section 8.5.1.2), or DBCR2 (see Section 8.5.1.3), cessor resources. In this fashion, debug monitor soft- in most cases; the Unconditional Debug Event (UDE) is ware can control the processor and gather status, and an exception to this rule. Once a Debug Status Regis- interact with debugging hardware connected to the pro- ter bit is set, if Debug interrupts are enabled by MSRDE, cessor. a Debug interrupt will be generated. When the processor is not in Internal Debug Mode Certain debug events are not allowed to occur when (DBCR0IDM=0), debug events may still occur and be MSRDE=0. In such situations, no Debug exception recorded in the Debug Status Register. These excep- occurs and thus no Debug Status Register bit is set. tions may be monitored via software by reading the Other debug events may cause Debug exceptions and Debug Status Register (using mfspr), or may eventu- set Debug Status Register bits regardless of the state ally cause a Debug interrupt if later enabled by setting of MSRDE. The associated Debug interrupts that result DBCR0IDM=1 (and MSRDE=1). Processor behavior from such Debug exceptions will be delayed until when debug events occur while DBCR0IDM=0 is imple- MSRDE=1, provided the exceptions have not been mentation-dependent. cleared from the Debug Status Register in the mean- time. Any time that a Debug Status Register bit is allowed to 8.3 External Debug Mode [Cate- be set while MSRDE=0, a special Debug Status Regis- gory: Embedded.Enhanced ter bit, Imprecise Debug Event (DBSRIDE), will also be set. 
DBSRIDE indicates that the associated Debug Debug] exception bit in the Debug Status Register was set while Debug interrupts were disabled via the MMSRDE The External Debug Mode is a mode in which facilities bit. Debug interrupt handler software can use this bit to external to the processor can access processor determine whether the address recorded in CSRR0/ resources and control execution. These facilities are DSRR0 [Category: Embedded.Enhanced Debug] defined as the external debug facilities and are not should be interpreted as the address associated with defined here, however some instructions and registers the instruction causing the Debug exception, or simply share internal and external debug roles and are briefly the address of the instruction after the one which set described as necessary. the MSRDE bit, thereby enabling the delayed Debug A dnh instruction is provided to stop instruction fetch- interrupt. ing and execution and allow the processor to be man- Debug interrupts are ordered with respect to other aged by an external debug facility. After the dnh interrupt types (see Section 7.8 on page 179). Debug instruction is executed, instructions are not fetched, exceptions are prioritized with respect to other excep- interrupts are not taken, and the processor does not tions (see Section 7.9 on page 183). execute instructions. There are eight types of debug events defined: 1. Instruction Address Compare debug events 2. Data Address Compare debug events 3. Trap debug events 4. Branch Taken debug events 5. Instruction Complete debug events 6. Interrupt Taken debug events 7. Return debug events 8. Unconditional debug events 704 Power ISATM III-E Version 2.05 Programming Note There are two classes of debug exception types: ysis it wants to, then clears all debug event enables in the DBCR except for the instruction Type 1: exception before instruction complete debug event enable. Type 2: exception after instruction 4. 
Software does an rfci or rfdi [Category: Embed- Almost all debug exceptions fall into the first type. That ded.Enhanced Debug]. is, they all take the interrupt upon encountering an 5. Hardware would execute and complete one instruction having the exception without updating any instruction (the branch taken in this case), and architectural state (other than DBSR, CSRR0/DSRR0 then take a Debug interrupt with CSRR0/DSRR0 [Category: Embedded.Enhanced Debug], CSRR1/ [Category: Embedded.Enhanced Debug] pointing DSRR1 [Category: Embedded.Enhanced Debug], to the target of the branch. MSR) for that instruction. 6. Software would see the instruction complete inter- The CSRR0/DSRR0 [Category: Embedded.Enhanced rupt type. It clears the instruction complete event Debug] for this type of exception points to the instruc- enable, then enables the branch taken interrupt tion that encountered the exception. This includes IAC, event again. DAC, branch taken, etc. 7. Software does an rfci or rfdi [Category: Embed- The only exception which fall into the second type is ded.Enhanced Debug]. the instruction complete debug exception. This excep- tion is taken upon completing and updating one instruc- 8. Hardware resumes on the target of the taken tion and then pointing CSRR0/DSRR0 [Category: branch and continues until another taken branch, Embedded.Enhanced Debug] to the next instruction to in which case we end up at step 2 again. execute. This, at first, seems like a double tax (i.e. 2 debug inter- To make forward progress for any Type 1 debug rupts for every instance of a Type 1 exception), but exception one does the following: there doesn't seem like any other clean way to make forward progress on Type 1 debug exceptions. The 1. Software sets up Type 1 exceptions (e.g. 
branch only other way to avoid the double tax is to have the taken debug exceptions) and then returns to nor- debug handler routine actually emulate the instruction mal program operation pointed to for the Type 1 exceptions, determine the 2. Hardware takes Debug interrupt upon the first next instruction that would have been executed by the branch taken Debug exception, pointing to the interrupted program flow and load the CSRR0/DSRR0 branch with CSRR0/DSRR0 [Category: Embed- [Category: Embedded.Enhanced Debug] with that ded.Enhanced Debug]. address and do an rfci/rfdi [Category: Embed- ded.Enhanced Debug]; this is probably not faster. 3. Software, in the debug handler, sees the branch taken exception type, does whatever logging/anal- 8.4.1 Instruction Address Com- DBCR1IAC2US specifies whether IAC2 debug events can occur in user mode or supervisor mode, or both. pare Debug Event DBCR1IAC3US specifies whether IAC3 debug events One or more Instruction Address Compare debug can occur in user mode or supervisor mode, or both. events (IAC1, IAC2, IAC3 or IAC4) occur if they are enabled and execution is attempted of an instruction at DBCR1IAC4US specifies whether IAC4 debug events an address that meets the criteria specified in the can occur in user mode or supervisor mode, or both. DBCR0, DBCR1, IAC1, IAC2, IAC3, and IAC4 Regis- ters. Effective/Real Address Mode DBCR1IAC1ER specifies whether effective addresses, Instruction Address Compare User/ real addresses, effective addresses and MSRIS=0, or Supervisor Mode effective addresses and MSRIS=1 are used in deter- mining an address match on IAC1 debug events. DBCR1IAC1US specifies whether IAC1 debug events can occur in user mode or supervisor mode, or both. DBCR1IAC2ER specifies whether effective addresses, real addresses, effective addresses and MSRIS=0, or Chapter 8. 
effective addresses and MSR[IS]=1 are used in determining an address match on IAC2 debug events.

DBCR1[IAC3ER] specifies whether effective addresses, real addresses, effective addresses and MSR[IS]=0, or effective addresses and MSR[IS]=1 are used in determining an address match on IAC3 debug events.

DBCR1[IAC4ER] specifies whether effective addresses, real addresses, effective addresses and MSR[IS]=0, or effective addresses and MSR[IS]=1 are used in determining an address match on IAC4 debug events.

Instruction Address Compare Mode

DBCR1[IAC12M] specifies whether all or some of the bits of the address of the instruction fetch must match the contents of the IAC1 or IAC2, whether the address must be inside a specific range specified by the IAC1 and IAC2 or outside a specific range specified by the IAC1 and IAC2 for an IAC1 or IAC2 debug event to occur.

DBCR1[IAC34M] specifies whether all or some of the bits of the address of the instruction fetch must match the contents of the IAC3 Register or IAC4 Register, whether the address must be inside a specific range specified by the IAC3 Register and IAC4 Register or outside a specific range specified by the IAC3 Register and IAC4 Register for an IAC3 or IAC4 debug event to occur.

There are four instruction address compare modes.

- Exact address compare mode
  If the address of the instruction fetch is equal to the value in the enabled IAC Register, an instruction address match occurs.

  For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the processor is executing in 32-bit mode.

- Address bit match mode
  For IAC1 and IAC2 debug events, if the address of the instruction fetch, ANDed with the contents of the IAC2, is equal to the contents of the IAC1, also ANDed with the contents of the IAC2, an instruction address match occurs.

  For IAC3 and IAC4 debug events, if the address of the instruction fetch, ANDed with the contents of the IAC4, is equal to the contents of the IAC3, also ANDed with the contents of the IAC4, an instruction address match occurs.

  For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the processor is executing in 32-bit mode.

- Inclusive address range compare mode
  For IAC1 and IAC2 debug events, if the 64-bit address of the instruction fetch is greater than or equal to the contents of the IAC1 and less than the contents of the IAC2, an instruction address match occurs.

  For IAC3 and IAC4 debug events, if the 64-bit address of the instruction fetch is greater than or equal to the contents of the IAC3 and less than the contents of the IAC4, an instruction address match occurs.

  For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the processor is executing in 32-bit mode.

- Exclusive address range compare mode
  For IAC1 and IAC2 debug events, if the 64-bit address of the instruction fetch is less than the contents of the IAC1 or greater than or equal to the contents of the IAC2, an instruction address match occurs.

  For IAC3 and IAC4 debug events, if the 64-bit address of the instruction fetch is less than the contents of the IAC3 or greater than or equal to the contents of the IAC4, an instruction address match occurs.

  For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the processor is executing in 32-bit mode.

See the detailed descriptions of DBCR0 (see Section 8.5.1.1, "Debug Control Register 0 (DBCR0)" on page 711) and DBCR1 (see Section 8.5.1.2, "Debug Control Register 1 (DBCR1)" on page 712) for the modes for detecting IAC1, IAC2, IAC3 and IAC4 debug events. Instruction Address Compare debug events can occur regardless of the setting of MSR[DE] or DBCR0[IDM].

When an Instruction Address Compare debug event occurs, the corresponding DBSR[IAC1], DBSR[IAC2], DBSR[IAC3], or DBSR[IAC4] bit or bits are set to record the debug exception. If MSR[DE]=0, DBSR[IDE] is also set to 1 to record the imprecise debug event.

If MSR[DE]=1 (i.e. Debug interrupts are enabled) at the time of the Instruction Address Compare debug exception, a Debug interrupt will occur immediately (provided there exists no higher priority exception which is enabled to cause an interrupt). The execution of the instruction causing the exception will be suppressed, and CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will be set to the address of the excepting instruction.

If MSR[DE]=0 (i.e. Debug interrupts are disabled) at the time of the Instruction Address Compare debug exception, a Debug interrupt will not occur, and the instruction will complete execution (provided the instruction is not causing some other exception which will generate an enabled interrupt).

Later, if the debug exception has not been reset by clearing DBSR[IAC1], DBSR[IAC2], DBSR[IAC3], and DBSR[IAC4], and MSR[DE] is set to 1, a delayed Debug interrupt will occur. In this case, CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will contain the address of the instruction after the one which enabled the Debug interrupt by setting MSR[DE] to 1. Software in the Debug interrupt handler can observe DBSR[IDE] to determine how to interpret the value in CSRR0/DSRR0 [Category: Embedded.Enhanced Debug].

8.4.2 Data Address Compare Debug Event

One or more Data Address Compare debug events (DAC1R, DAC1W, DAC2R, DAC2W) occur if they are enabled, execution is attempted of a data storage access instruction, and the type, address, and possibly even the data value of the data storage access meet the criteria specified in the Debug Control Register 0, Debug Control Register 2, and the DAC1, DAC2, DVC1, and DVC2 Registers.

Data Address Compare Read/Write Enable

DBCR0[DAC1] specifies whether DAC1R debug events can occur on read-type data storage accesses and whether DAC1W debug events can occur on write-type data storage accesses.

DBCR0[DAC2] specifies whether DAC2R debug events can occur on read-type data storage accesses and whether DAC2W debug events can occur on write-type data storage accesses.

Indexed-string instructions (lswx, stswx) for which the XER field specifies zero bytes as the length of the string are treated as no-ops, and are not allowed to cause Data Address Compare debug events.

All Load instructions are considered reads with respect to debug events, while all Store instructions are considered writes with respect to debug events. In addition, the Cache Management instructions, and certain special cases, are handled as follows.

- dcbt, dcbtls, dcbtep, dcbtst, dcbtstls, dcbtstep, icbt, icbtls, icbtep, icbi, icblc, dcblc, and icbiep are all considered reads with respect to debug events. Note that dcbt, dcbtep, dcbtst, dcbtstep, icbt, and icbtep are treated as no-operations when they report Data Storage or Data TLB Miss exceptions, instead of being allowed to cause interrupts. However, these instructions are allowed to cause Debug interrupts, even when they would otherwise have been no-op'ed due to a Data Storage or Data TLB Miss exception.

- dcbz, dcbzep, dcbi, dcbf, dcbfep, dcba, dcbst, and dcbstep are all considered writes with respect to debug events. Note that dcbf, dcbfep, dcbst, and dcbstep are considered reads with respect to Data Storage exceptions, since they do not actually change the data at a given address. However, since the execution of these instructions may result in write activity on the processor's data bus, they are treated as writes with respect to debug events.

Data Address Compare User/Supervisor Mode

DBCR2[DAC1US] specifies whether DAC1R and DAC1W debug events can occur in user mode or supervisor mode, or both.

DBCR2[DAC2US] specifies whether DAC2R and DAC2W debug events can occur in user mode or supervisor mode, or both.

Effective/Real Address Mode

DBCR2[DAC1ER] specifies whether effective addresses, real addresses, effective addresses and MSR[DS]=0, or effective addresses and MSR[DS]=1 are used in determining an address match on DAC1R and DAC1W debug events.

DBCR2[DAC2ER] specifies whether effective addresses, real addresses, effective addresses and MSR[DS]=0, or effective addresses and MSR[DS]=1 are used in determining an address match on DAC2R and DAC2W debug events.

Data Address Compare Mode

DBCR2[DAC12M] specifies whether all or some of the bits of the address of the data storage access must match the contents of the DAC1 or DAC2, whether the address must be inside a specific range specified by the DAC1 and DAC2 or outside a specific range specified by the DAC1 and DAC2 for a DAC1R, DAC1W, DAC2R or DAC2W debug event to occur.

There are four data address compare modes.

- Exact address compare mode
  If the 64-bit address of the data storage access is equal to the value in the enabled Data Address Compare Register, a data address match occurs.

  For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the processor is executing in 32-bit mode.

- Address bit match mode
  If the address of the data storage access, ANDed with the contents of the DAC2, is equal to the contents of the DAC1, also ANDed with the contents of the DAC2, a data
  address match occurs.

  For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the processor is executing in 32-bit mode.

- Inclusive address range compare mode
  If the 64-bit address of the data storage access is greater than or equal to the contents of the DAC1 and less than the contents of the DAC2, a data address match occurs.

  For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the processor is executing in 32-bit mode.

- Exclusive address range compare mode
  If the 64-bit address of the data storage access is less than the contents of the DAC1 or greater than or equal to the contents of the DAC2, a data address match occurs.

  For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the processor is executing in 32-bit mode.

Data Value Compare Mode

DBCR2[DVC1M] and DBCR2[DVC1BE] specify whether and how the data value being accessed by the storage access must match the contents of the DVC1 for a DAC1R or DAC1W debug event to occur.

DBCR2[DVC2M] and DBCR2[DVC2BE] specify whether and how the data value being accessed by the storage access must match the contents of the DVC2 for a DAC2R or DAC2W debug event to occur.

See the descriptions of DBCR0 (see Section 8.5.1.1) and DBCR2 (see Section 8.5.1.3) for the modes for detecting Data Address Compare debug events. Data Address Compare debug events can occur regardless of the setting of MSR[DE] or DBCR0[IDM].

When a Data Address Compare debug event occurs, the corresponding DBSR[DAC1R], DBSR[DAC1W], DBSR[DAC2R], or DBSR[DAC2W] bit or bits are set to 1 to record the debug exception. If MSR[DE]=0, DBSR[IDE] is also set to 1 to record the imprecise debug event.

If MSR[DE]=1 (i.e. Debug interrupts are enabled) at the time of the Data Address Compare debug exception, a Debug interrupt will occur immediately (provided there exists no higher priority exception which is enabled to cause an interrupt), the execution of the instruction causing the exception will be suppressed, and CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will be set to the address of the excepting instruction. Depending on the type of instruction and/or the alignment of the data access, the instruction causing the exception may have been partially executed (see Section 5.7).

If MSR[DE]=0 (i.e. Debug interrupts are disabled) at the time of the Data Address Compare debug exception, a Debug interrupt will not occur, and the instruction will complete execution (provided the instruction is not causing some other exception which will generate an enabled interrupt). Also, DBSR[IDE] is set to indicate that the debug exception occurred while Debug interrupts were disabled by MSR[DE]=0.

Later, if the debug exception has not been reset by clearing DBSR[DAC1R], DBSR[DAC1W], DBSR[DAC2R], DBSR[DAC2W], and MSR[DE] is set to 1, a delayed Debug interrupt will occur. In this case, CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will contain the address of the instruction after the one which enabled the Debug interrupt by setting MSR[DE] to 1. Software in the Debug interrupt handler can observe DBSR[IDE] to determine how to interpret the value in CSRR0/DSRR0 [Category: Embedded.Enhanced Debug].

8.4.3 Trap Debug Event

A Trap debug event (TRAP) occurs if DBCR0[TRAP]=1 (i.e. Trap debug events are enabled) and a Trap instruction (tw, twi, td, tdi) is executed and the conditions specified by the instruction for the trap are met. The event can occur regardless of the setting of MSR[DE] or DBCR0[IDM].

When a Trap debug event occurs, DBSR[TR] is set to 1 to record the debug exception. If MSR[DE]=0, DBSR[IDE] is also set to 1 to record the imprecise debug event.

If MSR[DE]=1 (i.e. Debug interrupts are enabled) at the time of the Trap debug exception, a Debug interrupt will occur immediately (provided there exists no higher priority exception which is enabled to cause an interrupt), and CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will be set to the address of the excepting instruction.

If MSR[DE]=0 (i.e. Debug interrupts are disabled) at the time of the Trap debug exception, a Debug interrupt will not occur, and a Trap exception type Program interrupt will occur instead if the trap condition is met.

Later, if the debug exception has not been reset by clearing DBSR[TR], and MSR[DE] is set to 1, a delayed Debug interrupt will occur. In this case, CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will contain the address of the instruction after the one which enabled the Debug interrupt by setting MSR[DE] to 1. Software in the debug interrupt handler can observe DBSR[IDE] to determine how to interpret the value in CSRR0/DSRR0 [Category: Embedded.Enhanced Debug].

8.4.4 Branch Taken Debug Event

A Branch Taken debug event (BRT) occurs if DBCR0[BRT]=1 (i.e. Branch Taken Debug events are enabled), execution is attempted of a branch instruction whose direction will be taken (that is, either an unconditional branch, or a conditional branch whose branch condition is met), and MSR[DE]=1.

Branch Taken debug events are not recognized if MSR[DE]=0 at the time of the execution of the branch instruction and thus DBSR[IDE] can not be set by a Branch Taken debug event. This is because branch instructions occur very frequently. Allowing these common events to be recorded as exceptions in the DBSR while debug interrupts are disabled via MSR[DE] would result in an inordinate number of imprecise Debug interrupts.

When a Branch Taken debug event occurs, the DBSR[BRT] bit is set to 1 to record the debug exception and a Debug interrupt will occur immediately (provided there exists no higher priority exception which is enabled to cause an interrupt). The execution of the instruction causing the exception will be suppressed, and CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will be set to the address of the excepting instruction.

8.4.5 Instruction Complete Debug Event

An Instruction Complete debug event (ICMP) occurs if DBCR0[ICMP]=1 (i.e. Instruction Complete debug events are enabled), execution of any instruction is completed, and MSR[DE]=1. Note that if execution of an instruction is suppressed due to the instruction causing some other exception which is enabled to generate an interrupt, then the attempted execution of that instruction does not cause an Instruction Complete debug event. The sc instruction does not fall into the type of an instruction whose execution is suppressed, since the instruction actually completes execution and then generates a System Call interrupt. In this case, the Instruction Complete debug exception will also be set.

Instruction Complete debug events are not recognized if MSR[DE]=0 at the time of the execution of the instruction, and thus DBSR[IDE] can not be set by an ICMP debug event. This is because allowing the common event of Instruction Completion to be recorded as an exception in the DBSR while Debug interrupts are disabled via MSR[DE] would mean that the Debug interrupt handler software would receive an inordinate number of imprecise Debug interrupts every time Debug interrupts were re-enabled via MSR[DE].

When an Instruction Complete debug event occurs, DBSR[ICMP] is set to 1 to record the debug exception, a Debug interrupt will occur immediately (provided there exists no higher priority exception which is enabled to cause an interrupt), and CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will be set to the address of the instruction after the one causing the Instruction Complete debug exception.

8.4.6 Interrupt Taken Debug Event

8.4.6.1 Causes of Interrupt Taken Debug Events

Only base class interrupts can cause an Interrupt Taken debug event. If the Embedded.Enhanced Debug category is not supported or is supported and not enabled, all other interrupts automatically clear MSR[DE], and thus would always prevent the associated Debug interrupt from occurring precisely. If the Embedded.Enhanced Debug category is supported and enabled, then critical class interrupts do not automatically clear MSR[DE], but they cause Critical Interrupt Taken debug events instead of Interrupt Taken debug events.

Also, if the Embedded.Enhanced Debug category is not supported or is supported and not enabled, Debug interrupts themselves are critical class interrupts, and thus any Debug interrupt (for any other debug event) would always end up setting the additional exception of DBSR[IRPT] upon entry to the Debug interrupt handler. At this point, the Debug interrupt handler would be unable to determine whether or not the Interrupt Taken debug event was related to the original debug event.

8.4.6.2 Interrupt Taken Debug Event Description

An Interrupt Taken debug event (IRPT) occurs if DBCR0[IRPT]=1 (i.e. Interrupt Taken debug events are enabled) and a base class interrupt occurs. Interrupt Taken debug events can occur regardless of the setting of MSR[DE].

When an Interrupt Taken debug event occurs, DBSR[IRPT] is set to 1 to record the debug exception. If MSR[DE]=0, DBSR[IDE] is also set to 1 to record the imprecise debug event.

If MSR[DE]=1 (i.e. Debug interrupts are enabled) at the time of the Interrupt Taken debug event, a Debug interrupt will occur immediately (provided there exists no higher priority exception which is enabled to cause an interrupt), and Critical Save/Restore Register 0/Debug Save/Restore Register 0 [Category: Embedded.Enhanced Debug] will be set to the address of the interrupt vector which caused the Interrupt Taken debug event. No instructions at the base interrupt handler will have been executed.

If MSR[DE]=0 (i.e. Debug interrupts are disabled) at the time of the Interrupt Taken debug event, a Debug interrupt will not occur, and the handler for the interrupt which caused the Interrupt Taken debug event will be allowed to execute.

Later, if the debug exception has not been reset by clearing DBSR[IRPT], and MSR[DE] is set to 1, a delayed Debug interrupt will occur. In this case, CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will contain the address of the instruction after the one which enabled the Debug interrupt by setting MSR[DE] to 1. Software in the Debug interrupt handler can observe the DBSR[IDE] bit to determine how to interpret the value in CSRR0/DSRR0 [Category: Embedded.Enhanced Debug].

8.4.7 Return Debug Event

A Return debug event (RET) occurs if DBCR0[RET]=1 and an attempt is made to execute an rfi. Return debug events can occur regardless of the setting of MSR[DE].

When a Return debug event occurs, DBSR[RET] is set to 1 to record the debug exception. If MSR[DE]=0, DBSR[IDE] is also set to 1 to record the imprecise debug event.

If MSR[DE]=1 at the time of the Return Debug event, a Debug interrupt will occur immediately, and CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will be set to the address of the rfi.

If MSR[DE]=0 at the time of the Return Debug event, a Debug interrupt will not occur.

Later, if the Debug exception has not been reset by clearing DBSR[RET], and MSR[DE] is set to 1, a delayed imprecise Debug interrupt will occur. In this case, CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will contain the address of the instruction after the one which enabled the Debug interrupt by setting MSR[DE] to 1. An imprecise Debug interrupt can be caused by executing an rfi when DBCR0[RET]=1 and MSR[DE]=0, and the execution of that rfi happens to cause MSR[DE] to be set to 1. Software in the Debug interrupt handler can observe the DBSR[IDE] bit to determine how to interpret the value in CSRR0/DSRR0 [Category: Embedded.Enhanced Debug].

8.4.8 Unconditional Debug Event

An Unconditional debug event (UDE) occurs when the Unconditional Debug Event (UDE) signal is activated by the debug mechanism. The exact definition of the UDE signal and how it is activated is implementation-dependent. The Unconditional debug event is the only debug event which does not have a corresponding enable bit for the event in DBCR0 (hence the name of the event). The Unconditional debug event can occur regardless of the setting of MSR[DE].

When an Unconditional debug event occurs, the DBSR[UDE] bit is set to 1 to record the Debug exception. If MSR[DE]=0, DBSR[IDE] is also set to 1 to record the imprecise debug event.

If MSR[DE]=1 (i.e. Debug interrupts are enabled) at the time of the Unconditional Debug exception, a Debug interrupt will occur immediately (provided there exists no higher priority exception which is enabled to cause an interrupt), and CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will be set to the address of the instruction which would have executed next had the interrupt not occurred.

If MSR[DE]=0 (i.e. Debug interrupts are disabled) at the time of the Unconditional Debug exception, a Debug interrupt will not occur.

Later, if the Unconditional Debug exception has not been reset by clearing DBSR[UDE], and MSR[DE] is set to 1, a delayed Debug interrupt will occur. In this case, CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will contain the address of the instruction after the one which enabled the Debug interrupt by setting MSR[DE] to 1. Software in the Debug interrupt handler can observe DBSR[IDE] to determine how to interpret the value in CSRR0/DSRR0 [Category: Embedded.Enhanced Debug].

8.4.9 Critical Interrupt Taken Debug Event [Category: Embedded.Enhanced Debug]

A Critical Interrupt Taken debug event (CIRPT) occurs if DBCR0[CIRPT]=1 (i.e. Critical Interrupt Taken debug events are enabled) and a critical interrupt occurs. A critical interrupt is any interrupt that saves state in CSRR0 and CSRR1 when the interrupt is taken. Critical Interrupt Taken debug events can occur regardless of the setting of MSR[DE].

When a Critical Interrupt Taken debug event occurs, DBSR[CIRPT] is set to 1 to record the debug event. If MSR[DE]=0, DBSR[IDE] is also set to 1 to record the imprecise debug event.

If MSR[DE]=1 (i.e. Debug Interrupts are enabled) at the time of the Critical Interrupt Taken debug event, a Debug Interrupt will occur immediately (provided there is no higher priority exception which is enabled to cause an interrupt), and DSRR0 will be set to the address of the first instruction of the critical interrupt handler. No instructions at the critical interrupt handler will have been executed.

If MSR[DE]=0 (i.e. Debug Interrupts are disabled) at the time of the Critical Interrupt Taken debug event, a Debug Interrupt will not occur, and the handler for the critical interrupt which caused the debug event will be allowed to execute normally. Later, if the debug exception has not been reset by clearing DBSR[CIRPT] and MSR[DE] is set to 1, a delayed Debug Interrupt will occur. In this case DSRR0 will contain the address of the instruction after the one that set MSR[DE]=1. Software in the Debug Interrupt handler can observe DBSR[IDE] to determine how to interpret the value in DSRR0.

8.4.10 Critical Interrupt Return Debug Event [Category: Embedded.Enhanced Debug]

A Critical Interrupt Return debug event (CRET) occurs if DBCR0[CRET]=1 (i.e. Critical Interrupt Return debug events are enabled) and an attempt is made to execute an rfci instruction. Critical Interrupt Return debug events can occur regardless of the setting of MSR[DE].

When a Critical Interrupt Return debug event occurs, DBSR[CRET] is set to 1 to record the debug event. If MSR[DE]=0, DBSR[IDE] is also set to 1 to record the imprecise debug event.

If MSR[DE]=1 (i.e. Debug Interrupts are enabled) at the time of the Critical Interrupt Return debug event, a Debug Interrupt will occur immediately (provided there is no higher priority exception which is enabled to cause an interrupt), and DSRR0 will be set to the address of the rfci instruction.

If MSR[DE]=0 (i.e. Debug Interrupts are disabled) at the time of the Critical Interrupt Return debug event, a Debug Interrupt will not occur.

8.5.1.1 Debug Control Register 0 (DBCR0)

The contents of the DBCR0 can be read into bits 32:63 of register RT using mfspr RT,DBCR0, setting bits 0:31 of RT to 0. The contents of bits 32:63 of register RS can be written to the DBCR0 using mtspr DBCR0,RS. The bit definitions for DBCR0 are shown below.

Bit(s)  Description

32      External Debug Mode (EDM) [Category: Embedded.Enhanced Debug]
        The EDM bit is a read-only bit that reflects whether the processor is controlled by an external debug facility. When EDM is set, internal debug mode is suppressed and the taking of debug interrupts does not occur.
        0  The processor is not in external debug mode.
        1  The processor is in external debug mode.

33      Internal Debug Mode (IDM)
        0  Debug interrupts are disabled.
Later, if the debug 1 If MSRDE=1, then the occurrence of a exception has not been reset by clearing DBSRCRET debug event or the recording of an earlier and MSRDE is set to 1, a delayed Debug Interrupt will debug event in the Debug Status Register occur. In this case DSRR0 will contain the address of when MSRDE=0 or DBCR0IDM=0 will the instruction after the one that set MSRDE = 1. An cause a Debug interrupt. imprecise Debug Interrupt can be caused by executing an rfci when DBCR0CRET = 1 and MSRDE = 0, and the 34:35 Reset (RST) execution of the rfci happens to cause MSRDE to be 00 No action set to 1. Software in the Debug Interrupt handler can 01 Implementation-specific observe DBSRIDE to determine how to interpret the 10 Implementation-specific value in DSRR0. 11 Implementation-specific Warning: Writing 0b01, 0b10, or 0b11 to 8.5 Debug Registers these bits may cause a processor reset to occur. This section describes debug-related registers that are 36 Instruction Completion Debug Event accessible to software running on the processor. These (ICMP) registers are intended for use by special debug tools and debug software, and not by general application or 0 ICMP debug events are disabled operating system code. 1 ICMP debug events are enabled Note: Instruction Completion will not cause an ICMP debug event if MSRDE=0. 8.5.1 Debug Control Registers Debug Control Register 0 (DBCR0), Debug Control Register 1 (DBCR1), and Debug Control Register 2 37 Branch Taken Debug Event Enable (BRT) (DBCR2) are each 32-bit registers. Bits of DBCR0, 0 BRT debug events are disabled DBCR1, and DBCR2 are numbered 32 (most-signifi- 1 BRT debug events are enabled cant bit) to 63 (least-significant bit). DBCR0, DBCR1, and DBCR2 are used to enable debug events, reset the Note: Taken branches will not cause a BRT processor, control timer operation during debug events, debug event if MSRDE=0. and set the debug mode of the processor. 
38      Interrupt Taken Debug Event Enable (IRPT)
        0  IRPT debug events are disabled
        1  IRPT debug events are enabled
        Note: Critical interrupts will not cause an IRPT Debug event even if MSR[DE]=0. If the Embedded.Enhanced Debug category is supported, see Section 8.4.9.

39      Trap Debug Event Enable (TRAP)
        0  TRAP debug events cannot occur
        1  TRAP debug events can occur

40      Instruction Address Compare 1 Debug Event Enable (IAC1)
        0  IAC1 debug events cannot occur
        1  IAC1 debug events can occur

41      Instruction Address Compare 2 Debug Event Enable (IAC2)
        0  IAC2 debug events cannot occur
        1  IAC2 debug events can occur

42      Instruction Address Compare 3 Debug Event Enable (IAC3)
        0  IAC3 debug events cannot occur
        1  IAC3 debug events can occur

43      Instruction Address Compare 4 Debug Event Enable (IAC4)
        0  IAC4 debug events cannot occur
        1  IAC4 debug events can occur

44:45   Data Address Compare 1 Debug Event Enable (DAC1)
        00  DAC1 debug events cannot occur
        01  DAC1 debug events can occur only if a store-type data storage access
        10  DAC1 debug events can occur only if a load-type data storage access
        11  DAC1 debug events can occur on any data storage access

46:47   Data Address Compare 2 Debug Event Enable (DAC2)
        00  DAC2 debug events cannot occur
        01  DAC2 debug events can occur only if a store-type data storage access
        10  DAC2 debug events can occur only if a load-type data storage access
        11  DAC2 debug events can occur on any data storage access

48      Return Debug Event Enable (RET)
        0  RET debug events cannot occur
        1  RET debug events can occur
        Note: Return From Critical Interrupt will not cause an RET debug event if MSR[DE]=0. If the Embedded.Enhanced Debug category is supported, see Section 8.4.10.

49:56   Reserved

57      Critical Interrupt Taken Debug Event (CIRPT) [Category: Embedded.Enhanced Debug]
        A Critical Interrupt Taken Debug Event occurs when DBCR0[CIRPT]=1 and a critical interrupt (any interrupt that uses the critical class, i.e. uses CSRR0 and CSRR1) occurs.
        0  Critical interrupt taken debug events are disabled.
        1  Critical interrupt taken debug events are enabled.

58      Critical Interrupt Return Debug Event (CRET) [Category: Embedded.Enhanced Debug]
        A Critical Interrupt Return Debug Event occurs when DBCR0[CRET]=1 and a return from critical interrupt (an rfci instruction is executed) occurs.
        0  Critical interrupt return debug events are disabled.
        1  Critical interrupt return debug events are enabled.

59:62   Implementation-dependent

63      Freeze Timers on Debug Event (FT)
        0  Enable clocking of timers
        1  Disable clocking of timers if any DBSR bit is set (except MRR)

8.5.1.2 Debug Control Register 1 (DBCR1)

The contents of the DBCR1 can be read into bits 32:63 of register RT using mfspr RT,DBCR1, setting bits 0:31 of RT to 0. The contents of bits 32:63 of register RS can be written to the DBCR1 using mtspr DBCR1,RS. The bit definitions for DBCR1 are shown below.

Bit(s)  Description

32:33   Instruction Address Compare 1 User/Supervisor Mode (IAC1US)
        00  IAC1 debug events can occur
        01  Reserved
        10  IAC1 debug events can occur only if MSR[PR]=0
        11  IAC1 debug events can occur only if MSR[PR]=1

34:35   Instruction Address Compare 1 Effective/Real Mode (IAC1ER)
        00  IAC1 debug events are based on effective addresses
        01  IAC1 debug events are based on real addresses
        10  IAC1 debug events are based on effective addresses and can occur only if MSR[IS]=0
        11  IAC1 debug events are based on effective addresses and can occur only if MSR[IS]=1

36:37   Instruction Address Compare 2 User/Supervisor Mode (IAC2US)
        00  IAC2 debug events can occur
        01  Reserved
        10  IAC2 debug events can occur only if MSR[PR]=0
        11  IAC2 debug events can occur only if MSR[PR]=1

38:39   Instruction Address Compare 2 Effective/Real Mode (IAC2ER)
        00  IAC2 debug events are based on effective addresses
        01  IAC2 debug events are based on real addresses
        10  IAC2 debug events are based on effective addresses and can occur only if MSR[IS]=0
        11  IAC2 debug events are based on effective addresses and can occur only if MSR[IS]=1

40:41   Instruction Address Compare 1/2 Mode (IAC12M)
        00  Exact address compare
            IAC1 debug events can occur only if the address of the instruction fetch is equal to the value specified in IAC1.
            IAC2 debug events can occur only if the address of the instruction fetch is equal to the value specified in IAC2.
        01  Address bit match
            IAC1 and IAC2 debug events can occur only if the address of the instruction fetch, ANDed with the contents of IAC2, is equal to the contents of IAC1, also ANDed with the contents of IAC2.
            If IAC1US ≠ IAC2US or IAC1ER ≠ IAC2ER, results are boundedly undefined.
        10  Inclusive address range compare
            IAC1 and IAC2 debug events can occur only if the address of the instruction fetch is greater than or equal to the value specified in IAC1 and less than the value specified in IAC2.
            If IAC1US ≠ IAC2US or IAC1ER ≠ IAC2ER, results are boundedly undefined.
        11  Exclusive address range compare
            IAC1 and IAC2 debug events can occur only if the address of the instruction fetch is less than the value specified in IAC1 or is greater than or equal to the value specified in IAC2.
            If IAC1US ≠ IAC2US or IAC1ER ≠ IAC2ER, results are boundedly undefined.

42:47   Reserved

48:49   Instruction Address Compare 3 User/Supervisor Mode (IAC3US)
        00  IAC3 debug events can occur
        01  Reserved
        10  IAC3 debug events can occur only if MSR[PR]=0
        11  IAC3 debug events can occur only if MSR[PR]=1

50:51   Instruction Address Compare 3 Effective/Real Mode (IAC3ER)
        00  IAC3 debug events are based on effective addresses
        01  IAC3 debug events are based on real addresses
        10  IAC3 debug events are based on effective addresses and can occur only if MSR[IS]=0
        11  IAC3 debug events are based on effective addresses and can occur only if MSR[IS]=1

52:53   Instruction Address Compare 4 User/Supervisor Mode (IAC4US)
        00  IAC4 debug events can occur
        01  Reserved
        10  IAC4 debug events can occur only if MSR[PR]=0
        11  IAC4 debug events can occur only if MSR[PR]=1

54:55   Instruction Address Compare 4 Effective/Real Mode (IAC4ER)
        00  IAC4 debug events are based on effective addresses
        01  IAC4 debug events are based on real addresses
        10  IAC4 debug events are based on effective addresses and can occur only if MSR[IS]=0
        11  IAC4 debug events are based on effective addresses and can occur only if MSR[IS]=1

56:57   Instruction Address Compare 3/4 Mode (IAC34M)
        00  Exact address compare
            IAC3 debug events can occur only if the address of the instruction fetch is equal to the value specified in IAC3.
            IAC4 debug events can occur only if the address of the instruction fetch is equal to the value specified in IAC4.
        01  Address bit match
            IAC3 and IAC4 debug events can occur only if the address of the instruction fetch, ANDed with the contents of IAC4, is equal to the contents of IAC3, also ANDed with the contents of IAC4.
            If IAC3US ≠ IAC4US or IAC3ER ≠ IAC4ER, results are boundedly undefined.
        10  Inclusive address range compare
            IAC3 and IAC4 debug events can occur only if the address of the instruction fetch is greater than or equal to the value specified in IAC3 and less than the value specified in IAC4.
            If IAC3US ≠ IAC4US or IAC3ER ≠ IAC4ER, results are boundedly undefined.
        11  Exclusive address range compare
            IAC3 and IAC4 debug events can occur only if the address of the instruction fetch is less than the value specified in IAC3 or is greater than or equal to the value specified in IAC4.
            If IAC3US ≠ IAC4US or IAC3ER ≠ IAC4ER, results are boundedly undefined.

58:63   Reserved

8.5.1.3 Debug Control Register 2 (DBCR2)

The contents of the DBCR2 can be copied into bits 32:63 of register RT using mfspr RT,DBCR2, setting bits 0:31 of register RT to 0. The contents of bits 32:63 of a register RS can be written to the DBCR2 using mtspr DBCR2,RS. The bit definitions for DBCR2 are shown below.

Bit(s)  Description

32:33   Data Address Compare 1 User/Supervisor Mode (DAC1US)
        00  DAC1 debug events can occur
        01  Reserved
        10  DAC1 debug events can occur only if MSR[PR]=0
        11  DAC1 debug events can occur only if MSR[PR]=1

34:35   Data Address Compare 1 Effective/Real Mode (DAC1ER)
        00  DAC1 debug events are based on effective addresses
        01  DAC1 debug events are based on real addresses
        10  DAC1 debug events are based on effective addresses and can occur only if MSR[DS]=0
        11  DAC1 debug events are based on effective addresses and can occur only if MSR[DS]=1

36:37   Data Address Compare 2 User/Supervisor Mode (DAC2US)
        00  DAC2 debug events can occur
        01  Reserved
        10  DAC2 debug events can occur only if MSR[PR]=0
        11  DAC2 debug events can occur only if MSR[PR]=1

38:39   Data Address Compare 2 Effective/Real Mode (DAC2ER)
        00  DAC2 debug events are based on effective addresses
        01  DAC2 debug events are based on real addresses
        10  DAC2 debug events are based on effective addresses and can occur only if MSR[DS]=0
        11  DAC2 debug events are based on effective addresses and can occur only if MSR[DS]=1

40:41   Data Address Compare 1/2 Mode (DAC12M)
        00  Exact address compare
            DAC1 debug events can occur only if the address of the data storage access is equal to the value specified in DAC1.
            DAC2 debug events can occur only if the address of the data storage access is equal to the value specified in DAC2.
        01  Address bit match
            DAC1 and DAC2 debug events can occur only if the address of the data storage access, ANDed with the contents of DAC2, is equal to the contents of DAC1, also ANDed with the contents of DAC2.
            If DAC1US ≠ DAC2US or DAC1ER ≠ DAC2ER, results are boundedly undefined.
        10  Inclusive address range compare
            DAC1 and DAC2 debug events can occur only if the address of the data storage access is greater than or equal to the value specified in DAC1 and less than the value specified in DAC2.
            If DAC1US ≠ DAC2US or DAC1ER ≠ DAC2ER, results are boundedly undefined.
        11  Exclusive address range compare
            DAC1 and DAC2 debug events can occur only if the address of the data storage access is less than the value specified in DAC1 or is greater than or equal to the value specified in DAC2.
            If DAC1US ≠ DAC2US or DAC1ER ≠ DAC2ER, results are boundedly undefined.

42:43   Reserved

44:45   Data Value Compare 1 Mode (DVC1M)
        00  DAC1 debug events can occur
        01  DAC1 debug events can occur only when all bytes specified in DBCR2[DVC1BE] in the data value of the data storage access match their corresponding bytes in DVC1
        10  DAC1 debug events can occur only when at least one of the bytes specified in DBCR2[DVC1BE] in the data value of the data storage access matches its corresponding byte in DVC1
        11  DAC1 debug events can occur only when all bytes specified in DBCR2[DVC1BE] within at least one of the halfwords of the data value of the data storage access match their corresponding bytes in DVC1

46:47   Data Value Compare 2 Mode (DVC2M)
        00  DAC2 debug events can occur
        01  DAC2 debug events can occur only when all bytes specified in DBCR2[DVC2BE] in the data value of the data storage access match their corresponding bytes in DVC2
        10  DAC2 debug events can occur only when at least one of the bytes specified in DBCR2[DVC2BE] in the data value of the data storage access matches its corresponding byte in DVC2
        11  DAC2 debug events can occur only when all bytes specified in DBCR2[DVC2BE] within at least one of the halfwords of the data value of the data storage access match their corresponding bytes in DVC2

48:55   Data Value Compare 1 Byte Enables (DVC1BE)
        Specifies which bytes in the aligned data value being read or written by the storage access are compared to the corresponding bytes in DVC1.

56:63   Data Value Compare 2 Byte Enables (DVC2BE)
        Specifies which bytes in the aligned data value being read or written by the storage access are compared to the corresponding bytes in DVC2.

8.5.2 Debug Status Register

The Debug Status Register (DBSR) is a 32-bit register and contains status on debug events and the most recent processor reset.

The DBSR is set via hardware, and read and cleared via software. The contents of the DBSR can be read into bits 32:63 of a register RT using the mfspr instruction, setting bits 0:31 of RT to zero. Bits in the DBSR can be cleared using the mtspr instruction. Clearing is done by writing bits 32:63 of a register to the DBSR with a 1 in any bit position that is to be cleared and 0 in all other bit positions. The write-data to the DBSR is not direct data, but a mask. A 1 causes the bit to be cleared, and a 0 has no effect.

The bit definitions for the DBSR are shown below:

Bit(s)  Description

32      Imprecise Debug Event (IDE)
        Set to 1 if MSR[DE]=0 and a debug event causes its respective Debug Status Register bit to be set to 1.

33      Unconditional Debug Event (UDE)
        Set to 1 if an Unconditional debug event occurred. See Section 8.4.8.

34:35   Most Recent Reset (MRR)
        Set to one of three values when a reset occurs. These two bits are undefined at power-up.
        00  No reset occurred since these bits were last cleared by software
        01  Implementation-dependent reset information
        10  Implementation-dependent reset information
        11  Implementation-dependent reset information

36      Instruction Complete Debug Event (ICMP)
        Set to 1 if an Instruction Completion debug event occurred and DBCR0[ICMP]=1. See Section 8.4.5.

37      Branch Taken Debug Event (BRT)
        Set to 1 if a Branch Taken debug event occurred and DBCR0[BRT]=1. See Section 8.4.4.

38      Interrupt Taken Debug Event (IRPT)
        Set to 1 if an Interrupt Taken debug event occurred and DBCR0[IRPT]=1. See Section 8.4.6.

39      Trap Instruction Debug Event (TRAP)
        Set to 1 if a Trap Instruction debug event occurred and DBCR0[TRAP]=1. See Section 8.4.3.

40      Instruction Address Compare 1 Debug Event (IAC1)
        Set to 1 if an IAC1 debug event occurred and DBCR0[IAC1]=1. See Section 8.4.1.

41      Instruction Address Compare 2 Debug Event (IAC2)
        Set to 1 if an IAC2 debug event occurred and DBCR0[IAC2]=1. See Section 8.4.1.

42      Instruction Address Compare 3 Debug Event (IAC3)
        Set to 1 if an IAC3 debug event occurred and DBCR0[IAC3]=1. See Section 8.4.1.

43      Instruction Address Compare 4 Debug Event (IAC4)
        Set to 1 if an IAC4 debug event occurred and DBCR0[IAC4]=1. See Section 8.4.1.

44      Data Address Compare 1 Read Debug Event (DAC1R)
        Set to 1 if a read-type DAC1 debug event occurred and DBCR0[DAC1]=0b10 or DBCR0[DAC1]=0b11. See Section 8.4.2.

45      Data Address Compare 1 Write Debug Event (DAC1W)
        Set to 1 if a write-type DAC1 debug event occurred and DBCR0[DAC1]=0b01 or DBCR0[DAC1]=0b11. See Section 8.4.2.

46      Data Address Compare 2 Read Debug Event (DAC2R)
        Set to 1 if a read-type DAC2 debug event occurred and DBCR0[DAC2]=0b10 or DBCR0[DAC2]=0b11. See Section 8.4.2.

47      Data Address Compare 2 Write Debug Event (DAC2W)
        Set to 1 if a write-type DAC2 debug event occurred and DBCR0[DAC2]=0b01 or DBCR0[DAC2]=0b11. See Section 8.4.2.

48      Return Debug Event (RET)
        Set to 1 if a Return debug event occurred and DBCR0[RET]=1. See Section 8.4.7.

49:52   Reserved

53:56   Implementation-dependent

57      Critical Interrupt Taken Debug Event (CIRPT) [Category: Embedded.Enhanced Debug]
        A Critical Interrupt Taken Debug Event occurs when DBCR0[CIRPT]=1 and a critical interrupt (any interrupt that uses the critical class, i.e. uses CSRR0 and CSRR1) occurs.
        0  Critical interrupt taken debug events are disabled.
        1  Critical interrupt taken debug events are enabled.

58      Critical Interrupt Return Debug Event (CRET) [Category: Embedded.Enhanced Debug]
        A Critical Interrupt Return Debug Event occurs when DBCR0[CRET]=1 and a return from critical interrupt (an rfci instruction is executed) occurs.
        0  Critical interrupt return debug events are disabled.
        1  Critical interrupt return debug events are enabled.

59:63   Implementation-dependent

8.5.3 Instruction Address Compare Registers

The Instruction Address Compare Register 1, 2, 3, and 4 (IAC1, IAC2, IAC3, and IAC4 respectively) are each 64-bits, with bit 63 being reserved.

A debug event may be enabled to occur upon an attempt to execute an instruction from an address specified in either IAC1, IAC2, IAC3, or IAC4, inside or outside a range specified by IAC1 and IAC2, inside or outside a range specified by IAC3 and IAC4, or to blocks of addresses specified by the combination of IAC1 and IAC2, or to blocks of addresses specified by the combination of IAC3 and IAC4. Since all instruction addresses are required to be word-aligned, the two low-order bits of the Instruction Address Compare Registers are reserved and do not participate in the comparison to the instruction address (see Section 8.4.1).

The contents of the Instruction Address Compare i Register (where i={1,2,3, or 4}) can be read into register RT using mfspr RT,IACi. The contents of register RS can be written to the Instruction Address Compare i Register using mtspr IACi,RS.

8.5.4 Data Address Compare Registers

The Data Address Compare Register 1 and 2 (DAC1 and DAC2 respectively) are each 64-bits.

A debug event may be enabled to occur upon loads, stores, or cache operations to an address specified in either the DAC1 or DAC2, inside or outside a range specified by the DAC1 and DAC2, or to blocks of addresses specified by the combination of the DAC1 and DAC2 (see Section 8.4.2).

The contents of the Data Address Compare i Register (where i={1 or 2}) can be read into register RT using mfspr RT,DACi. The contents of register RS can be written to the Data Address Compare i Register using mtspr DACi,RS.

The contents of the DAC1 or DAC2 are compared to the address generated by a data storage access instruction.

8.5.5 Data Value Compare Registers

The Data Value Compare Register 1 and 2 (DVC1 and DVC2 respectively) are each 64-bits.

A DAC1R, DAC1W, DAC2R, or DAC2W debug event may be enabled to occur upon loads or stores of a specific data value specified in either or both of the DVC1 and DVC2.
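
As an illustration only, the data value compare conditions selected by the DVC1M and DVC2M fields (Section 8.5.1.3) can be modelled in C as below. This is a sketch under stated assumptions, not a definition of the architecture: the names dvc_mode and dvc_condition_met are hypothetical, the most significant byte-enable bit is assumed to correspond to the most significant byte of the aligned 64-bit data value, and for mode 0b11 a halfword with no enabled bytes is assumed not to count as a match.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the four DVCnM encodings of Section 8.5.1.3. */
enum dvc_mode {
    DVC_OFF = 0,      /* 00: no data value condition                        */
    DVC_ALL = 1,      /* 01: all enabled bytes must match                   */
    DVC_ANY = 2,      /* 10: at least one enabled byte must match           */
    DVC_HALFWORD = 3  /* 11: all enabled bytes of some halfword must match  */
};

/* Returns a mask with one bit per byte (0x80 = most significant byte,
 * an assumption) set when that byte is enabled in be and is equal in
 * the data value and the DVC register. */
static uint8_t matching_enabled_bytes(uint64_t data, uint64_t dvc, uint8_t be)
{
    uint8_t hit = 0;
    for (int i = 0; i < 8; i++) {
        int shift = 56 - 8 * i;
        uint8_t bit = (uint8_t)(0x80u >> i);
        bool equal = ((data >> shift) & 0xFFu) == ((dvc >> shift) & 0xFFu);
        if ((be & bit) != 0 && equal)
            hit |= bit;
    }
    return hit;
}

/* Models whether the data value condition permits a DACn debug event. */
bool dvc_condition_met(enum dvc_mode mode, uint64_t data, uint64_t dvc,
                       uint8_t be)
{
    uint8_t hit = matching_enabled_bytes(data, dvc, be);

    switch (mode) {
    case DVC_OFF:
        return true;
    case DVC_ALL:
        return hit == be;
    case DVC_ANY:
        return hit != 0;
    case DVC_HALFWORD:
        for (int h = 0; h < 4; h++) {
            uint8_t pair = (uint8_t)(0xC0u >> (2 * h)); /* two bytes per halfword */
            uint8_t enabled = be & pair;
            /* Assumption: a halfword with no enabled bytes does not match. */
            if (enabled != 0 && (hit & pair) == enabled)
                return true;
        }
        return false;
    }
    return false;
}
```

With a data value of 0x1122334455667788, a compare value of 0x11220000000000FF and byte enables 0xC1, the "all bytes" mode fails because the least significant byte differs, while the "any byte" and halfword modes succeed on the two most significant bytes.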
DBCR2[DVC1M] and DBCR2[DVC1BE] control how the contents of the DVC1 are compared with the value, and DBCR2[DVC2M] and DBCR2[DVC2BE] control how the contents of the DVC2 are compared with the value (see Section 8.4.2 and Section 8.5.1.3).

The contents of the Data Value Compare i Register (where i={1 or 2}) can be read into register RT using mfspr RT,DVCi. The contents of register RS can be written to the Data Value Compare i Register using mtspr DVCi,RS.

8.6 Debugger Notify Halt Instruction [Category: Embedded.Enhanced Debug]

The dnh instruction provides the means for the transfer of information between the processor and an implementation-dependent external debug facility. dnh also causes the processor to stop fetching and executing instructions.

Debugger Notify Halt    XFX-form

dnh     DUI,DUIS

    | 19    | DUI   | DUIS   | 198    | / |
    0       6       11       21       31

    if enabled by implementation-dependent means then
        implementation-dependent register ← DUI
        halt processor
    else
        illegal instruction exception

Execution of the dnh instruction causes the processor to stop fetching instructions and taking interrupts if execution of the instruction has been enabled. The contents of the DUI field are sent to the external debug facility to identify the reason for the halt.

If execution of the dnh instruction has not been previously enabled, executing the dnh instruction produces an Illegal Instruction exception. The means by which execution of the dnh instruction is enabled is implementation-dependent.

The current state of the processor debug facility, whether the processor is in IDM or EDM mode, has no effect on the execution of the dnh instruction.

The instruction is context synchronizing.

Programming Note
The DUIS field in the instruction may be used to pass information to an external debug facility.
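
For software that wants to inspect a dnh instruction word, the field boundaries in the instruction format above (primary opcode 19 in bits 0:5, DUI in bits 6:10, DUIS in bits 11:20, extended opcode 198 in bits 21:30, bit 31 reserved) can be sketched in C as follows. The names dnh_fields, encode_dnh and decode_dnh are hypothetical helpers introduced for illustration; bit 0 is taken to be the most significant bit of the 32-bit instruction word, as in the rest of this document.

```c
#include <stdbool.h>
#include <stdint.h>

#define DNH_PRIMARY_OPCODE  19u   /* bits 0:5   */
#define DNH_EXTENDED_OPCODE 198u  /* bits 21:30 */

/* Hypothetical container for the two operand fields of dnh. */
struct dnh_fields {
    uint8_t  dui;   /* 5-bit field, bits 6:10   */
    uint16_t duis;  /* 10-bit field, bits 11:20 */
};

/* Builds a dnh instruction word; the reserved bit 31 is left as 0. */
uint32_t encode_dnh(uint8_t dui, uint16_t duis)
{
    return (DNH_PRIMARY_OPCODE << 26)
         | ((uint32_t)(dui & 0x1Fu) << 21)
         | ((uint32_t)(duis & 0x3FFu) << 11)
         | (DNH_EXTENDED_OPCODE << 1);
}

/* Extracts DUI and DUIS; returns false if the word is not a dnh. */
bool decode_dnh(uint32_t insn, struct dnh_fields *out)
{
    uint32_t primary  = insn >> 26;
    uint32_t extended = (insn >> 1) & 0x3FFu;

    if (primary != DNH_PRIMARY_OPCODE || extended != DNH_EXTENDED_OPCODE)
        return false;

    out->dui  = (uint8_t)((insn >> 21) & 0x1Fu);
    out->duis = (uint16_t)((insn >> 11) & 0x3FFu);
    return true;
}
```

A handler that has obtained the address of the halting instruction could load the word at that address and pass it to decode_dnh to recover the DUIS value.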
After the dnh instruction has executed, the instruction itself can be read back by the Illegal Instruction Interrupt handler or the external debug facility if the contents of the DUIS field are of interest. If the processor entered the Illegal Instruction Interrupt handler, software can use SRR0 to obtain the address of the dnh instruction which caused the handler to be invoked.

Special Registers Altered:
    None

Chapter 9. Processor Control [Category: Embedded.Processor Control]

9.1 Overview
9.2 Programming Model
9.2.1 Processor Message Handling and Filtering
9.2.1.1 Doorbell Message Filtering
9.2.1.2 Doorbell Critical Message Filtering
9.3 Processor Control Instructions

9.1 Overview

The Processor Control facility provides a mechanism for processors within a coherence domain to send messages to all devices in the coherence domain. The facility provides a mechanism for sending interrupts that are not dependent on the interrupt controller to processors and allows message filtering by the processors that receive the message.

The Processor Control facility is also useful for sending messages to a device that provides specialized services such as secure boot operations controlled by a security device.

The Processor Control facility defines how processors send messages and what actions processors take on the receipt of a message. The actions taken by devices other than processors are not defined.

9.2 Programming Model

Processors initiate a message by executing the msgsnd instruction and specifying a message type and message payload in a general purpose register. Sending a message causes the message to be sent to all the devices, including the sending processor, in the coherence domain in a reliable manner.

Each device receives all messages that are sent. The actions that a device takes are dependent on the message type and payload. There are no restrictions on what messages a processor can send.

To provide inter-processor interrupt capability, two message types are defined: Processor Doorbell and Processor Doorbell Critical. A Processor Doorbell [Critical] message causes an interrupt to occur on processors when the message is received and the processor determines through examination of the payload that the message should be accepted. The examination of the payload for this purpose is termed filtering. The acceptance of a Processor Doorbell [Critical] message causes an exception to be generated on the accepting processor.

Processors accept and filter messages defined in Section 9.2.1. Processors may also accept other implementation-dependent defined messages.

9.2.1 Processor Message Handling and Filtering

Processors filter, accept, and handle message types defined as follows. The message type is specified in the message and is determined by the contents of register RB[32:36] used as the operand in the msgsnd instruction. The message type is interpreted as follows:

Value   Description

0       Doorbell Interrupt (DBELL)
        A Processor Doorbell exception is generated on the processor when the processor has filtered the message based on the payload and has determined that it should accept the message. A Processor Doorbell Interrupt occurs when no higher priority exception exists, a Processor Doorbell exception exists, and MSR[EE]=1.

1       Doorbell Critical Interrupt (DBELL_CRIT)
        A Processor Doorbell Critical exception is generated on the processor when the processor has filtered the message based on the payload and has determined that it should accept the message. A Processor Doorbell Critical Interrupt occurs when no higher priority exception exists, a Processor Doorbell Critical exception exists, and MSR[CE]=1.

Message types other than these and their associated actions are implementation-dependent.

9.2.1.1 Doorbell Message Filtering

A processor receiving a DBELL message type will filter the message and either ignore the message or accept the message and generate a Processor Doorbell exception based on the payload and the state of the processor at the time the message is received.

The payload is specified in the message and is determined by the contents of register RB[37:63] used as the operand in the msgsnd instruction. The payload bits are defined below.

Bit     Description

37      Broadcast (BRDCAST)
        The message is accepted by all processors regardless of the value of the PIR register and the value of PIRTAG.
        0  If the values of PIR and PIRTAG are equal, a Processor Doorbell exception is generated.
        1  A Processor Doorbell exception is generated regardless of the value of PIRTAG and PIR.

38:41   Reserved

50:63   PIR Tag (PIRTAG)
        The contents of this field are compared with bits 50:63 of the PIR register.

If a DBELL message is received by a processor and either payload[BRDCAST]=1 or PIR[50:63]=payload[PIRTAG], then a Processor Doorbell exception is generated. The exception condition remains until a Processor Doorbell Interrupt is taken, or a msgclr instruction is executed on the receiving processor with a message type of DBELL. A change to any of the filtering criteria (i.e. changing the PIR register) will not clear a pending Processor Doorbell exception.

DBELL messages are not cumulative. That is, if a DBELL message is accepted and the interrupt is pended because MSR[EE]=0, further DBELL messages that would be accepted are ignored until the Processor Doorbell exception is cleared by taking the interrupt or cleared by executing a msgclr with a message type of DBELL on the receiving processor.

The temporal relationship between when a DBELL message is sent and when it is received in a given processor is not defined.

9.2.1.2 Doorbell Critical Message Filtering

A processor receiving a DBELL_CRIT message type will filter the message and either ignore the message or accept the message and generate a Processor Doorbell Critical exception based on the payload and the state of the processor at the time the message is received.

The payload is specified in the message and is determined by the contents of register RB[37:63] used as the operand in the msgsnd instruction. The payload bits are defined below.

Bit     Description

37      Broadcast (BRDCAST)
        The message is accepted by all processors regardless of the value of the PIR register and the value of PIRTAG.
        0  If the values of PIR and PIRTAG are equal, a Processor Doorbell Critical exception is generated.
        1  A Processor Doorbell Critical exception is generated regardless of the value of PIRTAG and PIR.

38:41   Reserved

50:63   PIR Tag (PIRTAG)
        The contents of this field are compared with bits 50:63 of the PIR register.

If a DBELL_CRIT message is received by a processor and either payload[BRDCAST]=1 or PIR[50:63]=payload[PIRTAG], then a Processor Doorbell Critical exception is generated. The exception condition remains until a Processor Doorbell Critical Interrupt is taken, or a msgclr instruction is executed on the receiving processor with a message type of DBELL_CRIT. A change to any of the filtering criteria (i.e. changing the PIR register) will not clear a pending Processor Doorbell Critical exception.

DBELL_CRIT messages are not cumulative. That is, if a DBELL_CRIT message is accepted and the interrupt is pended because MSR[CE]=0, further DBELL_CRIT messages that would be accepted are ignored until the Processor Doorbell Critical exception is cleared by taking the interrupt or cleared by executing a msgclr with a message type of DBELL_CRIT on the receiving processor.

The temporal relationship between when a DBELL_CRIT message is sent and when it is received in a given processor is not defined.

9.3 Processor Control Instructions

msgsnd and msgclr instructions are provided for sending and clearing messages to processors and other devices in the coherence domain. These instructions are privileged.

In the instruction descriptions the statement "this instruction is treated as a Store" means that the instruction is treated as a Store with respect to the storage access ordering mechanism caused by memory barriers in Section 1.7.1 of Book II.

Message Send    X-form

msgsnd  RB

    | 31    | ///   | ///    | RB     | 206    | / |
    0       6       11       16       21       31

    msgtype ← GPR(RB)[32:36]
    payload ← GPR(RB)[37:63]
    send_msg_to_coherence_domain(msgtype, payload)

msgsnd sends a message to all devices in the coherence domain. The message contains a type and a payload. The message type (msgtype) is defined by the contents of RB[32:36] and the message payload is defined by the contents of RB[37:63]. Message delivery is reliable and guaranteed. Each device may perform specific actions based on the message type and payload or may ignore messages. Consult the implementation user's manual for specific actions taken based on message type and payload.

For processors, actions taken on receipt of a message are defined in Section 9.2.1.

For storage access ordering, msgsnd is treated as a Store with respect to memory barriers.

This instruction is privileged.

Special Registers Altered:
    None

Message Clear    X-form

msgclr  RB

    | 31    | ///   | ///    | RB     | 238    | / |
    0       6       11       16       21       31

    msgtype ← GPR(RB)[32:36]
    clear_received_message(msgtype)

msgclr clears a message of msgtype previously accepted by the processor executing the msgclr. msgtype is defined by the contents of RB[32:36]. A message is said to be cleared when a pending exception generated by an accepted message has not yet taken its associated interrupt.

If a pending exception exists for msgtype, that exception is cleared at the completion of the msgclr instruction.

For processors, the types of messages that can be cleared are defined in Section 9.2.1.

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
Execution of a msgclr instruction that clears a pending exception when the associated interrupt is masked because the interrupt enable (MSR[EE] or MSR[CE]) is not set to 1 will always clear the pending exception (and thus the interrupt will not occur) if a subsequent instruction causes MSR[EE] or MSR[CE] to be set to 1.

Chapter 10. Synchronization Requirements for Context Alterations

Changing the contents of certain System Registers, the contents of TLB entries, or the contents of other system resources that control the context in which a program executes can have the side effect of altering the context in which data addresses and instruction addresses are interpreted, and in which instructions are executed and data accesses are performed. For example, changing certain bits in the MSR has the side effect of changing how instruction addresses are calculated.

If a sequence of instructions contains context-altering instructions and contains no instructions that are affected by any of the context alterations, no software synchronization is required within the sequence.

Programming Note
Sometimes advantage can be taken of the fact that certain events, such as interrupts, and certain
instructions that occur naturally in the program, These side effects need not occur in program order, such as an rfi, rfci, rfmci, or rfdi [Cate- and therefore may require explicit synchronization by gory:Embeddd.Enhanced Debug] that returns from software. (Program order is defined in Book II.) an interrupt handler, provide the required synchro- An instruction that alters the context in which data nization. addresses or instruction addresses are interpreted, or in which instructions are executed or data accesses are No software synchronization is required before or after performed, is called a context-altering instruction. This a context-altering instruction that is also context syn- chapter covers all the context-altering instructions. The chronizing (e.g., rfi, etc.) or when altering the MSR in software synchronization required for them is shown in most cases (see the tables). No software synchroniza- Table 5 (for data access) and Table 4 (for instruction tion is required before most of the other alterations fetch and execution). shown in Table 4, because all instructions preceding the context-altering instruction are fetched and The notation "CSI" in the tables means any context decoded before the context-altering instruction is exe- synchronizing instruction (e.g., sc, isync, rfi, rfci, cuted (the processor must determine whether any of rfmci, or rfdi [Category: Embedded. Enhanced these preceding instructions are context synchroniz- Debug]). A context synchronizing interrupt (i.e., any ing). interrupt except non-recoverable System Reset or non- recoverable Machine Check) can be used instead of a Unless otherwise stated, the material in this chapter context synchronizing instruction. If it is, phrases like assumes a uniprocessor environment. "the synchronizing instruction", below, should be inter- preted as meaning the instruction at which the interrupt occurs. 
If no software synchronization is required before (after) a context-altering instruction, "the syn- chronizing instruction before (after) the context-altering instruction" should be interpreted as meaning the con- text-altering instruction itself. The synchronizing instruction before the context-alter- ing instruction ensures that all instructions up to and including that synchronizing instruction are fetched and executed in the context that existed before the alter- ation. The synchronizing instruction after the context- altering instruction ensures that all instructions after that synchronizing instruction are fetched and executed in the context established by the alteration. Instruc- tions after the first synchronizing instruction, up to and including the second synchronizing instruction, may be fetched or executed in either context. Chapter 10. Synchronization Requirements for Context Alterations 723 Version 2.05 Instruction or Required Required Notes Instruction or Required Required Notes Event Before After Event Before After interrupt none none interrupt none none rfi none none rfi none none rfci none none rfci none none rfmci none none rfmci none none rfdi[Category:E.ED] none none rfdi[Category:E.ED] none none sc none none sc none none mtmsr (CM) none none mtmsr (CM) none CSI mtmsr (ICM) none CSI mtmsr (ICM) none none mtmsr (UCLE) none none mtmsr (PR) none CSI mtmsr (SPV) none none mtmsr (ME) none CSI 3 mtmsr (WE) -- -- 4 mtmsr (DS) none CSI mtmsr (CE) none none 5 mtspr (PID) CSI CSI mtmsr (EE) none none 5 mtspr (DBSR) -- -- 6 mtmsr (PR) none CSI mtspr --- --- 6 mtmsr (FP) none CSI (DBCR0,DBCR2) mtmsr (DE) none CSI mtspr -- -- 6 mtmsr (ME) none CSI 3 (DAC1,DAC2, mtmsr (FE0) none CSI DVC1,DVC2) mtmsr (FE1) none CSI tlbivax CSI CSI, or CSI 1,7 and sync mtmsr (IS) none CSI 2 tlbwe CSI CSI, or CSI 1,7 mtspr (DEC) none none 8 and sync mtspr (PID) none CSI 2 mtspr (IVPR) none none Table 5: Synchronization requirements for data access mtspr (DBSR) -- -- 6 mtspr -- 
-- 6 Notes: (DBCR0,DBCR1) 1. There are additional software synchronization mtspr -- -- 6 requirements for this instruction in multiprocessor (IAC1,IAC2,IAC3, environments (e.g., it may be necessary to invali- IAC4) date one or more TLB entries on all processors in mtspr (IVORi) none none the multiprocessor system and to be able to deter- mtspr (TSR) none none 8 mine that the invalidations have completed and mtspr (TCR) none none 8 that all side effects of the invalidations have taken tlbivax none CSI, or 1,7 effect); it is also necessary to execute a tlbsync CSI and sync instruction. tlbwe none CSI, or 1,7 2. The alteration must not cause an implicit branch in CSI and sync real address space. Thus the real address of the wrtee none none 5 context-altering instruction and of each subse- wrteei none none 5 quent instruction, up to and including the next con- text synchronizing instruction, must be Table 4: Synchronization requirements for instruction independent of whether the alteration has taken fetch and/or execution effect. 3. A context synchronizing instruction is required after altering MSRME to ensure that the alteration takes effect for subsequent Machine Check inter- rupts, which may not be recoverable and therefore may not be context synchronizing. 4. Synchronization requirements for changing the Wait State Enable are implementation-dependent,. 5. The effect of changing MSREE or MSRCE is imme- diate. 724 Power ISATM III-E Version 2.05 If an mtmsr, wrtee, or wrteei instruction sets Programming Note MSREE to `0', an External Input, DEC or FIT inter- rupt does not occur after the instruction is exe- The following sequence illustrates why it is cuted. 
necessary, for data accesses, to ensure that all storage accesses due to instructions before If an mtmsr, wrtee, or wrteei instruction changes the tlbwe or tlbivax have completed to a point MSREE from `0' to `1' when an External Input, Dec- at which they have reported all exceptions they rementer, Fixed-Interval Timer, or higher priority will cause. Assume that valid TLB entries exist enabled exception exists, the corresponding inter- for the target storage location when the rupt occurs immediately after the mtmsr, wrtee, or sequence starts. wrteei is executed, and before the next instruction 1 A program issues a load or store to a is executed in the program that set MSREE to `1'. page. If an mtmsr instruction sets MSRCE to `0', a Criti- 1 The same program executes a tlbwe or cal Input or Watchdog Timer interrupt does not tlbivax that invalidates the corresponding occur after the instruction is executed. TLB entry. 1 The Load or Store instruction finally exe- If an mtmsr instruction changes MSRCE from `0' to cutes, and gets a TLB Miss exception. `1' when a Critical Input, Watchdog Timer or higher 1 The TLB Miss exception is semantically priority enabled exception exists, the correspond- incorrect. In order to prevent it, a context ing interrupt occurs immediately after the mtmsr is synchronizing instruction must be exe- executed, and before the next instruction is exe- cuted between steps 1 and 2. cuted in the program that set MSRCE to `1'. 6. Synchronization requirements for changing any of 8. The elapsed time between the Decrementer the Debug Facility Registers are implementation- reaching zero, or the transition of the selected dependent. Time Base bit for the Fixed-Interval Timer or the Watchdog Timer, and the signalling of the Decre- 7. For data accesses, the context synchronizing menter, Fixed-Interval Timer or the Watchdog instruction before the tlbwe or tlbivax instruction Timer exception is not defined. 
ensures that all storage accesses due to preceding instructions have completed to a point at which they have reported all exceptions they will cause. The context synchronizing instruction after the tlbwe or tlbivax ensures that subsequent storage accesses (data and instruction) will use the updated value in the TLB entry(s) being affected. It does not ensure that all storage accesses previ- ously translated by the TLB entry(s) being updated have completed with respect to storage; if these completions must be ensured, the tlbwe or tlbivax must be followed by an sync instruction as well as by a context synchronizing instruction. Chapter 10. Synchronization Requirements for Context Alterations 725 Version 2.05 726 Power ISATM III-E Version 2.05 Appendix A. Implementation-Dependent Instructions This appendix documents architectural resources that tions may exercise reasonable flexibility in implement- are allocated for specific implementation-sensitive ing these functions, but that flexibility should be limited functions which have scope-limited utility. Implementa- to that allowed in this appendix. A.1 Embedded Cache Initialization [Category: Embedded.Cache Ini- tialization] Data Cache Invalidate X-form Instruction Cache Invalidate X-form dci CT ici CT 31 / CT /// /// 454 / 31 / CT /// /// 966 / 0 6 7 11 16 21 31 0 6 7 11 16 21 31 If CT is not supported by the implementation, this If CT is not supported by the implementation, this instruction designates the primary data cache as the instruction designates the primary instruction cache as target data cache. the target instruction cache. If CT is supported by the implementation, let CT desig- If CT is supported by the implementation, let CT desig- nate either the primary data cache or another level of nate either the primary instruction cache or another the data cache hierarchy, as specified in Book II Sec- level of the instruction cache hierarchy, as specified in tion 3.2, as the target data cache. 
Book II Section 3.2, as the target instruction cache. The contents of the target data cache of the processor The contents of the target instruction cache of the pro- executing the dci instruction are invalidated. cessor executing the ici instruction are invalidated. Software must place a sync instruction before the dci Software must place a sync instruction before the ici to to guarantee all previous data storage accesses com- guarantee all previous instruction storage accesses plete before the dci is performed. complete before the ici is performed. Software must place a sync instruction after the dci to Software must place an isync instruction after the ici to guarantee that the dci completes before any subse- invalidate any instructions that may have already been quent data storage accesses are performed. fetched from the previous contents of the instruction cache after the isync. This instruction is privileged. This instruction is privileged. Special Registers Altered: None Special Registers Altered: None Extended Mnemonics: Extended Mnemonics: Extended mnemonic for Data Cache Invalidate Extended mnemonic for Instruction Cache Invalidate Extended: Equivalent to: dccci dci 0 Extended: Equivalent to: iccci ici 0 Appendix A. Implementation-Dependent Instructions 727 Version 2.05 A.2 Embedded Cache Debug Facility [Category: Embedded.Cache Debug] A.2.1 Embedded Cache Debug Registers A.2.1.1 Data Cache Debug Tag Regis- A.2.1.2 Data Cache Debug Tag Regis- ter High ter Low The Data Cache Debug Tag Register High (DCDB- The Data Cache Debug Tag Register Low (DCDBTRL) TRH) is a 32-bit Special Purpose Register. The Data is a 32-bit Special Purpose Register. The Data Cache Cache Debug Tag Register High is read using mfspr Debug Tag Register Low is read using mfspr and is set and is set by dcread. by dcread. DCDBTRH DCDBTRL 32 63 32 63 Figure 25. Data Cache Debug Tag Register High Figure 26. 
Data Cache Debug Tag Register Low Programming Note Programming Note An example implementation of DCDBTRH could An example implementation of DCDBTRL could have the following content and format. have the following content and format. Bit(s) Description Bit(s) Description 32:55 Tag Real Address (TRA) 32:44 Reserved (TRA) Bits 0:23 of the lower 32 bits of the 36-bit 45 U bit parity (UPAR) real address associated with this cache block 46:47 Tag parity (TPAR) 56 Valid (V) 48:51 Data parity (DPAR) The valid indicator for the cache block (1 52:55 Modified (dirty) parity (MPAR) indicates valid) 56:59 Dirty Indicators (D) 57:59 Reserved The "dirty" (modified) indicators for each 60:63 Tag Extended Real Address (TERA) of the four doublewords in the cache block Upper 4 bits of the 36-bit real address 60 U0 Storage Attribute (U0) associated with this cache block The U0 storage attribute for the page Implementations may support different content and associated with this cache block format based on their cache implementation. 61 U1 Storage Attribute (U1) The U1 storage attribute for the page associated with this cache block 62 U2 Storage Attribute (U2) The U2 storage attribute for the page associated with this cache block 63 U3 Storage Attribute (U3) The U3 storage attribute for the page associated with this cache block Implementations may support different content and format based on their cache implementation. 728 Power ISATM III-E Version 2.05 A.2.1.3 Instruction Cache Debug Data A.2.1.5 Instruction Cache Debug Tag Register Register Low The Instruction Cache Debug Data Register (ICDBDR) The Instruction Cache Debug Tag Register Low (ICDB- is a read-only 32-bit Special Purpose Register. The TRL) is a 32-bit Special Purpose Register. The Instruc- Instruction Cache Debug Data Register can be read tion Cache Debug Tag Register Low is read using using mfspr and is set by icread. mfspr and is set by icread. ICDBDR ICDBTRL 32 63 32 63 Figure 27. 
Instruction Cache Debug Data Register Figure 29. Instruction Cache Debug Tag Register Low A.2.1.4 Instruction Cache Debug Tag Programming Note Register High An example implementation of ICDBTRL could The Instruction Cache Debug Tag Register High have the following content and format. (ICDBTRH) is a 32-bit Special Purpose Register. The Instruction Cache Debug Tag Register High is read Bit(s) Description using mfspr and is set by icread. 32:53 Reserved ICDBTRH 54 Translation Space (TS) 32 63 The address space portion of the virtual address associated with this cache block. Figure 28. Instruction Cache Debug Tag Register High 55 Translation ID Disable (TD) TID Disable field for the memory page Programming Note associated with this cache block An example implementation of ICDBTRH could 56:63 Translation ID (TID) have the following content and format. TID field portion of the virtual address associated with this cache block Bit(s) Description Other implementations may support different con- 32:55 Tag Effective Address (TEA) tent and format based on their cache implementa- Bits 0:23 of the 32-bit effective address tion. associated with this cache block 56 Valid (V) The valid indicator for the cache block (1 indicates valid) 57:58 Tag parity (TPAR) 59 Instruction Data parity (DPAR) 60:63 Reserved Implementations may support different content and format based on their cache implementation. Appendix A. 
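As a rough illustration of how a debug tool might unpack the example ICDBTRH layout given in the Programming Note above, the following Python sketch extracts fields using the register bit numbering of this appendix, in which bit 32 is the most significant bit of the 32-bit register and bit 63 the least significant. The helper names `field` and `decode_icdbtrh_example` are hypothetical and not part of the architecture, and the layout is only the example format; an actual implementation may define different content.

```python
def field(value, first, last):
    """Extract bits first:last from a 32-bit register value.

    Uses the register bit numbering of this appendix: bit 32 is the
    most significant bit and bit 63 the least significant bit.
    """
    width = last - first + 1
    shift = 63 - last  # distance of the field's last bit from the LSB
    return (value >> shift) & ((1 << width) - 1)


def decode_icdbtrh_example(value):
    """Decode the *example* ICDBTRH layout (hypothetical helper).

    Real implementations may use a different format.
    """
    return {
        "TEA": field(value, 32, 55),   # Tag Effective Address
        "V": field(value, 56, 56),     # Valid indicator
        "TPAR": field(value, 57, 58),  # Tag parity
        "DPAR": field(value, 59, 59),  # Instruction data parity
    }


# Made-up register value: TEA=0xABCDEF, V=1, TPAR=0b10, DPAR=1
sample = (0xABCDEF << 8) | (1 << 7) | (0b10 << 5) | (1 << 4)
decoded = decode_icdbtrh_example(sample)
print(decoded)
```

The same `field` helper applies to the other example layouts (DCDBTRH, DCDBTRL, ICDBTRL) by substituting their bit ranges.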
A.2.2 Embedded Cache Debug Instructions

Data Cache Read X-form

    dcread RT,RA,RB

    Field       31    RT    RA    RB    486    /
    Start bit   0     6     11    16    21     31

    [Alternative Encoding]

    Field       31    RT    RA    RB    326    /
    Start bit   0     6     11    16    21     31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    C ← log2(cache size)
    B ← log2(cache block size)
    IDX ← EA[64-C:63-B]
    WD ← EA[64-B:61]
    RT[0:31] ← undefined
    RT[32:63] ← (data cache data)[IDX][WD×32:WD×32+31]
    DCDBTRH ← (data cache tag high)[IDX]
    DCDBTRL ← (data cache tag low)[IDX]

Let the effective address (EA) be the sum of the contents of register RA, or 0 if RA is equal to 0, and the contents of register RB.

Let C = log2(cache size in bytes).

Let B = log2(cache block size in bytes).

EA[64-C:63-B] selects one of the 2^(C-B) data cache blocks.

EA[64-B:61] selects one of the data words in the selected data cache block.

The selected word in the selected data cache block is placed into register RT.

The contents of the data cache directory entry associated with the selected data cache block are placed into DCDBTRH and DCDBTRL (see Figure 25 and Figure 26).

dcread requires software to guarantee execution synchronization before subsequent mfspr instructions can read the results of the dcread instruction into GPRs. In order to guarantee that the mfspr instructions obtain the results of the dcread instruction, a sequence such as the following must be used:

    msync                   # ensure that all previous cache operations
                            # have completed
    dcread regT,regA,regB   # read cache information
    isync                   # ensure dcread completes before attempting
                            # to read results
    mfspr  regD,dcdbtrh     # move high portion of tag into GPR D
    mfspr  regE,dcdbtrl     # move low portion of tag into GPR E

This instruction is privileged.

Special Registers Altered:
    DCDBTRH DCDBTRL

Programming Note
dcread can be used by a debug tool to determine the contents of the data cache, without knowing the specific addresses of the blocks which are currently contained within the cache.

Programming Note
The result of executing dcread before the data cache has completed all cache operations associated with previously executed instructions (such as block fills and block flushes) is undefined.

Instruction Cache Read X-form

    icread RA,RB

    Field       31    ///    RA    RB    998    /
    Start bit   0     6      11    16    21     31

    if RA = 0 then b ← 0
    else           b ← (RA)
    EA ← b + (RB)
    C ← log2(cache size)
    B ← log2(cache block size)
    IDX ← EA[64-C:63-B]
    WD ← EA[64-B:61]
    ICDBDR ← (instruction cache data)[IDX][WD×32:WD×32+31]
    ICDBTRH ← (instruction cache tag high)[IDX]
    ICDBTRL ← (instruction cache tag low)[IDX]

Let the effective address (EA) be the sum of the contents of register RA, or 0 if RA is equal to 0, and the contents of register RB.

Let C = log2(cache size in bytes).

Let B = log2(cache block size in bytes).

EA[64-C:63-B] selects one of the 2^(C-B) instruction cache blocks.

EA[64-B:61] selects one of the data words in the selected instruction cache block.

The selected word in the selected instruction cache block is placed into ICDBDR.

The contents of the instruction cache directory entry associated with the selected cache block are placed into ICDBTRH and ICDBTRL (see Figure 28 and Figure 29).

Programming Note
icread can be used by a debug tool to determine the contents of the instruction cache, without knowing the specific addresses of the blocks which are currently contained within the cache.

icread requires software to guarantee execution synchronization before subsequent mfspr instructions can read the results of the icread instruction into GPRs.
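The block and word selection that dcread and icread perform on the effective address can be sketched in Python as follows. This is only a model of the IDX and WD computations in the instruction descriptions, assuming a 64-bit EA with bit 0 as the most significant bit and cache and block sizes that are powers of two; the function name `cache_read_select` and the sizes in the usage example are hypothetical.

```python
def cache_read_select(ea, cache_size, block_size):
    """Return (IDX, WD) for a 64-bit effective address.

    IDX = EA[64-C:63-B] and WD = EA[64-B:61], where bit 0 is the most
    significant bit of the EA. Sizes are in bytes and are assumed to be
    powers of two.
    """
    c = cache_size.bit_length() - 1  # C = log2(cache size)
    b = block_size.bit_length() - 1  # B = log2(cache block size)
    # EA bit 63-B sits B positions above the LSB; the field is C-B bits wide.
    idx = (ea >> b) & ((1 << (c - b)) - 1)
    # EA bit 61 sits 2 positions above the LSB; the field is B-2 bits wide.
    wd = (ea >> 2) & ((1 << (b - 2)) - 1)
    return idx, wd


# Hypothetical 32 KiB cache with 32-byte blocks
idx, wd = cache_read_select(0x1234, cache_size=32 * 1024, block_size=32)
print(idx, wd)  # prints "145 5"
```

A debug tool could step the EA through every block and word in this way to dump the whole cache without knowing which addresses are resident.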
In order to guarantee that the mfspr instructions obtain the results of the icread instruction, a sequence such as the following must be used:

    icread    regA,regB   # read cache information
    isync                 # ensure icread completes before attempting
                          # to read results
    mficdbdr  regC        # move instruction information into GPR C
    mficdbtrh regD        # move high portion of tag into GPR D
    mficdbtrl regE        # move low portion of tag into GPR E

This instruction is privileged.

Special Registers Altered:
    ICDBDR ICDBTRH ICDBTRL

Appendix B. Assembler Extended Mnemonics

In order to make assembler language programs simpler to write and easier to understand, a set of extended mnemonics and symbols is provided for certain instructions. This appendix defines extended mnemonics and symbols related to instructions defined in Book III.

Assemblers should provide the extended mnemonics and symbols listed here, and may provide others.

B.1 Move To/From Special Purpose Register Mnemonics

This section defines extended mnemonics for the mtspr and mfspr instructions, including the Special Purpose Registers (SPRs) defined in Book I and certain privileged SPRs, and for the Move From Time Base instruction defined in Book II.

The mtspr and mfspr instructions specify an SPR as a numeric operand; extended mnemonics are provided that represent the SPR in the mnemonic rather than requiring it to be coded as an operand. Similar extended mnemonics are provided for the Move From Time Base instruction, which specifies the portion of the Time Base as a numeric operand.

Note: mftb serves as both a basic and an extended mnemonic. The Assembler will recognize an mftb mnemonic with two operands as the basic form, and an mftb mnemonic with one operand as the extended form. In the extended form the TBR operand is omitted and assumed to be 268 (the value that corresponds to TB).

Table 6: Extended mnemonics for moving to/from an SPR

                                        Move To SPR                       Move From SPR
    Special Purpose Register            Extended       Equivalent to      Extended       Equivalent to
    Fixed-Point Exception Register      mtxer Rx       mtspr 1,Rx         mfxer Rx       mfspr Rx,1
    Link Register                       mtlr Rx        mtspr 8,Rx         mflr Rx        mfspr Rx,8
    Count Register                      mtctr Rx       mtspr 9,Rx         mfctr Rx       mfspr Rx,9
    Decrementer                         mtdec Rx       mtspr 22,Rx        mfdec Rx       mfspr Rx,22
    Save/Restore Register 0             mtsrr0 Rx      mtspr 26,Rx        mfsrr0 Rx      mfspr Rx,26
    Save/Restore Register 1             mtsrr1 Rx      mtspr 27,Rx        mfsrr1 Rx      mfspr Rx,27
    Special Purpose Registers G0        mtsprg n,Rx    mtspr 272+n,Rx     mfsprg Rx,n    mfspr Rx,272+n
      through G3
    Time Base [Lower]                   mttbl Rx       mtspr 284,Rx       mftb Rx        mfspr Rx,268
    Time Base Upper                     mttbu Rx       mtspr 285,Rx       mftbu Rx       mfspr Rx,269
    Processor Version Register          -              -                  mfpvr Rx       mfspr Rx,287

Appendix C. Guidelines for 64-bit Implementations in 32-bit Mode and 32-bit Implementations

C.1 Hardware Guidelines

C.1.1 64-bit Specific Instructions

The instructions in the Category: 64-Bit are considered restricted only to 64-bit processing. A 32-bit implementation need not implement the group; likewise, the 32-bit applications will not utilize any of these instructions. All other instructions shall either be supported directly by the implementation, or sufficient infrastructure will be provided to enable software emulation of the instructions. A 64-bit implementation that is executing in 32-bit mode may choose to take an Unimplemented Instruction Exception when these 64-bit specific instructions are executed.

C.1.2 Registers on 32-bit Implementations

The Power ISA provides 32-bit and 64-bit registers. All 32-bit registers shall be supported as defined in the specification except the MSR. The MSR shall be supported as defined in the specification except that bits 32:33 (CM and ICM) are treated as reserved bits. Only bits 32:63 of the 64-bit registers are required to be implemented in hardware in a 32-bit implementation except for the 64-bit FPRs. Such 64-bit registers include the LR, the CTR, the XER, the 32 GPRs, SRR0 and CSRR0.

Likewise, other than floating-point instructions, all instructions which are defined to return a 64-bit result shall return only bits 32:63 of the result on a 32-bit implementation.

C.1.3 Addressing on 32-bit Implementations

Only bits 32:63 of the 64-bit instruction and data storage effective addresses need to be calculated and presented to main storage. Given that the only branch and data storage access instructions that are not included in Section C.1.1 are defined to prepend 32 0s to bits 32:63 of the effective address computation, a 32-bit implementation can simply bypass the prepending of the 32 0s when implementing these instructions. For Branch to Link Register and Branch to Count Register instructions, given the LR and CTR are implemented only as 32-bit registers, only concatenating 2 0s to the right of bits 32:61 of these registers is necessary to form the 32-bit branch target address.

For next sequential instruction address computation, the behavior is the same as for 64-bit implementations in 32-bit mode.

C.1.4 TLB Fields on 32-bit Implementations

32-bit implementations should support bits 32:53 of the Effective Page Number (EPN) field in the TLB. This size provides support for a 32-bit effective address, which Power ISA ABIs may have come to expect to be available. 32-bit implementations may support greater than 32-bit real addresses by supporting more than bits 32:53 of the Real Page Number (RPN) field in the TLB.

C.2 32-bit Software Guidelines

C.2.1 32-bit Instruction Selection

Any software that uses any of the instructions listed in Category: 64-Bit shall be considered 64-bit software, and correct execution cannot be guaranteed on 32-bit implementations. Generally speaking, 32-bit software should avoid using any instruction or instructions that depend on any particular setting of bits 0:31 of any 64-bit application-accessible system register, including General Purpose Registers, for producing the correct 32-bit results. Context switching may or may not preserve the upper 32 bits of application-accessible 64-bit system registers, and insertion of arbitrary settings of those upper 32 bits at arbitrary times during the execution of the 32-bit application must not affect the final result.

Appendix D. Type FSL Storage Control [Category: Embedded.MMU Type FSL]

D.1 Type FSL Storage Control Overview

The Embedded category provides two different memory management and TLB programming models from which an implementation may choose. Both models use the same definition of the general contents of a Translation Lookaside Buffer (TLB) entry, but differ on what methods and resources are used to manipulate the TLB itself. The programming model presented here is called Type FSL and it defines functions and structures that are visible to software. These are divided into the following areas:

  - The TLB itself. The TLB consists of one or more structures called TLB arrays, each of which may have differing characteristics.
  - The address translation mechanism.
  - Methods and effects of changing and manipulating TLB arrays.
  - Configuration information available to the operating system that describes the structure and form of the TLB arrays and translation mechanism.

The TLB structure and the methods of performing translations are called the Memory Management Unit (MMU).

The programming model for reading and writing TLBs is software managed. Hardware page table formats are not defined and software is free to choose any form in which to hold information about address translation. Address translation is accomplished through a set of TLB arrays, PID registers, and address space identifiers from the MSR, all of which are software managed. TLB entries are used to translate both instruction and data memory references, providing a unified memory management model.

D.2 Type FSL Storage Control Registers

D.2.1 Process ID Registers (PIDn)

Process ID Registers are used by system software to specify which TLB entries are used by the processor to accomplish address translation for loads, stores, and instruction fetches. Section 4.7.1.1 defines the PID register. The PID register is synonymous with PID0. In addition to PID0, 2 additional PID registers, PID1 and PID2, are defined. An implementation may choose to provide any number of PIDs up to a maximum of 3. The number of PIDs implemented is indicated by the value of MMUCFG[NPIDS] and the number of bits implemented in each PID register is indicated by the value of MMUCFG[PIDSIZE]. PID values are used to construct virtual addresses for accessing memory.

    | PIDn |
    32    63

Figure 30. Process ID Register (PID0-PID2)

    Bit     Description
    32:49   Reserved
    50:63   Process ID
            Identifies the process

Programming Note
The suggested software convention for PID usage is to use PID0 to denote private mappings for a process and to use other PIDs to handle mappings that may be common to multiple processes. This method allows for processes sharing address space to also share TLB entries if the shared address space is mapped at the same virtual address in each process.

D.2.2 Translation Lookaside Buffer

The MMU contains up to four TLB arrays. TLB arrays are on-chip storage areas for holding TLB entries. A TLB entry contains effective to real address mappings for loads, stores, and instruction fetches. A TLB array contains zero or more TLB entries. Each of the TLB entries has specific fields that can be accessed using the corresponding fields in the MMU Assist Registers (see Section D.2.4). Each TLB array that is implemented has a configuration register (TLBnCFG) associated with it describing the size and attributes of the TLB entries in that array (see Section D.2.5.2).

A TLB entry contains the fields described in Section 4.7.1.2 as well as these additional fields:

    Field   Description
    IPROT   Invalidation protection. This entry is protected from all TLB invalidation mechanisms except the explicit writing of a 0 to the V bit.
    ACM     The Alternate Coherency Mode (ACM) attribute allows an implementation to employ more than a single coherency method. This allows for a processor to participate in multiple coherency protocols. If the M attribute (Memory Coherence Required) is not set for a page (M=0), the page has no coherency associated with it and the ACM attribute is ignored. If the M attribute is set to 1 for a page (M=1), the ACM attribute is used to determine the coherence domain (or protocol) used. The values for ACM are implementation-dependent.

D.2.3 Address Space Identifiers

The address space identifier is called the AS bit. Thus there are two possible address spaces, 0 and 1. The value of the AS bit (see Section 4.7.2, Figure 8) is determined by the type of translation performed and from the contents of the MSR when an address is translated. If the type of translation performed is an instruction fetch, the value of the AS bit is taken from the contents of MSR[IS]. If the type of translation performed is a load, store, or other data translation, including target addresses of software initiated instruction fetch hints and locks, the value of the AS bit is taken from the contents of MSR[DS].

Programming Note
While system software is free to use address space bits as it sees fit, it should be noted that on interrupt, the MSR[IS] and MSR[DS] bits are set to 0. This encourages software to use address space 0 for system software and address space 1 for user software.

instructions. Execution of a tlbre instruction causes the TLB entry specified by MAS0[TLBSEL], MAS0[ESEL], and MAS2[EPN] to be copied to the MAS registers. Conversely, execution of a tlbwe instruction causes the TLB entry specified by MAS0[TLBSEL], MAS0[ESEL], and MAS2[EPN] to be written with contents of the MAS registers. MAS registers may also be updated by hardware on the occurrence of an Instruction or Data TLB Error interrupt or as the result of a tlbsx instruction.

All MAS registers are privileged. All MAS registers with the exception of MAS7 must be implemented. MAS7 is not required to be implemented if the processor supports 32 bits or less of real address.

Processors are only required to implement the necessary bits of any multi-bit field in a MAS register such that only the resources supplied by the processor are represented. Any non-implemented bits in a field should have no effect when writing and should always read as zero. For example, a processor that implements only 2 TLB arrays will likely only implement the lower-order bit of the MAS0[TLBSEL] field.

D.2.4.1 MAS0 Register

The MAS0 register contains fields for identifying and selecting a TLB entry.

    | MAS0 |
    32    63

Figure 31. MAS0 register

These bits are interpreted as follows:

    Bit     Description
    32:33   Reserved
    34:35   TLB Select (TLBSEL)
            Selects TLB for access.
            00  TLB0
            01  TLB1
            10  TLB2
            11  TLB3
    36:47   Entry Select (ESEL)
            Identifies an entry in the selected array to be used for tlbwe and tlbre. Valid values for ESEL are from 0 to TLBnCFG[ASSOC] - 1. That is, ESEL selects the entry in the TLB array from the set of entries which can be used for translating addresses with the EPN specified by MAS2[EPN]. For fully-associative TLB arrays, ESEL ranges from 0 to TLBnCFG[NENTRY] - 1. ESEL is also updated on TLB error exceptions (misses), and tlbsx hit and miss cases.
D.2.4 MMU Assist Registers 48:51 Reserved The MMU Assist Registers (MAS) are used to transfer 52:63 Next Victim (NV) data to and from the TLB arrays. MAS registers can be NV is a hint to software to identify the next vic- read and written by software using mfspr and mtspr tim to be targeted for a TLB miss replacement 738 Power ISATM III-E Version 2.05 operation for those TLBs that support the NV 56:63 Reserved field. If the TLB selected by MAS0TLBSEL does not support the NV field, then this field is undefined. The computation of this field is D.2.4.3 MAS2 Register implementation-dependent. NV is updated on The MAS2 register is a 64-bit register. The register TLB error exceptions (misses), tlbsx hit and contains fields for specifying the effective page address miss cases as shown in Table 7, and on exe- and the storage attributes for a TLB entry. cution of tlbre if the TLB array being accessed supports the NV field. When NV is updated by MAS2 a supported TLB array, the NV field will always 0 63 present a value that can be used in the MAS0ESEL field. Figure 33. MAS2 register These bits are interpreted as follows: D.2.4.2 MAS1 Register Bit Description The MAS1 register contains fields for selecting a TLB 0:51 Effective Page Number (EPN) entry during translation. Depending on page size, only the bits associ- ated with a page boundary are valid. Bits that MAS1 represent offsets within a page are ignored 32 63 and should be zero. EPN0:31 are accessible only in 64-bit implementations as the upper 32 Figure 32. MAS1 register bits of the effective address of the page. These bits are interpreted as follows: 52:55 Reserved Bit Definition 56:57 Alternate Coherency Mode (ACM) 32 TLB Valid Bit (V) The ACM attribute allows an implementation to employ more than a single coherency 0 This TLB entry is invalid. method. This allows for a processor to partici- 1 This TLB entry is valid. pate in multiple coherency protocols. 
If the M 33 Invalidate Protect (IPROT) attribute (Memory Coherence Required) is not Indicates this TLB entry is protected from set for a page (M=0), the page has no coher- invalidate operations due to execution of ency associated with it and the ACM attribute tlbivax, tlbivax invalidations from another is ignored. If the M attribute is set to 1 for a processor, or invalidate all operations. IPROT page (M=1), the ACM attribute is used to is only implemented for TLB entries in TLB determine the coherence domain (or protocol) arrays where TLBnCFGIPROT is indicated. used. The values for ACM are implementa- tion-dependent. 0 Entry is not protected from invalidation 1 Entry is protected from invalidation. Programming Note 34:47 Translation Identity (TID) Some previous implementations may During translation, TID is compared with the have a storage bit in the bit 57 position current process IDs (PIDs) to select a TLB labeled as X0. entry. A TID value of 0 defines an entry as glo- bal and matches with all process IDs. 58 VLE Mode (VLE) 48:50 Reserved [Category: VLE] Identifies pages which contain instructions to 51 Translation Space (TS) be decoded as VLE instructions (see Chapter During translation, TS is compared with AS 1 of Book VLE). Setting the VLE attribute to 1 (the IS or DS fields of the MSR depending on and setting the E attribute to 1 is considered a the type of access) to select a TLB entry. programming error and an attempt to fetch 52:55 Translation Size (TSIZE) instructions from a page so marked produces TSIZE defines the page size of the TLB entry. an Instruction Storage Interrupt Byte Ordering For TLB arrays that contain fixed-size TLB Exception and sets ESRBO. entries, this field is ignored. For variable page 0 Instructions fetched from the page are size TLB arrays, the page size is decoded and executed as non-VLE 4TSIZE Kbytes. TSIZE must be a non-zero instructions. value between TLBnCFGMINSIZE and 1 Instructions fetched from the page are TLBnCFGMAXSIZE. 
Encodings for page size decoded and executed as VLE instruc- are defined in Section 4.7.1.2. tions. Appendix D. Type FSL Storage Control [Category: Embedded.MMU Type 739 Version 2.05 and can be used by system software. For Programming Note example, these bits may be used to hold infor- Some previous implementations may mation useful to a page scanning algorithm or have a storage bit in this position labeled be used to mark more abstract page as X1. Software should not use the pres- attributes. ence of this bit (the ability to set to 1 and read a 1) to determine if the implementa- 58:63 Permission Bits (UX, SX, UW, SW, UR, SR). tion supports the VLE. User and supervisor execute, write, and read permission bits. The effect of the Permission 59 Write Through (W) Bits are defined in Section 4.7.1.2. 0 This page is not Write-Through Required storage. D.2.4.5 MAS4 Register 1 This page is Write-Through Required stor- The MAS4 register contains fields for specifying default age. information to be pre-loaded on certain MMU related 60 Caching Inhibited (I) exceptions. See Section D.4.5 for more information. 0 This page is not Caching Inhibited stor- MAS4 age. 32 63 1 This page is Caching Inhibited storage 61 Memory Coherence Required (M) Figure 35. MAS4 register 0 This page is not Memory Coherence The MAS4 fields are described below. Required storage. Bit Description 1 This page is Memory Coherence Required storage. 32:33 Reserved 62 Guarded (G) 34:35 TLBSEL Default Value (TLBSELD) Specifies the default value loaded in 0 This page is not Guarded storage. MAS0TLBSEL on a TLB miss exception. 1 This page is Guarded storage. 36:43 Reserved 63 Endianness (E) 44:47 TID Default Selection Value (TIDSELD) 0 The page is accessed in Big-Endian byte Specifies which of the current PID registers order. should be used to load the MAS1TID field on a 1 The page is accessed in Little-Endian byte TLB miss exception. order. 
D.2.4.4 MAS3 Register The PID registers are addressed as follows: 0000 = PID0 (PID) The MAS3 register contains fields for specifying the 0001 = PID1 real page address, user defined attributes, and the per- 0010 = PID2 mission attributes for a TLB entry. A value that references a non-implemented MAS3 PID register causes a value of 0 to be placed 32 63 in MAS1TID. Figure 34. MAS3 register 48:51 Reserved These bits are interpreted as follows: 52:55 Default TSIZE Value (TSIZED) Specifies the default value loaded into Bit Description MAS1TSIZE on a TLB miss exception. 32:51 Real Page Number (bits 32:51) (RPNL or 56:57 Default ACM Value (ACMD) RPN32:51) Specifies the default value loaded into Depending on page size, only the bits associ- MAS2ACM on a TLB miss exception. ated with a page boundary are valid. Bits that represent offsets within a page are ignored 58 Default VLE Value (VLED) and should be zero. RPN0:31 are accessed Specifies the default value loaded into through MAS7. MAS2VLE on a TLB miss exception. 52:53 Reserved 59 Default W Value (WD) Specifies the default value loaded into 54:57 User Bits (U0:U3) MAS2W on a TLB miss exception. These bits are associated with a TLB entry 740 Power ISATM III-E Version 2.05 60 Default I Value (ID) Specifies the default value loaded into MAS2I on a TLB miss exception. 61 Default M Value (MD) Specifies the default value loaded into MAS2M on a TLB miss exception. 62 Default G Value (GD) Specifies the default value loaded into MAS2G on a TLB miss exception. 63 Default E Value (ED) Specifies the default value loaded into MAS2E on a TLB miss exception. D.2.4.6 MAS6 Register The MAS6 register contains fields for specifying PID and AS values to be used when searching TLB entries with the tlbsx instruction. MAS6 32 63 Figure 36. MAS6 register These bits are interpreted as follows: Bit Description 32:33 Reserved 34:47 Search PID0 (SPID0) Specifies the value of PID0 used when searching the TLB during execution of tlbsx. 
        This field is valid for only the number of bits implemented for PID registers.
48:62   Reserved
63      Address Space Value for Searches (SAS)
        Specifies the value of AS used when searching the TLB during execution of tlbsx.

D.2.4.7 MAS7 Register

The MAS7 register contains the high order address bits of the RPN for implementations that support more than 32 bits of physical address. Implementations that do not support more than 32 bits of physical addressing are not required to implement MAS7.

MAS7 (bits 32:63)
Figure 37. MAS7 register

These bits are interpreted as follows:

Bit     Description
32:63   Real Page Number (bits 0:31) (RPNU or RPN[0:31])
        RPN[32:51] are accessed through MAS3.

Table 7: MAS Register Update Summary (value loaded on event)

MAS Field Updated | Data or Instruction TLB Error Interrupt | tlbsx hit | tlbsx miss | tlbre
MAS0[TLBSEL] | MAS4[TLBSELD] | TLB array that hit | MAS4[TLBSELD] | --
MAS0[ESEL] | if TLB array [MAS4[TLBSELD]] supports next victim then hardware hint, else undefined | Number of entry that hit | if TLB array [MAS4[TLBSELD]] supports next victim then hardware hint, else undefined | --
MAS0[NV] | if TLB array [MAS4[TLBSELD]] supports next victim then next hardware hint, else undefined | if TLB array [MAS4[TLBSELD]] supports next victim then hardware hint, else undefined | if TLB array [MAS4[TLBSELD]] supports next victim then next hardware hint, else undefined | if TLB array [MAS4[TLBSELD]] supports next victim then hardware hint, else undefined
MAS1[V] | 1 | 1 | 0 | TLB[V]
MAS1[IPROT] | 0 | TLB[IPROT] | 0 | TLB[IPROT]
MAS1[TID] | if PID[MAS4[TIDSELD]] implemented then PID[MAS4[TIDSELD]] else 0 | TLB[TID] | MAS6[SPID0] | TLB[TID]
MAS1[TS] | MSR[IS] or MSR[DS] | TLB[TS] | MAS6[SAS] | TLB[TS]
MAS1[TSIZE] | MAS4[TSIZED] | TLB[SIZE] | MAS4[TSIZED] | TLB[SIZE]
MAS2[EPN] | EA[0:51] (see note 1) | TLB[EPN] | undefined | TLB[EPN]
MAS2[ACM] | MAS4[ACMD] | TLB[ACM] | MAS4[ACMD] | TLB[ACM]
MAS2[VLE] | MAS4[VLED] | TLB[VLE] | MAS4[VLED] | TLB[VLE]
MAS2[W] | MAS4[WD] | TLB[W] | MAS4[WD] | TLB[W]
MAS2[I] | MAS4[ID] | TLB[I] | MAS4[ID] | TLB[I]
MAS2[M] | MAS4[MD] | TLB[M] | MAS4[MD] | TLB[M]
MAS2[G] | MAS4[GD] | TLB[G] | MAS4[GD] | TLB[G]
MAS2[E] | MAS4[ED] | TLB[E] | MAS4[ED] | TLB[E]
MAS3[RPN] | 0 | TLB[RPN] (bits 32:51) | 0 | TLB[RPN] (bits 32:51)
MAS3[U0 U1 U2 U3] | 0 | TLB[U0 U1 U2 U3] | 0 | TLB[U0 U1 U2 U3]
MAS3[UX SX UW SW UR SR] | 0 | TLB[UX SX UW SW UR SR] | 0 | TLB[UX SX UW SW UR SR]
MAS4 | -- | -- | -- | --
MAS6[SPID0] | PID0 | -- | -- | --
MAS6[SAS] | MSR[IS] or MSR[DS] | -- | -- | --
MAS7[RPN] | 0 | TLB[RPN] (bits 0:31) | 0 | TLB[RPN] (bits 0:31)

Note 1. If MSR[CM]=0 (32-bit mode) at the time of the exception, EPN[0:31] are set to 0.

D.2.5 MMU Configuration and Control Registers

D.2.5.1 MMU Configuration Register (MMUCFG)

The read-only MMUCFG register is described as follows.

MMUCFG (bits 32:63)
Figure 38. MMU Configuration Register

These bits are interpreted as follows:

Bit     Description
32:39   Reserved
40:46   Real Address Size (RASIZE)
        Number of bits in a real address supported by the implementation.
47:48   Reserved
49:52   Number of PID Registers (NPIDS)
        Indicates the number of PID registers provided by the processor.
53:57   PID Register Size (PIDSIZE)
        The value of PIDSIZE is one less than the number of bits implemented for each of the PID registers implemented by the processor. The processor implements only the least significant PIDSIZE+1 bits in the PID registers. The maximum number of PID register bits that may be implemented is 14.
58:59   Reserved
60:61   Number of TLBs (NTLBS)
        The value of NTLBS is one less than the number of software-accessible TLB structures that are implemented by the processor. NTLBS is set to one less than the number of TLB structures so that its value matches the maximum value of MAS0[TLBSEL].
        00  1 TLB
        01  2 TLBs
        10  3 TLBs
        11  4 TLBs
62:63   MMU Architecture Version Number (MAVN)
        Indicates the version number of the architecture of the MMU implemented by the processor.
        00  Version 1.0
        01  Reserved
        10  Reserved
        11  Reserved

D.2.5.2 TLB Configuration Registers (TLBnCFG)

The TLBnCFG read-only registers provide information about each specific TLB that is implemented. There is one TLBnCFG register implemented for each TLB array that is implemented. TLB0CFG corresponds to TLB0, TLB1CFG corresponds to TLB1, etc.

TLBnCFG provides configuration information for the corresponding TLB array.

TLBnCFG (bits 32:63)
Figure 39. TLB Configuration Register

These bits are interpreted as follows:

Bit     Description
32:39   Associativity (ASSOC)
        Total number of entries in a TLB array which can be used for translating addresses with a given EPN. This number is referred to as the associativity level of the TLB array. A value equal to NENTRY or 0 indicates the array is fully-associative.
40:43   Minimum Page Size (MINSIZE)
        Minimum page size of TLB array. Page size encoding is defined in Section 4.7.1.2.
44:47   Maximum Page Size (MAXSIZE)
        Maximum page size of TLB array. Page size encoding is defined in Section 4.7.1.2.
48      Invalidate Protection (IPROT)
        Invalidate protect capability of TLB array.
        0  Indicates invalidate protection capability not supported.
        1  Indicates invalidate protection capability supported.
49      Page Size Availability (AVAIL)
        Page size availability of TLB array.
        0  Fixed selectable page size from MINSIZE to MAXSIZE (all TLB entries are the same size).
        1  Variable page size from MINSIZE to MAXSIZE (each TLB entry can be sized separately).
50:51   Reserved
52:63   Number of Entries (NENTRY)
        Number of entries in TLB array.

D.2.5.3 MMU Control and Status Register (MMUCSR0)

The MMUCSR0 register is used for general control of the MMU including invalidation of the TLB arrays and page sizes for programmable fixed size arrays. For TLB arrays that have programmable fixed sizes, the TLBn_PS fields allow software to specify the page size.

MMUCSR0 (bits 32:63)
Figure 40. MMU Control and Status Register 0

These bits are interpreted as follows:

Bit     Description
32:40   Reserved
41:56   TLBn Array Page Size
        A 4-bit field specifies the page size for TLBn array. Page size encoding is defined in Section 4.7.1.2. For each TLB array n, the field is implemented only if TLBnCFG[AVAIL]=0 and TLBnCFG[MINSIZE]≠TLBnCFG[MAXSIZE]. If the value of TLBn_PS is not between TLBnCFG[MINSIZE] and TLBnCFG[MAXSIZE] the page size is set to TLBnCFG[MINSIZE].
        41:44  TLB3 Array Page Size (TLB3_PS)
               Page size of the TLB3 array.
        45:48  TLB2 Array Page Size (TLB2_PS)
               Page size of the TLB2 array.
        49:52  TLB1 Array Page Size (TLB1_PS)
               Page size of the TLB1 array.
        53:56  TLB0 Array Page Size (TLB0_PS)
               Page size of the TLB0 array.
57:62   TLBn Invalidate All
        TLB invalidate all bit for the TLBn array.
        0  If this bit reads as a 1, an invalidate all operation for the TLBn array is in progress. Hardware will set this bit to 0 when the invalidate all operation is completed. Writing a 0 to this bit during an invalidate all operation is ignored.
        1  TLBn invalidation operation. Hardware initiates a TLBn invalidate all operation. When this operation is complete, this bit is cleared. Writing a 1 during an invalidate all operation produces an undefined result. If the TLB array supports IPROT, entries that have IPROT set will not be invalidated.
        57     TLB2 Invalidate All (TLB2_FI)
               TLB invalidate all bit for the TLB2 array.
        58     TLB3 Invalidate All (TLB3_FI)
               TLB invalidate all bit for the TLB3 array.
        59:60  Reserved
        61     TLB0 Invalidate All (TLB0_FI)
               TLB invalidate all bit for the TLB0 array.
        62     TLB1 Invalidate All (TLB1_FI)
               TLB invalidate all bit for the TLB1 array.
63      Reserved

Programming Note
Changing the fixed page size of an entire array must be done with great care. If any entries in the array are valid, changing the page size may cause those entries to overlap, creating a serious programming error. It is suggested that the entire TLB array be invalidated and any entries with IPROT have their V bits set to zero before changing page size.

D.3 Page Identification and Address Translation

Page Identification occurs as described in Section 4.7.2 except the matching TLB entry may be identified using more than one PID register. Accesses that would result in multiple matching entries are not allowed and are considered a serious programming error by system software and the results of such a translation are undefined. A PID register containing a 0 value (or the same value as another PID register) will form a non-unique match and is permissible.

Once a match occurs the matching TLB entry is used for access control, storage attributes, and effective to real address translation.

D.4 TLB Management

D.4.1 Reading TLB Entries

TLB entries can be read by executing tlbre instructions. At the time of tlbre execution, the MAS registers are used to index a specific TLB entry and upon completion of the tlbre instruction, the MAS registers will contain the contents of the indexed TLB entry.

Specifying invalid values for MAS0[TLBSEL] and MAS0[ESEL] produces undefined results.

D.4.2 Writing TLB Entries

TLB entries can be written by executing tlbwe instructions. At the time of tlbwe execution, the MAS registers are used to index a specific TLB entry and contain the contents to be written to the indexed TLB entry. Upon completion of the tlbwe instruction, the contents of the MAS registers corresponding to TLB entry fields will be written to the indexed TLB entry.

Specifying invalid values for MAS0[TLBSEL] and MAS0[ESEL] produces undefined results.

D.4.3 Invalidating TLB Entries

TLB entries may be invalidated by three different methods. The TLB entry can be invalidated as the result of a tlbwe instruction that sets the MAS1[V] bit in the entry to 0. TLB entries may also be invalidated as a result of a tlbivax instruction or from an invalidation resulting from a tlbivax on another processor. Lastly, TLB entries may be invalidated as a result of an invalidate all operation specified through appropriate settings in the MMUCSR0.

In both multiprocessor and uniprocessor systems, invalidations can occur on a wider set of TLB entries than intended. That is, a virtual address presented for invalidation may cause not only the intended TLB targeted for invalidation to be invalidated, but may also invalidate other TLB entries depending on the implementation. This is because parts of the translation mechanism may not be fully specified to the hardware at invalidate time. This is especially true in SMP systems, where the invalidation address must be supplied to all processors in the system, and there may be other limitations imposed by the hardware implementation. This phenomenon is known as generous invalidates. The architecture assures that the intended TLB will be invalidated, but does not guarantee that it will be the only one. A TLB entry invalidated by writing the V bit of the TLB entry to 0 by use of a tlbwe instruction is guaranteed to invalidate only the addressed TLB entry. Invalidates occurring from tlbivax instructions or from tlbivax instructions on another processor may cause generous invalidates.

Programming Note
Not all TLB arrays in a given implementation will implement the IPROT attribute. It is likely that implementations that are suitable for demand page environments will implement it for only a single array, while not implementing it for other TLB arrays.

Programming Note
Operating systems need to use great care when using protected (IPROT) TLB entries, particularly in SMP systems. An SMP system that contains TLB entries on other processors will require a cross processor interrupt or some other synchronization mechanism to assure that each processor performs the required invalidation by writing its own TLB entries.

Programming Note
To ensure a TLB entry that is not protected by IPROT is invalidated if software does not know which TLB array the entry is in, software should issue a tlbivax instruction targeting each TLB in the implementation with the EA to be invalidated.

Programming Note
The preferred method of invalidating entire TLB arrays is invalidation using MMUCSR0.
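The invalidate-all path through MMUCSR0 (Section D.2.5.3) can be pictured with a small Python model. This is only a sketch under stated assumptions: the bit masks follow the MMUCSR0 bit table (bit 63 is the least significant bit), while the `Entry` class, the `flash_invalidate` function, and the idea that the operation completes immediately are illustrative inventions, not part of the architecture.

```python
from dataclasses import dataclass

def bit(n: int) -> int:
    """Mask for bit n of a 64-bit register, where bit 63 is least significant."""
    return 1 << (63 - n)

# TLBn_FI bit positions as listed in the MMUCSR0 bit table.
TLB_FI = {0: bit(61), 1: bit(62), 2: bit(57), 3: bit(58)}

@dataclass
class Entry:
    """Hypothetical stand-in for a TLB entry; only V and IPROT are modeled."""
    valid: bool = False
    iprot: bool = False

def flash_invalidate(mmucsr0_write: int, arrays: dict) -> None:
    """Model a write to MMUCSR0: each TLBn_FI bit written as 1 invalidates
    every entry of TLBn that is not protected by IPROT."""
    for n, mask in TLB_FI.items():
        if mmucsr0_write & mask and n in arrays:
            for entry in arrays[n]:
                if not entry.iprot:
                    entry.valid = False

# Two arrays; TLB1 holds one IPROT-protected entry.
arrays = {
    0: [Entry(True), Entry(True)],
    1: [Entry(True, iprot=True), Entry(True)],
}
flash_invalidate(TLB_FI[0] | TLB_FI[1], arrays)
```

After the call, every TLB0 entry is invalid, while the protected TLB1 entry is still valid; removing it would take an explicit tlbwe that writes V=0.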
The architecture provides a method to protect against generous invalidations. This is important since there are certain virtual memory regions that must be properly mapped to make forward progress. To prevent this, the architecture specifies an IPROT bit for TLB entries. If the IPROT bit is set to 1 in a given TLB entry, that entry is protected from invalidations resulting from tlbivax instructions, or from invalidate all operations. TLB entries with the IPROT field set may only be invalidated by explicitly writing the TLB entry and specifying a 0 for the V (MAS1[V]) field.

Programming Note
The most obvious issue with generous invalidations is the code memory region that serves as the exception handler for MMU faults. If this region does not have a valid mapping, an MMU exception cannot be handled because the first address of the exception handler will result in another MMU exception.

Programming Note
Invalidations using MMUCSR0 only affect the TLB array on the processor that performs the invalidation. To perform invalidations in a multiprocessor system on all processors in a coherence domain, software should use tlbivax.

D.4.4 Searching TLB Entries

Software may search the MMU by using the tlbsx instruction. The tlbsx instruction uses PID values and an AS value from the MAS registers instead of the PID registers and the MSR. This allows software to search address spaces that differ from the current address space defined by the PID registers. This is useful for TLB fault handling.

D.4.5 TLB Replacement Hardware Assist

The architecture provides mechanisms to assist software in creating and updating TLB entries when MMU related exceptions occur. This is called TLB Replacement Hardware Assist. Hardware will update the MAS registers on the occurrence of a Data TLB Error interrupt or Instruction TLB Error interrupt.

When a Data or Instruction TLB Error interrupt (miss) occurs, MAS0, MAS1, and MAS2 are automatically updated using the defaults specified in MAS4 as well as the AS and EPN values corresponding to the access that caused the exception. MAS6 is updated to set MAS6[SPID0] to the value of PID0 and MAS6[SAS] to the value of MSR[DS] or MSR[IS] depending on the type of access that caused the error. In addition, if MAS4[TLBSELD] identifies a TLB array that supports NV (Next Victim), MAS0[ESEL] is loaded with a value that hardware believes represents the best TLB entry to victimize to create a new TLB entry and MAS0[NV] is updated with the TLB entry index of what hardware believes to be the next victim. Thus MAS0[ESEL] identifies the current TLB entry to be replaced, and MAS0[NV] points to the next victim. When software writes the TLB entry, the MAS0[NV] field is written to the TLB array. The algorithm used by the hardware to determine which TLB entry should be targeted for replacement is implementation-dependent.

The automatic update of the MAS registers sets up all the necessary fields for creating a new TLB entry with the exception of RPN, the U0-U3 attribute bits, and the permission bits. With the exception of the upper 32 bits of RPN and the page attributes (should software desire to specify changes from the default attributes), all the remaining fields are located in MAS3, requiring only the single MAS register manipulation by software before writing the TLB entry.

For Instruction Storage interrupt (ISI) and Data Storage interrupt (DSI) related exceptions, the MAS registers are not updated. Software must explicitly search the TLB to find the appropriate entry.

The update of MAS registers through TLB Replacement Hardware Assist is summarized in Table 7.

D.5 32-bit and 64-bit Specific MMU Behavior

MMU behavior is largely unaffected by whether the processor is in 32-bit computation mode (MSR[CM]=0) or 64-bit computation mode (MSR[CM]=1). The only differences occur in the EPN field of the TLB entry and the EPN field of MAS2. The differences are summarized here.

• Executing a tlbwe instruction in 32-bit mode will set bits 0:31 of the TLB EPN field to 0, regardless of the value of bits 0:31 of the EPN field in MAS2.
• Updates to MAS registers via TLB Replacement Hardware Assist (see Section D.4.5) update bits 0:51 of the EPN field regardless of the computation mode of the processor at the time of the exception or the interrupt computation mode in which the interrupt is taken. If the instruction causing the exception was executing in 32-bit mode, then bits 0:31 of the EPN field in MAS2 will be set to 0.
• Executing a tlbre instruction in 32-bit mode will set bits 0:31 of the MAS2 EPN field to an undefined value.

Programming Note
This allows a 32-bit OS to operate seamlessly on a 64-bit implementation and a 64-bit OS to easily support 32-bit applications.

D.6 Type FSL MMU Instructions

The instructions described in this section replace the instructions described in Section 4.9.4.1, "TLB Management Instructions".

TLB Invalidate Virtual Address Indexed X-form

tlbivax RA,RB

    | 31 | /// | RA | RB | 786 | / |
    0    6     11   16   21    31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    for each processor
      for TLB array = EA[59:60]
        for each TLB entry
          m ← ¬((1 << (2×(entry[SIZE]-1))) - 1)
          if ((EA[0:51] & m) = (entry[EPN] & m)) | EA[61]
          then if entry[IPROT] = 0 then entry[V] ← 0

Let the effective address (EA) be the sum (RA|0)+(RB). The EA is interpreted as shown below.

EA[0:51]    EA[0:51]
EA[52:58]   Reserved
EA[59:60]   TLB array selector
            00  TLB0
            01  TLB1
            10  TLB2
            11  TLB3
EA[61]      TLB Invalidate All
EA[62:63]   Reserved

If EA[61]=0, then if the TLB array targeted by EA[59:60] contains an entry identified by EA[0:51], that entry is made invalid unless the TLB entry is protected by the IPROT attribute. A TLB entry is identified if, for m = ¬((1 << (2×(TLB_entry[SIZE]-1))) - 1), EA[0:51] & m is equal to TLB_entry[EPN] & m. The AS bit does not participate in the comparison.

If EA[61]=1, then all entries not protected by the IPROT attribute in the TLB array targeted by EA[59:60] are made invalid.

This instruction causes the target TLB entry to be invalidated in all processors.

The operation performed by this instruction is ordered by the mbar (or sync) instruction with respect to a subsequent tlbsync instruction executed by the processor executing the tlbivax instruction. The operations caused by tlbivax and tlbsync are ordered by mbar as a set of operations which is independent of the other sets that mbar orders.

The effects of the invalidation are not guaranteed to be visible to the programming model until the completion of a context synchronizing operation.

Invalidations may occur for other TLB entries in the designated array, but in no case will any TLB entries with the IPROT attribute set be made invalid.

In some implementations, if RA does not equal 0, it may produce an Illegal Instruction exception.

This instruction is privileged.

Special Registers Altered:
    None

Programming Note
The use of EA[61] to invalidate TLB arrays may be phased out in future versions of the architecture. The preferred method of invalidating TLB arrays is invalidation using MMUCSR0.
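The TLB Replacement Hardware Assist flow of Section D.4.5 can be sketched as a Python model of the first column of Table 7, followed by the one step left to software (filling MAS3). The field positions come from the MAS register bit tables, with bit 63 as the least significant bit. The helper names (`put`, `get`, `tlb_error_update`, `fill_mas3`) are hypothetical, and the model assumes that the default array supports Next Victim and that MAS4[TIDSELD] selects an implemented PID0.

```python
def put(first: int, last: int, value: int) -> int:
    """Place value in the field occupying bits first:last (bit 63 = LSB)."""
    width = last - first + 1
    assert 0 <= value < (1 << width), "value does not fit the field"
    return value << (63 - last)

def get(reg: int, first: int, last: int) -> int:
    """Extract the field occupying bits first:last of reg."""
    width = last - first + 1
    return (reg >> (63 - last)) & ((1 << width) - 1)

def tlb_error_update(mas4, pid0, as_bit, ea, esel_hint, nv_hint):
    """Values preloaded by hardware on a Data or Instruction TLB Error
    interrupt. esel_hint and nv_hint stand in for the implementation-dependent
    victim selection."""
    mas0 = (put(34, 35, get(mas4, 34, 35))     # TLBSEL <- MAS4[TLBSELD]
            | put(36, 47, esel_hint)           # ESEL   <- hardware hint
            | put(52, 63, nv_hint))            # NV     <- next hardware hint
    mas1 = (put(32, 32, 1)                     # V      <- 1 (IPROT stays 0)
            | put(34, 47, pid0)                # TID    <- PID0
            | put(51, 51, as_bit)              # TS     <- MSR[IS] or MSR[DS]
            | put(52, 55, get(mas4, 52, 55)))  # TSIZE  <- MAS4[TSIZED]
    mas2 = (put(0, 51, ea >> 12)               # EPN    <- EA[0:51]
            | get(mas4, 56, 63))               # ACM VLE W I M G E <- MAS4 defaults
    mas6 = put(34, 47, pid0) | put(63, 63, as_bit)  # SPID0, SAS
    return mas0, mas1, mas2, mas6

def fill_mas3(rpn_32_51: int, perms: int, user_bits: int = 0) -> int:
    """Software's part: RPN[32:51], U0:U3, and UX SX UW SW UR SR."""
    return put(32, 51, rpn_32_51) | put(54, 57, user_bits) | put(58, 63, perms)

# Defaults: TLB0, TSIZED = 1 (4 Kbytes), MD = 1.
mas4 = put(52, 55, 1) | put(61, 61, 1)
mas0, mas1, mas2, mas6 = tlb_error_update(
    mas4, pid0=5, as_bit=0, ea=0x10003ABC, esel_hint=3, nv_hint=4)
mas3 = fill_mas3(0x00200, perms=0b000011)  # UR and SR only
```

With MAS0 through MAS2 preloaded this way, a miss handler on a processor with 32 bits or less of real address would typically write only MAS3 before executing tlbwe, which is the single-register manipulation the section describes.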
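The tlbivax operand layout and match rule can be checked with a few lines of Python. This is a sketch, assuming the SIZE encoding gives a page size of 4^SIZE Kbytes (as stated for MAS1[TSIZE]) and that EPN values are held as 52-bit page numbers; the function names are hypothetical.

```python
def page_size_bytes(size: int) -> int:
    """Page size for a SIZE/TSIZE encoding: 4**size Kbytes."""
    return (4 ** size) * 1024

def epn_mask(size: int) -> int:
    """m = NOT((1 << (2*(size-1))) - 1), limited to the 52 bits of EA[0:51].
    size must be non-zero, as required for TSIZE."""
    return ~((1 << (2 * (size - 1))) - 1) & ((1 << 52) - 1)

def tlbivax_ea(addr: int, tlbsel: int, inv_all: bool = False) -> int:
    """Build the EA operand: EA[0:51] page address, EA[59:60] array
    selector, EA[61] invalidate-all flag."""
    return (addr & ~0xFFF) | (tlbsel << 3) | (int(inv_all) << 2)

def matches(ea: int, entry_epn: int, entry_size: int) -> bool:
    """True if this EA identifies the entry; IPROT is checked separately."""
    if ea & 0x4:                      # EA[61] = 1: whole array is targeted
        return True
    m = epn_mask(entry_size)
    return ((ea >> 12) & m) == (entry_epn & m)

# A 16 Kbyte entry (SIZE = 2) whose EPN is 0x4 covers 0x4000 to 0x7FFF.
hit = matches(tlbivax_ea(0x5000, 0), entry_epn=0x4, entry_size=2)
miss = matches(tlbivax_ea(0x8000, 0), entry_epn=0x4, entry_size=2)
```

Because the mask widens with the page size, any address inside a large page identifies that page's entry, which is why a single tlbivax can invalidate a 16 Kbyte mapping from any of its four 4 Kbyte-aligned addresses.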
TLB Search Indexed X-form

tlbsx RA,RB

    | 31 | /// | RA | RB | 914 | / |
    0    6     11   16   21    31

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + (RB)
    pid ← MAS6[SPID0]
    as ← MAS6[SAS]
    va ← as || pid || EA
    if Valid_matching_entry_exists(va) then
      entry ← matching entry found
      array ← TLB array number where TLB entry found
      index ← index into TLB array of TLB entry found
      if TLB array supports Next Victim then
        hint ← hardware hint for Next Victim
      else
        hint ← undefined
      rpn ← entry[RPN]
      MAS0[TLBSEL] ← array
      MAS0[ESEL] ← index
      MAS0[NV] ← hint
      MAS1[V] ← 1
      MAS1[IPROT TID TS TSIZE] ← entry[IPROT TID TS SIZE]
      MAS2[EPN VLE W I M G E ACM] ← entry[EPN VLE W I M G E ACM]
      MAS3[RPNL] ← rpn[32:51]
      MAS3[U0:U3 UX SX UW SW UR SR] ← entry[U0:U3 UX SX UW SW UR SR]
      MAS7[RPNU] ← rpn[0:31]
    else
      MAS0[TLBSEL] ← MAS4[TLBSELD]
      MAS0[ESEL] ← hint
      MAS0[NV] ← hint
      MAS1[V IPROT] ← 0
      MAS1[TID TS] ← MAS6[SPID0 SAS]
      MAS1[TSIZE] ← MAS4[TSIZED]
      MAS2[VLE W I M G E ACM] ← MAS4[VLED WD ID MD GD ED ACMD]
      MAS2[EPN] ← undefined
      MAS3[RPNL] ← 0
      MAS3[U0:U3 UX SX UW SW UR SR] ← 0
      MAS7[RPNU] ← 0

Let the effective address (EA) be the sum (RA|0)+(RB).

If any valid TLB array contains an entry corresponding to the virtual address formed by MAS6[SAS SPID0] and EA, that entry as well as the index and array are read into the MAS registers. If no valid matching translation exists, MAS1[V] is set to 0 and the MAS registers are loaded with defaults to facilitate a TLB replacement.

If the TLB array supports MAS0[NV], an implementation defined value, hint, specifying the index for the next entry to be replaced is loaded into MAS0[NV] regardless of whether a match occurs; otherwise MAS0[NV] is set to an undefined value. It is also loaded into MAS0[ESEL] if no match occurs.

In some implementations, if RA does not equal 0, it may produce an Illegal Instruction exception.

This instruction is privileged.

Special Registers Altered:
    MAS0 MAS1 MAS2 MAS3 MAS7

TLB Read Entry X-form

tlbre

    | 31 | /// | /// | /// | 946 | / |
    0    6     11    16    21    31

    entry ← SelectTLB(MAS0[TLBSEL], MAS0[ESEL], MAS2[EPN])
    rpn ← entry[RPN]
    if TLB array supports Next Victim then
      MAS0[NV] ← hint
    else
      MAS0[NV] ← undefined
    MAS1[V IPROT TID TS TSIZE] ← entry[V IPROT TID TS SIZE]
    MAS2[EPN VLE W I M G E ACM] ← entry[EPN VLE W I M G E ACM]
    MAS3[RPNL] ← rpn[32:51]
    MAS3[U0:U3 UX SX UW SW UR SR] ← entry[U0:U3 UX SX UW SW UR SR]
    MAS7[RPNU] ← rpn[0:31]

The contents of the TLB entry specified by MAS0[TLBSEL], MAS0[ESEL], and MAS2[EPN] are read and placed into the MAS registers.

If the TLB array supports MAS0[NV], then an implementation defined value, hint, specifying the index for the next entry to be replaced is loaded into MAS0[NV]; otherwise MAS0[NV] is set to an undefined value.

If the specified entry does not exist, the results are undefined.

This instruction is privileged.

Special Registers Altered:
    MAS0 MAS1 MAS2 MAS3 MAS7

TLB Synchronize X-form

tlbsync

    | 31 | /// | /// | /// | 566 | / |
    0    6     11    16    21    31

The tlbsync instruction provides an ordering function for the effects of all tlbivax instructions executed by the processor executing the tlbsync instruction, with respect to the memory barrier created by a subsequent sync (msync) instruction executed by the same processor. Executing a tlbsync instruction ensures that all of the following will occur.

• All TLB invalidations caused by tlbivax instructions preceding the tlbsync instruction will have completed on any other processor before any storage accesses associated with data accesses caused by instructions following the sync (msync) instruction are performed with respect to that processor.
• All storage accesses by other processors for which the address was translated using the translations being invalidated will have been performed with respect to the processor executing the sync (msync) instruction, to the extent required by the associated Memory Coherence Required attributes, before the sync (msync) instruction's memory barrier is created.

The operation performed by this instruction is ordered by the mbar or sync (msync) instruction with respect to preceding tlbivax instructions executed by the processor executing the tlbsync instruction. The operations caused by tlbivax and tlbsync are ordered by mbar as a set of operations, which is independent of the other sets that mbar orders.

The tlbsync instruction may complete before operations caused by tlbivax instructions preceding the tlbsync instruction have been performed.

This instruction is privileged.

Special Registers Altered:
    None

TLB Write Entry X-form

tlbwe

    | 31 | /// | /// | /// | 978 | / |
    0    6     11    16    21    31

    entry ← SelectTLB(MAS0[TLBSEL], MAS0[ESEL], MAS2[EPN])
    rpn ← MAS7[RPNU] || MAS3[RPNL]
    hint ← MAS0[NV]
    entry[V IPROT TID TS SIZE] ← MAS1[V IPROT TID TS TSIZE]
    entry[EPN VLE W I M G E ACM] ← MAS2[EPN VLE W I M G E ACM]
    entry[U0:U3 UX SX UW SW UR SR] ← MAS3[U0:U3 UX SX UW SW UR SR]
    entry[RPN] ← rpn

The contents of the MAS registers are written to the TLB entry specified by MAS0[TLBSEL], MAS0[ESEL], and MAS2[EPN].

MAS0[NV] provides a suggestion to hardware of where the next hardware hint for replacement should be given when the next Data or Instruction TLB Error Interrupt, tlbsx, or tlbre instruction occurs.

If the specified entry does not exist, the results are undefined.

A context synchronizing instruction is required after a tlbwe instruction to ensure any subsequent instructions that will use the updated TLB values execute in the new context.

This instruction is privileged.

Special Registers Altered:
    None

Appendix E.
Example Performance Monitor [Category: Embedded.Performance Monitor]

E.1 Overview

This appendix describes an example of a Performance Monitor facility. It defines an architecture suitable for performance monitoring facilities in the Embedded environment. The architecture itself presents only programming model visible features in conjunction with architecturally defined behavioral features. Much of the selection of events is by necessity implementation-dependent and is not described as part of the architecture; however, this document provides guidelines for some features of a performance monitor implementation that should be followed by all implementations.

The example Performance Monitor facility provides the ability to monitor and count predefined events such as processor clocks, misses in the instruction cache or data cache, types of instructions decoded, or mispredicted branches. The count of such events can be used to trigger the Performance Monitor exception. While most of the specific events are not architected, the mechanism of controlling data collection is.

The example Performance Monitor facility can be used to do the following:

- Improve system performance by monitoring software execution and then recoding algorithms for more efficiency. For example, memory hierarchy behavior can be monitored and analyzed to optimize task scheduling or data distribution algorithms.
- Characterize processors in environments not easily characterized by benchmarking.
- Help system developers bring up and debug their systems.

E.2 Programming Model

The example Performance Monitor facility defines a set of Performance Monitor Registers (PMRs) that are used to collect performance data and to control its collection, and an interrupt to allow intervention by software. The PMRs provide various controls and access to collected data. They are categorized as follows:

- Counter registers. These registers are used for data collection. Occurrences of selected events are counted here. These registers are named PMC0..15. User and supervisor level access to these registers is through different PMR numbers allowing different access rights.
- Global controls. This register controls global settings of the Performance Monitor facility and affects all counters. This register is named PMGC0. User and supervisor level access to this register is through different PMR numbers allowing different access rights. In addition, a bit in the MSR (MSR[PMM]) is defined to enable/disable counting.
- Local controls. These registers control settings that apply only to a particular counter. These registers are named PMLCa0..15 and PMLCb0..15. User and supervisor level access to these registers is through different PMR numbers allowing different access rights. Each set of local control registers (PMLCan and PMLCbn) contains controls that apply to the associated same numbered counter register (e.g. PMLCa0 and PMLCb0 contain controls for PMC0 while PMLCa1 and PMLCb1 contain controls for PMC1).

Assembler Note
    The counter registers, global controls, and local controls have alias names which cause the assembler to use different PMR numbers. The names PMC0...15, PMGC0, PMLCa0...15, and PMLCb0...15 cause the assembler to use the supervisor level PMR number, and the names UPMC0...15, UPMGC0, UPMLCa0...15, and UPMLCb0...15 cause the assembler to use the user-level PMR number.

A given implementation may implement fewer counter registers (and their associated control registers) than are architected. Architected counter and counter control registers that are not implemented behave the same as unarchitected Performance Monitor Registers. PMRs are described in Section E.3.

Software uses the global and local controls to select which events are counted in the counter registers, when such events should be counted, and what action
should be taken when a counter overflows. Software can use the collected information to determine performance attributes of a given segment of code, a process, or the entire software system. PMRs can be read by software using the mfpmr instruction and written using the mtpmr instruction. Both instructions are described in Section E.4.

Since counters are defined as 32-bit registers, it is possible for the counting of some events to overflow. A Performance Monitor interrupt is provided that can be programmed to occur in the event of a counter overflow. The Performance Monitor interrupt is described in detail in Section E.2.5 and Section E.2.6.

E.2.1 Event Counting

Event counting can be configured in several different ways. This section describes configurability and specific unconditional counting modes.

E.2.2 Processor Context Configurability

Counting can be enabled if conditions in the processor state match a software-specified condition. Because a software task scheduler may switch a processor's execution among multiple processes and because statistics on only a particular process may be of interest, a facility is provided to mark a process. The Performance Monitor mark bit, MSR[PMM], is used for this purpose. System software may set this bit to 1 when a marked process is running. This enables statistics to be gathered only during the execution of the marked process. The states of MSR[PR] and MSR[PMM] together define a state that the processor (supervisor or user) and the process (marked or unmarked) may be in at any time. If this state matches an individual state specified by the PMLCan[FCS], PMLCan[FCU], PMLCan[FCM1] and PMLCan[FCM0] fields in PMLCan (the state for which monitoring is enabled), counting is enabled for PMCn.

Each event, on an implementation basis, may count regardless of the value of MSR[PMM]. The counting behavior of each event should be documented in the User's Manual.

The processor states and the settings of the PMLCan[FCS], PMLCan[FCU], PMLCan[FCM1] and PMLCan[FCM0] fields in PMLCan necessary to enable monitoring of each processor state are shown in Figure 41.

    Processor State               FCS   FCU   FCM1  FCM0
    Marked                        0     0     0     1
    Not marked                    0     0     1     0
    Supervisor                    0     1     0     0
    User                          1     0     0     0
    Marked and supervisor         0     1     0     1
    Marked and user               1     0     0     1
    Not marked and supervisor     0     1     1     0
    Not marked and user           1     0     1     0
    All                           0     0     0     0
    None                          X     X     1     1
    None                          1     1     X     X

Figure 41. Processor States and PMLCan Bit Settings

Two unconditional counting modes may be specified:

- Counting is unconditionally enabled regardless of the states of MSR[PMM] and MSR[PR]. This can be accomplished by setting PMLCan[FCS], PMLCan[FCU], PMLCan[FCM1], and PMLCan[FCM0] to 0 for each counter control.
- Counting is unconditionally disabled regardless of the states of MSR[PMM] and MSR[PR]. This can be accomplished by setting PMGC0[FAC] to 1 or by setting PMLCan[FC] to 1 for each counter control. Alternatively, this can be accomplished by setting PMLCan[FCM1] to 1 and PMLCan[FCM0] to 1 for each counter control or by setting PMLCan[FCS] to 1 and PMLCan[FCU] to 1 for each counter control.

Programming Note
    Events may be counted in a fuzzy manner. That is, events may not be counted precisely due to the nature of an implementation. Users of the Performance Monitor facility should be aware that an event may be counted even if it was precisely filtered, though it should not have been. In general such discrepancies are statistically unimportant and users should not assume that counts are explicitly accurate.

E.2.3 Event Selection

Events to count are determined by placing an implementation defined event value into the PMLCa0..15[EVENT] field. Which events may be programmed into which counter is implementation specific and should be defined in the User's Manual. In general, most events may be programmed into any of the counters that an implementation provides.
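The state matching of Section E.2.2 (Figure 41) can be modeled in a few lines. The following Python sketch is illustrative only: the function and argument names are hypothetical, each argument is a single bit taken from PMGC0, PMLCan, or the MSR, and the sketch assumes the event honors the mark bit (an implementation may define events that count regardless of MSR[PMM]).

```python
def counting_enabled(msr_pr, msr_pmm, fcs=0, fcu=0, fcm1=0, fcm0=0, fc=0, fac=0):
    """Return True if PMCn may count in the given processor state.

    msr_pr, msr_pmm      -- MSR[PR] and MSR[PMM] (0 or 1)
    fcs, fcu, fcm1, fcm0 -- freeze bits of PMLCan
    fc                   -- PMLCan[FC], freezes this counter unconditionally
    fac                  -- PMGC0[FAC], freezes all counters
    """
    if fac or fc:
        return False
    if fcs and msr_pr == 0:    # frozen in supervisor state
        return False
    if fcu and msr_pr == 1:    # frozen in user state
        return False
    if fcm1 and msr_pmm == 1:  # frozen while the mark is set
        return False
    if fcm0 and msr_pmm == 0:  # frozen while the mark is cleared
        return False
    return True


# All four (MSR[PR], MSR[PMM]) combinations.
STATES = [(pr, pmm) for pr in (0, 1) for pmm in (0, 1)]

# "Marked and supervisor" row of Figure 41: FCU = 1 and FCM0 = 1.
marked_supervisor = [s for s in STATES if counting_enabled(*s, fcu=1, fcm0=1)]
```

With the "Marked and supervisor" settings, the only state left in `marked_supervisor` is MSR[PR] = 0 with MSR[PMM] = 1; with all four freeze bits at 0 every state counts, and with FCM1 = FCM0 = 1 none does.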
Programming a counter with an event that is not supported for that counter gives boundedly undefined results.

Programming Note
    Event names and event numbers will differ greatly across implementations and software should not expect that events and event names will be consistent.

E.2.4 Thresholds

Thresholds are values that must be exceeded for an event to be counted. Threshold values are programmed in the PMLCb0..15[THRESHOLD] field. The events which may be thresholded and the units of each event that may be thresholded are implementation-dependent. Programming a threshold value for an event that is not defined to use a threshold gives boundedly undefined results.

E.2.5 Performance Monitor Exception

A Performance Monitor exception occurs when counter overflow detection is enabled and a counter overflows. More specifically, for each counter register n, if PMGC0[PMIE]=1 and PMLCan[CE]=1 and PMCn[OV]=1 and MSR[EE]=1, a Performance Monitor exception is said to exist. The Performance Monitor exception condition will cause a Performance Monitor interrupt if the exception is the highest priority exception.

The Performance Monitor exception is level sensitive and the exception condition may cease to exist if any of the required conditions fail to be met. Thus it is possible for a counter to overflow and continue counting events until PMCn[OV] becomes 0 without taking a Performance Monitor interrupt if MSR[EE] = 0 during the overflow condition. To avoid this, software should program the counters to freeze if an overflow condition is detected (see Section E.3.4).

E.2.6 Performance Monitor Interrupt

A Performance Monitor interrupt occurs when a Performance Monitor exception exists and no higher priority exception exists. When a Performance Monitor interrupt occurs, SRR0 and SRR1 record the current state of the NIA and the MSR, the MSR is set to handle the interrupt, and instruction execution resumes at IVPR[0:47] || IVOR35[48:59] || 0b0000.

The Performance Monitor interrupt is precise and asynchronous.

Programming Note
    When taking a Performance Monitor interrupt, software should clear the overflow condition by reading the counter register and setting the counter register to a non-overflow value, since the normal return from the interrupt will set MSR[EE] back to 1.

E.3 Performance Monitor Registers

E.3.1 Performance Monitor Global Control Register 0

The Performance Monitor Global Control Register 0 (PMGC0) controls all Performance Monitor counters.

    PMGC0 (bits 32:63)

Figure 42. [User] Performance Monitor Global Control Register 0

These bits are interpreted as follows:

    Bit     Description
    32      Freeze All Counters (FAC)
            The FAC bit is sticky; that is, once set to 1 it remains set to 1 until it is set to 0 by an mtpmr instruction.
            0  The PMCs can be incremented (if enabled by other Performance Monitor control fields).
            1  The PMCs can not be incremented.
    33      Performance Monitor Interrupt Enable (PMIE)
            0  Performance Monitor interrupts are disabled.
            1  Performance Monitor interrupts are enabled and occur when an enabled condition or event occurs. Enabled conditions and events are described in Section E.2.5.
    34      Freeze Counters on Enabled Condition or Event (FCECE)
            Enabled conditions and events are described in Section E.2.5.
            0  The PMCs can be incremented (if enabled by other Performance Monitor control fields).
            1  The PMCs can be incremented (if enabled by other Performance Monitor control fields) only until an enabled condition or event occurs. When an enabled condition or event occurs, PMGC0[FAC] is set to 1. It is the user's responsibility to set PMGC0[FAC] to 0.
    35:63   Reserved

The UPMGC0 register is an alias to the PMGC0 register for user mode read only access.

E.3.2 Performance Monitor Local Control A Registers

The Performance Monitor Local Control A Registers 0 through 15 (PMLCa0..15) function as event selectors and give local control for the corresponding numbered Performance Monitor counters. PMLCa works with the corresponding numbered PMLCb register.

    PMLCa0..15 (bits 32:63)

Figure 43. [User] Performance Monitor Local Control A Registers

PMLCa is set to 0 at reset. These bits are interpreted as follows:

    Bit     Description
    32      Freeze Counter (FC)
            0  The PMC can be incremented (if enabled by other Performance Monitor control fields).
            1  The PMC can not be incremented.
    33      Freeze Counter in Supervisor State (FCS)
            0  The PMC is incremented (if enabled by other Performance Monitor control fields).
            1  The PMC can not be incremented if MSR[PR] is 0.
    34      Freeze Counter in User State (FCU)
            0  The PMC can be incremented (if enabled by other Performance Monitor control fields).
            1  The PMC can not be incremented if MSR[PR] is 1.
    35      Freeze Counter while Mark is Set (FCM1)
            0  The PMC can be incremented (if enabled by other Performance Monitor control fields).
            1  The PMC can not be incremented if MSR[PMM] is 1.
    36      Freeze Counter while Mark is Cleared (FCM0)
            0  The PMC can be incremented (if enabled by other Performance Monitor control fields).
            1  The PMC can not be incremented if MSR[PMM] is 0.
    37      Condition Enable (CE)
            0  Overflow conditions for PMCn cannot occur (PMCn cannot cause interrupts, cannot freeze counters)
            1  Overflow conditions occur when the most-significant-bit of PMCn is equal to 1.
            It is recommended that CE be set to 0 when counter PMCn is selected for chaining; see Section E.5.1.
    38:40   Reserved
    41:47   Event Selector (EVENT)
            Up to 128 events selectable; see Section E.2.3.
    48:53   Setting is implementation-dependent.
    54:63   Reserved

The UPMLCa0..15 registers are aliases to the PMLCa0..15 registers for user mode read only access.

E.3.3 Performance Monitor Local Control B Registers

The Performance Monitor Local Control B Registers 0 through 15 (PMLCb0..15) specify a threshold value and a multiple to apply to a threshold event selected for the corresponding Performance Monitor counter. Threshold capability is implementation counter dependent. Not all events or all counters of an implementation are guaranteed to support thresholds. PMLCb works with the corresponding numbered PMLCa register.

    PMLCb0..15 (bits 32:63)

Figure 44. [User] Performance Monitor Local Control B Register

PMLCb is set to 0 at reset. These bits are interpreted as follows:

    Bit     Description
    32:52   Reserved
    53:55   Threshold Multiple (THRESHMUL)
            000  Threshold field is multiplied by 1 (THRESHOLD × 1)
            001  Threshold field is multiplied by 2 (THRESHOLD × 2)
            010  Threshold field is multiplied by 4 (THRESHOLD × 4)
            011  Threshold field is multiplied by 8 (THRESHOLD × 8)
            100  Threshold field is multiplied by 16 (THRESHOLD × 16)
            101  Threshold field is multiplied by 32 (THRESHOLD × 32)
            110  Threshold field is multiplied by 64 (THRESHOLD × 64)
            111  Threshold field is multiplied by 128 (THRESHOLD × 128)
    56:57   Reserved
    58:63   Threshold (THRESHOLD)
            Only events that exceed the value THRESHOLD multiplied as described by THRESHMUL are counted. Events to which a threshold value applies are implementation-dependent, as are the unit (for example duration in cycles) and the granularity with which the threshold value is interpreted.

Programming Note
    By varying the threshold value, software can obtain a profile of the event characteristics subject to thresholding. For example, if PMC1 is configured to count cache misses that last longer than the threshold value, software can measure the distribution of cache miss durations for a given program by monitoring the program repeatedly using a different threshold value each time.

The UPMLCb0..15 registers are aliases to the PMLCb0..15 registers for user mode read only access.

E.3.4 Performance Monitor Counter Registers

The Performance Monitor Counter Registers (PMC0..15) are 32-bit counters that can be programmed to generate interrupt signals when they overflow. Each counter can be configured to count any one of up to 128 events.

    PMC0..15 (bits 32:63)

Figure 45. [User] Performance Monitor Counter Registers

PMCs are set to 0 at reset. These bits are interpreted as follows:

    Bit     Description
    32      Overflow (OV)
            0  Counter has not reached an overflow state.
            1  Counter has reached an overflow state.
    33:63   Counter Value (CV)
            Indicates the number of occurrences of the specified event.

The minimum value for a counter is 0 (0x0000_0000) and the maximum value is 4,294,967,295 (0xFFFF_FFFF). A counter can increment up to the maximum value and then wraps to the minimum value. A counter enters the overflow state when the high-order bit is set to 1, which normally occurs only when the counter increments from a value below 2,147,483,648 (0x8000_0000) to a value greater than or equal to 2,147,483,648 (0x8000_0000).

Several different actions may occur when an overflow state is reached, depending on the configuration:

- If PMLCan[CE] is 0, no special actions occur on overflow: the counter continues incrementing, and no exception is signaled.
- If PMLCan[CE] and PMGC0[FCECE] are 1, all counters are frozen when PMCn overflows.
- If PMLCan[CE], PMGC0[PMIE], and MSR[EE] are 1, an exception is signalled when PMCn reaches overflow. Note that the interrupts are masked by setting MSR[EE] to 0. An overflow condition may be present while MSR[EE] is zero, but the interrupt is not taken until MSR[EE] is set to 1.

If an overflow condition occurs while MSR[EE] is 0 (the exception is masked), the exception is still signalled once MSR[EE] is set to 1 if the overflow condition is still present and the configuration has not been changed in the meantime to disable the exception; however, if MSR[EE] remains 0 until after the counter leaves the overflow state (MSB becomes 0), or if MSR[EE] remains 0 until after PMLCan[CE] or PMGC0[PMIE] are set to 0, the exception does not occur.

Programming Note
    Loading a PMC with an overflowed value can cause an immediate exception. For example, if PMLCan[CE], PMGC0[PMIE], and MSR[EE] are all 1, and an mtpmr loads an overflowed value into a PMCn that previously held a non-overflowed value, then an interrupt will be generated before any event counting has occurred.

    The following sequence is generally recommended for setting the counter values and configurations.

    1. Set PMGC0[FAC] to 1 to freeze the counters.
    2. Perform a series of mtpmr operations to initialize counter values and configure the control registers.
    3. Release the counters by setting PMGC0[FAC] to 0 with a final mtpmr.

E.4 Performance Monitor Instructions

Move From Performance Monitor Register XFX-form

mfpmr RT,PMRN

    31      RT      pmrn            334     /
    0       6       11              21      31

    n ← pmrn[5:9] || pmrn[0:4]
    if length(PMR(n)) = 64 then
        RT ← PMR(n)
    else
        RT ← (32)0 || PMR(n)[32:63]

Let PMRN denote a Performance Monitor Register number and PMR the set of Performance Monitor Registers.

The contents of the designated Performance Monitor Register are placed into register RT. When the register is 32 bits long, the high-order 32 bits of RT are set to 0.

The list of defined Performance Monitor Registers and their privilege class is provided in Figure 46.

Move To Performance Monitor Register XFX-form

mtpmr PMRN,RS

    31      RS      pmrn            462     /
    0       6       11              21      31

    n ← pmrn[5:9] || pmrn[0:4]
    if length(PMR(n)) = 64 then
        PMR(n) ← (RS)
    else
        PMR(n) ← (RS)[32:63]

Let PMRN denote a Performance Monitor Register number and PMR the set of Performance Monitor Registers.

The contents of the register RS are placed into the designated Performance Monitor Register.

The list of defined Performance Monitor Registers and their privilege class is provided in Figure 46.
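The RTL line n ← pmrn5:9 || pmrn0:4 says that the two 5-bit halves of the PMR number are swapped inside the instruction's pmrn field. The following Python sketch shows one way to pack and unpack that field and to assemble the two instruction words from the field layout given above (primary opcode 31, extended opcodes 334 and 462, reserved bit 31 taken as 0). The function names are hypothetical and the sketch is not taken from any assembler.

```python
def pmrn_field(n):
    """Pack a 10-bit PMR number into the instruction's pmrn field.

    The field's high-order half (pmrn0:4) carries the low-order 5 bits of
    the PMR number and its low-order half (pmrn5:9) carries the high-order
    5 bits, so packing is a swap of the two halves.
    """
    if not 0 <= n < 1024:
        raise ValueError("PMR number must fit in 10 bits")
    return ((n & 0x1F) << 5) | (n >> 5)


def pmr_number(field):
    """Recover the PMR number from a pmrn field value (the same swap)."""
    if not 0 <= field < 1024:
        raise ValueError("pmrn field must fit in 10 bits")
    return ((field & 0x1F) << 5) | (field >> 5)


def _xfx(gpr, n, xo):
    """Assemble an XFX-form word: 31 | GPR | pmrn | XO | reserved 0."""
    if not 0 <= gpr < 32:
        raise ValueError("GPR number must be 0..31")
    return (31 << 26) | (gpr << 21) | (pmrn_field(n) << 11) | (xo << 1)


def encode_mfpmr(rt, n):
    """Instruction word for mfpmr RT,PMRN (extended opcode 334)."""
    return _xfx(rt, n, 334)


def encode_mtpmr(n, rs):
    """Instruction word for mtpmr PMRN,RS (extended opcode 462)."""
    return _xfx(rs, n, 462)
```

For example, `pmrn_field(400)` places 0b10000 in the high-order half and 0b01100 in the low-order half, and `pmr_number` applied to the result returns 400 again.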
Execution of mfpmr or mtpmr specifying a defined and privileged Performance Monitor Register when MSR[PR]=1 will result in a Privileged Instruction exception.

Execution of mfpmr specifying an undefined Performance Monitor Register will either result in an Illegal Instruction exception or will produce an undefined value for register RT. Execution of mtpmr specifying an undefined Performance Monitor Register will either result in an Illegal Instruction exception or will perform no operation.

Special Registers Altered (mfpmr and mtpmr):
    None

    PMR(1)                                          Privileged
    decimal   pmrn5:9   pmrn0:4   Register Name     mtpmr   mfpmr   Cat
    0-15      00000     0xxxx     PMC0..15          -       no      E.PM
    16-31     00000     1xxxx     PMC0..15          yes     yes     E.PM
    128-143   00100     0xxxx     PMLCA0..15        -       no      E.PM
    144-159   00100     1xxxx     PMLCA0..15        yes     yes     E.PM
    256-271   01000     0xxxx     PMLCB0..15        -       no      E.PM
    272-287   01000     1xxxx     PMLCB0..15        yes     yes     E.PM
    384       01100     00000     PMGC0             -       no      E.PM
    400       01100     10000     PMGC0             yes     yes     E.PM

    -    This register is not defined for this instruction.
    (1)  Note that the order of the two 5-bit halves of the PMR number is reversed.

Figure 46. Embedded.Performance Monitor PMRs

E.5 Performance Monitor Software Usage Notes

E.5.1 Chaining Counters

An implementation may contain events that are used to "chain" counters together to provide a larger range of event counts. This is accomplished by programming the desired event into one counter and programming another counter with an event that occurs when the first counter transitions from 1 to 0 in the most significant bit.

The counter chaining feature can be used to decrease the processing pollution caused by Performance Monitor interrupts (such as cache contamination and pipeline effects) by allowing a higher event count than is possible with a single counter.
Chaining two counters together effectively adds 32 bits to a counter register where the first counter's carry-out event acts like a carry-out feeding the second counter. By defining the event of interest to be another PMC's overflow generation, the chained counter increments each time the first counter rolls over to zero. Multiple counters may be chained together.

Because the entire chained value cannot be read in a single instruction, an overflow may occur between counter reads, producing an inaccurate value. A sequence like the following is necessary to read the complete chained value when it spans multiple counters and the counters are not frozen. The example shown is for a two-counter case.

    loop:
        mfpmr   Rx,pmctr1       #load from upper counter
        mfpmr   Ry,pmctr0       #load from lower counter
        mfpmr   Rz,pmctr1       #load from upper counter
        cmp     cr0,0,Rz,Rx     #see if `old' = `new'
        bc      4,2,loop        #loop if carry occurred between reads

The comparison and loop are necessary to ensure that a consistent set of values has been obtained. The above sequence is not necessary if the counters are frozen.

E.5.2 Thresholding

Threshold event measurement enables the counting of duration and usage events. Assume an example event, dLFB load miss cycles, requires a threshold value. A dLFB load miss cycles event is counted only when the number of cycles spent recovering from the miss is greater than the threshold. If the event is counted on two counters and each counter has an individual threshold, one execution of a performance monitor program can sample two different threshold values. Measuring code performance with multiple concurrent thresholds expedites code profiling significantly.
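The read-retry sequence of Section E.5.1 can be exercised off-target with a small model. The Python sketch below is a simulation under stated assumptions, not target code: `read_chained` mirrors the mfpmr/cmp/bc loop, the two callables stand in for mfpmr reads of the upper and lower counters, and `FakeChainedCounters` is a hypothetical test double that injects one counted event after a chosen read. It also assumes the chained pair forms a 64-bit value with the lower counter in the low-order 32 bits.

```python
def read_chained(read_upper, read_lower):
    """Return a consistent 64-bit value from two chained 32-bit counters.

    The upper counter is read before and after the lower counter; if the two
    upper reads differ, a carry occurred in between and the reads are retried.
    """
    while True:
        old = read_upper()
        low = read_lower()
        new = read_upper()
        if new == old:
            return (new << 32) | low


class FakeChainedCounters:
    """Hypothetical stand-in for a running upper/lower counter pair."""

    def __init__(self, value, bump_after_read):
        self.value = value              # full 64-bit chained count
        self.reads = 0
        self.bump_after_read = bump_after_read

    def _after_read(self):
        self.reads += 1
        if self.reads == self.bump_after_read:
            self.value += 1             # one event arrives after this read

    def upper(self):
        v = (self.value >> 32) & 0xFFFFFFFF
        self._after_read()
        return v

    def lower(self):
        v = self.value & 0xFFFFFFFF
        self._after_read()
        return v


# The lower counter is about to wrap; the event lands between the first
# upper read and the lower read, which is the hazardous case.
hw = FakeChainedCounters(0x1_FFFF_FFFF, bump_after_read=1)
consistent = read_chained(hw.upper, hw.lower)
```

In this run a single unguarded upper-then-lower read would have combined the old upper value with the wrapped lower value; the retry makes the loop read all three values again, so `consistent` equals the count after the event.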
Book VLE: Power ISA Operating Environment Architecture - Variable Length Encoding (VLE) Environment

Chapter 1. Variable Length Encoding Introduction

1.1 Overview
1.2 Documentation Conventions
1.2.1 Description of Instruction Operation
1.3 Instruction Mnemonics and Operands
1.4 VLE Instruction Formats
1.4.1 BD8-form (16-bit Branch Instructions)
1.4.2 C-form (16-bit Control Instructions)
1.4.3 IM5-form (16-bit register + immediate Instructions)
1.4.4 OIM5-form (16-bit register + offset immediate Instructions)
1.4.5 IM7-form (16-bit Load immediate Instructions)
1.4.6 R-form (16-bit Monadic Instructions)
1.4.7 RR-form (16-bit Dyadic Instructions)
1.4.8 SD4-form (16-bit Load/Store Instructions)
1.4.9 BD15-form
1.4.10 BD24-form
1.4.11 D8-form
1.4.12 I16A-form
1.4.13 I16L-form
1.4.14 M-form
1.4.15 SCI8-form
1.4.16 LI20-form
1.4.17 Instruction Fields

This chapter describes computation modes, document conventions, a processor overview, instruction formats, storage addressing, and instruction addressing.

1.1 Overview

Variable Length Encoding (VLE) is a code density optimized re-encoding of much of the instruction set defined by Books I, II, and III-E using both 16-bit and 32-bit instruction formats.

VLE offers more efficient binary representations of applications for the embedded processor spaces where code density plays a major role in affecting overall system cost, and to a somewhat lesser extent, performance.

VLE is a supplement to the instruction set defined by Book I-III and code pages using VLE encoding or non-VLE encoding can be intermingled in a system providing focus on both high performance and code density where most needed.

VLE provides alternative encodings to instructions defined in Books I-III to enable reduced code footprint. This set of alternative encodings is selected on a page basis. A single storage attribute bit selects between standard instruction encodings and VLE instructions for that page of memory.

Instruction encodings in pages marked as VLE are either 16 or 32 bits long, and are aligned on 16-bit boundaries. Because of this, all instruction pages marked as VLE are required to use Big-Endian byte ordering.

The programming model uses the same register set with both instruction set encodings, although some registers are not accessible by VLE instructions using the 16-bit formats and not all condition register (CR) fields are used by Conditional Branch instructions or instructions that access the condition register executing from a VLE instruction page. In addition, immediate fields and displacements differ in size and use, due to the more restrictive encodings imposed by VLE instruction formats.

VLE additional instruction fields are described in Section 1.4.17, "Instruction Fields".

Other than the requirement of Big-Endian byte ordering for instruction pages and the additional storage attribute to identify whether the instruction page corresponds to a VLE section of code, VLE complies with the memory model, register model, timer facilities, debug facilities, and interrupt/exception model defined in Book I-III and therefore executes in the same environment as non-VLE instructions.

1.2 Documentation Conventions

Book VLE adheres to the documentation conventions defined in Section 1.3 of Book I. Note however that this book defines instructions that apply to the User Instruction Set Architecture, the Virtual Environment Architecture, and the Operating Environment Architecture.

1.2.1 Description of Instruction Operation

The RTL (register transfer language) descriptions in Book VLE conform to the conventions described in Section 1.3.4 of Book I.

1.3 Instruction Mnemonics and Operands

The description of each instruction includes the mnemonic and a formatted list of operands. VLE instruction semantics are either identical or similar to those of other instructions in the architecture. Where the semantics, side-effects, and binary encodings are identical, the standard mnemonics and formats are used. Such unchanged instructions are listed and appropriately referenced, but the instruction definitions are not replicated in this book. Where the semantics are similar but the binary encodings differ, the standard mnemonic is typically preceded with an e_ to denote a VLE instruction. To distinguish between similar instructions available in both 16- and 32-bit forms under VLE and standard instructions, VLE instructions encoded with 16 bits have an se_ prefix. The following are examples:

    stwx    RS,RA,RB        // standard Book I instruction
    e_stw   RS,D(RA)        // 32-bit VLE instruction
    se_stw  RZ,SD4(RX)      // 16-bit VLE instruction

1.4 VLE Instruction Formats

All VLE instructions to be executed are either two or four bytes long and are halfword-aligned in storage. Thus, whenever instruction addresses are presented to the processor (as in Branch instructions), the low-order bit is treated as 0. Similarly, whenever the processor generates an instruction address, the low-order bit is zero.

The format diagrams given below show horizontally all valid combinations of instruction fields. Only those formats that are unique to VLE-defined instructions are included here. Instruction forms that are available in VLE or non-VLE mode are described in Section 1.6 of Book I and are not repeated here.

In some cases an instruction field must contain a particular value. If a field that must contain a particular value does not contain that value, the instruction form is invalid and the results are as described for invalid instruction forms in Book I.

VLE instructions use split field notation as defined in Section 1.6 of Book I.

In each diagram below, the first row lists the bit positions marked in the original figure and each following row lists the fields of one valid combination, from left to right.

1.4.1 BD8-form (16-bit Branch Instructions)

    Bit positions:  0  5  6  7  8  15
    OPCD | BO16 | BI16 | BD8
    OPCD | XO | LK | BD8

Figure 1. BD8 instruction format

1.4.2 C-form (16-bit Control Instructions)

    Bit positions:  0  15
    OPCD
    OPCD | LK

Figure 2. C instruction format

1.4.3 IM5-form (16-bit register + immediate Instructions)

    Bit positions:  0  6  7  12  15
    OPCD | XO | UI5 | RX

Figure 3. IM5 instruction format

1.4.4 OIM5-form (16-bit register + offset immediate Instructions)

    Bit positions:  0  6  7  12  15
    OPCD | XO | OIM5 | RX
    OPCD | RC | OIM5 | RX

Figure 4. OIM5 instruction format

1.4.5 IM7-form (16-bit Load immediate Instructions)

    Bit positions:  0  5  12  15
    OPCD | UI7 | RX

Figure 5. IM7 instruction format

1.4.6 R-form (16-bit Monadic Instructions)

    Bit positions:  0  6  12  15
    OPCD | XO | RX

Figure 6. R instruction format

1.4.7 RR-form (16-bit Dyadic Instructions)

    Bit positions:  0  6  7  8  12  15
    OPCD | XO | RY | RX
    OPCD | XO | RC | RY | RX
    OPCD | XO | ARY | RX
    OPCD | XO | RY | ARX

Figure 7. RR instruction format

1.4.8 SD4-form (16-bit Load/Store Instructions)

    Bit positions:  0  4  8  12  15
    OPCD | SD4 | RZ | RX

Figure 8. SD4 instruction format

1.4.9 BD15-form

    Bit positions:  0  10  12  16  31
    OPCD | BO32 | BI32 | BD15 | LK

Figure 9. BD15 instruction format

1.4.10 BD24-form

    Bit positions:  0  6  7  31
    OPCD | 0 | BD24 | LK

Figure 10. BD24 instruction format

1.4.11 D8-form

    Bit positions:  0  6  11  16  24  31
    OPCD | RT | RA | XO | D8
    OPCD | RS | RA | XO | D8

Figure 11. D8 instruction format

1.4.12 I16A-form

    Bit positions:  0  6  11  16  21  31
    OPCD | si | RA | XO | si
    OPCD | ui | RA | XO | ui

Figure 12. I16A instruction format

1.4.13 I16L-form

    Bit positions:  0  6  11  16  21  31
    OPCD | RT | ui | XO | ui

Figure 13. I16L instruction format

1.4.14 M-form

    Bit positions:  0  6  11  16  21  26  31
    OPCD | RS | RA | SH | MB | ME | XO
    OPCD | RS | RA | SH | MB | ME | XO

Figure 14. M instruction format

1.4.15 SCI8-form

    Bit positions:  0  6  9  11  16  20  21  22  24  31
    OPCD | RT | RA | XO | Rc | F | SCL | UI8
    OPCD | RT | RA | XO | F | SCL | UI8
    OPCD | RS | RA | XO | Rc | F | SCL | UI8
    OPCD | RS | RA | XO | F | SCL | UI8
    OPCD | 000 | BF32 | RA | XO | F | SCL | UI8
    OPCD | 001 | BF32 | RA | XO | F | SCL | UI8
    OPCD | XO | RA | XO | F | SCL | UI8

Figure 15. SCI8 instruction format

1.4.16 LI20-form

    Bit positions:  0  6  11  16  17  21  31
    OPCD | RT | li20 | XO | li20 | li20

Figure 16. LI20 instruction format

1.4.17 Instruction Fields

VLE uses instruction fields defined in Section 1.6.24 of Book I as well as VLE-defined instruction fields defined below.

ARX (12:15)
    Field used to specify an "alternate" General Purpose Register in the range R8:R23 to be used as a destination.

ARY (8:11)
    Field used to specify an "alternate" General Purpose Register in the range R8:R23 to be used as a source.

    1  Set the Link Register. The sum of the value 2 or 4 and the address of the Branch instruction is placed into the Link Register.

OIM5 (7:11)
    Offset Immediate field used to specify a 5-bit unsigned fixed-point value in the range [1:32] encoded as [0:31]. Thus the binary encoding of 0b00000 represents an immediate value of 1, 0b00001 represents an immediate value of 2, and so on.

OPCD (0:3, 0:4, 0:5, 0:9, 0:14, 0:15)
    Primary opcode field.

BD8 (8:15), BD15 (16:30), BD24 (7:30)
    Immediate field specifying a signed two's complement branch displacement which is concatenated on the right with 0b0 and sign-extended to 64 bits.

    BD15. (Used by 32-bit branch conditional class instructions) A 15-bit signed displace-
ment that is sign-extended and shifted left one bit (concatenated with 0b0) and then added to Rc (6, 7, 20, 31) the current instruction address to form the RECORD bit. branch target address. 0 Do not alter the Condition Register. BD24. (Used by 32-bit branch class instruc- 1 Set Condition Register Field 0. tions) A 24-bit signed displacement that is RX (12:15) sign-extended and shifted left one bit (concat- Field used to specify a General Purpose Reg- enated with 0b0) and then added to the cur- ister in the ranges R0:R7 or R24:R31 to be rent instruction address to form the branch used as a source or as a destination. R0 is target address. encoded as 0b0000, R1 as 0b0001, etc. R24 BD8. (Used by 16-bit branch and branch con- is encoded as 0b1000, R25 as 0b1001, etc. ditional class instructions) An 8-bit signed dis- RY (8:11) placement that is sign-extended and shifted Field used to specify a General Purpose Reg- left one bit (concatenated with 0b0) and then ister in the ranges R0:R7 or R24:R31 to be added to the current instruction address to used as a source. R0 is encoded as 0b0000, form the branch target address. R1 as 0b0001, etc. R24 is encoded as BI16 (6:7), BI32 (12:15) 0b1000, R25 as 0b1001, etc. Field used to specify one of the Condition RZ (8:11) Register fields to be used as a condition of a Field used to specify a General Purpose Reg- Branch Conditional instruction. ister in the ranges R0:R7 or R24:R31 to be BO16 (5), BO32 (10:11) used as a source or as a destination for load/ store data. R0 is encoded as 0b0000, R1 as Field used to specify whether to branch if the 0b0001, etc. R24 is encoded as 0b1000, R25 condition is true, false, or to decrement the as 0b1001, etc. Count Register and branch if the Count Regis- ter is not zero in a Branch Conditional instruc- SCL (22:23) tion. Field used to specify a scale amount in Imme- diate instructions using the SCI8-form. Scaling BF32 (9:10) involves left shifting by 0, 8, 16, or 24 bits. 
Field used to specify one of the Condition Register fields to be used as a target of a SD4 (4:7) compare instruction. Used by 16-bit load and store class instruc- tions. The SD4 field is a 4-bit unsigned imme- D8 (24:31) diate value zero-extended to 64 bits, shifted The D8 field is a 8-bit signed displacement left according to the size of the operation, and which is sign-extended to 64 bits. then added to the base register to form a 64- F (21) Fill value used to fill the remaining 56 bits of a bit EA. For byte operations, no shift is per- scaled-immediate 8 value. formed. For half-word operations, the immedi- LI20 (17:20 || 11:15 || 21:31) ate is shifted left one bit (concatenated with A 20-bit signed immediate value which is sign- 0b0). For word operations, the immediate is extended to 64 bits for the e_li instruction. shifted left two bits (concatenated with 0b00).SI (6:10 || 21:31, 11:15 || 21:31) LK (7, 16, 31) A 16-bit signed immediate value sign- LINK bit. extended to 64 bits and used as one operand 0 Do not set the Link Register. of the instruction. 764 Power ISATM VLE Version 2.05 UI (6:10 || 21:31, 11:15 || 21:31) A 16-bit unsigned immediate value zero- extended to 64 bits or padded with 16 zeros and used as one operand of the instruction. The instruction encoding differs between the I16A and I16L instruction formats as shown in Section 1.4.12 and Section 1.4.13. UI5 (7:11) Immediate field used to specify a 5-bit unsigned fixed-point value. UI7 (5:11) Immediate field used to specify a 7-bit unsigned fixed-point value. UI8 (24:31) Immediate field used to specify an 8-bit unsigned fixed-point value. XO (6, 6:7, 6:10, 6:11, 16, 16:19,16:23) Extended opcode field. Assembler Note For scaled immediate instructions using the SCI8- form, the instruction assembly syntax requires a single immediate value, sci8, that the assembler will synthesize into the appropriate F, SCL, and UI8 fields. 
The F, SCL, and UI8 fields must be able to be formed correctly from the given sci8 value or the assembler will flag the assembly instruction as an error.

Chapter 2. VLE Storage Addressing

2.1 Data Storage Addressing Modes
2.2 Instruction Storage Addressing Modes
2.2.1 Misaligned, Mismatched, and Byte Ordering Instruction Storage Exceptions
2.2.2 VLE Exception Syndrome Bits

A program references memory using the effective address (EA) computed by the processor when it executes a Storage Access or Branch instruction (or certain other instructions described in Book II and Book III-E), or when it fetches the next sequential instruction.

2.1 Data Storage Addressing Modes

Table 1 lists data storage addressing modes supported by the VLE category.

Table 1: Data Storage Addressing Modes

Mode: Base+16-bit displacement (32-bit instruction format)
Form: D-form
Description: The 16-bit D field is sign-extended and added to the contents of the GPR designated by RA or to zero if RA = 0 to produce the EA.

Mode: Base+8-bit displacement (32-bit instruction format)
Form: D8-form
Description: The 8-bit D8 field is sign-extended and added to the contents of the GPR designated by RA or to zero if RA = 0 to produce the EA.

Mode: Base+scaled 4-bit displacement (16-bit instruction format)
Form: SD4-form
Description: The 4-bit SD4 field is zero-extended, scaled (shifted left) according to the size of the operand, and added to the contents of the GPR designated by RX to produce the EA. (Note that RX = 0 is not a special case.)

Mode: Base+Index (32-bit instruction format)
Form: X-form
Description: The GPR contents designated by RB are added to the GPR contents designated by RA or to zero if RA = 0 to produce the EA.

2.2 Instruction Storage Addressing Modes

Table 2 lists instruction storage addressing modes supported by the VLE category.
Table 2: Instruction Storage Addressing Modes

Mode: Taken BD24-form Branch instructions (32-bit instruction format)
Description: The 24-bit BD24 field is concatenated on the right with 0b0, sign-extended, and then added to the address of the branch instruction.

Mode: Taken BD15-form Branch instructions (32-bit instruction format)
Description: The 15-bit BD15 field is concatenated on the right with 0b0, sign-extended, and then added to the address of the branch instruction to form the EA of the next instruction.

Mode: Taken BD8-form Branch instructions (16-bit instruction format)
Description: The 8-bit BD8 field is concatenated on the right with 0b0, sign-extended, and then added to the address of the branch instruction to form the EA of the next instruction.

Mode: Sequential instruction fetching (or non-taken branch instructions)
Description: The value 4 [2] is added to the address of the current 32-bit [16-bit] instruction to form the EA of the next instruction. If the address of the current instruction is 0xFFFF_FFFF_FFFF_FFFC [0xFFFF_FFFF_FFFF_FFFE] in 64-bit mode or 0xFFFF_FFFC [0xFFFF_FFFE] in 32-bit mode, the address of the next sequential instruction is undefined.

Mode: Any Branch instruction with LK = 1 (32-bit instruction format)
Description: The value 4 is added to the address of the current branch instruction and the result is placed into the LR. If the address of the current instruction is 0xFFFF_FFFF_FFFF_FFFC in 64-bit mode or 0xFFFF_FFFC in 32-bit mode, the result placed into the LR is undefined.

Mode: Branch se_bl, se_blrl, se_bctrl instructions (16-bit instruction format)
Description: The value 2 is added to the address of the current branch instruction and the result is placed into the LR. If the address of the current instruction is 0xFFFF_FFFF_FFFF_FFFE in 64-bit mode or 0xFFFF_FFFE in 32-bit mode, the result placed into the LR is undefined.
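The address arithmetic in Table 2, together with the scaled SD4 displacement from Table 1, can be sketched in a few lines of Python. This is only an illustrative model, not part of the architecture: the function names (sign_extend, to_ea, branch_target, link_value, sd4_ea) are hypothetical, and the sketch simply wraps around at the top of the address space, where the architecture leaves the result undefined.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def to_ea(value, mode64=True):
    """Wrap to 64 bits; in 32-bit mode the high-order 32 bits are set to 0."""
    value &= MASK64
    return value if mode64 else value & 0xFFFF_FFFF


def branch_target(cia, field, field_bits, mode64=True):
    """Taken branch: (field || 0b0), sign-extended, plus the branch address.

    field_bits is 8, 15, or 24 for the BD8, BD15, and BD24 fields.
    """
    displacement = sign_extend(field << 1, field_bits + 1)
    return to_ea(cia + displacement, mode64)


def link_value(cia, length, mode64=True):
    """Address of the instruction after a 2- or 4-byte branch (LR when LK=1).

    Assumption: wraps at the top of the address space; the architecture
    says the value is undefined there.
    """
    if length not in (2, 4):
        raise ValueError("VLE instructions are 2 or 4 bytes long")
    return to_ea(cia + length, mode64)


def sd4_ea(base, sd4, size, mode64=True):
    """SD4-form data EA: zero-extended 4-bit field scaled by operand size."""
    shift = {1: 0, 2: 1, 4: 2}[size]  # byte, halfword, word
    return to_ea(base + ((sd4 & 0xF) << shift), mode64)


print(hex(branch_target(0x1000, 0xFF, 8)))   # 0xffe  (BD8 = -1 -> -2 bytes)
print(hex(branch_target(0x1000, 0x7F, 8)))   # 0x10fe (BD8 = +127 -> +254 bytes)
print(hex(link_value(0x1000, 2)))            # 0x1002
print(hex(sd4_ea(0x2000, 0xF, 4)))           # 0x203c (15 words = 60 bytes)
```

Because every displacement is concatenated with 0b0 before use, a computed branch target is always halfword-aligned whenever the branch address itself is.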
2.2.1 Misaligned, Mismatched, and Byte Ordering Instruction Storage Exceptions

A Misaligned Instruction Storage Exception occurs when an implementation which supports VLE attempts to execute an instruction that is not 32-bit aligned and the VLE storage attribute is not set for the page that corresponds to the effective address of the instruction. The attempted execution can be the result of a Branch instruction which has bit 62 of the target address set to 1 or the result of an rfi, se_rfi, rfci, se_rfci, rfdi, se_rfdi, rfmci, or se_rfmci instruction which has bit 62 set in SRR0, SRR0, CSRR0, CSRR0, DSRR0, DSRR0, MCSRR0, or MCSRR0 respectively. If a Misaligned Instruction Storage Exception is detected and no higher priority exception exists, an Instruction Storage Interrupt will occur, setting SRR0 to the misaligned address for which execution was attempted.

A Mismatched Instruction Storage Exception occurs when an implementation which supports VLE attempts to execute an instruction that crosses a page boundary for which the first page has the VLE storage attribute set to 1 and the second page has the VLE storage attribute bit set to 0. If a Mismatched Instruction Storage Exception is detected and no higher priority exception exists, an Instruction Storage Interrupt will occur, setting SRR0 to the misaligned address for which execution was attempted.

A Byte Ordering Instruction Storage Exception occurs when an implementation which supports VLE attempts to execute an instruction that has the VLE storage attribute set to 1 and the E (Endian) storage attribute set to 1 for the page that corresponds to the effective address of the instruction. If a Byte Ordering Instruction Storage Exception is detected and no higher priority exception exists, an Instruction Storage Interrupt will occur, setting SRR0 to the address for which execution was attempted.

2.2.2 VLE Exception Syndrome Bits

Two bits in the Exception Syndrome Register (ESR) (see Section 5.2.9 of Book III-E) are provided to facilitate VLE exception handling, VLEMI and MIF.

ESR_VLEMI is set when an exception and subsequent interrupt is caused by the execution or attempted execution of an instruction that resides in memory with the VLE storage attribute set.

ESR_MIF is set when an Instruction Storage Interrupt is caused by a Misaligned Instruction Storage Exception or when an Instruction TLB Error Interrupt was caused by a TLB miss on the second half of a misaligned 32-bit instruction.

ESR_BO is set when an Instruction Storage Interrupt is caused by a Mismatched Instruction Storage Exception or a Byte Ordering Instruction Storage Exception.

Programming Note

When an Instruction TLB Error Interrupt occurs as the result of an Instruction TLB miss on the second half of a 32-bit VLE instruction that is aligned to only 16 bits, SRR0 will point to the first half of the instruction and ESR_MIF will be set to 1. Any other status posted as a result of the TLB miss (such as MAS register updates described in TYPE-FSL Memory Management) will reflect the page corresponding to the second half of the instruction which caused the Instruction TLB miss.

Chapter 3. VLE Compatibility with Books I-III

3.1 Overview
3.2 VLE Processor and Storage Control Extensions
3.2.1 Instruction Extensions
3.2.2 MMU Extensions
3.3 VLE Limitations

This chapter addresses the relationship between VLE and Books I-III.

3.1 Overview

Category VLE uses the same semantics as Books I-III. Due to the limited instruction encoding formats, VLE instructions typically support reduced immediate fields and displacements, and not all operations defined by Books I-III are encoded in category VLE. The basic philosophy is to capture all useful operations, with most frequent operations given priority. Immediate fields and displacements are provided to cover the majority of ranges encountered in embedded control code. Instructions are encoded in either a 16- or 32-bit format, and these may be freely intermixed.

VLE instructions cannot access floating-point registers (FPRs). VLE instructions use GPRs and SPRs with the following limitations:

- VLE instructions using the 16-bit formats are limited to addressing GPR0-GPR7 and GPR24-GPR31 in most instructions. Move instructions are provided to transfer register contents between these registers and GPR8-GPR23.
- VLE compare and bit test instructions using the 16-bit formats implicitly set their results in CR0.

VLE instruction encodings are generally different than instructions defined by Books I-III, except that most instructions falling within primary opcode 31 are encoded identically and have identical semantics unless they affect or access a resource not supported by category VLE.

3.2 VLE Processor and Storage Control Extensions

This section describes additional functionality to support category VLE.

3.2.1 Instruction Extensions

This section describes extensions to support VLE operations. Because instructions may reside on a half-word boundary, bit 62 is not masked by instructions that read an instruction address from a register, such as the LR, CTR, or a save/restore register 0, that holds an instruction address.

The instruction set defined by Books I-III is modified to support halfword instruction addressing, as follows:

- Return From Interrupt instructions, such as rfi, rfci, rfdi, and rfmci, no longer mask bit 62 of the respective save/restore register 0. The destination address is SRR0_0:62 || 0b0, CSRR0_0:62 || 0b0, DSRR0_0:62 || 0b0, MCSRR0_0:62 || 0b0 respectively.
- bclr, bclrl, bcctr, and bcctrl no longer mask bit 62 of the LR or CTR. The destination address is LR_0:62 || 0b0 or CTR_0:62 || 0b0.

3.2.2 MMU Extensions

VLE operation is indicated by the VLE storage attribute. When the VLE storage attribute for a page is set to 1, instruction fetches from that page are decoded and processed as VLE instructions. See Section 4.8.3 of Book III-E.

When instructions are executing from a page that has the VLE storage attribute set to 1, the processor is said to be in VLE mode.

3.3 VLE Limitations

VLE instruction fetches are valid only when performed in a Big-Endian mode. Attempting to fetch an instruction in a Little-Endian mode from a page with the VLE storage attribute set causes an Instruction Storage Byte-ordering exception.

Support for concurrent modification and execution of VLE instructions is implementation-dependent.

Chapter 4. Branch Operation Instructions

4.1 Branch Processor Registers
4.1.1 Condition Register (CR)
4.1.1.1 Condition Register Setting for Compare Instructions
4.1.1.2 Condition Register Setting for the Bit Test Instruction
4.1.2 Link Register (LR)
4.1.3 Count Register (CTR)
4.2 Branch Instructions
4.3 System Linkage Instructions
4.4 Condition Register Instructions

This section defines Branch instructions that can be executed when a processor is in VLE mode and the registers that support them.

4.1 Branch Processor Registers

The registers that support branch operations are:

- Section 4.1.1, "Condition Register (CR)"
- Section 4.1.2, "Link Register (LR)"
- Section 4.1.3, "Count Register (CTR)"

4.1.1 Condition Register (CR)

The Condition Register (CR) is a 32-bit register which reflects the result of certain operations, and provides a mechanism for testing (and branching). The CR is more fully defined in Book I.

Category VLE uses the entire CR, but some comparison operations and all Branch instructions are limited to using CR0-CR3. The full Book I condition register field and logical operations are provided, however.

    CR (32:63)

Figure 17. Condition Register

The bits in the Condition Register are grouped into eight 4-bit fields, CR Field 0 (CR0) ... CR Field 7 (CR7), which are set by VLE-defined instructions in one of the following ways.

- Specified fields of the condition register can be set by a move to the CR from a GPR (mtcrf, mtocrf).
- A specified CR field can be set by a move to the CR from another CR field (e_mcrf) or from XER_32:35 (mcrxr).
- CR field 0 can be set as the implicit result of a fixed-point instruction.
- A specified CR field can be set as the result of a fixed-point compare instruction.
- CR field 0 can be set as the result of a fixed-point bit test instruction.

Other instructions from implemented categories may also set bits in the CR in the same manner that they would when not in VLE mode.

Instructions are provided to perform logical operations on individual CR bits and to test individual CR bits.

For all fixed-point instructions in which the Rc bit is defined and set, and for e_add2i., e_and2i., and e_and2is., the first three bits of CR field 0 (CR_32:34) are set by signed comparison of the result to zero, and the fourth bit of CR field 0 (CR_35) is copied from the final state of XER_SO. "Result" here refers to the entire 64-bit value placed into the target register in 64-bit mode, and to bits 32:63 of the value placed into the target register in 32-bit mode.

    if (64-bit mode)
      then M ← 0
      else M ← 32
    if (target_register)_M:63 < 0 then c ← 0b100
    else if (target_register)_M:63 > 0 then c ← 0b010
    else c ← 0b001
    CR0 ← c || XER_SO

If any portion of the result is undefined, the value placed into the first three bits of CR field 0 is undefined.

The bits of CR field 0 are interpreted as shown below.

CR Bit  Description
32      Negative (LT). The result is negative.
33      Positive (GT). The result is positive.
34      Zero (EQ). The result is 0.
35      Summary overflow (SO). This is a copy of the contents of XER_SO at the completion of the instruction.

4.1.1.1 Condition Register Setting for Compare Instructions

For compare instructions, a CR field specified by the BF operand for the e_cmph, e_cmphl, e_cmpi, and e_cmpli instructions, or CR0 for the se_cmpl, e_cmp16i, e_cmph16i, e_cmphl16i, e_cmpl16i, se_cmp, se_cmph, se_cmphl, se_cmpi, and se_cmpli instructions, is set to reflect the result of the comparison. The CR field bits are interpreted as shown below. A complete description of how the bits are set is given in the instruction descriptions and Section 5.6, "Fixed-Point Compare and Bit Test Instructions".

Condition register bit settings for compare instructions are interpreted as follows. (Note: the e_cmpi and e_cmpli instructions have a BF32 field instead of a BF field; for these instructions, BF32 should be substituted for BF in the list below.)

CR Bit      Description
4×BF + 32   Less Than (LT). For signed fixed-point compare, (RA) or (RX) < sci8, SI, (RB), or (RY). For unsigned fixed-point compare, (RA) or (RX) <u sci8, SI, (RB), or (RY).
4×BF + 33   Greater Than (GT). For signed fixed-point compare, (RA) or (RX) > sci8, SI, (RB), or (RY). For unsigned fixed-point compare, (RA) or (RX) >u sci8, UI, UI5, (RB), or (RY).
4×BF + 34   Equal (EQ). For fixed-point compare, (RA) or (RX) = sci8, UI, UI5, SI, (RB), or (RY).
4×BF + 35   Summary Overflow (SO). For fixed-point compare, this is a copy of the contents of XER_SO at the completion of the instruction.

4.1.1.2 Condition Register Setting for the Bit Test Instruction

The Bit Test Immediate instruction, se_btsti, also sets CR field 0. See the instruction description and also Section 5.6, "Fixed-Point Compare and Bit Test Instructions".

4.1.2 Link Register (LR)

VLE instructions use the Link Register (LR) as defined in Book I, although category VLE defines a subset of all variants of Book I conditional branches involving the LR.

4.1.3 Count Register (CTR)

VLE instructions use the Count Register (CTR) as defined in Book I, although category VLE defines a subset of the variants of Book I conditional branches involving the CTR.

4.2 Branch Instructions

The sequence of instruction execution can be changed by the branch instructions. Because VLE instructions must be aligned on half-word boundaries, the low-order bit of the generated branch target address is forced to 0 by the processor in performing the branch.

The branch instructions compute the EA of the target in one of the following ways, as described in Section 2.2, "Instruction Storage Addressing Modes".

1. Adding a displacement to the address of the branch instruction.
2. Using the address contained in the LR (Branch to Link Register [and Link]).
3. Using the address contained in the CTR (Branch to Count Register [and Link]).

Branching can be conditional or unconditional, and the return address can optionally be provided. If the return address is to be provided (LK = 1), the EA of the instruction following the branch instruction is placed into the LR after the branch target address has been computed; this is done regardless of whether the branch is taken.

In branch conditional instructions, the BI32 or BI16 instruction field specifies the CR bit to be tested. For 32-bit instructions using BI32, CR_32:47 (corresponding to bits in CR0:CR3) may be specified. For 16-bit instructions using BI16, only CR_32:35 (bits within CR0) may be specified.

In branch conditional instructions, the BO32 or BO16 field specifies the conditions under which the branch is taken and how the branch is affected by or affects the CR and CTR. Note that VLE instructions also have different encodings for the BO32 and BO16 fields than in Book I's BO field.

If the BO32 field specifies that the CTR is to be decremented, in 64-bit mode CTR_0:63 are decremented, and in 32-bit mode CTR_32:63 are decremented. If BO16 or BO32 specifies a condition that must be TRUE or FALSE, that condition is obtained from the contents of CR_BI32+32 or CR_BI16+32. (Note that CR bits are numbered 32:63. BI32 or BI16 refers to the condition register bit field in the branch instruction encoding. For example, specifying BI32 = 2 refers to CR_34.)

For Figure 18 let M = 0 in 64-bit mode and M = 32 in 32-bit mode.

Encodings for the BO32 field for VLE are shown in Figure 18.

BO32  Description
00    Branch if the condition is false.
01    Branch if the condition is true.
10    Decrement CTR_M:63, then branch if the decremented CTR_M:63 ≠ 0.
11    Decrement CTR_M:63, then branch if the decremented CTR_M:63 = 0.

Figure 18. BO32 field encodings

Encodings for the BO16 field for VLE are shown in Figure 19.

BO16  Description
0     Branch if the condition is false.
1     Branch if the condition is true.

Figure 19. BO16 field encodings

Branch [and Link] BD24-form

    e_b   target_addr   (LK=0)
    e_bl  target_addr   (LK=1)

    30 (0:5) | 0 (6) | BD24 (7:30) | LK (31)

    NIA ←iea CIA + EXTS(BD24 || 0b0)
    if LK then LR ←iea CIA + 4

target_addr specifies the branch target address.

The branch target address is the sum of BD24 || 0b0 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
    LR (if LK=1)

Branch [and Link] BD8-form

    se_b   target_addr   (LK=0)
    se_bl  target_addr   (LK=1)

    58 (0:5) | 0 (6) | LK (7) | BD8 (8:15)

    NIA ←iea CIA + EXTS(BD8 || 0b0)
    if LK then LR ←iea CIA + 2

target_addr specifies the branch target address.

The branch target address is the sum of BD8 || 0b0 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
    LR (if LK=1)

Branch Conditional [and Link] BD15-form

    e_bc   BO32,BI32,target_addr   (LK=0)
    e_bcl  BO32,BI32,target_addr   (LK=1)

    30 (0:5) | 8 (6:9) | BO32 (10:11) | BI32 (12:15) | BD15 (16:30) | LK (31)

    if (64-bit mode)
      then M ← 0
      else M ← 32
    if BO32_0 then CTR_M:63 ← CTR_M:63 - 1
    ctr_ok ← ¬BO32_0 | ((CTR_M:63 ≠ 0) ⊕ BO32_1)
    cond_ok ← BO32_0 | (CR_BI32+32 ≡ BO32_1)
    if ctr_ok & cond_ok then
      NIA ←iea (CIA + EXTS(BD15 || 0b0))
    else
      NIA ←iea CIA + 4
    if LK then LR ←iea CIA + 4

The BI32 field specifies the Condition Register bit to be tested. The BO32 field is used to resolve the branch as described in Figure 18. target_addr specifies the branch target address.

The branch target address is the sum of BD15 || 0b0 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
    CTR (if BO32_0=1)
    LR (if LK=1)

Branch Conditional Short Form BD8-form

    se_bc  BO16,BI16,target_addr

    28 (0:4) | BO16 (5) | BI16 (6:7) | BD8 (8:15)

    cond_ok ← (CR_BI16+32 ≡ BO16)
    if cond_ok then
      NIA ←iea CIA + EXTS(BD8 || 0b0)
    else
      NIA ←iea CIA + 2

The BI16 field specifies the Condition Register bit to be tested. The BO16 field is used to resolve the branch as described in Figure 19. target_addr specifies the branch target address.

The branch target address is the sum of BD8 || 0b0 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

Special Registers Altered:
    None

Branch to Count Register [and Link] C-form

    se_bctr   (LK=0)
    se_bctrl  (LK=1)

    03 (0:14) | LK (15)

    NIA ←iea CTR_0:62 || 0b0
    if LK then LR ←iea CIA + 2

The branch target address is CTR_0:62 || 0b0 with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
    LR (if LK=1)

Branch to Link Register [and Link] C-form

    se_blr   (LK=0)
    se_blrl  (LK=1)

    02 (0:14) | LK (15)

    NIA ←iea LR_0:62 || 0b0
    if LK then LR ←iea CIA + 2

The branch target address is LR_0:62 || 0b0 with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered:
    LR (if LK=1)

4.3 System Linkage Instructions

The System Linkage instructions enable the program to call upon the system to perform a service and provide a means by which the system can return from performing a service or from processing an interrupt. System Linkage instructions defined by the VLE category are identical in semantics to System Linkage instructions defined in Book I and Book III-E with the exception of the LEV field, but are encoded differently.

se_sc provides the same functionality as the Book I (and Book III-E) instruction sc without the LEV field. se_rfi, se_rfci, se_rfdi, and se_rfmci provide the same functionality as the Book III-E instructions rfi, rfci, rfdi, and rfmci respectively.

System Call C-form

    se_sc

    02 (0:15)

    SRR1 ←iea MSR
    SRR0 ← CIA+2
    NIA ←iea IVPR_0:47 || IVOR8_48:59 || 0b0000
    MSR ← new_value (see below)

The effective address of the instruction following the System Call instruction is placed into SRR0. The contents of the MSR are copied into SRR1.

Then a System Call interrupt is generated. The interrupt causes the MSR to be set as described in Section 5.6 of Book III-E.

The interrupt causes the next instruction to be fetched from effective address IVPR_0:47 || IVOR8_48:59 || 0b0000.

This instruction is context synchronizing.

Special Registers Altered:
    SRR0 SRR1 MSR

Illegal C-form

    se_illegal

    0 (0:15)

se_illegal is used to request an Illegal Instruction exception.

The behavior is the same as if an illegal instruction was executed.

This instruction is context synchronizing.

Special Registers Altered:
    SRR0 SRR1 MSR ESR

Return From Critical Interrupt C-form

    se_rfci

    09 (0:15)

    MSR ← CSRR1
    NIA ←iea CSRR0_0:62 || 0b0

The se_rfci instruction is used to return from a critical class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously.

The contents of CSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address CSRR0_0:62 || 0b0. If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the values placed into the save/restore registers by the interrupt processing mechanism (see Chapter 5 of Book III-E) are the address and MSR value of the instruction that would have been executed next had the interrupt not occurred (that is, the address in CSRR0 at the time of the execution of the se_rfci).

This instruction is privileged and context synchronizing.

Special Registers Altered:
    MSR

Return From Machine Check Interrupt C-form

    se_rfmci

    11 (0:15)

    MSR ← MCSRR1
    NIA ←iea MCSRR0_0:62 || 0b0

The se_rfmci instruction is used to return from a machine check class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously.

The contents of MCSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address MCSRR0_0:62 || 0b0. If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the values placed into the save/restore registers by the interrupt processing mechanism (see Chapter 5 of Book III-E) are the address and MSR value of the instruction that would have been executed next had the interrupt not occurred (that is, the address in MCSRR0 at the time of the execution of the se_rfmci).

This instruction is privileged and context synchronizing.

Special Registers Altered:
    MSR
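The return-from-interrupt instructions in this section form the next instruction address as bits 0:62 of the relevant save/restore register 0 concatenated with 0b0, so bit 62 (the halfword bit) is preserved and only bit 63 is cleared. A minimal sketch of that address formation follows; the function name return_nia is hypothetical, and clearing the high-order 32 bits in 32-bit mode is an assumption based on the "←iea" notation rather than something stated in these definitions.

```python
MASK64 = (1 << 64) - 1


def return_nia(save_restore_0, mode64=True):
    """Model NIA <-iea xSRR0[0:62] || 0b0 for se_rfi, se_rfci, se_rfdi, se_rfmci."""
    # Keep the 63 high-order bits (architectural bits 0:62) and force bit 63 to 0.
    nia = save_restore_0 & MASK64 & ~1
    # Assumption: an effective address has its high-order 32 bits cleared in 32-bit mode.
    return nia if mode64 else nia & 0xFFFF_FFFF


print(hex(return_nia(0x1002)))  # 0x1002: a halfword-aligned address is kept as-is
print(hex(return_nia(0x1003)))  # 0x1002: only the low-order bit is dropped
```

Under Books I-III without VLE, the same instructions would also have masked bit 62, which is why Section 3.2.1 lists this as an instruction extension.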
Return From Interrupt C-form

se_rfi

Encoding: 08 (0:15)

    MSR ← SRR1
    NIA ←_{iea} SRR0_{0:62} || 0b0

The se_rfi instruction is used to return from a non-critical class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously.

The contents of SRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address SRR0_{0:62} || 0b0. If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the values placed into the save/restore registers by the interrupt processing mechanism (see Chapter 5 of Book III-E) are the address and MSR value of the instruction that would have been executed next had the interrupt not occurred (that is, the address in SRR0 at the time of the execution of the se_rfi).

This instruction is privileged and context synchronizing.

Special Registers Altered:
    MSR

Return From Debug Interrupt C-form

se_rfdi

Encoding: 10 (0:15)

    MSR ← DSRR1
    NIA ←_{iea} DSRR0_{0:62} || 0b0

The se_rfdi instruction is used to return from a debug class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously.

The contents of DSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address DSRR0_{0:62} || 0b0. If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into the save/restore registers by the interrupt processing mechanism (see Chapter 5 of Book III-E) is the address of the instruction that would have been executed next had the interrupt not occurred (that is, the address in DSRR0 at the time of the execution of se_rfdi).

This instruction is privileged and context synchronizing.

Special Registers Altered:
    MSR

Corequisite Categories:
    Embedded.Enhanced Debug

4.4 Condition Register Instructions

Condition Register instructions are provided to transfer values to and from various portions of the CR. Category VLE does not introduce any additional functionality beyond that defined in Book I for CR operations, but does remap the CR-logical and mcrf instruction functionality into primary opcode 31. These instructions operate identically to the Book I instructions, but are encoded differently.

Condition Register AND XL-form

e_crand BT,BA,BB

Encoding: 31 (0:5) | BT (6:10) | BA (11:15) | BB (16:20) | 257 (21:30) | / (31)

    CR_{BT+32} ← CR_{BA+32} & CR_{BB+32}

The bit in the Condition Register specified by BA+32 is ANDed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR_{BT+32}

Condition Register AND with Complement XL-form

e_crandc BT,BA,BB

Encoding: 31 (0:5) | BT (6:10) | BA (11:15) | BB (16:20) | 129 (21:30) | / (31)

    CR_{BT+32} ← CR_{BA+32} & ¬CR_{BB+32}

The bit in the Condition Register specified by BA+32 is ANDed with the one's complement of the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR_{BT+32}

Condition Register Equivalent XL-form

e_creqv BT,BA,BB

Encoding: 31 (0:5) | BT (6:10) | BA (11:15) | BB (16:20) | 289 (21:30) | / (31)

    CR_{BT+32} ← CR_{BA+32} ≡ CR_{BB+32}

The bit in the Condition Register specified by BA+32 is XORed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR_{BT+32}

Condition Register NAND XL-form

e_crnand BT,BA,BB

Encoding: 31 (0:5) | BT (6:10) | BA (11:15) | BB (16:20) | 225 (21:30) | / (31)

    CR_{BT+32} ← ¬(CR_{BA+32} & CR_{BB+32})

The bit in the Condition Register specified by BA+32 is ANDed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR_{BT+32}

Condition Register NOR XL-form

e_crnor BT,BA,BB

Encoding: 31 (0:5) | BT (6:10) | BA (11:15) | BB (16:20) | 33 (21:30) | / (31)

    CR_{BT+32} ← ¬(CR_{BA+32} | CR_{BB+32})

The bit in the Condition Register specified by BA+32 is ORed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR_{BT+32}

Condition Register OR XL-form

e_cror BT,BA,BB

Encoding: 31 (0:5) | BT (6:10) | BA (11:15) | BB (16:20) | 449 (21:30) | / (31)

    CR_{BT+32} ← CR_{BA+32} | CR_{BB+32}

The bit in the Condition Register specified by BA+32 is ORed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR_{BT+32}

Condition Register OR with Complement XL-form

e_crorc BT,BA,BB

Encoding: 31 (0:5) | BT (6:10) | BA (11:15) | BB (16:20) | 417 (21:30) | / (31)

    CR_{BT+32} ← CR_{BA+32} | ¬CR_{BB+32}

The bit in the Condition Register specified by BA+32 is ORed with the complement of the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR_{BT+32}

Condition Register XOR XL-form

e_crxor BT,BA,BB

Encoding: 31 (0:5) | BT (6:10) | BA (11:15) | BB (16:20) | 193 (21:30) | / (31)

    CR_{BT+32} ← CR_{BA+32} ⊕ CR_{BB+32}

The bit in the Condition Register specified by BA+32 is XORed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32.

Special Registers Altered:
    CR_{BT+32}

Move CR Field XL-form

e_mcrf BF,BFA

Encoding: 31 (0:5) | BF (6:8) | // (9:10) | BFA (11:15) | ///// (16:20) | 16 (21:30) | / (31)

    CR_{4×BF+32:4×BF+35} ← CR_{4×BFA+32:4×BFA+35}

The contents of Condition Register field BFA are copied to Condition Register field BF.

Special Registers Altered:
    CR field BF

Chapter 5. Fixed-Point Instructions

5.1 Fixed-Point Load Instructions
5.2 Fixed-Point Store Instructions
5.3 Fixed-Point Load and Store with Byte Reversal Instructions
5.4 Fixed-Point Load and Store Multiple Instructions
5.5 Fixed-Point Arithmetic Instructions
5.6 Fixed-Point Compare and Bit Test Instructions
5.7 Fixed-Point Trap Instructions
5.8 Fixed-Point Select Instruction
5.9 Fixed-Point Logical, Bit, and Move Instructions
5.10 Fixed-Point Rotate and Shift Instructions
5.11 Move To/From System Register Instructions

This section lists the fixed-point instructions supported by category VLE.

5.1 Fixed-Point Load Instructions

The fixed-point Load instructions compute the effective address (EA) of the memory to be accessed as described in Section 2.1, "Data Storage Addressing Modes".

The byte, halfword, word, or doubleword in storage addressed by EA is loaded into RT or RZ.

Category VLE supports both Big- and Little-Endian byte ordering for data accesses.

Some fixed-point load instructions have an update form in which RA is updated with the EA. For these forms, if RA≠0 and RA≠RT, the EA is placed into RA and the memory element (byte, halfword, word, or doubleword) addressed by EA is loaded into RT. If RA=0 or RA=RT, the instruction form is invalid. This is the same behavior as specified for load with update instructions in Book I.

The fixed-point Load instructions from Book I, lbzx, lbzux, lhzx, lhzux, lwzx, and lwzux, are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I. See Section 3.3.2 of Book I for the instruction definitions.

The fixed-point Load instructions from Book I, lwax, lwaux, ldx, and ldux, are available while executing in VLE mode on 64-bit implementations. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I. See Section 3.3.2 of Book I for the instruction definitions.

Load Byte and Zero D-form

e_lbz RT,D(RA)

Encoding: 12 (0:5) | RT (6:10) | RA (11:15) | D (16:31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    RT ← ^{56}0 || MEM(EA, 1)

Let the effective address (EA) be the sum (RA|0) + D. The byte in storage addressed by EA is loaded into RT_{56:63}. RT_{0:55} are set to 0.

Special Registers Altered:
    None

Load Byte and Zero Short Form SD4-form

se_lbz RZ,SD4(RX)

Encoding: 08 (0:3) | SD4 (4:7) | RZ (8:11) | RX (12:15)

    EA ← (RX) + (^{60}0 || SD4)
    RZ ← ^{56}0 || MEM(EA, 1)

Let the effective address (EA) be the sum (RX) + SD4. The byte in storage addressed by EA is loaded into RZ_{56:63}. RZ_{0:55} are set to 0.

Special Registers Altered:
    None

Load Byte and Zero with Update D8-form

e_lbzu RT,D8(RA)

Encoding: 06 (0:5) | RT (6:10) | RA (11:15) | 00 (16:23) | D8 (24:31)

    EA ← (RA) + EXTS(D8)
    RT ← ^{56}0 || MEM(EA, 1)
    RA ← EA

Let the effective address (EA) be the sum (RA) + D8. The byte in storage addressed by EA is loaded into RT_{56:63}. RT_{0:55} are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Halfword Algebraic D-form

e_lha RT,D(RA)

Encoding: 14 (0:5) | RT (6:10) | RA (11:15) | D (16:31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    RT ← EXTS(MEM(EA, 2))

Let the effective address (EA) be the sum (RA|0) + D. The halfword in storage addressed by EA is loaded into RT_{48:63}. RT_{0:47} are filled with a copy of bit 0 of the loaded halfword.

Special Registers Altered:
    None

Load Halfword and Zero D-form

e_lhz RT,D(RA)

Encoding: 22 (0:5) | RT (6:10) | RA (11:15) | D (16:31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    RT ← ^{48}0 || MEM(EA, 2)

Let the effective address (EA) be the sum (RA|0) + D. The halfword in storage addressed by EA is loaded into RT_{48:63}. RT_{0:47} are set to 0.

Special Registers Altered:
    None

Load Halfword and Zero Short Form SD4-form

se_lhz RZ,SD4(RX)

Encoding: 10 (0:3) | SD4 (4:7) | RZ (8:11) | RX (12:15)

    EA ← (RX) + (^{59}0 || SD4 || 0)
    RZ ← ^{48}0 || MEM(EA, 2)

Let the effective address (EA) be the sum (RX) + (SD4 || 0). The halfword in storage addressed by EA is loaded into RZ_{48:63}. RZ_{0:47} are set to 0.

Special Registers Altered:
    None

Load Halfword Algebraic with Update D8-form

e_lhau RT,D8(RA)

Encoding: 06 (0:5) | RT (6:10) | RA (11:15) | 03 (16:23) | D8 (24:31)

    EA ← (RA) + EXTS(D8)
    RT ← EXTS(MEM(EA, 2))
    RA ← EA

Let the effective address (EA) be the sum (RA) + D8. The halfword in storage addressed by EA is loaded into RT_{48:63}. RT_{0:47} are filled with a copy of bit 0 of the loaded halfword.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Halfword and Zero with Update D8-form

e_lhzu RT,D8(RA)

Encoding: 06 (0:5) | RT (6:10) | RA (11:15) | 01 (16:23) | D8 (24:31)

    EA ← (RA) + EXTS(D8)
    RT ← ^{48}0 || MEM(EA, 2)
    RA ← EA

Let the effective address (EA) be the sum (RA) + D8. The halfword in storage addressed by EA is loaded into RT_{48:63}. RT_{0:47} are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

Load Word and Zero D-form

e_lwz RT,D(RA)

Encoding: 20 (0:5) | RT (6:10) | RA (11:15) | D (16:31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    RT ← ^{32}0 || MEM(EA, 4)

Let the effective address (EA) be the sum (RA|0) + D. The word in storage addressed by EA is loaded into RT_{32:63}. RT_{0:31} are set to 0.

Special Registers Altered:
    None

Load Word and Zero Short Form SD4-form

se_lwz RZ,SD4(RX)

Encoding: 12 (0:3) | SD4 (4:7) | RZ (8:11) | RX (12:15)

    EA ← (RX) + (^{58}0 || SD4 || ^{2}0)
    RZ ← ^{32}0 || MEM(EA, 4)

Let the effective address (EA) be the sum (RX) + (SD4 || 00). The word in storage addressed by EA is loaded into RZ_{32:63}. RZ_{0:31} are set to 0.

Special Registers Altered:
    None

Load Word and Zero with Update D8-form

e_lwzu RT,D8(RA)

Encoding: 06 (0:5) | RT (6:10) | RA (11:15) | 02 (16:23) | D8 (24:31)

    EA ← (RA) + EXTS(D8)
    RT ← ^{32}0 || MEM(EA, 4)
    RA ← EA

Let the effective address (EA) be the sum (RA) + D8. The word in storage addressed by EA is loaded into RT_{32:63}. RT_{0:31} are set to 0.

EA is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:
    None

5.2 Fixed-Point Store Instructions

The fixed-point Store instructions compute the EA of the memory to be accessed as described in Section 2.1, "Data Storage Addressing Modes".

The contents of register RS or RZ are stored into the byte, halfword, word, or doubleword in storage addressed by EA.

Category VLE supports both Big- and Little-Endian byte ordering for data accesses.

Some fixed-point store instructions have an update form, in which register RA is updated with the effective address. For these forms, the following rules (from Book I) apply.

- If RA≠0, the effective address is placed into register RA.
- If RS=RA, the contents of register RS are copied to the target memory element and then EA is placed into register RA (RS).

The fixed-point Store instructions from Book I, stbx, stbux, sthx, sthux, stwx, and stwux, are available while executing in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Section 3.3.3 of Book I for the instruction definitions.

The fixed-point Store instructions from Book I, stdx and stdux, are available while executing in VLE mode on 64-bit implementations. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I; see Section 3.3.3 of Book I for the instruction definitions.
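The update-form store rules above can be illustrated with a minimal Python sketch of a word store with update. This is only an illustration under stated assumptions: the `gpr` list, the `mem` dictionary, and the function names are hypothetical stand-ins for the register file and storage, big-endian byte order is assumed, and addresses wrap at 64 bits.

```python
MASK64 = (1 << 64) - 1

def exts(value, bits):
    """Sign-extend a `bits`-wide field to a Python int."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def store_word_with_update(gpr, mem, rs, ra, d8):
    """Hypothetical model of an update-form word store.

    MEM(EA, 4) <- (RS)[32:63], then RA <- EA.
    """
    if ra == 0:
        raise ValueError("invalid instruction form: RA=0")
    ea = (gpr[ra] + exts(d8, 8)) & MASK64
    word = gpr[rs] & 0xFFFFFFFF           # source read before RA is updated
    for i in range(4):                    # big-endian byte order assumed
        mem[(ea + i) & MASK64] = (word >> (24 - 8 * i)) & 0xFF
    gpr[ra] = ea                          # update happens after the store
    return ea
```

Reading the source register before writing RA is what makes the RS=RA case store the old register contents, as the second rule requires.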
Store Byte D-form

e_stb RS,D(RA)

Encoding: 13 (0:5) | RS (6:10) | RA (11:15) | D (16:31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    MEM(EA, 1) ← (RS)_{56:63}

Let the effective address (EA) be the sum (RA|0) + D. (RS)_{56:63} are stored in the byte in storage addressed by EA.

Special Registers Altered:
    None

Store Byte Short Form SD4-form

se_stb RZ,SD4(RX)

Encoding: 09 (0:3) | SD4 (4:7) | RZ (8:11) | RX (12:15)

    EA ← (RX) + (^{60}0 || SD4)
    MEM(EA, 1) ← (RZ)_{56:63}

Let the effective address (EA) be the sum (RX) + SD4. (RZ)_{56:63} are stored in the byte in storage addressed by EA.

Special Registers Altered:
    None

Store Byte with Update D8-form

e_stbu RS,D8(RA)

Encoding: 06 (0:5) | RS (6:10) | RA (11:15) | 04 (16:23) | D8 (24:31)

    EA ← (RA) + EXTS(D8)
    MEM(EA, 1) ← (RS)_{56:63}
    RA ← EA

Let the effective address (EA) be the sum (RA) + D8. (RS)_{56:63} are stored in the byte in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Halfword D-form

e_sth RS,D(RA)

Encoding: 23 (0:5) | RS (6:10) | RA (11:15) | D (16:31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    MEM(EA, 2) ← (RS)_{48:63}

Let the effective address (EA) be the sum (RA|0) + D. (RS)_{48:63} are stored in the halfword in storage addressed by EA.

Special Registers Altered:
    None

Store Halfword Short Form SD4-form

se_sth RZ,SD4(RX)

Encoding: 11 (0:3) | SD4 (4:7) | RZ (8:11) | RX (12:15)

    EA ← (RX) + (^{59}0 || SD4 || 0)
    MEM(EA, 2) ← (RZ)_{48:63}

Let the effective address (EA) be the sum (RX) + (SD4 || 0). (RZ)_{48:63} are stored in the halfword in storage addressed by EA.

Special Registers Altered:
    None

Store Halfword with Update D8-form

e_sthu RS,D8(RA)

Encoding: 06 (0:5) | RS (6:10) | RA (11:15) | 05 (16:23) | D8 (24:31)

    EA ← (RA) + EXTS(D8)
    MEM(EA, 2) ← (RS)_{48:63}
    RA ← EA

Let the effective address (EA) be the sum (RA) + D8. (RS)_{48:63} are stored in the halfword in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

Store Word D-form

e_stw RS,D(RA)

Encoding: 21 (0:5) | RS (6:10) | RA (11:15) | D (16:31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D)
    MEM(EA, 4) ← (RS)_{32:63}

Let the effective address (EA) be the sum (RA|0) + D. (RS)_{32:63} are stored in the word in storage addressed by EA.

Special Registers Altered:
    None

Store Word Short Form SD4-form

se_stw RZ,SD4(RX)

Encoding: 13 (0:3) | SD4 (4:7) | RZ (8:11) | RX (12:15)

    EA ← (RX) + (^{58}0 || SD4 || ^{2}0)
    MEM(EA, 4) ← (RZ)_{32:63}

Let the effective address (EA) be the sum (RX) + (SD4 || 00). (RZ)_{32:63} are stored in the word in storage addressed by EA.

Special Registers Altered:
    None

Store Word with Update D8-form

e_stwu RS,D8(RA)

Encoding: 06 (0:5) | RS (6:10) | RA (11:15) | 06 (16:23) | D8 (24:31)

    EA ← (RA) + EXTS(D8)
    MEM(EA, 4) ← (RS)_{32:63}
    RA ← EA

Let the effective address (EA) be the sum (RA) + D8. (RS)_{32:63} are stored in the word in storage addressed by EA.

EA is placed into register RA.

If RA=0, the instruction form is invalid.

Special Registers Altered:
    None

5.3 Fixed-Point Load and Store with Byte Reversal Instructions

The fixed-point Load with Byte Reversal and Store with Byte Reversal instructions from Book I, lhbrx, lwbrx, sthbrx, and stwbrx, are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I. See Section 3.3.4 of Book I for the instruction definitions.

5.4 Fixed-Point Load and Store Multiple Instructions

The Load/Store Multiple instructions have preferred forms; see Section 1.8.1 of Book I. In the preferred forms, storage alignment satisfies the following rule.

- The combination of the EA and RT (RS) is such that the low-order byte of GPR 31 is loaded (stored) from (into) the last byte of an aligned quadword in storage.
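The preferred-form alignment rule above can be checked with a few lines of Python. This is a rough sketch under assumptions that are not stated in the text: big-endian byte order (so the low-order byte of a word is its last byte in storage), one 4-byte word transferred per register from the first register through GPR 31, and a hypothetical helper name.

```python
def is_preferred_form(ea, first_reg):
    """Return True if the low-order byte of GPR 31 maps to the last byte
    of an aligned quadword (16 bytes), assuming big-endian words."""
    if not 0 <= first_reg <= 31:
        raise ValueError("register number out of range")
    n_words = 32 - first_reg              # registers first_reg..31
    last_byte = ea + 4 * n_words - 1      # address of GPR 31's low-order byte
    return last_byte % 16 == 15
```

For example, transferring GPRs 28 through 31 starting at address 0x1000 ends exactly at 0x100F, the last byte of an aligned quadword.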
Load Multiple Word D8-form

e_lmw RT,D8(RA)

Encoding: 06 (0:5) | RT (6:10) | RA (11:15) | 08 (16:23) | D8 (24:31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D8)
    r ← RT
    do while r ≤ 31
        GPR(r) ← ^{32}0 || MEM(EA, 4)
        r ← r + 1
        EA ← EA + 4

Let n = (32-RT). Let the effective address (EA) be the sum (RA|0) + D8.

n consecutive words starting at EA are loaded into the low-order 32 bits of GPRs RT through 31. The high-order 32 bits of these GPRs are set to zero.

If RA is in the range of registers to be loaded, including the case in which RA = 0, the instruction form is invalid.

Special Registers Altered:
    None

Store Multiple Word D8-form

e_stmw RS,D8(RA)

Encoding: 06 (0:5) | RS (6:10) | RA (11:15) | 09 (16:23) | D8 (24:31)

    if RA = 0 then b ← 0
    else b ← (RA)
    EA ← b + EXTS(D8)
    r ← RS
    do while r ≤ 31
        MEM(EA, 4) ← GPR(r)_{32:63}
        r ← r + 1
        EA ← EA + 4

Let n = (32-RS). Let the effective address (EA) be the sum (RA|0) + D8.

n consecutive words starting at EA are stored from the low-order 32 bits of GPRs RS through 31.

Special Registers Altered:
    None

5.5 Fixed-Point Arithmetic Instructions

The fixed-point Arithmetic instructions use the contents of the GPRs as source operands, and place results into GPRs, into status bits in the XER and into CR0.

The fixed-point Arithmetic instructions treat source operands as signed integers unless the instruction is explicitly identified as performing an unsigned operation.

The e_add2i. instruction and other Arithmetic instructions with Rc=1 set the first three bits of CR0 to characterize the result placed into the target register. In 64-bit mode, these bits are set by signed comparison of the result to 0. In 32-bit mode, these bits are set by signed comparison of the low-order 32 bits of the result to zero.

e_addic[.] and e_subfic[.] always set CA to reflect the carry out of bit 0 in 64-bit mode and out of bit 32 in 32-bit mode.

The fixed-point Arithmetic instructions from Book I, add[.], addo[.], addc[.], addco[.], adde[.], addeo[.], addme[.], addmeo[.], addze[.], addzeo[.], divw[.], divwo[.], divwu[.], divwuo[.], mulhw[.], mulhwu[.], mullw[.], mullwo[.], neg[.], nego[.], subf[.], subfo[.], subfe[.], subfeo[.], subfme[.], subfmeo[.], subfze[.], subfzeo[.], subfc[.], and subfco[.], are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I; see Section 3.3.8 of Book I for the instruction definitions.

The fixed-point Arithmetic instructions from Book I, mulld[.], mulldo[.], mulhd[.], mulhdu[.], divd[.], divdo[.], divdu[.], and divduo[.], are available while executing in VLE mode on 64-bit implementations. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Section 3.3.8 of Book I for the instruction definitions.

Add Short Form RR-form

se_add RX,RY

Encoding: 01 (0:5) | 0 (6:7) | RY (8:11) | RX (12:15)

    RX ← (RX) + (RY)

The sum (RX) + (RY) is placed into register RX.

Special Registers Altered:
    None

Add Immediate D-form

e_add16i RT,RA,SI

Encoding: 07 (0:5) | RT (6:10) | RA (11:15) | SI (16:31)

    RT ← (RA) + EXTS(SI)

The sum (RA) + SI is placed into register RT.

Special Registers Altered:
    None

Add (2 operand) Immediate and Record I16A-form

e_add2i. RA,si

Encoding: 28 (0:5) | si (6:10) | RA (11:15) | 17 (16:20) | si (21:31)

    RA ← (RA) + EXTS(si)

The sum (RA) + si is placed into register RA.

Special Registers Altered:
    CR0

Add (2 operand) Immediate Shifted I16A-form

e_add2is RA,si

Encoding: 28 (0:5) | si (6:10) | RA (11:15) | 18 (16:20) | si (21:31)

    RA ← (RA) + EXTS(si || ^{16}0)

The sum (RA) + (si || 0x0000) is placed into register RA.

Special Registers Altered:
    None

Add Scaled Immediate SCI8-form

e_addi RT,RA,sci8 (Rc=0)
e_addi. RT,RA,sci8 (Rc=1)

Encoding: 06 (0:5) | RT (6:10) | RA (11:15) | 8 (16:19) | Rc (20) | F (21) | SCL (22:23) | UI8 (24:31)

    sci8 ← ^{56-SCL×8}F || UI8 || ^{SCL×8}F
    RT ← (RA) + sci8

The sum (RA) + sci8 is placed into register RT.

Special Registers Altered:
    CR0 (if Rc=1)

Add Immediate Short Form OIM5-form

se_addi RX,oimm

Encoding: 08 (0:5) | 0 (6) | OIM5 (7:11) | RX (12:15)

    oimm ← (^{59}0 || OIM5) + 1
    RX ← (RX) + oimm

The sum (RX) + oimm is placed into RX. The value of oimm must be in the range of 1 to 32.

Special Registers Altered:
    None

Add Scaled Immediate Carrying SCI8-form

e_addic RT,RA,sci8 (Rc=0)
e_addic. RT,RA,sci8 (Rc=1)

Encoding: 06 (0:5) | RT (6:10) | RA (11:15) | 9 (16:19) | Rc (20) | F (21) | SCL (22:23) | UI8 (24:31)

    sci8 ← ^{56-SCL×8}F || UI8 || ^{SCL×8}F
    RT ← (RA) + sci8

The sum (RA) + sci8 is placed into register RT.

Special Registers Altered:
    CR0 (if Rc=1)
    CA

Subtract RR-form

se_sub RX,RY

Encoding: 01 (0:5) | 2 (6:7) | RY (8:11) | RX (12:15)

    RX ← (RX) + ¬(RY) + 1

The sum (RX) + ¬(RY) + 1 is placed into register RX.

Special Registers Altered:
    None

Subtract From Short Form RR-form

se_subf RX,RY

Encoding: 01 (0:5) | 3 (6:7) | RY (8:11) | RX (12:15)

    RX ← ¬(RX) + (RY) + 1

The sum ¬(RX) + (RY) + 1 is placed into register RX.

Special Registers Altered:
    None

Subtract From Scaled Immediate Carrying SCI8-form

e_subfic RT,RA,sci8 (Rc=0)
e_subfic. RT,RA,sci8 (Rc=1)

Encoding: 06 (0:5) | RT (6:10) | RA (11:15) | 11 (16:19) | Rc (20) | F (21) | SCL (22:23) | UI8 (24:31)

    sci8 ← ^{56-SCL×8}F || UI8 || ^{SCL×8}F
    RT ← ¬(RA) + sci8 + 1

The sum ¬(RA) + sci8 + 1 is placed into register RT.

Special Registers Altered:
    CR0 (if Rc=1)
    CA

Subtract Immediate OIM5-form

se_subi RX,oimm (Rc=0)
se_subi. RX,oimm (Rc=1)

Encoding: 09 (0:5) | Rc (6) | OIM5 (7:11) | RX (12:15)

    oimm ← (^{59}0 || OIM5) + 1
    RX ← (RX) + ¬oimm + 1

The sum (RX) + ¬oimm + 1 is placed into register RX. The value of oimm must be in the range 1 to 32.

Special Registers Altered:
    CR0 (if Rc=1)

Multiply Low Scaled Immediate SCI8-form

e_mulli RT,RA,sci8

Encoding: 06 (0:5) | RT (6:10) | RA (11:15) | 20 (16:20) | F (21) | SCL (22:23) | UI8 (24:31)

    sci8 ← ^{56-SCL×8}F || UI8 || ^{SCL×8}F
    prod_{0:127} ← (RA) × sci8
    RT ← prod_{64:127}

The 64-bit first operand is (RA). The 64-bit second operand is the sci8 operand. The low-order 64 bits of the 128-bit product of the operands are placed into register RT.

Both operands and the product are interpreted as signed integers.

Special Registers Altered:
    None

Multiply (2 operand) Low Immediate I16A-form

e_mull2i RA,si

Encoding: 28 (0:5) | si (6:10) | RA (11:15) | 20 (16:20) | si (21:31)

    prod_{0:127} ← (RA) × EXTS(si)
    RA ← prod_{64:127}

The 64-bit first operand is (RA). The 64-bit second operand is the sign-extended value of the si operand. The low-order 64 bits of the 128-bit product of the operands are placed into register RA.

Both operands and the product are interpreted as signed integers.

Special Registers Altered:
    None

Multiply Low Word Short Form RR-form

se_mullw RX,RY

Encoding: 01 (0:5) | 1 (6:7) | RY (8:11) | RX (12:15)

    RX ← (RX)_{32:63} × (RY)_{32:63}

The 32-bit operands are the low-order 32 bits of RX and of RY. The 64-bit product of the operands is placed into register RX.

Both operands and the product are interpreted as signed integers.

Special Registers Altered:
    None

Negate Short Form R-form

se_neg RX

Encoding: 0 (0:5) | 03 (6:11) | RX (12:15)

    RX ← ¬(RX) + 1

The sum ¬(RX) + 1 is placed into register RX.

If the processor is in 64-bit mode and register RX contains the most negative 64-bit number (0x8000_0000_0000_0000), the result is the most negative 64-bit number. Similarly, if the processor is in 32-bit mode and register RX contains the most negative 32-bit number (0x8000_0000), the result is the most negative 32-bit number.

Special Registers Altered:
    None

5.6 Fixed-Point Compare and Bit Test Instructions

The fixed-point Compare instructions compare the contents of register RA or register RX with one of the following:

- The value of the scaled immediate field sci8 formed from the F, UI8, and SCL fields as:
      sci8 ← ^{56-SCL×8}F || UI8 || ^{SCL×8}F
- The zero-extended value of the UI field
- The zero-extended value of the UI5 field
- The sign-extended value of the SI field
- The contents of register RB or register RY.

The fixed-point Bit Test instruction tests the bit specified by the UI5 instruction field and sets the CR0 field as follows.

    Bit  Name  Description
    0    LT    Always set to 0
    1    GT    RX_{ui5} = 1
    2    EQ    RX_{ui5} = 0
    3    SO    Summary overflow from the XER
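As an illustration of the sci8 formation given in the list above, the following Python sketch builds the 64-bit scaled immediate from the F, SCL, and UI8 fields. The function name and the use of plain Python integers as 64-bit values are assumptions made for this example only.

```python
MASK64 = (1 << 64) - 1

def sci8(f, scl, ui8):
    """Form the 64-bit scaled immediate: UI8 is placed in byte position SCL
    (0 = least significant byte) and every other bit is filled with F."""
    if scl not in (0, 1, 2, 3):
        raise ValueError("SCL is a 2-bit field")
    shift = 8 * scl
    fill = MASK64 if f else 0
    byte_mask = 0xFF << shift
    return (fill & ~byte_mask & MASK64) | ((ui8 & 0xFF) << shift)
```

With F=0 the result is UI8 shifted left by 0, 8, 16, or 24 bits; with F=1 the surrounding bits are all ones, which allows small negative constants and byte-clearing masks to be encoded.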
The following comparisons are signed: e_cmph, e_cmpi, e_cmp16i, e_cmph16i, se_cmp, se_cmph, and se_cmpi.

The following comparisons are unsigned: e_cmphl, e_cmpli, e_cmphl16i, e_cmpl16i, se_cmpli, se_cmpl, and se_cmphl.

The fixed-point Compare instructions from Book I, cmp and cmpl, are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I; see Section 3.3.9 of Book I for the instruction definitions.

Bit Test Immediate IM5-form

se_btsti RX,UI5

Encoding: 25 (0:5) | 1 (6) | UI5 (7:11) | RX (12:15)

    a ← UI5
    b ← ^{a+32}0 || 1 || ^{31-a}0
    c ← (RX) & b
    if c = 0 then d ← 0b001 else d ← 0b010
    CR0 ← d || XER_{SO}

Bit UI5+32 of register RX is tested for equality to '0' and the result is recorded in CR0. EQ is set if the tested bit is 0, LT is cleared, and GT is set to the inverse value of EQ.

Special Registers Altered:
    CR0

Compare Immediate Word I16A-form

e_cmp16i RA,si

Encoding: 28 (0:5) | si (6:10) | RA (11:15) | 19 (16:20) | si (21:31)

    b ← EXTS(si)
    if (RA)_{32:63} < b_{32:63} then c ← 0b100
    if (RA)_{32:63} > b_{32:63} then c ← 0b010
    if (RA)_{32:63} = b_{32:63} then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 32 bits of register RA are compared with si, treating operands as signed integers. The result of the comparison is placed into CR0.

Special Registers Altered:
    CR0

Compare Scaled Immediate Word SCI8-form

e_cmpi BF32,RA,sci8

Encoding: 06 (0:5) | 000 (6:8) | BF32 (9:10) | RA (11:15) | 21 (16:20) | F (21) | SCL (22:23) | UI8 (24:31)

    sci8 ← ^{56-SCL×8}F || UI8 || ^{SCL×8}F
    if (RA)_{32:63} < sci8_{32:63} then c ← 0b100
    if (RA)_{32:63} > sci8_{32:63} then c ← 0b010
    if (RA)_{32:63} = sci8_{32:63} then c ← 0b001
    CR_{4×BF32+32:4×BF32+35} ← c || XER_{SO}

The low-order 32 bits of register RA are compared with sci8, treating operands as signed integers. The result of the comparison is placed into CR field BF32.

Special Registers Altered:
    CR field BF32

Compare Word RR-form

se_cmp RX,RY

Encoding: 03 (0:5) | 0 (6:7) | RY (8:11) | RX (12:15)

    if (RX)_{32:63} < (RY)_{32:63} then c ← 0b100
    if (RX)_{32:63} > (RY)_{32:63} then c ← 0b010
    if (RX)_{32:63} = (RY)_{32:63} then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 32 bits of register RX are compared with the low-order 32 bits of register RY, treating operands as signed integers. The result of the comparison is placed into CR0.

Special Registers Altered:
    CR0

Compare Immediate Word Short Form IM5-form

se_cmpi RX,UI5

Encoding: 10 (0:5) | 1 (6) | UI5 (7:11) | RX (12:15)

    b ← ^{59}0 || UI5
    if (RX)_{32:63} < b_{32:63} then c ← 0b100
    if (RX)_{32:63} > b_{32:63} then c ← 0b010
    if (RX)_{32:63} = b_{32:63} then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 32 bits of register RX are compared with UI5, treating operands as signed integers. The result of the comparison is placed into CR0.

Special Registers Altered:
    CR0

Compare Logical Immediate Word I16A-form

e_cmpl16i RA,ui

Encoding: 28 (0:5) | ui (6:10) | RA (11:15) | 21 (16:20) | ui (21:31)

    b ← ^{48}0 || ui
    if (RA)_{32:63} <u b_{32:63} then c ← 0b100
    if (RA)_{32:63} >u b_{32:63} then c ← 0b010
    if (RA)_{32:63} = b_{32:63} then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 32 bits of register RA are compared with ui, treating operands as unsigned integers. The result of the comparison is placed into CR0.

Special Registers Altered:
    CR0

Compare Logical Scaled Immediate Word SCI8-form

e_cmpli BF32,RA,sci8

Encoding: 06 (0:5) | 001 (6:8) | BF32 (9:10) | RA (11:15) | 21 (16:20) | F (21) | SCL (22:23) | UI8 (24:31)

    sci8 ← ^{56-SCL×8}F || UI8 || ^{SCL×8}F
    if (RA)_{32:63} <u sci8_{32:63} then c ← 0b100
    if (RA)_{32:63} >u sci8_{32:63} then c ← 0b010
    if (RA)_{32:63} = sci8_{32:63} then c ← 0b001
    CR_{4×BF32+32:4×BF32+35} ← c || XER_{SO}

The low-order 32 bits of register RA are compared with sci8, treating operands as unsigned integers. The result of the comparison is placed into CR field BF32.

Special Registers Altered:
    CR field BF32

Compare Logical Word RR-form

se_cmpl RX,RY

Encoding: 03 (0:5) | 1 (6:7) | RY (8:11) | RX (12:15)

    if (RX)_{32:63} <u (RY)_{32:63} then c ← 0b100
    if (RX)_{32:63} >u (RY)_{32:63} then c ← 0b010
    if (RX)_{32:63} = (RY)_{32:63} then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 32 bits of register RX are compared with the low-order 32 bits of register RY, treating operands as unsigned integers. The result of the comparison is placed into CR0.

Special Registers Altered:
    CR0

Compare Logical Immediate Word OIM5-form

se_cmpli RX,oimm

Encoding: 08 (0:5) | 1 (6) | OIM5 (7:11) | RX (12:15)

    oimm ← ^{59}0 || (OIM5 + 1)
    if (RX)_{32:63} <u oimm_{32:63} then c ← 0b100
    if (RX)_{32:63} >u oimm_{32:63} then c ← 0b010
    if (RX)_{32:63} = oimm_{32:63} then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 32 bits of register RX are compared with oimm, treating operands as unsigned integers. The result of the comparison is placed into CR0. The value of oimm must be in the range of 1 to 32.

Special Registers Altered:
    CR0

Compare Halfword X-form

e_cmph BF,RA,RB

Encoding: 31 (0:5) | BF (6:8) | // (9:10) | RA (11:15) | RB (16:20) | 14 (21:30) | / (31)

    a ← EXTS((RA)_{48:63})
    b ← EXTS((RB)_{48:63})
    if a < b then c ← 0b100
    if a > b then c ← 0b010
    if a = b then c ← 0b001
    CR_{4×BF+32:4×BF+35} ← c || XER_{SO}

The low-order 16 bits of register RA are compared with the low-order 16 bits of register RB, treating operands as signed integers. The result of the comparison is placed into CR field BF.

Special Registers Altered:
    CR field BF

Compare Halfword Short Form RR-form

se_cmph RX,RY

Encoding: 03 (0:5) | 2 (6:7) | RY (8:11) | RX (12:15)

    a ← EXTS((RX)_{48:63})
    b ← EXTS((RY)_{48:63})
    if a < b then c ← 0b100
    if a > b then c ← 0b010
    if a = b then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 16 bits of register RX are compared with the low-order 16 bits of register RY, treating operands as signed integers. The result of the comparison is placed into CR0.

Special Registers Altered:
    CR0

Compare Halfword Immediate I16A-form

e_cmph16i RA,si

Encoding: 28 (0:5) | si (6:10) | RA (11:15) | 22 (16:20) | si (21:31)

    a ← EXTS((RA)_{48:63})
    b ← EXTS(si)
    if a < b then c ← 0b100
    if a > b then c ← 0b010
    if a = b then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 16 bits of register RA are compared with si, treating operands as signed integers. The result of the comparison is placed into CR0.

Special Registers Altered:
    CR0

Compare Halfword Logical X-form

e_cmphl BF,RA,RB

Encoding: 31 (0:5) | BF (6:8) | // (9:10) | RA (11:15) | RB (16:20) | 46 (21:30) | / (31)

    a ← EXTZ((RA)_{48:63})
    b ← EXTZ((RB)_{48:63})
    if a <u b then c ← 0b100
    if a >u b then c ← 0b010
    if a = b then c ← 0b001
    CR_{4×BF+32:4×BF+35} ← c || XER_{SO}

The low-order 16 bits of register RA are compared with the low-order 16 bits of register RB, treating operands as unsigned integers. The result of the comparison is placed into CR field BF.

Special Registers Altered:
    CR field BF

Compare Halfword Logical Short Form RR-form

se_cmphl RX,RY

Encoding: 03 (0:5) | 3 (6:7) | RY (8:11) | RX (12:15)

    a ← (RX)_{48:63}
    b ← (RY)_{48:63}
    if a <u b then c ← 0b100
    if a >u b then c ← 0b010
    if a = b then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 16 bits of register RX are compared with the low-order 16 bits of register RY, treating operands as unsigned integers. The result of the comparison is placed into CR0.

Special Registers Altered:
    CR0

Compare Halfword Logical Immediate I16A-form

e_cmphl16i RA,ui

Encoding: 28 (0:5) | ui (6:10) | RA (11:15) | 23 (16:20) | ui (21:31)

    a ← ^{48}0 || (RA)_{48:63}
    b ← ^{48}0 || ui
    if a <u b then c ← 0b100
    if a >u b then c ← 0b010
    if a = b then c ← 0b001
    CR0 ← c || XER_{SO}

The low-order 16 bits of register RA are compared with the ui field, treating operands as unsigned integers. The result of the comparison is placed into CR0.

Special Registers Altered:
    CR0

5.7 Fixed-Point Trap Instructions

The fixed-point Trap instruction from Book I, tw, is available while executing in VLE mode. The mnemonics, decoding, and semantics for this instruction are identical to those in Book I; see Section 3.3.10 of Book I for the instruction definition.

The fixed-point Trap instruction from Book I, td, is available while executing in VLE mode on 64-bit implementations. The mnemonic, decoding, and semantics for the td instruction are identical to those in Book I; see Section 3.3.10 of Book I for the instruction definitions.

5.8 Fixed-Point Select Instruction

The fixed-point Select instruction provides a means to select one of two registers and place the result in a destination register under the control of a predicate value supplied by a CR bit.
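The selection described above could be modeled roughly as follows. This Python sketch assumes the Book I behavior in which a predicate bit of 1 selects (RA|0) and a predicate bit of 0 selects (RB); the `gpr` list, the 32-bit `cr` integer (CR bit 32 as its most significant bit), and the function names are hypothetical stand-ins.

```python
def cr_bit(cr, n):
    """Return CR bit n (32..63); bit 32 is the most significant bit
    of the 32-bit Condition Register value."""
    if not 32 <= n <= 63:
        raise ValueError("CR bits are numbered 32..63")
    return (cr >> (63 - n)) & 1

def isel(gpr, cr, rt, ra, rb, bc):
    """Place (RA|0) into RT if CR bit BC+32 is 1, otherwise place (RB)."""
    a = 0 if ra == 0 else gpr[ra]
    gpr[rt] = a if cr_bit(cr, bc + 32) else gpr[rb]
    return gpr[rt]
```

A compare followed by a select of this kind replaces a short conditional branch, which is the usual motivation for a predicated move.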
The fixed-point Select instruction from Book I, isel, is available while executing in VLE mode. The mnemonic, decoding, and semantics for this instruction are identical to those in Book I; see Book I for the instruction definition.

5.9 Fixed-Point Logical, Bit, and Move Instructions

The Logical instructions perform bit-parallel operations on 64-bit operands. The Bit instructions manipulate a bit, or create a bit mask, in a register. The Move instructions move a register or an immediate value into a register.

The X-form Logical instructions with Rc=1, the SCI8-form Logical instructions with Rc=1, the RR-form Logical instructions with Rc=1, the e_and2i. instruction, and the e_and2is. instruction set the first three bits of CR field 0 in the same way as the arithmetic instructions described in Section 5.5, "Fixed-Point Arithmetic Instructions". (Also see Section 4.1.1.) The Logical instructions do not change the SO, OV, and CA bits in the XER.

The fixed-point Logical instructions from Book I, and[.], or[.], xor[.], nand[.], nor[.], eqv[.], andc[.], orc[.], extsb[.], extsh[.], cntlzw[.], and popcntb are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I; see Section 3.3.12 of Book I for the instruction definitions.

The fixed-point Logical instructions from Book I, extsw[.] and cntlzd[.] are available while executing in VLE mode on 64-bit implementations. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I; see Section 3.3.12 of Book I for the instruction definitions.

AND (two operand) Immediate I16L-form

e_and2i. RT,ui

    28   RT   ui   25   ui
    0    6    11   16   21   31

    RT ← (RT) & (^(48)0 || ui)

The contents of register RT are ANDed with ^(48)0 || ui and the result is placed into register RT.

Special Registers Altered:
    CR0

AND (2 operand) Immediate Shifted I16L-form

e_and2is. RT,ui

    28   RT   ui   29   ui
    0    6    11   16   21   31

    RT ← (RT) & (^(32)0 || ui || ^(16)0)

The contents of register RT are ANDed with ^(32)0 || ui || ^(16)0 and the result is placed into register RT.

Special Registers Altered:
    CR0

AND Scaled Immediate SCI8-form

e_andi RA,RS,sci8 (Rc=0)
e_andi. RA,RS,sci8 (Rc=1)

    06   RS   RA   12   Rc   F    SCL  UI8
    0    6    11   16   20   21   22   24   31

    sci8 ← ^(56-SCL×8)F || UI8 || ^(SCL×8)F
    RA ← (RS) & sci8

The contents of register RS are ANDed with sci8 and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

AND Immediate Short Form IM5-form

se_andi RX,UI5

    11   1    UI5  RX
    0    6    7    12   15

    RX ← (RX) & (^(59)0 || UI5)

The contents of register RX are ANDed with ^(59)0 || UI5 and the result is placed into register RX.

Special Registers Altered:
    None

OR (two operand) Immediate I16L-form

e_or2i RT,ui

    28   RT   ui   24   ui
    0    6    11   16   21   31

    RT ← (RT) | (^(48)0 || ui)

The contents of register RT are ORed with ^(48)0 || ui and the result is placed into register RT.

Special Registers Altered:
    None

OR (2 operand) Immediate Shifted I16L-form

e_or2is RT,ui

    28   RT   ui   26   ui
    0    6    11   16   21   31

    RT ← (RT) | (^(32)0 || ui || ^(16)0)

The contents of register RT are ORed with ^(32)0 || ui || ^(16)0 and the result is placed into register RT.

Special Registers Altered:
    None

OR Scaled Immediate SCI8-form

e_ori RA,RS,sci8 (Rc=0)
e_ori. RA,RS,sci8 (Rc=1)

    06   RS   RA   13   Rc   F    SCL  UI8
    0    6    11   16   20   21   22   24   31

    sci8 ← ^(56-SCL×8)F || UI8 || ^(SCL×8)F
    RA ← (RS) | sci8

The contents of register RS are ORed with sci8 and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

XOR Scaled Immediate SCI8-form

e_xori RA,RS,sci8 (Rc=0)
e_xori. RA,RS,sci8 (Rc=1)

    06   RS   RA   14   Rc   F    SCL  UI8
    0    6    11   16   20   21   22   24   31

    sci8 ← ^(56-SCL×8)F || UI8 || ^(SCL×8)F
    RA ← (RS) ⊕ sci8

The contents of register RS are XORed with sci8 and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

AND Short Form RR-form

se_and RX,RY (Rc=0)
se_and. RX,RY (Rc=1)

    17   1    Rc   RY   RX
    0    6    7    8    12   15

    RX ← (RX) & (RY)

The contents of register RX are ANDed with the contents of register RY and the result is placed into register RX.

Special Registers Altered:
    CR0 (if Rc=1)

AND with Complement Short Form RR-form

se_andc RX,RY

    17   1    RY   RX
    0    6    8    12   15

    RX ← (RX) & ¬(RY)

The contents of register RX are ANDed with the complement of the contents of register RY and the result is placed into register RX.

Special Registers Altered:
    None

OR Short Form RR-form

se_or RX,RY

    17   0    RY   RX
    0    6    8    12   15

    RX ← (RX) | (RY)

The contents of register RX are ORed with the contents of register RY and the result is placed into register RX.

Special Registers Altered:
    None

NOT Short Form R-form

se_not RX

    0    02   RX
    0    6    12   15

    RX ← ¬(RX)

The contents of RX are complemented and placed into register RX.

Special Registers Altered:
    None

Bit Clear Immediate IM5-form

se_bclri RX,UI5

    24   0    UI5  RX
    0    6    7    12   15

    a ← UI5
    RX ← (RX) & (^(a+32)1 || 0 || ^(31-a)1)

Bit UI5+32 of register RX is set to 0.

Special Registers Altered:
    None

Bit Generate Immediate IM5-form

se_bgeni RX,UI5

    24   1    UI5  RX
    0    6    7    12   15

    a ← UI5
    RX ← (^(a+32)0 || 1 || ^(31-a)0)

Bit UI5+32 of register RX is set to 1. All other bits in register RX are set to 0.

Special Registers Altered:
    None

Bit Set Immediate IM5-form

se_bseti RX,UI5

    25   0    UI5  RX
    0    6    7    12   15

    a ← UI5
    RX ← (RX) | (^(a+32)0 || 1 || ^(31-a)0)

Bit UI5+32 of register RX is set to 1.

Special Registers Altered:
    None

Bit Mask Generate Immediate IM5-form

se_bmaski RX,UI5

    11   0    UI5  RX
    0    6    7    12   15

    a ← UI5
    if a = 0 then RX ← ^(64)1
    else RX ← ^(64-a)0 || ^(a)1

If UI5 is not zero, the low-order UI5 bits are set to 1 in register RX and all other bits in register RX are set to 0. If UI5 is 0, all bits in register RX are set to 1.
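The SCI8 immediate expansion and the single-bit instructions above are easy to misread because the architecture numbers bits from the most significant end (bit 0 is the leftmost bit of the 64-bit register), so "bit UI5+32" is bit 31-UI5 when counted from the least significant end. The following is a minimal sketch under that reading; the function names are hypothetical helpers that model 64-bit registers as Python integers, not part of any toolchain.

```python
MASK64 = (1 << 64) - 1


def sci8(f: int, scl: int, ui8: int) -> int:
    """Expand an SCI8 immediate: UI8 is placed in byte position SCL
    (0 = least significant byte) and every other bit is filled with F."""
    fill = MASK64 if f else 0
    shift = 8 * scl
    byte_mask = 0xFF << shift
    return (fill & ~byte_mask & MASK64) | ((ui8 & 0xFF) << shift)


def _bit(ui5: int) -> int:
    """Value with only architectural bit UI5+32 set (bit 0 = MSB of 64)."""
    return 1 << (31 - ui5)


def se_bclri(rx: int, ui5: int) -> int:
    return rx & ~_bit(ui5) & MASK64      # clear bit UI5+32


def se_bgeni(ui5: int) -> int:
    return _bit(ui5)                     # only bit UI5+32 set


def se_bseti(rx: int, ui5: int) -> int:
    return (rx | _bit(ui5)) & MASK64     # set bit UI5+32


def se_bmaski(ui5: int) -> int:
    # UI5 = 0 yields all ones; otherwise the low-order UI5 bits are ones.
    return MASK64 if ui5 == 0 else (1 << ui5) - 1
```

For example, `sci8(1, 1, 0x12)` gives 0xFFFFFFFFFFFF12FF, which shows how a single SCI8 encoding can express a sign-extended or one-filled constant with the payload byte in any of the four low byte lanes.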
Special Registers Altered:
    None

Extend Sign Byte Short Form R-form

se_extsb RX

    0    13   RX
    0    6    12   15

    s ← (RX)_56
    RX ← ^(56)s || (RX)_56:63

(RX)_56:63 are placed into RX_56:63. Bit 56 of register RX is placed into RX_0:55.

Special Registers Altered:
    None

Extend Sign Halfword Short Form R-form

se_extsh RX

    0    15   RX
    0    6    12   15

    s ← (RX)_48
    RX ← ^(48)s || (RX)_48:63

(RX)_48:63 are placed into RX_48:63. Bit 48 of register RX is placed into RX_0:47.

Special Registers Altered:
    None

Extend Zero Byte R-form

se_extzb RX

    0    12   RX
    0    6    12   15

    RX ← ^(56)0 || (RX)_56:63

(RX)_56:63 are placed into RX_56:63. RX_0:55 are set to 0.

Special Registers Altered:
    None

Extend Zero Halfword R-form

se_extzh RX

    0    14   RX
    0    6    12   15

    RX ← ^(48)0 || (RX)_48:63

(RX)_48:63 are placed into RX_48:63. RX_0:47 are set to 0.

Special Registers Altered:
    None

Load Immediate LI20-form

e_li RT,LI20

    28   RT   li20_4:8   0    li20_0:3   li20_9:19
    0    6    11         16   17         21          31

    RT ← EXTS(li20_5:8 || li20_0:4 || li20_9:19)

The sign-extended LI20 field is placed into RT.

Special Registers Altered:
    None

Load Immediate Short Form IM7-form

se_li RX,UI7

    09   UI7  RX
    0    5    12   15

    RX ← ^(57)0 || UI7

The zero-extended UI7 field is placed into RX.

Special Registers Altered:
    None

Load Immediate Shifted I16L-form

e_lis RT,ui

    28   RT   ui   28   ui
    0    6    11   16   21   31

    RT ← ^(32)0 || ui || ^(16)0

The zero-extended value of ui shifted left 16 bits is placed into RT.

Special Registers Altered:
    None

Move from Alternate Register RR-form

se_mfar RX,ARY

    0    3    ARY  RX
    0    6    8    12   15

    r ← ARY+8
    RX ← GPR(r)

The contents of register ARY+8 are placed into RX. ARY specifies a register in the range R8:R23.

Special Registers Altered:
    None

Move Register RR-form

se_mr RX,RY

    0    1    RY   RX
    0    6    8    12   15

    RX ← (RY)

The contents of register RY are placed into RX.

Special Registers Altered:
    None

Move to Alternate Register RR-form

se_mtar ARX,RY

    0    2    RY   ARX
    0    6    8    12   15

    r ← ARX+8
    GPR(r) ← (RY)

The contents of register RY are placed into register ARX+8.
ARX specifies a register in the range R8:R23.

Special Registers Altered:
    None

5.10 Fixed-Point Rotate and Shift Instructions

The fixed-point Shift instructions from Book I, slw[.], srw[.], srawi[.], and sraw[.] are available while executing in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Section 3.3.13.2 of Book I for the instruction definitions.

The fixed-point Shift instructions from Book I, sld[.], srd[.], sradi[.], and srad[.] are available while executing in VLE mode on 64-bit implementations. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Section 3.3.13.2 of Book I for the instruction definitions.

Rotate Left Word X-form

e_rlw RA,RS,RB (Rc=0)
e_rlw. RA,RS,RB (Rc=1)

    31   RS   RA   RB   280  Rc
    0    6    11   16   21   31

    n ← (RB)_59:63
    RA ← ROTL32((RS)_32:63, n)

The contents of register RS are rotated_32 left the number of bits specified by (RB)_59:63 and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Rotate Left Word Immediate X-form

e_rlwi RA,RS,SH (Rc=0)
e_rlwi. RA,RS,SH (Rc=1)

    31   RS   RA   SH   312  Rc
    0    6    11   16   21   31

    n ← SH
    RA ← ROTL32((RS)_32:63, n)

The contents of register RS are rotated_32 left SH bits and the result is placed into register RA.

Special Registers Altered:
    CR0 (if Rc=1)

Rotate Left Word Immediate then Mask Insert M-form

e_rlwimi RA,RS,SH,MB,ME

    29   RS   RA   SH   MB   ME   0
    0    6    11   16   21   26   31

    n ← SH
    r ← ROTL32((RS)_32:63, n)
    m ← MASK(MB+32, ME+32)
    RA ← r&m | (RA)&¬m

The contents of register RS are rotated_32 left SH bits. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data is inserted into register RA under control of the generated mask.

Special Registers Altered:
    None

Rotate Left Word Immediate then AND with Mask M-form

e_rlwinm RA,RS,SH,MB,ME

    29   RS   RA   SH   MB   ME   1
    0    6    11   16   21   26   31

    n ← SH
    r ← ROTL32((RS)_32:63, n)
    m ← MASK(MB+32, ME+32)
    RA ← r & m

The contents of register RS are rotated_32 left SH bits. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data is ANDed with the generated mask and the result is placed into register RA.

Special Registers Altered:
    None

Shift Left Word Immediate X-form

e_slwi RA,RS,SH (Rc=0)
e_slwi. RA,RS,SH (Rc=1)

    31   RS   RA   SH   56   Rc
    0    6    11   16   21   31

    n ← SH
    r ← ROTL32((RS)_32:63, n)
    m ← MASK(32, 63-n)
    RA ← r & m

The contents of the low-order 32 bits of register RS are shifted left SH bits. Bits shifted out of position 32 are lost. Zeros are supplied to the vacated positions on the right. The 32-bit result is placed into RA_32:63. RA_0:31 are set to 0.

Special Registers Altered:
    CR0 (if Rc=1)

Shift Left Word Immediate Short Form IM5-form

se_slwi RX,UI5

    27   0    UI5  RX
    0    6    7    12   15

    n ← UI5
    r ← ROTL32((RX)_32:63, n)
    m ← MASK(32, 63-n)
    RX ← r & m

The contents of the low-order 32 bits of register RX are shifted left UI5 bits. Bits shifted out of position 32 are lost. Zeros are supplied to the vacated positions on the right. The 32-bit result is placed into RX_32:63. RX_0:31 are set to 0.

Special Registers Altered:
    None

Shift Left Word RR-form

se_slw RX,RY

    16   2    RY   RX
    0    6    8    12   15

    n ← (RY)_58:63
    r ← ROTL32((RX)_32:63, n)
    if (RY)_58 = 0 then m ← MASK(32, 63-n)
    else m ← ^(64)0
    RX ← r & m

The contents of the low-order 32 bits of register RX are shifted left the number of bits specified by (RY)_58:63. Bits shifted out of position 32 are lost. Zeros are supplied to the vacated positions on the right. The 32-bit result is placed into RX_32:63. RX_0:31 are set to 0. Shift amounts from 32-63 give a zero result.

Special Registers Altered:
    None

Shift Right Algebraic Word Immediate IM5-form

se_srawi RX,UI5

    26   1    UI5  RX
    0    6    7    12   15

    n ← UI5
    r ← ROTL32((RX)_32:63, 64-n)
    m ← MASK(n+32, 63)
    s ← (RX)_32
    RX ← r&m | (^(64)s)&¬m
    CA ← s & ((r&¬m)_32:63 ≠ 0)

The contents of the low-order 32 bits of register RX are shifted right UI5 bits. Bits shifted out of position 63 are lost, and bit 32 of RX is replicated to fill the vacated positions on the left. Bit 32 of RX is replicated to fill RX_0:31 and the 32-bit result is placed into RX_32:63. CA is set to 1 if the low-order 32 bits of register RX contain a negative value and any 1-bits are shifted out of bit position 63; otherwise CA is set to 0. A shift amount of zero causes RX to receive EXTS((RX)_32:63), and CA to be set to 0.

Special Registers Altered:
    CA

Shift Right Word Immediate X-form

e_srwi RA,RS,SH (Rc=0)
e_srwi. RA,RS,SH (Rc=1)

    31   RS   RA   SH   568  Rc
    0    6    11   16   21   31

    n ← SH
    r ← ROTL32((RS)_32:63, 64-n)
    m ← MASK(n+32, 63)
    RA ← r & m

The contents of the low-order 32 bits of register RS are shifted right SH bits. Bits shifted out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The 32-bit result is placed into RA_32:63. RA_0:31 are set to 0.

Special Registers Altered:
    CR0 (if Rc=1)

Shift Right Algebraic Word RR-form

se_sraw RX,RY

    16   1    RY   RX
    0    6    8    12   15

    n ← (RY)_59:63
    r ← ROTL32((RX)_32:63, 64-n)
    if (RY)_58 = 0 then m ← MASK(n+32, 63)
    else m ← ^(64)0
    s ← (RX)_32
    RX ← r&m | (^(64)s)&¬m
    CA ← s & ((r&¬m)_32:63 ≠ 0)

The contents of the low-order 32 bits of register RX are shifted right the number of bits specified by (RY)_58:63. Bits shifted out of position 63 are lost, and bit 32 of RX is replicated to fill the vacated positions on the left. Bit 32 of RX is replicated to fill RX_0:31 and the 32-bit result is placed into RX_32:63. CA is set to 1 if the low-order 32 bits of register RX contain a negative value and any 1-bits are shifted out of bit position 63; otherwise CA is set to 0. A shift amount of zero causes RX to receive EXTS((RX)_32:63), and CA to be set to 0. Shift amounts from 32-63 give a result of 64 sign bits, and cause CA to receive the sign bit of (RX)_32:63.
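The rotate-then-mask formulation used throughout this section (rotate the low word, build a mask, then AND or insert) can be traced with a small executable model. The sketch below covers only the immediate forms, whose shift amount is always 0 to 31. The helper names are hypothetical; `rotl32` replicates the rotated word into both halves of a 64-bit value and `mask` wraps around when the start bit is greater than the end bit, which is how I understand the Book I ROTL32 and MASK definitions. Treat both as assumptions to be checked against Book I.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def rotl32(word: int, n: int) -> int:
    """Rotate a 32-bit word left by n (mod 32) and replicate the result
    into both halves of a 64-bit value."""
    n %= 32
    w = ((word << n) | (word >> (32 - n))) & MASK32
    return (w << 32) | w


def mask(mb: int, me: int) -> int:
    """64-bit mask with 1-bits from bit mb through bit me, where bit 0 is
    the most significant bit; wraps around if mb > me."""
    hi = MASK64 >> mb                       # ones from bit mb to bit 63
    lo = (MASK64 << (63 - me)) & MASK64     # ones from bit 0 to bit me
    return (hi & lo) if mb <= me else (hi | lo)


def e_rlwinm(rs: int, sh: int, mb: int, me: int) -> int:
    return rotl32(rs & MASK32, sh) & mask(mb + 32, me + 32)


def e_rlwimi(ra: int, rs: int, sh: int, mb: int, me: int) -> int:
    m = mask(mb + 32, me + 32)
    return (rotl32(rs & MASK32, sh) & m) | (ra & ~m & MASK64)


def e_slwi(rs: int, n: int) -> int:
    return rotl32(rs & MASK32, n) & mask(32, 63 - n)


def se_srawi(rx: int, n: int):
    """Return (new RX, CA) for an algebraic right shift by n (0..31)."""
    word = rx & MASK32
    r = rotl32(word, 64 - n)
    m = mask(n + 32, 63)
    s = (word >> 31) & 1                    # sign bit, i.e. (RX)_32
    fill = MASK64 if s else 0
    result = (r & m) | (fill & ~m & MASK64)
    ca = s & int(((r & ~m) & MASK32) != 0)  # a 1-bit was shifted out of a negative value
    return result, ca
```

As a usage example, `e_rlwinm(0x12345678, 8, 24, 31)` extracts the most significant byte of the word (0x12), and `se_srawi(0xFFFFFFF5, 1)` returns a sign-extended -6 with CA = 1 because a 1-bit was shifted out of a negative operand.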
Special Registers Altered:
    CA

Shift Right Word Immediate Short Form IM5-form

se_srwi RX,UI5

    26   0    UI5  RX
    0    6    7    12   15

    n ← UI5
    r ← ROTL32((RX)_32:63, 64-n)
    m ← MASK(n+32, 63)
    RX ← r & m

The contents of the low-order 32 bits of register RX are shifted right UI5 bits. Bits shifted out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The 32-bit result is placed into RX_32:63. RX_0:31 are set to 0.

Special Registers Altered:
    None

Shift Right Word RR-form

se_srw RX,RY

    16   0    RY   RX
    0    6    8    12   15

    n ← (RY)_59:63
    r ← ROTL32((RX)_32:63, 64-n)
    if (RY)_58 = 0 then m ← MASK(n+32, 63)
    else m ← ^(64)0
    RX ← r & m

The contents of the low-order 32 bits of register RX are shifted right the number of bits specified by (RY)_58:63. Bits shifted out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The 32-bit result is placed into RX_32:63. RX_0:31 are set to 0. Shift amounts from 32 to 63 give a zero result.

Special Registers Altered:
    None

5.11 Move To/From System Register Instructions

The VLE category provides 16-bit forms of instructions to move to/from the LR and CTR.

The fixed-point Move To/From System Register instructions from Book I, mfspr, mtcrf, mfcr, mtocrf, mfocrf, mcrxr, mtdcrux, mfdcrux, mfapidi, and mtspr are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I; see Section 3.3.14 of Book I for the instruction definitions.

The fixed-point Move To/From System Register instructions from Book III-E, mfspr, mtspr, mfdcr, mtdcr, mtmsr, mfmsr, wrtee, and wrteei are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book III-E; see Section 3.4.1 of Book III-E for the instruction definitions.
Move From Count Register R-form

se_mfctr RX

    0    10   RX
    0    6    12   15

    RX ← CTR

The CTR contents are placed into register RX.

Special Registers Altered:
    None

Move From Link Register R-form

se_mflr RX

    0    8    RX
    0    6    12   15

    RX ← LR

The LR contents are placed into register RX.

Special Registers Altered:
    None

Move To Count Register R-form

se_mtctr RX

    0    11   RX
    0    6    12   15

    CTR ← (RX)

The contents of register RX are placed into the CTR.

Special Registers Altered:
    CTR

Move To Link Register R-form

se_mtlr RX

    0    9    RX
    0    6    12   15

    LR ← (RX)

The contents of register RX are placed into the LR.

Special Registers Altered:
    LR

Chapter 6. Storage Control Instructions

6.1 Storage Synchronization Instructions
6.2 Cache Management Instructions
6.3 Cache Locking Instructions
6.4 TLB Management Instructions
6.5 Instruction Alignment and Byte Ordering

6.1 Storage Synchronization Instructions

The memory synchronization instructions implemented by category VLE are identical in semantics to those defined in Book II and Book III-E. The se_isync instruction is defined by category VLE, but has the same semantics as isync.

The Load and Reserve and Store Conditional instructions from Book II, lwarx and stwcx. are available while executing in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book II; see Section 3.4.2 of Book II for the instruction definitions.

The Load and Reserve and Store Conditional instructions from Book II, ldarx and stdcx. are available while executing in VLE mode on 64-bit implementations. The mnemonics, decoding, and semantics for those instructions are identical to those in Book II; see Section 3.4.2 of Book II for the instruction definitions.

The Memory Barrier instructions from Book II, sync (msync) and mbar are available while executing in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book II; see Section 3.4.3 of Book II for the instruction definitions.

The wait instruction from Book II is available while executing in VLE mode if the category Wait is implemented. The mnemonic, decoding, and semantics for wait are identical to those in Book II; see Section 3.4 of Book II for the instruction definition.

Instruction Synchronize C-form

se_isync

    01
    0    15

Executing an se_isync instruction ensures that all instructions preceding the se_isync instruction have completed before the se_isync instruction completes, and that no subsequent instructions are initiated until after the se_isync instruction completes. It also ensures that all instruction cache block invalidations caused by icbi instructions preceding the se_isync instruction have been performed with respect to the processor executing the se_isync instruction, and then causes any prefetched instructions to be discarded.

Except as described in the preceding sentence, the se_isync instruction may complete before storage accesses associated with instructions preceding the se_isync instruction have been performed.

This instruction is context synchronizing.

The se_isync instruction has identical semantics to the Book II isync instruction, but has a different encoding.

Special Registers Altered:
    None

6.2 Cache Management Instructions

Cache management instructions implemented by category VLE are identical to those defined in Book II and Book III-E.

The Cache Management instructions from Book II, dcba, dcbf, dcbst, dcbt, dcbtst, dcbz, icbi, and icbt are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book II; see Section 3.3 of Book II for the instruction definitions.

The Cache Management instruction from Book III-E, dcbi, is available while executing in VLE mode. The mnemonic, decoding, and semantics for this instruction are identical to those in Book III-E; see Section 4.9.1 of Book III-E for the instruction definition.

6.3 Cache Locking Instructions

Cache locking instructions implemented by category VLE are identical to those defined in Book III-E. If the Cache Locking instructions are implemented in category VLE, the category Embedded Cache Locking must also be implemented.

The Cache Locking instructions from Book III-E, dcbtls, dcbtstls, dcblc, icbtls, and icblc are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book III-E; see Section 4.9.2 of Book III-E for the instruction definitions.

6.4 TLB Management Instructions

The TLB management instructions implemented by category VLE are identical to those defined in Book III-E.

The TLB Management instructions from Book III-E, tlbre, tlbwe, tlbivax, tlbsync, and tlbsx are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book III-E. See Section 4.9.4.1 of Book III-E for the instruction definitions.

Instructions and resources from category Embedded.MMU Type FSL are available if the appropriate category is implemented.

6.5 Instruction Alignment and Byte Ordering

Only Big-Endian instruction memory is supported when executing from a page of VLE instructions. Attempting to fetch VLE instructions from a page marked as Little-Endian generates an instruction storage interrupt byte-ordering exception.

Chapter 7. Additional Categories Available in VLE

7.1 Move Assist
7.2 Vector
7.3 Signal Processing Engine
7.4 Embedded Floating Point
7.5 Legacy Move Assist
7.6 External PID
7.7 Embedded Performance Monitor
7.8 Processor Control

Instructions and resources from categories other than Base and Embedded are available in VLE. These include categories for which all the instructions in the category use primary opcode 4 or primary opcode 31.

7.1 Move Assist

Move Assist instructions implemented by category VLE are identical to those defined in Book I. If the Move Assist instructions are implemented in category VLE, category Move Assist must also be implemented. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Section 3.3.6 of Book I for the instruction definitions.

7.2 Vector

Vector instructions implemented by category VLE are identical to those defined in Book I. If the Vector instructions are implemented in category VLE, category Vector must also be implemented. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Chapter 6 of Book I for the instruction definitions.

7.3 Signal Processing Engine

Signal Processing Engine instructions implemented by category VLE are identical to those defined in Book I. If the Signal Processing Engine instructions are implemented in category VLE, category Signal Processing Engine must also be implemented. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Chapter 7 of Book I for the instruction definitions.

7.4 Embedded Floating Point

Embedded Floating Point instructions implemented by category VLE are identical to those defined in Book I. If the Embedded Floating Point instructions are implemented in category VLE, the appropriate category SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, or SPE.Embedded Float Vector must also be implemented. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Chapter 8 of Book I for the instruction definitions.

7.5 Legacy Move Assist

Legacy Move Assist instructions implemented by category VLE are identical to those defined in Book I. If the Legacy Move Assist instructions are implemented in category VLE, category Legacy Move Assist must also be implemented. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Chapter 9 of Book I for the instruction definitions.

7.6 External PID

External Process ID instructions implemented by category VLE are identical to those defined in Book III-E. If the External Process ID instructions are implemented in category VLE, category Embedded.External PID must also be implemented. The mnemonics, decoding, and semantics for those instructions are identical to those in Book III-E; see Section 3.3.4 of Book III-E for the instruction definitions.

7.7 Embedded Performance Monitor

Embedded Performance Monitor instructions implemented by category VLE are identical to those defined in Book III-E. If the Embedded Performance Monitor instructions are implemented in category VLE, category Embedded.Performance Monitor must also be implemented. The mnemonics, decoding, and semantics for those instructions are identical to those in Book III-E; see Appendix E of Book III-E for the instruction definitions.

7.8 Processor Control

Processor Control instructions implemented by category VLE are identical to those defined in Book III-E. If the Processor Control instructions are implemented in category VLE, category Embedded.Processor Control must also be implemented. The mnemonics, decoding, and semantics for those instructions are identical to those in Book III-E; see Chapter 9 of Book III-E for the instruction definitions.

Appendix A.
VLE Instruction Set Sorted by Mnemonic This appendix lists all the instructions available in VLE mode in the Power ISA, in order by mnemonic. Opcodes that are not defined below are treated as illegal by category VLE. Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 XO 7C000214 B add[o][.] Add XO 7C000014 B addc[o][.] Add Carrying XO 7C000114 SR B adde[o][.] Add Extended XO 7C0001D4 SR B addme[o][.] Add to Minus One Extended XO 7C000194 SR B addze[o][.] Add to Zero Extended X 7C000038 SR B and[.] AND X 7C000078 SR B andc[.] AND with Complement EVX 1000020F SP brinc Bit Reverse Increment X 7C000000 B cmp Compare X 7C000040 B cmpl Compare Logical X 7C000074 SR 64 cntlzd[.] Count Leading Zeros Doubleword X 7C000034 SR B cntlzw[.] Count Leading Zeros Word X 7C0005EC E dcba Data Cache Block Allocate X 7C0000AC B dcbf Data Cache Block Flush X 7C0000FE P E.PD dcbfep Data Cache Block Flush by External Process ID X 7C0003AC P E dcbi Data Cache Block Invalidate X 7C00030C M ECL dcblc Data Cache Block Lock Clear X 7C00006C B dcbst Data Cache Block Store X 7C00022C B dcbt Data Cache Block Touch X 7C00027E P E.PD dcbtep Data Cache Block Touch by External Process ID X 7C00014C M ECL dcbtls Data Cache Block Touch and Lock Set X 7C0001EC B dcbtst Data Cache Block Touch for Store X 7C0001FE P E.PD dcbtstep Data Cache Block Touch for Store by External Process ID X 7C00010C M ECL dcbtstls Data Cache Block Touch for Store and Lock Set X 7C0007EC B dcbz Data Cache Block set to Zero X 7C0007FE P E.PD dcbzep Data Cache Block set to Zero by External Process ID X 7C00038C P E.CI dci Data Cache Invalidate X 7C00028C P E.CD dcread Data Cache Read X 7C0003CC P E.CD dcread Data Cache Read XO 7C0003D2 SR 64 divd[o][.] Divide Doubleword XO 7C000392 SR 64 divdu[o][.] Divide Doubleword Unsigned XO 7C0003D6 SR B divw[o][.] Divide Word XO 7C000396 SR B divwu[o][.] Divide Word Unsigned D 1C000000 VLE e_add16i Add Immediate I16A 70008800 SR VLE e_add2i. 
Add (2 operand) Immediate and Record I16A 70009000 VLE e_add2is Add (2 operand) Immediate Shifted SCI8 18008000 SR VLE e_addi[.] Add Scaled Immediate SCI8 18009000 SR VLE e_addic[.] Add Scaled Immediate Carrying I16L 7000C800 SR VLE e_and2i. AND (2 operand) Immediate I16L 7000E800 SR VLE e_and2is. AND (2 operand) Immediate Shifted SCI8 1800C000 SR VLE e_andi[.] AND Scaled Immediate BD24 78000000 VLE e_b[l] Branch [and Link] BD15 7A000000 CT VLE e_bc[l] Branch Conditional [and Link] IA16 70009800 VLE e_cmp16i Compare Immediate Word IA16 7000B000 VLE e_cmph16i Compare Halfword Immediate Appendix A. VLE Instruction Set Sorted by Mnemonic 813 Version 2.05 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 X 7C00001C VLE e_cmph Compare Halfword IA16 7000B800 VLE e_cmphl16i Compare Halfword Logical Immediate X 7C00005C VLE e_cmphl Compare Halfword Logical SCI8 1800A800 VLE e_cmpi Compare Scaled Immediate Word I16A 7000A800 VLE e_cmpl16i Compare Logical Immediate Word SCI8 1880A800 VLE e_cmpli Compare Logical Scaled Immediate Word XL 7C000202 VLE e_crand Condition Register AND XL 7C000102 VLE e_crandc Condition Register AND with Complement XL 7C000242 VLE e_creqv Condition Register Equivalent XL 7C0001C2 VLE e_crnand Condition Register NAND XL 7C000042 VLE e_crnor Condition Register NOR XL 7C000382 VLE e_cror Condition Register OR XL 7C000342 VLE e_crorc Condition Register OR with Complement XL 7C000182 VLE e_crxor Condition Register XOR D 30000000 VLE e_lbz Load Byte and Zero D8 18000000 VLE e_lbzu Load Byte and Zero with Update D 38000000 VLE e_lha Load Halfword Algebraic D8 18000300 VLE e_lhau Load Halfword Algebraic with Update D 58000000 VLE e_lhz Load Halfword and Zero D8 18000100 VLE e_lhzu Load Halfword and Zero with Update LI20 70000000 VLE e_li Load Immediate I16L 7000E000 VLE e_lis Load Immediate Shifted D8 18000800 VLE e_lmw Load Multiple Word D 50000000 VLE e_lwz Load Word and Zero D8 18000200 VLE e_lwzu Load Word and Zero with Update XL 
7C000020 VLE e_mcrf Move CR Field I16A 7000A000 VLE e_mull2i Multiply (2 operand) Low Immediate SCI8 1800A000 VLE e_mulli Multiply Low Scaled Immediate I16L 7000C000 VLE e_or2i OR (2operand) Immediate I16L 7000D000 VLE e_or2is OR (2 operand) Immediate Shifted SCI8 1800D000 SR VLE e_ori[.] OR Scaled Immediate X 7C000230 SR VLE e_rlw[.] Rotate Left Word X 7C000270 SR VLE e_rlwi[.] Rotate Left Word Immediate M 74000000 VLE e_rlwimi Rotate Left Word Immediate then Mask Insert M 74000001 VLE e_rlwinm Rotate Left Word Immediate then AND with Mask X 7C000070 SR VLE e_slwi[.] Shift Left Word Immediate X 7C000470 SR VLE e_srwi[.] Shift Right Word Immediate D 34000000 VLE e_stb Store Byte D8 18000400 VLE e_stbu Store Byte with Update D 5C000000 VLE e_sth Store Halfword D8 18000500 VLE e_sthu Store Halfword with Update D8 18000900 VLE e_stmw Store Multiple Word D 54000000 VLE e_stw Store Word D8 18000600 VLE e_stwu Store word with Update SCI8 1800B000 SR VLE e_subfic[.] Subtract From Scaled Immediate Carrying SCI8 1800E000 SR VLE e_xori[.] 
XOR Scaled Immediate EVX 100002E4 SP.FD efdabs Floating-Point Double-Precision Absolute Value EVX 100002E0 SP.FD efdadd Floating-Point Double-Precision Add EVX 100002EF SP.FD efdcfs Floating-Point Double-Precision Convert from Single-Preci- sion EVX 100002F3 SP.FD efdcfsf Convert Floating-Point Double-Precision from Signed Frac- tion EVX 100002F1 SP.FD efdcfsi Convert Floating-Point Double-Precision from Signed Inte- ger EVX 100002E3 SP.FD efdcfsid Convert Floating-Point Double-Precision from Signed Inte- ger Doubleword EVX 100002F2 SP.FD efdcfuf Convert Floating-Point Double-Precision from Unsigned Fraction EVX 100002F0 SP.FD efdcfui Convert Floating-Point Double-Precision from Unsigned Integer 814 Power ISATM VLE Version 2.05 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 EVX 100002E2 SP.FD efdcfuid Convert Floating-Point Double-Precision from Unsigned Integer Doubleword EVX 100002EE SP.FD efdcmpeq Floating-Point Double-Precision Compare Equal EVX 100002EC SP.FD efdcmpgt Floating-Point Double-Precision Compare Greater Than EVX 100002ED SP.FD efdcmplt Floating-Point Double-Precision Compare Less Than EVX 100002F7 SP.FD efdctsf Convert Floating-Point Double-Precision to Signed Fraction EVX 100002F5 SP.FD efdctsi Convert Floating-Point Double-Precision to Signed Integer EVX 100002EB SP.FD efdctsidz Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round Towards Zero EVX 100002FA SP.FD efdctsiz Convert Floating-Point Double-Precision to Signed Integer with Round Towards Zero EVX 100002F6 SP.FD efdctuf Convert Floating-Point Double-Precision to Unsigned Frac- tion EVX 100002F4 SP.FD efdctui Convert Floating-Point Double-Precision to Unsigned Inte- ger EVX 100002EA SP.FD efdctuidz Convert Floating-Point Double-Precision to Unsigned Inte- ger Doubleword with Round Towards Zero EVX 100002F8 SP.FD efdctuiz Convert Floating-Point Double-Precision to Unsigned Inte- ger with Round Towards Zero EVX 100002E9 SP.FD efddiv 
Floating-Point Double-Precision Divide EVX 100002E8 SP.FD efdmul Floating-Point Double-Precision Multiply EVX 100002E5 SP.FD efdnabs Floating-Point Double-Precision Negative Absolute Value EVX 100002E6 SP.FD efdneg Floating-Point Double-Precision Negate EVX 100002E1 SP.FD efdsub Floating-Point Double-Precision Subtract EVX 100002FE SP.FD efdtsteq Floating-Point Double-Precision Test Equal EVX 100002FC SP.FD efdtstgt Floating-Point Double-Precision Test Greater Than EVX 100002FD SP.FD efdtstlt Floating-Point Double-Precision Test Less Than EVX 100002E4 SP.FS efsabs Floating-Point Single-Precision Absolute Value EVX 100002E0 SP.FS efsadd Floating-Point Single-Precision Add EVX 100002CF SP.FD efscfd Floating-Point Single-Precision Convert from Double-Preci- sion EVX 100002F3 SP.FS efscfsf Convert Floating-Point Single-Precision from Signed Frac- tion EVX 100002F1 SP.FS efscfsi Convert Floating-Point Single-Precision from Signed Integer EVX 100002E3 SP.FS efscfsid Convert Floating-Point Single-Precision from Signed Integer Doubleword EVX 100002F2 SP.FS efscfuf Convert Floating-Point Single-Precision from Unsigned Fraction EVX 100002F0 SP.FS efscfui Convert Floating-Point Single-Precision from Unsigned Inte- ger EVX 100002E2 SP.FS efscfuid Convert Floating-Point Single-Precision from Unsigned Inte- ger Doubleword EVX 100002EE SP.FS efscmpeq Floating-Point Single-Precision Compare Equal EVX 100002EC SP.FS efscmpgt Floating-Point Single-Precision Compare Greater Than EVX 100002ED SP.FS efscmplt Floating-Point Single-Precision Compare Less Than EVX 100002F7 SP.FS efsctsf Convert Floating-Point Single-Precision to Signed Fraction EVX 100002F5 SP.FS efsctsi Convert Floating-Point Single-Precision to Signed Integer EVX 100002EB SP.FS efsctsidz Convert Floating-Point Single-Precision to Signed Integer Doubleword with Round Towards Zero EVX 100002FA SP.FS efsctsiz Convert Floating-Point Single-Precision to Signed Integer with Round Towards Zero EVX 100002F6 SP.FS efsctuf 
Convert Floating-Point Single-Precision to Unsigned Fraction
EVX 100002F4 SP.FS efsctui Convert Floating-Point Single-Precision to Unsigned Integer
EVX 100002EA SP.FS efsctuidz Convert Floating-Point Single-Precision to Unsigned Integer Doubleword with Round Towards Zero
EVX 100002F8 SP.FS efsctuiz Convert Floating-Point Single-Precision to Unsigned Integer with Round Towards Zero
EVX 100002E9 SP.FS efsdiv Floating-Point Single-Precision Divide
EVX 100002E8 SP.FS efsmul Floating-Point Single-Precision Multiply
EVX 100002E5 SP.FS efsnabs Floating-Point Single-Precision Negative Absolute Value
EVX 100002E6 SP.FS efsneg Floating-Point Single-Precision Negate
EVX 100002E1 SP.FS efssub Floating-Point Single-Precision Subtract
EVX 100002FE SP.FS efststeq Floating-Point Single-Precision Test Equal
EVX 100002FC SP.FS efststgt Floating-Point Single-Precision Test Greater Than
EVX 100002FD SP.FS efststlt Floating-Point Single-Precision Test Less Than
X 7C000238 SR B eqv[.]
Equivalent
EVX 10000208 SP evabs Vector Absolute Value
EVX 10000202 SP evaddiw Vector Add Immediate Word
EVX 100004C9 SP evaddsmiaaw Vector Add Signed, Modulo, Integer to Accumulator Word
EVX 100004C1 SP evaddssiaaw Vector Add Signed, Saturate, Integer to Accumulator Word
EVX 100004C8 SP evaddumiaaw Vector Add Unsigned, Modulo, Integer to Accumulator Word
EVX 100004C0 SP evaddusiaaw Vector Add Unsigned, Saturate, Integer to Accumulator Word
EVX 10000200 SP evaddw Vector Add Word
EVX 10000211 SP evand Vector AND
EVX 10000212 SP evandc Vector AND with Complement
EVX 10000234 SP evcmpeq Vector Compare Equal
EVX 10000231 SP evcmpgts Vector Compare Greater Than Signed
EVX 10000230 SP evcmpgtu Vector Compare Greater Than Unsigned
EVX 10000233 SP evcmplts Vector Compare Less Than Signed
EVX 10000232 SP evcmpltu Vector Compare Less Than Unsigned
EVX 1000020E SP evcntlsw Vector Count Leading Sign Bits Word
EVX 1000020D SP evcntlzw Vector Count Leading Zeros Bits Word
EVX 100004C6 SP evdivws Vector Divide Word Signed
EVX 100004C7 SP evdivwu Vector Divide Word Unsigned
EVX 10000219 SP eveqv Vector Equivalent
EVX 1000020A SP evextsb Vector Extend Sign Byte
EVX 1000020B SP evextsh Vector Extend Sign Halfword
EVX 10000284 SP.FV evfsabs Vector Floating-Point Single-Precision Absolute Value
EVX 10000280 SP.FV evfsadd Vector Floating-Point Single-Precision Add
EVX 10000293 SP.FV evfscfsf Vector Convert Floating-Point Single-Precision from Signed Fraction
EVX 10000291 SP.FV evfscfsi Vector Convert Floating-Point Single-Precision from Signed Integer
EVX 10000292 SP.FV evfscfuf Vector Convert Floating-Point Single-Precision from Unsigned Fraction
EVX 10000290 SP.FV evfscfui Vector Convert Floating-Point Single-Precision from Unsigned Integer
EVX 1000028E SP.FV evfscmpeq Vector Floating-Point Single-Precision Compare Equal
EVX 1000028C SP.FV evfscmpgt Vector Floating-Point Single-Precision Compare Greater Than
EVX 1000028D SP.FV evfscmplt Vector Floating-Point Single-Precision Compare
Less Than
EVX 10000297 SP.FV evfsctsf Vector Convert Floating-Point Single-Precision to Signed Fraction
EVX 10000295 SP.FV evfsctsi Vector Convert Floating-Point Single-Precision to Signed Integer
EVX 1000029A SP.FV evfsctsiz Vector Convert Floating-Point Single-Precision to Signed Integer with Round Towards Zero
EVX 10000296 SP.FV evfsctuf Vector Convert Floating-Point Single-Precision to Unsigned Fraction
EVX 10000294 SP.FV evfsctui Vector Convert Floating-Point Single-Precision to Unsigned Integer
EVX 10000298 SP.FV evfsctuiz Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round Towards Zero
EVX 10000289 SP.FV evfsdiv Vector Floating-Point Single-Precision Divide
EVX 10000288 SP.FV evfsmul Vector Floating-Point Single-Precision Multiply
EVX 10000285 SP.FV evfsnabs Vector Floating-Point Single-Precision Negative Absolute Value
EVX 10000286 SP.FV evfsneg Vector Floating-Point Single-Precision Negate
EVX 10000281 SP.FV evfssub Vector Floating-Point Single-Precision Subtract
EVX 1000029E SP.FV evfststeq Vector Floating-Point Single-Precision Test Equal
EVX 1000029C SP.FV evfststgt Vector Floating-Point Single-Precision Test Greater Than
EVX 1000029D SP.FV evfststlt Vector Floating-Point Single-Precision Test Less Than
EVX 10000301 SP evldd Vector Load Doubleword into Doubleword
EVX 7C00011D P E.PD evlddepx Vector Load Doubleword into Doubleword by External Process ID Indexed
EVX 10000300 SP evlddx Vector Load Doubleword into Doubleword Indexed
EVX 10000305 SP evldh Vector Load Doubleword into 4 Halfwords
EVX 10000304 SP evldhx Vector Load Doubleword into 4 Halfwords Indexed
EVX 10000303 SP evldw Vector Load Doubleword into 2 Words
EVX 10000302 SP evldwx Vector Load Doubleword into 2 Words Indexed
EVX 10000309 SP evlhhesplat Vector Load Halfword into Halfwords Even and Splat
EVX 10000308 SP evlhhesplatx Vector Load Halfword into Halfwords
Even and Splat Indexed
EVX 1000030F SP evlhhossplat Vector Load Halfword into Halfwords Odd Signed and Splat
EVX 1000030E SP evlhhossplatx Vector Load Halfword into Halfwords Odd Signed and Splat Indexed
EVX 1000030D SP evlhhousplat Vector Load Halfword into Halfwords Odd Unsigned and Splat
EVX 1000030C SP evlhhousplatx Vector Load Halfword into Halfwords Odd Unsigned and Splat Indexed
EVX 10000311 SP evlwhe Vector Load Word into Two Halfwords Even
EVX 10000310 SP evlwhex Vector Load Word into Two Halfwords Even Indexed
EVX 10000317 SP evlwhos Vector Load Word into Two Halfwords Odd Signed (with sign extension)
EVX 10000316 SP evlwhosx Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension)
EVX 10000315 SP evlwhou Vector Load Word into Two Halfwords Odd Unsigned (zero-extended)
EVX 10000314 SP evlwhoux Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended)
EVX 1000031D SP evlwhsplat Vector Load Word into Two Halfwords and Splat
EVX 1000031C SP evlwhsplatx Vector Load Word into Two Halfwords and Splat Indexed
EVX 10000319 SP evlwwsplat Vector Load Word into Word and Splat
EVX 10000318 SP evlwwsplatx Vector Load Word into Word and Splat Indexed
EVX 1000022C SP evmergehi Vector Merge High
EVX 1000022E SP evmergehilo Vector Merge High/Low
EVX 1000022D SP evmergelo Vector Merge Low
EVX 1000022F SP evmergelohi Vector Merge Low/High
EVX 1000052B SP evmhegsmfaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate
EVX 100005AB SP evmhegsmfan Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX 10000529 SP evmhegsmiaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate
EVX 100005A9 SP evmhegsmian Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX 10000528 SP evmhegumiaa Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX 100005A8 SP evmhegumian Vector Multiply
Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX 1000040B SP evmhesmf Vector Multiply Halfwords, Even, Signed, Modulo, Fractional
EVX 1000042B SP evmhesmfa Vector Multiply Halfwords, Even, Signed, Modulo, Fractional to Accumulator
EVX 1000050B SP evmhesmfaaw Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words
EVX 1000058B SP evmhesmfanw Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX 10000409 SP evmhesmi Vector Multiply Halfwords, Even, Signed, Modulo, Integer
EVX 10000429 SP evmhesmia Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator
EVX 10000509 SP evmhesmiaaw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words
EVX 10000589 SP evmhesmianw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words
EVX 10000403 SP evmhessf Vector Multiply Halfwords, Even, Signed, Saturate, Fractional
EVX 10000423 SP evmhessfa Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator
EVX 10000503 SP evmhessfaaw Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate into Words
EVX 10000583 SP evmhessfanw Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX 10000501 SP evmhessiaaw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words
EVX 10000581 SP evmhessianw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words
EVX 10000408 SP evmheumi Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer
EVX 10000428 SP evmheumia Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator
EVX 10000508 SP evmheumiaaw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and
Accumulate into Words
EVX 10000588 SP evmheumianw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 10000500 SP evmheusiaaw Vector Multiply Halfwords, Even, Unsigned, Saturate Integer and Accumulate into Words
EVX 10000580 SP evmheusianw Vector Multiply Halfwords, Even, Unsigned, Saturate Integer and Accumulate Negative into Words
EVX 1000052F SP evmhogsmfaa Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate
EVX 100005AF SP evmhogsmfan Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX 1000052D SP evmhogsmiaa Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate
EVX 100005AD SP evmhogsmian Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX 1000052C SP evmhogumiaa Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX 100005AC SP evmhogumian Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX 1000040F SP evmhosmf Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional
EVX 1000042F SP evmhosmfa Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional to Accumulator
EVX 1000050F SP evmhosmfaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words
EVX 1000058F SP evmhosmfanw Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX 1000040D SP evmhosmi Vector Multiply Halfwords, Odd, Signed, Modulo, Integer
EVX 1000042D SP evmhosmia Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator
EVX 1000050D SP evmhosmiaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words
EVX 1000058D SP evmhosmianw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words
EVX 10000407 SP evmhossf Vector Multiply Halfwords, Odd, Signed, Fractional
EVX 10000427 SP evmhossfa
Vector Multiply Halfwords, Odd, Signed, Fractional to Accumulator
EVX 10000507 SP evmhossfaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate into Words
EVX 10000587 SP evmhossfanw Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX 10000505 SP evmhossiaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words
EVX 10000585 SP evmhossianw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words
EVX 1000040C SP evmhoumi Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer
EVX 1000042C SP evmhoumia Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator
EVX 1000050C SP evmhoumiaaw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words
EVX 1000058C SP evmhoumianw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 10000504 SP evmhousiaaw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words
EVX 10000584 SP evmhousianw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX 100004C4 SP evmra Initialize Accumulator
EVX 1000044F SP evmwhsmf Vector Multiply Word High Signed, Modulo, Fractional
EVX 1000046F SP evmwhsmfa Vector Multiply Word High Signed, Modulo, Fractional to Accumulator
EVX 1000054F SP evmwhsmfaaw Vector Multiply Word High Signed, Modulo, Fractional and Accumulate into Words
EVX 100005CF SP evmwhsmfanw Vector Multiply Word High Signed, Modulo, Fractional and Accumulate Negative into Words
EVX 1000044D SP evmwhsmi Vector Multiply Word High Signed, Modulo, Integer
EVX 1000046D SP evmwhsmia Vector Multiply Word High Signed, Modulo, Integer to Accumulator
EVX 1000054D SP evmwhsmiaaw Vector Multiply Word High Signed, Modulo, Integer and
Accumulate into Words
EVX 100005CD SP evmwhsmianw Vector Multiply Word High Signed, Modulo, Integer and Accumulate Negative into Words
EVX 10000447 SP evmwhssf Vector Multiply Word High Signed, Fractional
EVX 10000467 SP evmwhssfa Vector Multiply Word High Signed, Fractional to Accumulator
EVX 10000547 SP evmwhssfaaw Vector Multiply Word High Signed, Fractional and Accumulate into Words
EVX 100005C7 SP evmwhssfanw Vector Multiply Word High Signed, Fractional and Accumulate Negative into Words
EVX 100005C5 SP evmwhssianw Vector Multiply Word High Signed, Integer and Accumulate Negative into Words
EVX 1000044C SP evmwhumi Vector Multiply Word High Unsigned, Modulo, Integer
EVX 1000046C SP evmwhumia Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator
EVX 1000054C SP evmwhumiaaw Vector Multiply Word High Unsigned, Modulo, Integer and Accumulate into Words
EVX 100005CC SP evmwhumianw Vector Multiply Word High Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 10000544 SP evmwhusiaaw Vector Multiply Word High Unsigned, Integer and Accumulate into Words
EVX 100005C4 SP evmwhusianw Vector Multiply Word High Unsigned, Integer and Accumulate Negative into Words
EVX 10000549 SP evmwlsmiaaw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words
EVX 100005C9 SP evmwlsmianw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative into Words
EVX 10000541 SP evmwlssiaaw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words
EVX 100005C1 SP evmwlssianw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative into Words
EVX 10000448 SP evmwlumi Vector Multiply Word Low Unsigned, Modulo, Integer
EVX 10000468 SP evmwlumia Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator
EVX 10000548 SP evmwlumiaaw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words
EVX 100005C8 SP evmwlumianw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 10000540 SP evmwlusiaaw Vector Multiply Word Low Unsigned Saturate, Integer and Accumulate into Words
EVX 100005C0 SP evmwlusianw Vector Multiply Word Low Unsigned Saturate, Integer and Accumulate Negative into Words
EVX 1000045B SP evmwsmf Vector Multiply Word Signed, Modulo, Fractional
EVX 1000047B SP evmwsmfa Vector Multiply Word Signed, Modulo, Fractional to Accumulator
EVX 1000055B SP evmwsmfaa Vector Multiply Word Signed, Modulo, Fractional and Accumulate
EVX 100005DB SP evmwsmfan Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative
EVX 10000459 SP evmwsmi Vector Multiply Word Signed, Modulo, Integer
EVX 10000479 SP evmwsmia Vector Multiply Word Signed, Modulo, Integer to Accumulator
EVX 10000559 SP evmwsmiaa Vector Multiply Word Signed, Modulo, Integer and Accumulate
EVX 100005D9 SP evmwsmian Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative
EVX 10000453 SP evmwssf Vector Multiply Word Signed, Saturate, Fractional
EVX 10000473 SP evmwssfa Vector Multiply Word Signed, Saturate, Fractional to Accumulator
EVX 10000553 SP evmwssfaa Vector Multiply Word Signed, Saturate, Fractional and Accumulate
EVX 100005D3 SP evmwssfan Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative
EVX 10000458 SP evmwumi Vector Multiply Word Unsigned, Modulo, Integer
EVX 10000478 SP evmwumia Vector Multiply Word Unsigned, Modulo, Integer to Accumulator
EVX 10000558 SP evmwumiaa Vector Multiply Word Unsigned, Modulo, Integer and Accumulate
EVX 100005D8 SP evmwumian Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative
EVX 1000021E SP evnand Vector NAND
EVX 10000209 SP evneg Vector Negate
EVX 10000218 SP evnor Vector NOR
EVX 10000217 SP evor Vector OR
EVX 1000021B SP evorc Vector OR with Complement
EVX 10000228 SP evrlw Vector Rotate Left Word
EVX 1000022A SP evrlwi Vector Rotate Left Word Immediate
EVX 1000020C SP evrndw Vector Round Word
EVSEL 10000278 SP evsel Vector Select
EVX 10000224 SP evslw Vector Shift Left Word
EVX 10000226 SP evslwi Vector Shift Left Word Immediate
EVX 1000022B SP evsplatfi Vector Splat Fractional Immediate
EVX 10000229 SP evsplati Vector Splat Immediate
EVX 10000223 SP evsrwis Vector Shift Right Word Immediate Signed
EVX 10000222 SP evsrwiu Vector Shift Right Word Immediate Unsigned
EVX 10000221 SP evsrws Vector Shift Right Word Signed
EVX 10000220 SP evsrwu Vector Shift Right Word Unsigned
EVX 10000321 SP evstdd Vector Store Doubleword of Doubleword
EVX 7C00019D P E.PD evstddepx Vector Store Doubleword into Doubleword by External Process ID Indexed
EVX 10000320 SP evstddx Vector Store Doubleword of Doubleword Indexed
EVX 10000325 SP evstdh Vector Store Doubleword of Four Halfwords
EVX 10000324 SP evstdhx Vector Store Doubleword of Four Halfwords Indexed
EVX 10000323 SP evstdw Vector Store Doubleword of Two Words
EVX 10000322 SP evstdwx Vector Store Doubleword of Two Words Indexed
EVX 10000331 SP evstwhe Vector Store Word of Two Halfwords from Even
EVX 10000330 SP evstwhex Vector Store Word of Two Halfwords from Even Indexed
EVX 10000335 SP evstwho Vector Store Word of Two Halfwords from
Odd
EVX 10000334 SP evstwhox Vector Store Word of Two Halfwords from Odd Indexed
EVX 10000339 SP evstwwe Vector Store Word of Word from Even
EVX 10000338 SP evstwwex Vector Store Word of Word from Even Indexed
EVX 1000033D SP evstwwo Vector Store Word of Word from Odd
EVX 1000033C SP evstwwox Vector Store Word of Word from Odd Indexed
EVX 100004CB SP evsubfsmiaaw Vector Subtract Signed, Modulo, Integer to Accumulator Word
EVX 100004C3 SP evsubfssiaaw Vector Subtract Signed, Saturate, Integer to Accumulator Word
EVX 100004CA SP evsubfumiaaw Vector Subtract Unsigned, Modulo, Integer to Accumulator Word
EVX 100004C2 SP evsubfusiaaw Vector Subtract Unsigned, Saturate, Integer to Accumulator Word
EVX 10000204 SP evsubfw Vector Subtract from Word
EVX 10000206 SP evsubifw Vector Subtract Immediate from Word
EVX 10000216 SP evxor Vector XOR
X 7C000774 SR B extsb[.] Extend Sign Byte
X 7C000734 SR B extsh[.] Extend Sign Halfword
X 7C0007B4 SR 64 extsw[.] Extend Sign Word
X 7C0007AC B icbi Instruction Cache Block Invalidate
X 7C0007BE P E.PD icbiep Instruction Cache Block Invalidate by External Process ID
X 7C0001CC M ECL icblc Instruction Cache Block Lock Clear
X 7C00002C E icbt Instruction Cache Block Touch
X 7C0003CC M ECL icbtls Instruction Cache Block Touch and Lock Set
X 7C00078C P E.CI ici Instruction Cache Invalidate
X 7C0007CC P E.CD icread Instruction Cache Read
A 7C00001E B.in isel Integer Select
X 7C0000BE P E.PD lbepx Load Byte by External Process ID Indexed
X 7C0000EE B lbzux Load Byte and Zero with Update Indexed
X 7C0000AE B lbzx Load Byte and Zero Indexed
X 7C0000A8 64 ldarx Load Doubleword and Reserve Indexed
X 7C00003A P E.PD ldepx Load Doubleword by External Process ID Indexed
X 7C00006A 64 ldux Load Doubleword with Update Indexed
X 7C00002A 64 ldx Load Doubleword Indexed
X 7C0004BE P E.PD lfdepx Load Floating-Point Double by External Process ID Indexed
X 7C0002EE B lhaux Load Halfword Algebraic with Update Indexed
X 7C0002AE B lhax Load Halfword
Algebraic Indexed
X 7C00062C B lhbrx Load Halfword Byte-Reversed Indexed
X 7C00023E P E.PD lhepx Load Halfword by External Process ID Indexed
X 7C00026E B lhzux Load Halfword and Zero with Update Indexed
X 7C00022E B lhzx Load Halfword and Zero Indexed
X 7C0004AA MA lswi Load String Word Immediate
X 7C00042A MA lswx Load String Word Indexed
X 7C00000E V lvebx Load Vector Element Byte Indexed
X 7C00004E V lvehx Load Vector Element Halfword Indexed
X 7C00024E P E.PD lvepx Load Vector by External Process ID Indexed
X 7C00020E P E.PD lvepxl Load Vector by External Process ID Indexed LRU
X 7C00008E V lvewx Load Vector Element Word Indexed
X 7C00000C V lvsl Load Vector for Shift Left Indexed
X 7C00004C V lvsr Load Vector for Shift Right Indexed
X 7C0000CE V lvx[l] Load Vector Indexed [Last]
X 7C000028 B lwarx Load Word and Reserve Indexed
X 7C0002EA 64 lwaux Load Word Algebraic with Update Indexed
X 7C0002AA 64 lwax Load Word Algebraic Indexed
X 7C00042C B lwbrx Load Word Byte-Reversed Indexed
X 7C00003E P E.PD lwepx Load Word by External Process ID Indexed
X 7C00006E B lwzux Load Word and Zero with Update Indexed
X 7C00002E B lwzx Load Word and Zero Indexed
X 10000158 SR LIM macchw[o][.] Multiply Accumulate Cross Halfword to Word Modulo Signed
X 100001D8 SR LIM macchws[o][.] Multiply Accumulate Cross Halfword to Word Saturate Signed
X 10000198 SR LIM macchwsu[o][.] Multiply Accumulate Cross Halfword to Word Saturate Unsigned
X 10000118 SR LIM macchwu[o][.] Multiply Accumulate Cross Halfword to Word Modulo Unsigned
X 10000058 SR LIM machhw[o][.] Multiply Accumulate High Halfword to Word Modulo Signed
X 100000D8 SR LIM machhws[o][.] Multiply Accumulate High Halfword to Word Saturate Signed
X 10000098 SR LIM machhwsu[o][.] Multiply Accumulate High Halfword to Word Saturate Unsigned
X 10000018 SR LIM machhwu[o][.]
Multiply Accumulate High Halfword to Word Modulo Unsigned
X 10000358 SR LIM maclhw[o][.] Multiply Accumulate Low Halfword to Word Modulo Signed
X 100003D8 SR LIM maclhws[o][.] Multiply Accumulate Low Halfword to Word Saturate Signed
X 10000398 SR LIM maclhwsu[o][.] Multiply Accumulate Low Halfword to Word Saturate Unsigned
X 10000318 SR LIM maclhwu[o][.] Multiply Accumulate Low Halfword to Word Modulo Unsigned
XFX 7C0006AC E mbar Memory Barrier
X 7C000400 B mcrxr Move To Condition Register From XER
XFX 7C000026 B mfcr Move From Condition Register
XFX 7C000286 P E mfdcr Move From Device Control Register
XFX 7C000246 P E mfdcrux Move From Device Control Register User-mode Indexed
XFX 7C000206 P E mfdcrx Move From Device Control Register Indexed
X 7C0000A6 P B mfmsr Move From Machine State Register
XFX 7C100026 B mfocrf Move From One Condition Register Field
XFX 7C00029C O E.PM mfpmr Move From Performance Monitor Register
XFX 7C0002A6 O B mfspr Move From Special Purpose Register
VX 10000604 V mfvscr Move from Vector Status and Control Register
X 7C0001DC P E.PC msgclr Message Clear
X 7C00019C P E.PC msgsnd Message Send
XFX 7C000120 B mtcrf Move To Condition Register Fields
XFX 7C000386 P E mtdcr Move To Device Control Register
X 7C000346 E mtdcrux Move To Device Control Register User-mode Indexed
X 7C000306 P E mtdcrx Move To Device Control Register Indexed
X 7C000124 P E mtmsr Move To Machine State Register
XFX 7C100120 B mtocrf Move To One Condition Register Field
XFX 7C00039C O E.PM mtpmr Move To Performance Monitor Register
XFX 7C0003A6 O B mtspr Move To Special Purpose Register
VX 10000644 V mtvscr Move to Vector Status and Control Register
X 10000150 SR LIM mulchw[o][.] Multiply Cross Halfword to Word Signed
X 10000110 SR LIM mulchwu[o][.] Multiply Cross Halfword to Word Unsigned
XO 7C000092 SR 64 mulhd[.] Multiply High Doubleword
XO 7C000012 SR 64 mulhdu[.] Multiply High Doubleword Unsigned
X 10000050 SR LIM mulhhw[o][.]
Multiply High Halfword to Word Signed
X 10000010 SR LIM mulhhwu[o][.] Multiply High Halfword to Word Unsigned
XO 7C000096 SR B mulhw[.] Multiply High Word
XO 7C000016 SR B mulhwu[.] Multiply High Word Unsigned
XO 7C0001D2 SR 64 mulld[o][.] Multiply Low Doubleword
XO 7C0001D6 SR B mullw[o][.] Multiply Low Word
X 7C0003B8 SR B nand[.] NAND
X 7C0000D0 SR B neg[o][.] Negate
X 1000015C SR LIM nmacchw[o][.] Negative Multiply Accumulate Cross Halfword to Word Modulo Signed
X 100001DC SR LIM nmacchws[o][.] Negative Multiply Accumulate Cross Halfword to Word Saturate Signed
X 1000005C SR LIM nmachhw[o][.] Negative Multiply Accumulate High Halfword to Word Modulo Signed
X 100000DC SR LIM nmachhws[o][.] Negative Multiply Accumulate High Halfword to Word Saturate Signed
X 1000035C SR LIM nmaclhw[o][.] Negative Multiply Accumulate Low Halfword to Word Modulo Signed
X 100003DC SR LIM nmaclhws[o][.] Negative Multiply Accumulate Low Halfword to Word Saturate Signed
X 7C0000F8 SR B nor[.] NOR
X 7C000378 SR B or[.] OR
X 7C000338 SR B orc[.] OR with Complement
X 7C0000F4 B popcntb Population Count Bytes
RR 0400---- VLE se_add Add Short Form
OIM5 2000---- VLE se_addi Add Immediate Short Form
RR 4600---- SR VLE se_and[.]
AND Short Form
RR 4500---- VLE se_andc AND with Complement Short Form
IM5 2E00---- VLE se_andi AND Immediate Short Form
BD8 E800---- VLE se_b[l] Branch [and Link]
BD8 E000---- VLE se_bc Branch Conditional Short Form
IM5 6000---- VLE se_bclri Bit Clear Immediate
C 0006---- VLE se_bctr Branch To Count Register [and Link]
IM5 6200---- VLE se_bgeni Bit Generate Immediate
C 0004---- VLE se_blr Branch To Link Register [and Link]
IM5 2C00---- VLE se_bmaski Bit Mask Generate Immediate
IM5 6400---- VLE se_bseti Bit Set Immediate
IM5 6600---- VLE se_btsti Bit Test Immediate
RR 0C00---- VLE se_cmp Compare Word
RR 0E00---- VLE se_cmph Compare Halfword Short Form
RR 0F00---- VLE se_cmphl Compare Halfword Logical Short Form
IM5 2A00---- VLE se_cmpi Compare Immediate Word Short Form
RR 0D00---- VLE se_cmpl Compare Logical Word
OIM5 2200---- VLE se_cmpli Compare Logical Immediate Word
R 00D0---- VLE se_extsb Extend Sign Byte Short Form
R 00F0---- VLE se_extsh Extend Sign Halfword Short Form
R 00C0---- VLE se_extzb Extend Zero Byte
R 00E0---- VLE se_extzh Extend Zero Halfword
C 0000---- VLE se_illegal Illegal
C 0001---- VLE se_isync Instruction Synchronize
SD4 8000---- VLE se_lbz Load Byte and Zero Short Form
SD4 A000---- VLE se_lhz Load Halfword and Zero Short Form
IM7 4800---- VLE se_li Load Immediate Short Form
SD4 C000---- VLE se_lwz Load Word and Zero Short Form
RR 0300---- VLE se_mfar Move from Alternate Register
R 00A0---- VLE se_mfctr Move From Count Register
R 0080---- VLE se_mflr Move From Link Register
RR 0100---- VLE se_mr Move Register
RR 0200---- VLE se_mtar Move To Alternate Register
R 00B0---- VLE se_mtctr Move To Count Register
R 0090---- VLE se_mtlr Move To Link Register
RR 0500---- VLE se_mullw Multiply Low Word Short Form
R 0030---- VLE se_neg Negate Short Form
R 0020---- VLE se_not NOT Short Form
RR 4400---- VLE se_or OR Short Form
C 0009---- P VLE se_rfci Return From Critical Interrupt
C 000A---- P VLE se_rfdi Return From Debug Interrupt
C 0008---- P VLE se_rfi Return from Interrupt
C 000B---- P VLE se_rfmci Return From Machine Check Interrupt
C 0002---- VLE se_sc System Call
RR 4200---- VLE se_slw Shift Left Word
IM5 6C00---- VLE se_slwi Shift Left Word Immediate Short Form
RR 4100---- SR VLE se_sraw Shift Right Algebraic Word
IM5 6A00---- SR VLE se_srawi Shift Right Algebraic Immediate
RR 4000---- VLE se_srw Shift Right Word
IM5 6800---- VLE se_srwi Shift Right Word Immediate Short Form
SD4 9000---- VLE se_stb Store Byte Short Form
SD4 B000---- VLE se_sth Store Halfword Short Form
SD4 D000---- VLE se_stw Store Word Short Form
RR 0600---- VLE se_sub Subtract
RR 0700---- VLE se_subf Subtract From Short Form
OIM5 2400---- SR VLE se_subi[.] Subtract Immediate
X 7C000036 SR 64 sld[.] Shift Left Doubleword
X 7C000030 SR B slw[.] Shift Left Word
X 7C000634 SR 64 srad[.] Shift Right Algebraic Doubleword
X 7C000674 SR 64 sradi[.] Shift Right Algebraic Doubleword Immediate
X 7C000630 SR B sraw[.] Shift Right Algebraic Word
X 7C000670 SR B srawi[.] Shift Right Algebraic Word Immediate
X 7C000436 SR 64 srd[.] Shift Right Doubleword
X 7C000430 SR B srw[.] Shift Right Word
X 7C0001BE P E.PD stbepx Store Byte by External Process ID Indexed
X 7C0001EE B stbux Store Byte with Update Indexed
X 7C0001AE B stbx Store Byte Indexed
X 7C0001AD 64 stdcx.
Store Doubleword Conditional Indexed
X 7C00013A P E.PD stdepx Store Doubleword by External Process ID Indexed
X 7C00016A 64 stdux Store Doubleword with Update Indexed
X 7C00012A 64 stdx Store Doubleword Indexed
X 7C0005BE P E.PD stfdepx Store Floating-Point Double by External Process ID Indexed
X 7C00072C B sthbrx Store Halfword Byte-Reversed Indexed
X 7C00033E P E.PD sthepx Store Halfword by External Process ID Indexed
X 7C00036E B sthux Store Halfword with Update Indexed
X 7C00032E B sthx Store Halfword Indexed
X 7C0005AA MA stswi Store String Word Immediate
X 7C00052A MA stswx Store String Word Indexed
VX 7C00010E V stvebx Store Vector Element Byte Indexed
VX 7C00014E V stvehx Store Vector Element Halfword Indexed
X 7C00064E P E.PD stvepx Store Vector by External Process ID Indexed
X 7C00060E P E.PD stvepxl Store Vector by External Process ID Indexed LRU
VX 7C00018E V stvewx Store Vector Element Word Indexed
VX 7C0001CE V stvx[l] Store Vector Indexed [Last]
X 7C00052C B stwbrx Store Word Byte-Reversed Indexed
X 7C00012D B stwcx. Store Word Conditional Indexed
X 7C00013E P E.PD stwepx Store Word by External Process ID Indexed
X 7C00016E B stwux Store Word with Update Indexed
X 7C00012E B stwx Store Word Indexed
XO 7C000050 SR B subf[o][.] Subtract From
XO 7C000010 SR B subfc[o][.] Subtract From Carrying
XO 7C000110 SR B subfe[o][.] Subtract From Extended
XO 7C0001D0 SR B subfme[o][.] Subtract From Minus One Extended
XO 7C000190 SR B subfze[o][.]
Subtract From Zero Extended
X 7C0004AC B sync Synchronize
X 7C000088 64 td Trap Doubleword
X 7C000624 P E tlbivax TLB Invalidate Virtual Address Indexed
X 7C000764 P E tlbre TLB Read Entry
X 7C000724 P E tlbsx TLB Search Indexed
X 7C00046C P E tlbsync TLB Synchronize
X 7C0007A4 P E tlbwe TLB Write Entry
X 7C000008 B tw Trap Word
VX 10000180 V vaddcuw Vector Add Carryout Unsigned Word
VX 1000000A V vaddfp Vector Add Floating-Point
VX 10000300 V vaddsbs Vector Add Signed Byte Saturate
VX 10000340 V vaddshs Vector Add Signed Halfword Saturate
VX 10000380 V vaddsws Vector Add Signed Word Saturate
VX 10000000 V vaddubm Vector Add Unsigned Byte Modulo
VX 10000200 V vaddubs Vector Add Unsigned Byte Saturate
VX 10000040 V vadduhm Vector Add Unsigned Halfword Modulo
VX 10000240 V vadduhs Vector Add Unsigned Halfword Saturate
VX 10000080 V vadduwm Vector Add Unsigned Word Modulo
VX 10000280 V vadduws Vector Add Unsigned Word Saturate
VX 10000404 V vand Vector AND
VX 10000444 V vandc Vector AND with Complement
VX 10000502 V vavgsb Vector Average Signed Byte
VX 10000542 V vavgsh Vector Average Signed Halfword
VX 10000582 V vavgsw Vector Average Signed Word
VX 10000402 V vavgub Vector Average Unsigned Byte
VX 10000442 V vavguh Vector Average Unsigned Halfword
VX 10000482 V vavguw Vector Average Unsigned Word
VX 100003CA V vcfpsxws Vector Convert from Single-Precision to Signed Fixed-Point Word Saturate
VX 1000038A V vcfpuxws Vector Convert from Single-Precision to Unsigned Fixed-Point Word Saturate
VX 100003C6 V vcmpbfp[.] Vector Compare Bounds Single-Precision
VC 100000C6 V vcmpeqfp[.] Vector Compare Equal To Single-Precision
VC 10000006 V vcmpequb[.] Vector Compare Equal To Unsigned Byte
VC 10000046 V vcmpequh[.] Vector Compare Equal To Unsigned Halfword
VC 10000086 V vcmpequw[.] Vector Compare Equal To Unsigned Word
VC 100001C6 V vcmpgefp[.]
Vector Compare Greater Than or Equal To Single-Precision VC 100002C6 V vcmpgtfp[.] Vector Compare Greater Than Single-Precision VC 10000306 V vcmpgtsb[.] Vector Compare Greater Than Signed Byte VC 10000346 V vcmpgtsh[.] Vector Compare Greater Than Signed Halfword VC 10000386 V vcmpgtsw[.] Vector Compare Greater Than Signed Word VC 10000206 V vcmpgtub[.] Vector Compare Greater Than Unsigned Byte VC 10000246 V vcmpgtuh[.] Vector Compare Greater Than Unsigned Halfword VC 10000286 V vcmpgtuw[.] Vector Compare Greater Than Unsigned Word VX 1000034A V vcsxwfp Vector Convert from Signed Fixed-Point Word to Single- Precision VX 1000030A V vcuxwfp Vector Convert from Unsigned Fixed-Point Word to Single- Precision VX 1000018A V vexptefp Vector 2 Raised to the Exponent Estimate Floating-Point VX 100001CA V vlogefp Vector Log Base 2 Estimate Floating-Point VA 1000002E V vmaddfp Vector Multiply-Add Single-Precision VX 1000040A V vmaxfp Vector Maximum Single-Precision VX 10000102 V vmaxsb Vector Maximum Signed Byte VX 10000142 V vmaxsh Vector Maximum Signed Halfword VX 10000182 V vmaxsw Vector Maximum Signed Word VX 10000002 V vmaxub Vector Maximum Unsigned Byte VX 10000042 V vmaxuh Vector Maximum Unsigned Halfword VX 10000082 V vmaxuw Vector Maximum Unsigned Word VA 10000020 V vmhaddshs Vector Multiply-High-Add Signed Halfword Saturate VA 10000021 V vmhraddshs Vector Multiply-High-Round-Add Signed Halfword Saturate VX 1000044A V vminfp Vector Minimum Single-Precision VX 10000302 V vminsb Vector Minimum Signed Byte VX 10000342 V vminsh Vector Minimum Signed Halfword VX 10000382 V vminsw Vector Minimum Signed Word VX 10000202 V vminub Vector Minimum Unsigned Byte VX 10000242 V vminuh Vector Minimum Unsigned Halfword VX 10000282 V vminuw Vector Minimum Unsigned Word VA 10000022 V vmladduhm Vector Multiply-Low-Add Unsigned Halfword Modulo Appendix A. 
VLE Instruction Set Sorted by Mnemonic 825 Version 2.05 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 VX 1000000C V vmrghb Vector Merge High Byte VX 1000004C V vmrghh Vector Merge High Halfword VX 1000008C V vmrghw Vector Merge High Word VX 1000010C V vmrglb Vector Merge Low Byte VX 1000014C V vmrglh Vector Merge Low Halfword VX 1000018C V vmrglw Vector Merge Low Word VA 10000025 V vmsummbm Vector Multiply-Sum Mixed Byte Modulo VA 10000028 V vmsumshm Vector Multiply-Sum Signed Halfword Modulo VA 10000029 V vmsumshs Vector Multiply-Sum Signed Halfword Saturate VA 10000024 V vmsumubm Vector Multiply-Sum Unsigned Byte Modulo VA 10000026 V vmsumuhm Vector Multiply-Sum Unsigned Halfword Modulo VA 10000027 V vmsumuhs Vector Multiply-Sum Unsigned Halfword Saturate VX 10000308 V vmulesb Vector Multiply Even Signed Byte VX 10000348 V vmulesh Vector Multiply Even Signed Halfword VX 10000208 V vmuleub Vector Multiply Even Unsigned Byte VX 10000248 V vmuleuh Vector Multiply Even Unsigned Halfword VX 10000108 V vmulosb Vector Multiply Odd Signed Byte VX 10000148 V vmulosh Vector Multiply Odd Signed Halfword VX 10000008 V vmuloub Vector Multiply Odd Unsigned Byte VX 10000048 V vmulouh Vector Multiply Odd Unsigned Halfword VA 1000002F V vnmsubfp Vector Negative Multiply-Subtract Single-Precision VX 10000504 V vnor Vector NOR VX 10000484 V vor Vector OR VA 1000002B V vperm Vector Permute VX 1000030E V vpkpx Vector Pack Pixel VX 1000018E V vpkshss Vector Pack Signed Halfword Signed Saturate VX 1000010E V vpkshus Vector Pack Signed Halfword Unsigned Saturate VX 100001CE V vpkswss Vector Pack Signed Word Signed Saturate VX 1000014E V vpkswus Vector Pack Signed Word Unsigned Saturate VX 1000000E V vpkuhum Vector Pack Unsigned Halfword Unsigned Modulo VX 1000008E V vpkuhus Vector Pack Unsigned Halfword Unsigned Saturate VX 1000004E V vpkuwum Vector Pack Unsigned Word Unsigned Modulo VX 100000CE V vpkuwus Vector Pack Unsigned Word Unsigned Saturate VX 1000010A 
V vrefp Vector Reciprocal Estimate Single-Precision
VX 100002CA V vrfim Vector Round to Single-Precision Integer toward -Infinity
VX 1000020A V vrfin Vector Round to Single-Precision Integer Nearest
VX 1000028A V vrfip Vector Round to Single-Precision Integer toward +Infinity
VX 1000024A V vrfiz Vector Round to Single-Precision Integer toward Zero
VX 10000004 V vrlb Vector Rotate Left Byte
VX 10000044 V vrlh Vector Rotate Left Halfword
VX 10000084 V vrlw Vector Rotate Left Word
VX 1000014A V vrsqrtefp Vector Reciprocal Square Root Estimate Single-Precision
VA 1000002A V vsel Vector Select
VX 100001C4 V vsl Vector Shift Left
VX 10000104 V vslb Vector Shift Left Byte
VA 1000002C V vsldoi Vector Shift Left Double by Octet Immediate
VX 10000144 V vslh Vector Shift Left Halfword
VX 1000040C V vslo Vector Shift Left by Octet
VX 10000184 V vslw Vector Shift Left Word
VX 1000020C V vspltb Vector Splat Byte
VX 1000024C V vsplth Vector Splat Halfword
VX 1000030C V vspltisb Vector Splat Immediate Signed Byte
VX 1000034C V vspltish Vector Splat Immediate Signed Halfword
VX 1000038C V vspltisw Vector Splat Immediate Signed Word
VX 1000028C V vspltw Vector Splat Word
VX 100002C4 V vsr Vector Shift Right
VX 10000304 V vsrab Vector Shift Right Algebraic Byte
VX 10000344 V vsrah Vector Shift Right Algebraic Halfword
VX 10000384 V vsraw Vector Shift Right Algebraic Word
VX 10000204 V vsrb Vector Shift Right Byte
VX 10000244 V vsrh Vector Shift Right Halfword
VX 1000044C V vsro Vector Shift Right by Octet
VX 10000284 V vsrw Vector Shift Right Word
VX 10000580 V vsubcuw Vector Subtract and Write Carry-Out Unsigned Word
VX 1000004A V vsubfp Vector Subtract Single-Precision
VX 10000700 V vsubsbs Vector Subtract Signed Byte Saturate
VX 10000740 V vsubshs Vector Subtract Signed Halfword Saturate
VX 10000780 V vsubsws Vector Subtract Signed Word Saturate
VX 10000400 V vsububm Vector
Subtract Unsigned Byte Modulo
VX 10000600 V vsububs Vector Subtract Unsigned Byte Saturate
VX 10000440 V vsubuhm Vector Subtract Unsigned Halfword Modulo
VX 10000640 V vsubuhs Vector Subtract Unsigned Halfword Saturate
VX 10000480 V vsubuwm Vector Subtract Unsigned Word Modulo
VX 10000680 V vsubuws Vector Subtract Unsigned Word Saturate
VX 10000688 V vsum2sws Vector Sum across Half Signed Word Saturate
VX 10000708 V vsum4sbs Vector Sum across Quarter Signed Byte Saturate
VX 10000648 V vsum4shs Vector Sum across Quarter Signed Halfword Saturate
VX 10000608 V vsum4ubs Vector Sum across Quarter Unsigned Byte Saturate
VX 10000788 V vsumsws Vector Sum across Signed Word Saturate
VX 1000034E V vupkhpx Vector Unpack High Pixel
VX 1000020E V vupkhsb Vector Unpack High Signed Byte
VX 1000024E V vupkhsh Vector Unpack High Signed Halfword
VX 100003CE V vupklpx Vector Unpack Low Pixel
VX 1000028E V vupklsb Vector Unpack Low Signed Byte
VX 100002CE V vupklsh Vector Unpack Low Signed Halfword
VX 100004C4 V vxor Vector XOR
X 7C00007C WT wait Wait
X 7C000106 P E wrtee Write MSR External Enable
X 7C000146 P E wrteei Write MSR External Enable Immediate
X 7C000278 SR B xor[.] XOR

1 See the key to the mode dependency and privilege columns on page 905 and the key to the category column in Section 1.3.5 of Book I.

2 For 16-bit instructions, the "Opcode" column represents the 16-bit hexadecimal instruction encoding with the opcode and extended opcode in the corresponding fields in the instruction, and with 0's in bit positions which are not opcode bits; dashes follow the opcode to indicate that the form is a 16-bit instruction. For 32-bit instructions, the "Opcode" column represents the 32-bit hexadecimal instruction encoding with the opcode and extended opcode in the corresponding fields in the instruction, and with 0's in bit positions which are not opcode bits.

Appendix B.
VLE Instruction Set Sorted by Opcode

This appendix lists all the instructions available in VLE mode in the Power ISA, in order by opcode. Opcodes that are not defined below are treated as illegal by category VLE.

Form  Opcode (hexadecimal)2  Mode Dep.1  Priv1  Cat1  Mnemonic  Instruction
C 0000---- VLE se_illegal Illegal
C 0001---- VLE se_isync Instruction Synchronize
C 0002---- VLE se_sc System Call
C 0004---- VLE se_blr Branch To Link Register [and Link]
C 0006---- VLE se_bctr Branch To Count Register [and Link]
C 0008---- P VLE se_rfi Return from Interrupt
C 0009---- P VLE se_rfci Return From Critical Interrupt
C 000A---- P VLE se_rfdi Return From Debug Interrupt
C 000B---- P VLE se_rfmci Return From Machine Check Interrupt
R 0020---- VLE se_not NOT Short Form
R 0030---- VLE se_neg Negate Short Form
R 0080---- VLE se_mflr Move From Link Register
R 0090---- VLE se_mtlr Move To Link Register
R 00A0---- VLE se_mfctr Move From Count Register
R 00B0---- VLE se_mtctr Move To Count Register
R 00C0---- VLE se_extzb Extend Zero Byte
R 00D0---- VLE se_extsb Extend Sign Byte Short Form
R 00E0---- VLE se_extzh Extend Zero Halfword
R 00F0---- VLE se_extsh Extend Sign Halfword Short Form
RR 0100---- VLE se_mr Move Register
RR 0200---- VLE se_mtar Move To Alternate Register
RR 0300---- VLE se_mfar Move from Alternate Register
RR 0400---- VLE se_add Add Short Form
RR 0500---- VLE se_mullw Multiply Low Word Short Form
RR 0600---- VLE se_sub Subtract
RR 0700---- VLE se_subf Subtract From Short Form
RR 0C00---- VLE se_cmp Compare Word
RR 0D00---- VLE se_cmpl Compare Logical Word
RR 0E00---- VLE se_cmph Compare Halfword Short Form
RR 0F00---- VLE se_cmphl Compare Halfword Logical Short Form
VX 10000000 V vaddubm Vector Add Unsigned Byte Modulo
VX 10000002 V vmaxub Vector Maximum Unsigned Byte
VX 10000004 V vrlb Vector Rotate Left Byte
VC 10000006 V vcmpequb[.]
Vector Compare Equal To Unsigned Byte VX 10000008 V vmuloub Vector Multiply Odd Unsigned Byte VX 1000000A V vaddfp Vector Add Floating-Point VX 1000000C V vmrghb Vector Merge High Byte VX 1000000E V vpkuhum Vector Pack Unsigned Halfword Unsigned Modulo X 10000010 SR LIM mulhhwu[o][.] Multiply High Halfword to Word Unsigned X 10000018 SR LIM machhwu[o][.] Multiply Accumulate High Halfword to Word Modulo Unsigned VA 10000020 V vmhaddshs Vector Multiply-High-Add Signed Halfword Saturate VA 10000021 V vmhraddshs Vector Multiply-High-Round-Add Signed Halfword Saturate VA 10000022 V vmladduhm Vector Multiply-Low-Add Unsigned Halfword Modulo VA 10000024 V vmsumubm Vector Multiply-Sum Unsigned Byte Modulo Appendix B. VLE Instruction Set Sorted by Opcode 829 Version 2.05 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 VA 10000025 V vmsummbm Vector Multiply-Sum Mixed Byte Modulo VA 10000026 V vmsumuhm Vector Multiply-Sum Unsigned Halfword Modulo VA 10000027 V vmsumuhs Vector Multiply-Sum Unsigned Halfword Saturate VA 10000028 V vmsumshm Vector Multiply-Sum Signed Halfword Modulo VA 10000029 V vmsumshs Vector Multiply-Sum Signed Halfword Saturate VA 1000002A V vsel Vector Select VA 1000002B V vperm Vector Permute VA 1000002C V vsldoi Vector Shift Left Double by Octet Immediate VA 1000002E V vmaddfp Vector Multiply-Add Single-Precision VA 1000002F V vnmsubfp Vector Negative Multiply-Subtract Single-Precision VX 10000040 V vadduhm Vector Add Unsigned Halfword Modulo VX 10000042 V vmaxuh Vector Maximum Unsigned Halfword VX 10000044 V vrlh Vector Rotate Left Halfword VC 10000046 V vcmpequh[.] Vector Compare Equal To Unsigned Halfword VX 10000048 V vmulouh Vector Multiply Odd Unsigned Halfword VX 1000004A V vsubfp Vector Subtract Single-Precision VX 1000004C V vmrghh Vector Merge High Halfword VX 1000004E V vpkuwum Vector Pack Unsigned Word Unsigned Modulo X 10000050 SR LIM mulhhw[o][.] Multiply High Halfword to Word Signed X 10000058 SR LIM machhw[o][.] 
Multiply Accumulate High Halfword to Word Modulo Signed X 1000005C SR LIM nmachhw[o][.] Negative Multiply Accumulate High Halfword to Word Mod- ulo Signed VX 10000080 V vadduwm Vector Add Unsigned Word Modulo VX 10000082 V vmaxuw Vector Maximum Unsigned Word VX 10000084 V vrlw Vector Rotate Left Word VC 10000086 V vcmpequw[.] Vector Compare Equal To Unsigned Word VX 1000008C V vmrghw Vector Merge High Word VX 1000008E V vpkuhus Vector Pack Unsigned Halfword Unsigned Saturate X 10000098 SR LIM machhwsu[o][.] Multiply Accumulate High Halfword to Word Saturate Unsigned VC 100000C6 V vcmpeqfp[.] Vector Compare Equal To Single-Precision VX 100000CE V vpkuwus Vector Pack Unsigned Word Unsigned Saturate X 100000D8 SR LIM machhws[o][.] Multiply Accumulate High Halfword to Word Saturate Signed X 100000DC SR LIM nmachhws[o][.] Negative Multiply Accumulate High Halfword to Word Satu- rate Signed VX 10000102 V vmaxsb Vector Maximum Signed Byte VX 10000104 V vslb Vector Shift Left Byte VX 10000108 V vmulosb Vector Multiply Odd Signed Byte VX 1000010A V vrefp Vector Reciprocal Estimate Single-Precision VX 1000010C V vmrglb Vector Merge Low Byte VX 1000010E V vpkshus Vector Pack Signed Halfword Unsigned Saturate X 10000110 SR LIM mulchwu[o][.] Multiply Cross Halfword to Word Unsigned X 10000118 SR LIM macchwu[o][.] Multiply Accumulate Cross Halfword to Word Modulo Unsigned VX 10000142 V vmaxsh Vector Maximum Signed Halfword VX 10000144 V vslh Vector Shift Left Halfword VX 10000148 V vmulosh Vector Multiply Odd Signed Halfword VX 1000014A V vrsqrtefp Vector Reciprocal Square Root Estimate Single-Precision VX 1000014C V vmrglh Vector Merge Low Halfword VX 1000014E V vpkswus Vector Pack Signed Word Unsigned Saturate X 10000150 SR LIM mulchw[o][.] Multiply Cross Halfword to Word Signed X 10000158 SR LIM macchw[o][.] Multiply Accumulate Cross Halfword to Word Modulo Signed X 1000015C SR LIM nmacchw[o][.] 
Negative Multiply Accumulate Cross Halfword to Word Mod- ulo Signed VX 10000180 V vaddcuw Vector Add Carryout Unsigned Word VX 10000182 V vmaxsw Vector Maximum Signed Word VX 10000184 V vslw Vector Shift Left Word VX 1000018A V vexptefp Vector 2 Raised to the Exponent Estimate Floating-Point VX 1000018C V vmrglw Vector Merge Low Word 830 Power ISATM VLE Version 2.05 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 VX 1000018E V vpkshss Vector Pack Signed Halfword Signed Saturate X 10000198 SR LIM macchwsu[o][.] Multiply Accumulate Cross Halfword to Word Saturate Unsigned VX 100001C4 V vsl Vector Shift Left VC 100001C6 V vcmpgefp[.] Vector Compare Greater Than or Equal To Single-Precision VX 100001CA V vlogefp Vector Log Base 2 Estimate Floating-Point VX 100001CE V vpkswss Vector Pack Signed Word Signed Saturate X 100001D8 SR LIM macchws[o][.] Multiply Accumulate Cross Halfword to Word Saturate Signed X 100001DC SR LIM nmacchws[o][.] Negative Multiply Accumulate Cross Halfword to Word Sat- urate Signed EVX 10000200 SP evaddw Vector Add Word VX 10000200 V vaddubs Vector Add Unsigned Byte Saturate EVX 10000202 SP evaddiw Vector Add Immediate Word VX 10000202 V vminub Vector Minimum Unsigned Byte EVX 10000204 SP evsubfw Vector Subtract from Word VX 10000204 V vsrb Vector Shift Right Byte EVX 10000206 SP evsubifw Vector Subtract Immediate from Word VC 10000206 V vcmpgtub[.] 
Vector Compare Greater Than Unsigned Byte EVX 10000208 SP evabs Vector Absolute Value VX 10000208 V vmuleub Vector Multiply Even Unsigned Byte EVX 10000209 SP evneg Vector Negate EVX 1000020A SP evextsb Vector Extend Sign Byte VX 1000020A V vrfin Vector Round to Single-Precision Integer Nearest EVX 1000020B SP evextsh Vector Extend Sign Halfword EVX 1000020C SP evrndw Vector Round Word VX 1000020C V vspltb Vector Splat Byte EVX 1000020D SP evcntlzw Vector Count Leading Zeros Bits Word EVX 1000020E SP evcntlsw Vector Count Leading Sign Bits Word VX 1000020E V vupkhsb Vector Unpack High Signed Byte EVX 1000020F SP brinc Bit Reverse Increment EVX 10000211 SP evand Vector AND EVX 10000212 SP evandc Vector AND with Complement EVX 10000216 SP evxor Vector XOR EVX 10000217 SP evor Vector OR EVX 10000218 SP evnor Vector NOR EVX 10000219 SP eveqv Vector Equivalent EVX 1000021B SP evorc Vector OR with Complement EVX 1000021E SP evnand Vector NAND EVX 10000220 SP evsrwu Vector Shift Right Word Unsigned EVX 10000221 SP evsrws Vector Shift Right Word Signed EVX 10000222 SP evsrwiu Vector Shift Right Word Immediate Unsigned EVX 10000223 SP evsrwis Vector Shift Right Word Immediate Signed EVX 10000224 SP evslw Vector Shift Left Word EVX 10000226 SP evslwi Vector Shift Left Word Immediate EVX 10000228 SP evrlw Vector Rotate Left Word EVX 10000229 SP evsplati Vector Splat Immediate EVX 1000022A SP evrlwi Vector Rotate Left Word Immediate EVX 1000022B SP evsplatfi Vector Splat Fractional Immediate EVX 1000022C SP evmergehi Vector Merge High EVX 1000022D SP evmergelo Vector Merge Low EVX 1000022E SP evmergehilo Vector Merge High/Low EVX 1000022F SP evmergelohi Vector Merge Low/High EVX 10000230 SP evcmpgtu Vector Compare Greater Than Unsigned EVX 10000231 SP evcmpgts Vector Compare Greater Than Signed EVX 10000232 SP evcmpltu Vector Compare Less Than Unsigned EVX 10000233 SP evcmplts Vector Compare Less Than Signed EVX 10000234 SP evcmpeq Vector Compare Equal VX 10000240 V vadduhs 
Vector Add Unsigned Halfword Saturate VX 10000242 V vminuh Vector Minimum Unsigned Halfword VX 10000244 V vsrh Vector Shift Right Halfword Appendix B. VLE Instruction Set Sorted by Opcode 831 Version 2.05 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 VC 10000246 V vcmpgtuh[.] Vector Compare Greater Than Unsigned Halfword VX 10000248 V vmuleuh Vector Multiply Even Unsigned Halfword VX 1000024A V vrfiz Vector Round to Single-Precision Integer toward Zero VX 1000024C V vsplth Vector Splat Halfword VX 1000024E V vupkhsh Vector Unpack High Signed Halfword EVSE 10000278 SP evsel Vector Select L EVX 10000280 SP.FV evfsadd Vector Floating-Point Single-Precision Add VX 10000280 V vadduws Vector Add Unsigned Word Saturate EVX 10000281 SP.FV evfssub Vector Floating-Point Single-Precision Subtract VX 10000282 V vminuw Vector Minimum Unsigned Word EVX 10000284 SP.FV evfsabs Vector Floating-Point Single-Precision Absolute Value VX 10000284 V vsrw Vector Shift Right Word EVX 10000285 SP.FV evfsnabs Vector Floating-Point Single-Precision Negative Absolute Value EVX 10000286 SP.FV evfsneg Vector Floating-Point Single-Precision Negate VC 10000286 V vcmpgtuw[.] 
Vector Compare Greater Than Unsigned Word EVX 10000288 SP.FV evfsmul Vector Floating-Point Single-Precision Multiply EVX 10000289 SP.FV evfsdiv Vector Floating-Point Single-Precision Divide VX 1000028A V vrfip Vector Round to Single-Precision Integer toward +Infinity EVX 1000028C SP.FV evfscmpgt Vector Floating-Point Single-Precision Compare Greater Than VX 1000028C V vspltw Vector Splat Word EVX 1000028D SP.FV evfscmplt Vector Floating-Point Single-Precision Compare Less Than EVX 1000028E SP.FV evfscmpeq Vector Floating-Point Single-Precision Compare Equal VX 1000028E V vupklsb Vector Unpack Low Signed Byte EVX 10000290 SP.FV evfscfui Vector Convert Floating-Point Single-Precision from Unsigned Integer EVX 10000291 SP.FV evfscfsi Vector Convert Floating-Point Single-Precision from Signed Integer EVX 10000292 SP.FV evfscfuf Vector Convert Floating-Point Single-Precision from Unsigned Fraction EVX 10000293 SP.FV evfscfsf Vector Convert Floating-Point Single-Precision from Signed Fraction EVX 10000294 SP.FV evfsctui Vector Convert Floating-Point Single-Precision to Unsigned Integer EVX 10000295 SP.FV evfsctsi Vector Convert Floating-Point Single-Precision to Signed Integer EVX 10000296 SP.FV evfsctuf Vector Convert Floating-Point Single-Precision to Unsigned Fraction EVX 10000297 SP.FV evfsctsf Vector Convert Floating-Point Single-Precision to Signed Fraction EVX 10000298 SP.FV evfsctuiz Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round Towards Zero EVX 1000029A SP.FV evfsctsiz Vector Convert Floating-Point Single-Precision to Signed Integer with Round Towards Zero EVX 1000029C SP.FV evfststgt Vector Floating-Point Single-Precision Test Greater Than EVX 1000029D SP.FV evfststlt Vector Floating-Point Single-Precision Test Less Than EVX 1000029E SP.FV evfststeq Vector Floating-Point Single-Precision Test Equal VX 100002C4 V vsr Vector Shift Right VC 100002C6 V vcmpgtfp[.] 
Vector Compare Greater Than Single-Precision VX 100002CA V vrfim Vector Round to Single-Precision Integer toward -Infinity VX 100002CE V vupklsh Vector Unpack Low Signed Halfword EVX 100002CF SP.FD efscfd Floating-Point Single-Precision Convert from Double-Preci- sion EVX 100002E0 SP.FD efdadd Floating-Point Double-Precision Add EVX 100002E0 SP.FS efsadd Floating-Point Single-Precision Add EVX 100002E1 SP.FD efdsub Floating-Point Double-Precision Subtract EVX 100002E1 SP.FS efssub Floating-Point Single-Precision Subtract 832 Power ISATM VLE Version 2.05 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 EVX 100002E2 SP.FD efdcfuid Convert Floating-Point Double-Precision from Unsigned Integer Doubleword EVX 100002E2 SP.FS efscfuid Convert Floating-Point Single-Precision from Unsigned Inte- ger Doubleword EVX 100002E3 SP.FD efdcfsid Convert Floating-Point Double-Precision from Signed Inte- ger Doubleword EVX 100002E3 SP.FS efscfsid Convert Floating-Point Single-Precision from Signed Integer Doubleword EVX 100002E4 SP.FD efdabs Floating-Point Double-Precision Absolute Value EVX 100002E4 SP.FS efsabs Floating-Point Single-Precision Absolute Value EVX 100002E5 SP.FD efdnabs Floating-Point Double-Precision Negative Absolute Value EVX 100002E5 SP.FS efsnabs Floating-Point Single-Precision Negative Absolute Value EVX 100002E6 SP.FD efdneg Floating-Point Double-Precision Negate EVX 100002E6 SP.FS efsneg Floating-Point Single-Precision Negate EVX 100002E8 SP.FD efdmul Floating-Point Double-Precision Multiply EVX 100002E8 SP.FS efsmul Floating-Point Single-Precision Multiply EVX 100002E9 SP.FD efddiv Floating-Point Double-Precision Divide EVX 100002E9 SP.FS efsdiv Floating-Point Single-Precision Divide EVX 100002EA SP.FD efdctuidz Convert Floating-Point Double-Precision to Unsigned Inte- ger Doubleword with Round Towards Zero EVX 100002EA SP.FS efsctuidz Convert Floating-Point Single-Precision to Unsigned Integer Doubleword with Round Towards Zero EVX 
100002EB SP.FD efdctsidz Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round Towards Zero EVX 100002EB SP.FS efsctsidz Convert Floating-Point Single-Precision to Signed Integer Doubleword with Round Towards Zero EVX 100002EC SP.FD efdcmpgt Floating-Point Double-Precision Compare Greater Than EVX 100002EC SP.FS efscmpgt Floating-Point Single-Precision Compare Greater Than EVX 100002ED SP.FD efdcmplt Floating-Point Double-Precision Compare Less Than EVX 100002ED SP.FS efscmplt Floating-Point Single-Precision Compare Less Than EVX 100002EE SP.FD efdcmpeq Floating-Point Double-Precision Compare Equal EVX 100002EE SP.FS efscmpeq Floating-Point Single-Precision Compare Equal EVX 100002EF SP.FD efdcfs Floating-Point Double-Precision Convert from Single-Preci- sion EVX 100002F0 SP.FD efdcfui Convert Floating-Point Double-Precision from Unsigned Integer EVX 100002F0 SP.FS efscfui Convert Floating-Point Single-Precision from Unsigned Inte- ger EVX 100002F1 SP.FD efdcfsi Convert Floating-Point Double-Precision from Signed Inte- ger EVX 100002F1 SP.FS efscfsi Convert Floating-Point Single-Precision from Signed Integer EVX 100002F2 SP.FD efdcfuf Convert Floating-Point Double-Precision from Unsigned Fraction EVX 100002F2 SP.FS efscfuf Convert Floating-Point Single-Precision from Unsigned Fraction EVX 100002F3 SP.FD efdcfsf Convert Floating-Point Double-Precision from Signed Frac- tion EVX 100002F3 SP.FS efscfsf Convert Floating-Point Single-Precision from Signed Frac- tion EVX 100002F4 SP.FD efdctui Convert Floating-Point Double-Precision to Unsigned Inte- ger EVX 100002F4 SP.FS efsctui Convert Floating-Point Single-Precision to Unsigned Integer EVX 100002F5 SP.FD efdctsi Convert Floating-Point Double-Precision to Signed Integer EVX 100002F5 SP.FS efsctsi Convert Floating-Point Single-Precision to Signed Integer EVX 100002F6 SP.FD efdctuf Convert Floating-Point Double-Precision to Unsigned Frac- tion EVX 100002F6 SP.FS efsctuf Convert Floating-Point 
Single-Precision to Unsigned Fraction
EVX 100002F7 SP.FD efdctsf Convert Floating-Point Double-Precision to Signed Fraction
EVX 100002F7 SP.FS efsctsf Convert Floating-Point Single-Precision to Signed Fraction
EVX 100002F8 SP.FD efdctuiz Convert Floating-Point Double-Precision to Unsigned Integer with Round Towards Zero
EVX 100002F8 SP.FS efsctuiz Convert Floating-Point Single-Precision to Unsigned Integer with Round Towards Zero
EVX 100002FA SP.FD efdctsiz Convert Floating-Point Double-Precision to Signed Integer with Round Towards Zero
EVX 100002FA SP.FS efsctsiz Convert Floating-Point Single-Precision to Signed Integer with Round Towards Zero
EVX 100002FC SP.FD efdtstgt Floating-Point Double-Precision Test Greater Than
EVX 100002FC SP.FS efststgt Floating-Point Single-Precision Test Greater Than
EVX 100002FD SP.FD efdtstlt Floating-Point Double-Precision Test Less Than
EVX 100002FD SP.FS efststlt Floating-Point Single-Precision Test Less Than
EVX 100002FE SP.FD efdtsteq Floating-Point Double-Precision Test Equal
EVX 100002FE SP.FS efststeq Floating-Point Single-Precision Test Equal
EVX 10000300 SP evlddx Vector Load Doubleword into Doubleword Indexed
VX 10000300 V vaddsbs Vector Add Signed Byte Saturate
EVX 10000301 SP evldd Vector Load Doubleword into Doubleword
EVX 10000302 SP evldwx Vector Load Doubleword into 2 Words Indexed
VX 10000302 V vminsb Vector Minimum Signed Byte
EVX 10000303 SP evldw Vector Load Doubleword into 2 Words
EVX 10000304 SP evldhx Vector Load Doubleword into 4 Halfwords Indexed
VX 10000304 V vsrab Vector Shift Right Algebraic Byte
EVX 10000305 SP evldh Vector Load Doubleword into 4 Halfwords
VC 10000306 V vcmpgtsb[.]
Vector Compare Greater Than Signed Byte
EVX 10000308 SP evlhhesplatx Vector Load Halfword into Halfwords Even and Splat Indexed
VX 10000308 V vmulesb Vector Multiply Even Signed Byte
EVX 10000309 SP evlhhesplat Vector Load Halfword into Halfwords Even and Splat
VX 1000030A V vcuxwfp Vector Convert from Unsigned Fixed-Point Word to Single-Precision
EVX 1000030C SP evlhhousplatx Vector Load Halfword into Halfwords Odd Unsigned and Splat Indexed
VX 1000030C V vspltisb Vector Splat Immediate Signed Byte
EVX 1000030D SP evlhhousplat Vector Load Halfword into Halfwords Odd Unsigned and Splat
EVX 1000030E SP evlhhossplatx Vector Load Halfword into Halfwords Odd Signed and Splat Indexed
VX 1000030E V vpkpx Vector Pack Pixel
EVX 1000030F SP evlhhossplat Vector Load Halfword into Halfwords Odd Signed and Splat
EVX 10000310 SP evlwhex Vector Load Word into Two Halfwords Even Indexed
EVX 10000311 SP evlwhe Vector Load Word into Two Halfwords Even
EVX 10000314 SP evlwhoux Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended)
EVX 10000315 SP evlwhou Vector Load Word into Two Halfwords Odd Unsigned (zero-extended)
EVX 10000316 SP evlwhosx Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension)
EVX 10000317 SP evlwhos Vector Load Word into Two Halfwords Odd Signed (with sign extension)
EVX 10000318 SP evlwwsplatx Vector Load Word into Word and Splat Indexed
X 10000318 SR LIM maclhwu[o][.]
Multiply Accumulate Low Halfword to Word Modulo Unsigned
EVX 10000319 SP evlwwsplat Vector Load Word into Word and Splat
EVX 1000031C SP evlwhsplatx Vector Load Word into Two Halfwords and Splat Indexed
EVX 1000031D SP evlwhsplat Vector Load Word into Two Halfwords and Splat
EVX 10000320 SP evstddx Vector Store Doubleword of Doubleword Indexed
EVX 10000321 SP evstdd Vector Store Doubleword of Doubleword
EVX 10000322 SP evstdwx Vector Store Doubleword of Two Words Indexed
EVX 10000323 SP evstdw Vector Store Doubleword of Two Words
EVX 10000324 SP evstdhx Vector Store Doubleword of Four Halfwords Indexed
EVX 10000325 SP evstdh Vector Store Doubleword of Four Halfwords
EVX 10000330 SP evstwhex Vector Store Word of Two Halfwords from Even Indexed
EVX 10000331 SP evstwhe Vector Store Word of Two Halfwords from Even
EVX 10000334 SP evstwhox Vector Store Word of Two Halfwords from Odd Indexed
EVX 10000335 SP evstwho Vector Store Word of Two Halfwords from Odd
EVX 10000338 SP evstwwex Vector Store Word of Word from Even Indexed
EVX 10000339 SP evstwwe Vector Store Word of Word from Even
EVX 1000033C SP evstwwox Vector Store Word of Word from Odd Indexed
EVX 1000033D SP evstwwo Vector Store Word of Word from Odd
VX 10000340 V vaddshs Vector Add Signed Halfword Saturate
VX 10000342 V vminsh Vector Minimum Signed Halfword
VX 10000344 V vsrah Vector Shift Right Algebraic Halfword
VC 10000346 V vcmpgtsh[.] Vector Compare Greater Than Signed Halfword
VX 10000348 V vmulesh Vector Multiply Even Signed Halfword
VX 1000034A V vcsxwfp Vector Convert from Signed Fixed-Point Word to Single-Precision
VX 1000034C V vspltish Vector Splat Immediate Signed Halfword
VX 1000034E V vupkhpx Vector Unpack High Pixel
X 10000358 SR LIM maclhw[o][.] Multiply Accumulate Low Halfword to Word Modulo Signed
X 1000035C SR LIM nmaclhw[o][.]
Negative Multiply Accumulate Low Halfword to Word Mod- ulo Signed VX 10000380 V vaddsws Vector Add Signed Word Saturate VX 10000382 V vminsw Vector Minimum Signed Word VX 10000384 V vsraw Vector Shift Right Algebraic Word VC 10000386 V vcmpgtsw[.] Vector Compare Greater Than Signed Word VX 1000038A V vcfpuxws Vector Convert from Single-Precision to Unsigned Fixed- Point Word Saturate VX 1000038C V vspltisw Vector Splat Immediate Signed Word X 10000398 SR LIM maclhwsu[o][.] Multiply Accumulate Low Halfword to Word Saturate Unsigned VC 100003C6 V vcmpbfp[.] Vector Compare Bounds Single-Precision VX 100003CA V vcfpsxws Vector Convert from Single-Precision to Signed Fixed-Point Word Saturate VX 100003CE V vupklpx Vector Unpack Low Pixel X 100003D8 SR LIM maclhws[o][.] Multiply Accumulate Low Halfword to Word Saturate Signed X 100003DC SR LIM nmaclhws[o][.] Negative Multiply Accumulate Low Halfword to Word Satu- rate Signed VX 10000400 V vsububm Vector Subtract Unsigned Byte Modulo VX 10000402 V vavgub Vector Average Unsigned Byte EVX 10000403 SP evmhessf Vector Multiply Halfwords, Even, Signed, Saturate, Frac- tional VX 10000404 V vand Vector AND EVX 10000407 SP evmhossf Vector Multiply Halfwords, Odd, Signed, Fractional EVX 10000408 SP evmheumi Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer EVX 10000409 SP evmhesmi Vector Multiply Halfwords, Even, Signed, Modulo, Integer VX 1000040A V vmaxfp Vector Maximum Single-Precision EVX 1000040B SP evmhesmf Vector Multiply Halfwords, Even, Signed, Modulo, Fractional EVX 1000040C SP evmhoumi Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer VX 1000040C V vslo Vector Shift Left by Octet EVX 1000040D SP evmhosmi Vector Multiply Halfwords, Odd, Signed, Modulo, Integer EVX 1000040F SP evmhosmf Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional EVX 10000423 SP evmhessfa Vector Multiply Halfwords, Even, Signed, Saturate, Frac- tional to Accumulator EVX 10000427 SP evmhossfa Vector Multiply Halfwords, Odd, 
Signed, Fractional to Accumulator
EVX 10000428 SP evmheumia Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator
EVX 10000429 SP evmhesmia Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator
EVX 1000042B SP evmhesmfa Vector Multiply Halfwords, Even, Signed, Modulo, Fractional to Accumulator
EVX 1000042C SP evmhoumia Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator
EVX 1000042D SP evmhosmia Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator
EVX 1000042F SP evmhosmfa Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional to Accumulator
VX 10000440 V vsubuhm Vector Subtract Unsigned Halfword Modulo
VX 10000442 V vavguh Vector Average Unsigned Halfword
VX 10000444 V vandc Vector AND with Complement
EVX 10000447 SP evmwhssf Vector Multiply Word High Signed, Fractional
EVX 10000448 SP evmwlumi Vector Multiply Word Low Unsigned, Modulo, Integer
VX 1000044A V vminfp Vector Minimum Single-Precision
EVX 1000044C SP evmwhumi Vector Multiply Word High Unsigned, Modulo, Integer
VX 1000044C V vsro Vector Shift Right by Octet
EVX 1000044D SP evmwhsmi Vector Multiply Word High Signed, Modulo, Integer
EVX 1000044F SP evmwhsmf Vector Multiply Word High Signed, Modulo, Fractional
EVX 10000453 SP evmwssf Vector Multiply Word Signed, Saturate, Fractional
EVX 10000458 SP evmwumi Vector Multiply Word Unsigned, Modulo, Integer
EVX 10000459 SP evmwsmi Vector Multiply Word Signed, Modulo, Integer
EVX 1000045B SP evmwsmf Vector Multiply Word Signed, Modulo, Fractional
EVX 10000467 SP evmwhssfa Vector Multiply Word High Signed, Fractional to Accumulator
EVX 10000468 SP evmwlumia Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator
EVX 1000046C SP evmwhumia Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator
EVX 1000046D SP evmwhsmia Vector
Multiply Word High Signed, Modulo, Integer to Accu- mulator EVX 1000046F SP evmwhsmfa Vector Multiply Word High Signed, Modulo, Fractional to Accumulator EVX 10000473 SP evmwssfa Vector Multiply Word Signed, Saturate, Fractional to Accu- mulator EVX 10000478 SP evmwumia Vector Multiply Word Unsigned, Modulo, Integer to Accumu- lator EVX 10000479 SP evmwsmia Vector Multiply Word Signed, Modulo, Integer to Accumula- tor EVX 1000047B SP evmwsmfa Vector Multiply Word Signed, Modulo, Fractional to Accu- mulator VX 10000480 V vsubuwm Vector Subtract Unsigned Word Modulo VX 10000482 V vavguw Vector Average Unsigned Word VX 10000484 V vor Vector OR EVX 100004C0 SP evaddusiaaw Vector Add Unsigned, Saturate, Integer to Accumulator Word EVX 100004C1 SP evaddssiaaw Vector Add Signed, Saturate, Integer to Accumulator Word EVX 100004C2 SP evsubfusiaaw Vector Subtract Unsigned, Saturate, Integer to Accumulator Word EVX 100004C3 SP evsubfssiaaw Vector Subtract Signed, Saturate, Integer to Accumulator Word EVX 100004C4 SP evmra Initialize Accumulator VX 100004C4 V vxor Vector XOR EVX 100004C6 SP evdivws Vector Divide Word Signed EVX 100004C7 SP evdivwu Vector Divide Word Unsigned EVX 100004C8 SP evaddumiaaw Vector Add Unsigned, Modulo, Integer to Accumulator Word EVX 100004C9 SP evaddsmiaaw Vector Add Signed, Modulo, Integer to Accumulator Word EVX 100004CA SP evsubfumiaaw Vector Subtract Unsigned, Modulo, Integer to Accumulator Word EVX 100004CB SP evsubfsmiaaw Vector Subtract Signed, Modulo, Integer to Accumulator Word EVX 10000500 SP evmheusiaaw Vector Multiply Halfwords, Even, Unsigned, Saturate Integer and Accumulate into Words 836 Power ISATM VLE Version 2.05 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 EVX 10000501 SP evmhessiaaw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words VX 10000502 V vavgsb Vector Average Signed Byte EVX 10000503 SP evmhessfaaw Vector Multiply Halfwords, Even, Signed, Saturate, Frac- 
tional and Accumulate into Words EVX 10000504 SP evmhousiaaw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words VX 10000504 V vnor Vector NOR EVX 10000505 SP evmhossiaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words EVX 10000507 SP evmhossfaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Frac- tional and Accumulate into Words EVX 10000508 SP evmheumiaaw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words EVX 10000509 SP evmhesmiaaw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words EVX 1000050B SP evmhesmfaaw Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words EVX 1000050C SP evmhoumiaaw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words EVX 1000050D SP evmhosmiaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words EVX 1000050F SP evmhosmfaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words EVX 10000528 SP evmhegumiaa Vector Multiply Halfwords, Even, Guarded, Unsigned, Mod- ulo, Integer and Accumulate EVX 10000529 SP evmhegsmiaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate EVX 1000052B SP evmhegsmfaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate EVX 1000052C SP evmhogumiaa Vector Multiply Halfwords, Odd, Guarded, Unsigned, Mod- ulo, Integer and Accumulate EVX 1000052D SP evmhogsmiaa Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate EVX 1000052F SP evmhogsmfaa Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate EVX 10000540 SP evmwlusiaaw Vector Multiply Word Low Unsigned Saturate, Integer and Accumulate into Words EVX 10000541 SP evmwlssiaaw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words VX 10000542 V vavgsh Vector Average Signed 
Halfword EVX 10000544 SP evmwhusiaaw Vector Multiply Word High Unsigned, Integer and Accumu- late into Words EVX 10000547 SP evmwhssfaaw Vector Multiply Word High Signed, Fractional and Accumu- late into Words EVX 10000548 SP evmwlumiaaw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words EVX 10000549 SP evmwlsmiaaw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words EVX 1000054C SP evmwhumiaaw Vector Multiply Word High Unsigned, Modulo, Integer and Accumulate into Words EVX 1000054D SP evmwhsmiaaw Vector Multiply Word High Signed, Modulo, Integer and Accumulate into Words EVX 1000054F SP evmwhsmfaaw Vector Multiply Word High Signed, Modulo, Fractional and Accumulate into Words EVX 10000553 SP evmwssfaa Vector Multiply Word Signed, Saturate, Fractional and Accu- mulate EVX 10000558 SP evmwumiaa Vector Multiply Word Unsigned, Modulo, Integer and Accu- mulate Appendix B. VLE Instruction Set Sorted by Opcode 837 Version 2.05 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 EVX 10000559 SP evmwsmiaa Vector Multiply Word Signed, Modulo, Integer and Accumu- late EVX 1000055B SP evmwsmfaa Vector Multiply Word Signed, Modulo, Fractional and Accu- mulate EVX 10000580 SP evmheusianw Vector Multiply Halfwords, Even, Unsigned, Saturate Integer and Accumulate Negative into Words VX 10000580 V vsubcuw Vector Subtract and Write Carry-Out Unsigned Word EVX 10000581 SP evmhessianw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words VX 10000582 V vavgsw Vector Average Signed Word EVX 10000583 SP evmhessfanw Vector Multiply Halfwords, Even, Signed, Saturate, Frac- tional and Accumulate Negative into Words EVX 10000584 SP evmhousianw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words EVX 10000585 SP evmhossianw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words EVX 10000587 SP evmhossfanw 
Vector Multiply Halfwords, Odd, Signed, Saturate, Frac- tional and Accumulate Negative into Words EVX 10000588 SP evmheumianw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words EVX 10000589 SP evmhesmianw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words EVX 1000058B SP evmhesmfanw Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words EVX 1000058C SP evmhoumianw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words EVX 1000058D SP evmhosmianw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words EVX 1000058F SP evmhosmfanw Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words EVX 100005A8 SP evmhegumian Vector Multiply Halfwords, Even, Guarded, Unsigned, Mod- ulo, Integer and Accumulate Negative EVX 100005A9 SP evmhegsmian Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative EVX 100005AB SP evmhegsmfan Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative EVX 100005AC SP evmhogumian Vector Multiply Halfwords, Odd, Guarded, Unsigned, Mod- ulo, Integer and Accumulate Negative EVX 100005AD SP evmhogsmian Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative EVX 100005AF SP evmhogsmfan Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative EVX 100005C0 SP evmwlusianw Vector Multiply Word Low Unsigned Saturate, Integer and Accumulate Negative into Words EVX 100005C1 SP evmwlssianw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative into Words EVX 100005C4 SP evmwhusianw Vector Multiply Word High Unsigned, Integer and Accumu- late Negative into Words EVX 100005C5 SP evmwhssianw Vector Multiply Word High Signed, Integer and Accumulate Negative into Words EVX 100005C7 
SP evmwhssfanw Vector Multiply Word High Signed, Fractional and Accumu- late Negative into Words EVX 100005C8 SP evmwlumianw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative into Words EVX 100005C9 SP evmwlsmianw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative into Words EVX 100005CC SP evmwhumianw Vector Multiply Word High Unsigned, Modulo, Integer and Accumulate Negative into Words 838 Power ISATM VLE Version 2.05 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 EVX 100005CD SP evmwhsmianw Vector Multiply Word High Signed, Modulo, Integer and Accumulate Negative into Words EVX 100005CF SP evmwhsmfanw Vector Multiply Word High Signed, Modulo, Fractional and Accumulate Negative into Words EVX 100005D3 SP evmwssfan Vector Multiply Word Signed, Saturate, Fractional and Accu- mulate Negative EVX 100005D8 SP evmwumian Vector Multiply Word Unsigned, Modulo, Integer and Accu- mulate Negative EVX 100005D9 SP evmwsmian Vector Multiply Word Signed, Modulo, Integer and Accumu- late Negative EVX 100005DB SP evmwsmfan Vector Multiply Word Signed, Modulo, Fractional and Accu- mulate Negative VX 10000600 V vsububs Vector Subtract Unsigned Byte Saturate VX 10000604 V mfvscr Move from Vector Status and Control Register VX 10000608 V vsum4ubs Vector Sum across Quarter Unsigned Byte Saturate VX 10000640 V vsubuhs Vector Subtract Unsigned Halfword Saturate VX 10000644 V mtvscr Move to Vector Status and Control Register VX 10000648 V vsum4shs Vector Sum across Quarter Signed Halfword Saturate VX 10000680 V vsubuws Vector Subtract Unsigned Word Saturate VX 10000688 V vsum2sws Vector Sum across Half Signed Word Saturate VX 10000700 V vsubsbs Vector Subtract Signed Byte Saturate VX 10000708 V vsum4sbs Vector Sum across Quarter Signed Byte Saturate VX 10000740 V vsubshs Vector Subtract Signed Halfword Saturate VX 10000780 V vsubsws Vector Subtract Signed Word Saturate VX 10000788 V vsumsws Vector Sum across Signed 
Word Saturate
D8 18000000 VLE e_lbzu Load Byte and Zero with Update
D8 18000100 VLE e_lhzu Load Halfword and Zero with Update
D8 18000200 VLE e_lwzu Load Word and Zero with Update
D8 18000300 VLE e_lhau Load Halfword Algebraic with Update
D8 18000400 VLE e_stbu Store Byte with Update
D8 18000500 VLE e_sthu Store Halfword with Update
D8 18000600 VLE e_stwu Store Word with Update
D8 18000800 VLE e_lmw Load Multiple Word
D8 18000900 VLE e_stmw Store Multiple Word
SCI8 18008000 SR VLE e_addi[.] Add Scaled Immediate
SCI8 18009000 SR VLE e_addic[.] Add Scaled Immediate Carrying
SCI8 1800A000 VLE e_mulli Multiply Low Scaled Immediate
SCI8 1800A800 VLE e_cmpi Compare Scaled Immediate Word
SCI8 1800B000 SR VLE e_subfic[.] Subtract From Scaled Immediate Carrying
SCI8 1800C000 SR VLE e_andi[.] AND Scaled Immediate
SCI8 1800D000 SR VLE e_ori[.] OR Scaled Immediate
SCI8 1800E000 SR VLE e_xori[.] XOR Scaled Immediate
SCI8 1880A800 VLE e_cmpli Compare Logical Scaled Immediate Word
D 1C000000 VLE e_add16i Add Immediate
OIM5 2000---- VLE se_addi Add Immediate Short Form
OIM5 2200---- VLE se_cmpli Compare Logical Immediate Word
OIM5 2400---- SR VLE se_subi[.] Subtract Immediate
IM5 2A00---- VLE se_cmpi Compare Immediate Word Short Form
IM5 2C00---- VLE se_bmaski Bit Mask Generate Immediate
IM5 2E00---- VLE se_andi AND Immediate Short Form
D 30000000 VLE e_lbz Load Byte and Zero
D 34000000 VLE e_stb Store Byte
D 38000000 VLE e_lha Load Halfword Algebraic
RR 4000---- VLE se_srw Shift Right Word
RR 4100---- SR VLE se_sraw Shift Right Algebraic Word
RR 4200---- VLE se_slw Shift Left Word
RR 4400---- VLE se_or OR Short Form
RR 4500---- VLE se_andc AND with Complement Short Form
RR 4600---- SR VLE se_and[.] AND Short Form
IM7 4800---- VLE se_li Load Immediate Short Form
D 50000000 VLE e_lwz Load Word and Zero
Appendix B.
VLE Instruction Set Sorted by Opcode 839 Version 2.05 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 D 54000000 VLE e_stw Store Word D 58000000 VLE e_lhz Load Halfword and Zero D 5C000000 VLE e_sth Store Halfword IM5 6000---- VLE se_bclri Bit Clear Immediate IM5 6200---- VLE se_bgeni Bit Generate Immediate IM5 6400---- VLE se_bseti Bit Set Immediate IM5 6600---- VLE se_btsti Bit Test Immediate IM5 6800---- VLE se_srwi Shift Right Word Immediate Short Form IM5 6A00---- SR VLE se_srawi Shift Right Algebraic Immediate IM5 6C00---- VLE se_slwi Shift Left Word Immediate Short Form LI20 70000000 VLE e_li Load Immediate I16A 70008800 SR VLE e_add2i. Add (2 operand) Immediate and Record I16A 70009000 VLE e_add2is Add (2 operand) Immediate Shifted IA16 70009800 VLE e_cmp16i Compare Immediate Word I16A 7000A000 VLE e_mull2i Multiply (2 operand) Low Immediate I16A 7000A800 VLE e_cmpl16i Compare Logical Immediate Word IA16 7000B000 VLE e_cmph16i Compare Halfword Immediate IA16 7000B800 VLE e_cmphl16i Compare Halfword Logical Immediate I16L 7000C000 VLE e_or2i OR (2operand) Immediate I16L 7000C800 SR VLE e_and2i. AND (2 operand) Immediate I16L 7000D000 VLE e_or2is OR (2 operand) Immediate Shifted I16L 7000E000 VLE e_lis Load Immediate Shifted I16L 7000E800 SR VLE e_and2is. AND (2 operand) Immediate Shifted M 74000000 VLE e_rlwimi Rotate Left Word Immediate then Mask Insert M 74000001 VLE e_rlwinm Rotate Left Word Immediate then AND with Mask BD24 78000000 VLE e_b[l] Branch [and Link] BD15 7A000000 CT VLE e_bc[l] Branch Conditional [and Link] X 7C000000 B cmp Compare X 7C000008 B tw Trap Word X 7C00000C V lvsl Load Vector for Shift Left Indexed X 7C00000E V lvebx Load Vector Element Byte Indexed XO 7C000010 SR B subfc[o][.] Subtract From Carrying XO 7C000012 SR 64 mulhdu[.] Multiply High Doubleword Unsigned XO 7C000014 B addc[o][.] Add Carrying XO 7C000016 SR B mulhwu[.] 
Multiply High Word Unsigned X 7C00001C VLE e_cmph Compare Halfword A 7C00001E B.in isel Integer Select XL 7C000020 VLE e_mcrf Move CR Field XFX 7C000026 B mfcr Move From Condition Register X 7C000028 B lwarx Load Word and Reserve Indexed X 7C00002A 64 ldx Load Doubleword Indexed X 7C00002C E icbt Instruction Cache Block Touch X 7C00002E B lwzx Load Word and Zero Indexed X 7C000030 SR B slw[.] Shift Left Word X 7C000034 SR B cntlzw[.] Count Leading Zeros Word X 7C000036 SR 64 sld[.] Shift Left Doubleword X 7C000038 SR B and[.] AND X 7C00003A P E.PD ldepx Load Doubleword by External Process ID Indexed X 7C00003E P E.PD lwepx Load Word by External Process ID Indexed X 7C000040 B cmpl Compare Logical XL 7C000042 VLE e_crnor Condition Register NOR X 7C00004C V lvsr Load Vector for Shift Right Indexed X 7C00004E V lvehx Load Vector Element Halfword Indexed XO 7C000050 SR B subf[o][.] Subtract From X 7C00005C VLE e_cmphl Compare Halfword Logical X 7C00006A 64 ldux Load Doubleword with Update Indexed X 7C00006C B dcbst Data Cache Block Store X 7C00006E B lwzux Load Word and Zero with Update Indexed X 7C000070 SR VLE e_slwi[.] Shift Left Word Immediate X 7C000074 SR 64 cntlzd[.] Count Leading Zeros Doubleword X 7C000078 SR B andc[.] AND with Complement 840 Power ISATM VLE Version 2.05 Opcode Dep.1 Mode Form Priv1 (hexadeci- Cat1 Mnemonic Instruction mal)2 X 7C00007C WT wait Wait X 7C000088 64 td Trap Doubleword X 7C00008E V lvewx Load Vector Element Word Indexed XO 7C000092 SR 64 mulhd[.] Multiply High Doubleword XO 7C000096 SR B mulhw[.] Multiply High Word X 7C0000A6 P B mfmsr Move From Machine State Register X 7C0000A8 64 ldarx Load Doubleword and Reserve Indexed X 7C0000AC B dcbf Data Cache Block Flush X 7C0000AE B lbzx Load Byte and Zero Indexed X 7C0000BE P E.PD lbepx Load Byte by External Process ID Indexed X 7C0000CE V lvx[l] Load Vector Indexed [Last] X 7C0000D0 SR B neg[o][.] 
Negate
X 7C0000EE B lbzux Load Byte and Zero with Update Indexed
X 7C0000F4 B popcntb Population Count Bytes
X 7C0000F8 SR B nor[.] NOR
X 7C0000FE P E.PD dcbfep Data Cache Block Flush by External Process ID
XL 7C000102 VLE e_crandc Condition Register AND with Complement
X 7C000106 P E wrtee Write MSR External Enable
X 7C00010C M ECL dcbtstls Data Cache Block Touch for Store and Lock Set
VX 7C00010E V stvebx Store Vector Element Byte Indexed
XO 7C000110 SR B subfe[o][.] Subtract From Extended
XO 7C000114 SR B adde[o][.] Add Extended
EVX 7C00011D P E.PD evlddepx Vector Load Doubleword into Doubleword by External Process ID Indexed
XFX 7C000120 B mtcrf Move To Condition Register Fields
X 7C000124 P E mtmsr Move To Machine State Register
X 7C00012A 64 stdx Store Doubleword Indexed
X 7C00012D B stwcx. Store Word Conditional Indexed
X 7C00012E B stwx Store Word Indexed
X 7C00013A P E.PD stdepx Store Doubleword by External Process ID Indexed
X 7C00013E P E.PD stwepx Store Word by External Process ID Indexed
X 7C000146 P E wrteei Write MSR External Enable Immediate
X 7C00014C M ECL dcbtls Data Cache Block Touch and Lock Set
VX 7C00014E V stvehx Store Vector Element Halfword Indexed
X 7C00016A 64 stdux Store Doubleword with Update Indexed
X 7C00016E B stwux Store Word with Update Indexed
XL 7C000182 VLE e_crxor Condition Register XOR
VX 7C00018E V stvewx Store Vector Element Word Indexed
XO 7C000190 SR B subfze[o][.] Subtract From Zero Extended
XO 7C000194 SR B addze[o][.] Add to Zero Extended
X 7C00019C P E.PC msgsnd Message Send
EVX 7C00019D P E.PD evstddepx Vector Store Doubleword into Doubleword by External Process ID Indexed
X 7C0001AD 64 stdcx.
Store Doubleword Conditional Indexed
X 7C0001AE B stbx Store Byte Indexed
X 7C0001BE P E.PD stbepx Store Byte by External Process ID Indexed
XL 7C0001C2 VLE e_crnand Condition Register NAND
X 7C0001CC M ECL icblc Instruction Cache Block Lock Clear
VX 7C0001CE V stvx[l] Store Vector Indexed [Last]
XO 7C0001D0 SR B subfme[o][.] Subtract From Minus One Extended
XO 7C0001D2 SR 64 mulld[o][.] Multiply Low Doubleword
XO 7C0001D4 SR B addme[o][.] Add to Minus One Extended
XO 7C0001D6 SR B mullw[o][.] Multiply Low Word
X 7C0001DC P E.PC msgclr Message Clear
X 7C0001EC B dcbtst Data Cache Block Touch for Store
X 7C0001EE B stbux Store Byte with Update Indexed
X 7C0001FE P E.PD dcbtstep Data Cache Block Touch for Store by External Process ID
XL 7C000202 VLE e_crand Condition Register AND
XFX 7C000206 P E mfdcrx Move From Device Control Register Indexed
X 7C00020E P E.PD lvepxl Load Vector by External Process ID Indexed LRU
XO 7C000214 B add[o][.] Add
X 7C00022C B dcbt Data Cache Block Touch
X 7C00022E B lhzx Load Halfword and Zero Indexed
X 7C000230 SR VLE e_rlw[.] Rotate Left Word
X 7C000238 SR B eqv[.] Equivalent
X 7C00023E P E.PD lhepx Load Halfword by External Process ID Indexed
XL 7C000242 VLE e_creqv Condition Register Equivalent
XFX 7C000246 P E mfdcrux Move From Device Control Register User-mode Indexed
X 7C00024E P E.PD lvepx Load Vector by External Process ID Indexed
X 7C00026E B lhzux Load Halfword and Zero with Update Indexed
X 7C000270 SR VLE e_rlwi[.] Rotate Left Word Immediate
X 7C000278 SR B xor[.]
XOR X 7C00027E P E.PD dcbtep Data Cache Block Touch by External Process ID XFX 7C000286 P E mfdcr Move From Device Control Register X 7C00028C P E.CD dcread Data Cache Read XFX 7C00029C O E.PM mfpmr Move From Performance Monitor Register XFX 7C0002A6 O B mfspr Move From Special Purpose Register X 7C0002AA 64 lwax Load Word Algebraic Indexed X 7C0002AE B lhax Load Halfword Algebraic Indexed X 7C0002EA 64 lwaux Load Word Algebraic with Update Indexed X 7C0002EE B lhaux Load Halfword Algebraic with Update Indexed X 7C000306 P E mtdcrx Move To Device Control Register Indexed X 7C00030C M ECL dcblc Data Cache Block Lock Clear X 7C00032E B sthx Store Halfword Indexed X 7C000338 SR B orc[.] OR with Complement X 7C00033E P E.PD sthepx Store Halfword by External Process ID Indexed XL 7C000342 VLE e_crorc Condition Register OR with Complement X 7C000346 E mtdcrux Move To Device Control Register User-mode Indexed X 7C00036E B sthux Store Halfword with Update Indexed X 7C000378 SR B or[.] OR XL 7C000382 VLE e_cror Condition Register OR XFX 7C000386 P E mtdcr Move To Device Control Register X 7C00038C P E.CI dci Data Cache Invalidate XO 7C000392 SR 64 divdu[o][.] Divide Doubleword Unsigned XO 7C000396 SR B divwu[o][.] Divide Word Unsigned XFX 7C00039C O E.PM mtpmr Move To Performance Monitor Register XFX 7C0003A6 O B mtspr Move To Special Purpose Register X 7C0003AC P E dcbi Data Cache Block Invalidate X 7C0003B8 SR B nand[.] NAND X 7C0003CC M ECL icbtls Instruction Cache Block Touch and Lock Set X 7C0003CC P E.CD dcread Data Cache Read XO 7C0003D2 SR 64 divd[o][.] Divide Doubleword XO 7C0003D6 SR B divw[o][.] Divide Word X 7C000400 B mcrxr Move To Condition Register From XER X 7C00042A MA lswx Load String Word Indexed X 7C00042C B lwbrx Load Word Byte-Reversed Indexed X 7C000430 SR B srw[.] Shift Right Word X 7C000436 SR 64 srd[.] Shift Right Doubleword X 7C00046C P E tlbsync TLB Synchronize X 7C000470 SR VLE e_srwi[.] 
Shift Right Word Immediate
X 7C0004AA MA lswi Load String Word Immediate
X 7C0004AC B sync Synchronize
X 7C0004BE P E.PD lfdepx Load Floating-Point Double by External Process ID Indexed
X 7C00052A MA stswx Store String Word Indexed
X 7C00052C B stwbrx Store Word Byte-Reversed Indexed
X 7C0005AA MA stswi Store String Word Immediate
X 7C0005BE P E.PD stfdepx Store Floating-Point Double by External Process ID Indexed
X 7C0005EC E dcba Data Cache Block Allocate
X 7C00060E P E.PD stvepxl Store Vector by External Process ID Indexed LRU
X 7C000624 P E tlbivax TLB Invalidate Virtual Address Indexed
X 7C00062C B lhbrx Load Halfword Byte-Reversed Indexed
X 7C000630 SR B sraw[.] Shift Right Algebraic Word
X 7C000634 SR 64 srad[.] Shift Right Algebraic Doubleword
X 7C00064E P E.PD stvepx Store Vector by External Process ID Indexed
X 7C000670 SR B srawi[.] Shift Right Algebraic Word Immediate
X 7C000674 SR 64 sradi[.] Shift Right Algebraic Doubleword Immediate
XFX 7C0006AC E mbar Memory Barrier
X 7C000724 P E tlbsx TLB Search Indexed
X 7C00072C B sthbrx Store Halfword Byte-Reversed Indexed
X 7C000734 SR B extsh[.] Extend Sign Halfword
X 7C000764 P E tlbre TLB Read Entry
X 7C000774 SR B extsb[.] Extend Sign Byte
X 7C00078C P E.CI ici Instruction Cache Invalidate
X 7C0007A4 P E tlbwe TLB Write Entry
X 7C0007AC B icbi Instruction Cache Block Invalidate
X 7C0007B4 SR 64 extsw[.]
Extend Sign Word
X 7C0007BE P E.PD icbiep Instruction Cache Block Invalidate by External Process ID
X 7C0007CC P E.CD icread Instruction Cache Read
X 7C0007EC B dcbz Data Cache Block set to Zero
X 7C0007FE P E.PD dcbzep Data Cache Block set to Zero by External Process ID
XFX 7C100026 B mfocrf Move From One Condition Register Field
XFX 7C100120 B mtocrf Move To One Condition Register Field
SD4 8000---- VLE se_lbz Load Byte and Zero Short Form
SD4 9000---- VLE se_stb Store Byte Short Form
SD4 A000---- VLE se_lhz Load Halfword and Zero Short Form
SD4 B000---- VLE se_sth Store Halfword Short Form
SD4 C000---- VLE se_lwz Load Word and Zero Short Form
SD4 D000---- VLE se_stw Store Word Short Form
BD8 E000---- VLE se_bc Branch Conditional Short Form
BD8 E800---- VLE se_b[l] Branch [and Link]

1 See the key to the mode dependency and privilege column below and the key to the category column in Section 1.3.5 of Book I.

2 For 16-bit instructions, the "Opcode" column represents the 16-bit hexadecimal instruction encoding with the opcode and extended opcode in the corresponding fields in the instruction, and with 0's in bit positions which are not opcode bits; dashes are used following the opcode to indicate the form is a 16-bit instruction. For 32-bit instructions, the "Opcode" column represents the 32-bit hexadecimal instruction encoding with the opcode and extended opcode in the corresponding fields in the instruction, and with 0's in bit positions which are not opcode bits.

Mode Dependency and Privilege Abbreviations

Except as described below and in Section 1.10.3, "Effective Address Calculation", in Book I, all instructions are independent of whether the processor is in 32-bit or 64-bit mode.

Mode Dep.  Description
CT         If the instruction tests the Count Register, it tests the low-order 32 bits in 32-bit mode and all 64 bits in 64-bit mode.
SR         The setting of status registers (such as XER and CR0) is mode-dependent.
32         The instruction must be executed only in 32-bit mode.
64         The instruction must be executed only in 64-bit mode.

Key to Privilege Column

Priv.  Description
P      Denotes a privileged instruction.
O      Denotes an instruction that is treated as privileged or nonprivileged (or hypervisor, for mtspr), depending on the SPR number.
M      Denotes an instruction that is treated as privileged or nonprivileged, depending on the value of the UCLE bit of the MSR.
H      Denotes an instruction that can be executed only in hypervisor state.

Appendices: Power ISA Book I-III Appendices

Appendix A. Incompatibilities with the POWER Architecture

This appendix identifies the known incompatibilities that must be managed in the migration from the POWER Architecture to the Power ISA. Some of the incompatibilities can, at least in principle, be detected by the processor, which could trap and let software simulate the POWER operation. Others cannot be detected by the processor even in principle.

In general, the incompatibilities identified here are those that affect a POWER application program; incompatibilities for instructions that can be used only by POWER system programs are not necessarily discussed.

A.1 New Instructions, Formerly Privileged Instructions

Instructions new to Power ISA typically use opcode values (including extended opcode) that are illegal in POWER. A few instructions that are privileged in POWER (e.g., dclz, called dcbz in Power ISA) have been made nonprivileged in Power ISA. Any POWER program that executes one of these now-valid or now-nonprivileged instructions, expecting to cause the system illegal instruction error handler or the system privileged instruction error handler to be invoked, will not execute correctly on Power ISA.

A.2 Newly Privileged Instructions

The following instructions are nonprivileged in POWER but privileged in Power ISA.

    mfmsr
    mfsr

A.3 Reserved Fields in Instructions

These fields are shown with "/"s in the instruction layouts. In both POWER and Power ISA these fields are ignored by the processor. The Power ISA states that these fields must contain zero. The POWER Architecture lacks such a statement, but it is expected that essentially all POWER programs contain zero in these fields.

In several cases the Power ISA assumes that reserved fields in POWER instructions indeed contain zero. The cases include the following.

• bclr[l] and bcctr[l] assume that bits 19:20 in the POWER instructions contain zero.
• cmpi, cmp, cmpli, and cmpl assume that bit 10 in the POWER instructions contains zero.
• mtspr and mfspr assume that bits 16:20 in the POWER instructions contain zero.
• mtcrf and mfcr assume that bit 11 in the POWER instructions contains zero.
• Synchronize assumes that bits 9:10 in the POWER instruction (dcs) contain zero. (This assumption provides compatibility for application programs, but not necessarily for operating system programs; see Section A.22.)
• mtmsr assumes that bit 15 in the POWER instruction contains zero.

A.4 Reserved Bits in Registers

Both POWER and Power ISA permit software to write any value to these bits. However, in POWER reading such a bit always returns 0, while in Power ISA reading it may return either 0 or the value that was last written to it.

A.5 Alignment Check

The POWER MSR AL bit (bit 24) is no longer supported; the corresponding Power ISA MSR bit, bit 56, is reserved. The low-order bits of the EA are always used. (Notice that the value 0 -- the normal value for a reserved bit -- means "ignore the low-order EA bits" in POWER, and the value 1 means "use the low-order EA bits".)
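The bit positions cited in A.3 and A.5 count from the most significant bit, which is bit 0, so bit 24 of the 32-bit POWER MSR and bit 56 of the 64-bit Power ISA MSR name the same physical position. The following is a minimal sketch, not part of the architecture, of how a migration tool might test the A.3 assumptions against a 32-bit POWER instruction word. The function names, the table name, and the use of the Power ISA mnemonic `sync` for the Synchronize (dcs) entry are illustrative assumptions.

```python
# Illustrative sketch only: helper and table names are hypothetical.
# Bits are numbered from the most significant bit (bit 0).

def field(word: int, first: int, last: int, width: int = 32) -> int:
    """Return bits first:last (inclusive) of a width-bit word."""
    nbits = last - first + 1
    shift = width - 1 - last          # distance of bit `last` from the LSB
    return (word >> shift) & ((1 << nbits) - 1)

def to_64bit_numbering(bit32: int) -> int:
    """Map a bit number of a 32-bit POWER register to the 64-bit numbering."""
    return bit32 + 32

# Reserved-field positions (first, last) taken from the list in A.3.
A3_ASSUMED_ZERO = {
    "bclr": (19, 20), "bcctr": (19, 20),
    "cmpi": (10, 10), "cmp": (10, 10), "cmpli": (10, 10), "cmpl": (10, 10),
    "mtspr": (16, 20), "mfspr": (16, 20),
    "mtcrf": (11, 11), "mfcr": (11, 11),
    "sync": (9, 10),                  # dcs in POWER
    "mtmsr": (15, 15),
}

def reserved_field_is_zero(mnemonic: str, word: int) -> bool:
    """True if the POWER reserved field that Power ISA assumes zero is zero."""
    first, last = A3_ASSUMED_ZERO[mnemonic]
    return field(word, first, last) == 0
```

With these helpers, the cmp encoding 7C000000 listed in Appendix B passes the check, while the same word with bit 10 set (7C200000) does not; `to_64bit_numbering(24)` gives the bit 56 quoted in A.5.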
POWER-compatible operating system code will probably write the value 1 to this bit.

A.6 Condition Register

The following instructions specify a field in the CR explicitly (via the BF field) and also, in POWER, use bit 31 as the Record bit. In Power ISA, bit 31 is a reserved field for these instructions and is ignored by the processor. In POWER, if bit 31 contains 1 the instructions execute normally (i.e., as if the bit contained 0) except as follows:

    cmp     CR0 is undefined if Rc=1 and BF≠0
    cmpl    CR0 is undefined if Rc=1 and BF≠0
    mcrxr   CR0 is undefined if Rc=1 and BF≠0
    fcmpu   CR1 is undefined if Rc=1
    fcmpo   CR1 is undefined if Rc=1
    mcrfs   CR1 is undefined if Rc=1 and BF≠1

A.7 LK and Rc Bits

For the instructions listed below, if bit 31 (LK or Rc bit in POWER) contains 1, in POWER the instruction executes as if the bit contained 0 except as follows: if LK=1, the Link Register is set (to an undefined value, except for svc); if Rc=1, Condition Register Field 0 or 1 is set to an undefined value. In Power ISA, bit 31 is a reserved field for these instructions and is ignored by the processor.

Power ISA instructions for which bit 31 is the LK bit in POWER:

    sc (svc in POWER)
    the Condition Register Logical instructions
    mcrf
    isync (ics in POWER)

Power ISA instructions for which bit 31 is the Rc bit in POWER:

    fixed-point X-form Load and Store instructions
    fixed-point X-form Compare instructions
    the X-form Trap instruction
    mtspr, mfspr, mtcrf, mcrxr, mfcr, mtocrf, mfocrf
    floating-point X-form Load and Store instructions
    floating-point Compare instructions
    mcrfs
    dcbz (dclz in POWER)

A.8 BO Field

POWER shows certain bits in the BO field -- used by Branch Conditional instructions -- as "x". Although the POWER Architecture does not say how these bits are to be interpreted, they are in fact ignored by the processor.

Power ISA shows these bits as "z", "a", or "t". The "z" bits are ignored, as in POWER. However, the "a" and "t" bits can be used by software to provide a hint about how the branch is likely to behave. If a POWER program has the "wrong" value for these bits, the program will produce the same results as on POWER but performance may be affected.

A.9 BH Field

Bits 19:20 of the Branch Conditional to Link Register and Branch Conditional to Count Register instructions are reserved in POWER but are defined as a branch hint (BH) field in Power ISA. Because these bits are hints, they may affect performance but do not affect the results of executing the instruction.

A.10 Branch Conditional to Count Register

For the case in which the Count Register is decremented and tested (i.e., the case in which BO_2=0), POWER specifies only that the branch target address is undefined, with the implication that the Count Register, and the Link Register if LK=1, are updated in the normal way. Power ISA specifies that this instruction form is invalid.

A.11 System Call

There are several respects in which Power ISA is incompatible with POWER for System Call instructions -- which in POWER are called Supervisor Call instructions.

• POWER provides a version of the Supervisor Call instruction (bit 30 = 0) that allows instruction fetching to continue at any one of 128 locations. It is used for "fast SVCs". Power ISA provides no such version: if bit 30 of the instruction is 0 the instruction form is invalid.
• POWER provides a version of the Supervisor Call instruction (bits 30:31 = 0b11) that resumes instruction fetching at one location and sets the Link Register to the address of the next instruction. Power ISA provides no such version: bit 31 is a reserved field.
• For POWER, information from the MSR is saved in the Count Register. For Power ISA this information is saved in SRR1.
• In POWER bits 16:19 and 27:29 of the instruction comprise defined instruction fields or a portion thereof, while in Power ISA these bits comprise reserved fields.
■ In POWER bits 20:26 of the instruction comprise a portion of the SV field, while in Power ISA these bits comprise the LEV field.
■ POWER saves the low-order 16 bits of the instruction in the Count Register. Power ISA does not save them.
■ The settings of MSR bits by the associated interrupt differ between POWER and Power ISA; see POWER Processor Architecture and Book III.

A.12 Fixed-Point Exception Register (XER)

Bits 48:55 of the XER are reserved in Power ISA, while in POWER the corresponding bits (16:23) are defined and contain the comparison byte for the lscbx instruction (which Power ISA lacks).

A.13 Update Forms of Storage Access Instructions

Power ISA requires that RA not be equal to either RT (fixed-point Load only) or 0. If the restriction is violated the instruction form is invalid. POWER permits these cases, and simply avoids saving the EA.

A.14 Multiple Register Loads

Power ISA requires that RA, and RB if present in the instruction format, not be in the range of registers to be loaded, while POWER permits this and does not alter RA or RB in this case. (The Power ISA restriction applies even if RA=0, although there is no obvious benefit to the restriction in this case since RA is not used to compute the effective address if RA=0.) If the Power ISA restriction is violated, either the system illegal instruction error handler is invoked or the results are boundedly undefined. The instructions affected are:

    lmw  (lm in POWER)
    lswi (lsi in POWER)
    lswx (lsx in POWER)

For example, an lmw instruction that loads all 32 registers is valid in POWER but is an invalid form in Power ISA.

A.15 Load/Store Multiple Instructions

There are two respects in which Power ISA is incompatible with POWER for Load Multiple and Store Multiple instructions.

■ If the EA is not word-aligned, in Power ISA either an Alignment exception occurs or the addressed bytes are loaded, while in POWER an Alignment interrupt occurs if MSR[AL]=1 (the low-order two bits of the EA are ignored if MSR[AL]=0).
■ In Power ISA the instruction may be interrupted by a system-caused interrupt, while in POWER the instruction cannot be thus interrupted.

A.16 Move Assist Instructions

There are several respects in which Power ISA is incompatible with POWER for Move Assist instructions.

■ In Power ISA an lswx instruction with zero length leaves the contents of RT undefined (if RT≠RA and RT≠RB) or is an invalid instruction form (if RT=RA or RT=RB), while in POWER the corresponding instruction (lsx) is a no-op in these cases.
■ In Power ISA an lswx instruction with zero length may alter the Reference bit, and a stswx instruction with zero length may alter the Reference and Change bits, while in POWER the corresponding instructions (lsx and stsx) do not alter the Reference and Change bits in this case.
■ In Power ISA a Move Assist instruction may be interrupted by a system-caused interrupt, while in POWER the instruction cannot be thus interrupted.

A.17 Move To/From SPR

There are several respects in which Power ISA is incompatible with POWER for Move To/From Special Purpose Register instructions.

■ The SPR field is ten bits long in Power ISA, but only five in POWER (see also Section A.3, "Reserved Fields in Instructions").
■ mfspr can be used to read the Decrementer in problem state in POWER, but only in privileged state in Power ISA.
■ If the SPR value specified in the instruction is not one of the defined values, POWER behaves as follows.
  - If the instruction is executed in problem state and spr[0]=1, a Privileged Instruction type Program interrupt occurs. No architected registers are altered except those set by the interrupt.
  - Otherwise no architected registers are altered.
  In this same case, Power ISA behaves as follows.
  - If the instruction is executed in problem state and spr[0]=1, a Privileged Instruction type Program interrupt occurs. No architected registers are altered except those set by the interrupt.
  - (See Section 4.4.5, "Move To/From System Register Instructions" in Book III-S.)

A.18 Effects of Exceptions on FPSCR Bits FR and FI

For the following cases, POWER does not specify how FR and FI are set, while Power ISA preserves them for Invalid Operation Exception caused by a Compare instruction, sets FI to 1 and FR to an undefined value for disabled Overflow Exception, and clears them otherwise.

■ Invalid Operation Exception (enabled or disabled)
■ Zero Divide Exception (enabled or disabled)
■ Disabled Overflow Exception

A.19 Store Floating-Point Single Instructions

There are several respects in which Power ISA is incompatible with POWER for Store Floating-Point Single instructions.

■ POWER uses FPSCR[UE] to help determine whether denormalization should be done, while Power ISA does not. Using FPSCR[UE] is in fact incorrect: if FPSCR[UE]=1 and a denormalized single-precision number is copied from one storage location to another by means of lfs followed by stfs, the two "copies" may not be the same.
■ For an operand having an exponent that is less than 874 (unbiased exponent less than -149), POWER stores a zero (if FPSCR[UE]=0) while Power ISA stores an undefined value.

A.20 Move From FPSCR

POWER defines the high-order 32 bits of the result of mffs to be 0xFFFF_FFFF, while Power ISA copies the high-order 32 bits of the FPSCR.

A.21 Zeroing Bytes in the Data Cache

The dclz instruction of POWER and the dcbz instruction of Power ISA have the same opcode. However, the functions differ in the following respects.

■ dclz clears a line while dcbz clears a block.
■ dclz saves the EA in RA (if RA≠0) while dcbz does not.
■ dclz is privileged while dcbz is not.

A.22 Synchronization

The Synchronize instruction (called dcs in POWER) and the isync instruction (called ics in POWER) cause more pervasive synchronization in Power ISA than in POWER. However, unlike dcs, Synchronize does not wait until data cache block writes caused by preceding instructions have been performed in main storage. Also, Synchronize has an L field while dcs does not, and some uses of the instruction by the operating system require L=2. (The L field corresponds to reserved bits in dcs and hence is expected to be zero in POWER programs; see Section A.3.)

A.23 Move To Machine State Register Instruction

The mtmsr instruction has an L field in Power ISA but not in POWER. The function of the variant of mtmsr with L=1 differs from the function of the instruction in the POWER architecture in the following ways.

■ In Power ISA, this variant of mtmsr modifies only the EE and RI bits of the MSR, while in POWER mtmsr modifies all bits of the MSR.
■ This variant of mtmsr is execution synchronizing in Power ISA but is context synchronizing in POWER. (The POWER architecture lacks Power ISA's distinction between execution synchronization and context synchronization. The statement in the POWER architecture specification that mtmsr is "synchronizing" is equivalent to stating that the instruction is context synchronizing.)

Also, mtmsr is optional in Power ISA but required in POWER.

A.24 Direct-Store Segments

POWER's direct-store segments are not supported in Power ISA.

A.25 Segment Register Manipulation Instructions

The definitions of the four Segment Register Manipulation instructions mtsr, mtsrin, mfsr, and mfsrin differ in two respects between POWER and Power ISA. Instructions similar to mtsrin and mfsrin are called mtsri and mfsri in POWER.

privilege: mfsr and mfsri are problem state instructions in POWER, while mfsr and mfsrin are privileged in Power ISA.
function: the "indirect" instructions (mtsri and mfsri) in POWER use an RA register in computing the Segment Register number, and the computed EA is stored into RA (if RA≠0 and RA≠RT), while in Power ISA mtsrin and mfsrin have no RA field and the EA is not stored.

mtsr, mtsrin (mtsri), and mfsr have the same opcodes in Power ISA as in POWER. mfsri (POWER) and mfsrin (Power ISA) have different opcodes.

Also, the Segment Register Manipulation instructions are required in POWER whereas they are optional in Power ISA.

A.26 TLB Entry Invalidation

The tlbi instruction of POWER and the tlbie instruction of Power ISA have the same opcode. However, the functions differ in the following respects.

■ tlbi computes the EA as (RA|0) + (RB), while tlbie lacks an RA field and computes the EA and related information as (RB).
■ tlbi saves the EA in RA (if RA≠0), while tlbie lacks an RA field and does not save the EA.
■ For tlbi the high-order 36 bits of RB are used in computing the EA, while for tlbie these bits contain additional information that is not directly related to the EA.
■ tlbie has an L field, while tlbi does not.

Also, tlbi is required in POWER whereas tlbie is optional in Power ISA.

A.27 Alignment Interrupts

Placing information about the interrupting instruction into the DSISR and the DAR when an Alignment interrupt occurs is optional in Power ISA but required in POWER.

A.28 Floating-Point Interrupts

POWER uses MSR bit 20 to control the generation of interrupts for floating-point enabled exceptions, and Power ISA uses the corresponding MSR bit, bit 52, for the same purpose. However, in Power ISA this bit is part of a two-bit value that controls the occurrence, precision, and recoverability of the interrupt, while in POWER this bit is used independently to control the occurrence of the interrupt (in POWER all floating-point interrupts are precise).

A.29 Timing Facilities

A.29.1 Real-Time Clock

The POWER Real-Time Clock is not supported in Power ISA. Instead, Power ISA provides a Time Base. Both the RTC and the TB are 64-bit Special Purpose Registers, but they differ in the following respects.

■ The RTC counts seconds and nanoseconds, while the TB counts "ticks". The ticking rate of the TB is implementation-dependent.
■ The RTC increments discontinuously: 1 is added to RTCU when the value in RTCL passes 999_999_999. The TB increments continuously: 1 is added to TBU when the value in TBL passes 0xFFFF_FFFF.
■ The RTC is written and read by the mtspr and mfspr instructions, using SPR numbers that denote the RTCU and RTCL. The TB is written and read by the same instructions using different SPR numbers.
■ The SPR numbers that denote POWER's RTCL and RTCU are invalid in Power ISA.
■ The RTC is guaranteed to increment at least once in the time required to execute ten Add Immediate instructions. No analogous guarantee is made for the TB.
■ Not all bits of RTCL need be implemented, while all bits of the TB must be implemented.

A.29.2 Decrementer

The Power ISA Decrementer differs from the POWER Decrementer in the following respects.

■ The Power ISA DEC decrements at the same rate that the TB increments, while the POWER DEC decrements every nanosecond (which is the same rate that the RTC increments).
■ Not all bits of the POWER DEC need be implemented, while all bits of the Power ISA DEC must be implemented.
■ The interrupt caused by the DEC has its own interrupt vector location in Power ISA, but is considered an External interrupt in POWER.

A.30 Deleted Instructions

The following instructions are part of the POWER Architecture but have been dropped from the Power ISA.

    abs      Absolute
    clcs     Cache Line Compute Size
    clf      Cache Line Flush
    cli (*)  Cache Line Invalidate
    dclst    Data Cache Line Store
    div      Divide
    divs     Divide Short
    doz      Difference Or Zero
    dozi     Difference Or Zero Immediate
    lscbx    Load String And Compare Byte Indexed
    maskg    Mask Generate
    maskir   Mask Insert From Register
    mfsri    Move From Segment Register Indirect
    mul      Multiply
    nabs     Negative Absolute
    rac (*)  Real Address Compute
    rfi (*)  Return From Interrupt
    rfsvc    Return From SVC
    rlmi     Rotate Left Then Mask Insert
    rrib     Rotate Right And Insert Bit
    sle      Shift Left Extended
    sleq     Shift Left Extended With MQ
    sliq     Shift Left Immediate With MQ
    slliq    Shift Left Long Immediate With MQ
    sllq     Shift Left Long With MQ
    slq      Shift Left With MQ
    sraiq    Shift Right Algebraic Immediate With MQ
    sraq     Shift Right Algebraic With MQ
    sre      Shift Right Extended
    srea     Shift Right Extended Algebraic
    sreq     Shift Right Extended With MQ
    sriq     Shift Right Immediate With MQ
    srliq    Shift Right Long Immediate With MQ
    srlq     Shift Right Long With MQ
    srq      Shift Right With MQ

(*) This instruction is privileged.

Note: Many of these instructions use the MQ register. The MQ is not defined in the Power ISA.

A.31 Discontinued Opcodes

The opcodes listed below are defined in the POWER Architecture but have been dropped from the Power ISA. The list contains the POWER mnemonic (MNEM), the primary opcode (PRI), and the extended opcode (XOP) if appropriate.

    MNEM     PRI  XOP
    abs      31   360
    clcs     31   531
    clf      31   118
    cli (*)  31   502
    dclst    31   630
    div      31   331
    divs     31   363
    doz      31   264
    dozi     09   -
    lscbx    31   277
    maskg    31   29
    maskir   31   541
    mfsri    31   627
    mul      31   107
    nabs     31   488
    rac (*)  31   818
    rfi (*)  19   50
    rfsvc    19   82
    rlmi     22   -
    rrib     31   537
    sle      31   153
    sleq     31   217
    sliq     31   184
    slliq    31   248
    sllq     31   216
    slq      31   152
    sraiq    31   952
    sraq     31   920
    sre      31   665
    srea     31   921
    sreq     31   729
    sriq     31   696
    srliq    31   760
    srlq     31   728
    srq      31   664

(*) This instruction is privileged.

Assembler Note

It might be helpful to current software writers for the Assembler to flag the discontinued POWER instructions.
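The Assembler Note in this section suggests flagging discontinued POWER instructions. The following Python sketch illustrates one way such a check might look. It is an illustration only: the function and table names are hypothetical, only a handful of the (PRI, XOP) pairs from the A.31 list are included, and it assumes that bit 0 is the most significant bit of the 32-bit instruction word, that the primary opcode occupies bits 0:5, and that the extended opcode is read from bits 21:30. Instruction forms in which bit 21 is an OE bit set to 1 are not handled.

```python
# (primary opcode, extended opcode or None) -> POWER mnemonic.
# A small subset of the A.31 list, chosen for illustration.
DISCONTINUED = {
    (31, 360): "abs",
    (31, 531): "clcs",
    (31, 331): "div",
    (31, 264): "doz",
    (9, None): "dozi",    # no extended opcode
    (22, None): "rlmi",   # no extended opcode
    (19, 82): "rfsvc",
    (31, 664): "srq",
}

# Primary opcodes that are discontinued regardless of the remaining bits.
PRIMARY_ONLY = {pri for (pri, xop) in DISCONTINUED if xop is None}


def discontinued_mnemonic(word):
    """Return the POWER mnemonic if the word matches a listed pair, else None."""
    pri = (word >> 26) & 0x3F          # instruction bits 0:5
    if pri in PRIMARY_ONLY:
        return DISCONTINUED[(pri, None)]
    xop = (word >> 1) & 0x3FF          # instruction bits 21:30 (assumed)
    return DISCONTINUED.get((pri, xop))
```

An assembler or disassembler could call such a function on each instruction word and emit a warning when the result is not None.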
The instructions corresponding to these discontinued opcodes are reserved in Power ISA.

A.32 POWER2 Compatibility

The POWER2 instruction set is a superset of the POWER instruction set. Some of the instructions added for POWER2 are included in the Power ISA. Those that have been renamed in the Power ISA are listed in this section, as are the new POWER2 instructions that are not included in the Power ISA. Other incompatibilities are also listed.

A.32.1 Cross-Reference for Changed POWER2 Mnemonics

The following table lists the new POWER2 instruction mnemonics that have been changed in the Power ISA User Instruction Set Architecture, sorted by POWER2 mnemonic.

To determine the Power ISA mnemonic for one of these POWER2 mnemonics, find the POWER2 mnemonic in the second column of the table: the remainder of the line gives the Power ISA mnemonic and the page on which the instruction is described, as well as the instruction names.

POWER2 mnemonics that have not changed are not listed.

    Page  POWER2                                           Power ISA
          Mnemonic  Instruction                            Mnemonic   Instruction
    135   fcir[.]   Floating Convert Double to Integer     fctiw[.]   Floating Convert To Integer Word
                    with Round
    136   fcirz[.]  Floating Convert Double to Integer     fctiwz[.]  Floating Convert To Integer Word
                    with Round to Zero                                with round toward Zero

A.32.2 Load/Store Floating-Point Double

Several of the opcodes for the Load/Store Floating-Point Quad instructions of the POWER2 architecture have been reclaimed by the Load/Store Floating-Point Double [Indexed] instructions (entries with a '-' in the Power ISA column have not been reclaimed):

    MNEMONIC
    POWER2   Power ISA  PRI  XOP
    lfq      lq         56   -
    lfqu     lfdp       57   0
    lfqux    -          31   823
    lfqx     lfdpx      31   791
    stfq     -          60   -
    stfqu    stfdp      61   -
    stfqux   -          31   951
    stfqx    stfdpx     31   919

Differences between the l/stfdp[x] instructions and the POWER2 l/stfq[u][x] instructions include the following.

■ The storage operand for the l/stfdp[x] instructions must be quadword aligned for optimal performance.
■ The register pairs for the l/stfdp[x] instructions must be even-odd pairs, instead of any consecutive pair.
■ The l/stfdp[x] instructions do not have update forms.

A.32.3 Floating-Point Conversion to Integer

The fcir and fcirz instructions of POWER2 have the same opcodes as do the fctiw and fctiwz instructions, respectively, of Power ISA. However, the functions differ in the following respects.

■ fcir and fcirz set the high-order 32 bits of the target FPR to 0xFFFF_FFFF, while fctiw and fctiwz set them to an undefined value.
■ Except for enabled Invalid Operation Exceptions, fcir and fcirz set the FPRF field of the FPSCR based on the result, while fctiw and fctiwz set it to an undefined value.
■ fcir and fcirz do not affect the VXSNAN bit of the FPSCR, while fctiw and fctiwz do.
■ fcir and fcirz set FPSCR[XX] to 1 for certain cases of "Large Operands" (i.e., operands that are too large to be represented as a 32-bit signed fixed-point integer), while fctiw and fctiwz do not alter it for any case of "Large Operand". (The IEEE standard requires not altering it for "Large Operands".)

A.32.4 Floating-Point Interrupts

POWER2 uses MSR bits 20 and 23 to control the generation of interrupts for floating-point enabled exceptions, and Power ISA uses the corresponding MSR bits, bits 52 and 55, for the same purpose. However, in Power ISA these bits comprise a two-bit value that controls the occurrence, precision, and recoverability of the interrupt, while in POWER2 these bits are used independently to control the occurrence (bit 20) and the precision (bit 23) of the interrupt.
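The correspondence between POWER2 MSR bits 20 and 23 and Power ISA MSR bits 52 and 55 follows from the bit-numbering convention, in which bit 0 is the most significant bit of a register, so the low-order 32 bits of a 64-bit register are numbered 32:63. A minimal Python sketch of that mapping, with hypothetical helper names, might look as follows.

```python
def power_to_power_isa_bit(bit32):
    """Map a bit number of a 32-bit POWER register to the number of the
    corresponding bit in the 64-bit Power ISA register."""
    if not 0 <= bit32 <= 31:
        raise ValueError("bit number must be in 0..31")
    return bit32 + 32


def bit_mask(bit, width=64):
    """Return the mask for a bit numbered with bit 0 as the most significant."""
    if not 0 <= bit < width:
        raise ValueError("bit number out of range for register width")
    return 1 << (width - 1 - bit)
```

Under this convention the mask value is the same in both numberings: bit 20 of a 32-bit register and bit 52 of a 64-bit register both select 0x800.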
Moreover, in Power ISA all floating-point interrupts are considered Program interrupts, while in POWER2 imprecise floating-point interrupts have their own interrupt vector location.

A.32.5 Trace

The Trace interrupt vector location differs between the two architectures, and there are many other differences.

A.33 Deleted Instructions

The following instructions are new in POWER2 implementations of the POWER Architecture but have been dropped from the Power ISA.

    lfq      Load Floating-Point Quad
    lfqu     Load Floating-Point Quad with Update
    lfqux    Load Floating-Point Quad with Update Indexed
    lfqx     Load Floating-Point Quad Indexed
    stfq     Store Floating-Point Quad
    stfqu    Store Floating-Point Quad with Update
    stfqux   Store Floating-Point Quad with Update Indexed
    stfqx    Store Floating-Point Quad Indexed

A.33.1 Discontinued Opcodes

The opcodes listed below are new in POWER2 implementations of the POWER Architecture but have been dropped from the Power ISA. The list contains the POWER2 mnemonic (MNEM), the primary opcode (PRI), and the extended opcode (XOP) if appropriate. The instructions are either illegal or reserved in Power ISA; see Appendix D.

    MNEM     PRI  XOP
    lfq      56   -
    lfqx     31   791
    stfqx    31   919

Appendix B. Platform Support Requirements

As described in Chapter 1 of Book I, the architecture is structured as a collection of categories. Each category is comprised of facilities and/or instructions that together provide a unit of functionality. The Server and Embedded categories are referred to as "special" because all implementations must support at least one of these categories. Each special category, when taken together with the Base category, is referred to as an "environment", and provides the minimum functionality required to develop operating systems and applications.

Every processor implementation supports at least one of the environments, and may also support a set of categories chosen based on the target market for the implementation.
However, a Server implementation supports only those categories designated as part of the Server platform in Figure 20. To facilitate the development of operating systems and applications for a well-defined purpose or customer set, usually embodied in a unique hardware platform, this appendix documents the association between a platform and the set of categories it requires.

Adding a new platform may permit cost-performance optimization by clearly identifying a unique set of categories. However, this has the potential to fragment the application base. As a result, new platforms will be added only when the optimization benefit clearly outweighs the loss due to fragmentation.

The platform support requirements are documented in Figure 20. An "x" in a column indicates that the category is required. A "+" in a column indicates that the requirement is being phased in.

    Category                             Server Platform  Embedded Platform
    Base                                 x                x
    Server                               x
    Embedded                                              x
    Alternate Time Base
    BCD Assistance
    Cache Specification
    Decimal Floating-Point               x(2)
    Embedded.Cache Debug
    Embedded.Cache Initialization
    Embedded.Enhanced Debug
    Embedded.External PID
    Embedded.Little-Endian
    Embedded.MMU Type FSL
    Embedded.Performance Monitor
    Embedded.Processor Control
    Embedded Cache Locking
    External Control
    External Proxy
    Floating-Point                       x
    Floating-Point.Record                x
    Hypervisor Emulation Assistance
    Legacy Move Assist
    Legacy Integer Multiply-Accumulate
    Load/Store Quadword                  x(3)
    Memory Coherence                     x
    Move Assist                          x
    Processor Compatibility
    Server.Performance Monitor           x
    Signal Processing Engine
    SPE.Embedded Float Scalar Double
    SPE.Embedded Float Scalar Single
    SPE.Embedded Float Vector
    Stream                               x
    Trace                                x
    Variable Length Encoding
    Vector                               +
    Vector.Little-Endian                 +(1)
    Wait
    64-Bit                               x

    1. If the Vector category is supported, Vector.Little-Endian is required on the Server platform.
    2. The Decimal Floating-Point category may be emulated through support for the BCD Assistance and Hypervisor Emulation Assistance categories.
    3. Optional for the Server Platform.

Figure 20. Platform Support Requirements

Programming Note

The requirement to support the Hypervisor Emulation Assistance and BCD Assistance categories if the Decimal Floating-Point category is emulated is to provide a minimum level of performance.

Appendix C. Complete SPR List

This appendix lists all the Special Purpose Registers in the Power ISA, ordered by SPR number.
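Note 1 to the SPR table that follows (Figure 21) states that the order of the two 5-bit halves of the SPR number is reversed. The Python sketch below illustrates that swap. The function names are hypothetical, and the mfspr layout used in the last function (primary opcode 31 in bits 0:5, RT in bits 6:10, the spr field in bits 11:20, extended opcode 339 in bits 21:30) is an assumption taken from the base instruction descriptions rather than from this appendix.

```python
def spr_to_field(spr):
    """Swap the two 5-bit halves of a 10-bit SPR number.

    The result is the value of the 10-bit spr field as it sits in the
    instruction, with the low-order half of the SPR number first.
    """
    if not 0 <= spr <= 0x3FF:
        raise ValueError("SPR number must fit in 10 bits")
    return ((spr & 0x1F) << 5) | (spr >> 5)


def field_to_spr(field):
    """Recover the SPR number from the instruction field (the swap is its own inverse)."""
    return spr_to_field(field)


def encode_mfspr(rt, spr):
    """Assemble an mfspr word under the layout assumed in the lead-in."""
    return (31 << 26) | (rt << 21) | (spr_to_field(spr) << 11) | (339 << 1)
```

For example, SPR 268 (TB) is 01000 01100 in binary, so the field holds 01100 01000, and `encode_mfspr(3, 8)` yields the word for reading the Link Register (SPR 8) into register 3.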
    SPR(1)                    Register      Privileged           Length
    decimal  spr5:9  spr0:4   Name          mtspr      mfspr     (bits)  Cat(2)
    1        00000   00001    XER           no         no        64      B
    8        00000   01000    LR            no         no        64      B
    9        00000   01001    CTR           no         no        64      B
    17       00000   10001    DSCR          yes        yes       64      S
    18       00000   10010    DSISR         yes        yes       32      S
    19       00000   10011    DAR           yes        yes       64      S
    22       00000   10110    DEC           yes        yes       32      B
    25       00000   11001    SDR1          hypv(3)    hypv(3)   64      S
    26       00000   11010    SRR0          yes        yes       64      B
    27       00000   11011    SRR1          yes        yes       64      B
    28       00000   11100    CFAR          yes        yes       64      S
    29       00000   11101    AMR           yes        yes       64      S
    48       00001   10000    PID           yes        yes       32      E
    54       00001   10110    DECAR         yes        yes       32      E
    58       00001   11010    CSRR0         yes        yes       64      E
    59       00001   11011    CSRR1         yes        yes       32      E
    61       00001   11101    DEAR          yes        yes       64      E
    62       00001   11110    ESR           yes        yes       32      E
    63       00001   11111    IVPR          yes        yes       64      E
    136      00100   01000    CTRL          -          no        32      S
    152      00100   11000    CTRL          yes        -         32      S
    256      01000   00000    VRSAVE        no         no        32      E,V
    259      01000   00011    SPRG3         -          no        64      B
    260-263  01000   001xx    SPRG[4-7]     -          no        64      E
    268      01000   01100    TB            -          no        64      B
    269      01000   01101    TBU           -          no        32      B
    272-275  01000   100xx    SPRG[0-3]     yes        yes       64      B
    276-279  01000   101xx    SPRG[4-7]     yes        yes       64      E
    282      01000   11010    EAR           hypv(3)    hypv(3)   32      EC
    284      01000   11100    TBL           hypv(4)    -         32      B
    285      01000   11101    TBU           hypv(4)    -         32      B
    286      01000   11110    TBU40         hypv       -         64      S
    286      01000   11110    PIR           -          yes       32      E
    287      01000   11111    PVR           -          yes       32      B
    304      01001   10000    HSPRG0        hypv(3)    hypv(3)   64      S
    304      01001   10000    DBSR          yes(5)     yes       32      E
    305      01001   10001    HSPRG1        hypv(3)    hypv(3)   64      S
    306      01001   10010    HDSISR        hypv(3)    hypv(3)   32      B
    307      01001   10011    HDAR          hypv(3)    hypv(3)   64      B
    308      01001   10100    DBCR0         yes        yes       32      E
    308      01001   10100    SPURR         hypv(3)    yes       64      S
    309      01001   10101    PURR          hypv(3)    yes       64      S
    309      01001   10101    DBCR1         yes        yes       32      E
    310      01001   10110    HDEC          hypv(3)    hypv(3)   32      S
    310      01001   10110    DBCR2         yes        yes       32      E
    312      01001   11000    RMOR          hypv(3)    hypv(3)   64      S
    312      01001   11000    IAC1          yes        yes       64      E
    313      01001   11001    HRMOR         hypv(3)    hypv(3)   64      S
    313      01001   11001    IAC2          yes        yes       64      E
    314      01001   11010    HSRR0         hypv(3)    hypv(3)   64      S
    314      01001   11010    IAC3          yes        yes       64      E
    315      01001   11011    HSRR1         hypv(3)    hypv(3)   64      S
    315      01001   11011    IAC4          yes        yes       64      E
    316      01001   11100    DAC1          yes        yes       64      E
    317      01001   11101    DAC2          yes        yes       64      E
    318      01001   11110    LPCR          hypv(3)    hypv(3)   64      S
    318      01001   11110    DVC1          yes        yes       64      E
    319      01001   11111    LPIDR         hypv(3)    hypv(3)   32      S
    319      01001   11111    DVC2          yes        yes       64      E
    336      01010   10000    TSR           yes(5)     yes       32      E
    336      01010   10000    HMER          hypv(3,8)  hypv(3)   64      S
    337      01010   10001    HMEER         hypv(3)    hypv(3)   64      S
    338      01010   10010    PCR           hypv(3)    hypv(3)   64      S
    339      01010   10011    HEIR          hypv(3)    hypv(3)   32      HEA
    340      01010   10100    TCR           yes        yes       32      E
    400-415  01100   1xxxx    IVOR[0-15]    yes        yes       32      E
    512      10000   00000    SPEFSCR       no         no        32      SP
    526      10000   01110    ATB/ATBL      -          no        64      ATB
    527      10000   01111    ATBU          -          no        32      ATB
    528      10000   10000    IVOR32        yes        yes       32      SP
    529      10000   10001    IVOR33        yes        yes       32      SP
    530      10000   10010    IVOR34        yes        yes       32      SP
    531      10000   10011    IVOR35        yes        yes       32      E.PM
    532      10000   10100    IVOR36        yes        yes       32      E.PC
    533      10000   10101    IVOR37        yes        yes       32      E.PC
    570      10001   11010    MCSRR0        yes        yes       64      E
    571      10001   11011    MCSRR1        yes        yes       32      E
    572      10001   11100    MCSR          yes        yes       64      E
    574      10001   11110    DSRR0         yes        yes       64      E.ED
    575      10001   11111    DSRR1         yes        yes       32      E.ED
    604      10010   11100    SPRG8         yes        yes       64      E
    605      10010   11101    SPRG9         yes        yes       64      E.ED
    624      10011   10000    MAS0          yes        yes       32      E.MF
    625      10011   10001    MAS1          yes        yes       32      E.MF
    626      10011   10010    MAS2          yes        yes       64      E.MF
    627      10011   10011    MAS3          yes        yes       32      E.MF
    628      10011   10100    MAS4          yes        yes       32      E.MF
    630      10011   10110    MAS6          yes        yes       32      E.MF
    633      10011   11001    PID1          yes        yes       32      E.MF
    634      10011   11010    PID2          yes        yes       32      E.MF
    688-691  10101   100xx    TLB[0-3]CFG   yes        yes       32      E.MF
    702      10101   11110    EPR           -          yes       32      EXP
    768-783  11000   0xxxx    perf_mon      -          no        64      S.PM
    784-799  11000   1xxxx    perf_mon      varies     yes       64      S.PM
    896      11100   00000    PPR           no         no        64      S
    924      11100   11100    DCDBTRL       -(6)       yes       32      E.CD
    925      11100   11101    DCDBTRH       -(6)       yes       32      E.CD
    926      11100   11110    ICDBTRL       -(7)       yes       32      E.CD
    927      11100   11111    ICDBTRH       -(7)       yes       32      E.CD
    944      11101   10000    MAS7          yes        yes       32      E.MF
    947      11101   10011    EPLC          yes        yes       32      E.PD
    948      11101   10100    EPSC          yes        yes       32      E.PD
    979      11110   10011    ICDBDR        -(7)       yes       32      E.CD
    1012     11111   10100    MMUCSR0       yes        yes       32      E.MF
    1013     11111   10101    DABR          hypv(3)    hypv(3)   64      S
    1015     11111   10111    DABRX         hypv(3)    hypv(3)   64      S
    1015     11111   10111    MMUCFG        yes        yes       32      E.MF
    1023     11111   11111    PIR           -          yes       32      S

    -  This register is not defined for this instruction.
    1  Note that the order of the two 5-bit halves of the SPR number is reversed.
    2  See Section 1.3.5 of Book I.
    3  This register is a hypervisor resource, and can be modified by this instruction only in hypervisor state (see Chapter 2 of Book III-S).
    4  This register is a hypervisor resource, and can be modified by this instruction only in hypervisor state (see Chapter 2 of Book III-S). This register is privileged.
    5  This register cannot be directly written to. Instead, bits in the register corresponding to 1 bits in (RS) can be cleared using mtspr SPR,RS.
    6  The register can be written by the dcread instruction.
    7  The register can be written by the icread instruction.
    8  This register cannot be directly written. Instead, bits in the register corresponding to 0 bits in (RS) can be cleared using mtspr SPR,RS.

    All SPR numbers that are not shown above and are not implementation-specific are reserved.

Figure 21. SPR Numbers

Appendix D.
Illegal Instructions With the exception of the instruction consisting entirely of binary 0s, the instructions in this class are available for future extensions of the Power ISA; that is, some future version of the Power ISA may define any of these instructions to perform new functions. The following primary opcodes are illegal. 1, 5, 6, 60 The following primary opcodes have unused extended opcodes. Their unused extended opcodes can be determined from the opcode maps in Appendix F of Book Appendices. All unused extended opcodes are illegal. 4, 19, 30, 31, 56, 57, 58, 59, 62, 63 An instruction consisting entirely of binary 0s is illegal, and is guaranteed to be illegal in all future versions of this architecture. Appendix D. Illegal Instructions 863 Version 2.05 864 Power ISATM Appendices Version 2.05 Appendix E. Reserved Instructions The instructions in this class are allocated to specific purposes that are outside the scope of the Power ISA. The following types of instruction are included in this class. 1. The instruction having primary opcode 0, except the instruction consisting entirely of binary 0s (which is an illegal instruction; see Section 1.7.2, "Illegal Instruction Class" on page 21) and the extended opcode shown below. 256 Service Processor "Attention" 2. Instructions for the POWER Architecture that have not been included in the Power ISA. These are listed in Section A.31, "Discontinued Opcodes" and Section A.33.1, "Discontinued Opcodes". 3. Implementation-specific instructions used to con- form to the Power ISA specification. 4. Any other implementation-dependent instructions that are not defined in the Power ISA. Appendix E. Reserved Instructions 865 Version 2.05 866 Power ISATM Appendices Version 2.05 Appendix F. Opcode Maps This appendix contains tables showing the opcodes reserved because it is "overlaid", by a fixed-point and extended opcodes. 
or Storage Access instruction having only a primary opcode, by an instruction having an extended opcode in primary opcode 30, 58, or 62, or by a potential instruction in any of the categories just mentioned. The overlaying instruction, if any, is also shown. A cell thus reserved should not be assigned to an instruction having primary opcode 31. (The overlaying is a consequence of opcode decoding for fixed-point instructions: the primary opcode, and the extended opcode if any, are mapped internally to a 10-bit "compressed opcode" for ease of subsequent decoding.)

For the primary opcode table (Table 3 on page 868), each cell is in the following format.

  Opcode in Decimal        Opcode in Hexadecimal
  Instruction Mnemonic
  Category                 Instruction Format

The category abbreviations are shown in Section 1.3.5 of Book I. However, the categories "Phased-In", "Phased-Out", and floating-point "Record" are not listed in the opcode tables.

The extended opcode tables show the extended opcode in decimal, the instruction mnemonic, the category, and the instruction format. These tables appear in order of primary opcode within three groups. The first group consists of the primary opcodes that have small extended opcode fields (2-4 bits), namely 30, 58, and 62. The second group consists of primary opcodes that have 11-bit extended opcode fields. The third group consists of primary opcodes that have 10-bit extended opcode fields. The tables for the second and third groups are rotated.

In the extended opcode tables several special markings are used.

- A prime (') following an instruction mnemonic denotes an additional cell, after the lowest-numbered one, used by the instruction. For example, subfc occupies cells 8 and 520 of primary opcode 31, with the former corresponding to OE=0 and the latter to OE=1. Similarly, sradi occupies cells 826 and 827, with the former corresponding to sh5=0 and the latter to sh5=1 (the 9-bit extended opcode 413, shown on page 91, excludes the sh5 bit).
- Two vertical bars (||) are used instead of primed mnemonics when an instruction occupies an entire column of a table. The instruction mnemonic is repeated in the last cell of the column.
- For primary opcode 31, an asterisk (*) in a cell that would otherwise be empty means that the cell is reserved because it is overlaid by one of the kinds of instruction enumerated above.
- Parentheses around the opcode or extended opcode mean that the instruction was defined in earlier versions of the Power ISA but is no longer defined in the Power ISA.
- Curly brackets around the opcode or extended opcode mean that the instruction will be defined in future versions of the Power ISA.
- long is used as filler for mnemonics that are longer than a table cell.

An empty cell, a cell containing only an asterisk, or a cell in which the opcode or extended opcode is parenthesized, corresponds to an illegal instruction.

The instruction consisting entirely of binary 0s causes the system illegal instruction error handler to be invoked for all members of the POWER family, and this is likely to remain true in future models (it is guaranteed in the Power ISA). An instruction having primary opcode 0 but not consisting entirely of binary 0s is reserved except for the following extended opcode (instruction bits 21:30).

  256  Service Processor "Attention" (Power ISA only)

Table 3: Primary opcodes

Dec Hex  Mnemonic  Cat.        Form   Instruction
0   00                                Illegal, Reserved (see primary opcode 0 extensions on page 867)
1   01                                (empty)
2   02   tdi       64          D      Trap Doubleword Immediate
3   03   twi       B           D      Trap Word Immediate
4   04             V, LMA, SP         Vector, LMA, SP (see Table 8 and Table 9)
5   05                                (empty)
6   06                                (empty)
7   07   mulli     B           D      Multiply Low Immediate
8   08   subfic    B           D      Subtract From Immediate Carrying
9   09                                (empty)
10  0A   cmpli     B           D      Compare Logical Immediate
11  0B   cmpi      B           D      Compare Immediate
12  0C   addic     B           D      Add Immediate Carrying
13  0D   addic.    B           D      Add Immediate Carrying and Record
14  0E   addi      B           D      Add Immediate
15  0F   addis     B           D      Add Immediate Shifted
16  10   bc        B           B      Branch Conditional
17  11   sc        B           SC     System Call
18  12   b         B           I      Branch
19  13                         XL     CR ops, etc. (see Table 10 on page 878)
20  14   rlwimi    B           M      Rotate Left Word Imm. then Mask Insert
21  15   rlwinm    B           M      Rotate Left Word Imm. then AND with Mask
22  16                                (empty)
23  17   rlwnm     B           M      Rotate Left Word then AND with Mask
24  18   ori       B           D      OR Immediate
25  19   oris      B           D      OR Immediate Shifted
26  1A   xori      B           D      XOR Immediate
27  1B   xoris     B           D      XOR Immediate Shifted
28  1C   andi.     B           D      AND Immediate
29  1D   andis.    B           D      AND Immediate Shifted
30  1E                         MD[S]  FX Dwd Rot (see Table 4 on page 869)
31  1F                                FX Extended Ops (see Table 11 on page 880)
32  20   lwz       B           D      Load Word and Zero
33  21   lwzu      B           D      Load Word and Zero with Update
34  22   lbz       B           D      Load Byte and Zero
35  23   lbzu      B           D      Load Byte and Zero with Update
36  24   stw       B           D      Store Word
37  25   stwu      B           D      Store Word with Update
38  26   stb       B           D      Store Byte
39  27   stbu      B           D      Store Byte with Update
40  28   lhz       B           D      Load Half and Zero
41  29   lhzu      B           D      Load Half and Zero with Update
42  2A   lha       B           D      Load Half Algebraic
43  2B   lhau      B           D      Load Half Algebraic with Update
44  2C   sth       B           D      Store Half
45  2D   sthu      B           D      Store Half with Update
46  2E   lmw       B           D      Load Multiple Word
47  2F   stmw      B           D      Store Multiple Word
48  30   lfs       FP          D      Load Floating-Point Single
49  31   lfsu      FP          D      Load Floating-Point Single with Update
50  32   lfd       FP          D      Load Floating-Point Double
51  33   lfdu      FP          D      Load Floating-Point Double with Update
52  34   stfs      FP          D      Store Floating-Point Single
53  35   stfsu     FP          D      Store Floating-Point Single with Update
54  36   stfd      FP          D      Store Floating-Point Double
55  37   stfdu     FP          D      Store Floating-Point Double with Update
56  38   lq        LSQ         DQ     Load Quadword
57  39                                (see Table 5 on page 869)
58  3A                         DS     FX DS-form Loads (see Table 6 on page 869)
59  3B                                FP Single & DFP Ops (see Table 16 on page 884)
60  3C                                (empty)
61  3D   stfdp     FP          DS     Store Floating-Point Double Pair
62  3E                         DS     FX DS-form Stores (see Table 7 on page 869)
63  3F                                FP Double & DFP Ops (see Table 17 on page 886)
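As an illustration of how the cell numbers in these tables relate to instruction bits, the following Python sketch extracts the primary opcode and the extended opcode field named in each table caption. It assumes the Power ISA bit numbering in which bit 0 is the most significant bit of the 32-bit instruction. The helper names (bits, table_cell, xo_form_cells, xs_form_cells) are hypothetical and not part of the architecture, and the placement of OE above, and sh5 below, the 9-bit extended opcode is inferred from the cell numbers quoted earlier (8 and 520 for subfc, 826 and 827 for sradi).

```python
# Extended opcode field (first bit, last bit) per primary opcode,
# taken from the captions of Tables 4 through 17.
EXT_FIELD = {
    4: (21, 31),
    19: (21, 30), 31: (21, 30), 59: (21, 30), 63: (21, 30),
    30: (27, 30),
    57: (30, 31), 58: (30, 31), 62: (30, 31),
}


def bits(insn: int, first: int, last: int) -> int:
    """Return instruction bits first:last (inclusive); bit 0 is the MSB of 32."""
    width = last - first + 1
    return (insn >> (31 - last)) & ((1 << width) - 1)


def table_cell(insn: int):
    """Return (primary opcode, extended opcode cell or None)."""
    primary = bits(insn, 0, 5)
    if primary not in EXT_FIELD:
        return primary, None          # instruction has only a primary opcode
    first, last = EXT_FIELD[primary]
    return primary, bits(insn, first, last)


def xo_form_cells(xo9: int):
    """Cells used by an instruction with an OE bit (assumed to sit above the 9-bit opcode)."""
    return xo9, (1 << 9) | xo9        # unprimed cell (OE=0), primed cell (OE=1)


def xs_form_cells(xo9: int):
    """Cells used by an instruction with an sh5 bit (assumed to sit below the 9-bit opcode)."""
    return xo9 << 1, (xo9 << 1) | 1   # unprimed cell (sh5=0), primed cell (sh5=1)
```

For example, a word built with primary opcode 31 and the value 8 in bits 22:30 lands in cell 8 of the primary opcode 31 table when bit 21 is 0 and in cell 520 when bit 21 is 1, matching the subfc and subfc' cells.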
In Tables 4 through 17 below, each entry gives the extended opcode in decimal, the instruction mnemonic, the category, and the instruction format. Rows are labeled with the high-order bits of the extended opcode field; rows that are not listed are empty.

Table 4: Extended opcodes for primary opcode 30 (instruction bits 27:30)
Row 00: 0 rldicl 64 MD; 1 rldicl' 64 MD; 2 rldicr 64 MD; 3 rldicr' 64 MD
Row 01: 4 rldic 64 MD; 5 rldic' 64 MD; 6 rldimi 64 MD; 7 rldimi' 64 MD
Row 10: 8 rldcl 64 MDS; 9 rldcr 64 MDS

Table 5: Extended opcodes for primary opcode 57 (instruction bits 30:31)
Row 0: 0 lfdp FP DS

Table 6: Extended opcodes for primary opcode 58 (instruction bits 30:31)
Row 0: 0 ld 64 DS; 1 ldu 64 DS
Row 1: 2 lwa 64 DS

Table 7: Extended opcodes for primary opcode 62 (instruction bits 30:31)
Row 0: 0 std 64 DS; 1 stdu 64 DS
Row 1: 2 stq LSQ DS

Table 8 (Left): Extended opcodes for primary opcode 4 [Category: V & LMA] (instruction bits 21:31), columns 000000 to 001111
Row 00000: 0 vaddubm V VX; 2 vmaxub V VX; 4 vrlb V VX; 6 vcmpequb V VC; 8 vmuloub V VX; 10 vaddfp V VX; 12 vmrghb V VX; 14 vpkuhum V VX
Row 00001: 64 vadduhm V VX; 66 vmaxuh V VX; 68 vrlh V VX; 70 vcmpequh V VC; 72 vmulouh V VX; 74 vsubfp V VX; 76 vmrghh V VX; 78 vpkuwum V VX
Row 00010: 128 vadduwm V VX; 130 vmaxuw V VX; 132 vrlw V VX; 134 vcmpequw V VC; 140 vmrghw V VX; 142 vpkuhus V VX
Row 00011: 198 vcmpeqfp V VC; 206 vpkuwus V VX
Row 00100: 258 vmaxsb V VX; 260 vslb V VX; 264 vmulosb V VX; 266 vrefp V VX; 268 vmrglb V VX; 270 vpkshus V VX
Row 00101: 322 vmaxsh V VX; 324 vslh V VX; 328 vmulosh V VX; 330 vrsqrtefp V VX; 332 vmrglh V VX; 334 vpkswus V VX
Row 00110: 384 vaddcuw V VX; 386 vmaxsw V VX; 388 vslw V VX; 394 vexptefp V VX; 396 vmrglw V VX; 398 vpkshss V VX
Row 00111: 452 vsl V VX; 454 vcmpgefp V VC; 458 vlogefp V VX; 462 vpkswss V VX
Row 01000: 512 vaddubs V VX; 514 vminub V VX; 516 vsrb V VX; 518 vcmpgtub V VC; 520 vmuleub V VX; 522 vrfin V VX; 524 vspltb V VX; 526 vupkhsb V VX
Row 01001: 576 vadduhs V VX; 578 vminuh V VX; 580 vsrh V VX; 582 vcmpgtuh V VC; 584 vmuleuh V VX; 586 vrfiz V VX; 588 vsplth V VX; 590 vupkhsh V VX
Row 01010: 640 vadduws V VX; 642 vminuw V VX; 644 vsrw V VX; 646 vcmpgtuw V VC; 650 vrfip V VX; 652 vspltw V VX; 654 vupklsb V VX
Row 01011: 708 vsr V VX; 710 vcmpgtfp V VC; 714 vrfim V VX; 718 vupklsh V VX
Row 01100: 768 vaddsbs V VX; 770 vminsb V VX; 772 vsrab V VX; 774 vcmpgtsb V VC; 776 vmulesb V VX; 778 vcuxwfp V VX; 780 vspltisb V VX; 782 vpkpx V VX
Row 01101: 832 vaddshs V VX; 834 vminsh V VX; 836 vsrah V VX; 838 vcmpgtsh V VC; 840 vmulesh V VX; 842 vcsxwfp V VX; 844 vspltish V VX; 846 vupkhpx V VX
Row 01110: 896 vaddsws V VX; 898 vminsw V VX; 900 vsraw V VX; 902 vcmpgtsw V VC; 906 vcfpuxws V VX; 908 vspltisw V VX
Row 01111: 966 vcmpbfp V VC; 970 vcfpsxws V VX; 974 vupklpx V VX
Row 10000: 1024 vsububm V VX; 1026 vavgub V VX; 1028 vand V VX; 1030 vcmpequb. V VC; 1034 vmaxfp V VX; 1036 vslo V VX
Row 10001: 1088 vsubuhm V VX; 1090 vavguh V VX; 1092 vandc V VX; 1094 vcmpequh. V VC; 1098 vminfp V VX; 1100 vsro V VX
Row 10010: 1152 vsubuwm V VX; 1154 vavguw V VX; 1156 vor V VX; 1158 vcmpequw. V VC
Row 10011: 1220 vxor V VX; 1222 vcmpeqfp. V VC
Row 10100: 1282 vavgsb V VX; 1284 vnor V VX
Row 10101: 1346 vavgsh V VX
Row 10110: 1408 vsubcuw V VX; 1410 vavgsw V VX
Row 10111: 1478 vcmpgefp. V VC
Row 11000: 1536 vsububs V VX; 1540 mfvscr V VX; 1542 vcmpgtub. V VC; 1544 vsum4ubs V VX
Row 11001: 1600 vsubuhs V VX; 1604 mtvscr V VX; 1606 vcmpgtuh. V VC; 1608 vsum4shs V VX
Row 11010: 1664 vsubuws V VX; 1670 vcmpgtuw. V VC; 1672 vsum2sws V VX
Row 11011: 1734 vcmpgtfp. V VC
Row 11100: 1792 vsubsbs V VX; 1798 vcmpgtsb. V VC; 1800 vsum4sbs V VX
Row 11101: 1856 vsubshs V VX; 1862 vcmpgtsh. V VC
Row 11110: 1920 vsubsws V VX; 1926 vcmpgtsw. V VC; 1928 vsumsws V VX
Row 11111: 1990 vcmpbfp. V VC

Table 8 (Left-Center): Extended opcodes for primary opcode 4 [Category: V & LMA] (instruction bits 21:31), columns 010000 to 011111
Row 00000: 16 mulhhwu LMA XO; 17 mulhhwu. LMA XO; 24 machhwu LMA XO; 25 long LMA XO
Row 00001: 80 mulhhw LMA XO; 81 mulhhw. LMA XO; 88 machhw LMA XO; 89 machhw. LMA XO; 92 nmachhw LMA XO; 93 long LMA XO
Row 00010: long LMA XO at 152, 153
Row 00011: 216 machhws LMA XO; long LMA XO at 217, 220, 221
Row 00100: 272 mulchwu LMA X; 273 mulchwu. LMA X; 280 macchwu LMA XO; 281 long LMA XO
Row 00101: 336 mulchw LMA X; 337 mulchw. LMA X; 344 macchw LMA XO; 345 macchw. LMA XO; 348 nmacchw LMA XO; 349 long LMA XO
Row 00110: long LMA XO at 408, 409
Row 00111: 472 macchws LMA XO; long LMA XO at 473, 476, 477
Row 01100: 784 mullhwu LMA X; 785 mullhwu. LMA X; 792 maclhwu LMA XO; 793 maclhwu. LMA XO
Row 01101: 848 mullhw LMA X; 849 mullhw. LMA X; 856 maclhw LMA XO; 857 maclhw. LMA XO; 860 nmaclhw LMA XO; 861 nmaclhw. LMA XO
Row 01110: long LMA XO at 920, 921
Row 01111: 984 maclhws LMA XO; 985 maclhws. LMA XO; long LMA XO at 988, 989
Row 10000: long LMA XO at 1040, 1041, 1048, 1049
Row 10001: 1104 mullhwo. LMA XO; 1105 mullhwo. LMA XO; 1112 machhwo LMA XO; long LMA XO at 1113, 1116, 1117
Row 10010: long LMA XO at 1176, 1177
Row 10011: long LMA XO at 1240, 1241, 1244, 1245
Row 10100: long LMA XO at 1304, 1305
Row 10101: 1368 macchwo LMA XO; long LMA XO at 1369, 1372, 1373
Row 10110: long LMA XO at 1432, 1433
Row 10111: long LMA XO at 1496, 1497, 1500, 1501
Row 11100: long LMA XO at 1816, 1817
Row 11101: 1880 maclhwo LMA XO; 1881 maclhwo. LMA XO; long LMA XO at 1884, 1885
Row 11110: long LMA XO at 1944, 1945
Row 11111: long LMA XO at 2008, 2009, 2012, 2013

Table 8 (Right-Center): Extended opcodes for primary opcode 4 [Category: V & LMA] (instruction bits 21:31), columns 100000 to 101111
Each instruction below occupies its entire column (rows 00000 through 11111, marked || in the original layout); the number given is the cell in row 00000.
32 vmhaddshs V VA; 33 vmhraddshs V VA; 34 vmladduhm V VA; 36 vmsumubm V VA; 37 vmsummbm V VA; 38 vmsumuhm V VA; 39 vmsumuhs V VA; 40 vmsumshm V VA; 41 vmsumshs V VA; 42 vsel V VA; 43 vperm V VA; 44 vsldoi V VA; 46 vmaddfp V VA; 47 vnmsubfp V VA

Table 8 (Right): Extended opcodes for primary opcode 4 [Category: V & LMA] (instruction bits 21:31), columns 110000 to 111111
All cells are empty.

Table 9 (Left): Extended opcodes for primary opcode 4 [Category: SP.*] (instruction bits 21:31), columns 000000 to 001111
Row 01000: 512 evaddw SP EVX; 514 evaddiw SP EVX; 516 evsubfw SP EVX; 518 evsubifw SP EVX; 520 evabs SP EVX; 521 evneg SP EVX; 522 evextsb SP EVX; 523 evextsh SP EVX; 524 evrndw SP EVX; 525 evcntlzw SP EVX; 526 evcntlsw SP EVX; 527 brinc SP EVX
Row 01010: 640 evfsadd sp.fv EVX; 641 evfssub sp.fv EVX; 644 evfsabs sp.fv EVX; 645 evfsnabs sp.fv EVX; 646 evfsneg sp.fv EVX; 648 evfsmul sp.fv EVX; 649 evfsdiv sp.fv EVX; 652 long sp.fv EVX; 653 evfscmplt sp.fv EVX; 654 long sp.fv EVX
Row 01011: 704 efsadd sp.fs EVX; 705 efssub sp.fs EVX; 708 efsabs sp.fs EVX; 709 efsnabs sp.fs EVX; 710 efsneg sp.fs EVX; 712 efsmul sp.fs EVX; 713 efsdiv sp.fs EVX; 716 efscmpgt sp.fs EVX; 717 efscmplt sp.fs EVX; 718 efscmpeq sp.fs EVX; 719 efscfd sp.fd EVX
Row 01100: 768 evlddx SP EVX; 769 evldd SP EVX; 770 evldwx SP EVX; 771 evldw SP EVX; 772 evldhx SP EVX; 773 evldh SP EVX; long SP EVX at 776, 777, 780, 781, 782, 783
Row 10000: 1027 evmhessf SP EVX; 1031 evmhossf SP EVX; long SP EVX at 1032, 1033, 1035, 1036, 1037, 1039
Row 10001: long SP EVX at 1095, 1096, 1100, 1101, 1103
Row 10011: long SP EVX at 1216, 1217, 1218, 1219; 1220 evmra SP EVX; 1222 evdivws SP EVX; 1223 evdivwu SP EVX; long SP EVX at 1224, 1225, 1226, 1227
Row 10100: long SP EVX at 1280, 1281, 1283, 1285, 1287, 1288, 1289, 1291, 1292, 1293, 1295
Row 10101: long SP EVX at 1344, 1345, 1352, 1353
Row 10110: long SP EVX at 1408, 1409, 1411, 1412, 1413, 1415, 1416, 1417, 1419, 1420, 1421, 1423
Row 10111: long SP EVX at 1472, 1473, 1480, 1481

Table 9 (Left-Center): Extended opcodes for primary opcode 4 [Category: SP.*] (instruction bits 21:31), columns 010000 to 011111
Row 01000: 529 evand SP EVX; 530 evandc SP EVX; 534 evxor SP EVX; 535 evor SP EVX; 536 evnor SP EVX; 537 eveqv SP EVX; 539 evorc SP EVX; 542 evnand SP EVX
Row 01010: 656 evfscfui sp.fv EVX; 657 evfscfsi sp.fv EVX; 658 evfscfuf sp.fv EVX; 659 evfscfsf sp.fv EVX; 660 evfsctui sp.fv EVX; 661 evfsctsi sp.fv EVX; 662 evfsctuf sp.fv EVX; 663 evfsctsf sp.fv EVX; 664 evfsctuiz sp.fv EVX; 666 evfsctsiz sp.fv EVX; 668 evfststgt sp.fv EVX; 669 evfststlt sp.fv EVX; 670 evfststeq sp.fv EVX
Row 01011: 720 efscfui sp.fs EVX; 721 efscfsi sp.fs EVX; 722 efscfuf sp.fs EVX; 723 efscfsf sp.fs EVX; 724 efsctui sp.fs EVX; 725 efsctsi sp.fs EVX; 726 efsctuf sp.fs EVX; 727 efsctsf sp.fs EVX; 728 efsctuiz sp.fs EVX; 730 efsctsiz sp.fs EVX; 732 efststgt sp.fs EVX; 733 efststlt sp.fs EVX; 734 efststeq sp.fs EVX
Row 01100: 784 evlwhex SP EVX; 785 evlwhe SP EVX; 788 evlwhoux SP EVX; 789 evlwhou SP EVX; 790 evlwhosx SP EVX; 791 evlwhos SP EVX; long SP EVX at 792, 793, 796, 797
Row 10001: long SP EVX at 1112, 1113, 1115
Row 10101: long SP EVX at 1363, 1368, 1369, 1371
Row 10111: long SP EVX at 1491, 1496, 1497, 1499

Table 9 (Right-Center): Extended opcodes for primary opcode 4 [Category: SP.*] (instruction bits 21:31), columns 100000 to 101111
Row 01000: 544 evsrwu SP EVX; 545 evsrws SP EVX; 546 evsrwiu SP EVX; 547 evsrwis SP EVX; 548 evslw SP EVX; 550 evslwi SP EVX; 552 evrlw SP EVX; 553 evsplati SP EVX; 554 evrlwi SP EVX; 555 evsplatfi SP EVX; long SP EVX at 556, 557, 558, 559
Row 01011: 736 efdadd sp.fd EVX; 737 efdsub sp.fd EVX; 738 efdcfuid sp.fd EVX; 739 efdcfsid sp.fd EVX; 740 efdabs sp.fd EVX; 741 efdnabs sp.fd EVX; 742 efdneg sp.fd EVX; 744 efdmul sp.fd EVX; 745 efddiv sp.fd EVX; 746 efdctuidz sp.fd EVX; 747 efdctsidz sp.fd EVX; 748 efdcmpgt sp.fd EVX; 749 efdcmplt sp.fd EVX; 750 efdcmpeq sp.fd EVX; 751 efdcfs sp.fd EVX
Row 01100: 800 evstddx SP EVX; 801 evstdd SP EVX; 802 evstdwx SP EVX; 803 evstdw SP EVX; 804 evstdhx SP EVX; 805 evstdh SP EVX
Row 10000: long SP EVX at 1059, 1063, 1064, 1065, 1067, 1068, 1069, 1071
Row 10001: long SP EVX at 1127, 1128, 1132, 1133, 1135
Row 10100: long SP EVX at 1320, 1321, 1323, 1324, 1325, 1327
Row 10110: long SP EVX at 1448, 1449, 1451, 1452, 1453, 1455

Table 9 (Right): Extended opcodes for primary opcode 4 [Category: SP.*] (instruction bits 21:31), columns 110000 to 111111
Row 01000: 560 evcmpgtu SP EVX; 561 evcmpgts SP EVX; 562 evcmpltu SP EVX; 563 evcmplts SP EVX; 564 evcmpeq SP EVX
Row 01001: 632 evsel SP EVS; evsel' SP EVS at 633, 634, 635, 636, 637, 638, 639
Row 01011: 752 efdcfui sp.fd EVX; 753 efdcfsi sp.fd EVX; 754 efdcfuf sp.fd EVX; 755 efdcfsf sp.fd EVX; 756 efdctui sp.fd EVX; 757 efdctsi sp.fd EVX; 758 efdctuf sp.fd EVX; 759 efdctsf sp.fd EVX; 760 efdctuiz sp.fd EVX; 762 efdctsiz sp.fd EVX; 764 efdtstgt sp.fd EVX; 765 efdtstlt sp.fd EVX; 766 efdtsteq sp.fd EVX
Row 01100: 816 evstwhex SP EVX; 817 evstwhe SP EVX; 820 evstwhox SP EVX; 821 evstwho SP EVX; 824 evstwwex SP EVX; 825 evstwwe SP EVX; 828 evstwwox SP EVX; 829 evstwwo SP EVX
Row 10001: long SP EVX at 1139, 1145, 1147

Table 10 (Left): Extended opcodes for primary opcode 19 (instruction bits 21:30), columns 00000 to 01111
Row 00000: 0 mcrf B XL
Row 00001: 33 crnor B XL; 38 rfmci E XL; 39 rfdi E.ED X
Row 00100: 129 crandc B XL
Row 00110: 193 crxor B XL; 198 dnh E.ED XFX
Row 00111: 225 crnand B XL
Row 01000: 257 crand B XL
Row 01001: 289 creqv B XL
Row 01101: 417 crorc B XL
Row 01110: 449 cror B XL

Table 10 (Right): Extended opcodes for primary opcode 19 (instruction bits 21:30), columns 10000 to 11111
Row 00000: 16 bclr B XL; 18 rfid S XL
Row 00001: 50 rfi E XL; 51 rfci E XL
Row 00010: (82) rfsvc XL
Row 00100: 150 isync B XL
Row 01000: 274 hrfid S XL
Row 01100: 402 doze B XL
Row 01101: 434 nap B XL
Row 01110: 466 sleep B XL
Row 01111: 498 rvwinkle B XL
Row 10000: 528 bcctr B XL

Table 11 (Left): Extended opcodes for primary opcode 31 (instruction bits 21:30), columns 00000 to 01111
Column 01111 is occupied in its entirety by isel (marked || in the original layout); see Table 15.
Row 00000: 0 cmp B X; 4 tw B X; 6 lvsl V X; 7 lvebx V X; 8 subfc B XO; 9 mulhdu 64 XO; 10 addc B XO; 11 mulhwu B XO; 14 Res'd VLE; 15 See Table 15
Row 00001: 32 cmpl B X; 33 Res'd VLE; 38 lvsr V X; 39 lvehx V X; 40 subf B XO; 46 Res'd VLE
Row 00010: 68 td 64 X; 71 lvewx V X; 73 mulhd 64 XO; 74 addg6s BCDA XO; 75 mulhw B XO; 78 dlmzb LMA X
Row 00011: 103 lvx V X; 104 neg B XO
Row 00100: 129 Res'd VLE; 131 wrtee E X; 134 dcbtstls ECL X; 135 stvebx V X; 136 subfe B XO; 138 adde B XO
Row 00101: 163 wrteei E X; 166 dcbtls ECL X; 167 stvehx V X
Row 00110: 193 Res'd VLE; 199 stvewx V X; 200 subfze B XO; 202 addze B XO; 206 msgsnd E.PC X
Row 00111: 225 Res'd VLE; 230 icblc ECL X; 231 stvx V X; 232 subfme B XO; 233 mulld 64 XO; 234 addme B XO; 235 mullw B XO; 238 msgclr E.PC X
Row 01000: 257 Res'd VLE; 259 mfdcrx E X; 262 Res'd AP; 263 lvepxl E.PD X; 266 add B XO
Row 01001: 289 Res'd VLE; 291 mfdcrux E X; 295 lvepx E.PD X
Row 01010: 323 mfdcr E X; 326 dcread E.CD X; 334 mfpmr E.PM X
Row 01011: {359} lvxl V X
Row 01100: 387 mtdcrx E X; 390 dcblc ECL X
Row 01101: 417 Res'd VLE; 419 mtdcrux E X
Row 01110: 449 Res'd VLE; 451 mtdcr E X; 454 dci E.CI X; 457 divdu 64 XO; 459 divwu B XO; 462 mtpmr E.PM X
Row 01111: 486 Res'd AP; {487} stvxl V X; 489 divd 64 XO; 491 divw B XO
Row 10000: 512 mcrxr E X; {519} lvlx V X; 520 subfc' B XO; 521 mulhdu' 64 XO; 522 addc' B XO; 523 mulhwu' B XO
Row 10001: {551} lvrx V X; 552 subf' B XO
Row 10010: 585 mulhd' 64 XO; 587 mulhw' B XO
Row 10011: 616 neg' B XO
Row 10100: {647} stvlx V X; 648 subfe' B XO; 650 adde' B XO
Row 10101: {679} stvrx V X
Row 10110: 712 subfze' B XO; 714 addze' B XO
Row 10111: 744 subfme' B XO; 745 mulld' 64 XO; 746 addme' B XO; 747 mullw' B XO
Row 11000: 775 stvepxl E.PD X; 778 add' B XO
Row 11001: 807 stvepx E.PD X
Row 11100: 903 stvlxl V X
Row 11101: 935 stvrxl V X
Row 11110: 966 ici E.CI X; 969 divdu' 64 XO; 971 divwu' B XO
Row 11111: 998 icread E.CD X; 1001 divd' 64 XO; 1003 divw' B XO

Table 11 (Right): Extended opcodes for primary opcode 31 (instruction bits 21:30), columns 10000 to 11111
Row 00000: 16 Res'd VLE; 19 mfcr B XFX; 20 lwarx B X; 21 ldx 64 X; 22 icbt E X; 23 lwzx B X; 24 slw B X; 26 cntlzw B X; 27 sld 64 X; 28 and B X; 29 ldepx E.PD X; 30 rldicl* 64 MD; 31 lwepx E.PD X
Row 00001: 53 ldux 64 X; 54 dcbst B X; 55 lwzux B X; 56 Res'd VLE; 58 cntlzd 64 X; 60 andc B X; 62 See Table 12
Row 00010: (82) mtsrd X; 83 mfmsr B X; 84 ldarx 64 X; 86 dcbf B X; 87 lbzx B X; 94 rldicr* 64 MD; 95 lbepx E.PD X
Row 00011: (114) mtsrdin X; 118 clf X; 119 lbzux B X; 122 popcntb B X; 124 nor B X; 126 rldicr* 64 MD; 127 dcbfep E.PD X
Row 00100: 144 mtcrf B XFX; 146 mtmsr B X; 149 stdx 64 X; 150 stwcx. B X; 151 stwx B X; 154 prtyw B X; 157 stdepx E.PD X; 158 rldic* 64 MD; 159 See Table 14
Row 00101: 178 mtmsrd S X; 181 stdux 64 X; 183 stwux B X; 186 prtyd 64 X; 190 rldic* 64 MD; 191 rlwinm* B M
Row 00110: 210 mtsr S X; 214 stdcx. 64 X; 215 stbx B X; 222 rldimi* 64 MD; 223 stbepx E.PD X
Row 00111: 242 mtsrin S X; 246 dcbtst B X; 247 stbux B X; 254 rldimi* 64 MD; 255 See Table 14
Row 01000: 274 tlbiel S X; 275 mfapidi E X; 278 dcbt B X; 279 lhzx B X; 280 Res'd VLE; 282 cdtbcd BCDA X; 284 eqv B X; 285 evlddepx E.PD evx; 286 rldcl* 64 MDS; 287 See Table 14
Row 01001: 306 tlbie S X; 308 Res'd; 310 eciwx EC X; 311 lhzux B X; 312 Res'd VLE; 314 cbcdtd BCDA X; 316 xor B X; 318 rldcr* 64 MDS; 319 See Table 14
Row 01010: 339 mfspr B XFX; 341 lwax 64 X; 342 Res'd AP; 343 lhax B X; 350 *; 351 xori* B D
Row 01011: 370 tlbia S X; 371 mftb S XFX; 373 lwaux 64 X; 374 Res'd AP; 375 lhaux B X; 382 *; 383 xoris* B D
Row 01100: 402 slbmte S X; 407 sthx B X; 412 orc B X; 413 evstddepx E.PD evx; 414 *; 415 See Table 14
Row 01101: 434 slbie S X; 438 ecowx EC X; 439 sthux B X; 444 or B X; 446 *; 447 andis.* B D
Row 01110: 467 mtspr B XFX; 469 *; 470 dcbi E X; 471 lmw* All D; 476 nand B X; 478 *
Row 01111: 498 slbia S X; 501 *; 503 stmw* All D; 508 cmpb B X; 510 *
Row 10000: (530) no-op; 532 Res'd; 533 lswx B MA; 534 lwbrx B X; 535 lfsx FP X; 536 srw B X; 539 srd 64 X
Row 10001: (562) no-op; 566 tlbsync S X; 567 lfsux FP X; 568 Res'd VLE
Row 10010: (594) no-op; 595 mfsr S X; 597 lswi B MA; 598 sync B X; 599 lfdx FP X; 607 lfdepx E.PD X
Row 10011: (626) no-op; 631 lfdux FP X
Row 10100: (658) no-op; 659 mfsrin S X; 660 Res'd; 661 stswx B MA; 662 stwbrx B X; 663 stfsx FP X
Row 10101: (690) no-op; 695 stfsux FP X
Row 10110: (722) no-op; 725 stswi B MA; 727 stfdx FP X; 735 stfdepx E.PD X
Row 10111: (754) no-op; 758 dcba E X; 759 stfdux FP X
Row 11000: 786 tlbivax E X; 789 lwzcix S X; 790 lhbrx B X; 791 lfdpx FP X; 792 sraw B X; 794 srad 64 X
Row 11001: 818 rac X; 821 lhzcix S X; 822 Res'd; 823 Res'd; 824 srawi B X; 826 sradi 64 XS; 827 sradi' 64 XS
Row 11010: 851 slbmfev S X; 853 lbzcix S X; 854 See Table 13; 855 lfiwax S.PI X
Row 11011: 885 ldcix S X
Row 11100: 914 tlbsx E X; 915 slbmfee S X; 917 stwcix S X; 918 sthbrx B X; 919 stfdpx FP X; 922 extsh B X
Row 11101: 946 tlbre E X; 949 sthcix S X; 951 Res'd AP; 954 extsb B X
Row 11110: 978 tlbwe E X; 979 slbfee S X; 981 stbcix S X; 982 icbi B X; 983 stfiwx FP X; 986 extsw 64 X; 991 icbiep E.PD X
Row 11111: 1010 Res'd; 1013 stdcix S X; 1014 dcbz B X; 1023 dcbzep E.PD X

Table 12: Opcode: 31, Extended Opcode: 62
Row 00001: 62 rldicl* 64 MD; 62 wait WT X

Table 13: Opcode: 31, Extended Opcode: 854
Row 11010: 854 eieio S X; 854 mbar E X

Table 14: Opcode: 31, Extended Opcode: 159 (column 11111)
Row 00100: 159 rlwimi* B M; 159 stwepx E.PD X
Row 00101: 191 rlwinm* B M
Row 00110: 223 stbepx E.PD X
Row 00111: 255 rlwnm* B M; 255 dcbstep E.PD X
Row 01000: 287 ori* B D; 287 lhepx E.PD X
Row 01001: 319 oris* B D; 319 dcbtep E.PD X
Row 01010: 351 xori* B D
Row 01011: 383 xoris* B D
Row 01100: 415 andi.* B D; 415 sthepx E.PD X

Table 15: Opcode: 31, Extended Opcode: 15 (column 01111)
isel (15, B.in A) occupies the entire column (rows 00000 through 11111, marked || in the original layout). The cells below additionally show an asterisk or an overlaying instruction.
Row 00001: 47 *
Row 00010: 79 tdi* 64 D
Row 00011: 111 twi* B D
Row 00100: 143 *
Row 00101: 175 *
Row 00110: 207 *
Row 00111: 239 mulli* B D
Row 01000: 271 subfic* B D
Row 01010: 335 cmpli* B D
Row 01011: 367 cmpi* B D
Row 01100: 399 addic* B D
Row 01101: 431 addic.* B D
Row 01110: 463 addi* B D
Row 01111: 495 addis* B D

Table 16 (Left): Extended opcodes for primary opcode 59 (instruction bits 21:30), columns 00000 to 01111
Row 00000: 2 dadd DFP X; 3 dqua DFP Z
Row 00001: 34 dmul DFP X; 35 drrnd DFP Z
Row 00010: 66 dscli DFP Z22; 67 dquai DFP Z
Row 00011: 98 dscri DFP Z; 99 drintx DFP Z23
Row 00100: 130 dcmpo DFP X
Row 00101: 162 dtstex DFP X
Row 00110: 194 dtstdc DFP Z22
Row 00111: 226 dtstdg DFP Z22; 227 drintn DFP Z23
Row 01000: 258 dctdps DFP X
Row 01001: 290 dctfix DFP X
Row 01010: 322 ddedpd DFP X
Row 01011: 354 dxex DFP X
Row 10000: 514 dsub DFP X
Row 10001: 546 ddiv DFP X
Row 10100: 642 dcmpu DFP X
Row 10101: 674 dtstsf DFP X
Row 11000: 770 drsp DFP X
Row 11010: 834 denbcd DFP X
Row 11011: 866 diex DFP X

Table 16 (Right): Extended opcodes for primary opcode 59 (instruction bits 21:30), columns 10000 to 11111
Each instruction below occupies its entire column (rows 00000 through 11111, marked || in the original layout); the number given is the cell in row 00000.
18 fdivs FP A; 20 fsubs FP A; 21 fadds FP A; 22 fsqrts FP A; 24 fres FP A; 25 fmuls FP A; 26 frsqrtes FP A; 28 fmsubs FP A; 29 fmadds FP A; 30 fnmsubs FP A; 31 fnmadds FP A
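The whole-column entries of Table 16 can be read as saying that, for those instructions, only the low-order five bits of the 10-bit field select the instruction, whatever the row bits are. The Python sketch below illustrates that reading for a handful of primary opcode 59 entries. The dictionaries hold only a small subset of the table, and the names COLUMN_59, CELL_59, and lookup_59 are hypothetical. That the row bits are free because they carry an operand in the A instruction format is an assumption drawn from the format column, not something the table itself states.

```python
# Subset of Table 16 (primary opcode 59); not a complete opcode map.
COLUMN_59 = {            # A-form entries that occupy an entire column
    18: "fdivs",
    20: "fsubs",
    21: "fadds",
    25: "fmuls",
    29: "fmadds",
}
CELL_59 = {              # entries that occupy a single cell
    2: "dadd",
    34: "dmul",
    130: "dcmpo",
    514: "dsub",
    546: "ddiv",
    642: "dcmpu",
}


def lookup_59(ext10: int):
    """Map the 10-bit field (instruction bits 21:30) to a mnemonic, or None."""
    column = ext10 & 0x1F        # low-order 5 bits: the table column
    if column in COLUMN_59:      # whole-column entry: row bits are ignored
        return COLUMN_59[column]
    return CELL_59.get(ext10)    # single-cell entry: all 10 bits must match
```

With this subset, cells 21, 53, and 1013 all resolve to fadds, since they share column 10101, while cell 2 resolves to dadd only when all ten bits match.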
Table 17. Extended opcodes for primary opcode 63 (instruction bits 21:30)

(Left) Columns 00000 through 01111. Row labels are instruction bits 21:25 and column labels are instruction bits 26:30. Each entry gives the extended opcode, mnemonic, category, and form.

00000: 0 fcmpu FP X; 2 daddq DFP X; 3 dquaq DFP Z; 8 fcpsgn FP X; 12 frsp FP X; 14 fctiw FP X; 15 fctiwz FP X
00001: 32 fcmpo FP X; 34 dmulq DFP X; 35 drrndq DFP Z23; 38 mtfsb1 FP X; 40 fneg FP X
00010: 64 mcrfs FP X; 66 dscliq DFP Z22; 67 dquaiq DFP Z; 70 mtfsb0 FP X; 72 fmr FP X
00011: 98 dscriq DFP Z; 99 drintxq DFP Z23
00100: 130 dcmpoq DFP X; 134 mtfsfi FP X; 136 fnabs FP X
00101: 162 dtstexq DFP X
00110: 194 dtstdcq DFP Z22
00111: 226 dtstdgq DFP Z22; 227 drintnq DFP Z23
01000: 258 dctqpq DFP X; 264 fabs FP X
01001: 290 dctfixq DFP X
01010: 322 ddedpdq DFP X
01011: 354 dxexq DFP X
01100: 392 frin FP X
01101: 424 friz FP X
01110: 456 frip FP X
01111: 488 frim FP X
10000: 514 dsubq DFP X
10001: 546 ddivq DFP X
10010: 583 mffs FP X
10011: (no entries)
10100: 642 dcmpuq DFP X
10101: 674 dtstsfq DFP X
10110: 711 mtfsf FP XFL
10111: (no entries)
11000: 770 drdpq DFP X
11001: 802 dcffixq DFP X; 814 fctid FP X; 815 fctidz FP X
11010: 834 denbcdq DFP X; 846 fcfid FP X
11011: 866 diexq DFP X
11100 through 11111: (no entries)

(Right) Columns 10000 through 11111.

00000: 18 fdiv FP A; 20 fsub FP A; 21 fadd FP A; 22 fsqrt FP A; 23 fsel FP A; 24 fre FP A; 25 fmul FP A; 26 frsqrte FP A; 28 fmsub FP A; 29 fmadd FP A; 30 fnmsub FP A; 31 fnmadd FP A
00001 through 11111: the same twelve instructions repeat in their columns in every row (marked || in the original table).

Appendix G.
Power ISA Instruction Set Sorted by Mnemonic

This appendix lists all the instructions in the Power ISA, in order by mnemonic.

Each entry gives, in order: Form, primary opcode (Pri), extended opcode (Ext, where present), Mode Dep.1 and Priv1 (where present), Page, Cat1, Mnemonic, Instruction.

XO 31 266 SR 63 B add[o][.] Add
XO 31 10 SR 64 B addc[o][.] Add Carrying
XO 31 138 SR 65 B adde[o][.] Add Extended
XO 31 74 SR H 495 BCDA addg6s Add and Generate Sixes
D 14 62 B addi Add Immediate
D 12 SR 63 B addic Add Immediate Carrying
D 13 SR 63 B addic. Add Immediate Carrying and Record
D 15 62 B addis Add Immediate Shifted
XO 31 234 SR 65 B addme[o][.] Add to Minus One Extended
XO 31 202 SR 66 B addze[o][.] Add to Zero Extended
X 31 28 SR 77 B and[.] AND
X 31 60 SR 78 B andc[.] AND with Complement
D 28 SR 75 B andi. AND Immediate
D 29 SR 75 B andis. AND Immediate Shifted
I 18 35 B b[l][a] Branch
B 16 CT 35 B bc[l][a] Branch Conditional
XL 19 528 CT 36 B bcctr[l] Branch Conditional to Count Register
XL 19 16 CT 36 B bclr[l] Branch Conditional to Link Register
EVX 4 527 268 SP brinc Bit Reversed Increment
X 31 314 H 494 BCDA cbcdtd Convert Binary Coded Decimal to Declets
X 31 282 H 494 BCDA cdtbcd Convert Declets To Binary Coded Decimal
X 31 0 71 B cmp Compare
X 31 508 79 B cmpb Compare Bytes
D 11 71 B cmpi Compare Immediate
X 31 32 72 B cmpl Compare Logical
D 10 72 B cmpli Compare Logical Immediate
X 31 58 SR 81 64 cntlzd[.] Count Leading Zeros Doubleword
X 31 26 SR 79 B cntlzw[.] Count Leading Zeros Word
XL 19 257 37 B crand Condition Register AND
XL 19 129 38 B crandc Condition Register AND with Complement
XL 19 289 38 B creqv Condition Register Equivalent
XL 19 225 37 B crnand Condition Register NAND
XL 19 33 38 B crnor Condition Register NOR
XL 19 449 37 B cror Condition Register OR
XL 19 417 38 B crorc Condition Register OR with Complement
XL 19 193 37 B crxor Condition Register XOR
X 59 2 163 DFP dadd DFP Add
X 63 2 163 DFP daddq DFP Add Quad
X 31 758 433 E dcba Data Cache Block Allocate
X 31 86 437 B dcbf Data Cache Block Flush
X 31 127 P 632 E.PD dcbfep Data Cache Block Flush by External PID
X 31 470 P 652 E dcbi Data Cache Block Invalidate
X 31 390 M 656 ECL dcblc Data Cache Block Lock Clear
X 31 54 436 B dcbst Data Cache Block Store
X 31 63 P 631 E.PD dcbstep Data Cache Block Store by External PID
X 31 278 434 B dcbt Data Cache Block Touch
X 31 319 P 631 E.PD dcbtep Data Cache Block Touch by External PID
X 31 166 M 655 ECL dcbtls Data Cache Block Touch and Lock Set
X 31 246 435 B dcbtst Data Cache Block Touch for Store
X 31 255 P 633 E.PD dcbtstep Data Cache Block Touch for Store by External PID
X 31 134 M 655 ECL dcbtstls Data Cache Block Touch for Store and Lock Set
X 31 1014 436 B dcbz Data Cache Block set to Zero
X 31 1023 P 634 E.PD dcbzep Data Cache Block set to Zero by External PID
X 63 802 185 DFP dcffixq DFP Convert From Fixed Quad
X 31 454 P 727 E.CI dci Data Cache Invalidate
X 59 130 169 DFP dcmpo DFP Compare Ordered
X 63 130 169 DFP dcmpoq DFP Compare Ordered Quad
X 59 642 168 DFP dcmpu DFP Compare Unordered
X 63 642 169 DFP dcmpuq DFP Compare Unordered Quad
X 31 326 P 730 E.CD dcread Data Cache Read [Alternative Encoding]
X 31 486 P 730 E.CD dcread Data Cache Read
X 59 258 183 DFP dctdp DFP Convert To DFP Long
X 59 290 185 DFP dctfix DFP Convert To Fixed
X 63 290 185 DFP dctfixq DFP Convert To Fixed Quad
X 63 258 183 DFP dctqpq DFP Convert To DFP Extended
X 59 322 187 DFP ddedpd DFP Decode DPD To BCD
X 63 322 187 DFP ddedpdq DFP Decode DPD To BCD Quad
X 59 546 166 DFP ddiv DFP Divide
X 63 546 166 DFP ddivq DFP Divide Quad
X 59 834 187 DFP denbcd DFP Encode BCD To DPD
X 63 834 187 DFP denbcdq DFP Encode BCD To DPD Quad
X 59 866 188 DFP diex DFP Insert Biased Exponent
X 63 866 188 DFP diexq DFP Insert Biased Exponent Quad
XO 31 489 SR 70 64 divd[o][.] Divide Doubleword
XO 31 457 SR 70 64 divdu[o][.] Divide Doubleword Unsigned
XO 31 491 SR 68 B divw[o][.] Divide Word
XO 31 459 SR 68 B divwu[o][.] Divide Word Unsigned
X 31 78 349 LMV dlmzb[.] Determine Leftmost Zero Byte
X 59 34 165 DFP dmul DFP Multiply
X 63 34 165 DFP dmulq DFP Multiply Quad
XFX 19 198 718 E.ED dnh Debugger Notify Halt
XL 19 402 H 482 S doze Doze
Z 59 3 174 DFP dqua DFP Quantize
Z23 59 67 173 DFP dquai[.] DFP Quantize Immediate
Z23 63 67 173 DFP dquaiq[.] DFP Quantize Immediate Quad
Z23 63 3 174 DFP dquaq[.] DFP Quantize Quad
X 63 770 184 DFP drdpq DFP Round To DFP Long
Z23 59 227 181 DFP drintn[.] DFP Round To FP Integer Without Inexact
Z23 63 227 181 DFP drintnq[.] DFP Round To FP Integer Without Inexact Quad
Z23 59 99 179 DFP drintx[.] DFP Round To FP Integer With Inexact
Z23 63 99 179 DFP drintxq[.] DFP Round To FP Integer With Inexact Quad
Z 59 35 176 DFP drrnd DFP Reround
Z23 63 35 176 DFP drrndq[.] DFP Reround Quad
X 59 770 184 DFP drsp DFP Round To DFP Short
Z23 59 66 190 DFP dscli[.] DFP Shift Significand Left Immediate
Z23 63 66 190 DFP dscliq[.] DFP Shift Significand Left Immediate Quad
Z 59 98 190 DFP dscri DFP Shift Significand Right Immediate
Z 63 98 190 DFP dscriq DFP Shift Significand Right Immediate Quad
X 59 514 163 DFP dsub DFP Subtract
X 63 514 163 DFP dsubq DFP Subtract Quad
Z23 59 194 170 DFP dtstdc DFP Test Data Class
Z23 63 194 170 DFP dtstdcq DFP Test Data Class Quad
Z23 59 226 170 DFP dtstdg DFP Test Data Group
Z23 63 226 170 DFP dtstdgq DFP Test Data Group Quad
X 59 162 171 DFP dtstex DFP Test Exponent
X 63 162 171 DFP dtstexq DFP Test Exponent Quad
X 59 674 172 DFP dtstsf DFP Test Significance
X 63 674 172 DFP dtstsfq DFP Test Significance Quad
X 59 354 188 DFP dxex DFP Extract Biased Exponent
X 63 354 188 DFP dxexq DFP Extract Biased Exponent Quad
X 31 310 456 EC eciwx External Control In Word Indexed
X 31 438 456 EC ecowx External Control Out Word Indexed
EVX 4 740 335 SP.FD efdabs Floating-Point Double-Precision Absolute Value
EVX 4 736 336 SP.FD efdadd Floating-Point Double-Precision Add
EVX 4 751 342 SP.FD efdcfs Floating-Point Double-Precision Convert from Single-Precision
EVX 4 755 340 SP.FD efdcfsf Convert Floating-Point Double-Precision from Signed Fraction
EVX 4 753 339 SP.FD efdcfsi Convert Floating-Point Double-Precision from Signed Integer
EVX 4 739 340 SP.FD efdcfsid Convert Floating-Point Double-Precision from Signed Integer Doubleword
EVX 4 754 340 SP.FD efdcfuf Convert Floating-Point Double-Precision from Unsigned Fraction
EVX 4 752 339 SP.FD efdcfui Convert Floating-Point Double-Precision from Unsigned Integer
EVX 4 738 340 SP.FD efdcfuid Convert Floating-Point Double-Precision from Unsigned Integer Doubleword
EVX 4 750 337 SP.FD efdcmpeq Floating-Point Double-Precision Compare Equal
EVX 4 748 337 SP.FD efdcmpgt Floating-Point Double-Precision Compare Greater Than
EVX 4 749 337 SP.FD efdcmplt Floating-Point Double-Precision Compare Less Than
EVX 4 759 342 SP.FD efdctsf Convert Floating-Point Double-Precision to Signed Fraction
EVX 4 757 340 SP.FD efdctsi Convert Floating-Point Double-Precision to Signed Integer
EVX 4 747 341 SP.FD efdctsidz Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round toward Zero
EVX 4 762 342 SP.FD efdctsiz Convert Floating-Point Double-Precision to Signed Integer with Round toward Zero
EVX 4 758 342 SP.FD efdctuf Convert Floating-Point Double-Precision to Unsigned Fraction
EVX 4 756 340 SP.FD efdctui Convert Floating-Point Double-Precision to Unsigned Integer
EVX 4 746 341 SP.FD efdctuidz Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round toward Zero
EVX 4 760 342 SP.FD efdctuiz Convert Floating-Point Double-Precision to Unsigned Integer with Round toward Zero
EVX 4 745 336 SP.FD efddiv Floating-Point Double-Precision Divide
EVX 4 744 336 SP.FD efdmul Floating-Point Double-Precision Multiply
EVX 4 741 335 SP.FD efdnabs Floating-Point Double-Precision Negative Absolute Value
EVX 4 742 335 SP.FD efdneg Floating-Point Double-Precision Negate
EVX 4 737 336 SP.FD efdsub Floating-Point Double-Precision Subtract
EVX 4 766 338 SP.FD efdtsteq Floating-Point Double-Precision Test Equal
EVX 4 764 337 SP.FD efdtstgt Floating-Point Double-Precision Test Greater Than
EVX 4 765 338 SP.FD efdtstlt Floating-Point Double-Precision Test Less Than
EVX 4 708 328 SP.FS efsabs Floating-Point Single-Precision Absolute Value
EVX 4 704 329 SP.FS efsadd Floating-Point Single-Precision Add
EVX 4 719 343 SP.FD efscfd Floating-Point Single-Precision Convert from Double-Precision
EVX 4 723 333 SP.FS efscfsf Convert Floating-Point Single-Precision from Signed Fraction
EVX 4 721 333 SP.FS efscfsi Convert Floating-Point Single-Precision from Signed Integer
EVX 4 722 333 SP.FS efscfuf Convert Floating-Point Single-Precision from Unsigned Fraction
EVX 4 720 333 SP.FS efscfui Convert Floating-Point Single-Precision from Unsigned Integer
EVX 4 718 331 SP.FS efscmpeq Floating-Point Single-Precision Compare Equal
EVX 4 716 330 SP.FS efscmpgt Floating-Point Single-Precision Compare Greater Than
EVX 4 717 330 SP.FS efscmplt Floating-Point Single-Precision Compare Less Than
EVX 4 727 334 SP.FS efsctsf Convert Floating-Point Single-Precision to Signed Fraction
EVX 4 725 333 SP.FS efsctsi Convert Floating-Point Single-Precision to Signed Integer
EVX 4 730 334 SP.FS efsctsiz Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero
EVX 4 726 334 SP.FS efsctuf Convert Floating-Point Single-Precision to Unsigned Fraction
EVX 4 724 333 SP.FS efsctui Convert Floating-Point Single-Precision to Unsigned Integer
EVX 4 728 334 SP.FS efsctuiz Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero
EVX 4 713 329 SP.FS efsdiv Floating-Point Single-Precision Divide
EVX 4 712 329 SP.FS efsmul Floating-Point Single-Precision Multiply
EVX 4 709 328 SP.FS efsnabs Floating-Point Single-Precision Negative Absolute Value
EVX 4 710 328 SP.FS efsneg Floating-Point Single-Precision Negate
EVX 4 705 329 SP.FS efssub Floating-Point Single-Precision Subtract
EVX 4 734 332 SP.FS efststeq Floating-Point Single-Precision Test Equal
EVX 4 732 331 SP.FS efststgt Floating-Point Single-Precision Test Greater Than
EVX 4 733 332 SP.FS efststlt Floating-Point Single-Precision Test Less Than
X 31 854 448 S eieio Enforce In-order Execution of I/O
X 31 284 SR 78 B eqv[.] Equivalent
EVX 4 520 268 SP evabs Vector Absolute Value
EVX 4 514 268 SP evaddiw Vector Add Immediate Word
EVX 4 1225 268 SP evaddsmiaaw Vector Add Signed, Modulo, Integer to Accumulator Word
EVX 4 1217 269 SP evaddssiaaw Vector Add Signed, Saturate, Integer to Accumulator Word
EVX 4 1224 269 SP evaddumiaaw Vector Add Unsigned, Modulo, Integer to Accumulator Word
EVX 4 1216 269 SP evaddusiaaw Vector Add Unsigned, Saturate, Integer to Accumulator Word
EVX 4 512 269 SP evaddw Vector Add Word
EVX 4 529 270 SP evand Vector AND
EVX 4 530 270 SP evandc Vector AND with Complement
EVX 4 564 270 SP evcmpeq Vector Compare Equal
EVX 4 561 270 SP evcmpgts Vector Compare Greater Than Signed
EVX 4 560 271 SP evcmpgtu Vector Compare Greater Than Unsigned
EVX 4 563 271 SP evcmplts Vector Compare Less Than Signed
EVX 4 562 271 SP evcmpltu Vector Compare Less Than Unsigned
EVX 4 526 272 SP evcntlsw Vector Count Leading Signed Bits Word
EVX 4 525 272 SP evcntlzw Vector Count Leading Zeros Word
EVX 4 1222 272 SP evdivws Vector Divide Word Signed
EVX 4 1223 273 SP evdivwu Vector Divide Word Unsigned
EVX 4 537 273 SP eveqv Vector Equivalent
EVX 4 522 273 SP evextsb Vector Extend Sign Byte
EVX 4 523 273 SP evextsh Vector Extend Sign Halfword
EVX 4 644 319 SP.FV evfsabs Vector Floating-Point Single-Precision Absolute Value
EVX 4 640 320 SP.FV evfsadd Vector Floating-Point Single-Precision Add
EVX 4 659 324 SP.FV evfscfsf Vector Convert Floating-Point Single-Precision from Signed Fraction
EVX 4 657 324 SP.FV evfscfsi Vector Convert Floating-Point Single-Precision from Signed Integer
EVX 4 658 324 SP.FV evfscfuf Vector Convert Floating-Point Single-Precision from Unsigned Fraction
EVX 4 656 324 SP.FV evfscfui Vector Convert Floating-Point Single-Precision from Unsigned Integer
EVX 4 654 322 SP.FV evfscmpeq Vector Floating-Point Single-Precision Compare Equal
EVX 4 652 321 SP.FV evfscmpgt Vector Floating-Point Single-Precision Compare Greater Than
EVX 4 653 321 SP.FV evfscmplt Vector Floating-Point Single-Precision Compare Less Than
EVX 4 663 326 SP.FV evfsctsf Vector Convert Floating-Point Single-Precision to Signed Fraction
EVX 4 661 325 SP.FV evfsctsi Vector Convert Floating-Point Single-Precision to Signed Integer
EVX 4 666 325 SP.FV evfsctsiz Vector Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero
EVX 4 662 326 SP.FV evfsctuf Vector Convert Floating-Point Single-Precision to Unsigned Fraction
EVX 4 660 325 SP.FV evfsctui Vector Convert Floating-Point Single-Precision to Unsigned Integer
EVX 4 664 325 SP.FV evfsctuiz Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero
EVX 4 649 320 SP.FV evfsdiv Vector Floating-Point Single-Precision Divide
EVX 4 648 320 SP.FV evfsmul Vector Floating-Point Single-Precision Multiply
EVX 4 645 319 SP.FV evfsnabs Vector Floating-Point Single-Precision Negative Absolute Value
EVX 4 646 319 SP.FV evfsneg Vector Floating-Point Single-Precision Negate
EVX 4 641 320 SP.FV evfssub Vector Floating-Point Single-Precision Subtract
EVX 4 670 323 SP.FV evfststeq Vector Floating-Point Single-Precision Test Equal
EVX 4 668 322 SP.FV evfststgt Vector Floating-Point Single-Precision Test Greater Than
EVX 4 669 323 SP.FV evfststlt Vector Floating-Point Single-Precision Test Less Than
EVX 4 769 274 SP evldd Vector Load Double Word into Double Word
EVX 31 285 P 636 E.PD evlddepx Vector Load Doubleword into Doubleword by External Process ID Indexed
EVX 4 768 274 SP evlddx Vector Load Double Word into Double Word Indexed
EVX 4 773 274 SP evldh Vector Load Double into Four Halfwords
EVX 4 772 274 SP evldhx Vector Load Double into Four Halfwords Indexed
EVX 4 771 275 SP evldw Vector Load Double into Two Words
EVX 4 770 275 SP evldwx Vector Load Double into Two Words Indexed
EVX 4 777 275 SP evlhhesplat Vector Load Halfword into Halfwords Even and Splat
EVX 4 776 275 SP evlhhesplatx Vector Load Halfword into Halfwords Even and Splat Indexed
EVX 4 783 276 SP evlhhossplat Vector Load Halfword into Halfword Odd Signed and Splat
EVX 4 782 276 SP evlhhossplatx Vector Load Halfword into Halfword Odd Signed and Splat Indexed
EVX 4 781 276 SP evlhhousplat Vector Load Halfword into Halfword Odd Unsigned and Splat
EVX 4 780 276 SP evlhhousplatx Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed
EVX 4 785 277 SP evlwhe Vector Load Word into Two Halfwords Even
EVX 4 784 277 SP evlwhex Vector Load Word into Two Halfwords Even Indexed
EVX 4 791 277 SP evlwhos Vector Load Word into Two Halfwords Odd Signed (with sign extension)
EVX 4 790 277 SP evlwhosx Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension)
EVX 4 789 278 SP evlwhou Vector Load Word into Two Halfwords Odd Unsigned (zero-extended)
EVX 4 788 278 SP evlwhoux Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended)
EVX 4 797 278 SP evlwhsplat Vector Load Word into Two Halfwords and Splat
EVX 4 796 278 SP evlwhsplatx Vector Load Word into Two Halfwords and Splat Indexed
EVX 4 793 279 SP evlwwsplat Vector Load Word into Word and Splat
EVX 4 792 279 SP evlwwsplatx Vector Load Word into Word and Splat Indexed
EVX 4 556 279 SP evmergehi Vector Merge High
EVX 4 558 280 SP evmergehilo Vector Merge High/Low
EVX 4 557 279 SP evmergelo Vector Merge Low
EVX 4 559 280 SP evmergelohi Vector Merge Low/High
EVX 4 1323 280 SP evmhegsmfaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate
EVX 4 1451 280 SP evmhegsmfan Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX 4 1321 281 SP evmhegsmiaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate
EVX 4 1449 281 SP evmhegsmian Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX 4 1320 281 SP evmhegumiaa Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX 4 1448 281 SP evmhegumian Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 1035 282 SP evmhesmf Vector Multiply Halfwords, Even, Signed, Modulo, Fractional
EVX 4 1067 282 SP evmhesmfa Vector Multiply Halfwords, Even, Signed, Modulo, Fractional to Accumulator
EVX 4 1291 282 SP evmhesmfaaw Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words
EVX 4 1419 282 SP evmhesmfanw Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX 4 1033 283 SP evmhesmi Vector Multiply Halfwords, Even, Signed, Modulo, Integer
EVX 4 1065 283 SP evmhesmia Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator
EVX 4 1289 283 SP evmhesmiaaw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words
EVX 4 1417 283 SP evmhesmianw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words
EVX 4 1027 284 SP evmhessf Vector Multiply Halfwords, Even, Signed, Saturate, Fractional
EVX 4 1059 284 SP evmhessfa Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator
EVX 4 1283 285 SP evmhessfaaw Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate into Words
EVX 4 1411 285 SP evmhessfanw Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX 4 1281 286 SP evmhessiaaw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words
EVX 4 1409 286 SP evmhessianw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words
EVX 4 1032 287 SP evmheumi Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer
EVX 4 1064 287 SP evmheumia Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator
EVX 4 1288 287 SP evmheumiaaw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1416 287 SP evmheumianw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 4 1280 288 SP evmheusiaaw Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate into Words
EVX 4 1408 288 SP evmheusianw Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX 4 1327 289 SP evmhogsmfaa Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate
EVX 4 1455 289 SP evmhogsmfan Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX 4 1325 289 SP evmhogsmiaa Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate
EVX 4 1453 289 SP evmhogsmian Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX 4 1324 290 SP evmhogumiaa Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX 4 1452 290 SP evmhogumian Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 1039 290 SP evmhosmf Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional
EVX 4 1071 290 SP evmhosmfa Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional to Accumulator
EVX 4 1295 291 SP evmhosmfaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words
EVX 4 1423 291 SP evmhosmfanw Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX 4 1037 291 SP evmhosmi Vector Multiply Halfwords, Odd, Signed, Modulo, Integer
EVX 4 1069 291 SP evmhosmia Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator
EVX 4 1293 292 SP evmhosmiaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words
EVX 4 1421 291 SP evmhosmianw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words
EVX 4 1031 293 SP evmhossf Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional
EVX 4 1063 293 SP evmhossfa Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional to Accumulator
EVX 4 1287 294 SP evmhossfaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate into Words
EVX 4 1415 294 SP evmhossfanw Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX 4 1285 295 SP evmhossiaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words
EVX 4 1413 295 SP evmhossianw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words
EVX 4 1036 295 SP evmhoumi Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer
EVX 4 1068 295 SP evmhoumia Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator
EVX 4 1292 296 SP evmhoumiaaw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1420 292 SP evmhoumianw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 4 1284 296 SP evmhousiaaw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words
EVX 4 1412 296 SP evmhousianw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX 4 1220 297 SP evmra Initialize Accumulator
EVX 4 1103 297 SP evmwhsmf Vector Multiply Word High Signed, Modulo, Fractional
EVX 4 1135 297 SP evmwhsmfa Vector Multiply Word High Signed, Modulo, Fractional to Accumulator
EVX 4 1101 297 SP evmwhsmi Vector Multiply Word High Signed, Modulo, Integer
EVX 4 1133 297 SP evmwhsmia Vector Multiply Word High Signed, Modulo, Integer to Accumulator
EVX 4 1095 298 SP evmwhssf Vector Multiply Word High Signed, Saturate, Fractional
EVX 4 1127 298 SP evmwhssfa Vector Multiply Word High Signed, Saturate, Fractional to Accumulator
EVX 4 1100 298 SP evmwhumi Vector Multiply Word High Unsigned, Modulo, Integer
EVX 4 1132 298 SP evmwhumia Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator
EVX 4 1353 299 SP evmwlsmiaaw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words
EVX 4 1481 299 SP evmwlsmianw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words
EVX 4 1345 299 SP evmwlssiaaw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words
EVX 4 1473 299 SP evmwlssianw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words
EVX 4 1096 300 SP evmwlumi Vector Multiply Word Low Unsigned, Modulo, Integer
EVX 4 1128 300 SP evmwlumia Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator
EVX 4 1352 300 SP evmwlumiaaw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1480 300 SP evmwlumianw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words
EVX 4 1344 301 SP evmwlusiaaw Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate into Words
EVX 4 1472 301 SP evmwlusianw Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words
EVX 4 1115 301 SP evmwsmf Vector Multiply Word Signed, Modulo, Fractional
EVX 4 1147 301 SP evmwsmfa Vector Multiply Word Signed, Modulo, Fractional to Accumulator
EVX 4 1371 302 SP evmwsmfaa Vector Multiply Word Signed, Modulo, Fractional and Accumulate
EVX 4 1499 302 SP evmwsmfan Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative
EVX 4 1113 302 SP evmwsmi Vector Multiply Word Signed, Modulo, Integer
EVX 4 1145 302 SP evmwsmia Vector Multiply Word Signed, Modulo, Integer to Accumulator
EVX 4 1369 302 SP evmwsmiaa Vector Multiply Word Signed, Modulo, Integer and Accumulate
EVX 4 1497 302 SP evmwsmian Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative
EVX 4 1107 303 SP evmwssf Vector Multiply Word Signed, Saturate, Fractional
EVX 4 1139 303 SP evmwssfa Vector Multiply Word Signed, Saturate, Fractional to Accumulator
EVX 4 1363 303 SP evmwssfaa Vector Multiply Word Signed, Saturate, Fractional and Accumulate
EVX 4 1491 304 SP evmwssfan Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative
EVX 4 1112 304 SP evmwumi Vector Multiply Word Unsigned, Modulo, Integer
EVX 4 1144 304 SP evmwumia Vector Multiply Word Unsigned, Modulo, Integer to Accumulator
EVX 4 1368 305 SP evmwumiaa Vector Multiply Word Unsigned, Modulo, Integer and Accumulate
EVX 4 1496 305 SP evmwumian Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 542 305 SP evnand Vector NAND
EVX 4 521 305 SP evneg Vector Negate
EVX 4 536 305 SP evnor Vector NOR
EVX 4 535 306 SP evor Vector OR
EVX 4 539 306 SP evorc Vector OR with Complement
EVX 4 552 306 SP evrlw Vector Rotate Left Word
EVX 4 554 307 SP evrlwi Vector Rotate Left Word Immediate
EVX 4 524 307 SP evrndw Vector Round Word
EVS 4 79 307 SP evsel Vector Select
EVX 4 548 308 SP evslw Vector Shift Left Word
EVX 4 550 308 SP evslwi Vector Shift Left Word Immediate
EVX 4 555 308 SP evsplatfi Vector Splat Fractional Immediate
EVX 4 553 308 SP evsplati Vector Splat Immediate
EVX 4 547 308 SP evsrwis Vector Shift Right Word Immediate Signed
EVX 4 546 308 SP evsrwiu Vector Shift Right Word Immediate Unsigned
EVX 4 545 309 SP evsrws Vector Shift Right Word Signed
EVX 4 544 309 SP evsrwu Vector Shift Right Word Unsigned
EVX 4 801 309 SP evstdd Vector Store Double of Double
EVX 31 413 P 636 E.PD evstddepx Vector Store Doubleword into Doubleword by External Process ID Indexed
EVX 4 800 309 SP evstddx Vector Store Double of Double Indexed
EVX 4 805 310 SP evstdh Vector Store Double of Four Halfwords
EVX 4 804 310 SP evstdhx Vector Store Double of Four Halfwords Indexed
EVX 4 803 310 SP evstdw Vector Store Double of Two Words
EVX 4 802 310 SP evstdwx Vector Store Double of Two Words Indexed
EVX 4 817 311 SP evstwhe Vector Store Word of Two Halfwords from Even
EVX 4 816 311 SP evstwhex Vector Store Word of Two Halfwords from Even Indexed
EVX 4 821 311 SP evstwho Vector Store Word of Two Halfwords from Odd
EVX 4 820 311 SP evstwhox Vector Store Word of Two Halfwords from Odd Indexed
EVX 4 825 311 SP evstwwe Vector Store Word of Word from Even
EVX 4 824 311 SP evstwwex Vector Store Word of Word from Even Indexed
EVX 4 829 312 SP evstwwo Vector Store Word of Word from Odd
EVX 4 828 312 SP evstwwox Vector Store Word of Word from Odd Indexed
EVX 4 1227 312 SP evsubfsmiaaw Vector Subtract Signed, Modulo, Integer to Accumulator Word
EVX 4 1219 312 SP evsubfssiaaw Vector Subtract Signed, Saturate, Integer to Accumulator Word
EVX 4 1226 313 SP evsubfumiaaw Vector Subtract Unsigned, Modulo, Integer to Accumulator Word
EVX 4 1218 313 SP evsubfusiaaw Vector Subtract Unsigned, Saturate, Integer to Accumulator Word
EVX 4 516 313 SP evsubfw Vector Subtract from Word
EVX 4 518 313 SP evsubifw Vector Subtract Immediate from Word
EVX 4 534 313 SP evxor Vector XOR
X 31 954 SR 79 B extsb[.] Extend Sign Byte
X 31 922 SR 79 B extsh[.] Extend Sign Halfword
X 31 986 SR 81 64 extsw[.] Extend Sign Word
X 63 264 126 FP[R] fabs[.] Floating Absolute Value
A 63 21 127 FP[R] fadd[.] Floating Add
A 59 21 127 FP[R] fadds[.] Floating Add Single
X 63 846 136 FP[R] fcfid[.] Floating Convert From Integer Doubleword
X 63 32 138 FP fcmpo Floating Compare Ordered
X 63 0 138 FP fcmpu Floating Compare Unordered
X 63 8 126 FP[R] fcpsgn[.] Floating Copy Sign
X 63 814 134 FP[R] fctid[.] Floating Convert To Integer Doubleword
X 63 815 135 FP[R] fctidz[.] Floating Convert To Integer Doubleword with round toward Zero
X 63 14 135 FP[R] fctiw[.] Floating Convert To Integer Word
X 63 15 136 FP[R] fctiwz[.] Floating Convert To Integer Word with round toward Zero
A 63 18 128 FP[R] fdiv[.] Floating Divide
A 59 18 128 FP[R] fdivs[.] Floating Divide Single
A 63 29 132 FP[R] fmadd[.] Floating Multiply-Add
A 59 29 132 FP[R] fmadds[.] Floating Multiply-Add Single
X 63 72 126 FP[R] fmr[.] Floating Move Register
A 63 28 132 FP[R] fmsub[.] Floating Multiply-Subtract
A 59 28 132 FP[R] fmsubs[.] Floating Multiply-Subtract Single
A 63 25 128 FP[R] fmul[.] Floating Multiply
A 59 25 128 FP[R] fmuls[.] Floating Multiply Single
X 63 136 126 FP[R] fnabs[.] Floating Negative Absolute Value
X 63 40 126 FP[R] fneg[.] Floating Negate
A 63 31 133 FP[R] fnmadd[.] Floating Negative Multiply-Add
A 59 31 133 FP[R] fnmadds[.] Floating Negative Multiply-Add Single
A 63 30 133 FP[R] fnmsub[.] Floating Negative Multiply-Subtract
A 59 30 133 FP[R] fnmsubs[.] Floating Negative Multiply-Subtract Single
A 63 24 129 FP[R] fre[.] Floating Reciprocal Estimate
A 59 24 129 FP[R] fres[.] Floating Reciprocal Estimate Single
X 63 488 137 FP[R].in frim[.] Floating Round to Integer Minus
X 63 392 137 FP[R].in frin[.] Floating Round to Integer Nearest
X 63 456 137 FP[R].in frip[.] Floating Round to Integer Plus
X 63 424 137 FP[R].in friz[.] Floating Round to Integer Toward Zero
X 63 12 134 FP[R] frsp[.] Floating Round to Single-Precision
A 63 26 130 FP[R].in frsqrte[.] Floating Reciprocal Square Root Estimate
A 59 26 130 FP[R].in frsqrtes[.] Floating Reciprocal Square Root Estimate Single
A 63 23 139 FP[R] fsel[.] Floating Select
A 63 22 129 FP[R] fsqrt[.] Floating Square Root
A 59 22 129 FP[R] fsqrts[.] Floating Square Root Single
A 63 20 127 FP[R] fsub[.] Floating Subtract
A 59 20 127 FP[R] fsubs[.] Floating Subtract Single
XL 19 274 H 480 S hrfid Hypervisor Return From Interrupt Doubleword
X 31 982 428 B icbi Instruction Cache Block Invalidate
X 31 991 P 634 E.PD icbiep Instruction Cache Block Invalidate by External PID
X 31 230 M 657 ECL icblc Instruction Cache Block Lock Clear
X 31 22 428 E icbt Instruction Cache Block Touch
X 31 486 M 656 ECL icbtls Instruction Cache Block Touch and Lock Set
X 31 966 P 727 E.CI ici Instruction Cache Invalidate
X 31 998 P 731 E.CD icread Instruction Cache Read
A 31 15 74 B.in isel Integer Select
XL 19 150 440 B isync Instruction Synchronize
X 31 95 P 627 E.PD lbepx Load Byte by External Process ID Indexed
D 34 45 B lbz Load Byte and Zero
X 31 853 H 491 S lbzcix Load Byte and Zero Caching Inhibited Indexed
D 35 45 B lbzu Load Byte and Zero with Update
X 31 119 45 B lbzux Load Byte and Zero with Update Indexed
X 31 87 46 B lbzx Load Byte and Zero Indexed
DS 58 0 50 64 ld Load Doubleword
X 31 84 444 64 ldarx Load Doubleword And Reserve Indexed
X 31 885 H 491 S ldcix Load Doubleword Caching Inhibited Indexed
X 31 29 P 628 E.PD ldepx Load Doubleword by External Process ID Indexed
DS 58 1 50 64 ldu Load Doubleword with Update
X 31 53 50 64 ldux Load Doubleword with Update Indexed
X 31 21 50 64 ldx Load Doubleword Indexed
D 50 119 FP lfd Load Floating-Point Double
X 31 607 P 635 E.PD lfdepx Load Floating-Point Double by External Process ID Indexed
DS 57 0 125 FP.out lfdp Load Floating-Point Double Pair
X 31 791 125 FP.out lfdpx Load Floating-Point Double Pair Indexed
D 51 119 FP lfdu Load Floating-Point Double with Update
X 31 631 119 FP lfdux Load Floating-Point Double with Update Indexed
X 31 599 119 FP lfdx Load Floating-Point Double Indexed
X 31 855 120 FP lfiwax
Load Floating-Point as Integer Word Algebraic Indexed D 48 122 FP lfs Load Floating-Point Single D 49 122 FP lfsu Load Floating-Point Single with Update X 31 567 122 FP lfsux Load Floating-Point Single with Update Indexed X 31 535 122 FP lfsx Load Floating-Point Single Indexed D 42 47 B lha Load Halfword Algebraic D 43 47 B lhau Load Halfword Algebraic with Update 898 Power ISATM Appendices Version 2.05 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext X 31 375 47 B lhaux Load Halfword Algebraic with Update Indexed X 31 343 47 B lhax Load Halfword Algebraic Indexed X 31 790 55 B lhbrx Load Halfword Byte-Reverse Indexed X 31 287 P 627 E.PD lhepx Load Halfword by External Process ID Indexed D 40 46 B lhz Load Halfword and Zero X 31 821 H 491 S lhzcix Load Halfword and Zero Caching Inhibited Indexed D 41 46 B lhzu Load Halfword and Zero with Update X 31 311 46 B lhzux Load Halfword and Zero with Update Indexed X 31 279 46 B lhzx Load Halfword and Zero Indexed D 46 56 B lmw Load Multiple Word DQ 56 P 493 LSQ lq Load Quadword X 31 597 59 MA lswi Load String Word Immediate X 31 533 59 MA lswx Load String Word Indexed X 31 7 206 V lvebx Load Vector Element Byte Indexed X 31 39 203 V lvehx Load Vector Element Halfword Indexed X 31 295 P 637 E.PD lvepx Load Vector by External Process ID Indexed X 31 263 P 637 E.PD lvepxl Load Vector by External Process ID Indexed LRU X 31 71 203 V lvewx Load Vector Element Word Indexed X 31 6 208 V lvsl Load Vector for Shift Left Indexed X 31 38 208 V lvsr Load Vector for Shift Right Indexed X 31 103 204 V lvx Load Vector Indexed X 31 359 204 V lvxl Load Vector Indexed LRU DS 58 2 49 64 lwa Load Word Algebraic X 31 20 442 B lwarx Load Word And Reserve Indexed X 31 373 49 64 lwaux Load Word Algebraic with Update Indexed X 31 341 49 64 lwax Load Word Algebraic Indexed X 31 534 55 B lwbrx Load Word Byte-Reverse Indexed X 31 31 P 628 E.PD lwepx Load Word by External Process ID Indexed D 32 48 B lwz Load Word and Zero X 31 789 
H 491 S lwzcix Load Word and Zero Caching Inhibited Indexed D 33 48 B lwzu Load Word and Zero with Update X 31 55 48 B lwzux Load Word and Zero with Update Indexed X 31 23 48 B lwzx Load Word and Zero Indexed XO 4 172 351 LMA macchw[o][.] Multiply Accumulate Cross Halfword to Word Modulo Signed XO 4 236 351 LMA macchws[o][.] Multiply Accumulate Cross Halfword to Word Saturate Signed XO 4 204 352 LMA macchwsu[o][.] Multiply Accumulate Cross Halfword to Word Saturate Unsigned XO 4 140 352 LMA macchwu[o][.] Multiply Accumulate Cross Halfword to Word Modulo Unsigned XO 4 44 353 LMA machhw[o][.] Multiply Accumulate High Halfword to Word Modulo Signed XO 4 108 353 LMA machhws[o][.] Multiply Accumulate High Halfword to Word Saturate Signed XO 4 76 354 LMA machhwsu[o][.] Multiply Accumulate High Halfword to Word Saturate Unsigned XO 4 12 354 LMA machhwu[o][.] Multiply Accumulate High Halfword to Word Modulo Unsigned XO 4 428 355 LMA maclhw[o][.] Multiply Accumulate Low Halfword to Word Modulo Signed XO 4 492 355 LMA maclhws[o][.] Multiply Accumulate Low Halfword to Word Saturate Signed XO 4 460 356 LMA maclhwsu[o][.] Multiply Accumulate Low Halfword to Word Saturate Unsigned XO 4 396 356 LMA maclhwu[o][.] Multiply Accumulate Low Halfword to Word Modulo Unsigned X 31 854 448 E mbar Memory Barrier XL 19 0 38 B mcrf Move Condition Register Field X 63 64 140 FP mcrfs Move to Condition Register from FPSCR X 31 512 97 E mcrxr Move to Condition Register from XER Appendix G. Power ISA Instruction Set Sorted by Mnemonic 899 Version 2.05 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext X 31 275 97 E mfapidi Move From APID Indirect XFX 31 19 95 B mfcr Move From Condition Register XFX 31 323 P 625 E mfdcr Move From Device Control Register X 31 291 P 97 E mfdcrux Move From Device Control Register User-mode Indexed X 31 259 P 625 E mfdcrx Move From Device Control Register Indexed X 63 583 140 FP[R] mffs[.] 
Move From FPSCR
X 31 83 P 503, 625 B mfmsr Move From Machine State Register
XFX 31 19 96 B.in mfocrf Move From One Condition Register Field
XFX 31 334 O 756 E.PM mfpmr Move From Performance Monitor Register
XFX 31 339 O 94, 451 B mfspr Move From Special Purpose Register
X 31 595 32 P 538 S mfsr Move From Segment Register
X 31 659 32 P 538 S mfsrin Move From Segment Register Indirect
XFX 31 371 451 S.out mftb Move From Time Base
VX 4 1540 259 V mfvscr Move From Vector Status and Control Register
X 31 238 P 721 E.PC msgclr Message Clear
X 31 206 P 721 E.PC msgsnd Message Send
XFX 31 144 95 B mtcrf Move To Condition Register Fields
XFX 31 451 P 624 E mtdcr Move To Device Control Register
X 31 419 97 E mtdcrux Move To Device Control Register User-mode Indexed
X 31 387 P 624 E mtdcrx Move To Device Control Register Indexed
X 63 70 142 FP[R] mtfsb0[.] Move To FPSCR Bit 0
X 63 38 142 FP[R] mtfsb1[.] Move To FPSCR Bit 1
XFL 63 711 141 FP[R] mtfsf[.] Move To FPSCR Fields
X 63 134 141 FP[R] mtfsfi[.] Move To FPSCR Field Immediate
X 31 146 P 625 E mtmsr Move To Machine State Register
X 31 146 P 501 S mtmsr Move To Machine State Register
X 31 178 P 502 S mtmsrd Move To Machine State Register Doubleword
XFX 31 144 96 B.in mtocrf Move To One Condition Register Field
XFX 31 462 O 756 E.PM mtpmr Move To Performance Monitor Register
XFX 31 467 O 93 B mtspr Move To Special Purpose Register
X 31 210 32 P 537 S mtsr Move To Segment Register
X 31 242 32 P 537 S mtsrin Move To Segment Register Indirect
VX 4 1604 259 V mtvscr Move To Vector Status and Control Register
X 4 168 356 LMA mulchw[.] Multiply Cross Halfword to Word Signed
X 4 136 356 LMA mulchwu[.] Multiply Cross Halfword to Word Unsigned
XO 31 73 SR 69 64 mulhd[.] Multiply High Doubleword
XO 31 9 SR 69 64 mulhdu[.] Multiply High Doubleword Unsigned
X 4 40 357 LMA mulhhw[.] Multiply High Halfword to Word Signed
X 4 8 357 LMA mulhhwu[.] Multiply High Halfword to Word Unsigned
XO 31 75 SR 67 B mulhw[.]
Multiply High Word XO 31 11 SR 67 B mulhwu[.] Multiply High Word Unsigned XO 31 233 SR 69 64 mulld[o][.] Multiply Low Doubleword X 4 424 357 LMA mullhw[.] Multiply Low Halfword to Word Signed X 4 392 357 LMA mullhwu[.] Multiply Low Halfword to Word Unsigned D 7 67 B mulli Multiply Low Immediate XO 31 235 SR 67 B mullw[o][.] Multiply Low Word X 31 476 SR 77 B nand[.] NAND XL 19 434 H 482 S nap Nap XO 31 104 SR 66 B neg[o][.] Negate XO 4 174 358 LMA nmacchw[o][.] Negative Multiply Accumulate Cross Halfword to Word Modulo Signed XO 4 238 358 LMA nmacchws[o][.] Negative Multiply Accumulate Cross Halfword to Word Saturate Signed XO 4 46 359 LMA nmachhw[o][.] Negative Multiply Accumulate High Halfword to Word Modulo Signed XO 4 110 359 LMA nmachhws[o][.] Negative Multiply Accumulate High Halfword to Word Saturate Signed 900 Power ISATM Appendices Version 2.05 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext XO 4 430 360 LMA nmaclhw[o][.] Negative Multiply Accumulate Low Halfword to Word Modulo Signed XO 4 494 360 LMA nmaclhws[o][.] Negative Multiply Accumulate Low Halfword to Word Saturate Signed X 31 124 SR 78 B nor[.] NOR X 31 444 SR 77 B or[.] OR X 31 412 SR 78 B orc[.] OR with Complement D 24 75 B ori OR Immediate D 25 76 B oris OR Immediate Shifted X 31 122 81 B.in popcntb Population Count Bytes X 31 186 80 64 prtyd Parity Doubleword X 31 154 80 B prtyw Parity Word XL 19 51 P 614 E rfci Return From Critical Interrupt X 19 39 P 614 E.ED rfdi Return From Debug Interrupt XL 19 50 P 613 E rfi Return From Interrupt XL 19 18 P 480 S rfid Return From Interrupt Doubleword XL 19 38 P 614 E rfmci Return From Machine Check Interrupt MDS 30 8 SR 86 64 rldcl[.] Rotate Left Doubleword then Clear Left MDS 30 9 SR 87 64 rldcr[.] Rotate Left Doubleword then Clear Right MD 30 2 SR 86 64 rldic[.] Rotate Left Doubleword Immediate then Clear MD 30 0 SR 85 64 rldicl[.] Rotate Left Doubleword Immediate then Clear Left MD 30 1 SR 85 64 rldicr[.] 
Rotate Left Doubleword Immediate then Clear Right
MD 30 3 SR 87 64 rldimi[.] Rotate Left Doubleword Immediate then Mask Insert
M 20 SR 84 B rlwimi[.] Rotate Left Word Immediate then Mask Insert
M 21 SR 82 B rlwinm[.] Rotate Left Word Immediate then AND with Mask
M 23 SR 83 B rlwnm[.] Rotate Left Word then AND with Mask
XL 19 498 H 483 S rvwinkle Rip Van Winkle
SC 17 39, 479, 613 B sc System Call
X 31 979 SR P 535 S slbfee. SLB Find Entry ESID
X 31 498 P 532 S slbia SLB Invalidate All
X 31 434 P 531 S slbie SLB Invalidate Entry
X 31 915 P 534 S slbmfee SLB Move From Entry ESID
X 31 851 P 534 S slbmfev SLB Move From Entry VSID
X 31 402 P 533 S slbmte SLB Move To Entry
X 31 27 SR 90 64 sld[.] Shift Left Doubleword
XL 19 466 H 483 S sleep Sleep
X 31 24 SR 88 B slw[.] Shift Left Word
X 31 794 SR 91 64 srad[.] Shift Right Algebraic Doubleword
XS 31 413 SR 91 64 sradi[.] Shift Right Algebraic Doubleword Immediate
X 31 792 SR 89 B sraw[.] Shift Right Algebraic Word
X 31 824 SR 89 B srawi[.] Shift Right Algebraic Word Immediate
X 31 539 SR 90 64 srd[.] Shift Right Doubleword
X 31 536 SR 88 B srw[.] Shift Right Word
D 38 51 B stb Store Byte
X 31 981 H 492 S stbcix Store Byte Caching Inhibited Indexed
X 31 223 P 629 E.PD stbepx Store Byte by External Process ID Indexed
D 39 51 B stbu Store Byte with Update
X 31 247 51 B stbux Store Byte with Update Indexed
X 31 215 51 B stbx Store Byte Indexed
DS 62 0 54 64 std Store Doubleword
X 31 1013 H 492 S stdcix Store Doubleword Caching Inhibited Indexed
X 31 214 444 64 stdcx. Store Doubleword Conditional Indexed
X 31 157 P 630 E.PD stdepx Store Doubleword by External Process ID Indexed
DS 62 1 54 64 stdu Store Doubleword with Update
X 31 181 54 64 stdux Store Doubleword with Update Indexed
X 31 149 54 64 stdx Store Doubleword Indexed
D 54 123 FP stfd Store Floating-Point Double
X 31 735 P 635 E.PD stfdepx Store Floating-Point Double by External Process ID Indexed
Appendix G.
Power ISA Instruction Set Sorted by Mnemonic 901 Version 2.05 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext DS 61 - 125 FP.out stfdp Store Floating-Point Double Pair X 31 919 125 FP.out stfdpx Store Floating-Point Double Pair Indexed D 55 123 FP stfdu Store Floating-Point Double with Update X 31 759 123 FP stfdux Store Floating-Point Double with Update Indexed X 31 727 123 FP stfdx Store Floating-Point Double Indexed X 31 983 124 FP stfiwx Store Floating-Point as Integer Word Indexed D 52 122 FP stfs Store Floating-Point Single D 53 122 FP stfsu Store Floating-Point Single with Update X 31 695 122 FP stfsux Store Floating-Point Single with Update Indexed X 31 663 122 FP stfsx Store Floating-Point Single Indexed D 44 52 B sth Store Halfword X 31 918 55 B sthbrx Store Halfword Byte-Reverse Indexed X 31 949 H 492 S sthcix Store Halfword Caching Inhibited Indexed X 31 415 P 629 E.PD sthepx Store Halfword by External Process ID Indexed D 45 52 B sthu Store Halfword with Update X 31 439 52 B sthux Store Halfword with Update Indexed X 31 407 52 B sthx Store Halfword Indexed D 47 57 B stmw Store Multiple Word DS 62 2 P 493 LSQ stq Store Quadword X 31 725 60 MA stswi Store String Word Immediate X 31 661 60 MA stswx Store String Word Indexed X 31 135 206 V stvebx Store Vector Element Byte Indexed X 31 167 206 V stvehx Store Vector Element Halfword Indexed X 31 807 P 638 E.PD stvepx Store Vector by External Process ID Indexed X 31 775 P 638 E.PD stvepxl Store Vector by External Process ID Indexed LRU X 31 199 207 V stvewx Store Vector Element Word Indexed X 31 231 204 V stvx Store Vector Indexed X 31 487 207 V stvxl Store Vector Indexed LRU D 36 53 B stw Store Word X 31 662 55 B stwbrx Store Word Byte-Reverse Indexed X 31 917 H 492 S stwcix Store Word Caching Inhibited Indexed X 31 150 442 B stwcx. 
Store Word Conditional Indexed
X 31 159 P 630 E.PD stwepx Store Word by External Process ID Indexed
D 37 53 B stwu Store Word with Update
X 31 183 53 B stwux Store Word with Update Indexed
X 31 151 53 B stwx Store Word Indexed
XO 31 40 SR 63 B subf[o][.] Subtract From
XO 31 8 SR 64 B subfc[o][.] Subtract From Carrying
XO 31 136 SR 65 B subfe[o][.] Subtract From Extended
D 8 SR 64 B subfic Subtract From Immediate Carrying
XO 31 232 SR 65 B subfme[o][.] Subtract From Minus One Extended
XO 31 200 SR 66 B subfze[o][.] Subtract From Zero Extended
X 31 598 446 B sync Synchronize
X 31 68 74 64 td Trap Doubleword
D 2 74 64 tdi Trap Doubleword Immediate
X 31 370 H 542 S tlbia TLB Invalidate All
X 31 306 64 H 539 S tlbie TLB Invalidate Entry
X 31 274 64 H 541 S tlbiel TLB Invalidate Entry Local
X 31 786 P 658, 747 E tlbivax TLB Invalidate Virtual Address Indexed
X 31 946 P 658, 748 E tlbre TLB Read Entry
X 31 914 P 659, 748 E tlbsx TLB Search Indexed
X 31 566 H 542, 659, 749 B tlbsync TLB Synchronize
X 31 978 P 660, 749 E tlbwe TLB Write Entry
X 31 4 73 B tw Trap Word
D 3 73 B twi Trap Word Immediate
VX 4 384 220 V vaddcuw Vector Add and Write Carry-Out Unsigned Word
VX 4 10 249 V vaddfp Vector Add Single-Precision
VX 4 768 220 V vaddsbs Vector Add Signed Byte Saturate
VX 4 832 220 V vaddshs Vector Add Signed Halfword Saturate
VX 4 896 220 V vaddsws Vector Add Signed Word Saturate
VX 4 0 221 V vaddubm Vector Add Unsigned Byte Modulo
VX 4 512 222 V vaddubs Vector Add Unsigned Byte Saturate
VX 4 64 221 V vadduhm Vector Add Unsigned Halfword Modulo
VX 4 576 222 V vadduhs Vector Add Unsigned Halfword Saturate
VX 4 128 221 V vadduwm Vector Add Unsigned Word Modulo
VX 4 640 222 V vadduws Vector Add Unsigned Word Saturate
VX 4 1028 244 V vand Vector Logical AND
VX 4 1092 244 V vandc Vector Logical AND with Complement
VX 4 1282 235 V vavgsb Vector Average Signed Byte
VX 4 1346
235 V vavgsh Vector Average Signed Halfword VX 4 1410 235 V vavgsw Vector Average Signed Word VX 4 1026 236 V vavgub Vector Average Unsigned Byte VX 4 1090 236 V vavguh Vector Average Unsigned Halfword VX 4 1154 236 V vavguw Vector Average Unsigned Word VX 4 842 253 V vcfsx Vector Convert From Signed Fixed-Point Word VX 4 778 253 V vcfux Vector Convert From Unsigned Fixed-Point Word VC 4 966 255 V vcmpbfp[.] Vector Compare Bounds Single-Precision VC 4 198 255 V vcmpeqfp[.] Vector Compare Equal To Single-Precision VC 4 6 241 V vcmpequb[.] Vector Compare Equal To Unsigned Byte VC 4 70 241 V vcmpequh[.] Vector Compare Equal To Unsigned Halfword VC 4 134 242 V vcmpequw[.] Vector Compare Equal To Unsigned Word VC 4 454 256 V vcmpgefp[.] Vector Compare Greater Than or Equal To Single-Pre- cision VC 4 710 256 V vcmpgtfp[.] Vector Compare Greater Than Single-Precision VC 4 774 242 V vcmpgtsb[.] Vector Compare Greater Than Signed Byte VC 4 838 242 V vcmpgtsh[.] Vector Compare Greater Than Signed Halfword VC 4 902 242 V vcmpgtsw[.] Vector Compare Greater Than Signed Word VC 4 518 243 V vcmpgtub[.] Vector Compare Greater Than Unsigned Byte VC 4 582 243 V vcmpgtuh[.] Vector Compare Greater Than Unsigned Halfword VC 4 646 243 V vcmpgtuw[.] 
Vector Compare Greater Than Unsigned Word VX 4 970 252 V vctsxs Vector Convert To Signed Fixed-Point Word Saturate VX 4 906 252 V vctuxs Vector Convert To Unsigned Fixed-Point Word Saturate VX 4 394 257 V vexptefp Vector 2 Raised to the Exponent Estimate Floating- Point VX 4 458 257 V vlogefp Vector Log Base 2 Estimate Floating-Point VA 4 46 250 V vmaddfp Vector Multiply-Add Single-Precision VX 4 1034 251 V vmaxfp Vector Maximum Single-Precision VX 4 258 237 V vmaxsb Vector Maximum Signed Byte VX 4 322 237 V vmaxsh Vector Maximum Signed Halfword VX 4 386 237 V vmaxsw Vector Maximum Signed Word VX 4 2 238 V vmaxub Vector Maximum Unsigned Byte VX 4 66 238 V vmaxuh Vector Maximum Unsigned Halfword VX 4 130 238 V vmaxuw Vector Maximum Unsigned Word VA 4 32 228 V vmhaddshs Vector Multiply-High-Add Signed Halfword Saturate VA 4 33 228 V vmhraddshs Vector Multiply-High-Round-Add Signed Halfword Satu- rate VX 4 1098 251 V vminfp Vector Minimum Single-Precision VX 4 770 239 V vminsb Vector Minimum Signed Byte VX 4 834 239 V vminsh Vector Minimum Signed Halfword VX 4 898 239 V vminsw Vector Minimum Signed Word VX 4 514 240 V vminub Vector Minimum Unsigned Byte VX 4 578 240 V vminuh Vector Minimum Unsigned Halfword VX 4 642 240 V vminuw Vector Minimum Unsigned Word VA 4 34 229 V vmladduhm Vector Multiply-Low-Add Unsigned Halfword Modulo VX 4 12 214 V vmrghb Vector Merge High Byte VX 4 76 214 V vmrghh Vector Merge High Halfword Appendix G. 
Power ISA Instruction Set Sorted by Mnemonic 903 Version 2.05 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext VX 4 140 214 V vmrghw Vector Merge High Word VX 4 268 215 V vmrglb Vector Merge Low Byte VX 4 332 215 V vmrglh Vector Merge Low Halfword VX 4 396 215 V vmrglw Vector Merge Low Word VA 4 37 230 V vmsummbm Vector Multiply-Sum Mixed Byte Modulo VA 4 40 230 V vmsumshm Vector Multiply-Sum Signed Halfword Modulo VA 4 41 231 V vmsumshs Vector Multiply-Sum Signed Halfword Saturate VA 4 36 229 V vmsumubm Vector Multiply-Sum Unsigned Byte Modulo VA 4 38 231 V vmsumuhm Vector Multiply-Sum Unsigned Halfword Modulo VA 4 39 232 V vmsumuhs Vector Multiply-Sum Unsigned Halfword Saturate VX 4 776 226 V vmulesb Vector Multiply Even Signed Byte VX 4 840 226 V vmulesh Vector Multiply Even Signed Halfword VX 4 520 226 V vmuleub Vector Multiply Even Unsigned Byte VX 4 584 226 V vmuleuh Vector Multiply Even Unsigned Halfword VX 4 264 227 V vmulosb Vector Multiply Odd Signed Byte VX 4 328 227 V vmulosh Vector Multiply Odd Signed Halfword VX 4 8 227 V vmuloub Vector Multiply Odd Unsigned Byte VX 4 72 227 V vmulouh Vector Multiply Odd Unsigned Halfword VA 4 47 250 V vnmsubfp Vector Negative Multiply-Subtract Single-Precision VX 4 1284 244 V vnor Vector Logical NOR VX 4 1156 244 V vor Vector Logical OR VA 4 43 217 V vperm Vector Permute VX 4 782 209 V vpkpx Vector Pack Pixel VX 4 398 210 V vpkshss Vector Pack Signed Halfword Signed Saturate VX 4 270 210 V vpkshus Vector Pack Signed Halfword Unsigned Saturate VX 4 462 210 V vpkswss Vector Pack Signed Word Signed Saturate VX 4 334 210 V vpkswus Vector Pack Signed Word Unsigned Saturate VX 4 14 211 V vpkuhum Vector Pack Unsigned Halfword Unsigned Modulo VX 4 142 211 V vpkuhus Vector Pack Unsigned Halfword Unsigned Saturate VX 4 78 211 V vpkuwum Vector Pack Unsigned Word Unsigned Modulo VX 4 206 211 V vpkuwus Vector Pack Unsigned Word Unsigned Saturate VX 4 266 258 V vrefp Vector Reciprocal Estimate Single-Precision 
VX 4 714 254 V vrfim Vector Round to Single-Precision Integer toward -Infin- ity VX 4 522 254 V vrfin Vector Round to Single-Precision Integer Nearest VX 4 650 254 V vrfip Vector Round to Single-Precision Integer toward +Infin- ity VX 4 586 254 V vrfiz Vector Round to Single-Precision Integer toward Zero VX 4 4 245 V vrlb Vector Rotate Left Byte VX 4 68 245 V vrlh Vector Rotate Left Halfword VX 4 132 245 V vrlw Vector Rotate Left Word VX 4 330 258 V vrsqrtefp Vector Reciprocal Square Root Estimate Single-Preci- sion VA 4 42 217 V vsel Vector Select VX 4 452 218 V vsl Vector Shift Left VX 4 260 246 V vslb Vector Shift Left Byte VA 4 44 218 V vsldoi Vector Shift Left Double by Octet Immediate VX 4 324 246 V vslh Vector Shift Left Halfword VX 4 1036 218 V vslo Vector Shift Left by Octet VX 4 388 246 V vslw Vector Shift Left Word VX 4 524 216 V vspltb Vector Splat Byte VX 4 588 216 V vsplth Vector Splat Halfword VX 4 780 216 V vspltisb Vector Splat Immediate Signed Byte VX 4 844 216 V vspltish Vector Splat Immediate Signed Halfword VX 4 908 216 V vspltisw Vector Splat Immediate Signed Word VX 4 652 216 V vspltw Vector Splat Word VX 4 708 219 V vsr Vector Shift Right VX 4 772 248 V vsrab Vector Shift Right Algebraic Byte VX 4 836 248 V vsrah Vector Shift Right Algebraic Halfword VX 4 900 248 V vsraw Vector Shift Right Algebraic Word VX 4 516 247 V vsrb Vector Shift Right Byte 904 Power ISATM Appendices Version 2.05 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext VX 4 580 247 V vsrh Vector Shift Right Halfword VX 4 1100 219 V vsro Vector Shift Right by Octet VX 4 644 247 V vsrw Vector Shift Right Word VX 4 1408 223 V vsubcuw Vector Subtract and Write Carry-Out Unsigned Word VX 4 74 249 V vsubfp Vector Subtract Single-Precision VX 4 1792 223 V vsubsbs Vector Subtract Signed Byte Saturate VX 4 1856 223 V vsubshs Vector Subtract Signed Halfword Saturate VX 4 1920 223 V vsubsws Vector Subtract Signed Word Saturate VX 4 1024 224 V vsububm Vector Subtract 
Unsigned Byte Modulo
VX 4 1536 225 V vsububs Vector Subtract Unsigned Byte Saturate
VX 4 1088 224 V vsubuhm Vector Subtract Unsigned Halfword Modulo
VX 4 1600 224 V vsubuhs Vector Subtract Unsigned Halfword Saturate
VX 4 1152 224 V vsubuwm Vector Subtract Unsigned Word Modulo
VX 4 1664 225 V vsubuws Vector Subtract Unsigned Word Saturate
VX 4 1672 233 V vsum2sws Vector Sum across Half Signed Word Saturate
VX 4 1800 234 V vsum4sbs Vector Sum across Quarter Signed Byte Saturate
VX 4 1608 234 V vsum4shs Vector Sum across Quarter Signed Halfword Saturate
VX 4 1544 234 V vsum4ubs Vector Sum across Quarter Unsigned Byte Saturate
VX 4 1928 233 V vsumsws Vector Sum across Signed Word Saturate
VX 4 846 212 V vupkhpx Vector Unpack High Pixel
VX 4 526 212 V vupkhsb Vector Unpack High Signed Byte
VX 4 590 212 V vupkhsh Vector Unpack High Signed Halfword
VX 4 974 213 V vupklpx Vector Unpack Low Pixel
VX 4 654 213 V vupklsb Vector Unpack Low Signed Byte
VX 4 718 213 V vupklsh Vector Unpack Low Signed Halfword
VX 4 1220 244 V vxor Vector Logical XOR
X 31 62 449 WT wait Wait
X 31 131 P 626 E wrtee Write MSR External Enable
X 31 163 P 626 E wrteei Write MSR External Enable Immediate
X 31 316 SR 77 B xor[.] XOR
D 26 76 B xori XOR Immediate
D 27 76 B xoris XOR Immediate Shifted

1 See the key to the mode dependency and privilege columns on page 905 and the key to the category column in Section 1.3.5 of Book I.

Mode Dependency and Privilege Abbreviations

Except as described below and in Section 1.10.3, "Effective Address Calculation", in Book I, all instructions are independent of whether the processor is in 32-bit or 64-bit mode.

Key to Mode Dependency Column

Mode Dep.  Description
CT  If the instruction tests the Count Register, it tests the low-order 32 bits in 32-bit mode and all 64 bits in 64-bit mode.
SR  The setting of status registers (such as XER and CR0) is mode-dependent.
32  The instruction can be executed only in 32-bit mode.
64  The instruction can be executed only in 64-bit mode.

Key to Privilege Column

Priv.  Description
P  Denotes a privileged instruction.
O  Denotes an instruction that is treated as privileged or nonprivileged (or hypervisor, for mtspr), depending on the SPR or PMR number.
H  Denotes an instruction that can be executed only in hypervisor state.
M  Denotes an instruction that is treated as privileged or nonprivileged, depending on the value of the UCLE bit in the MSR.

Appendix H. Power ISA Instruction Set Sorted by Category

This appendix lists all the instructions in the Power ISA, grouped by category, and in order by mnemonic within category.

Form  Opcode (Pri Ext)  Mode Dep.1  Priv1  Page  Cat1  Mnemonic  Instruction
X 31 58 SR 81 64 cntlzd[.] Count Leading Zeros Doubleword
XO 31 489 SR 70 64 divd[o][.] Divide Doubleword
XO 31 457 SR 70 64 divdu[o][.] Divide Doubleword Unsigned
X 31 986 SR 81 64 extsw[.] Extend Sign Word
DS 58 0 50 64 ld Load Doubleword
X 31 84 444 64 ldarx Load Doubleword And Reserve Indexed
DS 58 1 50 64 ldu Load Doubleword with Update
X 31 53 50 64 ldux Load Doubleword with Update Indexed
X 31 21 50 64 ldx Load Doubleword Indexed
DS 58 2 49 64 lwa Load Word Algebraic
X 31 373 49 64 lwaux Load Word Algebraic with Update Indexed
X 31 341 49 64 lwax Load Word Algebraic Indexed
XO 31 73 SR 69 64 mulhd[.] Multiply High Doubleword
XO 31 9 SR 69 64 mulhdu[.] Multiply High Doubleword Unsigned
XO 31 233 SR 69 64 mulld[o][.] Multiply Low Doubleword
X 31 186 80 64 prtyd Parity Doubleword
MDS 30 8 SR 86 64 rldcl[.] Rotate Left Doubleword then Clear Left
MDS 30 9 SR 87 64 rldcr[.] Rotate Left Doubleword then Clear Right
MD 30 2 SR 86 64 rldic[.] Rotate Left Doubleword Immediate then Clear
MD 30 0 SR 85 64 rldicl[.] Rotate Left Doubleword Immediate then Clear Left
MD 30 1 SR 85 64 rldicr[.]
Rotate Left Doubleword Immediate then Clear Right MD 30 3 SR 87 64 rldimi[.] Rotate Left Doubleword Immediate then Mask Insert X 31 27 SR 90 64 sld[.] Shift Left Doubleword X 31 794 SR 91 64 srad[.] Shift Right Algebraic Doubleword XS 31 413 SR 91 64 sradi[.] Shift Right Algebraic Doubleword Immediate X 31 539 SR 90 64 srd[.] Shift Right Doubleword DS 62 0 54 64 std Store Doubleword X 31 214 444 64 stdcx. Store Doubleword Conditional Indexed DS 62 1 54 64 stdu Store Doubleword with Update X 31 181 54 64 stdux Store Doubleword with Update Indexed X 31 149 54 64 stdx Store Doubleword Indexed X 31 68 74 64 td Trap Doubleword D 2 74 64 tdi Trap Doubleword Immediate XO 31 266 SR 63 B add[o][.] Add XO 31 10 SR 64 B addc[o][.] Add Carrying XO 31 138 SR 65 B adde[o][.] Add Extended D 14 62 B addi Add Immediate D 12 SR 63 B addic Add Immediate Carrying D 13 SR 63 B addic. Add Immediate Carrying and Record D 15 62 B addis Add Immediate Shifted XO 31 234 SR 65 B addme[o][.] Add to Minus One Extended XO 31 202 SR 66 B addze[o][.] Add to Zero Extended X 31 28 SR 77 B and[.] AND Appendix H. Power ISA Instruction Set Sorted by Category 907 Version 2.05 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext X 31 60 SR 78 B andc[.] AND with Complement D 28 SR 75 B andi. AND Immediate D 29 SR 75 B andis. AND Immediate Shifted I 18 35 B b[l][a] Branch B 16 CT 35 B bc[l][a] Branch Conditional XL 19 528 CT 36 B bcctr[l] Branch Conditional to Count Register XL 19 16 CT 36 B bclr[l] Branch Conditional to Link Register X 31 0 71 B cmp Compare X 31 508 79 B cmpb Compare Bytes D 11 71 B cmpi Compare Immediate X 31 32 72 B cmpl Compare Logical D 10 72 B cmpli Compare Logical Immediate X 31 26 SR 79 B cntlzw[.] 
Count Leading Zeros Word XL 19 257 37 B crand Condition Register AND XL 19 129 38 B crandc Condition Register AND with Complement XL 19 289 38 B creqv Condition Register Equivalent XL 19 225 37 B crnand Condition Register NAND XL 19 33 38 B crnor Condition Register NOR XL 19 449 37 B cror Condition Register OR XL 19 417 38 B crorc Condition Register OR with Complement XL 19 193 37 B crxor Condition Register XOR X 31 86 437 B dcbf Data Cache Block Flush X 31 54 436 B dcbst Data Cache Block Store X 31 278 434 B dcbt Data Cache Block Touch X 31 246 435 B dcbtst Data Cache Block Touch for Store X 31 1014 436 B dcbz Data Cache Block set to Zero XO 31 491 SR 68 B divw[o][.] Divide Word XO 31 459 SR 68 B divwu[o][.] Divide Word Unsigned X 31 284 SR 78 B eqv[.] Equivalent X 31 954 SR 79 B extsb[.] Extend Sign Byte X 31 922 SR 79 B extsh[.] Extend Sign Halfword X 31 982 428 B icbi Instruction Cache Block Invalidate XL 19 150 440 B isync Instruction Synchronize D 34 45 B lbz Load Byte and Zero D 35 45 B lbzu Load Byte and Zero with Update X 31 119 45 B lbzux Load Byte and Zero with Update Indexed X 31 87 46 B lbzx Load Byte and Zero Indexed D 42 47 B lha Load Halfword Algebraic D 43 47 B lhau Load Halfword Algebraic with Update X 31 375 47 B lhaux Load Halfword Algebraic with Update Indexed X 31 343 47 B lhax Load Halfword Algebraic Indexed X 31 790 55 B lhbrx Load Halfword Byte-Reverse Indexed D 40 46 B lhz Load Halfword and Zero D 41 46 B lhzu Load Halfword and Zero with Update X 31 311 46 B lhzux Load Halfword and Zero with Update Indexed X 31 279 46 B lhzx Load Halfword and Zero Indexed D 46 56 B lmw Load Multiple Word X 31 20 442 B lwarx Load Word And Reserve Indexed X 31 534 55 B lwbrx Load Word Byte-Reverse Indexed D 32 48 B lwz Load Word and Zero D 33 48 B lwzu Load Word and Zero with Update X 31 55 48 B lwzux Load Word and Zero with Update Indexed X 31 23 48 B lwzx Load Word and Zero Indexed XL 19 0 38 B mcrf Move Condition Register Field XFX 31 19 95 B mfcr Move 
From Condition Register
X 31 83 P 503, 625 B mfmsr Move From Machine State Register
XFX 31 339 O 94, 451 B mfspr Move From Special Purpose Register
XFX 31 144 95 B mtcrf Move To Condition Register Fields
XFX 31 467 O 93 B mtspr Move To Special Purpose Register
XO 31 75 SR 67 B mulhw[.] Multiply High Word
XO 31 11 SR 67 B mulhwu[.] Multiply High Word Unsigned
D 7 67 B mulli Multiply Low Immediate
XO 31 235 SR 67 B mullw[o][.] Multiply Low Word
X 31 476 SR 77 B nand[.] NAND
XO 31 104 SR 66 B neg[o][.] Negate
X 31 124 SR 78 B nor[.] NOR
X 31 444 SR 77 B or[.] OR
X 31 412 SR 78 B orc[.] OR with Complement
D 24 75 B ori OR Immediate
D 25 76 B oris OR Immediate Shifted
X 31 154 80 B prtyw Parity Word
M 20 SR 84 B rlwimi[.] Rotate Left Word Immediate then Mask Insert
M 21 SR 82 B rlwinm[.] Rotate Left Word Immediate then AND with Mask
M 23 SR 83 B rlwnm[.] Rotate Left Word then AND with Mask
SC 17 39, 479, 613 B sc System Call
X 31 24 SR 88 B slw[.] Shift Left Word
X 31 792 SR 89 B sraw[.] Shift Right Algebraic Word
X 31 824 SR 89 B srawi[.] Shift Right Algebraic Word Immediate
X 31 536 SR 88 B srw[.] Shift Right Word
D 38 51 B stb Store Byte
D 39 51 B stbu Store Byte with Update
X 31 247 51 B stbux Store Byte with Update Indexed
X 31 215 51 B stbx Store Byte Indexed
D 44 52 B sth Store Halfword
X 31 918 55 B sthbrx Store Halfword Byte-Reverse Indexed
D 45 52 B sthu Store Halfword with Update
X 31 439 52 B sthux Store Halfword with Update Indexed
X 31 407 52 B sthx Store Halfword Indexed
D 47 57 B stmw Store Multiple Word
D 36 53 B stw Store Word
X 31 662 55 B stwbrx Store Word Byte-Reverse Indexed
X 31 150 442 B stwcx. Store Word Conditional Indexed
D 37 53 B stwu Store Word with Update
X 31 183 53 B stwux Store Word with Update Indexed
X 31 151 53 B stwx Store Word Indexed
XO 31 40 SR 63 B subf[o][.] Subtract From
XO 31 8 SR 64 B subfc[o][.]
Subtract From Carrying XO 31 136 SR 65 B subfe[o][.] Subtract From Extended D 8 SR 64 B subfic Subtract From Immediate Carrying XO 31 232 SR 65 B subfme[o][.] Subtract From Minus One Extended XO 31 200 SR 66 B subfze[o][.] Subtract From Zero Extended X 31 598 446 B sync Synchronize X 31 566 H 542, B tlbsync TLB Synchronize 659, 749 X 31 4 73 B tw Trap Word D 3 73 B twi Trap Word Immediate X 31 316 SR 77 B xor[.] XOR D 26 76 B xori XOR Immediate D 27 76 B xoris XOR Immediate Shifted A 31 15 74 B.in isel Integer Select XFX 31 19 96 B.in mfocrf Move From One Condition Register Field XFX 31 144 96 B.in mtocrf Move To One Condition Register Field X 31 122 81 B.in popcntb Population Count Bytes XO 31 74 SR H 495 BCDA addg6s Add and Generate Sixes X 31 314 H 494 BCDA cbcdtd Convert Binary Coded Decimal to Declets X 31 282 H 494 BCDA cdtbcd Convert Declets To Binary Coded Decimal X 59 2 163 DFP dadd DFP Add Appendix H. Power ISA Instruction Set Sorted by Category 909 Version 2.05 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext X 63 2 163 DFP daddq DFP Add Quad X 63 802 185 DFP dcffixq DFP Convert From Fixed Quad X 59 130 169 DFP dcmpo DFP Compare Ordered X 63 130 169 DFP dcmpoq DFP Compare Ordered Quad X 59 642 168 DFP dcmpu DFP Compare Unordered X 63 642 169 DFP dcmpuq DFP Compare Unordered Quad X 59 258 183 DFP dctdp DFP Convert To DFP Long X 59 290 185 DFP dctfix DFP Convert To Fixed X 63 290 185 DFP dctfixq DFP Convert To Fixed Quad X 63 258 183 DFP dctqpq DFP Convert To DFP Extended X 59 322 187 DFP ddedpd DFP Decode DPD To BCD X 63 322 187 DFP ddedpdq DFP Decode DPD To BCD Quad X 59 546 166 DFP ddiv DFP Divide X 63 546 166 DFP ddivq DFP Divide Quad X 59 834 187 DFP denbcd DFP Encode BCD To DPD X 63 834 187 DFP denbcdq DFP Encode BCD To DPD Quad X 59 866 188 DFP diex DFP Insert Biased Exponent X 63 866 188 DFP diexq DFP Insert Biased Exponent Quad X 59 34 165 DFP dmul DFP Multiply X 63 34 165 DFP dmulq DFP Multiply Quad Z 59 3 174 DFP dqua DFP 
Quantize Z23 59 67 173 DFP dquai[.] DFP Quantize Immediate Z23 63 67 173 DFP dquaiq[.] DFP Quantize Immediate Quad Z23 63 3 174 DFP dquaq[.] DFP Quantize Quad X 63 770 184 DFP drdpq DFP Round To DFP Long Z23 59 227 181 DFP drintn[.] DFP Round To FP Integer Without Inexact Z23 63 227 181 DFP drintnq[.] DFP Round To FP Integer Without Inexact Quad Z23 59 99 179 DFP drintx[.] DFP Round To FP Integer With Inexact Z23 63 99 179 DFP drintxq[.] DFP Round To FP Integer With Inexact Quad Z 59 35 176 DFP drrnd DFP Reround Z23 63 35 176 DFP drrndq[.] DFP Reround Quad X 59 770 184 DFP drsp DFP Round To DFP Short Z23 59 66 190 DFP dscli[.] DFP Shift Significand Left Immediate Z23 63 66 190 DFP dscliq[.] DFP Shift Significand Left Immediate Quad Z 59 98 190 DFP dscri DFP Shift Significand Right Immediate Z 63 98 190 DFP dscriq DFP Shift Significand Right Immediate Quad X 59 514 163 DFP dsub DFP Subtract X 63 514 163 DFP dsubq DFP Subtract Quad Z23 59 194 170 DFP dtstdc DFP Test Data Class Z23 63 194 170 DFP dtstdcq DFP Test Data Class Quad Z23 59 226 170 DFP dtstdg DFP Test Data Group Z23 63 226 170 DFP dtstdgq DFP Test Data Group Quad X 59 162 171 DFP dtstex DFP Test Exponent X 63 162 171 DFP dtstexq DFP Test Exponent Quad X 59 674 172 DFP dtstsf DFP Test Significance X 63 674 172 DFP dtstsfq DFP Test Significance Quad X 59 354 188 DFP dxex DFP Extract Biased Exponent X 63 354 188 DFP dxexq DFP Extract Biased Exponent Quad X 31 758 433 E dcba Data Cache Block Allocate X 31 470 P 652 E dcbi Data Cache Block Invalidate X 31 22 428 E icbt Instruction Cache Block Touch X 31 854 448 E mbar Memory Barrier X 31 512 97 E mcrxr Move to Condition Register from XER X 31 275 97 E mfapidi Move From APID Indirect XFX 31 323 P 625 E mfdcr Move From Device Control Register X 31 291 P 97 E mfdcrux Move From Device Control Register User-mode Indexed X 31 259 P 625 E mfdcrx Move From Device Control Register Indexed XFX 31 451 P 624 E mtdcr Move To Device Control Register X 31 419 97 E mtdcrux 
Move To Device Control Register User-mode Indexed X 31 387 P 624 E mtdcrx Move To Device Control Register Indexed X 31 146 P 625 E mtmsr Move To Machine State Register 910 Power ISATM Appendices Version 2.05 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext XL 19 51 P 614 E rfci Return From Critical Interrupt XL 19 50 P 613 E rfi Return From Interrupt XL 19 38 P 614 E rfmci Return From Machine Check Interrupt X 31 786 P 658, E tlbivax TLB Invalidate Virtual Address Indexed 747 X 31 946 P 658, E tlbre TLB Read Entry 748 X 31 914 P 659, E tlbsx TLB Search Indexed 748 X 31 978 P 660, E tlbwe TLB Write Entry 749 X 31 131 P 626 E wrtee Write MSR External Enable X 31 163 P 626 E wrteei Write MSR External Enable Immediate X 31 326 P 730 E.CD dcread Data Cache Read [Alternative Encoding] X 31 486 P 730 E.CD dcread Data Cache Read X 31 998 P 731 E.CD icread Instruction Cache Read X 31 454 P 727 E.CI dci Data Cache Invalidate X 31 966 P 727 E.CI ici Instruction Cache Invalidate XFX 19 198 718 E.ED dnh Debugger Notify Halt X 19 39 P 614 E.ED rfdi Return From Debug Interrupt X 31 238 P 721 E.PC msgclr Message Clear X 31 206 P 721 E.PC msgsnd Message Send X 31 127 P 632 E.PD dcbfep Data Cache Block Flush by External PID X 31 63 P 631 E.PD dcbstep Data Cache Block Store by External PID X 31 319 P 631 E.PD dcbtep Data Cache Block Touch by External PID X 31 255 P 633 E.PD dcbtstep Data Cache Block Touch for Store by External PID X 31 1023 P 634 E.PD dcbzep Data Cache Block set to Zero by External PID EVX 31 285 P 636 E.PD evlddepx Vector Load Doubleword into Doubleword by External Process ID Indexed EVX 31 413 P 636 E.PD evstddepx Vector Store Doubleword into Doubleword by External Process ID Indexed X 31 991 P 634 E.PD icbiep Instruction Cache Block Invalidate by External PID X 31 95 P 627 E.PD lbepx Load Byte by External Process ID Indexed X 31 29 P 628 E.PD ldepx Load Doubleword by External Process ID Indexed X 31 607 P 635 E.PD lfdepx Load Floating-Point 
Double by External Process ID Indexed X 31 287 P 627 E.PD lhepx Load Halfword by External Process ID Indexed X 31 295 P 637 E.PD lvepx Load Vector by External Process ID Indexed X 31 263 P 637 E.PD lvepxl Load Vector by External Process ID Indexed LRU X 31 31 P 628 E.PD lwepx Load Word by External Process ID Indexed X 31 223 P 629 E.PD stbepx Store Byte by External Process ID Indexed X 31 157 P 630 E.PD stdepx Store Doubleword by External Process ID Indexed X 31 735 P 635 E.PD stfdepx Store Floating-Point Double by External Process ID Indexed X 31 415 P 629 E.PD sthepx Store Halfword by External Process ID Indexed X 31 807 P 638 E.PD stvepx Store Vector by External Process ID Indexed X 31 775 P 638 E.PD stvepxl Store Vector by External Process ID Indexed LRU X 31 159 P 630 E.PD stwepx Store Word by External Process ID Indexed XFX 31 334 O 756 E.PM mfpmr Move From Performance Monitor Register XFX 31 462 O 756 E.PM mtpmr Move To Performance Monitor Register X 31 310 456 EC eciwx External Control In Word Indexed X 31 438 456 EC ecowx External Control Out Word Indexed X 31 390 M 656 ECL dcblc Data Cache Block Lock Clear X 31 166 M 655 ECL dcbtls Data Cache Block Touch and Lock Set X 31 134 M 655 ECL dcbtstls Data Cache Block Touch for Store and Lock Set X 31 230 M 657 ECL icblc Instruction Cache Block Lock Clear X 31 486 M 656 ECL icbtls Instruction Cache Block Touch and Lock Set X 63 32 138 FP fcmpo Floating Compare Ordered X 63 0 138 FP fcmpu Floating Compare Unordered D 50 119 FP lfd Load Floating-Point Double D 51 119 FP lfdu Load Floating-Point Double with Update Appendix H. 
Power ISA Instruction Set Sorted by Category 911 Version 2.05 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext X 31 631 119 FP lfdux Load Floating-Point Double with Update Indexed X 31 599 119 FP lfdx Load Floating-Point Double Indexed X 31 855 120 FP lfiwax Load Floating-Point as Integer Word Algebraic Indexed D 48 122 FP lfs Load Floating-Point Single D 49 122 FP lfsu Load Floating-Point Single with Update X 31 567 122 FP lfsux Load Floating-Point Single with Update Indexed X 31 535 122 FP lfsx Load Floating-Point Single Indexed X 63 64 140 FP mcrfs Move to Condition Register from FPSCR D 54 123 FP stfd Store Floating-Point Double D 55 123 FP stfdu Store Floating-Point Double with Update X 31 759 123 FP stfdux Store Floating-Point Double with Update Indexed X 31 727 123 FP stfdx Store Floating-Point Double Indexed X 31 983 124 FP stfiwx Store Floating-Point as Integer Word Indexed D 52 122 FP stfs Store Floating-Point Single D 53 122 FP stfsu Store Floating-Point Single with Update X 31 695 122 FP stfsux Store Floating-Point Single with Update Indexed X 31 663 122 FP stfsx Store Floating-Point Single Indexed DS 57 0 125 FP.out lfdp Load Floating-Point Double Pair X 31 791 125 FP.out lfdpx Load Floating-Point Double Pair Indexed DS 61 - 125 FP.out stfdp Store Floating-Point Double Pair X 31 919 125 FP.out stfdpx Store Floating-Point Double Pair Indexed X 63 264 126 FP[R] fabs[.] Floating Absolute Value A 63 21 127 FP[R] fadd[.] Floating Add A 59 21 127 FP[R] fadds[.] Floating Add Single X 63 846 136 FP[R] fcfid[.] Floating Convert From Integer Doubleword X 63 8 126 FP[R] fcpsgn[.] Floating Copy Sign X 63 814 134 FP[R] fctid[.] Floating Convert To Integer Doubleword X 63 815 135 FP[R] fctidz[.] Floating Convert To Integer Doubleword with round toward Zero X 63 14 135 FP[R] fctiw[.] Floating Convert To Integer Word X 63 15 136 FP[R] fctiwz[.] Floating Convert To Integer Word with round toward Zero A 63 18 128 FP[R] fdiv[.] 
Floating Divide A 59 18 128 FP[R] fdivs[.] Floating Divide Single A 63 29 132 FP[R] fmadd[.] Floating Multiply-Add A 59 29 132 FP[R] fmadds[.] Floating Multiply-Add Single X 63 72 126 FP[R] fmr[.] Floating Move Register A 63 28 132 FP[R] fmsub[.] Floating Multiply-Subtract A 59 28 132 FP[R] fmsubs[.] Floating Multiply-Subtract Single A 63 25 128 FP[R] fmul[.] Floating Multiply A 59 25 128 FP[R] fmuls[.] Floating Multiply Single X 63 136 126 FP[R] fnabs[.] Floating Negative Absolute Value X 63 40 126 FP[R] fneg[.] Floating Negate A 63 31 133 FP[R] fnmadd[.] Floating Negative Multiply-Add A 59 31 133 FP[R] fnmadds[.] Floating Negative Multiply-Add Single A 63 30 133 FP[R] fnmsub[.] Floating Negative Multiply-Subtract A 59 30 133 FP[R] fnmsubs[.] Floating Negative Multiply-Subtract Single A 63 24 129 FP[R] fre[.] Floating Reciprocal Estimate A 59 24 129 FP[R] fres[.] Floating Reciprocal Estimate Single X 63 12 134 FP[R] frsp[.] Floating Round to Single-Precision A 63 23 139 FP[R] fsel[.] Floating Select A 63 22 129 FP[R] fsqrt[.] Floating Square Root A 59 22 129 FP[R] fsqrts[.] Floating Square Root Single A 63 20 127 FP[R] fsub[.] Floating Subtract A 59 20 127 FP[R] fsubs[.] Floating Subtract Single X 63 583 140 FP[R] mffs[.] Move From FPSCR X 63 70 142 FP[R] mtfsb0[.] Move To FPSCR Bit 0 X 63 38 142 FP[R] mtfsb1[.] Move To FPSCR Bit 1 XFL 63 711 141 FP[R] mtfsf[.] Move To FPSCR Fields X 63 134 141 FP[R] mtfsfi[.] Move To FPSCR Field Immediate X 63 488 137 FP[R].in frim[.] Floating Round to Integer Minus 912 Power ISATM Appendices Version 2.05 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext X 63 392 137 FP[R].in frin[.] Floating Round to Integer Nearest X 63 456 137 FP[R].in frip[.] Floating Round to Integer Plus X 63 424 137 FP[R].in friz[.] Floating Round to Integer Toward Zero A 63 26 130 FP[R].in frsqrte[.] Floating Reciprocal Square Root Estimate A 59 26 130 FP[R].in frsqrtes[.] 
Floating Reciprocal Square Root Estimate Single XO 4 172 351 LMA macchw[o][.] Multiply Accumulate Cross Halfword to Word Modulo Signed XO 4 236 351 LMA macchws[o][.] Multiply Accumulate Cross Halfword to Word Saturate Signed XO 4 204 352 LMA macchwsu[o][.] Multiply Accumulate Cross Halfword to Word Saturate Unsigned XO 4 140 352 LMA macchwu[o][.] Multiply Accumulate Cross Halfword to Word Modulo Unsigned XO 4 44 353 LMA machhw[o][.] Multiply Accumulate High Halfword to Word Modulo Signed XO 4 108 353 LMA machhws[o][.] Multiply Accumulate High Halfword to Word Saturate Signed XO 4 76 354 LMA machhwsu[o][.] Multiply Accumulate High Halfword to Word Saturate Unsigned XO 4 12 354 LMA machhwu[o][.] Multiply Accumulate High Halfword to Word Modulo Unsigned XO 4 428 355 LMA maclhw[o][.] Multiply Accumulate Low Halfword to Word Modulo Signed XO 4 492 355 LMA maclhws[o][.] Multiply Accumulate Low Halfword to Word Saturate Signed XO 4 460 356 LMA maclhwsu[o][.] Multiply Accumulate Low Halfword to Word Saturate Unsigned XO 4 396 356 LMA maclhwu[o][.] Multiply Accumulate Low Halfword to Word Modulo Unsigned X 4 168 356 LMA mulchw[.] Multiply Cross Halfword to Word Signed X 4 136 356 LMA mulchwu[.] Multiply Cross Halfword to Word Unsigned X 4 40 357 LMA mulhhw[.] Multiply High Halfword to Word Signed X 4 8 357 LMA mulhhwu[.] Multiply High Halfword to Word Unsigned X 4 424 357 LMA mullhw[.] Multiply Low Halfword to Word Signed X 4 392 357 LMA mullhwu[.] Multiply Low Halfword to Word Unsigned XO 4 174 358 LMA nmacchw[o][.] Negative Multiply Accumulate Cross Halfword to Word Modulo Signed XO 4 238 358 LMA nmacchws[o][.] Negative Multiply Accumulate Cross Halfword to Word Saturate Signed XO 4 46 359 LMA nmachhw[o][.] Negative Multiply Accumulate High Halfword to Word Modulo Signed XO 4 110 359 LMA nmachhws[o][.] Negative Multiply Accumulate High Halfword to Word Saturate Signed XO 4 430 360 LMA nmaclhw[o][.] 
Negative Multiply Accumulate Low Halfword to Word Modulo Signed XO 4 494 360 LMA nmaclhws[o][.] Negative Multiply Accumulate Low Halfword to Word Saturate Signed X 31 78 349 LMV dlmzb[.] Determine Leftmost Zero Byte DQ 56 P 493 LSQ lq Load Quadword DS 62 2 P 493 LSQ stq Store Quadword X 31 597 59 MA lswi Load String Word Immediate X 31 533 59 MA lswx Load String Word Indexed X 31 725 60 MA stswi Store String Word Immediate X 31 661 60 MA stswx Store String Word Indexed XL 19 402 H 482 S doze Doze X 31 854 448 S eieio Enforce In-order Execution of I/O XL 19 274 H 480 S hrfid Hypervisor Return From Interrupt Doubleword X 31 853 H 491 S lbzcix Load Byte and Zero Caching Inhibited Indexed X 31 885 H 491 S ldcix Load Doubleword Caching Inhibited Indexed X 31 821 H 491 S lhzcix Load Halfword and Zero Caching Inhibited Indexed Appendix H. Power ISA Instruction Set Sorted by Category 913 Version 2.05 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext X 31 789 H 491 S lwzcix Load Word and Zero Caching Inhibited Indexed X 31 595 32 P 538 S mfsr Move From Segment Register X 31 659 32 P 538 S mfsrin Move From Segment Register Indirect X 31 146 P 501 S mtmsr Move To Machine State Register X 31 178 P 502 S mtmsrd Move To Machine State Register Doubleword X 31 210 32 P 537 S mtsr Move To Segment Register X 31 242 32 P 537 S mtsrin Move To Segment Register Indirect XL 19 434 H 482 S nap Nap XL 19 18 P 480 S rfid Return From Interrupt Doubleword XL 19 498 H 483 S rvwinkle Rip Van Winkle X 31 979 SR P 535 S slbfee. 
SLB Find Entry ESID X 31 498 P 532 S slbia SLB Invalidate All X 31 434 P 531 S slbie SLB Invalidate Entry X 31 915 P 534 S slbmfee SLB Move From Entry ESID X 31 851 P 534 S slbmfev SLB Move From Entry VSID X 31 402 P 533 S slbmte SLB Move To Entry XL 19 466 H 483 S sleep Sleep X 31 981 H 492 S stbcix Store Byte Caching Inhibited Indexed X 31 1013 H 492 S stdcix Store Doubleword Caching Inhibited Indexed X 31 949 H 492 S sthcix Store Halfword Caching Inhibited Indexed X 31 917 H 492 S stwcix Store Word Caching Inhibited Indexed X 31 370 H 542 S tlbia TLB Invalidate All X 31 306 64 H 539 S tlbie TLB Invalidate Entry X 31 274 64 H 541 S tlbiel TLB Invalidate Entry Local XFX 31 371 451 S.out mftb Move From Time Base EVX 4 527 268 SP brinc Bit Reversed Increment EVX 4 520 268 SP evabs Vector Absolute Value EVX 4 514 268 SP evaddiw Vector Add Immediate Word EVX 4 1225 268 SP evaddsmiaaw Vector Add Signed, Modulo, Integer to Accumulator Word EVX 4 1217 269 SP evaddssiaaw Vector Add Signed, Saturate, Integer to Accumulator Word EVX 4 1224 269 SP evaddumiaaw Vector Add Unsigned, Modulo, Integer to Accumulator Word EVX 4 1216 269 SP evaddusiaaw Vector Add Unsigned, Saturate, Integer to Accumulator Word EVX 4 512 269 SP evaddw Vector Add Word EVX 4 529 270 SP evand Vector AND EVX 4 530 270 SP evandc Vector AND with Complement EVX 4 564 270 SP evcmpeq Vector Compare Equal EVX 4 561 270 SP evcmpgts Vector Compare Greater Than Signed EVX 4 560 271 SP evcmpgtu Vector Compare Greater Than Unsigned EVX 4 563 271 SP evcmplts Vector Compare Less Than Signed EVX 4 562 271 SP evcmpltu Vector Compare Less Than Unsigned EVX 4 526 272 SP evcntlsw Vector Count Leading Signed Bits Word EVX 4 525 272 SP evcntlzw Vector Count Leading Zeros Word EVX 4 1222 272 SP evdivws Vector Divide Word Signed EVX 4 1223 273 SP evdivwu Vector Divide Word Unsigned EVX 4 537 273 SP eveqv Vector Equivalent EVX 4 522 273 SP evextsb Vector Extend Sign Byte EVX 4 523 273 SP evextsh Vector Extend Sign Halfword EVX 
4 769 274 SP evldd Vector Load Double Word into Double Word EVX 4 768 274 SP evlddx Vector Load Double Word into Double Word Indexed EVX 4 773 274 SP evldh Vector Load Double into Four Halfwords EVX 4 772 274 SP evldhx Vector Load Double into Four Halfwords Indexed EVX 4 771 275 SP evldw Vector Load Double into Two Words EVX 4 770 275 SP evldwx Vector Load Double into Two Words Indexed EVX 4 777 275 SP evlhhesplat Vector Load Halfword into Halfwords Even and Splat EVX 4 776 275 SP evlhhesplatx Vector Load Halfword into Halfwords Even and Splat Indexed 914 Power ISATM Appendices Version 2.05 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext EVX 4 783 276 SP evlhhossplat Vector Load Halfword into Halfword Odd Signed and Splat EVX 4 782 276 SP evlhhossplatx Vector Load Halfword into Halfword Odd Signed and Splat Indexed EVX 4 781 276 SP evlhhousplat Vector Load Halfword into Halfword Odd Unsigned and Splat EVX 4 780 276 SP evlhhousplatx Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed EVX 4 785 277 SP evlwhe Vector Load Word into Two Halfwords Even EVX 4 784 277 SP evlwhex Vector Load Word into Two Halfwords Even Indexed EVX 4 791 277 SP evlwhos Vector Load Word into Two Halfwords Odd Signed (with sign extension) EVX 4 790 277 SP evlwhosx Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension) EVX 4 789 278 SP evlwhou Vector Load Word into Two Halfwords Odd Unsigned (zero-extended) EVX 4 788 278 SP evlwhoux Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended) EVX 4 797 278 SP evlwhsplat Vector Load Word into Two Halfwords and Splat EVX 4 796 278 SP evlwhsplatx Vector Load Word into Two Halfwords and Splat Indexed EVX 4 793 279 SP evlwwsplat Vector Load Word into Word and Splat EVX 4 792 279 SP evlwwsplatx Vector Load Word into Word and Splat Indexed EVX 4 556 279 SP evmergehi Vector Merge High EVX 4 558 280 SP evmergehilo Vector Merge High/Low EVX 4 557 279 SP evmergelo Vector Merge Low EVX 4 
559 280 SP evmergelohi Vector Merge Low/High EVX 4 1323 280 SP evmhegsmfaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate EVX 4 1451 280 SP evmhegsmfan Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative EVX 4 1321 281 SP evmhegsmiaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate EVX 4 1449 281 SP evmhegsmian Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative EVX 4 1320 281 SP evmhegumiaa Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate EVX 4 1448 281 SP evmhegumian Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative EVX 4 1035 282 SP evmhesmf Vector Multiply Halfwords, Even, Signed, Modulo, Frac- tional EVX 4 1067 282 SP evmhesmfa Vector Multiply Halfwords, Even, Signed, Modulo, Frac- tional to Accumulator EVX 4 1291 282 SP evmhesmfaaw Vector Multiply Halfwords, Even, Signed, Modulo, Frac- tional and Accumulate into Words EVX 4 1419 282 SP evmhesmfanw Vector Multiply Halfwords, Even, Signed, Modulo, Frac- tional and Accumulate Negative into Words EVX 4 1033 283 SP evmhesmi Vector Multiply Halfwords, Even, Signed, Modulo, Inte- ger EVX 4 1065 283 SP evmhesmia Vector Multiply Halfwords, Even, Signed, Modulo, Inte- ger to Accumulator EVX 4 1289 283 SP evmhesmiaaw Vector Multiply Halfwords, Even, Signed, Modulo, Inte- ger and Accumulate into Words EVX 4 1417 283 SP evmhesmianw Vector Multiply Halfwords, Even, Signed, Modulo, Inte- ger and Accumulate Negative into Words EVX 4 1027 284 SP evmhessf Vector Multiply Halfwords, Even, Signed, Saturate, Fractional EVX 4 1059 284 SP evmhessfa Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator Appendix H. 
Power ISA Instruction Set Sorted by Category 915 Version 2.05 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext EVX 4 1283 285 SP evmhessfaaw Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate into Words EVX 4 1411 285 SP evmhessfanw Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words EVX 4 1281 286 SP evmhessiaaw Vector Multiply Halfwords, Even, Signed, Saturate, Inte- ger and Accumulate into Words EVX 4 1409 286 SP evmhessianw Vector Multiply Halfwords, Even, Signed, Saturate, Inte- ger and Accumulate Negative into Words EVX 4 1032 287 SP evmheumi Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer EVX 4 1064 287 SP evmheumia Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator EVX 4 1288 287 SP evmheumiaaw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words EVX 4 1416 287 SP evmheumianw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words EVX 4 1280 288 SP evmheusiaaw Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate into Words EVX 4 1408 288 SP evmheusianw Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words EVX 4 1327 289 SP evmhogsmfaa Vector Multiply Halfwords, Odd, Guarded, Signed, Mod- ulo, Fractional and Accumulate EVX 4 1455 289 SP evmhogsmfan Vector Multiply Halfwords, Odd, Guarded, Signed, Mod- ulo, Fractional and Accumulate Negative EVX 4 1325 289 SP evmhogsmiaa Vector Multiply Halfwords, Odd, Guarded, Signed, Mod- ulo, Integer and Accumulate EVX 4 1453 289 SP evmhogsmian Vector Multiply Halfwords, Odd, Guarded, Signed, Mod- ulo, Integer and Accumulate Negative EVX 4 1324 290 SP evmhogumiaa Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate EVX 4 1452 290 SP evmhogumian Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative EVX 4 
1039 290 SP evmhosmf Vector Multiply Halfwords, Odd, Signed, Modulo, Frac- tional EVX 4 1071 290 SP evmhosmfa Vector Multiply Halfwords, Odd, Signed, Modulo, Frac- tional to Accumulator EVX 4 1295 291 SP evmhosmfaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Frac- tional and Accumulate into Words EVX 4 1423 291 SP evmhosmfanw Vector Multiply Halfwords, Odd, Signed, Modulo, Frac- tional and Accumulate Negative into Words EVX 4 1037 291 SP evmhosmi Vector Multiply Halfwords, Odd, Signed, Modulo, Inte- ger EVX 4 1069 291 SP evmhosmia Vector Multiply Halfwords, Odd, Signed, Modulo, Inte- ger to Accumulator EVX 4 1293 292 SP evmhosmiaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Inte- ger and Accumulate into Words EVX 4 1421 291 SP evmhosmianw Vector Multiply Halfwords, Odd, Signed, Modulo, Inte- ger and Accumulate Negative into Words EVX 4 1031 293 SP evmhossf Vector Multiply Halfwords, Odd, Signed, Saturate, Frac- tional EVX 4 1063 293 SP evmhossfa Vector Multiply Halfwords, Odd, Signed, Saturate, Frac- tional to Accumulator EVX 4 1287 294 SP evmhossfaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Frac- tional and Accumulate into Words EVX 4 1415 294 SP evmhossfanw Vector Multiply Halfwords, Odd, Signed, Saturate, Frac- tional and Accumulate Negative into Words EVX 4 1285 295 SP evmhossiaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Inte- ger and Accumulate into Words 916 Power ISATM Appendices Version 2.05 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext EVX 4 1413 295 SP evmhossianw Vector Multiply Halfwords, Odd, Signed, Saturate, Inte- ger and Accumulate Negative into Words EVX 4 1036 295 SP evmhoumi Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer EVX 4 1068 295 SP evmhoumia Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator EVX 4 1292 296 SP evmhoumiaaw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words EVX 4 1420 292 SP evmhoumianw Vector Multiply 
Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words EVX 4 1284 296 SP evmhousiaaw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words EVX 4 1412 296 SP evmhousianw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words EVX 4 1220 297 SP evmra Initialize Accumulator EVX 4 1103 297 SP evmwhsmf Vector Multiply Word High Signed, Modulo, Fractional EVX 4 1135 297 SP evmwhsmfa Vector Multiply Word High Signed, Modulo, Fractional to Accumulator EVX 4 1101 297 SP evmwhsmi Vector Multiply Word High Signed, Modulo, Integer EVX 4 1133 297 SP evmwhsmia Vector Multiply Word High Signed, Modulo, Integer to Accumulator EVX 4 1095 298 SP evmwhssf Vector Multiply Word High Signed, Saturate, Fractional EVX 4 1127 298 SP evmwhssfa Vector Multiply Word High Signed, Saturate, Fractional to Accumulator EVX 4 1100 298 SP evmwhumi Vector Multiply Word High Unsigned, Modulo, Integer EVX 4 1132 298 SP evmwhumia Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator EVX 4 1353 299 SP evmwlsmiaaw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words EVX 4 1481 299 SP evmwlsmianw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words EVX 4 1345 299 SP evmwlssiaaw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words EVX 4 1473 299 SP evmwlssianw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words EVX 4 1096 300 SP evmwlumi Vector Multiply Word Low Unsigned, Modulo, Integer EVX 4 1128 300 SP evmwlumia Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator EVX 4 1352 300 SP evmwlumiaaw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words EVX 4 1480 300 SP evmwlumianw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words EVX 4 1344 301 SP evmwlusiaaw Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate into 
Words EVX 4 1472 301 SP evmwlusianw Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words EVX 4 1115 301 SP evmwsmf Vector Multiply Word Signed, Modulo, Fractional EVX 4 1147 301 SP evmwsmfa Vector Multiply Word Signed, Modulo, Fractional to Accumulator EVX 4 1371 302 SP evmwsmfaa Vector Multiply Word Signed, Modulo, Fractional and Accumulate EVX 4 1499 302 SP evmwsmfan Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative EVX 4 1113 302 SP evmwsmi Vector Multiply Word Signed, Modulo, Integer EVX 4 1145 302 SP evmwsmia Vector Multiply Word Signed, Modulo, Integer to Accu- mulator EVX 4 1369 302 SP evmwsmiaa Vector Multiply Word Signed, Modulo, Integer and Accumulate Appendix H. Power ISA Instruction Set Sorted by Category 917 Version 2.05 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext EVX 4 1497 302 SP evmwsmian Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative EVX 4 1107 303 SP evmwssf Vector Multiply Word Signed, Saturate, Fractional EVX 4 1139 303 SP evmwssfa Vector Multiply Word Signed, Saturate, Fractional to Accumulator EVX 4 1363 303 SP evmwssfaa Vector Multiply Word Signed, Saturate, Fractional and Accumulate EVX 4 1491 304 SP evmwssfan Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative EVX 4 1112 304 SP evmwumi Vector Multiply Word Unsigned, Modulo, Integer EVX 4 1144 304 SP evmwumia Vector Multiply Word Unsigned, Modulo, Integer to Accumulator EVX 4 1368 305 SP evmwumiaa Vector Multiply Word Unsigned, Modulo, Integer and Accumulate EVX 4 1496 305 SP evmwumian Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative EVX 4 542 305 SP evnand Vector NAND EVX 4 521 305 SP evneg Vector Negate EVX 4 536 305 SP evnor Vector NOR EVX 4 535 306 SP evor Vector OR EVX 4 539 306 SP evorc Vector OR with Complement EVX 4 552 306 SP evrlw Vector Rotate Left Word EVX 4 554 307 SP evrlwi Vector Rotate Left Word Immediate EVX 4 524 307 SP evrndw 
Vector Round Word EVS 4 79 307 SP evsel Vector Select EVX 4 548 308 SP evslw Vector Shift Left Word EVX 4 550 308 SP evslwi Vector Shift Left Word Immediate EVX 4 555 308 SP evsplatfi Vector Splat Fractional Immediate EVX 4 553 308 SP evsplati Vector Splat Immediate EVX 4 547 308 SP evsrwis Vector Shift Right Word Immediate Signed EVX 4 546 308 SP evsrwiu Vector Shift Right Word Immediate Unsigned EVX 4 545 309 SP evsrws Vector Shift Right Word Signed EVX 4 544 309 SP evsrwu Vector Shift Right Word Unsigned EVX 4 801 309 SP evstdd Vector Store Double of Double EVX 4 800 309 SP evstddx Vector Store Double of Double Indexed EVX 4 805 310 SP evstdh Vector Store Double of Four Halfwords EVX 4 804 310 SP evstdhx Vector Store Double of Four Halfwords Indexed EVX 4 803 310 SP evstdw Vector Store Double of Two Words EVX 4 802 310 SP evstdwx Vector Store Double of Two Words Indexed EVX 4 817 311 SP evstwhe Vector Store Word of Two Halfwords from Even EVX 4 816 311 SP evstwhex Vector Store Word of Two Halfwords from Even Indexed EVX 4 821 311 SP evstwho Vector Store Word of Two Halfwords from Odd EVX 4 820 311 SP evstwhox Vector Store Word of Two Halfwords from Odd Indexed EVX 4 825 311 SP evstwwe Vector Store Word of Word from Even EVX 4 824 311 SP evstwwex Vector Store Word of Word from Even Indexed EVX 4 829 312 SP evstwwo Vector Store Word of Word from Odd EVX 4 828 312 SP evstwwox Vector Store Word of Word from Odd Indexed EVX 4 1227 312 SP evsubfsmiaaw Vector Subtract Signed, Modulo, Integer to Accumulator Word EVX 4 1219 312 SP evsubfssiaaw Vector Subtract Signed, Saturate, Integer to Accumula- tor Word EVX 4 1226 313 SP evsubfumiaaw Vector Subtract Unsigned, Modulo, Integer to Accumu- lator Word EVX 4 1218 313 SP evsubfusiaaw Vector Subtract Unsigned, Saturate, Integer to Accumu- lator Word EVX 4 516 313 SP evsubfw Vector Subtract from Word EVX 4 518 313 SP evsubifw Vector Subtract Immediate from Word EVX 4 534 313 SP evxor Vector XOR EVX 4 740 335 SP.FD efdabs 
Floating-Point Double-Precision Absolute Value
EVX 4 736 336 SP.FD efdadd Floating-Point Double-Precision Add
EVX 4 751 342 SP.FD efdcfs Floating-Point Double-Precision Convert from Single-Precision
EVX 4 755 340 SP.FD efdcfsf Convert Floating-Point Double-Precision from Signed Fraction
EVX 4 753 339 SP.FD efdcfsi Convert Floating-Point Double-Precision from Signed Integer
EVX 4 739 340 SP.FD efdcfsid Convert Floating-Point Double-Precision from Signed Integer Doubleword
EVX 4 754 340 SP.FD efdcfuf Convert Floating-Point Double-Precision from Unsigned Fraction
EVX 4 752 339 SP.FD efdcfui Convert Floating-Point Double-Precision from Unsigned Integer
EVX 4 738 340 SP.FD efdcfuid Convert Floating-Point Double-Precision from Unsigned Integer Doubleword
EVX 4 750 337 SP.FD efdcmpeq Floating-Point Double-Precision Compare Equal
EVX 4 748 337 SP.FD efdcmpgt Floating-Point Double-Precision Compare Greater Than
EVX 4 749 337 SP.FD efdcmplt Floating-Point Double-Precision Compare Less Than
EVX 4 759 342 SP.FD efdctsf Convert Floating-Point Double-Precision to Signed Fraction
EVX 4 757 340 SP.FD efdctsi Convert Floating-Point Double-Precision to Signed Integer
EVX 4 747 341 SP.FD efdctsidz Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round toward Zero
EVX 4 762 342 SP.FD efdctsiz Convert Floating-Point Double-Precision to Signed Integer with Round toward Zero
EVX 4 758 342 SP.FD efdctuf Convert Floating-Point Double-Precision to Unsigned Fraction
EVX 4 756 340 SP.FD efdctui Convert Floating-Point Double-Precision to Unsigned Integer
EVX 4 746 341 SP.FD efdctuidz Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round toward Zero
EVX 4 760 342 SP.FD efdctuiz Convert Floating-Point Double-Precision to Unsigned Integer with Round toward Zero
EVX 4 745 336 SP.FD efddiv Floating-Point Double-Precision Divide
EVX 4 744 336 SP.FD efdmul Floating-Point Double-Precision Multiply
EVX 4 741 335 SP.FD efdnabs Floating-Point Double-Precision Negative Absolute Value
EVX 4 742 335 SP.FD efdneg Floating-Point Double-Precision Negate
EVX 4 737 336 SP.FD efdsub Floating-Point Double-Precision Subtract
EVX 4 766 338 SP.FD efdtsteq Floating-Point Double-Precision Test Equal
EVX 4 764 337 SP.FD efdtstgt Floating-Point Double-Precision Test Greater Than
EVX 4 765 338 SP.FD efdtstlt Floating-Point Double-Precision Test Less Than
EVX 4 719 343 SP.FD efscfd Floating-Point Single-Precision Convert from Double-Precision
EVX 4 708 328 SP.FS efsabs Floating-Point Single-Precision Absolute Value
EVX 4 704 329 SP.FS efsadd Floating-Point Single-Precision Add
EVX 4 723 333 SP.FS efscfsf Convert Floating-Point Single-Precision from Signed Fraction
EVX 4 721 333 SP.FS efscfsi Convert Floating-Point Single-Precision from Signed Integer
EVX 4 722 333 SP.FS efscfuf Convert Floating-Point Single-Precision from Unsigned Fraction
EVX 4 720 333 SP.FS efscfui Convert Floating-Point Single-Precision from Unsigned Integer
EVX 4 718 331 SP.FS efscmpeq Floating-Point Single-Precision Compare Equal
EVX 4 716 330 SP.FS efscmpgt Floating-Point Single-Precision Compare Greater Than
EVX 4 717 330 SP.FS efscmplt Floating-Point Single-Precision Compare Less Than
EVX 4 727 334 SP.FS efsctsf Convert Floating-Point Single-Precision to Signed Fraction
EVX 4 725 333 SP.FS efsctsi Convert Floating-Point Single-Precision to Signed Integer
EVX 4 730 334 SP.FS efsctsiz Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero
EVX 4 726 334 SP.FS efsctuf Convert Floating-Point Single-Precision to Unsigned Fraction
EVX 4 724 333 SP.FS efsctui Convert Floating-Point Single-Precision to Unsigned Integer
EVX 4 728 334 SP.FS efsctuiz Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero
EVX 4 713 329 SP.FS efsdiv Floating-Point Single-Precision Divide
EVX 4 712 329 SP.FS efsmul Floating-Point Single-Precision Multiply
EVX 4 709 328 SP.FS efsnabs Floating-Point Single-Precision Negative Absolute Value
EVX 4 710 328 SP.FS efsneg Floating-Point Single-Precision Negate
EVX 4 705 329 SP.FS efssub Floating-Point Single-Precision Subtract
EVX 4 734 332 SP.FS efststeq Floating-Point Single-Precision Test Equal
EVX 4 732 331 SP.FS efststgt Floating-Point Single-Precision Test Greater Than
EVX 4 733 332 SP.FS efststlt Floating-Point Single-Precision Test Less Than
EVX 4 644 319 SP.FV evfsabs Vector Floating-Point Single-Precision Absolute Value
EVX 4 640 320 SP.FV evfsadd Vector Floating-Point Single-Precision Add
EVX 4 659 324 SP.FV evfscfsf Vector Convert Floating-Point Single-Precision from Signed Fraction
EVX 4 657 324 SP.FV evfscfsi Vector Convert Floating-Point Single-Precision from Signed Integer
EVX 4 658 324 SP.FV evfscfuf Vector Convert Floating-Point Single-Precision from Unsigned Fraction
EVX 4 656 324 SP.FV evfscfui Vector Convert Floating-Point Single-Precision from Unsigned Integer
EVX 4 654 322 SP.FV evfscmpeq Vector Floating-Point Single-Precision Compare Equal
EVX 4 652 321 SP.FV evfscmpgt Vector Floating-Point Single-Precision Compare Greater Than
EVX 4 653 321 SP.FV evfscmplt Vector Floating-Point Single-Precision Compare Less Than
EVX 4 663 326 SP.FV evfsctsf Vector Convert Floating-Point Single-Precision to Signed Fraction
EVX 4 661 325 SP.FV evfsctsi Vector Convert Floating-Point Single-Precision to Signed Integer
EVX 4 666 325 SP.FV evfsctsiz Vector Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero
EVX 4 662 326 SP.FV evfsctuf Vector Convert Floating-Point Single-Precision to Unsigned Fraction
EVX 4 660 325 SP.FV evfsctui Vector Convert Floating-Point Single-Precision to Unsigned Integer
EVX 4 664 325 SP.FV evfsctuiz Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero
EVX 4 649 320 SP.FV evfsdiv Vector Floating-Point Single-Precision Divide
EVX 4 648 320 SP.FV evfsmul Vector Floating-Point Single-Precision Multiply
EVX 4 645 319 SP.FV evfsnabs Vector Floating-Point Single-Precision Negative Absolute Value
EVX 4 646 319 SP.FV evfsneg Vector Floating-Point Single-Precision Negate
EVX 4 641 320 SP.FV evfssub Vector Floating-Point Single-Precision Subtract
EVX 4 670 323 SP.FV evfststeq Vector Floating-Point Single-Precision Test Equal
EVX 4 668 322 SP.FV evfststgt Vector Floating-Point Single-Precision Test Greater Than
EVX 4 669 323 SP.FV evfststlt Vector Floating-Point Single-Precision Test Less Than
X 31 7 206 V lvebx Load Vector Element Byte Indexed
X 31 39 203 V lvehx Load Vector Element Halfword Indexed
X 31 71 203 V lvewx Load Vector Element Word Indexed
X 31 6 208 V lvsl Load Vector for Shift Left Indexed
X 31 38 208 V lvsr Load Vector for Shift Right Indexed
X 31 103 204 V lvx Load Vector Indexed
X 31 359 204 V lvxl Load Vector Indexed LRU
VX 4 1540 259 V mfvscr Move From Vector Status and Control Register
VX 4 1604 259 V mtvscr Move To Vector Status and Control Register
X 31 135 206 V stvebx Store Vector Element Byte Indexed
X 31 167 206 V stvehx Store Vector Element Halfword Indexed
X 31 199 207 V stvewx Store Vector Element Word Indexed
X 31 231 204 V stvx Store Vector Indexed
X 31 487 207 V stvxl Store Vector Indexed LRU
VX 4 384 220 V vaddcuw Vector Add and Write Carry-Out Unsigned Word
VX 4 10 249 V vaddfp Vector Add Single-Precision
VX 4 768 220 V vaddsbs Vector Add Signed Byte Saturate
VX 4 832 220 V vaddshs Vector Add Signed Halfword Saturate
VX 4 896 220 V vaddsws Vector Add Signed Word Saturate
VX 4 0 221 V vaddubm Vector Add Unsigned Byte Modulo
VX 4 512 222 V vaddubs Vector Add Unsigned Byte Saturate
VX 4 64 221 V vadduhm Vector Add Unsigned Halfword Modulo
VX 4 576 222 V vadduhs Vector Add Unsigned Halfword Saturate
VX 4 128 221 V vadduwm Vector Add Unsigned Word Modulo
VX 4 640 222 V vadduws Vector Add Unsigned Word Saturate
VX 4 1028 244 V vand Vector Logical AND
VX 4 1092 244 V vandc Vector Logical AND with Complement
VX 4 1282 235 V vavgsb Vector Average Signed Byte
VX 4 1346 235 V vavgsh Vector Average Signed Halfword
VX 4 1410 235 V vavgsw Vector Average Signed Word
VX 4 1026 236 V vavgub Vector Average Unsigned Byte
VX 4 1090 236 V vavguh Vector Average Unsigned Halfword
VX 4 1154 236 V vavguw Vector Average Unsigned Word
VX 4 842 253 V vcfsx Vector Convert From Signed Fixed-Point Word
VX 4 778 253 V vcfux Vector Convert From Unsigned Fixed-Point Word
VC 4 966 255 V vcmpbfp[.] Vector Compare Bounds Single-Precision
VC 4 198 255 V vcmpeqfp[.] Vector Compare Equal To Single-Precision
VC 4 6 241 V vcmpequb[.] Vector Compare Equal To Unsigned Byte
VC 4 70 241 V vcmpequh[.] Vector Compare Equal To Unsigned Halfword
VC 4 134 242 V vcmpequw[.] Vector Compare Equal To Unsigned Word
VC 4 454 256 V vcmpgefp[.] Vector Compare Greater Than or Equal To Single-Precision
VC 4 710 256 V vcmpgtfp[.] Vector Compare Greater Than Single-Precision
VC 4 774 242 V vcmpgtsb[.] Vector Compare Greater Than Signed Byte
VC 4 838 242 V vcmpgtsh[.] Vector Compare Greater Than Signed Halfword
VC 4 902 242 V vcmpgtsw[.] Vector Compare Greater Than Signed Word
VC 4 518 243 V vcmpgtub[.] Vector Compare Greater Than Unsigned Byte
VC 4 582 243 V vcmpgtuh[.] Vector Compare Greater Than Unsigned Halfword
VC 4 646 243 V vcmpgtuw[.] Vector Compare Greater Than Unsigned Word
VX 4 970 252 V vctsxs Vector Convert To Signed Fixed-Point Word Saturate
VX 4 906 252 V vctuxs Vector Convert To Unsigned Fixed-Point Word Saturate
VX 4 394 257 V vexptefp Vector 2 Raised to the Exponent Estimate Floating-Point
VX 4 458 257 V vlogefp Vector Log Base 2 Estimate Floating-Point
VA 4 46 250 V vmaddfp Vector Multiply-Add Single-Precision
VX 4 1034 251 V vmaxfp Vector Maximum Single-Precision
VX 4 258 237 V vmaxsb Vector Maximum Signed Byte
VX 4 322 237 V vmaxsh Vector Maximum Signed Halfword
VX 4 386 237 V vmaxsw Vector Maximum Signed Word
VX 4 2 238 V vmaxub Vector Maximum Unsigned Byte
VX 4 66 238 V vmaxuh Vector Maximum Unsigned Halfword
VX 4 130 238 V vmaxuw Vector Maximum Unsigned Word
VA 4 32 228 V vmhaddshs Vector Multiply-High-Add Signed Halfword Saturate
VA 4 33 228 V vmhraddshs Vector Multiply-High-Round-Add Signed Halfword Saturate
VX 4 1098 251 V vminfp Vector Minimum Single-Precision
VX 4 770 239 V vminsb Vector Minimum Signed Byte
VX 4 834 239 V vminsh Vector Minimum Signed Halfword
VX 4 898 239 V vminsw Vector Minimum Signed Word
VX 4 514 240 V vminub Vector Minimum Unsigned Byte
VX 4 578 240 V vminuh Vector Minimum Unsigned Halfword
VX 4 642 240 V vminuw Vector Minimum Unsigned Word
VA 4 34 229 V vmladduhm Vector Multiply-Low-Add Unsigned Halfword Modulo
VX 4 12 214 V vmrghb Vector Merge High Byte
VX 4 76 214 V vmrghh Vector Merge High Halfword
VX 4 140 214 V vmrghw Vector Merge High Word
VX 4 268 215 V vmrglb Vector Merge Low Byte
VX 4 332 215 V vmrglh Vector Merge Low Halfword
VX 4 396 215 V vmrglw Vector Merge Low Word
VA 4 37 230 V vmsummbm Vector Multiply-Sum Mixed Byte Modulo
VA 4 40 230 V vmsumshm Vector Multiply-Sum Signed Halfword Modulo
VA 4 41 231 V vmsumshs Vector Multiply-Sum Signed Halfword Saturate
VA 4 36 229 V vmsumubm Vector Multiply-Sum Unsigned Byte Modulo
VA 4 38 231 V vmsumuhm Vector Multiply-Sum Unsigned Halfword Modulo
VA 4 39 232 V vmsumuhs Vector Multiply-Sum Unsigned Halfword Saturate
VX 4 776 226 V vmulesb Vector Multiply Even Signed Byte
VX 4 840 226 V vmulesh Vector Multiply Even Signed Halfword
VX 4 520 226 V vmuleub Vector Multiply Even Unsigned Byte
VX 4 584 226 V vmuleuh Vector Multiply Even Unsigned Halfword
VX 4 264 227 V vmulosb Vector Multiply Odd Signed Byte
VX 4 328 227 V vmulosh Vector Multiply Odd Signed Halfword
VX 4 8 227 V vmuloub Vector Multiply Odd Unsigned Byte
VX 4 72 227 V vmulouh Vector Multiply Odd Unsigned Halfword
VA 4 47 250 V vnmsubfp Vector Negative Multiply-Subtract Single-Precision
VX 4 1284 244 V vnor Vector Logical NOR
VX 4 1156 244 V vor Vector Logical OR
VA 4 43 217 V vperm Vector Permute
VX 4 782 209 V vpkpx Vector Pack Pixel
VX 4 398 210 V vpkshss Vector Pack Signed Halfword Signed Saturate
VX 4 270 210 V vpkshus Vector Pack Signed Halfword Unsigned Saturate
VX 4 462 210 V vpkswss Vector Pack Signed Word Signed Saturate
VX 4 334 210 V vpkswus Vector Pack Signed Word Unsigned Saturate
VX 4 14 211 V vpkuhum Vector Pack Unsigned Halfword Unsigned Modulo
VX 4 142 211 V vpkuhus Vector Pack Unsigned Halfword Unsigned Saturate
VX 4 78 211 V vpkuwum Vector Pack Unsigned Word Unsigned Modulo
VX 4 206 211 V vpkuwus Vector Pack Unsigned Word Unsigned Saturate
VX 4 266 258 V vrefp Vector Reciprocal Estimate Single-Precision
VX 4 714 254 V vrfim Vector Round to Single-Precision Integer toward -Infinity
VX 4 522 254 V vrfin Vector Round to Single-Precision Integer Nearest
VX 4 650 254 V vrfip Vector Round to Single-Precision Integer toward +Infinity
VX 4 586 254 V vrfiz Vector Round to Single-Precision Integer toward Zero
VX 4 4 245 V vrlb Vector Rotate Left Byte
VX 4 68 245 V vrlh Vector Rotate Left Halfword
VX 4 132 245 V vrlw Vector Rotate Left Word
VX 4 330 258 V vrsqrtefp Vector Reciprocal Square Root Estimate Single-Precision
VA 4 42 217 V vsel Vector Select
VX 4 452 218 V vsl Vector Shift Left
VX 4 260 246 V vslb Vector Shift Left Byte
VA 4 44 218 V vsldoi Vector Shift Left Double by Octet Immediate
VX 4 324 246 V vslh Vector Shift Left Halfword
VX 4 1036 218 V vslo Vector Shift Left by Octet
VX 4 388 246 V vslw Vector Shift Left Word
VX 4 524 216 V vspltb Vector Splat Byte
VX 4 588 216 V vsplth Vector Splat Halfword
VX 4 780 216 V vspltisb Vector Splat Immediate Signed Byte
VX 4 844 216 V vspltish Vector Splat Immediate Signed Halfword
VX 4 908 216 V vspltisw Vector Splat Immediate Signed Word
VX 4 652 216 V vspltw Vector Splat Word
VX 4 708 219 V vsr Vector Shift Right
VX 4 772 248 V vsrab Vector Shift Right Algebraic Byte
VX 4 836 248 V vsrah Vector Shift Right Algebraic Halfword
VX 4 900 248 V vsraw Vector Shift Right Algebraic Word
VX 4 516 247 V vsrb Vector Shift Right Byte
VX 4 580 247 V vsrh Vector Shift Right Halfword
VX 4 1100 219 V vsro Vector Shift Right by Octet
VX 4 644 247 V vsrw Vector Shift Right Word
VX 4 1408 223 V vsubcuw Vector Subtract and Write Carry-Out Unsigned Word
VX 4 74 249 V vsubfp Vector Subtract Single-Precision
VX 4 1792 223 V vsubsbs Vector Subtract Signed Byte Saturate
VX 4 1856 223 V vsubshs Vector Subtract Signed Halfword Saturate
VX 4 1920 223 V vsubsws Vector Subtract Signed Word Saturate
VX 4 1024 224 V vsububm Vector Subtract Unsigned Byte Modulo
VX 4 1536 225 V vsububs Vector Subtract Unsigned Byte Saturate
VX 4 1088 224 V vsubuhm Vector Subtract Unsigned Halfword Modulo
VX 4 1600 224 V vsubuhs Vector Subtract Unsigned Halfword Saturate
VX 4 1152 224 V vsubuwm Vector Subtract Unsigned Word Modulo
VX 4 1664 225 V vsubuws Vector Subtract Unsigned Word Saturate
VX 4 1672 233 V vsum2sws Vector Sum across Half Signed Word Saturate
VX 4 1800 234 V vsum4sbs Vector Sum across Quarter Signed Byte Saturate
VX 4 1608 234 V vsum4shs Vector Sum across Quarter Signed Halfword Saturate
VX 4 1544 234 V vsum4ubs Vector Sum across Quarter Unsigned Byte Saturate
VX 4 1928 233 V vsumsws Vector Sum across Signed Word Saturate
VX 4 846 212 V vupkhpx Vector Unpack High Pixel
VX 4 526 212 V vupkhsb Vector Unpack High Signed Byte
VX 4 590 212 V vupkhsh Vector Unpack High Signed Halfword
VX 4 974 213 V vupklpx Vector Unpack Low Pixel
VX 4 654 213 V vupklsb Vector Unpack Low Signed Byte
VX 4 718 213 V vupklsh Vector Unpack Low Signed Halfword
VX 4 1220 244 V vxor Vector Logical XOR
X 31 62 449 WT wait Wait

¹ See the key to the mode dependency and privilege columns on page 905 and the key to the category column in Section 1.3.5 of Book I.

Appendix I.
Power ISA Instruction Set Sorted by Opcode

This appendix lists all the instructions in the Power ISA, in order by opcode.

Form  Opcode (Pri Ext)  Mode Dep.¹  Priv¹  Page  Cat¹  Mnemonic  Instruction
D 2 74 64 tdi Trap Doubleword Immediate
D 3 73 B twi Trap Word Immediate
VX 4 0 221 V vaddubm Vector Add Unsigned Byte Modulo
VX 4 2 238 V vmaxub Vector Maximum Unsigned Byte
VX 4 4 245 V vrlb Vector Rotate Left Byte
VC 4 6 241 V vcmpequb[.] Vector Compare Equal To Unsigned Byte
X 4 8 357 LMA mulhhwu[.] Multiply High Halfword to Word Unsigned
VX 4 8 227 V vmuloub Vector Multiply Odd Unsigned Byte
VX 4 10 249 V vaddfp Vector Add Single-Precision
XO 4 12 354 LMA machhwu[o][.] Multiply Accumulate High Halfword to Word Modulo Unsigned
VX 4 12 214 V vmrghb Vector Merge High Byte
VX 4 14 211 V vpkuhum Vector Pack Unsigned Halfword Unsigned Modulo
VA 4 32 228 V vmhaddshs Vector Multiply-High-Add Signed Halfword Saturate
VA 4 33 228 V vmhraddshs Vector Multiply-High-Round-Add Signed Halfword Saturate
VA 4 34 229 V vmladduhm Vector Multiply-Low-Add Unsigned Halfword Modulo
VA 4 36 229 V vmsumubm Vector Multiply-Sum Unsigned Byte Modulo
VA 4 37 230 V vmsummbm Vector Multiply-Sum Mixed Byte Modulo
VA 4 38 231 V vmsumuhm Vector Multiply-Sum Unsigned Halfword Modulo
VA 4 39 232 V vmsumuhs Vector Multiply-Sum Unsigned Halfword Saturate
X 4 40 357 LMA mulhhw[.] Multiply High Halfword to Word Signed
VA 4 40 230 V vmsumshm Vector Multiply-Sum Signed Halfword Modulo
VA 4 41 231 V vmsumshs Vector Multiply-Sum Signed Halfword Saturate
VA 4 42 217 V vsel Vector Select
VA 4 43 217 V vperm Vector Permute
XO 4 44 353 LMA machhw[o][.] Multiply Accumulate High Halfword to Word Modulo Signed
VA 4 44 218 V vsldoi Vector Shift Left Double by Octet Immediate
XO 4 46 359 LMA nmachhw[o][.] Negative Multiply Accumulate High Halfword to Word Modulo Signed
VA 4 46 250 V vmaddfp Vector Multiply-Add Single-Precision
VA 4 47 250 V vnmsubfp Vector Negative Multiply-Subtract Single-Precision
VX 4 64 221 V vadduhm Vector Add Unsigned Halfword Modulo
VX 4 66 238 V vmaxuh Vector Maximum Unsigned Halfword
VX 4 68 245 V vrlh Vector Rotate Left Halfword
VC 4 70 241 V vcmpequh[.] Vector Compare Equal To Unsigned Halfword
VX 4 72 227 V vmulouh Vector Multiply Odd Unsigned Halfword
VX 4 74 249 V vsubfp Vector Subtract Single-Precision
XO 4 76 354 LMA machhwsu[o][.] Multiply Accumulate High Halfword to Word Saturate Unsigned
VX 4 76 214 V vmrghh Vector Merge High Halfword
VX 4 78 211 V vpkuwum Vector Pack Unsigned Word Unsigned Modulo
EVS 4 79 307 SP evsel Vector Select
XO 4 108 353 LMA machhws[o][.] Multiply Accumulate High Halfword to Word Saturate Signed
XO 4 110 359 LMA nmachhws[o][.] Negative Multiply Accumulate High Halfword to Word Saturate Signed
VX 4 128 221 V vadduwm Vector Add Unsigned Word Modulo
VX 4 130 238 V vmaxuw Vector Maximum Unsigned Word
VX 4 132 245 V vrlw Vector Rotate Left Word
VC 4 134 242 V vcmpequw[.] Vector Compare Equal To Unsigned Word
X 4 136 356 LMA mulchwu[.] Multiply Cross Halfword to Word Unsigned
XO 4 140 352 LMA macchwu[o][.] Multiply Accumulate Cross Halfword to Word Modulo Unsigned
VX 4 140 214 V vmrghw Vector Merge High Word
VX 4 142 211 V vpkuhus Vector Pack Unsigned Halfword Unsigned Saturate
X 4 168 356 LMA mulchw[.] Multiply Cross Halfword to Word Signed
XO 4 172 351 LMA macchw[o][.] Multiply Accumulate Cross Halfword to Word Modulo Signed
XO 4 174 358 LMA nmacchw[o][.] Negative Multiply Accumulate Cross Halfword to Word Modulo Signed
VC 4 198 255 V vcmpeqfp[.] Vector Compare Equal To Single-Precision
XO 4 204 352 LMA macchwsu[o][.] Multiply Accumulate Cross Halfword to Word Saturate Unsigned
VX 4 206 211 V vpkuwus Vector Pack Unsigned Word Unsigned Saturate
XO 4 236 351 LMA macchws[o][.] Multiply Accumulate Cross Halfword to Word Saturate Signed
XO 4 238 358 LMA nmacchws[o][.] Negative Multiply Accumulate Cross Halfword to Word Saturate Signed
VX 4 258 237 V vmaxsb Vector Maximum Signed Byte
VX 4 260 246 V vslb Vector Shift Left Byte
VX 4 264 227 V vmulosb Vector Multiply Odd Signed Byte
VX 4 266 258 V vrefp Vector Reciprocal Estimate Single-Precision
VX 4 268 215 V vmrglb Vector Merge Low Byte
VX 4 270 210 V vpkshus Vector Pack Signed Halfword Unsigned Saturate
VX 4 322 237 V vmaxsh Vector Maximum Signed Halfword
VX 4 324 246 V vslh Vector Shift Left Halfword
VX 4 328 227 V vmulosh Vector Multiply Odd Signed Halfword
VX 4 330 258 V vrsqrtefp Vector Reciprocal Square Root Estimate Single-Precision
VX 4 332 215 V vmrglh Vector Merge Low Halfword
VX 4 334 210 V vpkswus Vector Pack Signed Word Unsigned Saturate
VX 4 384 220 V vaddcuw Vector Add and Write Carry-Out Unsigned Word
VX 4 386 237 V vmaxsw Vector Maximum Signed Word
VX 4 388 246 V vslw Vector Shift Left Word
X 4 392 357 LMA mullhwu[.] Multiply Low Halfword to Word Unsigned
VX 4 394 257 V vexptefp Vector 2 Raised to the Exponent Estimate Floating-Point
XO 4 396 356 LMA maclhwu[o][.] Multiply Accumulate Low Halfword to Word Modulo Unsigned
VX 4 396 215 V vmrglw Vector Merge Low Word
VX 4 398 210 V vpkshss Vector Pack Signed Halfword Signed Saturate
X 4 424 357 LMA mullhw[.] Multiply Low Halfword to Word Signed
XO 4 428 355 LMA maclhw[o][.] Multiply Accumulate Low Halfword to Word Modulo Signed
XO 4 430 360 LMA nmaclhw[o][.] Negative Multiply Accumulate Low Halfword to Word Modulo Signed
VX 4 452 218 V vsl Vector Shift Left
VC 4 454 256 V vcmpgefp[.] Vector Compare Greater Than or Equal To Single-Precision
VX 4 458 257 V vlogefp Vector Log Base 2 Estimate Floating-Point
XO 4 460 356 LMA maclhwsu[o][.] Multiply Accumulate Low Halfword to Word Saturate Unsigned
VX 4 462 210 V vpkswss Vector Pack Signed Word Signed Saturate
XO 4 492 355 LMA maclhws[o][.] Multiply Accumulate Low Halfword to Word Saturate Signed
XO 4 494 360 LMA nmaclhws[o][.] Negative Multiply Accumulate Low Halfword to Word Saturate Signed
EVX 4 512 269 SP evaddw Vector Add Word
VX 4 512 222 V vaddubs Vector Add Unsigned Byte Saturate
EVX 4 514 268 SP evaddiw Vector Add Immediate Word
VX 4 514 240 V vminub Vector Minimum Unsigned Byte
EVX 4 516 313 SP evsubfw Vector Subtract from Word
VX 4 516 247 V vsrb Vector Shift Right Byte
EVX 4 518 313 SP evsubifw Vector Subtract Immediate from Word
VC 4 518 243 V vcmpgtub[.] Vector Compare Greater Than Unsigned Byte
EVX 4 520 268 SP evabs Vector Absolute Value
VX 4 520 226 V vmuleub Vector Multiply Even Unsigned Byte
EVX 4 521 305 SP evneg Vector Negate
EVX 4 522 273 SP evextsb Vector Extend Sign Byte
VX 4 522 254 V vrfin Vector Round to Single-Precision Integer Nearest
EVX 4 523 273 SP evextsh Vector Extend Sign Halfword
EVX 4 524 307 SP evrndw Vector Round Word
VX 4 524 216 V vspltb Vector Splat Byte
EVX 4 525 272 SP evcntlzw Vector Count Leading Zeros Word
EVX 4 526 272 SP evcntlsw Vector Count Leading Signed Bits Word
VX 4 526 212 V vupkhsb Vector Unpack High Signed Byte
EVX 4 527 268 SP brinc Bit Reversed Increment
EVX 4 529 270 SP evand Vector AND
EVX 4 530 270 SP evandc Vector AND with Complement
EVX 4 534 313 SP evxor Vector XOR
EVX 4 535 306 SP evor Vector OR
EVX 4 536 305 SP evnor Vector NOR
EVX 4 537 273 SP eveqv Vector Equivalent
EVX 4 539 306 SP evorc Vector OR with Complement
EVX 4 542 305 SP evnand Vector NAND
EVX 4 544 309 SP evsrwu Vector Shift Right Word Unsigned
EVX 4 545 309 SP evsrws Vector Shift Right Word Signed
EVX 4 546 308 SP evsrwiu Vector Shift Right Word Immediate Unsigned
EVX 4 547 308 SP evsrwis Vector Shift Right Word Immediate Signed
EVX 4 548 308 SP evslw Vector Shift Left Word
EVX 4 550 308 SP evslwi Vector Shift Left Word Immediate
EVX 4 552 306 SP evrlw Vector Rotate Left Word
EVX 4 553 308 SP evsplati Vector Splat Immediate
EVX 4 554 307 SP evrlwi Vector Rotate Left Word Immediate
EVX 4 555 308 SP evsplatfi Vector Splat Fractional Immediate
EVX 4 556 279 SP evmergehi Vector Merge High
EVX 4 557 279 SP evmergelo Vector Merge Low
EVX 4 558 280 SP evmergehilo Vector Merge High/Low
EVX 4 559 280 SP evmergelohi Vector Merge Low/High
EVX 4 560 271 SP evcmpgtu Vector Compare Greater Than Unsigned
EVX 4 561 270 SP evcmpgts Vector Compare Greater Than Signed
EVX 4 562 271 SP evcmpltu Vector Compare Less Than Unsigned
EVX 4 563 271 SP evcmplts Vector Compare Less Than Signed
EVX 4 564 270 SP evcmpeq Vector Compare Equal
VX 4 576 222 V vadduhs Vector Add Unsigned Halfword Saturate
VX 4 578 240 V vminuh Vector Minimum Unsigned Halfword
VX 4 580 247 V vsrh Vector Shift Right Halfword
VC 4 582 243 V vcmpgtuh[.] Vector Compare Greater Than Unsigned Halfword
VX 4 584 226 V vmuleuh Vector Multiply Even Unsigned Halfword
VX 4 586 254 V vrfiz Vector Round to Single-Precision Integer toward Zero
VX 4 588 216 V vsplth Vector Splat Halfword
VX 4 590 212 V vupkhsh Vector Unpack High Signed Halfword
EVX 4 640 320 SP.FV evfsadd Vector Floating-Point Single-Precision Add
VX 4 640 222 V vadduws Vector Add Unsigned Word Saturate
EVX 4 641 320 SP.FV evfssub Vector Floating-Point Single-Precision Subtract
VX 4 642 240 V vminuw Vector Minimum Unsigned Word
EVX 4 644 319 SP.FV evfsabs Vector Floating-Point Single-Precision Absolute Value
VX 4 644 247 V vsrw Vector Shift Right Word
EVX 4 645 319 SP.FV evfsnabs Vector Floating-Point Single-Precision Negative Absolute Value
EVX 4 646 319 SP.FV evfsneg Vector Floating-Point Single-Precision Negate
VC 4 646 243 V vcmpgtuw[.] Vector Compare Greater Than Unsigned Word
EVX 4 648 320 SP.FV evfsmul Vector Floating-Point Single-Precision Multiply
EVX 4 649 320 SP.FV evfsdiv Vector Floating-Point Single-Precision Divide
VX 4 650 254 V vrfip Vector Round to Single-Precision Integer toward +Infinity
EVX 4 652 321 SP.FV evfscmpgt Vector Floating-Point Single-Precision Compare Greater Than
VX 4 652 216 V vspltw Vector Splat Word
EVX 4 653 321 SP.FV evfscmplt Vector Floating-Point Single-Precision Compare Less Than
EVX 4 654 322 SP.FV evfscmpeq Vector Floating-Point Single-Precision Compare Equal
VX 4 654 213 V vupklsb Vector Unpack Low Signed Byte
EVX 4 656 324 SP.FV evfscfui Vector Convert Floating-Point Single-Precision from Unsigned Integer
EVX 4 657 324 SP.FV evfscfsi Vector Convert Floating-Point Single-Precision from Signed Integer
EVX 4 658 324 SP.FV evfscfuf Vector Convert Floating-Point Single-Precision from Unsigned Fraction
EVX 4 659 324 SP.FV evfscfsf Vector Convert Floating-Point Single-Precision from Signed Fraction
EVX 4 660 325 SP.FV evfsctui Vector Convert Floating-Point Single-Precision to Unsigned Integer
EVX 4 661 325 SP.FV evfsctsi Vector Convert Floating-Point Single-Precision to Signed Integer
EVX 4 662 326 SP.FV evfsctuf Vector Convert Floating-Point Single-Precision to Unsigned Fraction
EVX 4 663 326 SP.FV evfsctsf Vector Convert Floating-Point Single-Precision to Signed Fraction
EVX 4 664 325 SP.FV evfsctuiz Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero
EVX 4 666 325 SP.FV evfsctsiz Vector Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero
EVX 4 668 322 SP.FV evfststgt Vector Floating-Point Single-Precision Test Greater Than
EVX 4 669 323 SP.FV evfststlt Vector Floating-Point Single-Precision Test Less Than
EVX 4 670 323 SP.FV evfststeq Vector Floating-Point Single-Precision Test Equal
EVX 4 704 329 SP.FS efsadd Floating-Point Single-Precision Add
EVX 4 705 329 SP.FS efssub Floating-Point Single-Precision Subtract
EVX 4 708 328 SP.FS efsabs Floating-Point Single-Precision Absolute Value
VX 4 708 219 V vsr Vector Shift Right
EVX 4 709 328 SP.FS efsnabs Floating-Point Single-Precision Negative Absolute Value
EVX 4 710 328 SP.FS efsneg Floating-Point Single-Precision Negate
VC 4 710 256 V vcmpgtfp[.] Vector Compare Greater Than Single-Precision
EVX 4 712 329 SP.FS efsmul Floating-Point Single-Precision Multiply
EVX 4 713 329 SP.FS efsdiv Floating-Point Single-Precision Divide
VX 4 714 254 V vrfim Vector Round to Single-Precision Integer toward -Infinity
EVX 4 716 330 SP.FS efscmpgt Floating-Point Single-Precision Compare Greater Than
EVX 4 717 330 SP.FS efscmplt Floating-Point Single-Precision Compare Less Than
EVX 4 718 331 SP.FS efscmpeq Floating-Point Single-Precision Compare Equal
VX 4 718 213 V vupklsh Vector Unpack Low Signed Halfword
EVX 4 719 343 SP.FD efscfd Floating-Point Single-Precision Convert from Double-Precision
EVX 4 720 333 SP.FS efscfui Convert Floating-Point Single-Precision from Unsigned Integer
EVX 4 721 333 SP.FS efscfsi Convert Floating-Point Single-Precision from Signed Integer
EVX 4 722 333 SP.FS efscfuf Convert Floating-Point Single-Precision from Unsigned Fraction
EVX 4 723 333 SP.FS efscfsf Convert Floating-Point Single-Precision from Signed Fraction
EVX 4 724 333 SP.FS efsctui Convert Floating-Point Single-Precision to Unsigned Integer
EVX 4 725 333 SP.FS efsctsi Convert Floating-Point Single-Precision to Signed Integer
EVX 4 726 334 SP.FS efsctuf Convert Floating-Point Single-Precision to Unsigned Fraction
EVX 4 727 334 SP.FS efsctsf Convert Floating-Point Single-Precision to Signed Fraction
EVX 4 728 334 SP.FS efsctuiz Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero
EVX 4 730 334 SP.FS efsctsiz Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero
EVX 4 732 331 SP.FS efststgt Floating-Point Single-Precision Test Greater Than
EVX 4 733 332 SP.FS efststlt Floating-Point Single-Precision Test Less Than
EVX 4 734 332 SP.FS efststeq Floating-Point Single-Precision Test Equal
EVX 4 736 336 SP.FD efdadd Floating-Point Double-Precision Add
EVX 4 737 336 SP.FD efdsub Floating-Point Double-Precision Subtract
EVX 4 738 340 SP.FD efdcfuid Convert Floating-Point Double-Precision from Unsigned Integer Doubleword
EVX 4 739 340 SP.FD efdcfsid Convert Floating-Point Double-Precision from Signed Integer Doubleword
EVX 4 740 335 SP.FD efdabs Floating-Point Double-Precision Absolute Value
EVX 4 741 335 SP.FD efdnabs Floating-Point Double-Precision Negative Absolute Value
EVX 4 742 335 SP.FD efdneg Floating-Point Double-Precision Negate
EVX 4 744 336 SP.FD efdmul Floating-Point Double-Precision Multiply
EVX 4 745 336 SP.FD efddiv Floating-Point Double-Precision Divide
EVX 4 746 341 SP.FD efdctuidz Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round toward Zero
EVX 4 747 341 SP.FD efdctsidz Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round toward Zero
EVX 4 748 337 SP.FD efdcmpgt Floating-Point Double-Precision Compare Greater Than
EVX 4 749 337 SP.FD efdcmplt Floating-Point Double-Precision Compare Less Than
EVX 4 750 337 SP.FD efdcmpeq Floating-Point Double-Precision Compare Equal
EVX 4 751 342 SP.FD efdcfs Floating-Point Double-Precision Convert from Single-Precision
EVX 4 752 339 SP.FD efdcfui Convert Floating-Point Double-Precision from Unsigned Integer
EVX 4 753 339 SP.FD efdcfsi Convert Floating-Point Double-Precision from Signed Integer
EVX 4 754 340 SP.FD efdcfuf Convert Floating-Point Double-Precision from Unsigned Fraction
EVX 4 755 340 SP.FD efdcfsf Convert Floating-Point Double-Precision from Signed Fraction
EVX 4 756 340 SP.FD efdctui Convert Floating-Point Double-Precision to Unsigned Integer
EVX 4 757 340 SP.FD efdctsi Convert Floating-Point Double-Precision to Signed Integer
EVX 4 758 342 SP.FD efdctuf Convert Floating-Point Double-Precision to Unsigned Fraction
EVX 4 759 342 SP.FD efdctsf Convert Floating-Point Double-Precision to Signed Fraction
EVX 4 760 342 SP.FD efdctuiz Convert Floating-Point Double-Precision to Unsigned Integer with Round toward Zero
EVX 4 762 342 SP.FD efdctsiz Convert Floating-Point Double-Precision to Signed Integer with Round toward Zero
EVX 4 764 337 SP.FD efdtstgt Floating-Point Double-Precision Test Greater Than
EVX 4 765 338 SP.FD efdtstlt Floating-Point Double-Precision Test Less Than
EVX 4 766 338 SP.FD efdtsteq Floating-Point Double-Precision Test Equal
EVX 4 768 274 SP evlddx Vector Load Double Word into Double Word Indexed
VX 4 768 220 V vaddsbs Vector Add Signed Byte Saturate
EVX 4 769 274 SP evldd Vector Load Double Word into Double Word
EVX 4 770 275 SP evldwx Vector Load Double into Two Words Indexed
VX 4 770 239 V vminsb Vector Minimum Signed Byte
EVX 4 771 275 SP evldw Vector Load Double into Two Words
EVX 4 772 274 SP evldhx Vector Load Double into Four Halfwords Indexed
VX 4 772 248 V vsrab Vector Shift Right Algebraic Byte
EVX 4 773 274 SP evldh Vector Load Double into Four Halfwords
VC 4 774 242 V vcmpgtsb[.] Vector Compare Greater Than Signed Byte
EVX 4 776 275 SP evlhhesplatx Vector Load Halfword into Halfwords Even and Splat Indexed
VX 4 776 226 V vmulesb Vector Multiply Even Signed Byte
EVX 4 777 275 SP evlhhesplat Vector Load Halfword into Halfwords Even and Splat
VX 4 778 253 V vcfux Vector Convert From Unsigned Fixed-Point Word
EVX 4 780 276 SP evlhhousplatx Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed
VX 4 780 216 V vspltisb Vector Splat Immediate Signed Byte
EVX 4 781 276 SP evlhhousplat Vector Load Halfword into Halfword Odd Unsigned and Splat
EVX 4 782 276 SP evlhhossplatx Vector Load Halfword into Halfword Odd Signed and Splat Indexed
VX 4 782 209 V vpkpx Vector Pack Pixel
EVX 4 783 276 SP evlhhossplat Vector Load Halfword into Halfword Odd Signed and Splat
EVX 4 784 277 SP evlwhex Vector Load Word into Two Halfwords Even Indexed
EVX 4 785 277 SP evlwhe Vector Load Word into Two Halfwords Even
EVX 4 788 278 SP evlwhoux Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended)
EVX 4 789 278 SP evlwhou Vector Load Word into Two Halfwords Odd Unsigned (zero-extended)
EVX 4 790 277 SP evlwhosx Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension)
EVX 4 791 277 SP evlwhos Vector Load Word into Two Halfwords Odd Signed (with sign extension)
EVX 4 792 279 SP evlwwsplatx Vector Load Word into Word and Splat Indexed
EVX 4 793 279 SP evlwwsplat Vector Load Word into Word and Splat
EVX 4 796 278 SP evlwhsplatx Vector Load Word into Two Halfwords and Splat Indexed
EVX 4 797 278 SP evlwhsplat Vector Load Word into Two Halfwords and Splat
EVX 4 800 309 SP evstddx Vector Store Double of Double Indexed
EVX 4 801 309 SP evstdd Vector Store Double of Double
EVX 4 802 310 SP evstdwx Vector Store Double of Two Words Indexed
EVX 4 803 310 SP evstdw Vector Store Double of Two Words
EVX 4 804 310 SP evstdhx Vector Store Double of Four Halfwords Indexed
EVX 4 805 310 SP evstdh Vector Store Double of Four Halfwords
EVX 4 816 311 SP evstwhex Vector Store Word of Two Halfwords from Even Indexed
EVX 4 817 311 SP evstwhe Vector Store Word of Two Halfwords from Even
EVX 4 820 311 SP evstwhox Vector Store Word of Two Halfwords from Odd Indexed
EVX 4 821 311 SP evstwho Vector Store Word of Two Halfwords from Odd
EVX 4 824 311 SP evstwwex Vector Store Word of Word from Even Indexed
EVX 4 825 311 SP evstwwe Vector Store Word of Word from Even
EVX 4 828 312 SP evstwwox Vector Store Word of Word from Odd Indexed
EVX 4 829 312 SP evstwwo Vector Store Word of Word from Odd
VX 4 832 220 V vaddshs Vector Add Signed Halfword Saturate
VX 4 834 239 V vminsh Vector Minimum Signed Halfword
VX 4 836 248 V vsrah Vector Shift Right Algebraic Halfword
VC 4 838 242 V vcmpgtsh[.] Vector Compare Greater Than Signed Halfword
VX 4 840 226 V vmulesh Vector Multiply Even Signed Halfword
VX 4 842 253 V vcfsx Vector Convert From Signed Fixed-Point Word
VX 4 844 216 V vspltish Vector Splat Immediate Signed Halfword
VX 4 846 212 V vupkhpx Vector Unpack High Pixel
VX 4 896 220 V vaddsws Vector Add Signed Word Saturate
VX 4 898 239 V vminsw Vector Minimum Signed Word
VX 4 900 248 V vsraw Vector Shift Right Algebraic Word
VC 4 902 242 V vcmpgtsw[.] Vector Compare Greater Than Signed Word
VX 4 906 252 V vctuxs Vector Convert To Unsigned Fixed-Point Word Saturate
VX 4 908 216 V vspltisw Vector Splat Immediate Signed Word
VC 4 966 255 V vcmpbfp[.] Vector Compare Bounds Single-Precision
VX 4 970 252 V vctsxs Vector Convert To Signed Fixed-Point Word Saturate
VX 4 974 213 V vupklpx Vector Unpack Low Pixel
VX 4 1024 224 V vsububm Vector Subtract Unsigned Byte Modulo
VX 4 1026 236 V vavgub Vector Average Unsigned Byte
EVX 4 1027 284 SP evmhessf Vector Multiply Halfwords, Even, Signed, Saturate, Fractional
VX 4 1028 244 V vand Vector Logical AND
EVX 4 1031 293 SP evmhossf Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional
EVX 4 1032 287 SP evmheumi Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer
EVX 4 1033 283 SP evmhesmi Vector Multiply Halfwords, Even, Signed, Modulo, Integer
VX 4 1034 251 V vmaxfp Vector Maximum Single-Precision
EVX 4 1035 282 SP evmhesmf Vector Multiply Halfwords, Even, Signed, Modulo, Fractional
EVX 4 1036 295 SP evmhoumi Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer
VX 4 1036 218 V vslo Vector Shift Left by Octet
EVX 4 1037 291 SP evmhosmi Vector Multiply Halfwords, Odd, Signed, Modulo, Integer
EVX 4 1039 290 SP evmhosmf Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional
EVX 4 1059 284 SP evmhessfa Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator
EVX 4 1063 293 SP evmhossfa Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional to Accumulator
EVX 4 1064 287 SP evmheumia Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator
EVX 4 1065 283 SP evmhesmia Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator
EVX 4 1067 282 SP evmhesmfa Vector Multiply Halfwords, Even, Signed, Modulo, Fractional to Accumulator
EVX 4 1068 295 SP evmhoumia Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator
EVX 4 1069 291 SP evmhosmia Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator
EVX 4 1071 290 SP evmhosmfa Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional to Accumulator
VX 4 1088 224 V vsubuhm Vector Subtract Unsigned Halfword
Modulo VX 4 1090 236 V vavguh Vector Average Unsigned Halfword VX 4 1092 244 V vandc Vector Logical AND with Complement Appendix I. Power ISA Instruction Set Sorted by Opcode 931 Version 2.05 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext EVX 4 1095 298 SP evmwhssf Vector Multiply Word High Signed, Saturate, Fractional EVX 4 1096 300 SP evmwlumi Vector Multiply Word Low Unsigned, Modulo, Integer VX 4 1098 251 V vminfp Vector Minimum Single-Precision EVX 4 1100 298 SP evmwhumi Vector Multiply Word High Unsigned, Modulo, Integer VX 4 1100 219 V vsro Vector Shift Right by Octet EVX 4 1101 297 SP evmwhsmi Vector Multiply Word High Signed, Modulo, Integer EVX 4 1103 297 SP evmwhsmf Vector Multiply Word High Signed, Modulo, Fractional EVX 4 1107 303 SP evmwssf Vector Multiply Word Signed, Saturate, Fractional EVX 4 1112 304 SP evmwumi Vector Multiply Word Unsigned, Modulo, Integer EVX 4 1113 302 SP evmwsmi Vector Multiply Word Signed, Modulo, Integer EVX 4 1115 301 SP evmwsmf Vector Multiply Word Signed, Modulo, Fractional EVX 4 1127 298 SP evmwhssfa Vector Multiply Word High Signed, Saturate, Fractional to Accumulator EVX 4 1128 300 SP evmwlumia Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator EVX 4 1132 298 SP evmwhumia Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator EVX 4 1133 297 SP evmwhsmia Vector Multiply Word High Signed, Modulo, Integer to Accumulator EVX 4 1135 297 SP evmwhsmfa Vector Multiply Word High Signed, Modulo, Fractional to Accumulator EVX 4 1139 303 SP evmwssfa Vector Multiply Word Signed, Saturate, Fractional to Accumulator EVX 4 1144 304 SP evmwumia Vector Multiply Word Unsigned, Modulo, Integer to Accumulator EVX 4 1145 302 SP evmwsmia Vector Multiply Word Signed, Modulo, Integer to Accu- mulator EVX 4 1147 301 SP evmwsmfa Vector Multiply Word Signed, Modulo, Fractional to Accumulator VX 4 1152 224 V vsubuwm Vector Subtract Unsigned Word Modulo VX 4 1154 236 V vavguw Vector Average Unsigned 
Word
VX 4 1156 244 V vor Vector Logical OR
EVX 4 1216 269 SP evaddusiaaw Vector Add Unsigned, Saturate, Integer to Accumulator Word
EVX 4 1217 269 SP evaddssiaaw Vector Add Signed, Saturate, Integer to Accumulator Word
EVX 4 1218 313 SP evsubfusiaaw Vector Subtract Unsigned, Saturate, Integer to Accumulator Word
EVX 4 1219 312 SP evsubfssiaaw Vector Subtract Signed, Saturate, Integer to Accumulator Word
EVX 4 1220 297 SP evmra Initialize Accumulator
VX 4 1220 244 V vxor Vector Logical XOR
EVX 4 1222 272 SP evdivws Vector Divide Word Signed
EVX 4 1223 273 SP evdivwu Vector Divide Word Unsigned
EVX 4 1224 269 SP evaddumiaaw Vector Add Unsigned, Modulo, Integer to Accumulator Word
EVX 4 1225 268 SP evaddsmiaaw Vector Add Signed, Modulo, Integer to Accumulator Word
EVX 4 1226 313 SP evsubfumiaaw Vector Subtract Unsigned, Modulo, Integer to Accumulator Word
EVX 4 1227 312 SP evsubfsmiaaw Vector Subtract Signed, Modulo, Integer to Accumulator Word
EVX 4 1280 288 SP evmheusiaaw Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate into Words
EVX 4 1281 286 SP evmhessiaaw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words
VX 4 1282 235 V vavgsb Vector Average Signed Byte
EVX 4 1283 285 SP evmhessfaaw Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate into Words
EVX 4 1284 296 SP evmhousiaaw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words
VX 4 1284 244 V vnor Vector Logical NOR
EVX 4 1285 295 SP evmhossiaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words
EVX 4 1287 294 SP evmhossfaaw Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate into Words
EVX 4 1288 287 SP evmheumiaaw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1289 283 SP evmhesmiaaw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words
EVX 4 1291 282 SP evmhesmfaaw Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words
EVX 4 1292 296 SP evmhoumiaaw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1293 292 SP evmhosmiaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words
EVX 4 1295 291 SP evmhosmfaaw Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words
EVX 4 1320 281 SP evmhegumiaa Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX 4 1321 281 SP evmhegsmiaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate
EVX 4 1323 280 SP evmhegsmfaa Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate
EVX 4 1324 290 SP evmhogumiaa Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate
EVX 4 1325 289 SP evmhogsmiaa Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate
EVX 4 1327 289 SP evmhogsmfaa Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate
EVX 4 1344 301 SP evmwlusiaaw Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate into Words
EVX 4 1345 299 SP evmwlssiaaw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words
VX 4 1346 235 V vavgsh Vector Average Signed Halfword
EVX 4 1352 300 SP evmwlumiaaw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words
EVX 4 1353 299 SP evmwlsmiaaw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words
EVX 4 1363 303 SP evmwssfaa Vector Multiply Word Signed, Saturate, Fractional and Accumulate
EVX 4 1368 305 SP evmwumiaa Vector Multiply Word Unsigned, Modulo, Integer and Accumulate
EVX 4 1369 302 SP evmwsmiaa Vector Multiply Word Signed, Modulo, Integer and Accumulate
EVX 4 1371 302 SP evmwsmfaa Vector Multiply Word Signed, Modulo, Fractional and Accumulate
EVX 4 1408 288 SP evmheusianw Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words
VX 4 1408 223 V vsubcuw Vector Subtract and Write Carry-Out Unsigned Word
EVX 4 1409 286 SP evmhessianw Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words
VX 4 1410 235 V vavgsw Vector Average Signed Word
EVX 4 1411 285 SP evmhessfanw Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX 4 1412 296 SP evmhousianw Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX 4 1413 295 SP evmhossianw Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words
EVX 4 1415 294 SP evmhossfanw Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words
EVX 4 1416 287 SP evmheumianw Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 4 1417 283 SP evmhesmianw Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words
EVX 4 1419 282 SP evmhesmfanw Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX 4 1420 292 SP evmhoumianw Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX 4 1421 291 SP evmhosmianw Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words
EVX 4 1423 291 SP evmhosmfanw Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words
EVX 4 1448 281 SP evmhegumian Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 1449 281 SP evmhegsmian Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX 4 1451 280 SP evmhegsmfan Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX 4 1452 290 SP evmhogumian Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 1453 289 SP evmhogsmian Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative
EVX 4 1455 289 SP evmhogsmfan Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative
EVX 4 1472 301 SP evmwlusianw Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words
EVX 4 1473 299 SP evmwlssianw Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words
EVX 4 1480 300 SP evmwlumianw Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words
EVX 4 1481 299 SP evmwlsmianw Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words
EVX 4 1491 304 SP evmwssfan Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative
EVX 4 1496 305 SP evmwumian Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative
EVX 4 1497 302 SP evmwsmian Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative
EVX 4 1499 302 SP evmwsmfan Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative
VX 4 1536 225 V vsububs Vector Subtract Unsigned Byte Saturate
VX 4 1540 259 V mfvscr Move From Vector Status and Control Register
VX 4 1544 234 V vsum4ubs Vector Sum across Quarter Unsigned Byte Saturate
VX 4 1600 224 V vsubuhs Vector Subtract Unsigned Halfword Saturate
VX 4 1604 259 V mtvscr Move To Vector Status and Control Register
VX 4 1608 234 V vsum4shs Vector Sum across Quarter Signed Halfword Saturate
VX 4 1664 225 V vsubuws Vector Subtract Unsigned Word Saturate
VX 4 1672 233 V vsum2sws Vector Sum across Half Signed Word Saturate
VX 4
1792 223 V vsubsbs Vector Subtract Signed Byte Saturate
VX 4 1800 234 V vsum4sbs Vector Sum across Quarter Signed Byte Saturate
VX 4 1856 223 V vsubshs Vector Subtract Signed Halfword Saturate
VX 4 1920 223 V vsubsws Vector Subtract Signed Word Saturate
VX 4 1928 233 V vsumsws Vector Sum across Signed Word Saturate
D 7 67 B mulli Multiply Low Immediate
D 8 SR 64 B subfic Subtract From Immediate Carrying
D 10 72 B cmpli Compare Logical Immediate
D 11 71 B cmpi Compare Immediate
D 12 SR 63 B addic Add Immediate Carrying
D 13 SR 63 B addic. Add Immediate Carrying and Record
D 14 62 B addi Add Immediate
D 15 62 B addis Add Immediate Shifted
B 16 CT 35 B bc[l][a] Branch Conditional
SC 17 39, 479, 613 B sc System Call
I 18 35 B b[l][a] Branch
XL 19 0 38 B mcrf Move Condition Register Field
XL 19 16 CT 36 B bclr[l] Branch Conditional to Link Register
XL 19 18 P 480 S rfid Return From Interrupt Doubleword
XL 19 33 38 B crnor Condition Register NOR
XL 19 38 P 614 E rfmci Return From Machine Check Interrupt
X 19 39 P 614 E.ED rfdi Return From Debug Interrupt
XL 19 50 P 613 E rfi Return From Interrupt
XL 19 51 P 614 E rfci Return From Critical Interrupt
XL 19 129 38 B crandc Condition Register AND with Complement
XL 19 150 440 B isync Instruction Synchronize
XL 19 193 37 B crxor Condition Register XOR
XFX 19 198 718 E.ED dnh Debugger Notify Halt
XL 19 225 37 B crnand Condition Register NAND
XL 19 257 37 B crand Condition Register AND
XL 19 274 H 480 S hrfid Hypervisor Return From Interrupt Doubleword
XL 19 289 38 B creqv Condition Register Equivalent
XL 19 402 H 482 S doze Doze
XL 19 417 38 B crorc Condition Register OR with Complement
XL 19 434 H 482 S nap Nap
XL 19 449 37 B cror Condition Register OR
XL 19 466 H 483 S sleep Sleep
XL 19 498 H 483 S rvwinkle Rip Van Winkle
XL 19 528 CT 36 B bcctr[l] Branch Conditional to Count Register
M 20 SR 84 B rlwimi[.] Rotate Left Word Immediate then Mask Insert
M 21 SR 82 B rlwinm[.] Rotate Left Word Immediate then AND with Mask
M 23 SR 83 B rlwnm[.] Rotate Left Word then AND with Mask
D 24 75 B ori OR Immediate
D 25 76 B oris OR Immediate Shifted
D 26 76 B xori XOR Immediate
D 27 76 B xoris XOR Immediate Shifted
D 28 SR 75 B andi. AND Immediate
D 29 SR 75 B andis. AND Immediate Shifted
MD 30 0 SR 85 64 rldicl[.] Rotate Left Doubleword Immediate then Clear Left
MD 30 1 SR 85 64 rldicr[.] Rotate Left Doubleword Immediate then Clear Right
MD 30 2 SR 86 64 rldic[.] Rotate Left Doubleword Immediate then Clear
MD 30 3 SR 87 64 rldimi[.] Rotate Left Doubleword Immediate then Mask Insert
MDS 30 8 SR 86 64 rldcl[.] Rotate Left Doubleword then Clear Left
MDS 30 9 SR 87 64 rldcr[.] Rotate Left Doubleword then Clear Right
X 31 0 71 B cmp Compare
X 31 4 73 B tw Trap Word
X 31 6 208 V lvsl Load Vector for Shift Left Indexed
X 31 7 206 V lvebx Load Vector Element Byte Indexed
XO 31 8 SR 64 B subfc[o][.] Subtract From Carrying
XO 31 9 SR 69 64 mulhdu[.] Multiply High Doubleword Unsigned
XO 31 10 SR 64 B addc[o][.] Add Carrying
XO 31 11 SR 67 B mulhwu[.] Multiply High Word Unsigned
A 31 15 74 B.in isel Integer Select
XFX 31 19 95 B mfcr Move From Condition Register
XFX 31 19 96 B.in mfocrf Move From One Condition Register Field
X 31 20 442 B lwarx Load Word And Reserve Indexed
X 31 21 50 64 ldx Load Doubleword Indexed
X 31 22 428 E icbt Instruction Cache Block Touch
X 31 23 48 B lwzx Load Word and Zero Indexed
X 31 24 SR 88 B slw[.] Shift Left Word
X 31 26 SR 79 B cntlzw[.] Count Leading Zeros Word
X 31 27 SR 90 64 sld[.] Shift Left Doubleword
X 31 28 SR 77 B and[.] AND
X 31 29 P 628 E.PD ldepx Load Doubleword by External Process ID Indexed
X 31 31 P 628 E.PD lwepx Load Word by External Process ID Indexed
X 31 32 72 B cmpl Compare Logical
X 31 38 208 V lvsr Load Vector for Shift Right Indexed
X 31 39 203 V lvehx Load Vector Element Halfword Indexed
XO 31 40 SR 63 B subf[o][.] Subtract From
X 31 53 50 64 ldux Load Doubleword with Update Indexed
X 31 54 436 B dcbst Data Cache Block Store
X 31 55 48 B lwzux Load Word and Zero with Update Indexed
X 31 58 SR 81 64 cntlzd[.] Count Leading Zeros Doubleword
X 31 60 SR 78 B andc[.] AND with Complement
X 31 62 449 WT wait Wait
X 31 63 P 631 E.PD dcbstep Data Cache Block Store by External PID
X 31 68 74 64 td Trap Doubleword
X 31 71 203 V lvewx Load Vector Element Word Indexed
XO 31 73 SR 69 64 mulhd[.] Multiply High Doubleword
XO 31 74 SR H 495 BCDA addg6s Add and Generate Sixes
XO 31 75 SR 67 B mulhw[.] Multiply High Word
X 31 78 349 LMV dlmzb[.] Determine Leftmost Zero Byte
X 31 83 P 503, 625 B mfmsr Move From Machine State Register
X 31 84 444 64 ldarx Load Doubleword And Reserve Indexed
X 31 86 437 B dcbf Data Cache Block Flush
X 31 87 46 B lbzx Load Byte and Zero Indexed
X 31 95 P 627 E.PD lbepx Load Byte by External Process ID Indexed
X 31 103 204 V lvx Load Vector Indexed
XO 31 104 SR 66 B neg[o][.] Negate
X 31 119 45 B lbzux Load Byte and Zero with Update Indexed
X 31 122 81 B.in popcntb Population Count Bytes
X 31 124 SR 78 B nor[.] NOR
X 31 127 P 632 E.PD dcbfep Data Cache Block Flush by External PID
X 31 131 P 626 E wrtee Write MSR External Enable
X 31 134 M 655 ECL dcbtstls Data Cache Block Touch for Store and Lock Set
X 31 135 206 V stvebx Store Vector Element Byte Indexed
XO 31 136 SR 65 B subfe[o][.] Subtract From Extended
XO 31 138 SR 65 B adde[o][.] Add Extended
XFX 31 144 95 B mtcrf Move To Condition Register Fields
XFX 31 144 96 B.in mtocrf Move To One Condition Register Field
X 31 146 P 625 E mtmsr Move To Machine State Register
X 31 146 P 501 S mtmsr Move To Machine State Register
X 31 149 54 64 stdx Store Doubleword Indexed
X 31 150 442 B stwcx. Store Word Conditional Indexed
X 31 151 53 B stwx Store Word Indexed
X 31 154 80 B prtyw Parity Word
X 31 157 P 630 E.PD stdepx Store Doubleword by External Process ID Indexed
X 31 159 P 630 E.PD stwepx Store Word by External Process ID Indexed
X 31 163 P 626 E wrteei Write MSR External Enable Immediate
X 31 166 M 655 ECL dcbtls Data Cache Block Touch and Lock Set
X 31 167 206 V stvehx Store Vector Element Halfword Indexed
X 31 178 P 502 S mtmsrd Move To Machine State Register Doubleword
X 31 181 54 64 stdux Store Doubleword with Update Indexed
X 31 183 53 B stwux Store Word with Update Indexed
X 31 186 80 64 prtyd Parity Doubleword
X 31 199 207 V stvewx Store Vector Element Word Indexed
XO 31 200 SR 66 B subfze[o][.] Subtract From Zero Extended
XO 31 202 SR 66 B addze[o][.] Add to Zero Extended
X 31 206 P 721 E.PC msgsnd Message Send
X 31 210 32 P 537 S mtsr Move To Segment Register
X 31 214 444 64 stdcx. Store Doubleword Conditional Indexed
X 31 215 51 B stbx Store Byte Indexed
X 31 223 P 629 E.PD stbepx Store Byte by External Process ID Indexed
X 31 230 M 657 ECL icblc Instruction Cache Block Lock Clear
X 31 231 204 V stvx Store Vector Indexed
XO 31 232 SR 65 B subfme[o][.] Subtract From Minus One Extended
XO 31 233 SR 69 64 mulld[o][.] Multiply Low Doubleword
XO 31 234 SR 65 B addme[o][.] Add to Minus One Extended
XO 31 235 SR 67 B mullw[o][.]
Multiply Low Word
X 31 238 P 721 E.PC msgclr Message Clear
X 31 242 32 P 537 S mtsrin Move To Segment Register Indirect
X 31 246 435 B dcbtst Data Cache Block Touch for Store
X 31 247 51 B stbux Store Byte with Update Indexed
X 31 255 P 633 E.PD dcbtstep Data Cache Block Touch for Store by External PID
X 31 259 P 625 E mfdcrx Move From Device Control Register Indexed
X 31 263 P 637 E.PD lvepxl Load Vector by External Process ID Indexed LRU
XO 31 266 SR 63 B add[o][.] Add
X 31 274 64 H 541 S tlbiel TLB Invalidate Entry Local
X 31 275 97 E mfapidi Move From APID Indirect
X 31 278 434 B dcbt Data Cache Block Touch
X 31 279 46 B lhzx Load Halfword and Zero Indexed
X 31 282 H 494 BCDA cdtbcd Convert Declets To Binary Coded Decimal
X 31 284 SR 78 B eqv[.] Equivalent
EVX 31 285 P 636 E.PD evlddepx Vector Load Doubleword into Doubleword by External Process ID Indexed
X 31 287 P 627 E.PD lhepx Load Halfword by External Process ID Indexed
X 31 291 P 97 E mfdcrux Move From Device Control Register User-mode Indexed
X 31 295 P 637 E.PD lvepx Load Vector by External Process ID Indexed
X 31 306 64 H 539 S tlbie TLB Invalidate Entry
X 31 310 456 EC eciwx External Control In Word Indexed
X 31 311 46 B lhzux Load Halfword and Zero with Update Indexed
X 31 314 H 494 BCDA cbcdtd Convert Binary Coded Decimal to Declets
X 31 316 SR 77 B xor[.] XOR
X 31 319 P 631 E.PD dcbtep Data Cache Block Touch by External PID
XFX 31 323 P 625 E mfdcr Move From Device Control Register
X 31 326 P 730 E.CD dcread Data Cache Read [Alternative Encoding]
XFX 31 334 O 756 E.PM mfpmr Move From Performance Monitor Register
XFX 31 339 O 94, 451 B mfspr Move From Special Purpose Register
X 31 341 49 64 lwax Load Word Algebraic Indexed
X 31 343 47 B lhax Load Halfword Algebraic Indexed
X 31 359 204 V lvxl Load Vector Indexed LRU
X 31 370 H 542 S tlbia TLB Invalidate All
XFX 31 371 451 S.out mftb Move From Time Base
X 31 373 49 64 lwaux Load Word Algebraic with Update Indexed
X 31 375 47 B lhaux Load Halfword Algebraic with Update Indexed
X 31 387 P 624 E mtdcrx Move To Device Control Register Indexed
X 31 390 M 656 ECL dcblc Data Cache Block Lock Clear
X 31 402 P 533 S slbmte SLB Move To Entry
X 31 407 52 B sthx Store Halfword Indexed
X 31 412 SR 78 B orc[.] OR with Complement
EVX 31 413 P 636 E.PD evstddepx Vector Store Doubleword into Doubleword by External Process ID Indexed
XS 31 413 SR 91 64 sradi[.] Shift Right Algebraic Doubleword Immediate
X 31 415 P 629 E.PD sthepx Store Halfword by External Process ID Indexed
X 31 419 97 E mtdcrux Move To Device Control Register User-mode Indexed
X 31 434 P 531 S slbie SLB Invalidate Entry
X 31 438 456 EC ecowx External Control Out Word Indexed
X 31 439 52 B sthux Store Halfword with Update Indexed
X 31 444 SR 77 B or[.] OR
XFX 31 451 P 624 E mtdcr Move To Device Control Register
X 31 454 P 727 E.CI dci Data Cache Invalidate
XO 31 457 SR 70 64 divdu[o][.] Divide Doubleword Unsigned
XO 31 459 SR 68 B divwu[o][.] Divide Word Unsigned
XFX 31 462 O 756 E.PM mtpmr Move To Performance Monitor Register
XFX 31 467 O 93 B mtspr Move To Special Purpose Register
X 31 470 P 652 E dcbi Data Cache Block Invalidate
X 31 476 SR 77 B nand[.] NAND
X 31 486 P 730 E.CD dcread Data Cache Read
X 31 486 M 656 ECL icbtls Instruction Cache Block Touch and Lock Set
X 31 487 207 V stvxl Store Vector Indexed LRU
XO 31 489 SR 70 64 divd[o][.] Divide Doubleword
XO 31 491 SR 68 B divw[o][.] Divide Word
X 31 498 P 532 S slbia SLB Invalidate All
X 31 508 79 B cmpb Compare Bytes
X 31 512 97 E mcrxr Move to Condition Register from XER
X 31 533 59 MA lswx Load String Word Indexed
X 31 534 55 B lwbrx Load Word Byte-Reverse Indexed
X 31 535 122 FP lfsx Load Floating-Point Single Indexed
X 31 536 SR 88 B srw[.] Shift Right Word
X 31 539 SR 90 64 srd[.] Shift Right Doubleword
X 31 566 H 542, 659, 749 B tlbsync TLB Synchronize
X 31 567 122 FP lfsux Load Floating-Point Single with Update Indexed
X 31 595 32 P 538 S mfsr Move From Segment Register
X 31 597 59 MA lswi Load String Word Immediate
X 31 598 446 B sync Synchronize
X 31 599 119 FP lfdx Load Floating-Point Double Indexed
X 31 607 P 635 E.PD lfdepx Load Floating-Point Double by External Process ID Indexed
X 31 631 119 FP lfdux Load Floating-Point Double with Update Indexed
X 31 659 32 P 538 S mfsrin Move From Segment Register Indirect
X 31 661 60 MA stswx Store String Word Indexed
X 31 662 55 B stwbrx Store Word Byte-Reverse Indexed
X 31 663 122 FP stfsx Store Floating-Point Single Indexed
X 31 695 122 FP stfsux Store Floating-Point Single with Update Indexed
X 31 725 60 MA stswi Store String Word Immediate
X 31 727 123 FP stfdx Store Floating-Point Double Indexed
X 31 735 P 635 E.PD stfdepx Store Floating-Point Double by External Process ID Indexed
X 31 758 433 E dcba Data Cache Block Allocate
X 31 759 123 FP stfdux Store Floating-Point Double with Update Indexed
X 31 775 P 638 E.PD stvepxl Store Vector by External Process ID Indexed LRU
X 31 786 P 658, 747 E tlbivax TLB Invalidate Virtual Address Indexed
X 31 789 H 491 S lwzcix Load Word and Zero Caching Inhibited Indexed
X 31 790 55 B lhbrx Load Halfword Byte-Reverse Indexed
X 31 791 125 FP.out lfdpx Load Floating-Point Double Pair Indexed
X 31 792 SR 89 B sraw[.] Shift Right Algebraic Word
X 31 794 SR 91 64 srad[.] Shift Right Algebraic Doubleword
X 31 807 P 638 E.PD stvepx Store Vector by External Process ID Indexed
X 31 821 H 491 S lhzcix Load Halfword and Zero Caching Inhibited Indexed
X 31 824 SR 89 B srawi[.] Shift Right Algebraic Word Immediate
X 31 851 P 534 S slbmfev SLB Move From Entry VSID
X 31 853 H 491 S lbzcix Load Byte and Zero Caching Inhibited Indexed
X 31 854 448 S eieio Enforce In-order Execution of I/O
X 31 854 448 E mbar Memory Barrier
X 31 855 120 FP lfiwax Load Floating-Point as Integer Word Algebraic Indexed
X 31 885 H 491 S ldcix Load Doubleword Caching Inhibited Indexed
X 31 914 P 659, 748 E tlbsx TLB Search Indexed
X 31 915 P 534 S slbmfee SLB Move From Entry ESID
X 31 917 H 492 S stwcix Store Word Caching Inhibited Indexed
X 31 918 55 B sthbrx Store Halfword Byte-Reverse Indexed
X 31 919 125 FP.out stfdpx Store Floating-Point Double Pair Indexed
X 31 922 SR 79 B extsh[.] Extend Sign Halfword
X 31 946 P 658, 748 E tlbre TLB Read Entry
X 31 949 H 492 S sthcix Store Halfword Caching Inhibited Indexed
X 31 954 SR 79 B extsb[.] Extend Sign Byte
X 31 966 P 727 E.CI ici Instruction Cache Invalidate
X 31 978 P 660, 749 E tlbwe TLB Write Entry
X 31 979 SR P 535 S slbfee. SLB Find Entry ESID
X 31 981 H 492 S stbcix Store Byte Caching Inhibited Indexed
X 31 982 428 B icbi Instruction Cache Block Invalidate
X 31 983 124 FP stfiwx Store Floating-Point as Integer Word Indexed
X 31 986 SR 81 64 extsw[.]
Extend Sign Word X 31 991 P 634 E.PD icbiep Instruction Cache Block Invalidate by External PID X 31 998 P 731 E.CD icread Instruction Cache Read X 31 1013 H 492 S stdcix Store Doubleword Caching Inhibited Indexed X 31 1014 436 B dcbz Data Cache Block set to Zero X 31 1023 P 634 E.PD dcbzep Data Cache Block set to Zero by External PID D 32 48 B lwz Load Word and Zero D 33 48 B lwzu Load Word and Zero with Update D 34 45 B lbz Load Byte and Zero D 35 45 B lbzu Load Byte and Zero with Update D 36 53 B stw Store Word D 37 53 B stwu Store Word with Update D 38 51 B stb Store Byte D 39 51 B stbu Store Byte with Update D 40 46 B lhz Load Halfword and Zero D 41 46 B lhzu Load Halfword and Zero with Update D 42 47 B lha Load Halfword Algebraic D 43 47 B lhau Load Halfword Algebraic with Update D 44 52 B sth Store Halfword D 45 52 B sthu Store Halfword with Update D 46 56 B lmw Load Multiple Word D 47 57 B stmw Store Multiple Word D 48 122 FP lfs Load Floating-Point Single D 49 122 FP lfsu Load Floating-Point Single with Update D 50 119 FP lfd Load Floating-Point Double D 51 119 FP lfdu Load Floating-Point Double with Update D 52 122 FP stfs Store Floating-Point Single D 53 122 FP stfsu Store Floating-Point Single with Update D 54 123 FP stfd Store Floating-Point Double D 55 123 FP stfdu Store Floating-Point Double with Update DQ 56 P 493 LSQ lq Load Quadword DS 57 0 125 FP.out lfdp Load Floating-Point Double Pair DS 58 0 50 64 ld Load Doubleword DS 58 1 50 64 ldu Load Doubleword with Update DS 58 2 49 64 lwa Load Word Algebraic X 59 2 163 DFP dadd DFP Add Z 59 3 174 DFP dqua DFP Quantize A 59 18 128 FP[R] fdivs[.] Floating Divide Single A 59 20 127 FP[R] fsubs[.] Floating Subtract Single A 59 21 127 FP[R] fadds[.] Floating Add Single A 59 22 129 FP[R] fsqrts[.] Floating Square Root Single Appendix I. Power ISA Instruction Set Sorted by Opcode 939 Version 2.05 Opcode Dep.1 Mode Priv1 Form Page Cat1 Mnemonic Instruction Pri Ext A 59 24 129 FP[R] fres[.] 
Floating Reciprocal Estimate Single

      Opcode    Mode
Form  Pri  Ext  Dep.1  Priv1  Page  Cat1      Mnemonic     Instruction
A     59   25                 128   FP[R]     fmuls[.]     Floating Multiply Single
A     59   26                 130   FP[R].in  frsqrtes[.]  Floating Reciprocal Square Root Estimate Single
A     59   28                 132   FP[R]     fmsubs[.]    Floating Multiply-Subtract Single
A     59   29                 132   FP[R]     fmadds[.]    Floating Multiply-Add Single
A     59   30                 133   FP[R]     fnmsubs[.]   Floating Negative Multiply-Subtract Single
A     59   31                 133   FP[R]     fnmadds[.]   Floating Negative Multiply-Add Single
X     59   34                 165   DFP       dmul         DFP Multiply
Z     59   35                 176   DFP       drrnd        DFP Reround
Z23   59   66                 190   DFP       dscli[.]     DFP Shift Significand Left Immediate
Z23   59   67                 173   DFP       dquai[.]     DFP Quantize Immediate
Z     59   98                 190   DFP       dscri        DFP Shift Significand Right Immediate
Z23   59   99                 179   DFP       drintx[.]    DFP Round To FP Integer With Inexact
X     59   130                169   DFP       dcmpo        DFP Compare Ordered
X     59   162                171   DFP       dtstex       DFP Test Exponent
Z23   59   194                170   DFP       dtstdc       DFP Test Data Class
Z23   59   226                170   DFP       dtstdg       DFP Test Data Group
Z23   59   227                181   DFP       drintn[.]    DFP Round To FP Integer Without Inexact
X     59   258                183   DFP       dctdp        DFP Convert To DFP Long
X     59   290                185   DFP       dctfix       DFP Convert To Fixed
X     59   322                187   DFP       ddedpd       DFP Decode DPD To BCD
X     59   354                188   DFP       dxex         DFP Extract Biased Exponent
X     59   514                163   DFP       dsub         DFP Subtract
X     59   546                166   DFP       ddiv         DFP Divide
X     59   642                168   DFP       dcmpu        DFP Compare Unordered
X     59   674                172   DFP       dtstsf       DFP Test Significance
X     59   770                184   DFP       drsp         DFP Round To DFP Short
X     59   834                187   DFP       denbcd       DFP Encode BCD To DPD
X     59   866                188   DFP       diex         DFP Insert Biased Exponent
DS    61   -                  125   FP.out    stfdp        Store Floating-Point Double Pair
DS    62   0                  54    64        std          Store Doubleword
DS    62   1                  54    64        stdu         Store Doubleword with Update
DS    62   2           P      493   LSQ       stq          Store Quadword
X     63   0                  138   FP        fcmpu        Floating Compare Unordered
X     63   2                  163   DFP       daddq        DFP Add Quad
Z23   63   3                  174   DFP       dquaq[.]     DFP Quantize Quad
X     63   8                  126   FP[R]     fcpsgn[.]    Floating Copy Sign
X     63   12                 134   FP[R]     frsp[.]      Floating Round to Single-Precision
X     63   14                 135   FP[R]     fctiw[.]     Floating Convert To Integer Word
X     63   15                 136   FP[R]     fctiwz[.]    Floating Convert To Integer Word with round toward Zero
A     63   18                 128   FP[R]     fdiv[.]      Floating Divide
A     63   20                 127   FP[R]     fsub[.]      Floating Subtract
A     63   21                 127   FP[R]     fadd[.]      Floating Add
A     63   22                 129   FP[R]     fsqrt[.]     Floating Square Root
A     63   23                 139   FP[R]     fsel[.]      Floating Select
A     63   24                 129   FP[R]     fre[.]       Floating Reciprocal Estimate
A     63   25                 128   FP[R]     fmul[.]      Floating Multiply
A     63   26                 130   FP[R].in  frsqrte[.]   Floating Reciprocal Square Root Estimate
A     63   28                 132   FP[R]     fmsub[.]     Floating Multiply-Subtract
A     63   29                 132   FP[R]     fmadd[.]     Floating Multiply-Add
A     63   30                 133   FP[R]     fnmsub[.]    Floating Negative Multiply-Subtract
A     63   31                 133   FP[R]     fnmadd[.]    Floating Negative Multiply-Add
X     63   32                 138   FP        fcmpo        Floating Compare Ordered
X     63   34                 165   DFP       dmulq        DFP Multiply Quad
Z23   63   35                 176   DFP       drrndq[.]    DFP Reround Quad
X     63   38                 142   FP[R]     mtfsb1[.]    Move To FPSCR Bit 1
X     63   40                 126   FP[R]     fneg[.]      Floating Negate
X     63   64                 140   FP        mcrfs        Move to Condition Register from FPSCR
Z23   63   66                 190   DFP       dscliq[.]    DFP Shift Significand Left Immediate Quad
Z23   63   67                 173   DFP       dquaiq[.]    DFP Quantize Immediate Quad
X     63   70                 142   FP[R]     mtfsb0[.]    Move To FPSCR Bit 0
X     63   72                 126   FP[R]     fmr[.]       Floating Move Register
Z     63   98                 190   DFP       dscriq       DFP Shift Significand Right Immediate Quad
Z23   63   99                 179   DFP       drintxq[.]   DFP Round To FP Integer With Inexact Quad
X     63   130                169   DFP       dcmpoq       DFP Compare Ordered Quad
X     63   134                141   FP[R]     mtfsfi[.]    Move To FPSCR Field Immediate
X     63   136                126   FP[R]     fnabs[.]     Floating Negative Absolute Value
X     63   162                171   DFP       dtstexq      DFP Test Exponent Quad
Z23   63   194                170   DFP       dtstdcq      DFP Test Data Class Quad
Z23   63   226                170   DFP       dtstdgq      DFP Test Data Group Quad
Z23   63   227                181   DFP       drintnq[.]   DFP Round To FP Integer Without Inexact Quad
X     63   258                183   DFP       dctqpq       DFP Convert To DFP Extended
X     63   264                126   FP[R]     fabs[.]      Floating Absolute Value
X     63   290                185   DFP       dctfixq      DFP Convert To Fixed Quad
X     63   322                187   DFP       ddedpdq      DFP Decode DPD To BCD Quad
X     63   354                188   DFP       dxexq        DFP Extract Biased Exponent Quad
X     63   392                137   FP[R].in  frin[.]      Floating Round to Integer Nearest
X     63   424                137   FP[R].in  friz[.]      Floating Round to Integer Toward Zero
X     63   456                137   FP[R].in  frip[.]      Floating Round to Integer Plus
X     63   488                137   FP[R].in  frim[.]      Floating Round to Integer Minus
X     63   514                163   DFP       dsubq        DFP Subtract Quad
X     63   546                166   DFP       ddivq        DFP Divide Quad
X     63   583                140   FP[R]     mffs[.]      Move From FPSCR
X     63   642                169   DFP       dcmpuq       DFP Compare Unordered Quad
X     63   674                172   DFP       dtstsfq      DFP Test Significance Quad
XFL   63   711                141   FP[R]     mtfsf[.]     Move To FPSCR Fields
X     63   770                184   DFP       drdpq        DFP Round To DFP Long
X     63   802                185   DFP       dcffixq      DFP Convert From Fixed Quad
X     63   814                134   FP[R]     fctid[.]     Floating Convert To Integer Doubleword
X     63   815                135   FP[R]     fctidz[.]    Floating Convert To Integer Doubleword with round toward Zero
X     63   834                187   DFP       denbcdq      DFP Encode BCD To DPD Quad
X     63   846                136   FP[R]     fcfid[.]     Floating Convert From Integer Doubleword
X     63   866                188   DFP       diexq        DFP Insert Biased Exponent Quad

1 See the key to the mode dependency and privilege columns on page 905 and the key to the category column in Section 1.3.5 of Book I.
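The primary (Pri) and extended (Ext) opcode values listed in the table can be read back out of a 32-bit instruction word. The following is a minimal sketch, not part of the architecture text. It assumes the extended-opcode bit positions of the A, DS, X, and XFL instruction forms as laid out in Book I (bit 0 is the most significant bit); the Z forms are left out. The names `XO_BITS`, `field`, `primary_opcode`, `extended_opcode`, and `encode_fadd` are hypothetical helpers introduced only for illustration.

```python
# Extended-opcode field position per instruction form, as (first_bit, last_bit)
# in Power ISA bit numbering, where bit 0 is the most significant bit of the
# 32-bit instruction word. These positions are an assumption taken from the
# form layouts in Book I, not from this appendix.
XO_BITS = {
    "A": (26, 30),    # 5-bit extended opcode
    "DS": (30, 31),   # 2-bit extended opcode
    "X": (21, 30),    # 10-bit extended opcode
    "XFL": (21, 30),  # 10-bit extended opcode
}


def field(word, first, last):
    """Return instruction bits first..last (inclusive) as an integer."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


def primary_opcode(word):
    """Primary opcode: bits 0:5 of the instruction word."""
    return field(word, 0, 5)


def extended_opcode(word, form):
    """Extended opcode for one of the forms listed in XO_BITS."""
    first, last = XO_BITS[form]
    return field(word, first, last)


def encode_fadd(frt, fra, frb, rc=0):
    """Assemble fadd[.] (A-form, Pri 63, Ext 21) from its register fields."""
    return (63 << 26) | (frt << 21) | (fra << 16) | (frb << 11) | (21 << 1) | rc


# fadd f1,f2,f3 encodes as 0xFC22182A
word = encode_fadd(1, 2, 3)
print(hex(word), primary_opcode(word), extended_opcode(word, "A"))
```

The form has to be known before the extended opcode can be extracted, because the field width differs by form. A real decoder would typically select the form from the primary opcode first, then look up the (Pri, Ext) pair in a table such as the one above.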
Index A BA instruction field 763, 764 BB field 18 a bit 32 BC field 18 A-form 17 BD field 18 AA field 18 BD instruction field 764 address 23 BE effective 26 See Machine State Register effective address 505, 639 BF field 18 real 506, 640 BF instruction field 764 address compare 506, 559, 566 BFA field 18 address translation 522, 644 BFA instruction field 764 EA to VA 508 BH field 18 esid to vsid 508 BI field 18 overview 514 block 408 PTE BO field 18, 32 page table entry 518, 522 boundedly undefined 4 Reference bit 522 Branch Trace 565 RPN Bridge 536 real page number 517 Segment Registers 536 VA to RA 517 SR 536 VPN brinc 268 virtual page number 517 BT field 18 32-bit mode 508 bytes 4 address wrap 506, 640 addresses accessed by processor 512 C implicit accesses 512 C 102 interrupt vectors 512 CA 42 with defined uses 512 cache management instructions 427 addressing mode cache model 409 D-mode 767 cache parameters 425 aliasing 413 Caching Inhibited 411 alignment Change bit 522 effect on performance 421, 581, 703 CIA 7 Alignment interrupt 562, 601, 677 Come-From Address Register 497, 859 assembler language consistency 413 extended mnemonics 383, 589, 733 context mnemonics 383, 589, 733 definition 467, 607 symbols 383, 589, 733 synchronization 469, 609 atomic operation 415 Control Register 488 atomicity 409 Count Register 497, 621, 774, 859 single-copy 409 CR 30 Auxiliary Processor 4 Critical Input interrupt 674 Auxiliary Processor Unavailable interrupt 679 Critical Save/Restore Register 1 663 CSRR1 663 B CTR 31, 774 CTRL B-form 14 See Control Register BA field 18 Current Instruction Address 479, 613 D eciwx instruction 455, 456, 559, 562, 563, 566, 583 ecowx instruction 455, 456, 559, 562, 563, 566, 583 D field 18 EE D instruction field 764 See Machine State Register D-form 15 effective address 26, 505, 514, 639 D-mode addressing mode 767 size 508
DABR interrupt 581 translation 514 DABR(X) eieio instruction 413, 448, 543 See Data Breakpoint Register (Extension) emulation assist 468, 608 DAR Endianness 412 See Data Address Register EQ 30, 31 data access 506, 640 ESR 665 Data Address Breakpoint Register (Extension) 475, evabs 268 498, 581, 586, 861 evaddiw 268 data address compare 559, 566 evaddsmiaaw 268 Data Address Register 497, 548, 559, 560, 562, 566, evaddssiaaw 269 568, 859 evlwhex 277 data cache instructions 429 exception 662 Data Exception Address Register 664 alignment exception 677 data exception address register 664 critical input exception 674 Data Segment interrupt 560, 567 data storage exception 675 data storage 407 external input exception 676 Data Storage interrupt 559, 566, 675 illegal instruction exception 678 Data Storage Interrupt Status Register 497, 548, 549, instruction storage exception 676 559, 562, 563, 566, 601, 859 instruction TLB miss exception 681 Alignment interrupt 601 machine check exception 674 Data TLB Error interrupt 681 privileged instruction exception 678 dcba instruction 433, 652 program exception 678 dcbf instruction 437 system call exception 679 dcbst instruction 417, 436, 559, 566 trap exception 678 dcbt instruction 434, 631, 655 exception priorities 689 dcbtls 656 system call instruction 691 dcbtst instruction 435, 633, 655 trap instructions 690 dcbz instruction 436, 529, 559, 562, 566, 601, 634, Exception Syndrome Register 665 652 exception syndrome register 665 DEAR 664 exception vector prefix register 664 Debug Interrupt 682 Exceptions 661 DEC exceptions See Decrementer address compare 506, 559, 566 Decrementer 497, 576, 621, 697, 859 definition 467, 607 Decrementer Interrupt 680 page fault 506, 521, 559, 566, 639 Decrementer interrupt 501, 502, 565 protection 506, 639 defined instructions 21 segment fault 506 denormalization 106 storage 506, 639 denormalized number 104 execution synchronization 469, 609 double-precision 106 extended mnemonics 457 doublewords 4 
External Access Register 497, 559, 566, 583, 586, DQ field 18 621, 859 DQ-form 15 External Control 455 DR External Control instructions See Machine State Register eciwx 456 DS field 19 ecowx 456 DS-form 15 External Input interrupt 676 DSISR External interrupt 501, 502, 561 See Data Storage Interrupt Status Register F E FE 31, 102 E (Enable bit) 583 FEX 101 EA 26 FE0 See Machine State Register VXSNAN 102 FE1 VXSOFT 102 See Machine State Register VXSQRT 102 FG 31, 102 VXVC 102 FI 102 VXZDZ 102 Fixed-Interval Timer interrupt 680 XE 103 Fixed-Point Exception Register 497, 621, 859 XX 101 FL 30, 102 ZE 103 FLM field 19 ZX 101 floating-point FR 102 denormalization 106 FRA field 19 double-precision 106 FRB field 19 exceptions 100, 108 FRC field 19 inexact 113 FRS field 19 invalid operation 110 FRT field 19 overflow 111 FU 31, 102 underflow 112 FX 101 zero divide 111 FXM field 19 execution models 113 FXM instruction field 764 normalization 106 number denormalized 104 G infinity 105 GPR 42 normalized 104 GT 30, 31 not a number 105 Guarded 411 zero 104 rounding 107 sign 105 H single-precision 106 Floating-Point Unavailable interrupt 564, 569, 679 halfwords 4 forward progress 417 hardware FP definition 468, 608 See Machine State Register hardware description language 7 FPCC 102 hashed page table FPR 100 size 519 FPRF 102 HDEC FPSCR 101 See Hypervisor Decrementer C 102 HDICE FE 102 See Logical Partitioning Control Register FEX 101 HEIR FG 102 See Hypervisor Emulated Instruction Register FI 102 hrfid instruction 477, 572 FL 102 HRMOR FPCC 102 See Hypervisor Real Mode Offset Register FPRF 102 HSPRGn FR 102 See software-use SPRs FU 102 HTABORG 520 FX 101 HTABSIZE 520 NI 103 HV OE 103 See Machine State Register OX 101 hypervisor 471 RN 103 Hypervisor Decrementer 497, 577, 586, 860 UE 103 Hypervisor Decrementer interrupt 565 UX 101 Hypervisor Emulated Instruction Register 498, 549, VE 103 860 VX 101 Hypervisor Machine Status Save Restore
Register VXCVI 103 See HSRR0, HSRR1 VXIDI 102 Hypervisor Machine Status Save Restore Register VXIMZ 102 0 548 VXISI 102 Hypervisor Real Mode Offset Register 43, 474, 488, 586 PMRN 20 RA 20 RB 20 I Rc 20 I-form 14 RS 20 icbi instruction 417, 428, 559, 566 RT 20 icbt instruction 428 SH 20 ILE SI 20 See Logical Partitioning Control Register SPR 20 illegal instructions 21 SR 20 implicit branch 506, 640 TBR 20 imprecise interrupt 550, 669 TH 20 in-order operations 506, 640 TO 20 inexact 113 U 20 infinity 105 UI 20 instruction 559, 566 XO 21 field formats 14-?? BA 763, 764 A-form 17 BD 764 B-form 14 BF 764 D-form 15 BFA 764 DQ-form 15 D 764 DS-form 15 FXM 764 I-form 14 L 764 M-form 17 LK 764 MD-form 17 Rc 764 MDS-form 17 SH 764 SC-form 15 SI 764 VA-form 17 UI 765 VX-form 17 WS 765 X-form 16 fields 18-21 XFL-form 16 AA 18 XFX-form 16 BA 18 XL-form 16 BB 18 XO-form 17 BC 18 XS-form 17 BD 18 interrupt control 778 BF 18 mtmsr 625 BFA 18 partially executed 686 BH 18 rfci 779 BI 18 sc 778 BO 18 instruction cache instructions 428 BT 18 instruction fetch 506, 640 D 18 effective address 506, 640 DQ 18 implicit branch 506, 640 DS 19 Instruction Fields 763 FLM 19 instruction restart 423 FRA 19 Instruction Segment interrupt 561, 568 FRB 19 instruction storage 407 FRC 19 Instruction Storage interrupt 560, 676 FRS 19 Instruction TLB Error Interrupt 681 FRT 19 instruction-caused interrupt 550 FXM 19 Instructions L 19 brinc 268 LEV 19 dcbtls 656 LI 19 evabs 268 LK 19 evaddiw 268 MB 19 evaddsmiaaw 268 ME 19 evaddssiaaw 269 NB 20 evlwhex 277 OE 20 instructions classes 21 sync 417, 446, 469, 522, 551 dcba 433, 652 tlbia 521, 542 dcbf 437 tlbie 521, 539, 542, 544, 659 dcbst 417, 436, 559, 566 tlbiel 541 dcbt 434, 631, 655 tlbsync 542, 543, 659 dcbtst 435, 633, 655 wrtee 626 dcbz 436, 529, 562, 601, 634, 652 wrteei 626 defined 21 interrupt 662 forms 21 Alignment 562, 601 eciwx 455, 456, 559, 562, 563, 566, 583 alignment interrupt
677 ecowx 455, 456, 559, 562, 563, 566, 583 DABR 581 eieio 413, 448, 543 Data Segment 560, 567 hrfid 477, 572 Data Storage 559, 566 icbi 417, 428, 559, 566 data storage interrupt 675 icbt 428 Decrementer 501, 502, 565 illegal 21 definition 467, 608 invalid forms 21 External 501, 502, 561 isync 417, 440, 551 external input interrupt 676 ldarx 415, 444, 551, 559, 562, 563, 566 Floating-Point Unavailable 564, 569 lmw 562 Hypervisor Decrementer 565 lookaside buffer 529 imprecise 550, 669 lq 493, 562 instruction lwa 563 partially executed 686 lwarx 415, 442, 551, 559, 562, 563, 566, 601 Instruction Segment 561, 568 lwaux 563 Instruction Storage 560, 676 lwsync 446 instruction storage interrupt 676 lwz 601 instruction TLB miss interrupt 681 mbar 448 instruction-caused 550 mfmsr 477, 503, 625 Machine Check 557 mfspr 500, 624 machine check interrupt 674 mfsr 538 masking 687 mfsrin 538 guidelines for system software 689 mftb 451 new MSR 555 mtmsr 477, 501, 572 ordering 687, 689 mtmsrd 477, 502, 572 guidelines for system software 689 address wrap 506, 640 overview 547 mtspr 499, 622 Performance Monitor 569 mtsr 537 precise 550, 669 mtsrin 537 priorities 571 optional processing 551 See optional instructions Program 563 preferred forms 21 program interrupt 678 ptesync 446, 469, 543 illegal instruction exception 678 reserved 21 privileged instruction exception 678 rfci 614 trap exception 678 rfid 417, 477, 480, 554, 572 recoverable 554 rfmci 615 synchronization 550 sc 479, 482, 565, 613 System Call 565 slbia 532, 535 system call interrupt 679 slbie 531 System Reset 556 slbmfee 534 system-caused 550 slbmfev 534 Trace 565 slbmte 533 type stdcx. 415, 444, 551, 559, 562, 563, 566 Alignment 677 stmw 562 Auxiliary Processor Unavailable 679 storage control 425, 529, 652 Critical Input 674 stq 493, 562 Data Storage 675 stw 601 Data TLB Error 681 stwcx. 
415, 442, 551, 559, 562, 563, 566 Debug 682 stwx 601 Decrementer 680 External Input 676 Link Register 497, 621, 774, 859 Fixed-Interval Timer 680 LK field 19 Floating-Point Unavailable 679 LK instruction field 764 Instruction TLB Error 681 lmw instruction 562 Machine Check 674 Logical Partition Identification Register 474 Program interrupt 678 Logical Partitioning 471 System Call 679 Logical Partitioning Control Register 426, 471, 498, Watchdog Timer 680 530, 586, 860 vector 551, 556 HDICE Hypervisor Decrementer Interrupt Condition- interrupt and exception handling registers ally Enable 473, 476, 501, 502, 565, 587 DEAR 664 ILE Interrupt Little-Endian 472, 555 ESR 665 ISL Ignore Large Page Specification 472 ivpr 664 ISL Ignore SLB Large Page Specification 472 interrupt classes LPES Logical Partitioning Environment asynchronous 668 Selector 473, 476, 479, 509, 510, 524, 526, 555, critical,non-critical 669 588 machine check 669 RMI Real Mode Caching Inhibited Bit 473, 588 synchronous 668 RMLS Real Mode Offset Selector 472, 588 interrupt control instructions 778 VC 588 mtmsr 625 VC Virtualization Control 472 rfci 779 VPM Virtualized Partition Memory 472 sc 778 VRMASD 588 interrupt processing 670 VRMASD Virtual Real Mode Area Segment interrupt vector 670 Descriptor 472 interrupt vector 670 lookaside buffer 529 Interrupt Vector Offset Register 36 622, 860 LPAR (see Logical Partitioning) 471 Interrupt Vector Offset Register 37 622, 860 LPCR Interrupt Vector Offset Registers 666 See Logical Partitioning Control Register Interrupt Vector Prefix Register 664 LPES Interrupts 661 See Logical Partitioning Control Register invalid instruction forms 21 LPIDR invalid operation 110 See Logical Partition Identification Register IR lq instruction 493, 562 See Machine State Register LR 31, 774 ISL LT 30 See Logical Partitioning Control Register lwa instruction 563 isync instruction 417, 440, 551 lwarx instruction 415, 442, 551, 559, 562, 563, 566, IVORs 666 601
IVPR 664 lwaux instruction 563 ivpr 664 lwsync instruction 446 lwz instruction 601 K M K bits 524 key, storage 524 M-form 17 Machine 611 Machine Check 669 L Machine Check interrupt 557, 674 dcbf 559, 566 Machine State Register 477, 479, 501, 502, 503, 551, instructions 554, 555, 611, 625 dcbf 559, 566 BE Branch Trace Enable 478 L field 19 DR Data Relocate 478 L instruction field 764 EE External Interrupt Enable 477, 501, 502 language used for instruction operation description 7 FE0 FP Exception Mode 478 ldarx instruction 415, 444, 551, 559, 562, 563, 566 FE1 FP Exception Mode 478 LE FP FP Available 478 See Machine State Register HV Hypervisor State 477 LEV field 19 IR Instruction Relocate 478 LI field 19 LE Little-Endian Mode 478 ME Machine Check Enable 478 PMM Performance Monitor Mark 478, 592 slbie 531 PR Problem State 477 tlbia 542 RI Recoverable Interrupt 478, 501, 502 tlbie 539 SE Single-Step Trace Enable 478 tlbiel 541 SF Sixty Four Bit mode 477, 506, 640 tlbsync 542 VEC Vector Available 477 out-of-order operations 506, 640 Machine Status Save Restore Register OV 42 See SRR0, SRR1 overflow 111 Machine Status Save Restore Register 0 548, 551, OX 101 554 Machine Status Save Restore Register 1 551, 554, 564 P main storage 407 page 408 MB field 19 size 508 mbar instruction 448 page fault 506, 521, 559, 566, 639 MD-form 17 page table MDS-form 17 search 520 ME update 543 See Machine State Register page table entry 518, 522 ME field 19 Change bit 522 memory barrier 413 PP bits 524 Memory Coherence Required 411 Reference bit 522 mfmsr instruction 477, 503, 625 update 543, 544 mfspr instruction 500, 624 partially executed instructions 686 mfsr instruction 538 partition 471 mfsrin instruction 538 Performance Monitor interrupt 569 mftb instruction 451 performed 408 Mnemonics 762 PID 641 mnemonics PMM extended 383, 589, 733 See Machine State Register mode change 506, 640 PMRN field 20 move to machine state register 625 PP bits 524
MSR PR See Machine State Register See Machine State Register mtmsr 625 precise interrupt 550, 669 mtmsr instruction 477, 501, 572 preferred instruction forms 21 mtmsrd instruction 477, 502, 572 priority of interrupts 571 mtspr instruction 499, 622 Process ID Register 641 mtsr instruction 537 Processor Utilization of Resources Register 497, 578, mtsrin instruction 537 860 Processor Version Register 487, 617 N Program interrupt 563, 678 program order 407 NB field 20 Program Priority Register 43, 488, 498, 861 Next Instruction Address 479, 480, 613, 614, 615 protection boundary 524, 562 NI 103 protection domain 524 NIA 7 PTE 520 no-op 75 See also page table entry normalization 106 PTEG 520 normalized number 104 ptesync instruction 446, 469, 543 not a number 105 PURR See Processor Utilization of Resources Register PVR O See Processor Version Register OE 103 OE field 20 Q opcode 0 601 optional instructions 529 quadwords 4 slbia 532, 535 R HDEC Hypervisor Decrementer 497, 577, 586, 860 RA field 20 HEIR RB field 20 Hypervisor Emulated Instruction Register 498, RC bits 522 549, 860 Rc field 20 HRMOR Rc instruction field 764 Hypervisor Real Mode Offset Register 43, 474, real address 514 488, 586 Real Mode Offset Register 473, 586 HSPRGn real page software-use SPRs 489 definition 467, 607 HSRR0 real page number 518 Hypervisor Machine Status Save Restore Regis- recoverable interrupt 554 ter 0 548 reference and change recording 522 IVOR36 Reference bit 522 Interrupt Vector Offset Register 36 622, 860 register IVOR37 CSRR1 663 Interrupt Vector Offset Register 37 622, 860 CTR 774 Link Register 31 DEAR 664 LPCR ESR 665 Logical Partitioning Control Register 426, 471, IVORs 666 498, 530, 586, 860 IVPR 664 LPIDR ivpr 664 Logical Partition Identification Register 474 LR 774 LR PID 641 Link Register 497, 621, 859 SRR0 662 MSR SRR1 662 Machine State Register 477, 479, 501, 502, register transfer level language 7 503, 551, 554, 555, 611, 625 Registers PPR
implementation-specific Program Priority Register 43, 488, 498, 861 MMCR1 754 PURR supervisor-level Processor Utilization of Resources MMCR1 754 Register 497, 578, 860 registers PVR CFAR Processor Version Register 487, 617 Come-From Address Register 497, 859 RMOR Condition Register 30 Real Mode Offset Register 473, 586 Count Register 31 SDR1 CTR Storage Description Register 1 497, 520, 859 Count Register 497, 621, 859 Storage Description Register 1 586 CTRL SPRGn Control Register 488 software-use SPRs 497, 621, 859 DABR(X) SPRs Data Address Breakpoint Register Special Purpose Registers 496 (Extension) 475, 498, 581, 586, 861 SRR0 DAR Machine Status Save Restore Register 0 548, Data Address Register 497, 548, 559, 560, 562, 551, 554 566, 568, 859 SRR1 DEC Machine Status Save Restore Register 1 551, Decrementer 497, 576, 621, 697, 859 554, 564 DSISR TB Data Storage Interrupt Status Register 497, Time Base 575, 695 548, 549, 559, 562, 563, 566, 601, 859 TBL EAR Time Base Lower 497, 575, 621, 695, 859 External Access Register 497, 559, 566, 583, TBU 586, 621, 859 Time Base Upper 497, 575, 621, 695, 859 Fixed-Point Exception Register 42 Time Base 451 Floating-Point Registers 100 XER Floating-Point Status and Control Register 101 Fixed-Point Exception Register 478, 497, 567, General Purpose Registers 42 621, 859 relocation slbmfev instruction 534 data 506, 640 slbmte instruction 533 reserved field 5, 468 SO 30, 31, 42 reserved instructions 21 software-use SPRs 497, 621, 859 return from critical interrupt 779 Special Purpose Registers 496 rfci 779 speculative operations 506, 640 rfci instruction 614 split field notation 14 rfid instruction 417, 477, 480, 554, 572 SPR field 20 rfmci instruction 615 SR 536 RI SR field 20 See Machine State Register SRR0 662 RID (Resource ID) 583 SRR1 662 RMI stdcx.
instruction 415, 444, 551, 559, 562, 563, 566 See Logical Partitioning Control Register stmw instruction 562 RMLS storage See Logical Partitioning Control Register access order 413 RMOR accessed by processor 512 See Real Mode Offset Register atomic operation 415 RN 103 attributes rounding 107 Endianness 412 RS field 20 implicit accesses 512 RT field 20 instruction restart 423 RTL 7 interrupt vectors 512 N 520 No-execute 520 S order 413 Save/Restore Register 0 662 ordering 413, 446, 448 Save/Restore Register 1 662 protection sc 778 translation disabled 526 sc instruction 479, 482, 565, 613 reservation 415 SC-form 15 shared 413 SDR1 with defined uses 512 See Storage Description Register 1 storage access 407 SE definitions See Machine State Register program order 407 segment floating-point 117 size 508 storage access ordering 459 type 508 storage address 23 Segment Lookaside Buffer storage control See SLB instructions 529, 652 Segment Registers 536 storage control attributes 410 Segment Table storage control instructions 425 bridge 536 Storage Description Register 1 497, 520, 586, 859 sequential execution model 29 storage key 524 definition 467, 608 storage location 407 SF storage operations See Machine State Register in-order 506, 640 SH field 20 out-of-order 506, 640 SH instruction field 764 speculative 506, 640 SI field 20 storage protection 524 SI instruction field 764 string instruction 648 sign 105 TLB management 648 single-copy atomicity 409 stq instruction 493, 562 single-precision 106 string instruction 648 Single-Step Trace 565 stw instruction 601 SLB 514, 529 stwcx. 
instruction 415, 442, 551, 559, 562, 563, 566 entry 515 stwx instruction 601 slbia instruction 532, 535 symbols 383, 589, 733 slbie instruction 531 sync instruction 417, 446, 469, 522, 551 slbmfee instruction 534 synchronization 469, 543, 609 context 469, 609 virtual page number 518 execution 469, 609 virtual storage 408 interrupts 550 VPM Synchronize 413 See Logical Partitioning Control Register Synchronous 668 VRMASD system call 778 See Logical Partitioning Control Register system call instruction 691 VX 101 System Call interrupt 565, 679 VX-form 17 System Reset interrupt 556 VXCVI 103 system-caused interrupt 550 VXIDI 102 VXIMZ 102 VXISI 102 T VXSNAN 102 t bit 32 VXSOFT 102 table update 543 VXSQRT 102 TB 451 VXVC 102 TBL 451 VXZDZ 102 TBR field 20 TH field 20 W Time Base 451, 575, 695 Time Base Lower 497, 575, 621, 695, 859 Watchdog Timer interrupt 680 Time Base Upper 497, 575, 621, 695, 859 words 4 TLB 521, 529, 641 Write Through Required 410 TLB management 648 wrtee instruction 626 tlbia instruction 521, 542 wrteei instruction 626 tlbie instruction 521, 539, 542, 544, 659 WS instruction field 765 tlbiel instruction 541 tlbsync instruction 542, 543, 659 TO field 20 X Trace interrupt 565 X-form 16 Translation Lookaside Buffer 641 XE 103 translation lookaside buffer 521 XER 42, 478, 567 trap instructions 690 XFL-form 16 trap interrupt XFX-form 16 definition 467, 608 XL-form 16 XO field 21 U XO-form 17 XS-form 17 U field 20 XX 101 UE 103 UI field 20 UI instruction field 765 Z UMMCR1 (user monitor mode control register 1) 754 z bit 32 undefined 7 ZE 103 boundedly 4 zero 104 underflow 112 zero divide 111 UX 101 ZX 101 V Numerics VA-form 17 2 472 VC 32-bit mode 508 See Logical Partitioning Control Register VE 103 VEC See Machine State Register virtual address 514, 517 generation 514 size 508